AI learns from a child’s impressions

Toddler with head camera. His recordings formed the basis for language learning by an AI system. © Wai Keen Vong

When artificial intelligences learn language, they typically draw on trillions of examples from the Internet. Young children, by contrast, acquire language solely from what they pick up in their environment. Researchers have now trained an AI on video recordings made from the perspective of a single toddler. The AI did indeed learn to associate words with objects and to generalize concepts. The findings offer a new perspective on human language acquisition.

Children begin to learn their first words at around six to nine months of age. In doing so, they connect what they see and experience with the corresponding names they hear from the people around them. But is that really enough to learn a language from scratch? Or do we humans perhaps have innate knowledge that helps us grasp the concepts of language?

Toddler as a research assistant

To get to the bottom of this question, a team led by Wai Keen Vong from New York University trained an AI that received as input only what a toddler sees and hears. To do this, the researchers equipped a six-month-old toddler with a lightweight, head-mounted video camera. Until shortly after his second birthday, the child regularly wore the camera during everyday activities, such as on the playground, at mealtimes, or while looking at picture books with his parents.

Around 61 hours of video material were collected in this way. "Although these recordings only account for about one percent of the child's waking hours, they still provide a detailed insight into the child's experiences from their own perspective," the researchers write. They used this data to train an artificial neural network. They split the video into individual frames and paired each with a transcription of what was said at that moment. "This gives the model a clue as to which words should be associated with which objects," explains Vong. "The combination of these clues makes it possible to gradually determine which words belong to which images."
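The study itself trains a neural network on raw frames and transcripts; as a purely illustrative sketch of the underlying associative idea, the toy Python script below counts how often words co-occur with objects in view. The episodes, object labels, and the `guess_object` helper are all invented for this example, not taken from the study, which works on pixels rather than symbolic labels.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each entry pairs the objects visible in a
# video frame with the words spoken around that moment.
episodes = [
    (["ball", "grass"], ["look", "a", "ball"]),
    (["ball", "dog"],   ["throw", "the", "ball"]),
    (["cup", "table"],  ["your", "cup"]),
    (["dog", "grass"],  ["good", "dog"]),
    (["dog"],           ["the", "dog", "barks"]),
]

# Associative learning as co-occurrence counting: how often does each
# word occur while each object is in view?
cooc = defaultdict(Counter)
for objects, words in episodes:
    for word in words:
        for obj in objects:
            cooc[word][obj] += 1

def guess_object(word):
    """Return the object most strongly associated with a word so far."""
    return cooc[word].most_common(1)[0][0]

print(guess_object("ball"))  # prints "ball": it co-occurs most with that word
print(guess_object("dog"))   # prints "dog"
```

Because correct pairings recur across many moments while accidental ones do not, the strongest associations gradually single out the right object for each word, which is the intuition Vong describes.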

Linking words and images

But would this information be enough for the algorithm to learn what certain words mean, just as the toddler did? The researchers tested this by giving the AI tasks that are also used to assess children's linguistic skills. For example, they presented four pictures and asked which one showed a ball. And indeed: for many words from the toddler's world of experience, the AI reliably selected the right image.
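A model of this kind can answer such a four-picture test by comparing a word's learned representation with a representation of each candidate image and picking the closest match. The sketch below illustrates that selection step with hand-picked toy vectors; in the actual study, the vectors would come from trained image and text encoders, and the names used here (`word_ball`, `ball_photo`, etc.) are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings for the four candidate pictures.
images = {
    "ball_photo":  [0.9, 0.1, 0.0],
    "cup_photo":   [0.1, 0.8, 0.1],
    "dog_photo":   [0.0, 0.2, 0.9],
    "spoon_photo": [0.3, 0.3, 0.3],
}

# Hypothetical embedding of the spoken word "ball".
word_ball = [1.0, 0.0, 0.1]

# Four-alternative forced choice: pick the image most similar to the word.
best = max(images, key=lambda name: cosine(word_ball, images[name]))
print(best)  # prints "ball_photo"
```

The same comparison, run over many words and picture sets, is how the evaluation scores whether the model has linked each word to its visual counterpart.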

“Our study shows for the first time that a neural network trained on the developmentally realistic input of a single child can learn to associate words with their visual equivalents,” says Vong. Like small children, the AI was also able to generalize from concrete examples. For example, it recognized a photo of a real butterfly even though it had previously seen butterflies only as drawings in a children's book.

Basics of language acquisition

From the researchers' perspective, the results can contribute to a better understanding of children's language acquisition. “By using AI models to study real-world language learning in children, we are contributing to the classic debate about what ingredients children need to learn words – for example, whether innate knowledge is required or whether associative learning is sufficient,” explains Vong's colleague Brenden Lake. “Our results suggest that associative learning can achieve more than expected.”

The researchers caution that the AI has only learned the names of concrete objects, not other dimensions of language, such as connections to beliefs and intentions. The model also missed many of the experiences a toddler naturally has, from touching objects to its own emotional life. “But even with these limitations, the model shows how in-depth word learning is possible from snippets of an individual child’s experience,” write Vong and his team.

Source: Wai Keen Vong (New York University) et al., Science, doi: 10.1126/science.adi1374
