
When a person loses the ability to speak, it drastically limits their ability to communicate. In the future, however, the combination of artificial intelligence with high-resolution recordings of brain activity could give such patients their speech back. Researchers have now developed a brain-to-text system that reads brain signals with astonishing precision and translates them into words. With an average word error rate of three percent, the technology, based on two coupled neural networks, is significantly more accurate than previous systems of this kind.
Language is essential for our communication, which makes it all the more devastating when people lose their ability to speak through injury or illness. Modern technology, above all direct brain-computer interfaces, creates new opportunities to read out and interpret brain signals. This can also be exploited for so-called brain-to-text systems: when we hear or speak words, characteristic activity patterns arise in the brain. Machine-learning systems can recognize these patterns and thus match signals to words. Scientists have in fact already succeeded in recognizing spoken syllables and words with such systems based solely on the accompanying brain signals, and in some cases in converting them back into intelligible spoken language. So far, however, the vocabulary of such attempts has mostly been limited to fewer than 100 words, and the recognition error rate has remained relatively high at around 25 percent.
Two linked networks as “translators”
Now Joseph Makin from the University of California at San Francisco and his colleagues have developed a system that achieves significantly higher accuracy, and with relatively little training. Four subjects took part in their experiment; each had a grid of electrodes implanted on the cerebral cortex. These electrodes had originally been placed to locate the foci of their epileptic seizures, but they also gave Makin and his team the opportunity to record high-resolution, speech-related brain signals. The experiment began with the participants reading aloud simple English sentences shown to them on a monitor. “The sentences were on average nine words long and resulted in a vocabulary of 250 different words,” the researchers report. At the same time, the team recorded the accompanying brain signals.
Makin and his team then used this combination of brain signals and the associated voice recordings to train a system made up of two adaptive neural networks. The first network, the so-called encoder, acts as a kind of filter that scans the recorded brain signals for recurring patterns, patterns that could be related to the spoken words. Through repeated comparison with the voice recordings, this network improved its accuracy during training. The second network, the decoder, takes the output of its predecessor and generates words from the processed signals. “This neural network is trained to output either a suitable word at every step or the stop signal for the end of a sentence,” explain Makin and his colleagues.
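To make the encoder-decoder idea concrete, here is a minimal sketch of such a pairing. It is not the authors' published code; the channel count, layer sizes, the use of GRU layers and the class name BrainToTextSeq2Seq are illustrative assumptions, chosen only to show how a sequence of neural-signal features could be mapped to a sequence of word IDs.

```python
# Hedged sketch, not the published model: an encoder that scans multi-channel
# brain-signal features and a decoder that emits one word (or an end token) per step.
import torch
import torch.nn as nn

class BrainToTextSeq2Seq(nn.Module):
    def __init__(self, n_channels=256, hidden=400, vocab_size=252):
        super().__init__()
        # Encoder: compresses the brain-signal time series into a summary state.
        self.encoder = nn.GRU(n_channels, hidden, batch_first=True)
        # Decoder: predicts the next word from the summary and the previous words.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, signals, prev_words):
        # signals: (batch, time, n_channels) neural features
        # prev_words: (batch, n_words) word IDs shifted right (teacher forcing)
        _, state = self.encoder(signals)                  # summarize the sentence
        dec_out, _ = self.decoder(self.embed(prev_words), state)
        return self.out(dec_out)                          # (batch, n_words, vocab) logits

# Example usage with random data:
model = BrainToTextSeq2Seq()
signals = torch.randn(2, 500, 256)                 # 2 sentences, 500 time steps
prev_words = torch.zeros(2, 9, dtype=torch.long)   # shifted word IDs
logits = model(signals, prev_words)                # shape (2, 9, 252)
```

Training such a pairing would minimize the cross-entropy between these word logits and the words the subject actually spoke, which is how the repeated comparison with the voice recordings described above gradually improves the encoder's filtering.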
Word error rate below five percent
The experiments showed that the coupled AI systems achieved a relatively high level of precision after just a few training sessions. “As long as at least 15 repetitions of a sentence were available for training, the word error rate could be kept below 25 percent, which is considered the upper limit for acceptable speech transcription,” the researchers report. When the subjects repeated the individual sentences more than 15 times, the accuracy increased significantly: the systems achieved an average word error rate of only three percent. “Error rates of five percent are already considered professional level,” say Makin and his team. In a supplementary test, they found that the training success of the AI systems could even be transferred from one subject to another. If the encoder network had been trained on one patient, it recognized the characteristic brain signals of a second patient much more easily, and the training then took correspondingly less time. According to the researchers, the system could therefore be pre-trained on a kind of generalized language model before it is applied to a new patient.
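For readers unfamiliar with the metric, the word error rate quoted above is simply the word-level edit distance (substitutions, insertions and deletions) between the decoded sentence and the spoken reference, divided by the number of reference words. The short routine below is an illustrative implementation of that standard definition, not code from the study; the example sentences are made up.

```python
# Illustrative only: word error rate (WER) as word-level Levenshtein distance
# divided by the number of reference words.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table over word positions.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One wrong word in an eight-word sentence gives a WER of 12.5 percent.
print(word_error_rate("the birch canoe slid on the smooth planks",
                      "the birch canoe slid on a smooth planks"))  # -> 0.125
```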
According to Makin and his team, such AI-based decoder systems could in future help restore patients’ ability to speak by having a computer translate their brain signals directly into speech. As the researchers emphasize, their experiment still used a greatly reduced vocabulary of only around 250 words. However, just 30 minutes of voice and brain-signal recordings were enough to train the AI systems. “Our results indicate that increasing the amount of data beyond these 30 minutes would allow the vocabulary to be expanded and the sentence structures to be more flexible,” say the researchers. “In addition, a few hundred words could be very helpful for a patient who otherwise cannot speak.”
Source: Joseph Makin (University of California, San Francisco) et al., Nature Neuroscience, doi: 10.1038/s41593-020-0608-8