AI gives paralysed woman a voice
A new brain-computer interface (BCI) has restored the power of speech to a patient left severely paralysed by a brainstem stroke.
The development marks the first time that both speech and facial expressions have been synthesised directly from brain signals.
Decoding neural signals
Developed by a collaborative team of researchers from UC San Francisco and UC Berkeley, the system decodes neural signals and translates them into text at a rate of nearly 80 words per minute.
The ultimate objective is to enable natural, full communication by decoding these brain signals not just into text but into speech itself, accompanied by the facial movements that are intrinsic to conversation.
The researchers implanted a wafer-thin array of over 200 electrodes over areas of the patient's brain that are critical for speech. These electrodes intercept the neural signals that, were it not for her stroke, would have controlled the muscles of her tongue, jaw, larynx and face as she spoke.
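The article does not describe how the raw electrode recordings are processed before decoding. As a purely illustrative sketch, one common approach in this kind of work is to band-pass filter each electrode's signal and take its amplitude envelope as a decoder feature; the sampling rate, frequency band and electrode count below are assumptions rather than values reported for this study.

```python
# Illustrative only: one common way to turn raw multi-electrode recordings into
# decoder features is to band-pass filter each channel and take its amplitude
# envelope. The sampling rate, frequency band and electrode count are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

FS = 1000            # assumed sampling rate in Hz
N_ELECTRODES = 253   # the article says "over 200"; the exact count here is assumed

def band_envelope(raw, low=70.0, high=150.0, fs=FS):
    """Band-pass each electrode channel and return its amplitude envelope."""
    sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, raw, axis=-1)     # raw shape: (electrodes, samples)
    return np.abs(hilbert(filtered, axis=-1))     # instantaneous amplitude per channel

# Stand-in data: two seconds of noise in place of real neural recordings.
raw = np.random.default_rng(0).normal(size=(N_ELECTRODES, 2 * FS))
features = band_envelope(raw)
print(features.shape)   # (253, 2000) -> one envelope trace per electrode
```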
A cable plugged into a port fixed to the patient's head linked the implant to a bank of computers. Working closely with the research team, she spent a matter of weeks training the system's AI algorithms to recognise the unique brain signals associated with her attempts to speak.
Training AI
This training regimen involved the patient repeatedly attempting to say phrases drawn from a conversational vocabulary of more than 1,000 words. Over time, the computer learned to recognise the patterns of brain activity that corresponded to these sounds.
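As a rough illustration of what this training amounts to, the sketch below fits a simple classifier that maps windows of neural features to the sound the patient was attempting to produce. The model choice, feature sizes and random stand-in data are assumptions; the actual system used its own neural-network decoder trained on real recordings.

```python
# Minimal sketch of the training idea: learn a mapping from windows of neural
# features to the sound the patient was attempting to produce. The classifier,
# feature sizes and random stand-in data are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

N_FEATURES = 253 * 2   # assumed: a couple of summary features per electrode
N_CLASSES = 39         # one class per phoneme, as described below

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, N_FEATURES))        # stand-in neural feature windows
y = rng.integers(0, N_CLASSES, size=2000)      # stand-in attempted-phoneme labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")   # ~chance on random data
```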
The researchers built the system to decode words from phonemes, the elemental building blocks of speech, much as letters build up written words. This greatly improved efficiency: the computer needed to learn only 39 phonemes to decipher any English word, making the system three times faster than conventional methods and highly accurate.
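To see why decoding at the phoneme level keeps the problem tractable, the toy example below assembles words from a short sequence of phoneme labels using a miniature pronunciation lexicon. Both the lexicon and the greedy lookup are invented for illustration and are not the system's actual vocabulary or decoder.

```python
# Toy illustration of phoneme-based decoding: the decoder only has to choose
# among 39 phoneme units, and words are then assembled from phoneme sequences.
# The tiny lexicon and greedy lookup below are invented for illustration.

# A few ARPAbet-style pronunciations (the real system used a 1,000+ word vocabulary).
LEXICON = {
    ("HH", "AW"): "how",
    ("AA", "R"): "are",
    ("Y", "UW"): "you",
    ("T", "AH", "D", "EY"): "today",
}

def words_from_phonemes(phonemes):
    """Greedily match the longest known phoneme sequence at each position."""
    words, i = [], 0
    max_len = max(len(p) for p in LEXICON)
    while i < len(phonemes):
        for length in range(max_len, 0, -1):
            chunk = tuple(phonemes[i:i + length])
            if chunk in LEXICON:
                words.append(LEXICON[chunk])
                i += len(chunk)
                break
        else:               # no word matches here; skip this phoneme
            i += 1
    return " ".join(words)

decoded = ["HH", "AW", "AA", "R", "Y", "UW", "T", "AH", "D", "EY"]  # stand-in output
print(words_from_phonemes(decoded))   # -> "how are you today"
```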
Creating a voice using AI
To give the synthesised voice a personal touch, the researchers developed an algorithm to emulate the patient's original speech. A pre-stroke recording of her voice, taken from her wedding video, served as the template from which the computer reproduced her distinctive rhythms and cadences.
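The article does not detail how the recording was used. As a loose illustration of the kind of information such a template can provide, the sketch below extracts a pitch contour from a stand-in audio signal using librosa; the synthetic signal and all parameter choices are placeholders, not the team's method.

```python
# Loose illustration only: extracting a pitch contour from a reference recording,
# the kind of prosodic information a personalised synthesiser could be tuned to
# match. The synthetic stand-in signal and all parameters below are placeholders.
import numpy as np
import librosa

sr = 22050
# Stand-in for a real recording: a two-second gliding tone instead of speech.
y = librosa.chirp(fmin=110, fmax=220, sr=sr, duration=2.0)

# Estimate the fundamental frequency (pitch) frame by frame.
f0, voiced, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)
print("median pitch of voiced frames (Hz):", np.nanmedian(f0[voiced]))
```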
In addition to voice synthesis, the research team collaborated with Speech Graphics, a company specialising in AI-driven facial animation, to create an avatar. Using tailored machine-learning processes, the company translated the signals sent from the patient's brain as she attempted to speak into facial movements on the avatar.
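Speech Graphics' pipeline is proprietary, but as a simple illustration of the general idea, the sketch below maps decoded phonemes to mouth shapes ("visemes") expressed as blendshape weights, the kind of signal a facial-animation rig can consume. Every mapping and weight here is invented for illustration.

```python
# Illustrative only: a very simple way to drive an avatar's mouth is to map each
# decoded phoneme to a mouth shape ("viseme") expressed as blendshape weights.
# The mapping and weights are invented; Speech Graphics' actual machine-learning
# pipeline is proprietary and far more sophisticated.

# A handful of phoneme -> viseme assignments.
PHONEME_TO_VISEME = {
    "M": "lips_closed", "B": "lips_closed", "P": "lips_closed",
    "AA": "jaw_open", "AE": "jaw_open",
    "UW": "lips_rounded", "OW": "lips_rounded",
    "F": "lip_to_teeth", "V": "lip_to_teeth",
}

def frame_weights(phoneme):
    """Return blendshape weights (0..1) for one animation frame."""
    weights = {v: 0.0 for v in set(PHONEME_TO_VISEME.values())}
    viseme = PHONEME_TO_VISEME.get(phoneme)
    if viseme is not None:
        weights[viseme] = 1.0
    return weights

for ph in ["HH", "AW", "AA", "R", "Y", "UW"]:   # stand-in decoded phonemes
    print(ph, frame_weights(ph))
```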
Looking ahead, the researchers plan to develop a wireless version of the technology, removing the need for a physical connection to the BCI and giving users of this pioneering system greater accessibility and independence.