Scientists have developed a system that displays the movements of the tongue in real time and could be used for speech therapy in people with articulation disorders. Tongue movements, captured by an ultrasound probe placed under the jaw, are processed by a machine-learning algorithm that controls an "articulatory talking head." This avatar, developed by researchers from the GIPSA-Lab and INRIA Grenoble Rhone-Alpes in France, shows the tongue, palate and teeth, which are usually hidden inside the vocal tract.
This "visual biofeedback" system, designed to help users correct their pronunciation more effectively, could be used both for speech therapy and for learning foreign languages. Speech therapy for a person with an articulation disorder relies partly on repetition exercises: the practitioner qualitatively analyses the patient's pronunciation and explains orally, with the help of drawings, how to position the articulators, particularly the tongue.
The effectiveness of therapy depends on how well the patient can take in these instructions, and it is at this stage that "visual biofeedback" systems can help. They let patients see their articulatory movements in real time, in particular how their tongue moves, so that they become aware of these movements and can correct pronunciation problems faster. The image of the tongue is obtained by placing under the jaw an ultrasound probe similar to those conventionally used to image the heart or a foetus.
In this system, the ultrasound images automatically animate an articulatory talking head in real time. This virtual clone of a real speaker, in development for many years at the GIPSA-Lab, produces a contextualised, and therefore more natural, visualisation of articulatory movements. The strength of the new system lies in a machine-learning algorithm that the researchers have been refining for several years. This algorithm can handle articulatory movements that users cannot yet produce when they first start using the system, a property that is essential for the targeted therapeutic applications.
The algorithm relies on a probabilistic model built from a large articulatory database recorded from an "expert" speaker who can pronounce all of the sounds of one or more languages. This model is adapted automatically to the morphology of each new user during a short calibration phase, in which the patient pronounces a few phrases.
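The model itself is not specified here. As a rough, hypothetical illustration of the two-stage idea (a model learned from an expert, then calibrated to a new speaker), the sketch below assumes the simplest possible case: a single joint Gaussian linking one scalar ultrasound feature to one scalar articulatory parameter (for example, tongue-tip height) for the expert, plus an affine mapping, fitted on calibration phrases, that carries a new user's features into the expert's feature space. The function names, the data and the assumption that calibration frames are already aligned with the expert's are all invented for illustration; this is not the GIPSA-Lab/INRIA algorithm.

```python
from statistics import fmean


def fit_linear_gaussian(xs, ys):
    """Fit a joint Gaussian to paired scalars (x, y) and return x -> E[y | x].

    For a single Gaussian the conditional mean is linear:
    E[y | x] = mean_y + cov(x, y) / var(x) * (x - mean_x).
    """
    mean_x, mean_y = fmean(xs), fmean(ys)
    var_x = fmean((x - mean_x) ** 2 for x in xs)
    cov_xy = fmean((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return lambda x: mean_y + slope * (x - mean_x)


# Hypothetical expert database: one ultrasound feature per frame, paired with
# the articulatory parameter (e.g. tongue-tip height) that drives the head.
expert_features = [0.0, 1.0, 2.0, 3.0]
expert_tongue_height = [10.0, 12.0, 14.0, 16.0]
expert_model = fit_linear_gaussian(expert_features, expert_tongue_height)

# Hypothetical calibration: the new user's features for a few phrases, paired
# with the expert's features for the same sounds (alignment assumed given).
user_calibration = [1.0, 3.0, 5.0]
expert_calibration = [0.0, 1.0, 2.0]
# The Gaussian conditional mean equals least-squares linear regression, so the
# same function yields an affine map from the user's space to the expert's.
adapt_to_expert = fit_linear_gaussian(user_calibration, expert_calibration)


def predict_tongue_height(user_feature):
    """Map a new user's feature into expert space, then predict the parameter."""
    return expert_model(adapt_to_expert(user_feature))


print(predict_tongue_height(7.0))  # 16.0
```

A real system would work from high-dimensional image features and a richer model than a single Gaussian, but the structure shown (an expert model kept fixed, plus a small per-user adaptation learned from a few calibration phrases) is the point being illustrated.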
The system, validated in the laboratory with healthy speakers, is now being tested in a simplified form in a clinical trial involving patients who have undergone tongue surgery. The researchers are also developing another version in which the articulatory talking head is animated automatically not by ultrasound images but directly by the user's voice.