A research team has designed a system that can display tongue movements in real time and can be used in speech therapy for people with articulation disorders. Captured by an ultrasound probe placed under the jaw, the tongue movements are processed by a machine learning algorithm that drives an "articulatory talking head."
This system, developed by a research team from INRIA Grenoble Rhone-Alpes and GIPSA-Lab, shows the tongue, palate, and teeth, which are normally hidden inside the vocal tract. This "visual biofeedback" system, which makes it easier to correct pronunciation, can be used for speech therapy as well as for learning foreign languages.
Speech therapy for a person with an articulation disorder relies in part on repetition exercises: the practitioner qualitatively analyzes the patient's pronunciation and explains orally, using drawings, how to position the articulators, particularly the tongue. How effective the therapy is depends on how well the patient can integrate what they are told. This is where "visual biofeedback" systems can help.
These systems let patients see their articulatory movements in real time, and in particular how their tongue moves, so that they become aware of these movements and can correct pronunciation problems faster. The image of the tongue is acquired by placing under the jaw a probe similar to the one conventionally used to image the heart or a fetus.
The articulatory talking head is animated automatically and in real time from the ultrasound images. This virtual clone of a real speaker, under development at GIPSA-Lab for several years, produces a contextualized, and therefore more natural, visualization of articulatory movements. The strength of this new system lies in a machine learning algorithm that the research team has been working on for several years.
This algorithm can process articulatory movements that users cannot yet achieve when they start using the system. This property is essential for the targeted therapeutic applications. The algorithm relies on a probabilistic model built from a large articulatory database acquired from an "expert" speaker capable of pronouncing all the sounds of one or more languages.
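The article does not specify the probabilistic model, so the sketch below is an assumption: Gaussian mixture regression (GMR) is one standard way to map acoustic or ultrasound-derived features x to articulatory parameters y using a joint Gaussian mixture trained on a reference speaker's database. All names, dimensions, and parameter values here are illustrative, not the team's actual algorithm.

```python
import numpy as np

def gmr_predict(x, weights, means, covs):
    """Conditional expectation E[y | x] under a joint Gaussian mixture over [x; y].

    weights: list of mixture weights
    means:   list of joint mean vectors [mu_x; mu_y]
    covs:    list of joint covariance matrices over [x; y]
    """
    dx = x.shape[0]
    resp, cond_means = [], []
    for w, mu, S in zip(weights, means, covs):
        mu_x, mu_y = mu[:dx], mu[dx:]
        Sxx, Syx = S[:dx, :dx], S[dx:, :dx]
        inv = np.linalg.inv(Sxx)
        diff = x - mu_x
        # responsibility of this component given x (unnormalized Gaussian density)
        dens = w * np.exp(-0.5 * diff @ inv @ diff) / np.sqrt(np.linalg.det(2 * np.pi * Sxx))
        resp.append(dens)
        # conditional mean of y given x for this component
        cond_means.append(mu_y + Syx @ inv @ diff)
    resp = np.array(resp)
    resp /= resp.sum()
    return sum(r * m for r, m in zip(resp, cond_means))

# Toy example: two components over (x, y) in R^2.
weights = [0.5, 0.5]
means = [np.array([-1.0, -1.0]), np.array([1.0, 1.0])]
covs = [np.eye(2) * 0.5, np.eye(2) * 0.5]
y_hat = gmr_predict(np.array([1.0]), weights, means, covs)  # close to 1.0: the second component dominates
```

In a real system the mixture parameters would be learned from the expert speaker's articulatory database, and the prediction would run frame by frame on the ultrasound feature stream to drive the talking head.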
This model is automatically adapted to the morphology of each new user during a short calibration phase, in which the user pronounces a few sentences. Already tested in the laboratory with healthy speakers, the system is now being validated, in a simplified version, in a clinical trial for patients who have undergone tongue surgery.
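How the calibration works is not described in the article; a minimal sketch of one simple adaptation scheme (an assumption, not the team's method) is to fit an affine map by least squares from the new user's calibration features to the reference speaker's feature space, then apply that map to all subsequent frames.

```python
import numpy as np

def fit_affine_map(X_user, X_ref):
    """Least-squares affine map so that X_ref is approximated by X_user @ A + b.

    X_user: (n, d) features from the new user's calibration sentences
    X_ref:  (n, d_ref) corresponding features in the reference speaker's space
    """
    n = X_user.shape[0]
    X1 = np.hstack([X_user, np.ones((n, 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X1, X_ref, rcond=None)
    return W[:-1], W[-1]  # A, b

def apply_map(X, A, b):
    """Project a user's feature frames into the reference space."""
    return X @ A + b
```

After calibration, each incoming ultrasound feature frame would be passed through `apply_map` before being fed to the reference speaker's model, which is what lets one model serve users with different morphologies.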
The team is also developing another version of the system, in which the articulatory talking head is animated automatically not by ultrasound but directly by the user's voice.