14 research outputs found

A Silent-Speech Interface using Electro-Optical Stomatography

Author: Stone Simon
Publication venue: Thelem Universitätsverlag & Buchhandlung GmbH & Co. KG
Publication date: 21/06/2022

Speech technology is a major and growing industry that enriches the lives of technologically minded people in a number of ways. Many potential users are, however, excluded: namely, all speakers who can produce speech only with difficulty or not at all. Silent-Speech Interfaces offer a way to communicate with a machine through a convenient speech recognition interface without the need for acoustic speech. They can also potentially provide a full replacement voice by synthesizing the intended utterances that are only silently articulated by the user. To that end, the speech movements need to be captured and mapped to either text or acoustic speech. This dissertation proposes a new Silent-Speech Interface based on a newly developed measurement technology called Electro-Optical Stomatography and a novel parametric vocal tract model to facilitate real-time speech synthesis based on the measured data. The hardware was used to conduct command word recognition studies reaching state-of-the-art intra- and inter-individual performance. Furthermore, a study on using the hardware to control the vocal tract model in a direct articulation-to-speech synthesis loop was completed. While the intelligibility of synthesized vowels was high, the intelligibility of consonants and connected speech was quite poor. Promising ways to improve the system are discussed in the outlook.

Table of contents:
Front matter: Statement of authorship; Abstract; List of Figures; List of Tables; Acronyms
1. Introduction: 1.1 The concept of a Silent-Speech Interface; 1.2 Structure of this work
2. Fundamentals of phonetics: 2.1 Components of the human speech production system; 2.2 Vowel sounds; 2.3 Consonantal sounds; 2.4 Acoustic properties of speech sounds; 2.5 Coarticulation; 2.6 Phonotactics; 2.7 Summary and implications for the design of a Silent-Speech Interface (SSI)
3. Articulatory data acquisition techniques in Silent-Speech Interfaces: 3.1 Introduction; 3.2 Scope of the literature review; 3.3 Video Recordings; 3.4 Ultrasonography; 3.5 Electromyography; 3.6 Permanent-Magnetic Articulography; 3.7 Electromagnetic Articulography; 3.8 Radio waves; 3.9 Palatography; 3.10 Conclusion and Discussion
4. Electro-Optical Stomatography: 4.1 Contact sensors; 4.2 Optical distance sensors; 4.3 Lip sensor; 4.4 Sensor Unit; 4.5 Control Unit; 4.6 Software
5. Articulation-to-Text: 5.1 Introduction; 5.2 Command word recognition pilot study; 5.3 Command word recognition small-scale study
6. Articulation-to-Speech: 6.1 Introduction; 6.2 Articulatory synthesis; 6.3 The six point vocal tract model; 6.4 Objective evaluation of the vocal tract model; 6.5 Perceptual evaluation of the vocal tract model; 6.6 Direct synthesis using EOS to control the vocal tract model; 6.7 Pitch and voicing
7. Summary and outlook: 7.1 Summary of the contributions; 7.2 Outlook
A. Overview of the International Phonetic Alphabet
B. Mathematical proofs and derivations: B.1 Combinatoric calculations illustrating the reduction of possible syllables using phonotactics; B.2 Signal Averaging; B.3 Effect of the contact sensor area on the conductance; B.4 Calculation of the forward current for the OP280V diode
C. Schematics and layouts: C.1 Schematics of the control unit; C.2 Layout of the control unit; C.3 Bill of materials of the control unit; C.4 Schematics of the sensor unit; C.5 Layout of the sensor unit; C.6 Bill of materials of the sensor unit
D. Sensor unit assembly
E. Firmware flow and data protocol
F. Palate file format
G. Supplemental material regarding the vocal tract model
H. Articulation-to-Speech: Optimal hyperparameters
Bibliography

Technische Universität Dresden: Qucosa

Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces

Author: Csapó Tamás Gábor
Gosztolya Gábor
Markó Alexandra
Shandiz Amin Honarmandi
Tóth László
Publication date: 01/01/2021

Articulatory-to-acoustic mapping seeks to reconstruct speech from a recording of the articulatory movements, for example, an ultrasound video. Just like speech signals, these recordings not only carry the linguistic content but are also highly specific to the individual speaker. Hence, due to the lack of multi-speaker data sets, researchers have so far concentrated on speaker-dependent modeling. Here, we present multi-speaker experiments using the recently published TaL80 corpus. To model speaker characteristics, we adapted the x-vector framework popular in speech processing to operate with ultrasound tongue videos. Next, we performed speaker recognition experiments using 50 speakers from the corpus. Then, we created speaker embedding vectors and evaluated them on the remaining speakers. Finally, we examined how the embedding vector influences the accuracy of our ultrasound-to-speech conversion network in a multi-speaker scenario. In the experiments we attained speaker recognition error rates below 3%, and we also found that the embedding vectors generalize well to unseen speakers. Our first attempt to apply them in a multi-speaker silent speech framework brought about a marginal reduction in the error rate of the spectral estimation step.
Comment: 5 pages, 3 figures, 3 tables

arXiv.org e-Print Archive

Repository of the Academy's Library

Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

Author: Csapó Tamás Gábor
Gosztolya Gábor
Honarmandi Shandiz Amin
Markó Alexandra
Németh Géza
Tóth László
Zainkó Csaba
Publication venue: International Speech Communication Association
Publication date: 01/01/2021

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

Author: Csapó Tamás Gábor
Gosztolya Gábor
Honarmandi Shandiz Amin
Markó Alexandra
Németh Géza
Tóth László
Zainkó Csaba
Publication venue: International Speech Communication Association
Publication date: 01/01/2021

For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacotron2 text-to-speech model to improve the final synthesis quality of ultrasound-based articulatory-to-acoustic mapping with a limited database. We use a multi-speaker pre-trained Tacotron2 TTS model and a pre-trained WaveGlow neural vocoder. The articulatory-to-acoustic conversion consists of three steps: 1) from a sequence of ultrasound tongue image recordings, a 3D convolutional neural network predicts the inputs of the pre-trained Tacotron2 model, 2) the Tacotron2 model converts this intermediate representation to an 80-dimensional mel-spectrogram, and 3) the WaveGlow model is applied for final inference. The generated speech retains the timing of the original articulatory data from the ultrasound recording, but the F0 contour and the spectral information are predicted by the Tacotron2 model. The F0 values are independent of the original ultrasound images, but represent the target speaker, as they are inferred from the pre-trained Tacotron2 model. In our experiments, we demonstrated that the synthesized speech is more natural with the proposed solutions than with our earlier model.
Comment: accepted at SSW11. arXiv admin note: text overlap with arXiv:2008.0315

arXiv.org e-Print Archive

Repository of the Academy's Library

Ultrasound-Based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis

Author: Csapó Tamás Gábor
Gosztolya Gábor
Markó Alexandra
Tóth László
Zainkó Csaba
Publication venue: International Speech Communication Association
Publication date: 01/01/2020

For articulatory-to-acoustic mapping using deep neural networks, typically spectral and excitation parameters of vocoders have been used as the training targets. However, vocoding often results in buzzy and muffled final speech quality. Therefore, in this paper on ultrasound-based articulatory-to-acoustic conversion, we use a flow-based neural vocoder (WaveGlow) pre-trained on a large amount of English and Hungarian speech data. The inputs of the convolutional neural network are ultrasound tongue images. The training target is the 80-dimensional mel-spectrogram, which provides a more finely detailed spectral representation than the previously used 25-dimensional Mel-Generalized Cepstrum. From the output of the ultrasound-to-mel-spectrogram prediction, WaveGlow inference produces the synthesized speech. We compare the proposed WaveGlow-based system with a continuous vocoder that does not use a strict voiced/unvoiced decision when predicting F0. The results demonstrate that during the articulatory-to-acoustic mapping experiments, the WaveGlow neural vocoder produces significantly more natural synthesized speech than the baseline system. A further advantage of WaveGlow is that F0 is included in the mel-spectrogram representation, so it is not necessary to predict the excitation separately.
Comment: 5 pages, accepted for publication at Interspeech 2020. arXiv admin note: substantial text overlap with arXiv:1906.0988

arXiv.org e-Print Archive

Crossref

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Repository of the Academy's Library

Silent Speech Interfaces for Speech Restoration: A Review

Author: González López José Andrés
Gómez Alanís Alejandro
Gómez Ángel M.
Martín Doñas Juan M.
Pérez Córdoba José Luis
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 24/09/2020

This work was supported in part by the Agencia Estatal de Investigacion (AEI) under Grant PID2019-108040RB-C22/AEI/10.13039/501100011033. The work of Jose A. Gonzalez-Lopez was supported in part by the Spanish Ministry of Science, Innovation and Universities under Juan de la Cierva-Incorporation Fellowship (IJCI-2017-32926).
This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present the latest SSI research aimed at providing new alternative and augmentative communication methods for persons with severe speech disorders. SSIs can employ a variety of biosignals to enable silent communication, such as electrophysiological recordings of neural activity, electromyographic (EMG) recordings of vocal tract movements or the direct tracking of articulator movements using imaging techniques. Depending on the disorder, some sensing techniques may be better suited than others to capture speech-related information. For instance, EMG and imaging techniques are well suited for laryngectomised patients, whose vocal tract remains almost intact but who are unable to speak after the removal of the vocal folds, but these techniques fail for severely paralysed individuals. From the biosignals, SSIs decode the intended message using automatic speech recognition or speech synthesis algorithms. Despite considerable advances in recent years, most present-day SSIs have only been validated in laboratory settings for healthy users. Thus, as discussed in this paper, a number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications. If these issues can be addressed successfully, future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.

arXiv.org e-Print Archive

Repositorio Institucional Universidad de Granada

Retainer-Free Optopalatographic Device Design and Evaluation as a Feedback Tool in Post-Stroke Speech and Swallowing Therapy

Author: Wagner Christoph
Publication date: 21/11/2023

Stroke is one of the leading causes of long-term motor disability, including oro-facial impairments which affect speech and swallowing. Over the last decades, rehabilitation programs have evolved from utilizing mainly compensatory measures to focusing on recovering lost function. In the continuing effort to improve recovery, the concept of biofeedback has increasingly been leveraged to enhance self-efficacy, motivation and engagement during training. Although both speech and swallowing disturbances resulting from oro-facial impairments are frequent sequelae of stroke, efforts to develop sensing technologies that provide comprehensive and quantitative feedback on articulator kinematics and kinetics, especially those of the tongue, and specifically during post-stroke speech and swallowing therapy have been sparse. To that end, such a sensing device needs to accurately capture intraoral tongue motion and contact with the hard palate, which can then be translated into an appropriate form of feedback, without affecting tongue motion itself and while still being light-weight and portable. This dissertation proposes the use of an intraoral sensing principle known as optopalatography to provide such feedback while also exploring the design of optopalatographic devices itself for use in dysphagia and dysarthria therapy. Additionally, it presents an alternative means of holding the device in place inside the oral cavity with a newly developed palatal adhesive instead of relying on dental retainers, which previously limited device usage to a single person. The evaluation was performed on the task of automatically classifying different functional tongue exercises from one another with application in dysphagia therapy, whereas a phoneme recognition task was conducted with application in dysarthria therapy. 
Results on the palatal adhesive suggest that it is indeed a valid alternative to dental retainers when device residence time inside the oral cavity is limited to several tens of minutes per session, which is the case for dysphagia and dysarthria therapy. Functional tongue exercises were classified with approximately 61 % accuracy across subjects, whereas for the phoneme recognition task, tense vowels had the highest recognition rate, followed by lax vowels and consonants. In summary, retainer-free optopalatography has the potential to become a viable method for providing real-time feedback on tongue movements inside the oral cavity, but still requires further improvements as outlined in the remarks on future development.:1 Introduction 1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Goals and contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.4 Scope and limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Basics of post-stroke speech and swallowing therapy 2.1 Dysarthria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Dysphagia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.3 Treatment rationale and potential of biofeedback . . . . . . . . . . . . . . . . . 13 2.4 Summary and conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3 Tongue motion sensing 3.1 Contact-based methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.1.1 Electropalatography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 3.1.2 Manometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.1.3 Capacitive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.2 Non-contact based methods . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . 23 3.2.1 Electromagnetic articulography . . . . . . . . . . . . . . . . . . . . . . . 23 3.2.2 Permanent magnetic articulography . . . . . . . . . . . . . . . . . . . . 24 3.2.3 Optopalatography (related work) . . . . . . . . . . . . . . . . . . . . . . 25 3.3 Electro-optical stomatography . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.4 Extraoral sensing techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.5 Summary, comparison and conclusion . . . . . . . . . . . . . . . . . . . . . . . 29 4 Fundamentals of optopalatography 4.1 Important radiometric quantities . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.1.1 Solid angle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.1.2 Radiant flux and radiant intensity . . . . . . . . . . . . . . . . . . . . . 33 4.1.3 Irradiance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.1.4 Radiance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.2 Sensing principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.2.1 Analytical models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2.2 Monte Carlo ray tracing methods . . . . . . . . . . . . . . . . . . . . . . 37 4.2.3 Data-driven models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.2.4 Model comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.3 A priori device design consideration . . . . . . . . . . . . . . . . . . . . . . . . 41 4.3.1 Optoelectronic components . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.3.2 Additional electrical components and requirements . . . . . . . . . . . . 43 4.3.3 Intraoral sensor layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 5 Intraoral device anchorage 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 5.1.1 Mucoadhesion . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . 46 5.1.2 Considerations for the palatal adhesive . . . . . . . . . . . . . . . . . . . 48 5.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5.2.1 Polymer selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5.2.2 Fabrication method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 5.2.3 Formulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5.2.4 PEO tablets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 5.2.5 Connection to the intraoral sensor’s encapsulation . . . . . . . . . . . . 50 5.2.6 Formulation evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 5.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 5.3.1 Initial formulation evaluation . . . . . . . . . . . . . . . . . . . . . . . . 54 5.3.2 Final OPG adhesive formulation . . . . . . . . . . . . . . . . . . . . . . 56 5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 6 Initial device design with application in dysphagia therapy 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 6.2 Optode and optical sensor selection . . . . . . . . . . . . . . . . . . . . . . . . . 60 6.2.1 Optode and optical sensor evaluation procedure . . . . . . . . . . . . . . 61 6.2.2 Selected optical sensor characterization . . . . . . . . . . . . . . . . . . 62 6.2.3 Mapping from counts to millimeter . . . . . . . . . . . . . . . . . . . . . 62 6.2.4 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 6.3 Device design and hardware implementation . . . . . . . . . . . . . . . . . . . . 64 6.3.1 Block diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 6.3.2 Optode placement and circuit board dimensions . . . . . . . . . . . . . 64 6.3.3 Firmware description and measurement cycle . . . . . . 
6.3.4 Encapsulation
6.3.5 Fully assembled OPG device
6.4 Evaluation on the gesture recognition task
6.4.1 Exercise selection, setup and recording
6.4.2 Data corpus
6.4.3 Sequence pre-processing
6.4.4 Choice of classifier
6.4.5 Training and evaluation
6.4.6 Results
6.5 Discussion
7 Improved device design with application in dysarthria therapy
7.1 Device design
7.1.1 Design considerations
7.1.2 General system overview
7.1.3 Intraoral sensor
7.1.4 Receiver and controller
7.1.5 Multiplexer
7.2 Hardware implementation
7.2.1 Optode placement and circuit board layout
7.2.2 Encapsulation
7.3 Device characterization
7.3.1 Photodiode transient response
7.3.2 Current source and rise time
7.3.3 Multiplexer switching speed
7.3.4 Measurement cycle and firmware implementation
7.3.5 In vitro measurement accuracy
7.3.6 Optode measurement stability
7.4 Evaluation on the phoneme recognition task
7.4.1 Corpus selection and recording setup
7.4.2 Annotation and sensor data post-processing
7.4.3 Mapping from counts to millimeter
7.4.4 Classifier and feature selection
7.4.5 Evaluation paradigms
7.5 Results
7.5.1 Tongue distance curve prediction
7.5.2 Tongue contact patterns and contours
7.5.3 Phoneme recognition
7.6 Discussion
8 Conclusion and future work
9 Appendix
9.1 Analytical light transport models
9.2 Meshed Monte Carlo method
9.3 Laser safety
9.4 Current source modulation voltage
9.5 Transimpedance amplifier’s frequency responses
9.6 Initial OPG device’s PCB layout and circuit diagrams
9.7 Improved OPG device’s PCB layout and circuit diagrams
9.8 Test station layout drawing
Bibliography

Stroke is one of the most common causes of long-term motor disability, including disabilities in the oral and facial region, whose consequences include speech and swallowing problems that manifest as the two symptoms dysarthria and dysphagia. Over recent decades, rehabilitation programs for treating motor-related stroke symptoms have advanced substantially. The focus is no longer on merely compensating for lost motor function but on actively restoring it. In this context, so-called biofeedback has increasingly found its way into therapy, in order to promote patients' motivation, engagement, and awareness of otherwise unconscious movement sequences. Yet although speech and swallowing disorders are among the most common consequences of stroke, this is not reflected in the current development of new devices and measurement methods for quantitative, comprehensive biofeedback, particularly for the explicit capture of intraoral tongue kinematics and kinetics and for use in stroke therapy. One possible reason is the very strict requirements placed on such a measurement method: besides being portable, it should ideally capture both the contact between the tongue and the palate and the three-dimensional movement of the tongue in the oral cavity, without affecting articulation itself. To meet these requirements, this dissertation investigates the measurement principle of optopalatography, with a focus on its application in dysarthria and dysphagia therapy. This also includes the development of a corresponding device and of a method for attaching it in the oral cavity by means of a dedicated oral mucoadhesive. The latter circumvents the previous problem of having to fit such an intraoral device to each individual user. For the application in dysphagia therapy, the device was evaluated on the automatic recognition of tongue mobilization exercises that are routinely performed in functional dysphagia therapy. For the application in dysarthria therapy, phoneme recognition was performed. The results for the mucoadhesive suggest that it is indeed a valid alternative to the techniques previously used to attach intraoral devices in the oral cavity. Tongue mobilization exercises were recognized across subjects at a rate of 61 %, whereas in phoneme recognition long vowels achieved the highest recognition rate, followed by short vowels and consonants. In summary, the principle of optopalatography is a serious option for the intraoral capture of tongue movements, although further development steps are needed; these are summarized in the outlook.

Technische Universität Dresden: Qucosa