
    Retainer-Free Optopalatographic Device Design and Evaluation as a Feedback Tool in Post-Stroke Speech and Swallowing Therapy

    Stroke is one of the leading causes of long-term motor disability, including oro-facial impairments that affect speech and swallowing. Over the last decades, rehabilitation programs have evolved from relying mainly on compensatory measures to focusing on recovering lost function. In the continuing effort to improve recovery, biofeedback has increasingly been leveraged to enhance self-efficacy, motivation and engagement during training. Although speech and swallowing disturbances resulting from oro-facial impairments are frequent sequelae of stroke, efforts to develop sensing technologies that provide comprehensive, quantitative feedback on articulator kinematics and kinetics, especially those of the tongue, during post-stroke speech and swallowing therapy have been sparse. Such a sensing device must accurately capture intraoral tongue motion and tongue contact with the hard palate, which can then be translated into an appropriate form of feedback, without affecting tongue motion itself, while remaining lightweight and portable. This dissertation proposes the use of an intraoral sensing principle known as optopalatography to provide such feedback, and it explores the design of optopalatographic devices themselves for use in dysphagia and dysarthria therapy. Additionally, it presents an alternative means of holding the device in place inside the oral cavity: a newly developed palatal adhesive that replaces the dental retainers which previously limited device usage to a single person. The device was evaluated on the task of automatically classifying functional tongue exercises, with application in dysphagia therapy, and on a phoneme recognition task, with application in dysarthria therapy.
Results on the palatal adhesive suggest that it is indeed a valid alternative to dental retainers when the device's residence time inside the oral cavity is limited to several tens of minutes per session, as is the case in dysphagia and dysarthria therapy. Functional tongue exercises were classified with approximately 61 % accuracy across subjects, whereas in the phoneme recognition task, tense vowels had the highest recognition rate, followed by lax vowels and consonants. In summary, retainer-free optopalatography has the potential to become a viable method for providing real-time feedback on tongue movements inside the oral cavity, but it still requires the further improvements outlined in the remarks on future development.

Table of contents:
1 Introduction
  1.1 Motivation
  1.2 Problem statement
  1.3 Goals and contributions
  1.4 Scope and limitations
2 Basics of post-stroke speech and swallowing therapy
  2.1 Dysarthria
  2.2 Dysphagia
  2.3 Treatment rationale and potential of biofeedback
  2.4 Summary and conclusion
3 Tongue motion sensing
  3.1 Contact-based methods
    3.1.1 Electropalatography
    3.1.2 Manometry
    3.1.3 Capacitive
  3.2 Non-contact based methods
    3.2.1 Electromagnetic articulography
    3.2.2 Permanent magnetic articulography
    3.2.3 Optopalatography (related work)
  3.3 Electro-optical stomatography
  3.4 Extraoral sensing techniques
  3.5 Summary, comparison and conclusion
4 Fundamentals of optopalatography
  4.1 Important radiometric quantities
    4.1.1 Solid angle
    4.1.2 Radiant flux and radiant intensity
    4.1.3 Irradiance
    4.1.4 Radiance
  4.2 Sensing principle
    4.2.1 Analytical models
    4.2.2 Monte Carlo ray tracing methods
    4.2.3 Data-driven models
    4.2.4 Model comparison
  4.3 A priori device design considerations
    4.3.1 Optoelectronic components
    4.3.2 Additional electrical components and requirements
    4.3.3 Intraoral sensor layout
5 Intraoral device anchorage
  5.1 Introduction
    5.1.1 Mucoadhesion
    5.1.2 Considerations for the palatal adhesive
  5.2 Methods
    5.2.1 Polymer selection
    5.2.2 Fabrication method
    5.2.3 Formulations
    5.2.4 PEO tablets
    5.2.5 Connection to the intraoral sensor's encapsulation
    5.2.6 Formulation evaluation
  5.3 Results
    5.3.1 Initial formulation evaluation
    5.3.2 Final OPG adhesive formulation
  5.4 Discussion
6 Initial device design with application in dysphagia therapy
  6.1 Introduction
  6.2 Optode and optical sensor selection
    6.2.1 Optode and optical sensor evaluation procedure
    6.2.2 Selected optical sensor characterization
    6.2.3 Mapping from counts to millimeter
    6.2.4 Results and discussion
  6.3 Device design and hardware implementation
    6.3.1 Block diagram
    6.3.2 Optode placement and circuit board dimensions
    6.3.3 Firmware description and measurement cycle
    6.3.4 Encapsulation
    6.3.5 Fully assembled OPG device
  6.4 Evaluation on the gesture recognition task
    6.4.1 Exercise selection, setup and recording
    6.4.2 Data corpus
    6.4.3 Sequence pre-processing
    6.4.4 Choice of classifier
    6.4.5 Training and evaluation
    6.4.6 Results
  6.5 Discussion
7 Improved device design with application in dysarthria therapy
  7.1 Device design
    7.1.1 Design considerations
    7.1.2 General system overview
    7.1.3 Intraoral sensor
    7.1.4 Receiver and controller
    7.1.5 Multiplexer
  7.2 Hardware implementation
    7.2.1 Optode placement and circuit board layout
    7.2.2 Encapsulation
  7.3 Device characterization
    7.3.1 Photodiode transient response
    7.3.2 Current source and rise time
    7.3.3 Multiplexer switching speed
    7.3.4 Measurement cycle and firmware implementation
    7.3.5 In vitro measurement accuracy
    7.3.6 Optode measurement stability
  7.4 Evaluation on the phoneme recognition task
    7.4.1 Corpus selection and recording setup
    7.4.2 Annotation and sensor data post-processing
    7.4.3 Mapping from counts to millimeter
    7.4.4 Classifier and feature selection
    7.4.5 Evaluation paradigms
  7.5 Results
    7.5.1 Tongue distance curve prediction
    7.5.2 Tongue contact patterns and contours
    7.5.3 Phoneme recognition
  7.6 Discussion
8 Conclusion and future work
9 Appendix
  9.1 Analytical light transport models
  9.2 Meshed Monte Carlo method
  9.3 Laser safety
  9.4 Current source modulation voltage
  9.5 Transimpedance amplifier's frequency responses
  9.6 Initial OPG device's PCB layout and circuit diagrams
  9.7 Improved OPG device's PCB layout and circuit diagrams
  9.8 Test station layout drawing
Bibliography

    A Silent-Speech Interface using Electro-Optical Stomatography

    Speech technology is a major and growing industry that enriches the lives of technologically-minded people in a number of ways. Many potential users are excluded, however: namely, all speakers who cannot produce speech easily, or at all. Silent-speech interfaces offer a way to communicate with a machine through a convenient speech-driven interface without the need for acoustic speech. In principle, they can also provide a full replacement voice by synthesizing the intended utterances that the user only silently articulates. To that end, the speech movements need to be captured and mapped to either text or acoustic speech. This dissertation proposes a new silent-speech interface based on a newly developed measurement technology called Electro-Optical Stomatography and a novel parametric vocal tract model that facilitates real-time speech synthesis from the measured data. The hardware was used to conduct command word recognition studies that reached and surpassed state-of-the-art intra- and inter-individual accuracy. Furthermore, a study was completed in which the hardware controlled the vocal tract model in a direct articulation-to-speech synthesis loop. While the intelligibility of synthesized vowels was rated high, the intelligibility of consonants and connected speech was poor. Promising ways to improve the system are discussed in the outlook.

Table of contents:
Statement of authorship
Abstract
List of Figures
List of Tables
Acronyms
1. Introduction
  1.1. The concept of a Silent-Speech Interface
  1.2. Structure of this work
2. Fundamentals of phonetics
  2.1. Components of the human speech production system
  2.2. Vowel sounds
  2.3. Consonantal sounds
  2.4. Acoustic properties of speech sounds
  2.5. Coarticulation
  2.6. Phonotactics
  2.7. Summary and implications for the design of a Silent-Speech Interface (SSI)
3. Articulatory data acquisition techniques in Silent-Speech Interfaces
  3.1. Introduction
  3.2. Scope of the literature review
  3.3. Video Recordings
  3.4. Ultrasonography
  3.5. Electromyography
  3.6. Permanent-Magnetic Articulography
  3.7. Electromagnetic Articulography
  3.8. Radio waves
  3.9. Palatography
  3.10. Conclusion and Discussion
4. Electro-Optical Stomatography
  4.1. Contact sensors
  4.2. Optical distance sensors
  4.3. Lip sensor
  4.4. Sensor Unit
  4.5. Control Unit
  4.6. Software
5. Articulation-to-Text
  5.1. Introduction
  5.2. Command word recognition pilot study
  5.3. Command word recognition small-scale study
6. Articulation-to-Speech
  6.1. Introduction
  6.2. Articulatory synthesis
  6.3. The six point vocal tract model
  6.4. Objective evaluation of the vocal tract model
  6.5. Perceptual evaluation of the vocal tract model
  6.6. Direct synthesis using EOS to control the vocal tract model
  6.7. Pitch and voicing
7. Summary and outlook
  7.1. Summary of the contributions
  7.2. Outlook
A. Overview of the International Phonetic Alphabet
B. Mathematical proofs and derivations
  B.1. Combinatoric calculations illustrating the reduction of possible syllables using phonotactics
  B.2. Signal Averaging
  B.3. Effect of the contact sensor area on the conductance
  B.4. Calculation of the forward current for the OP280V diode
C. Schematics and layouts
  C.1. Schematics of the control unit
  C.2. Layout of the control unit
  C.3. Bill of materials of the control unit
  C.4. Schematics of the sensor unit
  C.5. Layout of the sensor unit
  C.6. Bill of materials of the sensor unit
D. Sensor unit assembly
E. Firmware flow and data protocol
F. Palate file format
G. Supplemental material regarding the vocal tract model
H. Articulation-to-Speech: Optimal hyperparameters
Bibliography

    Models and analysis of vocal emissions for biomedical applications

    Get PDF
    This book of Proceedings collects the papers presented at the 4th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2005, held 29-31 October 2005, Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    Leveraging Spatiotemporal Relationships of High-frequency Activation in Human Electrocorticographic Recordings for Speech Brain-Computer-Interface

    Get PDF
    Speech production is one of the most intricate yet natural human behaviors and is most keenly appreciated when it becomes difficult or impossible, as is the case for patients suffering from locked-in syndrome. Burgeoning understanding of the various cortical representations of language has brought into question the viability of a speech neuroprosthesis using implanted electrodes. The temporal resolution of intracranial electrophysiological recordings, frequently billed as a great asset of electrocorticography (ECoG), has actually been a hindrance, as speech decoders have struggled to take advantage of this timing information. There have been few demonstrations of how well a speech neuroprosthesis will realistically generalize across contexts when constructed using causal feature extraction and language models that can be applied and adapted in real time. The research detailed in this dissertation aims primarily to characterize the spatiotemporal relationships of high-frequency activity across ECoG arrays during word production. Once identified, these relationships map to motor and semantic representations of speech through the use of algorithms and classifiers that rapidly quantify these relationships in single trials. The primary hypothesis put forward by this dissertation is that the onset, duration and temporal profile of high-frequency activity in ECoG recordings is a useful feature for speech decoding. These features have rarely been used in state-of-the-art speech decoders, which tend to produce output from instantaneous high-frequency power across cortical sites, or rely upon precise behavioral time-locking to take advantage of high-frequency activity at several time points relative to behavioral onset times. This hypothesis was examined in three separate studies. First, software was created that rapidly characterizes spatiotemporal relationships of neural features. 
Second, semantic representations of speech were examined using these spatiotemporal features. Finally, utterances were discriminated in single trials with low latency and high accuracy using spatiotemporal matched filters in a neural keyword-spotting paradigm. Outcomes from this dissertation inform implant placement for a human speech prosthesis and provide the scientific and methodological basis to motivate further research on an implant specifically for speech-based brain-computer interfaces.
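A matched-filter detector of the kind described can be sketched as follows. The array shapes, per-channel normalization, and detection threshold here are illustrative assumptions, not the dissertation's actual implementation:

```python
import numpy as np

def build_matched_filter(trials):
    """Average high-frequency-activity trials (trials x channels x time)
    into a template, zero-meaned per channel so that correlation is
    insensitive to baseline offsets."""
    template = trials.mean(axis=0)
    return template - template.mean(axis=1, keepdims=True)

def detect(stream, template, threshold):
    """Slide the template over a continuous (channels x time) recording
    and return sample indices where the normalized cross-correlation
    exceeds the threshold, along with the full score trace."""
    n_ch, n_t = template.shape
    scores = []
    for start in range(stream.shape[1] - n_t + 1):
        window = stream[:, start:start + n_t]
        window = window - window.mean(axis=1, keepdims=True)
        num = np.sum(window * template)
        den = np.linalg.norm(window) * np.linalg.norm(template)
        scores.append(num / den if den > 0 else 0.0)
    scores = np.array(scores)
    return np.flatnonzero(scores > threshold), scores
```

Because the score is a normalized correlation, it is bounded in [-1, 1], which makes a fixed threshold meaningful across recordings of different amplitude.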

    Expectation suppression across sensory modalities: a MEG investigation

    Get PDF
    In the last few decades, much research has focused on understanding how the human brain generates expectations about incoming sensory input and how it deals with surprising or unpredictable input. The predictive processing literature shows that the human brain suppresses neural responses to predictable/expected stimuli (termed the expectation suppression effect). This thesis provides evidence on how expectation suppression is affected by content-based expectations (what) and temporal uncertainty (when) across sensory modalities (visual and auditory), using state-of-the-art magnetoencephalography (MEG) imaging. The results show that the visual domain is more sensitive to content-based expectations (what) than to timing (when), and that it shows sensitivity to timing (when) only when the content (what) is predictable. The auditory domain, in contrast, is equally sensitive to what and when features, showing enhanced suppression to expectation compared to the visual domain. This thesis concludes that the sensory modalities deal differently with contextual expectations and temporal predictability. Consequently, when investigating predictive processing in the human brain, modality-specific differences should be considered, since the predictive mechanism at work in one domain cannot necessarily be generalized to other domains.
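Expectation suppression is commonly quantified as the relative reduction of the evoked response to expected versus unexpected stimuli. A minimal sketch of such an index, using hypothetical per-trial amplitudes rather than data from this thesis:

```python
import numpy as np

def suppression_index(expected, unexpected):
    """Relative reduction of the mean evoked amplitude for expected stimuli.
    Positive values indicate suppression (expected response < unexpected)."""
    e, u = np.mean(expected), np.mean(unexpected)
    return (u - e) / u

# Hypothetical per-trial amplitudes (arbitrary units) for one condition.
expected = np.array([1.8, 2.0, 1.9, 2.1])
unexpected = np.array([2.4, 2.6, 2.5, 2.5])
idx = suppression_index(expected, unexpected)  # ~0.22, i.e. ~22 % suppression
```

Comparing such indices between modalities (visual vs. auditory) and between what/when manipulations is one simple way to express the modality differences the abstract describes.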

    State of the art of audio- and video based solutions for AAL

    Get PDF
    Working Group 3: Audio- and Video-based AAL Applications. It is a matter of fact that Europe is facing more and more crucial challenges regarding health and social care due to the demographic change and the current economic context. The recent COVID-19 pandemic has stressed this situation even further, thus highlighting the need for taking action. Active and Assisted Living (AAL) technologies come as a viable approach to help facing these challenges, thanks to the high potential they have in enabling remote care and support. Broadly speaking, AAL can be referred to as the use of innovative and advanced Information and Communication Technologies to create supportive, inclusive and empowering applications and environments that enable older, impaired or frail people to live independently and stay active longer in society. AAL capitalizes on the growing pervasiveness and effectiveness of sensing and computing facilities to supply the persons in need with smart assistance, by responding to their necessities of autonomy, independence, comfort, security and safety. The application scenarios addressed by AAL are complex, due to the inherent heterogeneity of the end-user population, their living arrangements, and their physical conditions or impairment. Despite aiming at diverse goals, AAL systems should share some common characteristics. They are designed to provide support in daily life in an invisible, unobtrusive and user-friendly manner. Moreover, they are conceived to be intelligent, to be able to learn and adapt to the requirements and requests of the assisted people, and to synchronise with their specific needs. Nevertheless, to ensure the uptake of AAL in society, potential users must be willing to use AAL applications and to integrate them in their daily environments and lives. In this respect, video- and audio-based AAL applications have several advantages, in terms of unobtrusiveness and information richness. 
Indeed, cameras and microphones are far less obtrusive with respect to the hindrance other wearable sensors may cause to one’s activities. In addition, a single camera placed in a room can record most of the activities performed in the room, thus replacing many other non-visual sensors. Currently, video-based applications are effective in recognising and monitoring the activities, the movements, and the overall conditions of the assisted individuals, as well as in assessing their vital parameters (e.g., heart rate, respiratory rate). Similarly, audio sensors have the potential to become one of the most important modalities for interaction with AAL systems, as they can have a large range of sensing, do not require physical presence at a particular location and are physically intangible. Moreover, relevant information about individuals’ activities and health status can derive from processing audio signals (e.g., speech recordings). Nevertheless, as the other side of the coin, cameras and microphones are often perceived as the most intrusive technologies from the viewpoint of the privacy of the monitored individuals. This is due to the richness of the information these technologies convey and the intimate setting where they may be deployed. Solutions able to ensure privacy preservation by context and by design, as well as to ensure high legal and ethical standards, are in high demand. After the review of the current state of play and the discussion in GoodBrother, we may claim that the first solutions in this direction are starting to appear in the literature. A multidisciplinary debate among experts and stakeholders is paving the way towards AAL ensuring ergonomics, usability, acceptance and privacy preservation. The DIANA, PAAL, and VisuAAL projects are examples of this fresh approach. This report provides the reader with a review of the most recent advances in audio- and video-based monitoring technologies for AAL. 
It has been drafted as a collective effort of WG3 to supply an introduction to AAL, its evolution over time and its main functional and technological underpinnings. In this respect, the report contributes to the field with the outline of a new generation of ethical-aware AAL technologies and a proposal for a novel comprehensive taxonomy of AAL systems and applications. Moreover, the report allows non-technical readers to gather an overview of the main components of an AAL system and how these function and interact with the end-users. The report illustrates the state of the art of the most successful AAL applications and functions based on audio and video data, namely (i) lifelogging and self-monitoring, (ii) remote monitoring of vital signs, (iii) emotional state recognition, (iv) food intake monitoring, activity and behaviour recognition, (v) activity and personal assistance, (vi) gesture recognition, (vii) fall detection and prevention, (viii) mobility assessment and frailty recognition, and (ix) cognitive and motor rehabilitation. For these application scenarios, the report illustrates the state of play in terms of scientific advances, available products and research projects. The open challenges are also highlighted. The report ends with an overview of the challenges, the hindrances and the opportunities posed by the uptake in real-world settings of AAL technologies. In this respect, the report illustrates the current procedural and technological approaches to cope with acceptability, usability and trust in the AAL technology, by surveying strategies and approaches to co-design, to privacy preservation in video and audio data, to transparency and explainability in data processing, and to data transmission and communication. User acceptance and ethical considerations are also debated. Finally, the potentials coming from the silver economy are overviewed.

    The phonetics of speech breathing : pauses, physiology, acoustics, and perception

    Get PDF
    Speech is made up of a continuous stream of speech sounds that is interrupted by pauses and breathing. As phoneticians are primarily interested in describing the segments of the speech stream, pauses and breathing are often neglected in phonetic studies, even though they are vital for speech. The present work adds to a more detailed view of both pausing and speech breathing, with a special focus on the latter and the resulting breath noises, investigating their acoustic, physiological, and perceptual aspects. We present an overview of how a selection of corpora annotate pauses and pause-internal particles, as well as a recording setup that can be used for further studies on speech breathing. For pauses, this work emphasized their optionality and variability under different tempos, as well as the temporal composition of silence and breath noise in breath pauses. For breath noises, we first focused on acoustic and physiological characteristics: we explored how the onsets and offsets of audible breath noises align with the start and end of expansion of both the rib cage and abdomen. Further, we found similarities between speech breath noises and aspiration phases of /k/, and found that breath noises may be produced with a more open and slightly more front place of articulation than realizations of schwa. We found positive correlations between acoustic and physiological parameters, suggesting that when speakers inhale faster, the resulting breath noises were more intense and produced more anterior in the mouth. Inspecting the entire spectrum of speech breath noises, we showed relatively flat spectra and several weak peaks. These peaks largely overlapped with resonances reported for inhalations produced with a central vocal tract configuration. We used 3D-printed vocal tract models representing four vowels and four fricatives to simulate in- and exhalations by reversing airflow direction. 
We found that the direction did not have a general effect for all models, but only for those with high-tongue configurations, as opposed to those that were more open. Then, we compared inhalations produced with the schwa model to human inhalations in an attempt to approach the vocal tract configuration in speech breathing. There were some similarities; however, several complexities of human speech breathing not captured in the models complicated comparisons. In two perception studies, we investigated how much information listeners could auditorily extract from breath noises. First, we tested categorizing different breath noises into six different types, based on airflow direction and airway usage, e.g. oral inhalation. Around two thirds of all answers were correct. Second, we investigated how well breath noises could be used to discriminate between speakers and to extract coarse information on speaker characteristics, such as age (old/young) and sex (female/male). We found that listeners were able to distinguish between two breath noises coming from the same or different speakers in around two thirds of all cases. Hearing one breath noise, classification of sex was successful in around 64 % of cases, while for age it was 50 %, suggesting that sex was more perceivable than age in breath noises. Funding: Deutsche Forschungsgemeinschaft (DFG), Projektnummer 418659027: "Pause-internal phonetic particles in speech communication".
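Whether a roughly two-thirds correct rate in a two-alternative task exceeds chance can be checked with a one-sided binomial test. The trial count below is a hypothetical illustration, not the study's actual number of trials:

```python
from math import comb

def binom_p_greater(k, n, p=0.5):
    """One-sided probability of observing k or more successes out of n
    trials under a binomial null with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical numbers: 66 correct out of 100 two-alternative trials.
p_value = binom_p_greater(66, 100)  # well below 0.01, i.e. above chance
```

By contrast, the 50 % correct rate reported for age classification is exactly what the chance level predicts, so the same test would not reject the null there.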