Search CORE

76 research outputs found

Brain-Computer Interface and Silent Speech Recognition on Decentralized Messaging Applications

Author: Lourenço Fábio André Vilar
Publication venue
Publication date: 01/01/2020
Field of study

Online communications have been increasingly gaining prevalence in people’s daily lives, with its widespread adoption being catalyzed by technological advances, especially in instant messaging platforms. Although there have been strides for the inclusion of disabled individuals to ease communication between peers, people who suffer hand/arm impairments have little to no support in regular mainstream applications to efficiently communicate with other individuals. Moreover, a problem with the current solutions that fall back on speech-to-text techniques is the lack of privacy when the usage of these alternatives is conducted in public. Additionally, as centralized systems have come into scrutiny regarding privacy and security, the development of alternative decentralized solutions has increased by the use of blockchain technology and its variants. Within the inclusivity paradigm, this project showcases an alternative on human-computer interaction with support for the aforementioned disabled people, through the use of a braincomputer interface allied to a silent speech recognition system, for application navigation and text input purposes, respectively. A brain-computer interface allows a user to interact with the platform just by though, while the silent speech recognition system enables the input of text by reading activity from articulatory muscles without the need of actually speaking audibly. Therefore, the combination of both techniques creates a full hands-free interaction with the platform, empowering hand/arm disabled users in daily life communications. Furthermore, the users of the application will be inserted in a decentralized system that is designed for secure communication and exchange of data between peers, enforcing the privacy concern that is a cornerstone of the platform.Comunicações online têm cada vez mais ganhado prevalência na vida contemporânea de pessoas, tendo a sua adoção sido catalisada pelos avanços tecnológicos, especialmente em plataformas de mensagens instantâneas. Embora tenham havido desenvolvimentos relativamente à inclusão de indivíduos com deficiência para facilitar a comunicação entre pessoas, as que sofrem de incapacidades motoras nas mãos/braços têm um suporte escasso em aplicações convencionais para comunicar de forma eficiente com outros sujeitos. Além disso, um problema com as soluções atuais que recorrem a técnicas de voz-para-texto é a falta de privacidade nas comunicações quando usadas em público. Adicionalmente, há medida que sistemas centralizados têm atraído ceticismo relativamente à privacidade e segurança, o desenvolvimento de soluções descentralizadas e alternativas têm aumentado pelo uso de tecnologias de blockchain e as suas variantes. Dentro do paradigma de inclusão, este projeto demonstras uma alternativa na interação humano-computador com suporte para os indivíduos referidos anteriormente, através do uso de uma interface cérebro-computador aliada a um sistema de reconhecimento de fala silenciosa, para navegação na aplicação e introdução de texto, respetivamente. Uma interface cérebro-computador permite o utilizador interagir com a plataforma apenas através do pensamento, enquanto que um sistema de reconhecimento de fala silenciosa possibilita a introdução de texto pela leitura da atividade dos músculos articulatórios, sem a necessidade de falar em voz alta. Assim, a combinação de ambas as técnicas criam uma interação totalmente de mãos-livres com a plataforma, melhorando as comunicações do dia-a-dia de pessoas com incapacidades nas mãos/braços. Além disso, os utilizadores serão inseridos num sistema descentralizado, desenhado para comunicações e trocas de dados seguras entre pares, reforçando, assim, a preocupação com a privacidade, que é um conceito base da plataforma

Repositório Científico do Instituto Politécnico do Porto

EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

Author: Janke Matthias
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2016
Field of study

The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals and directly synthesizes an audible speech output: EMG-to-speech

KITopen

Frame-Based Phone Classification Using EMG Signals

Author: De Zuazo Oteiza Xabier
Del Blanco Sierra Eder
Hernáez Rioja Inmaculada
Navas Cordón Eva
Salomons Inge
Publication venue: MDPI
Publication date: 13/07/2023
Field of study

This paper evaluates the impact of inter-speaker and inter-session variability on the development of a silent speech interface (SSI) based on electromyographic (EMG) signals from the facial muscles. The final goal of the SSI is to provide a communication tool for Spanish-speaking laryngectomees by generating audible speech from voiceless articulation. However, before moving on to such a complex task, a simpler phone classification task in different modalities regarding speaker and session dependency is performed for this study. These experiments consist of processing the recorded utterances into phone-labeled segments and predicting the phonetic labels using only features obtained from the EMG signals. We evaluate and compare the performance of each model considering the classification accuracy. Results show that the models are able to predict the phonetic label best when they are trained and tested using data from the same session. The accuracy drops drastically when the model is tested with data from a different session, although it improves when more data are added to the training data. Similarly, when the same model is tested on a session from a different speaker, the accuracy decreases. This suggests that using larger amounts of data could help to reduce the impact of inter-session variability, but more research is required to understand if this approach would suffice to account for inter-speaker variability as well.This research was funded by Agencia Estatal de Investigación grant number ref.PID2019-108040RB-C21/AEI/10.13039/50110001103

Archivo Digital para la Docencia y la Investigación

Applying DNN Adaptation to Reduce the Session Dependency of Ultrasound Tongue Imaging-based Silent Speech Interfaces

Author: Csapó Tamás Gábor
Gosztolya Gábor
Grósz Tamás
Markó Alexandra
Tóth László
Publication venue: 'Obuda University'
Publication date: 01/01/2020
Field of study

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Deep learning for healthcare applications based on physiological signals: A review

Author: Acharya
Acharya
Acharya
Acharya
Acharya
Acharya
Acharya
Acharya
Acharya
Acharya
Acharya
Ahmed
Allard
An
Anton
Aschenbrenner-Scheibe
Atzori
Atzori
Atzori
Barea
Basmajian
Bengio
Bengio
Bergstra
Blankertz
Brunner
Bulling
Bunce
Cecotti
Chen
Chen
Cheng
De Chazal
De Luca
De Luca
Dean
Devasahayam
Du
Erhan
Faust
Faust
Faust
Faust
Faust
Faust
Faust
Faust
Faust
Faust
Faust
Fraiwan
Fukushima
Geng
Goldberger
Goodfellow
Greenwald
Guger
Gödel
Hajinoroozi
Hajinoroozi
He
Hinton
Hinton
Hochreiter
Hopfield
Hosseini
Huang
Huve
Jeffries
Jenny
Jia
Jingwei
Jirayucharoensak
Jung
Kalayci
Kantz
Kemp
Kherlopian
Kingma
Kiral-Kornek
Kiranyaz
Koelstra
Kohavi
Krupinski
Kutner
Larochelle
LeCun
LeCun
LeCun
Lee
Leeb
Li
Liu
Liu
Lu
Luo
Längkvist
Längkvist
Maiwald
Majumdar
Miller
Min
Miotto
Mirowski
Moody
Moody
Moretti
Morrell
Muduli
Najafabadi
Nolle
Nurse
Oh Shu Lih
Oliver Faust
Park
Penzel
Piroska
Piryatinska
Pourbabaee
Rao
Ren
Sajda
Salakhutdinov
Schaffer
Schelter
Schelter
Schirrmeister
Schirrmeister
Schlögl
Shashikumar
Smolensky
Spampinato
Squire
Stead
Sörnmo
Taji
Tan
Tan Jen Hong
Tokui
U Rajendra Acharya
Van Drongelen
van Eck
van Putten
Vertinsky
Waller
Waltman
Wand
Wand
Wand
Wang
Winterhalder
Wolpaw
Xia
Xia
Xing
Yoon
Yuki Hagiwara
Zadrozny
Zhai
Zhang
Zheng
Zheng
Zheng
Zhi
Zhu
Publication venue: 'Elsevier BV'
Publication date: 01/07/2018
Field of study

Background and objective: We have cast the net into the ocean of knowledge to retrieve the latest scientific research on deep learning methods for physiological signals. We found 53 research papers on this topic, published from 01.01.2008 to 31.12.2017. Methods: An initial bibliometric analysis shows that the reviewed papers focused on Electromyogram(EMG), Electroencephalogram(EEG), Electrocardiogram(ECG), and Electrooculogram(EOG). These four categories were used to structure the subsequent content review. Results: During the content review, we understood that deep learning performs better for big and varied datasets than classic analysis and machine classification methods. Deep learning algorithms try to develop the model by using all the available input. Conclusions: This review paper depicts the application of various deep learning algorithms used till recently, but in future it will be used for more healthcare areas to improve the quality of diagnosi

Crossref

Sheffield Hallam University Research Archive

Deep Learning for Processing Electromyographic Signals: a Taxonomy-based Survey

Author: Antonio Brunetti
Domenico Buongiorno
Giacomo Donato Cascarano
Giovanni Dimauro
Irio De Feudis
Leonarda Carnimeo
Vitoantonio Bevilacqua
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021
Field of study

Deep Learning (DL) has been recently employed to build smart systems that perform incredibly well in a wide range of tasks, such as image recognition, machine translation, and self-driving cars. In several fields the considerable improvement in the computing hardware and the increasing need for big data analytics has boosted DL work. In recent years physiological signal processing has strongly benefited from deep learning. In general, there is an exponential increase in the number of studies concerning the processing of electromyographic (EMG) signals using DL methods. This phenomenon is mostly explained by the current limitation of myoelectric controlled prostheses as well as the recent release of large EMG recording datasets, e.g. Ninapro. Such a growing trend has inspired us to seek and review recent papers focusing on processing EMG signals using DL methods. Referring to the Scopus database, a systematic literature search of papers published between January 2014 and March 2019 was carried out, and sixty-five papers were chosen for review after a full text analysis. The bibliometric research revealed that the reviewed papers can be grouped in four main categories according to the final application of the EMG signal analysis: Hand Gesture Classification, Speech and Emotion Classification, Sleep Stage Classification and Other Applications. The review process also confirmed the increasing trend in terms of published papers, the number of papers published in 2018 is indeed four times the amount of papers published the year before. As expected, most of the analyzed papers (≈60 %) concern the identification of hand gestures, thus supporting our hypothesis. Finally, it is worth reporting that the convolutional neural network (CNN) is the most used topology among the several involved DL architectures, in fact, the sixty percent approximately of the reviewed articles consider a CNN

Archivio istituzionale della ricerca - Università di Bari

Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

Author: Csapó Tamás Gábor
Gosztolya Gábor
Honarmandi Shandiz Amin
Markó Alexandra
Németh Géza
Tóth László
Zainkó Csaba
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2021
Field of study

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Silent Speech Interfaces for Speech Restoration: A Review

Author: González López José Andrés
Gómez Alanís Alejandro
Gómez Ángel M.
Martín Doñas Juan M.
Pérez Córdoba José Luis
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/09/2020
Field of study

This work was supported in part by the Agencia Estatal de Investigacion (AEI) under Grant PID2019-108040RB-C22/AEI/10.13039/501100011033. The work of Jose A. Gonzalez-Lopez was supported in part by the Spanish Ministry of Science, Innovation and Universities under Juan de la Cierva-Incorporation Fellowship (IJCI-2017-32926).This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present latest SSI research aimed at providing new alternative and augmentative communication methods for persons with severe speech disorders. SSIs can employ a variety of biosignals to enable silent communication, such as electrophysiological recordings of neural activity, electromyographic (EMG) recordings of vocal tract movements or the direct tracking of articulator movements using imaging techniques. Depending on the disorder, some sensing techniques may be better suited than others to capture speech-related information. For instance, EMG and imaging techniques are well suited for laryngectomised patients, whose vocal tract remains almost intact but are unable to speak after the removal of the vocal folds, but fail for severely paralysed individuals. From the biosignals, SSIs decode the intended message, using automatic speech recognition or speech synthesis algorithms. Despite considerable advances in recent years, most present-day SSIs have only been validated in laboratory settings for healthy users. Thus, as discussed in this paper, a number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications. If these issues can be addressed successfully, future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.Agencia Estatal de Investigacion (AEI) PID2019-108040RB-C22/AEI/10.13039/501100011033Spanish Ministry of Science, Innovation and Universities under Juan de la Cierva-Incorporation Fellowship IJCI-2017-3292

arXiv.org e-Print Archive

Repositorio Institucional Universidad de Granada

A Silent-Speech Interface using Electro-Optical Stomatography

Author: Stone Simon
Publication venue: Thelem Universitätsverlag & Buchhandlung GmbH & Co. KG
Publication date: 21/06/2022
Field of study

Sprachtechnologie ist eine große und wachsende Industrie, die das Leben von technologieinteressierten Nutzern auf zahlreichen Wegen bereichert. Viele potenzielle Nutzer werden jedoch ausgeschlossen: Nämlich alle Sprecher, die nur schwer oder sogar gar nicht Sprache produzieren können. Silent-Speech Interfaces bieten einen Weg, mit Maschinen durch ein bequemes sprachgesteuertes Interface zu kommunizieren ohne dafür akustische Sprache zu benötigen. Sie können außerdem prinzipiell eine Ersatzstimme stellen, indem sie die intendierten Äußerungen, die der Nutzer nur still artikuliert, künstlich synthetisieren. Diese Dissertation stellt ein neues Silent-Speech Interface vor, das auf einem neu entwickelten Messsystem namens Elektro-Optischer Stomatografie und einem neuartigen parametrischen Vokaltraktmodell basiert, das die Echtzeitsynthese von Sprache basierend auf den gemessenen Daten ermöglicht. Mit der Hardware wurden Studien zur Einzelworterkennung durchgeführt, die den Stand der Technik in der intra- und inter-individuellen Genauigkeit erreichten und übertrafen. Darüber hinaus wurde eine Studie abgeschlossen, in der die Hardware zur Steuerung des Vokaltraktmodells in einer direkten Artikulation-zu-Sprache-Synthese verwendet wurde. Während die Verständlichkeit der Synthese von Vokalen sehr hoch eingeschätzt wurde, ist die Verständlichkeit von Konsonanten und kontinuierlicher Sprache sehr schlecht. Vielversprechende Möglichkeiten zur Verbesserung des Systems werden im Ausblick diskutiert.:Statement of authorship iii Abstract v List of Figures vii List of Tables xi Acronyms xiii 1. Introduction 1 1.1. The concept of a Silent-Speech Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2. Structure of this work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Fundamentals of phonetics 7 2.1. Components of the human speech production system . . . . . . . . . . . . . . . . . . . 7 2.2. Vowel sounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.3. Consonantal sounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.4. Acoustic properties of speech sounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.5. Coarticulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.6. Phonotactics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.7. Summary and implications for the design of a Silent-Speech Interface (SSI) . . . . . . . 21 3. Articulatory data acquisition techniques in Silent-Speech Interfaces 25 3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2. Scope of the literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3. Video Recordings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4. Ultrasonography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.5. Electromyography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.6. Permanent-Magnetic Articulography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.7. Electromagnetic Articulography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.8. Radio waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.9. Palatography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.10.Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4. Electro-Optical Stomatography 55 4.1. Contact sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.2. Optical distance sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.3. Lip sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.4. Sensor Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.5. Control Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.6. Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 5. Articulation-to-Text 99 5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 5.2. Command word recognition pilot study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 5.3. Command word recognition small-scale study . . . . . . . . . . . . . . . . . . . . . . . . 102 6. Articulation-to-Speech 109 6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6.2. Articulatory synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6.3. The six point vocal tract model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 6.4. Objective evaluation of the vocal tract model . . . . . . . . . . . . . . . . . . . . . . . . 116 6.5. Perceptual evaluation of the vocal tract model . . . . . . . . . . . . . . . . . . . . . . . . 120 6.6. Direct synthesis using EOS to control the vocal tract model . . . . . . . . . . . . . . . . 125 6.7. Pitch and voicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 7. Summary and outlook 145 7.1. Summary of the contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 7.2. Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 A. Overview of the International Phonetic Alphabet 151 B. Mathematical proofs and derivations 153 B.1. Combinatoric calculations illustrating the reduction of possible syllables using phonotactics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 B.2. Signal Averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 B.3. Effect of the contact sensor area on the conductance . . . . . . . . . . . . . . . . . . . . 155 B.4. Calculation of the forward current for the OP280V diode . . . . . . . . . . . . . . . . . . 155 C. Schematics and layouts 157 C.1. Schematics of the control unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 C.2. Layout of the control unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 C.3. Bill of materials of the control unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 C.4. Schematics of the sensor unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 C.5. Layout of the sensor unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 C.6. Bill of materials of the sensor unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 D. Sensor unit assembly 169 E. Firmware flow and data protocol 177 F. Palate file format 181 G. Supplemental material regarding the vocal tract model 183 H. Articulation-to-Speech: Optimal hyperparameters 189 Bibliography 191Speech technology is a major and growing industry that enriches the lives of technologically-minded people in a number of ways. Many potential users are, however, excluded: Namely, all speakers who cannot easily or even at all produce speech. Silent-Speech Interfaces offer a way to communicate with a machine by a convenient speech recognition interface without the need for acoustic speech. They also can potentially provide a full replacement voice by synthesizing the intended utterances that are only silently articulated by the user. To that end, the speech movements need to be captured and mapped to either text or acoustic speech. This dissertation proposes a new Silent-Speech Interface based on a newly developed measurement technology called Electro-Optical Stomatography and a novel parametric vocal tract model to facilitate real-time speech synthesis based on the measured data. The hardware was used to conduct command word recognition studies reaching state-of-the-art intra- and inter-individual performance. Furthermore, a study on using the hardware to control the vocal tract model in a direct articulation-to-speech synthesis loop was also completed. While the intelligibility of synthesized vowels was high, the intelligibility of consonants and connected speech was quite poor. Promising ways to improve the system are discussed in the outlook.:Statement of authorship iii Abstract v List of Figures vii List of Tables xi Acronyms xiii 1. Introduction 1 1.1. The concept of a Silent-Speech Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2. Structure of this work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Fundamentals of phonetics 7 2.1. Components of the human speech production system . . . . . . . . . . . . . . . . . . . 7 2.2. Vowel sounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.3. Consonantal sounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.4. Acoustic properties of speech sounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.5. Coarticulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.6. Phonotactics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.7. Summary and implications for the design of a Silent-Speech Interface (SSI) . . . . . . . 21 3. Articulatory data acquisition techniques in Silent-Speech Interfaces 25 3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2. Scope of the literature review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.3. Video Recordings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4. Ultrasonography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.5. Electromyography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.6. Permanent-Magnetic Articulography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.7. Electromagnetic Articulography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 3.8. Radio waves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 3.9. Palatography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 3.10.Conclusion and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 4. Electro-Optical Stomatography 55 4.1. Contact sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 4.2. Optical distance sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.3. Lip sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 4.4. Sensor Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 4.5. Control Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 4.6. Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 5. Articulation-to-Text 99 5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 5.2. Command word recognition pilot study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 5.3. Command word recognition small-scale study . . . . . . . . . . . . . . . . . . . . . . . . 102 6. Articulation-to-Speech 109 6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6.2. Articulatory synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6.3. The six point vocal tract model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 6.4. Objective evaluation of the vocal tract model . . . . . . . . . . . . . . . . . . . . . . . . 116 6.5. Perceptual evaluation of the vocal tract model . . . . . . . . . . . . . . . . . . . . . . . . 120 6.6. Direct synthesis using EOS to control the vocal tract model . . . . . . . . . . . . . . . . 125 6.7. Pitch and voicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 7. Summary and outlook 145 7.1. Summary of the contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 7.2. Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146 A. Overview of the International Phonetic Alphabet 151 B. Mathematical proofs and derivations 153 B.1. Combinatoric calculations illustrating the reduction of possible syllables using phonotactics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 B.2. Signal Averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155 B.3. Effect of the contact sensor area on the conductance . . . . . . . . . . . . . . . . . . . . 155 B.4. Calculation of the forward current for the OP280V diode . . . . . . . . . . . . . . . . . . 155 C. Schematics and layouts 157 C.1. Schematics of the control unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158 C.2. Layout of the control unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 C.3. Bill of materials of the control unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 C.4. Schematics of the sensor unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 C.5. Layout of the sensor unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166 C.6. Bill of materials of the sensor unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 D. Sensor unit assembly 169 E. Firmware flow and data protocol 177 F. Palate file format 181 G. Supplemental material regarding the vocal tract model 183 H. Articulation-to-Speech: Optimal hyperparameters 189 Bibliography 19

Technische Universität Dresden: Qucosa