Silent Speech Interfaces for Speech Restoration: A Review
This work was supported in part by the Agencia Estatal de Investigación (AEI) under Grant PID2019-108040RB-C22/AEI/10.13039/501100011033. The work of Jose A. Gonzalez-Lopez was supported in part by the Spanish Ministry of Science, Innovation and Universities under a Juan de la Cierva-Incorporación Fellowship (IJCI-2017-32926).

This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present the latest SSI research aimed at providing new alternative and augmentative communication methods for persons with severe speech disorders. SSIs can employ a variety of biosignals to enable silent communication, such as electrophysiological recordings of neural activity, electromyographic (EMG) recordings of vocal tract movements, or the direct tracking of articulator movements using imaging techniques. Depending on the disorder, some sensing techniques may be better suited than others to capture speech-related information. For instance, EMG and imaging techniques are well suited for laryngectomised patients, whose vocal tract remains largely intact but who are unable to speak after the removal of the vocal folds, yet these techniques fail for severely paralysed individuals. From these biosignals, SSIs decode the intended message using automatic speech recognition or speech synthesis algorithms. Despite considerable advances in recent years, most present-day SSIs have only been validated in laboratory settings with healthy users. Thus, as discussed in this paper, a number of challenges remain to be addressed before SSIs can be deployed in real-world applications.
If these issues can be addressed successfully, future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.
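The decoding step the review describes, mapping non-acoustic biosignal frames to an intended message, can be illustrated with a deliberately minimal sketch. Everything below is invented for illustration and is not from any reviewed system: real SSIs use learned acoustic or articulatory models rather than RMS energy with a nearest-template rule, and the two-word vocabulary, frame sizes, and function names are assumptions.

```python
import numpy as np

def frame_signal(x, frame_len=128, hop=64):
    """Split a 1-D biosignal into overlapping analysis frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def rms_features(frames):
    """Per-frame root-mean-square energy, a stand-in for real articulatory features."""
    return np.sqrt((frames ** 2).mean(axis=1, keepdims=True))

def decode(features, templates):
    """Nearest-template 'recognition': pick the word whose mean feature is closest."""
    means = {w: t.mean() for w, t in templates.items()}
    obs = features.mean()
    return min(means, key=lambda w: abs(means[w] - obs))

# Toy usage: two 'words' distinguished only by overall signal energy.
rng = np.random.default_rng(0)
templates = {
    "yes": rms_features(frame_signal(0.9 * rng.standard_normal(2000))),
    "no":  rms_features(frame_signal(0.1 * rng.standard_normal(2000))),
}
utterance = 0.85 * rng.standard_normal(2000)
print(decode(rms_features(frame_signal(utterance)), templates))  # prints "yes"
```

The point of the sketch is only the pipeline shape the review assumes throughout: sensing, frame-level feature extraction, then a decoder that maps features to a message.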
Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces
Articulatory-to-acoustic mapping seeks to reconstruct speech from a recording
of the articulatory movements, for example, an ultrasound video. Just like
speech signals, these recordings capture not only the linguistic content but are
also highly specific to the individual speaker. Hence, due to the lack of
multi-speaker data sets, researchers have so far concentrated on
speaker-dependent modeling. Here, we present multi-speaker experiments using
the recently published TaL80 corpus. To model speaker characteristics, we
adjusted the x-vector framework popular in speech processing to operate with
ultrasound tongue videos. Next, we performed speaker recognition experiments
using 50 speakers from the corpus. Then, we created speaker embedding vectors
and evaluated them on the remaining speakers. Finally, we examined how the
embedding vector influences the accuracy of our ultrasound-to-speech conversion
network in a multi-speaker scenario. In the experiments we attained speaker
recognition error rates below 3%, and we also found that the embedding vectors
generalize nicely to unseen speakers. Our first attempt to apply them in a
multi-speaker silent speech framework brought about a marginal reduction in the
error rate of the spectral estimation step.
Comment: 5 pages, 3 figures, 3 tables
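The x-vector adaptation this abstract describes reduces, at its core, to a statistics-pooling step that collapses a variable-length frame sequence into a fixed-length speaker vector, which is then appended to every input frame of the conversion network. The NumPy sketch below shows only that pooling-and-conditioning idea; the actual x-vector system trains a deep frame-level encoder before pooling, and all shapes and names here are assumptions, not the TaL80 setup.

```python
import numpy as np

def stats_pooling(frame_feats):
    """Collapse a (T, D) frame sequence into a fixed 2*D utterance vector by
    concatenating per-dimension mean and standard deviation -- the pooling
    step at the heart of the x-vector architecture."""
    mean = frame_feats.mean(axis=0)
    std = frame_feats.std(axis=0)
    return np.concatenate([mean, std])

def condition_on_speaker(frame_feats, spk_embedding):
    """Append the same speaker embedding to every frame, a common way of
    making a conversion network speaker-aware."""
    tiled = np.tile(spk_embedding, (frame_feats.shape[0], 1))
    return np.concatenate([frame_feats, tiled], axis=1)

# Toy usage: 100 ultrasound-derived frames with 16 (hypothetical) features each.
rng = np.random.default_rng(1)
frames = rng.standard_normal((100, 16))
embedding = stats_pooling(frames)            # shape (32,)
conditioned = condition_on_speaker(frames, embedding)
print(conditioned.shape)                     # (100, 48)
```

Because the embedding is a fixed-length summary, the same conditioning works for unseen speakers at test time, which is what makes the generalisation result in the abstract possible.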
EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals
The general objective of this work is the design, implementation, improvement, and evaluation of a system that uses surface electromyographic (EMG) signals and directly synthesizes an audible speech output: EMG-to-speech.
- …
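A minimal stand-in for such a direct EMG-to-speech mapping is a frame-wise linear regression from EMG feature frames to spectral target frames. The closed-form ridge sketch below is only illustrative: actual EMG-to-speech systems use neural networks followed by a vocoder, and the channel counts and feature dimensions here are invented.

```python
import numpy as np

def fit_ridge(X, Y, lam=1e-3):
    """Closed-form ridge regression: find W mapping EMG feature frames
    X (T, Din) to spectral target frames Y (T, Dout)."""
    Din = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(Din), X.T @ Y)

# Toy usage: synthetic EMG frames and a known linear articulatory-to-acoustic map.
rng = np.random.default_rng(2)
emg = rng.standard_normal((500, 8))          # 500 frames, 8 EMG channels
true_map = rng.standard_normal((8, 20))      # target: 20 spectral coefficients
spec = emg @ true_map + 0.01 * rng.standard_normal((500, 20))
W = fit_ridge(emg, spec)
pred = emg @ W
print(np.abs(pred - spec).mean() < 0.05)     # prints True: the map is recovered
```

The appeal of direct generation, as the abstract notes, is that no intermediate text recognition step is needed: each EMG frame is converted straight into acoustic parameters.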