Visual-only recognition of normal, whispered and silent speech
Silent speech interfaces have recently been proposed as a way to enable communication when the acoustic signal is not available. This introduces the need to build visual speech recognition systems for silent and whispered speech. However, almost all the recently proposed systems have been trained on vocalised data only. This is in contrast with evidence in the literature which suggests that lip movements change depending on the speech mode. In this work, we introduce a new audiovisual database which is publicly available and contains normal, whispered and silent speech. To the best of our knowledge, this is the first study which investigates the differences between the three speech modes using the visual modality only. We show that an absolute decrease in classification rate of up to 3.7% is observed when training and testing on normal and whispered, respectively, and vice versa. An even higher decrease of up to 8.5% is reported when the models are tested on silent speech. This reveals that there are indeed visual differences between the three speech modes, and that the common assumption that vocalised training data can be used directly to train a silent speech recognition system may not be true.
EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals
The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals and directly synthesizes an audible speech output: EMG-to-speech.
A silent speech system based on permanent magnet articulography and direct synthesis
In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, which is a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies.
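For illustration, the sketch below shows how a conversion of this kind can be expressed as a conditional expectation under a joint generative model of PMA and acoustic frames. It uses scikit-learn's full-covariance GaussianMixture as a stand-in for the paper's mixture of factor analysers (an MFA is equivalent to a GMM with low-rank-plus-diagonal covariances); the feature dimensions, data and function names are illustrative assumptions, not the authors' implementation.

```python
# Sketch: GMM-based regression from PMA features to acoustic features, standing in
# for a mixture-of-factor-analysers mapping. Dimensions and data are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_joint_model(pma, audio, n_components=16, seed=0):
    """Fit a joint Gaussian mixture on stacked [PMA | audio] frames."""
    joint = np.hstack([pma, audio])
    return GaussianMixture(n_components=n_components,
                           covariance_type="full", random_state=seed).fit(joint)

def pma_to_audio(gmm, pma, d_pma):
    """Convert PMA frames to acoustic frames via the conditional mean E[audio | pma]."""
    n, d_audio = len(pma), gmm.means_.shape[1] - d_pma
    log_resp = np.zeros((n, gmm.n_components))
    for k in range(gmm.n_components):
        mu_x = gmm.means_[k, :d_pma]
        s_xx = gmm.covariances_[k, :d_pma, :d_pma]
        diff = pma - mu_x
        sol = np.linalg.solve(s_xx, diff.T).T
        # Log responsibility of component k given the PMA part only
        # (constants cancel when normalising across components).
        log_resp[:, k] = (np.log(gmm.weights_[k])
                          - 0.5 * (np.sum(diff * sol, axis=1)
                                   + np.linalg.slogdet(s_xx)[1]))
    resp = np.exp(log_resp - log_resp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)

    out = np.zeros((n, d_audio))
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k, :d_pma], gmm.means_[k, d_pma:]
        s_xx = gmm.covariances_[k, :d_pma, :d_pma]
        s_yx = gmm.covariances_[k, d_pma:, :d_pma]
        cond_mean = mu_y + (s_yx @ np.linalg.solve(s_xx, (pma - mu_x).T)).T
        out += resp[:, k:k + 1] * cond_mean
    return out
```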
Silent versus modal multi-speaker speech recognition from ultrasound and video
We investigate multi-speaker speech recognition from ultrasound images of the tongue and video images of the lips. We train our systems on imaging data from modal speech, and evaluate on matched test sets of two speaking modes: silent and modal speech. We observe that silent speech recognition from imaging data underperforms compared to modal speech recognition, likely due to a speaking-mode mismatch between training and testing. We improve silent speech recognition performance using techniques that address the domain mismatch, such as fMLLR and unsupervised model adaptation. We also analyse the properties of silent and modal speech in terms of utterance duration and the size of the articulatory space. To estimate the articulatory space, we compute the convex hull of tongue splines extracted from ultrasound tongue images. Overall, we observe that the duration of silent speech is longer than that of modal speech, and that silent speech covers a smaller articulatory space than modal speech. Although these two properties are statistically significant across speaking modes, they do not directly correlate with word error rates from speech recognition.
Comment: 5 pages, 5 figures, Submitted to Interspeech 202
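As a rough illustration of the articulatory-space analysis described above, the sketch below computes the area of the 2-D convex hull of pooled tongue-spline points with scipy.spatial.ConvexHull. The spline coordinates, array shapes and function name are placeholders, not the paper's data or code.

```python
# Sketch: estimate the "articulatory space" of an utterance as the area of the
# convex hull of tongue-spline points. Coordinates below are toy placeholders.
import numpy as np
from scipy.spatial import ConvexHull

def articulatory_space_area(splines):
    """splines: array of shape (n_frames, n_points, 2) with (x, y) spline coordinates."""
    points = splines.reshape(-1, 2)   # pool spline points over all frames
    hull = ConvexHull(points)         # 2-D convex hull of the pooled points
    return hull.volume                # for 2-D input, .volume is the enclosed area

# Toy usage with random "splines"; silent vs. modal utterances would be compared this way.
rng = np.random.default_rng(0)
silent = articulatory_space_area(rng.normal(size=(200, 30, 2)) * [1.0, 0.8])
modal = articulatory_space_area(rng.normal(size=(200, 30, 2)))
print(f"silent hull area: {silent:.2f}, modal hull area: {modal:.2f}")
```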
Advancing Electromyographic Continuous Speech Recognition: Signal Preprocessing and Modeling
Speech is the natural medium of human communication, but audible speech can be overheard by bystanders and excludes speech-disabled people. This work presents a speech recognizer based on surface electromyography, where electric potentials of the facial muscles are captured by surface electrodes, allowing speech to be processed nonacoustically. A system which was state-of-the-art at the beginning of this book is substantially improved in terms of accuracy, flexibility, and robustness
Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning
This paper presents a novel metric learning approach to address the performance gap between normal and silent speech in visual speech recognition (VSR). The difference in lip movements between the two poses a challenge for existing VSR models, which exhibit degraded accuracy when applied to silent speech. To solve this issue and tackle the scarcity of training data for silent speech, we propose to leverage the shared literal content between normal and silent speech and present a metric learning approach based on visemes. Specifically, we aim to map the inputs of the two speech types close to each other in a latent space if they have similar viseme representations. By minimizing the Kullback-Leibler divergence of the predicted viseme probability distributions between and within the two speech types, our model effectively learns and predicts viseme identities. Our evaluation demonstrates that our method improves the accuracy of silent VSR, even when limited training data is available.
Comment: Accepted by INTERSPEECH 202
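The sketch below gives one possible form of such a viseme-based objective: a symmetric KL term that pulls the predicted viseme distributions of paired normal and silent inputs together, plus cross-entropy terms that keep both branches predicting the correct viseme. It is written in PyTorch; the pairing, loss weighting and function names are assumptions, not the authors' implementation.

```python
# Sketch: KL-based viseme metric-learning loss for paired normal/silent inputs.
import torch
import torch.nn.functional as F

def viseme_kl_loss(logits_normal, logits_silent, viseme_targets, alpha=1.0):
    """logits_*: (batch, n_visemes); viseme_targets: (batch,) ground-truth viseme ids."""
    log_p_normal = F.log_softmax(logits_normal, dim=-1)
    log_p_silent = F.log_softmax(logits_silent, dim=-1)
    # Cross-mode term: symmetric KL pulls the two predicted distributions together.
    kl_ns = F.kl_div(log_p_silent, log_p_normal.exp(), reduction="batchmean")
    kl_sn = F.kl_div(log_p_normal, log_p_silent.exp(), reduction="batchmean")
    # Within-mode term: both branches should still predict the correct viseme.
    ce = F.cross_entropy(logits_normal, viseme_targets) \
       + F.cross_entropy(logits_silent, viseme_targets)
    return ce + alpha * (kl_ns + kl_sn)

# Toy usage with random logits (13 viseme classes assumed for illustration).
logits_n = torch.randn(8, 13, requires_grad=True)
logits_s = torch.randn(8, 13, requires_grad=True)
targets = torch.randint(0, 13, (8,))
viseme_kl_loss(logits_n, logits_s, targets).backward()
```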