16 research outputs found

    Restoring Speech Following Total Removal of the Larynx

    By sensing speech articulator movement and training a transformation from movement to audio, we can restore the power of speech to someone who has lost their larynx. We sense the changes in magnetic field caused by the movement of small magnets attached to the lips and tongue. The sensor-to-audio transformation uses recurrent neural networks.
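
    A minimal sketch of this kind of sensor-to-audio mapping, assuming 9 magnetic-field channels and 80-dimensional acoustic frames (illustrative choices; the class and sizes below are hypothetical, not the authors' system):

        import torch
        import torch.nn as nn

        class SensorToAudioRNN(nn.Module):
            def __init__(self, n_sensor_channels=9, n_acoustic_feats=80, hidden=256):
                super().__init__()
                # Bidirectional LSTM reads the whole articulatory trajectory.
                self.rnn = nn.LSTM(n_sensor_channels, hidden, num_layers=2,
                                   batch_first=True, bidirectional=True)
                # Frame-wise projection to acoustic features (e.g. a mel spectrogram).
                self.proj = nn.Linear(2 * hidden, n_acoustic_feats)

            def forward(self, sensors):           # sensors: (batch, time, channels)
                hidden_states, _ = self.rnn(sensors)
                return self.proj(hidden_states)   # (batch, time, acoustic features)

        model = SensorToAudioRNN()
        dummy = torch.randn(1, 200, 9)   # 200 frames of 9-channel sensor data
        mel = model(dummy)               # predicted acoustic frames, (1, 200, 80)

    A vocoder would then render the predicted acoustic frames as an audible waveform.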

    Frame-Based Phone Classification Using EMG Signals

    This paper evaluates the impact of inter-speaker and inter-session variability on the development of a silent speech interface (SSI) based on electromyographic (EMG) signals from the facial muscles. The final goal of the SSI is to provide a communication tool for Spanish-speaking laryngectomees by generating audible speech from voiceless articulation. Before moving on to such a complex task, however, this study performs a simpler phone classification task under different speaker- and session-dependency conditions. These experiments consist of processing the recorded utterances into phone-labeled segments and predicting the phonetic labels using only features obtained from the EMG signals. We evaluate and compare the performance of each model in terms of classification accuracy. Results show that the models predict the phonetic label best when they are trained and tested using data from the same session. The accuracy drops drastically when the model is tested with data from a different session, although it improves when more data are added to the training set. Similarly, when the same model is tested on a session from a different speaker, the accuracy decreases. This suggests that using larger amounts of data could help to reduce the impact of inter-session variability, but more research is required to understand whether this approach would suffice to account for inter-speaker variability as well. This research was funded by Agencia Estatal de Investigación, grant number PID2019-108040RB-C21/AEI/10.13039/501100011033.
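
    A hedged sketch of the frame-based setup: each EMG frame is reduced to simple per-channel features and classified into a phone label. The window length, the features (mean absolute value and zero crossings) and the classifier below are illustrative assumptions, not the paper's exact configuration:

        import numpy as np
        from sklearn.neural_network import MLPClassifier

        def frame_features(emg, frame_len=160, hop=80):
            """Slice (time, channels) EMG into frames and compute per-channel
            mean absolute value and zero-crossing counts."""
            feats = []
            for start in range(0, emg.shape[0] - frame_len + 1, hop):
                frame = emg[start:start + frame_len]             # (frame_len, ch)
                mav = np.abs(frame).mean(axis=0)                 # mean absolute value
                zc = (np.diff(np.signbit(frame), axis=0) != 0).sum(axis=0)
                feats.append(np.concatenate([mav, zc]))
            return np.stack(feats)

        # Toy stand-ins for one session; real inputs would be recorded utterances
        # with phone labels obtained from a forced alignment.
        rng = np.random.default_rng(0)
        X_train = frame_features(rng.standard_normal((8000, 6)))  # 6 EMG channels
        y_train = rng.integers(0, 30, size=len(X_train))          # 30 phone classes
        clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=50).fit(X_train, y_train)
        print("train accuracy:", clf.score(X_train, y_train))

    Testing the fitted classifier on frames from a different session or speaker would reproduce the cross-session and cross-speaker comparisons the paper describes.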

    Non-Parallel Articulatory-to-Acoustic Conversion Using Multiview-based Time Warping

    In this paper, we propose a novel algorithm called multiview temporal alignment by dependence maximisation in the latent space (TRANSIENCE) for the alignment of time series consisting of sequences of feature vectors with different lengths and dimensionalities. The proposed algorithm, which is based on the theory of multiview learning, can be seen as an extension of the well-known dynamic time warping (DTW) algorithm, but it allows the sequences to have different dimensionalities. Our algorithm attempts to find an optimal temporal alignment between pairs of nonaligned sequences by first projecting their feature vectors into a common latent space where both views are maximally similar. To do this, powerful, nonlinear deep neural network (DNN) models are employed. Then, the resulting sequences of embedding vectors are aligned using DTW. Finally, the alignment paths obtained in the previous step are applied to the original sequences to align them. In the paper, we explore several variants of the algorithm that differ mainly in the way the DNNs are trained. We evaluated the proposed algorithm on an articulatory-to-acoustic (A2A) synthesis task involving the generation of audible speech from motion data captured from the lips and tongue of healthy speakers using a technique known as permanent magnet articulography (PMA). In this task, our algorithm is applied during the training stage to align pairs of nonaligned speech and PMA recordings that are later used to train DNNs able to synthesize speech from PMA data. Our results show that the quality of speech generated in the nonaligned scenario is comparable to that obtained in the parallel scenario. This work was supported in part by the Spanish State Research Agency (SRA), grant number PID2019-108040RB-C22/SRA/10.13039/501100011033, and by the FEDER/Junta de Andalucía, Consejería de Transformación Económica, Industria, Conocimiento y Universidades, project no. B-SEJ-570-UGR20.
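
    A compact sketch of the TRANSIENCE idea, not the authors' implementation: two encoders project sequences of different dimensionality into a shared latent space, DTW aligns the embeddings, and the resulting path is applied to the original sequences. Encoder sizes and feature dimensions are placeholders, and the networks here are untrained; in the paper the DNNs are trained so that the two views are maximally similar in the latent space:

        import numpy as np
        import torch
        import torch.nn as nn

        def dtw_path(A, B):
            """Classic dynamic-time-warping alignment between two feature sequences."""
            n, m = len(A), len(B)
            dist = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
            cost = np.full((n + 1, m + 1), np.inf)
            cost[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost[i, j] = dist[i - 1, j - 1] + min(
                        cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
            path, i, j = [], n, m               # backtrack the optimal path
            while i > 0 and j > 0:
                path.append((i - 1, j - 1))
                step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
                if step == 0: i, j = i - 1, j - 1
                elif step == 1: i -= 1
                else: j -= 1
            return path[::-1]

        latent = 16                              # shared latent dimensionality
        enc_pma = nn.Sequential(nn.Linear(9, 64), nn.ReLU(), nn.Linear(64, latent))
        enc_speech = nn.Sequential(nn.Linear(25, 64), nn.ReLU(), nn.Linear(64, latent))

        pma = torch.randn(120, 9)       # PMA sequence: 120 frames, 9-dim features
        speech = torch.randn(150, 25)   # speech sequence: 150 frames, 25-dim features
        with torch.no_grad():
            zp = enc_pma(pma).numpy()   # project both views into the latent space
            zs = enc_speech(speech).numpy()
        path = dtw_path(zp, zs)         # align in the latent space...
        aligned = [(pma[i], speech[j]) for i, j in path]  # ...then warp the originals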

    Predicting 3D lip shapes using facial surface EMG

    Aim: The aim of this study is to prove that facial surface electromyography (sEMG) conveys sufficient information to predict 3D lip shapes. High sEMG predictive accuracy implies that we could train a neural control model for activating biomechanical models by simultaneously recording sEMG signals and their associated motions. Materials and methods: With a stereo camera set-up, we recorded 3D lip shapes and simultaneously performed sEMG measurements of the facial muscles, applying principal component analysis (PCA) and a modified general regression neural network (GRNN) to link the sEMG measurements to 3D lip shapes. To test reproducibility, we conducted our experiment on five volunteers, evaluating several sEMG features and window lengths in unipolar and bipolar configurations in search of the optimal settings for facial sEMG. Conclusions: The errors of the two methods were comparable. We managed to predict 3D lip shapes with a mean accuracy of 2.76 mm when using the PCA method and 2.78 mm when using the modified GRNN. Whereas performance improved with shorter window lengths, feature type and configuration had little influence.
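
    A small sketch of the GRNN side of this method: a general regression neural network is essentially Gaussian-kernel regression from sEMG feature vectors to 3D lip-landmark coordinates. The dimensions and smoothing parameter below are illustrative assumptions, not the study's settings:

        import numpy as np

        def grnn_predict(X_train, Y_train, X_query, sigma=1.0):
            """Predict targets as a kernel-weighted average of training targets."""
            # Squared Euclidean distances between queries and training samples.
            d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
            w = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian pattern-layer weights
            w /= w.sum(axis=1, keepdims=True)      # normalise per query
            return w @ Y_train                     # weighted sum of lip shapes

        rng = np.random.default_rng(0)
        X = rng.standard_normal((500, 32))   # 500 frames of 32-dim sEMG features
        Y = rng.standard_normal((500, 24))   # 8 lip landmarks x (x, y, z)
        pred = grnn_predict(X, Y, X[:10])    # predicted 3D lip shapes for 10 frames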

    Revisión de las Tecnologías y Aplicaciones del Habla Sub-vocal

    This paper presents a review of the main applications and methodological approaches that have been developed in recent years for sub-vocal (silent) speech. Sub-vocal speech can be defined as the identification and characterization of the bioelectric signals that control the vocal tract when the speaker produces no sound. The first section presents an in-depth review of methods for detecting silent speech. The second part evaluates the technologies implemented in recent years, followed by a review of the main applications of this type of speech, and finally a broad comparison between the works that have been developed in industry and in academia.

    EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

    The general objective of this work is the design, implementation, improvement, and evaluation of a system that uses surface electromyographic (EMG) signals to directly synthesize an audible speech output: EMG-to-speech.