
    A silent speech system based on permanent magnet articulography and direct synthesis

    In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies.
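
    A minimal sketch of the joint-density mapping idea in this abstract, assuming parallel PMA and acoustic feature frames are available as NumPy arrays. Scikit-learn does not provide a mixture-of-factor-analysers estimator, so a full-covariance Gaussian mixture fitted on stacked [PMA | acoustic] vectors is used here as a stand-in; the per-frame conversion is standard MMSE regression under the joint model, not the authors' exact formulation.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_joint_model(pma, acoustic, n_components=16):
        """Fit a joint density model on stacked [PMA | acoustic] frames (parallel data)."""
        joint = np.hstack([pma, acoustic])
        gmm = GaussianMixture(n_components=n_components, covariance_type="full")
        gmm.fit(joint)
        return gmm

    def pma_to_acoustic(gmm, pma, dim_x):
        """MMSE conversion: E[acoustic | PMA] under the joint model, frame by frame."""
        means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
        n_mix, dim_y = len(weights), means.shape[1] - dim_x
        out = np.zeros((len(pma), dim_y))
        for t, x in enumerate(pma):
            resp = np.zeros(n_mix)             # p(component | articulatory frame)
            cond = np.zeros((n_mix, dim_y))    # E[acoustic | frame, component]
            for k in range(n_mix):
                mx, my = means[k, :dim_x], means[k, dim_x:]
                Sxx, Syx = covs[k, :dim_x, :dim_x], covs[k, dim_x:, :dim_x]
                diff = x - mx
                resp[k] = weights[k] * np.exp(-0.5 * diff @ np.linalg.solve(Sxx, diff)) \
                          / np.sqrt(np.linalg.det(Sxx))
                cond[k] = my + Syx @ np.linalg.solve(Sxx, diff)
            out[t] = (resp / resp.sum()) @ cond
        return out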

    Silent Speech Interfaces for Speech Restoration: A Review

    This work was supported in part by the Agencia Estatal de Investigacion (AEI) under Grant PID2019-108040RB-C22/AEI/10.13039/501100011033. The work of Jose A. Gonzalez-Lopez was supported in part by the Spanish Ministry of Science, Innovation and Universities under a Juan de la Cierva-Incorporation Fellowship (IJCI-2017-32926). This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present the latest SSI research aimed at providing new alternative and augmentative communication methods for persons with severe speech disorders. SSIs can employ a variety of biosignals to enable silent communication, such as electrophysiological recordings of neural activity, electromyographic (EMG) recordings of vocal tract movements or the direct tracking of articulator movements using imaging techniques. Depending on the disorder, some sensing techniques may be better suited than others to capture speech-related information. For instance, EMG and imaging techniques are well suited for laryngectomised patients, whose vocal tract remains almost intact but who are unable to speak after the removal of the vocal folds, but fail for severely paralysed individuals. From the biosignals, SSIs decode the intended message, using automatic speech recognition or speech synthesis algorithms. Despite considerable advances in recent years, most present-day SSIs have only been validated in laboratory settings for healthy users. Thus, as discussed in this paper, a number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications. If these issues can be addressed successfully, future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.

    Direct Speech Reconstruction From Articulatory Sensor Data by Machine Learning

    This paper describes a technique that generates speech acoustics from articulator movements. Our motivation is to help people who can no longer speak following laryngectomy, a procedure that is carried out tens of thousands of times per year in the Western world. Our method for sensing articulator movement, permanent magnetic articulography, relies on small, unobtrusive magnets attached to the lips and tongue. Changes in magnetic field caused by magnet movements are sensed and form the input to a process that is trained to estimate speech acoustics. In the experiments reported here this “Direct Synthesis” technique is developed for normal speakers, with glued-on magnets, allowing us to train with parallel sensor and acoustic data. We describe three machine learning techniques for this task, based on Gaussian mixture models, deep neural networks, and recurrent neural networks (RNNs). We evaluate our techniques with objective acoustic distortion measures and subjective listening tests over spoken sentences read from novels (the CMU Arctic corpus). Our results show that the best performing technique is a bidirectional RNN (BiRNN), which employs both past and future contexts to predict the acoustics from the sensor data. BiRNNs are not suitable for synthesis in real time but fixed-lag RNNs give similar results and, because they only look a little way into the future, overcome this problem. Listening tests show that the speech produced by this method has a natural quality that preserves the identity of the speaker. Furthermore, we obtain up to 92% intelligibility on the challenging CMU Arctic material. To our knowledge, these are the best results obtained for a silent-speech system without a restricted vocabulary and with an unobtrusive device that delivers audio in close to real time. This work promises to lead to a technology that truly will give people whose larynx has been removed their voices back.
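
    The abstract identifies a bidirectional RNN as the best-performing mapping from sensor frames to acoustic frames. A minimal sketch of such a network follows, written with Keras purely as an assumption (the paper does not name a toolkit); the sensor and acoustic feature dimensions are illustrative. For close-to-real-time synthesis, the bidirectional layers would be swapped for unidirectional recurrent layers operating with a small fixed lag, as described above.

    from tensorflow import keras
    from tensorflow.keras import layers

    SENSOR_DIM, ACOUSTIC_DIM = 9, 25   # hypothetical frame sizes

    model = keras.Sequential([
        keras.Input(shape=(None, SENSOR_DIM)),                 # variable-length utterances
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.TimeDistributed(layers.Dense(ACOUSTIC_DIM)),    # one acoustic frame per step
    ])
    model.compile(optimizer="adam", loss="mse")

    # Training uses parallel sensor/acoustic sequences recorded from normal speakers:
    #   X: (n_utterances, n_frames, SENSOR_DIM), Y: (n_utterances, n_frames, ACOUSTIC_DIM)
    # model.fit(X, Y, epochs=50, batch_size=8)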

    Human activity recognition for an intelligent knee orthosis

    Dissertation submitted for the degree of Master in Biomedical Engineering. Activity recognition with body-worn sensors is a large and growing field of research. In this thesis we evaluate the possibility of recognizing human activities based on data from biosignal sensors placed solely on or under an existing passive knee orthosis, which provides the information needed to integrate sensors into the orthosis in the future. The development of active orthotic knee devices will allow this population to ambulate in a more natural, efficient and less painful manner than they might with a traditional orthosis. The term 'active orthosis' thus refers to a device intended to increase the ambulatory ability of a person suffering from a knee pathology by applying corrective forces only when necessary, thereby making it usable over longer periods of time. The contribution of this work is the evaluation of the ability to recognize activities under these restrictions on sensor placement, as well as a proof of concept for the development of an activity recognition system for an intelligent orthosis. We use accelerometers and a goniometer placed on the orthosis and electromyography (EMG) sensors placed on the skin under the orthosis to measure motion and muscle activity, respectively. We segment the signals into motion primitives semi-automatically and apply hidden Markov models (HMMs) to classify the isolated motion primitives. We discriminate between seven activities, such as walking up stairs and ascending a hill. In a user study with six participants, we evaluate the system's performance for each biosignal modality alone as well as for their combinations. For the best performing combination, we reach an average person-dependent accuracy of 98% and a person-independent accuracy of 79%.
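
    A minimal sketch of the classification stage described above, assuming the segmented motion primitives are already available as per-frame feature arrays (accelerometer, goniometer and EMG channels combined) and using hmmlearn as a stand-in toolkit: one Gaussian HMM is trained per activity, and a new segment is assigned to the activity whose model gives the highest log-likelihood.

    import numpy as np
    from hmmlearn import hmm

    def train_activity_models(segments_per_activity, n_states=5):
        """segments_per_activity maps an activity name to a list of (T_i, D) feature arrays."""
        models = {}
        for activity, segments in segments_per_activity.items():
            X = np.vstack(segments)                  # concatenate all primitives
            lengths = [len(s) for s in segments]     # so the HMM knows segment boundaries
            m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
            m.fit(X, lengths)
            models[activity] = m
        return models

    def classify(models, segment):
        """Return the activity whose HMM scores this motion primitive highest."""
        return max(models, key=lambda activity: models[activity].score(segment))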

    Development of Speech Command Control Based TinyML System for Post-Stroke Dysarthria Therapy Device

    Post-stroke dysarthria (PSD) is a widespread outcome of stroke. To support the objective evaluation of dysarthria, the development of pathological voice recognition technology has received considerable attention. Soft robotic therapy devices have been proposed as an alternative for rehabilitation and hand-grasp assistance to improve activities of daily living (ADL). Despite significant progress in this field, most soft robotic therapy devices are complex and bulky, lack a pathological voice recognition model, demand large computational power, and rely on a stationary controller. This study aims to develop a portable, wireless multi-controller with simulated dysarthric vowel speech in Bahasa Indonesia and non-dysarthric micro speech recognition, using a tiny machine learning (TinyML) system for hardware efficiency. The speech interface uses an INMP441 microphone, and recognition is performed by a lightweight deep convolutional neural network (DCNN) embedded in an ESP-32. Features are extracted with the short-time Fourier transform (STFT) and fed into the CNN. This method proves useful for micro speech recognition with low computational power, reaching accuracies above 90% in both speech scenarios. Real-time inference on the ESP-32 with a hand prosthesis, under three levels of household noise intensity (24 dB, 42 dB, and 62 dB), yields accuracies of 95%, 85%, and 50%, respectively. Wireless connectivity between the two controllers operates at around 0.2-0.5 ms.
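
    A minimal sketch of the recognition pipeline outlined above, assuming TensorFlow/Keras on the training side: STFT magnitude features feed a small convolutional network, which would then be quantised and exported with TensorFlow Lite for the microcontroller. The sample rate, clip length, class count and layer sizes are illustrative, not taken from the paper.

    import tensorflow as tf

    N_CLASSES = 6                # hypothetical: five Indonesian vowels plus background noise
    INPUT_SHAPE = (99, 129, 1)   # frames x FFT bins for a 1 s clip at 16 kHz (illustrative)

    def spectrogram(waveform):
        """STFT magnitude features for a fixed-length audio clip (float32, 16000 samples)."""
        stft = tf.signal.stft(waveform, frame_length=256, frame_step=160)
        return tf.abs(stft)[..., tf.newaxis]

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=INPUT_SHAPE),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

    # After training, quantise and export for deployment on the microcontroller:
    #   converter = tf.lite.TFLiteConverter.from_keras_model(model)
    #   converter.optimizations = [tf.lite.Optimize.DEFAULT]
    #   open("micro_speech.tflite", "wb").write(converter.convert())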

    Advancing Electromyographic Continuous Speech Recognition: Signal Preprocessing and Modeling

    Speech is the natural medium of human communication, but audible speech can be overheard by bystanders and excludes speech-disabled people. This work presents a speech recognizer based on surface electromyography, where electric potentials of the facial muscles are captured by surface electrodes, allowing speech to be processed nonacoustically. A system which was state-of-the-art at the beginning of this book is substantially improved in terms of accuracy, flexibility, and robustness.

    Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis

    Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators. This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury. Most successful techniques so far adopt a supervised learning framework, in which time-synchronous articulatory and speech recordings are used to train a supervised machine learning algorithm that can later map articulator movements to speech. This, however, prevents the application of A2A techniques in cases where parallel data is unavailable, e.g., when a person has already lost her/his voice and only articulatory data can be captured. In this work, we propose a solution to this problem based on the theory of multi-view learning. The proposed algorithm attempts to find an optimal temporal alignment between pairs of non-aligned articulatory and acoustic sequences with the same phonetic content by projecting them into a common latent space where both views are maximally correlated and then applying dynamic time warping. Several variants of this idea are discussed and explored. We show that the quality of speech generated in the non-aligned scenario is comparable to that obtained in the parallel scenario. This work was funded by the Spanish State Research Agency (SRA) under the grant PID2019-108040RB-C22/SRA/10.13039/501100011033. Jose A. Gonzalez-Lopez holds a Juan de la Cierva-Incorporation Fellowship from the Spanish Ministry of Science, Innovation and Universities (IJCI-2017-32926).
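
    A minimal sketch of the alignment idea described above, using scikit-learn's CCA and the fastdtw package as stand-ins: both views are projected into a shared latent space where they are maximally correlated and then aligned with dynamic time warping. The paper's multi-view formulation is more elaborate; here both sequences are simply resampled to a common length to bootstrap the CCA fit.

    import numpy as np
    from sklearn.cross_decomposition import CCA
    from scipy.spatial.distance import euclidean
    from fastdtw import fastdtw

    def align_views(articulatory, acoustic, n_latent=10):
        """Return (articulatory_frame, acoustic_frame) index pairs aligning the two sequences."""
        # Rough initial pairing: resample both sequences to a common length so CCA can be fitted.
        T = min(len(articulatory), len(acoustic))
        idx_a = np.linspace(0, len(articulatory) - 1, T).astype(int)
        idx_b = np.linspace(0, len(acoustic) - 1, T).astype(int)

        cca = CCA(n_components=n_latent)
        cca.fit(articulatory[idx_a], acoustic[idx_b])

        # Project the full, unaligned sequences into the shared space and warp one onto the other.
        z_a, z_b = cca.transform(articulatory, acoustic)
        _, path = fastdtw(z_a, z_b, dist=euclidean)
        return path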