    Silent speech: restoring the power of speech to people whose larynx has been removed

    Every year, some 17,500 people in Europe and North America lose the power of speech after undergoing a laryngectomy, normally as a treatment for throat cancer. Several research groups have recently demonstrated that it is possible to restore speech to these people by using machine learning to learn the transformation from articulator movement to sound. In our project, articulator movement is captured by a technique developed by our collaborators at Hull University called Permanent Magnet Articulography (PMA), which senses the changes in magnetic field caused by movements of small magnets attached to the lips and tongue. This solution, however, requires synchronous PMA-and-audio recordings for learning the transformation and hence cannot be applied to people who have already lost their voice. Here we propose to investigate a variant of this technique in which the PMA data are used to drive an articulatory synthesiser, which generates speech acoustics by simulating the airflow through a computational model of the vocal tract. The goals, participants, current status, and achievements of the project are discussed below.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
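
    To illustrate the sensing principle, the standard point-dipole model gives the field a small magnet produces at a fixed sensor, so articulator movement shows up as a change in the measured field. A minimal Python sketch of this forward model follows; the geometry, magnet strength and sensor placement are illustrative assumptions, not the actual PMA hardware:

        import numpy as np

        MU0 = 4e-7 * np.pi  # vacuum permeability (T*m/A)

        def dipole_field(sensor_pos, magnet_pos, moment):
            """Flux density (tesla) of a point dipole: B = mu0/(4*pi) * (3(m.r^)r^ - m)/|r|^3."""
            r = sensor_pos - magnet_pos
            d = np.linalg.norm(r)
            r_hat = r / d
            return MU0 / (4 * np.pi) * (3 * np.dot(moment, r_hat) * r_hat - moment) / d**3

        # Hypothetical geometry (metres): one tongue magnet, one fixed sensor.
        moment = np.array([0.0, 0.0, 1e-2])   # magnetic moment (A*m^2), assumed value
        sensor = np.array([0.04, 0.0, 0.0])
        for x in (0.000, 0.005, 0.010):       # the magnet moves as the tongue moves
            b = dipole_field(sensor, np.array([x, 0.0, 0.02]), moment)
            print(f"magnet x = {x:.3f} m -> field at sensor = {b} T")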

    Integrating user-centred design in the development of a silent speech interface based on permanent magnetic articulography

    A new wearable silent speech interface (SSI) based on Permanent Magnetic Articulography (PMA) was developed with the involvement of end users in the design process. Hence, desirable features such as appearance, portability, ease of use and light weight were integrated into the prototype. The aim of this paper is to address the challenges faced and the design considerations addressed during development. Evaluations of both hardware and speech recognition performance are presented. The new prototype shows comparable performance to its predecessor in terms of speech recognition accuracy (i.e. ~95% word accuracy and ~75% sequence accuracy), but significantly improved appearance, portability and hardware features in terms of miniaturization and cost.

    A silent speech system based on permanent magnet articulography and direct synthesis

    In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, which is a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies.
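
    As a rough sketch of how a generative mixture model can drive the conversion, the snippet below fits a joint Gaussian mixture over stacked articulatory-and-acoustic vectors and maps new articulatory frames through the conditional mean E[y | x]. A mixture of factor analysers additionally constrains each component's covariance with low-rank factor loadings; the full-covariance GMM here is a simplified stand-in, and all dimensions and data are toy assumptions:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        # Toy stand-ins: X = articulatory (PMA) features, Y = acoustic features.
        rng = np.random.default_rng(0)
        N, dx, dy = 2000, 9, 25
        X = rng.standard_normal((N, dx))
        Y = X @ rng.standard_normal((dx, dy)) + 0.1 * rng.standard_normal((N, dy))

        # Fit a joint density over stacked [x; y] vectors.
        gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
        gmm.fit(np.hstack([X, Y]))

        def convert(x):
            """Map one articulatory frame to acoustics via the conditional mean E[y | x]."""
            K = gmm.n_components
            log_w = np.empty(K)
            cond = np.empty((K, dy))
            for k in range(K):
                mu_x, mu_y = gmm.means_[k, :dx], gmm.means_[k, dx:]
                Sxx = gmm.covariances_[k, :dx, :dx]
                Syx = gmm.covariances_[k, dx:, :dx]
                diff = x - mu_x
                sol = np.linalg.solve(Sxx, diff)
                cond[k] = mu_y + Syx @ sol          # component-wise conditional mean
                _, logdet = np.linalg.slogdet(Sxx)  # log N(x; mu_x, Sxx) up to constants
                log_w[k] = np.log(gmm.weights_[k]) - 0.5 * (diff @ sol + logdet)
            w = np.exp(log_w - log_w.max())
            return (w / w.sum()) @ cond             # responsibility-weighted mixture

        y_hat = convert(X[0])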

    Restoring Speech Following Total Removal of the Larynx

    By sensing speech articulator movement and training a transformation from movement to audio, we can restore the power of speech to someone who has lost their larynx. We sense changes in the magnetic field caused by movements of small magnets attached to the lips and tongue. The sensor transformation uses recurrent neural networks.
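
    A minimal sketch of such a recurrent mapping: a GRU regresses acoustic feature frames from PMA channel frames under a mean-squared-error loss. Layer sizes, feature dimensions and the toy data are assumptions, not the trained system:

        import torch
        import torch.nn as nn

        # Hypothetical dimensions: 9 PMA channels in, 25 acoustic features out.
        class PMA2Speech(nn.Module):
            def __init__(self, n_pma=9, n_acoustic=25, hidden=128):
                super().__init__()
                self.rnn = nn.GRU(n_pma, hidden, num_layers=2, batch_first=True)
                self.out = nn.Linear(hidden, n_acoustic)

            def forward(self, x):          # x: (batch, time, n_pma)
                h, _ = self.rnn(x)
                return self.out(h)         # (batch, time, n_acoustic)

        model = PMA2Speech()
        optim = torch.optim.Adam(model.parameters(), lr=1e-3)

        pma = torch.randn(4, 200, 9)       # toy stand-in for parallel PMA/audio data
        mel = torch.randn(4, 200, 25)
        for _ in range(10):                # minimal training loop
            optim.zero_grad()
            loss = nn.functional.mse_loss(model(pma), mel)
            loss.backward()
            optim.step()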

    Towards an Intraoral-Based Silent Speech Restoration System for Post-laryngectomy Voice Replacement

    Silent Speech Interfaces (SSIs) are alternative assistive speech technologies capable of restoring speech communication to individuals who have lost their voice due to laryngectomy or diseases affecting the vocal cords. However, many of these SSIs are still deemed impractical due to a high degree of intrusiveness and discomfort, which limits their transition out of the laboratory environment. We aim to address the hardware challenges faced in developing a practical SSI for post-laryngectomy speech rehabilitation. A new Permanent Magnet Articulography (PMA) system is presented which fits within the palatal cavity of the user’s mouth, giving it an unobtrusive appearance and high portability. The prototype comprises a miniaturized circuit constructed from commercial off-the-shelf (COTS) components and is implemented in the form of a dental retainer, which is mounted against the roof of the user’s mouth and firmly clasps onto the upper teeth. Preliminary evaluation via speech recognition experiments demonstrates that the intraoral prototype achieves reasonable word recognition accuracy, comparable to the external PMA version. Moreover, the intraoral design is expected to improve stability and robustness, with a much improved appearance, since it can be completely hidden inside the user’s mouth.

    Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis

    Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators. This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury. Most successful techniques so far adopt a supervised learning framework, in which time-synchronous articulatory-and-speech recordings are used to train a supervised machine learning algorithm that can later be used to map articulator movements to speech. This, however, prevents the application of A2A techniques in cases where parallel data are unavailable, e.g., when a person has already lost his or her voice and only articulatory data can be captured. In this work, we propose a solution to this problem based on the theory of multi-view learning. The proposed algorithm attempts to find an optimal temporal alignment between pairs of non-aligned articulatory-and-acoustic sequences with the same phonetic content by projecting them into a common latent space where both views are maximally correlated and then applying dynamic time warping. Several variants of this idea are discussed and explored. We show that the quality of speech generated in the non-aligned scenario is comparable to that obtained in the parallel scenario.
    This work was funded by the Spanish State Research Agency (SRA) under grant PID2019-108040RB-C22/SRA/10.13039/501100011033. Jose A. Gonzalez-Lopez holds a Juan de la Cierva-Incorporation Fellowship from the Spanish Ministry of Science, Innovation and Universities (IJCI-2017-32926).
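
    One way to realise this idea, in the spirit of canonical time warping, is to alternate between fitting a projection on the currently aligned frame pairs and re-aligning the projections with dynamic time warping. The sketch below uses scikit-learn's linear CCA and toy data for brevity; the paper explores richer, nonlinear variants:

        import numpy as np
        from sklearn.cross_decomposition import CCA

        def dtw_path(A, B):
            """Plain DTW over Euclidean frame distances; returns the optimal (i, j) path."""
            n, m = len(A), len(B)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    c = np.linalg.norm(A[i - 1] - B[j - 1])
                    D[i, j] = c + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
            path, (i, j) = [], (n, m)
            while i > 0 and j > 0:
                path.append((i - 1, j - 1))
                step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
                i, j = (i - 1, j - 1) if step == 0 else ((i - 1, j) if step == 1 else (i, j - 1))
            return path[::-1]

        # Toy non-parallel views with different lengths and dimensionalities.
        rng = np.random.default_rng(0)
        X = rng.standard_normal((120, 9))    # articulatory sequence
        Y = rng.standard_normal((150, 25))   # acoustic sequence

        # Start from a uniform alignment, then alternate CCA fitting and DTW re-alignment.
        pairs = list(zip(range(len(X)), np.linspace(0, len(Y) - 1, len(X)).astype(int)))
        for _ in range(5):
            xi = [i for i, _ in pairs]
            yj = [j for _, j in pairs]
            cca = CCA(n_components=4).fit(X[xi], Y[yj])
            Zx, Zy = cca.transform(X, Y)     # project both views into the shared space
            pairs = dtw_path(Zx, Zy)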

    Non-Parallel Articulatory-to-Acoustic Conversion Using Multiview-based Time Warping

    In this paper, we propose a novel algorithm called multiview temporal alignment by dependence maximisation in the latent space (TRANSIENCE) for the alignment of time series that differ both in length and in the dimensionality of their feature vectors. The proposed algorithm, which is based on the theory of multiview learning, can be seen as an extension of the well-known dynamic time warping (DTW) algorithm that, as mentioned, allows the sequences to have different dimensionalities. Our algorithm attempts to find an optimal temporal alignment between pairs of non-aligned sequences by first projecting their feature vectors into a common latent space where both views are maximally similar. To do this, powerful nonlinear deep neural network (DNN) models are employed. Then, the resulting sequences of embedding vectors are aligned using DTW. Finally, the alignment paths obtained in the previous step are applied to the original sequences to align them. In the paper, we explore several variants of the algorithm that mainly differ in the way the DNNs are trained. We evaluated the proposed algorithm on an articulatory-to-acoustic (A2A) synthesis task involving the generation of audible speech from motion data captured from the lips and tongue of healthy speakers using a technique known as permanent magnet articulography (PMA). In this task, our algorithm is applied during the training stage to align pairs of non-aligned speech and PMA recordings that are later used to train DNNs able to synthesise speech from PMA data. Our results show that the quality of speech generated in the non-aligned scenario is comparable to that obtained in the parallel scenario.
    This work was supported in part by the Spanish State Research Agency (SRA), grant number PID2019-108040RB-C22/SRA/10.13039/501100011033, and by the FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades, project no. B-SEJ-570-UGR20.
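
    The last step is mechanical: once a path has been computed on the embeddings (as in the sketch above), it is applied to the original sequences to build an artificially parallel corpus on which the synthesis DNN is trained. A toy sketch of that step, with a uniform stand-in path and made-up dimensions:

        import torch
        import torch.nn as nn

        # Hypothetical data; `wp` stands in for the DTW path computed on the latent
        # embeddings (here a uniform path so the snippet runs on its own).
        X = torch.randn(120, 9)      # PMA sequence
        Y = torch.randn(150, 25)     # acoustic sequence of a different length
        wp = [(i, min(round(i * 150 / 120), 149)) for i in range(120)]

        xi = torch.tensor([i for i, _ in wp])
        yj = torch.tensor([j for _, j in wp])
        X_par, Y_par = X[xi], Y[yj]  # artificially parallel, time-aligned corpus

        # Train the actual PMA-to-speech DNN on the aligned pairs.
        synth = nn.Sequential(nn.Linear(9, 128), nn.ReLU(), nn.Linear(128, 25))
        optim = torch.optim.Adam(synth.parameters(), lr=1e-3)
        for _ in range(20):
            optim.zero_grad()
            loss = nn.functional.mse_loss(synth(X_par), Y_par)
            loss.backward()
            optim.step()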