    Diphthong Synthesis Using the Dynamic 3D Digital Waveguide Mesh

    Articulatory speech synthesis has the potential to offer more natural-sounding synthetic speech than established concatenative or parametric synthesis methods. Time-domain acoustic models are particularly suited to the dynamic nature of the speech signal, and recent work has demonstrated the potential of dynamic vocal tract models that accurately reproduce the vocal tract geometry. This paper presents a dynamic 3D digital waveguide mesh (DWM) vocal tract model, capable of movement to produce diphthongs. The technique is compared with existing dynamic 2D and static 3D DWM models for both monophthongs and diphthongs. The results indicate that the proposed model provides improved formant accuracy over existing DWM vocal tract models. Furthermore, the computational requirements of the proposed method are significantly lower than those of comparable dynamic simulation techniques. This paper represents another step toward a fully functional articulatory vocal tract model, which will lead to more natural speech synthesis systems for use across society.
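
    The core numerical technique here, stripped of the dynamic geometry and boundary fitting that the paper contributes, is a neighbour-averaging update at each scattering junction. A minimal sketch in Python, assuming a homogeneous rectilinear 2-D mesh with pressure-zero boundaries (the paper's model is 3-D and fitted to MRI-derived tract shapes):

```python
import numpy as np

# Minimal homogeneous rectilinear 2-D digital waveguide mesh in its
# finite-difference form. Illustrates only the core DWM update; the
# paper's model is 3-D, dynamic, and boundary-fitted to MRI geometry.
NX, NY = 60, 20              # mesh dimensions (arbitrary for illustration)
STEPS = 500

p_prev = np.zeros((NX, NY))  # junction pressures at time step n-1
p_curr = np.zeros((NX, NY))  # junction pressures at time step n
p_curr[1, NY // 2] = 1.0     # impulse excitation at the "glottis" end

for _ in range(STEPS):
    # Each interior junction takes 2/N times the sum of its N neighbours
    # (N = 4 in 2-D, giving a factor of 0.5; N = 6, factor 1/3, in 3-D)
    # minus its own pressure one step earlier.
    p_next = np.zeros_like(p_curr)
    p_next[1:-1, 1:-1] = 0.5 * (
        p_curr[2:, 1:-1] + p_curr[:-2, 1:-1]
        + p_curr[1:-1, 2:] + p_curr[1:-1, :-2]
    ) - p_prev[1:-1, 1:-1]
    p_prev, p_curr = p_curr, p_next

# The synthesis output would be read each step near the "lips" end,
# e.g. p_curr[-2, NY // 2].
```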

    Real-time dynamic articulations in the 2-D waveguide mesh vocal tract model

    Time-domain articulatory vocal tract modeling in one dimension (1-D) is well established. Previous studies of two-dimensional (2-D) simulation of wave propagation in the vocal tract have shown that it enables accurate static vowel synthesis. However, little has been done to demonstrate how such a model might accommodate the dynamic tract shape changes necessary for modeling speech. Two methods of applying the area function to the 2-D digital waveguide mesh vocal tract model are presented here. First, a method based on mapping the cross-sectional area onto the number of waveguides across the mesh, termed the widthwise mapping approach, is detailed. Discontinuity problems associated with the dynamic manipulation of the model are highlighted. Second, a new method is examined that uses a static rectangular mesh with the area function translated into an impedance map, which is then applied to each waveguide. Two approaches for constructing such a map are demonstrated: one using a linear impedance increase to model a constriction of the tract, and another using a raised cosine function. The cosine method is recommended, as it allows for a wider central propagation channel. It is also shown that this impedance mapping approach allows for stable dynamic shape changes and permits a reduction in sampling frequency, leading to real-time interaction with the model.
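
    To make the second method concrete, the sketch below builds a raised-cosine impedance map from an area function: low impedance in a central channel whose width follows the area function, rising smoothly to a high "wall" impedance. All names, dimensions and scalings are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

# Hedged sketch: translate a 1-D area function into a 2-D impedance map
# using a raised-cosine cross-section. Parameter values are arbitrary.
NX, NY = 60, 20                    # columns (glottis -> lips) and rows
Z_MIN, Z_MAX = 1.0, 50.0           # open-channel and "wall" impedances

area = np.linspace(2.0, 5.0, NX)   # toy area function (cm^2)
openness = area / area.max()       # relative channel half-width, 0..1

y = np.linspace(-1.0, 1.0, NY)     # normalised cross-mesh coordinate
impedance = np.empty((NX, NY))
for i in range(NX):
    # Raised cosine from Z_MIN at the channel centre to Z_MAX at the
    # walls; a smaller area gives a narrower low-impedance channel.
    half_width = max(openness[i], 1e-3)
    t = np.clip(np.abs(y) / half_width, 0.0, 1.0)
    impedance[i] = Z_MIN + (Z_MAX - Z_MIN) * 0.5 * (1.0 - np.cos(np.pi * t))
```

    In the DWM, the reciprocal admittances of such a map weight the scattering at each junction, so updating the map frame by frame changes the effective tract shape without altering the mesh topology, which is what allows the stable dynamic manipulation reported above.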

    Silent speech: restoring the power of speech to people whose larynx has been removed

    Every year, some 17,500 people in Europe and North America lose the power of speech after undergoing a laryngectomy, normally as a treatment for throat cancer. Several research groups have recently demonstrated that it is possible to restore speech to these people by using machine learning to learn the transformation from articulator movement to sound. In our project, articulator movement is captured by Permanent Magnet Articulography (PMA), a technique developed by our collaborators at Hull University, which senses the changes in magnetic field caused by movements of small magnets attached to the lips and tongue. This solution, however, requires synchronous PMA-and-audio recordings for learning the transformation and hence cannot be applied to people who have already lost their voice. Here we propose to investigate a variant of this technique in which the PMA data are used to drive an articulatory synthesiser, which generates speech acoustics by simulating the airflow through a computational model of the vocal tract. The goals, participants, current status, and achievements of the project are discussed below. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
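
    To illustrate the sensing principle only: each attached magnet is, to a first approximation, a point magnetic dipole, and a PMA sensor reads the superposed dipole fields, which change as the articulators move. A toy sketch in which all positions, moments and units are hypothetical rather than the Hull system's parameters:

```python
import numpy as np

# Toy illustration of the PMA sensing principle: point-dipole fields
# summed at a sensor. All geometry and magnet strengths are made up.
MU0 = 4e-7 * np.pi  # vacuum permeability (T*m/A)

def dipole_field(sensor_pos, magnet_pos, moment):
    """Magnetic field (tesla) of a point dipole, measured at sensor_pos."""
    r = sensor_pos - magnet_pos
    r_norm = np.linalg.norm(r)
    r_hat = r / r_norm
    return MU0 / (4.0 * np.pi * r_norm**3) * (
        3.0 * np.dot(moment, r_hat) * r_hat - moment
    )

# As the lips and tongue (magnet positions) move, the summed field at
# each sensor changes; those changes are the articulatory signal.
sensor = np.array([0.0, 0.03, 0.0])                      # 3 cm from origin
magnets = [np.array([0.01, 0.0, 0.0]), np.array([0.0, 0.01, 0.02])]
moments = [np.array([0.0, 0.0, 0.05]), np.array([0.05, 0.0, 0.0])]
b_total = sum(dipole_field(sensor, p, m) for p, m in zip(magnets, moments))
```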

    Acoustic modeling using the digital waveguide mesh

    The digital waveguide mesh has been an active area of music acoustics research for over ten years. Although founded on 1-D digital waveguide modeling, the principles on which it is based are not new to researchers grounded in numerical simulation, FDTD methods, electromagnetic simulation, and related fields. This article has attempted to provide a substantial review of how the DWM has been applied to acoustic modeling and sound synthesis problems, including new 2-D object synthesis and an overview of recent research activities in articulatory vocal tract modeling, RIR synthesis, and reverberation simulation. The extensive, although by no means exhaustive, list of references indicates that though the DWM may have parallels in other disciplines, it still offers something new in the field of acoustic simulation and sound synthesis.

    Diphthong Synthesis using the Three-Dimensional Dynamic Digital Waveguide Mesh

    The human voice is a complex and nuanced instrument, and despite many years of research, no system is yet capable of producing natural-sounding synthetic speech. This affects intelligibility for some groups of listeners, in applications such as automated announcements and screen readers. Furthermore, those who require a computer to speak, due to surgery or a degenerative disease, are limited to unnatural-sounding voices that lack expressive control and may not match the user's gender, age or accent. It is evident that natural, personalised and controllable synthetic speech systems are required. A three-dimensional digital waveguide model of the vocal tract, based on magnetic resonance imaging data, is proposed here in order to address these issues. The model uses a heterogeneous digital waveguide mesh method to represent the vocal tract airway and surrounding tissues, facilitating dynamic movement and hence speech output. The accuracy of the method is validated by comparison with audio recordings of natural speech, and perceptual tests are performed which confirm that the proposed model sounds significantly more natural than simpler digital waveguide mesh vocal tract models. Control of such a model is also considered, and a proof-of-concept study is presented using a deep neural network to control the parameters of a two-dimensional vocal tract model, resulting in intelligible speech output and paving the way for extension of the control system to the proposed three-dimensional vocal tract model. Future improvements to the system are also discussed in detail. This project considers both the naturalness and control issues associated with synthetic speech and therefore represents a significant step towards improved synthetic speech for use across society.
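
    As a sketch of the control approach mentioned above, a feed-forward network can map per-frame input features to tract model parameters such as a sampled area function. The architecture and dimensions below are assumptions for illustration, not the network described in this work:

```python
import torch
import torch.nn as nn

# Generic DNN controller sketch: one frame of input features in,
# one frame of vocal tract parameters out. Sizes are assumed.
N_FEATURES = 60   # input features per frame (assumption)
N_SECTIONS = 44   # tract area-function sections (assumption)

controller = nn.Sequential(
    nn.Linear(N_FEATURES, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, N_SECTIONS),
    nn.Softplus(),  # keep predicted cross-sectional areas positive
)

frame = torch.randn(1, N_FEATURES)  # stand-in input frame
areas = controller(frame)           # would drive the DWM tract shape
```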

    Modeling Voiced Stop Consonants Using the 3D Dynamic Digital Waveguide Mesh Vocal Tract Model

    Silent Speech Interfaces for Speech Restoration: A Review

    This work was supported in part by the Agencia Estatal de Investigacion (AEI) under Grant PID2019-108040RB-C22/AEI/10.13039/501100011033. The work of Jose A. Gonzalez-Lopez was supported in part by the Spanish Ministry of Science, Innovation and Universities under a Juan de la Cierva-Incorporation Fellowship (IJCI-2017-32926). This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present the latest SSI research aimed at providing new alternative and augmentative communication methods for persons with severe speech disorders. SSIs can employ a variety of biosignals to enable silent communication, such as electrophysiological recordings of neural activity, electromyographic (EMG) recordings of vocal tract movements, or the direct tracking of articulator movements using imaging techniques. Depending on the disorder, some sensing techniques may be better suited than others to capture speech-related information. For instance, EMG and imaging techniques are well suited for laryngectomised patients, whose vocal tract remains almost intact but who are unable to speak after the removal of the vocal folds, yet they fail for severely paralysed individuals. From the biosignals, SSIs decode the intended message using automatic speech recognition or speech synthesis algorithms. Despite considerable advances in recent years, most present-day SSIs have only been validated in laboratory settings with healthy users. Thus, as discussed in this paper, a number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications. If these issues can be addressed successfully, future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.
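
    The decoding step can be illustrated with a deliberately simplified "direct synthesis" pipeline: frame the biosignal, extract features, and regress to acoustic parameters that a vocoder would render as audio. Every shape, feature and mapping below is an illustrative stand-in, not a system from the review:

```python
import numpy as np

# Simplified direct-synthesis SSI sketch: biosignal frames -> features
# -> acoustic parameters. The "model" is a random matrix standing in
# for a regressor that would be trained on parallel recordings.
FRAME, HOP = 200, 80    # samples per frame and hop (assumptions)
N_ACOUSTIC = 40         # e.g. mel-spectrogram bins (assumption)

def frame_features(signal):
    """Slice a 1-D biosignal into frames; use RMS and mean as toy features."""
    starts = range(0, len(signal) - FRAME + 1, HOP)
    frames = np.stack([signal[s:s + FRAME] for s in starts])
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return np.stack([rms, frames.mean(axis=1)], axis=1)

rng = np.random.default_rng(0)
emg = rng.standard_normal(2000)            # stand-in for one sensor channel
feats = frame_features(emg)                # (n_frames, 2)
W = rng.standard_normal((2, N_ACOUSTIC))   # stands in for a trained model
acoustic = feats @ W                       # acoustic parameters per frame
# A vocoder or articulatory synthesiser would turn `acoustic` into audio.
```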

    Pan European Voice Conference - PEVOC 11

    The Pan European VOice Conference (PEVOC) was founded in 1995, and in 2015 it therefore celebrates the 20th anniversary of its establishment: an important milestone that clearly expresses the strength and interest of the scientific community in the topics of this conference. The most significant themes of PEVOC are singing pedagogy and art, but also occupational voice disorders, neurology, rehabilitation, and image and video analysis. PEVOC takes place in a different European city every two years (www.pevoc.org). The PEVOC 11 conference includes a symposium of the Collegium Medicorum Theatri (www.comet-collegium.com).