2,218 research outputs found

    Speaker adaptation of an acoustic-to-articulatory inversion model using cascaded Gaussian mixture regressions

    No full text
    International audienceThe article presents a method for adapting a GMM-based acoustic-articulatory inversion model trained on a reference speaker to another speaker. The goal is to estimate the articulatory trajectories in the geometrical space of a reference speaker from the speech audio signal of another speaker. This method is developed in the context of a system of visual biofeedback, aimed at pronunciation training. This system provides a speaker with visual information about his/her own articulation, via a 3D orofacial clone. In previous work, we proposed to use GMM-based voice conversion for speaker adaptation. Acoustic-articulatory mapping was achieved in 2 consecutive steps: 1) converting the spectral trajectories of the target speaker (i.e. the system user) into spectral trajectories of the reference speaker (voice conversion), and 2) estimating the most likely articulatory trajectories of the reference speaker from the converted spectral features (acoustic-articulatory inversion). In this work, we propose to combine these two steps into the same statistical mapping framework, by fusing multiple regressions based on trajectory GMM and maximum likelihood criterion (MLE). The proposed technique is compared to two standard speaker adaptation techniques based respectively on MAP and MLLR

    A silent speech system based on permanent magnet articulography and direct synthesis

    Get PDF
    In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, which is a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies

    Emotion Recognition via Continuous Mandarin Speech

    Get PDF

    Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector

    Full text link
    The use of deep neural networks (DNN) has dramatically elevated the performance of automatic speaker verification (ASV) over the last decade. However, ASV systems can be easily neutralized by spoofing attacks. Therefore, the Spoofing-Aware Speaker Verification (SASV) challenge is designed and held to promote development of systems that can perform ASV considering spoofing attacks by integrating ASV and spoofing countermeasure (CM) systems. In this paper, we propose two back-end systems: multi-layer perceptron score fusion model (MSFM) and integrated embedding projector (IEP). The MSFM, score fusion back-end system, derived SASV score utilizing ASV and CM scores and embeddings. On the other hand,IEP combines ASV and CM embeddings into SASV embedding and calculates final SASV score based on the cosine similarity. We effectively integrated ASV and CM systems through proposed MSFM and IEP and achieved the SASV equal error rates 0.56%, 1.32% on the official evaluation trials of the SASV 2022 challenge.Comment: 5 pages, 4 figures, 5 tables, accepted to 2022 Interspeech as a conference pape
    • …
    corecore