2,218 research outputs found

Speaker adaptation of an acoustic-to-articulatory inversion model using cascaded Gaussian mixture regressions

Authors: Badin Pierre, Bailly Gérard, Elisei Frédéric, Hueber Thomas
Publication venue: HAL CCSD
Publication date: 26/08/2013

The article presents a method for adapting a GMM-based acoustic-articulatory inversion model trained on a reference speaker to another speaker. The goal is to estimate the articulatory trajectories in the geometrical space of a reference speaker from the speech audio signal of another speaker. This method is developed in the context of a visual biofeedback system aimed at pronunciation training. This system provides a speaker with visual information about his/her own articulation, via a 3D orofacial clone. In previous work, we proposed to use GMM-based voice conversion for speaker adaptation. Acoustic-articulatory mapping was achieved in two consecutive steps: 1) converting the spectral trajectories of the target speaker (i.e. the system user) into spectral trajectories of the reference speaker (voice conversion), and 2) estimating the most likely articulatory trajectories of the reference speaker from the converted spectral features (acoustic-articulatory inversion). In this work, we propose to combine these two steps into the same statistical mapping framework, by fusing multiple regressions based on trajectory GMM and the maximum likelihood criterion (MLE). The proposed technique is compared to two standard speaker adaptation techniques based respectively on MAP and MLLR.
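
As a rough illustration of the kind of regression this abstract builds on, the sketch below implements plain frame-wise Gaussian mixture regression for a scalar input and a scalar output. It is not the authors' method: the abstract describes trajectory GMMs with a maximum likelihood criterion and cascaded regressions, whereas this simplified stand-in only computes the conditional mean E[y | x] under a joint GMM. The function name `gmr_predict` and the tuple layout of `components` are assumptions made for the example.

```python
import math


def gmr_predict(x, components):
    """Frame-wise Gaussian mixture regression for scalar x and y.

    components: list of (weight, mean_x, mean_y, var_x, cov_xy) tuples
    describing a joint GMM over (x, y). This layout is hypothetical.
    Returns the conditional mean E[y | x].
    """
    log_resp = []    # unnormalised log posterior of each component given x
    cond_means = []  # per-component conditional mean of y given x
    for w, mu_x, mu_y, var_x, cov_xy in components:
        diff = x - mu_x
        # log( w * N(x; mu_x, var_x) )
        log_resp.append(
            math.log(w)
            - 0.5 * (diff * diff / var_x + math.log(2 * math.pi * var_x))
        )
        # Linear regression inside the component
        cond_means.append(mu_y + cov_xy / var_x * diff)
    # Normalise the posteriors in a numerically stable way
    top = max(log_resp)
    resp = [math.exp(value - top) for value in log_resp]
    total = sum(resp)
    return sum(r / total * m for r, m in zip(resp, cond_means))


# One component with unit variance and covariance 0.5: E[y | x] = 0.5 * x
single = [(1.0, 0.0, 0.0, 1.0, 0.5)]
print(gmr_predict(2.0, single))
```

In the two-step scheme described above, one such regression would map target-speaker spectra to reference-speaker spectra and a second would map those to articulatory features; how the paper fuses the two is not specified in the abstract and is not attempted here.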

Hal - Université Grenoble Alpes

A silent speech system based on permanent magnet articulography and direct synthesis

Authors: Bai Jie, Cheah Lam A., Ell Stephen R., Gilbert James M., Gonzalez Jose A., Green Phil D., Moore Roger K.
Publication venue: Elsevier BV
Publication date: 14/03/2016

In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies.

Repository@Hull - Worktribe

Emotion Recognition via Continuous Mandarin Speech

Authors: Jun-Heng Yeh, Tsang-Long Pao, Yu-Te Chen
Publication venue: IntechOpen
Publication date: 01/10/2008

IntechOpen

Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector

Authors: Heo Jungwoo, Kim Ju-ho, Shin Hyun-seo
Publication venue: International Speech Communication Association
Publication date: 28/06/2022

The use of deep neural networks (DNN) has dramatically elevated the performance of automatic speaker verification (ASV) over the last decade. However, ASV systems can be easily neutralized by spoofing attacks. Therefore, the Spoofing-Aware Speaker Verification (SASV) challenge was designed and held to promote the development of systems that perform ASV while accounting for spoofing attacks, by integrating ASV and spoofing countermeasure (CM) systems. In this paper, we propose two back-end systems: a multi-layer perceptron score fusion model (MSFM) and an integrated embedding projector (IEP). The MSFM, a score fusion back-end system, derives the SASV score from ASV and CM scores and embeddings. The IEP, on the other hand, combines ASV and CM embeddings into an SASV embedding and calculates the final SASV score based on cosine similarity. We effectively integrated ASV and CM systems through the proposed MSFM and IEP and achieved SASV equal error rates of 0.56% and 1.32% on the official evaluation trials of the SASV 2022 challenge. Comment: 5 pages, 4 figures, 5 tables, accepted to 2022 Interspeech as a conference paper.
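
The cosine-similarity scoring step mentioned for the IEP can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not say how the projector combines the ASV and CM embeddings, so plain concatenation is used here as a placeholder for the learned projection, and the names `combine` and `sasv_score` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def combine(asv_emb, cm_emb):
    """Placeholder for the learned projector: plain concatenation.

    Assumption for illustration only; the paper's IEP is a trained model.
    """
    return list(asv_emb) + list(cm_emb)


def sasv_score(enrol_asv, enrol_cm, test_asv, test_cm):
    """Score a trial by cosine similarity of the combined embeddings."""
    enrol = combine(enrol_asv, enrol_cm)
    test = combine(test_asv, test_cm)
    return cosine_similarity(enrol, test)


# Toy trial with made-up two-dimensional embeddings
print(sasv_score([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]))
```

A higher score would indicate that the test utterance is closer to the enrolment in the combined embedding space; any decision threshold would have to be tuned on development trials.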

arXiv.org e-Print Archive

Low-resource automatic speech recognition and error analyses of oral cancer speech

Authors: Feng S., Halpern B.M., Scharenborg O., van den Brekel M., van Son R.
Publication venue: Elsevier BV
Publication date: 01/06/2022

International Migration, Integration and Social Cohesion online publications