Search CORE

500 research outputs found

Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation

Author: Diard Julien
Georges Marc-Antoine
Girin Laurent
Hueber Thomas
Schwartz Jean-Luc
Publication venue
Publication date: 05/04/2022
Field of study

We propose a computational model of speech production combining a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters, a DNN-based internal forward model predicting the sensory consequences of articulatory commands, and an internal inverse model based on a recurrent neural network recovering articulatory commands from the acoustic speech input. Both forward and inverse models are jointly trained in a self-supervised way from raw acoustic-only speech data from different speakers. The imitation simulations are evaluated objectively and subjectively and display quite encouraging performances

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

HAL Descartes

HAL Université de Savoie

Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds

Author: Deleforge Antoine
Forbes Florence
Horaud Radu
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 20/03/2014
Field of study

In this paper we address the problems of modeling the acoustic space generated by a full-spectrum sound source and of using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A non-linear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound source directions. We propose a probabilistic piecewise affine mapping model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound source direction. We extend this solution to deal with missing data and redundancy in real world spectrograms, and hence for 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and we propose a variational EM framework. The associated algorithm, referred to as variational EM for source separation and localization (VESSL) yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods.Comment: 19 pages, 9 figures, 3 table

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Detecting autism, emotions and social signals using AdaBoost

Author: Busa-Fekete Róbert
Gosztolya Gábor
Tóth László
Publication venue: Interspeech
Publication date: 01/01/2013
Field of study

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Sound pressure distribution in a long, narrow hallway: Measurements versus results from a computer model with scattering from surface roughness and diffraction

Author: Christensen Claus Lynge
Rathsam Jonathan
Rindel Jens Holger
Wang Lily M.
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2005
Field of study

Online Research Database In Technology

Articulatory features for conversational speech recognition

Author: Metze Florian
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2005
Field of study

KITopen

EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

Author: Janke Matthias
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2016
Field of study

The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals and directly synthesizes an audible speech output: EMG-to-speech

KITopen

Articulatory Modeling Based on Semi-polar Coordinates and Guided PCA Technique

Author: Busset Julie
Cai Jun
Hirsch Fabrice
Laprie Yves
Publication venue: HAL CCSD
Publication date: 07/09/2009
Field of study

International audienceResearch on 2-dimensional static articulatory modeling has been performed by using the semi-polar system and the guided PCA analysis of lateral X-ray images of vocal tract. The density of the grid lines in the semi-polar system has been increased to have a better descriptive precision. New parameters have been introduced to describe the movements of tongue apex. An extra feature, the tongue root, has been extracted as one of the elementary factors in order to improve the precision of tongue model. New methods still remain to be developed for describing the movements of tongue apex

INRIA a CCSD electronic archive server