27 research outputs found

Predicting Tongue Positions from Acoustics and Facial Features

Author: Ouni Slim
Toutios Asterios
Publication venue: HAL CCSD
Publication date: 28/08/2011

We test the hypothesis that adding information regarding the positions of electromagnetic articulograph (EMA) sensors on the lips and jaw can improve the results of a typical acoustic-to-EMA mapping system, based on support vector regression, that targets the tongue sensors. Our initial motivation is to use such a system in the context of adding a tongue animation to a talking head built on the basis of concatenating bimodal acoustic-visual units. For completeness, we also train a system that maps only jaw and lip information to tongue information.

INRIA a CCSD electronic archive server

HAL-Rennes 1

Protocol for a Model-based Evaluation of a Dynamic Acoustic-to-Articulatory Inversion Method using Electromagnetic Articulography

Author: Laprie Yves
Ouni Slim
Toutios Asterios
Publication venue: HAL CCSD
Publication date: 01/01/2008

Acoustic-to-articulatory maps based on articulatory models have typically been evaluated in terms of acoustic accuracy, that is, the distance between mapped and observed acoustic parameters. In this paper we present a method that would allow for the evaluation of such maps in the articulatory domain. The proposed method estimates the parameters of Maeda's articulatory model on the basis of electromagnetic articulograph data, thus producing full midsagittal views of the vocal tract from the positions of a limited number of sensors attached to the articulators.

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1

Variation in compensatory strategies as a function of target constriction degree in post-glossectomy speech

Author: Goldstein Louis
Hagedorn Christina
Lu Yijing
Narayanan Shrikanth
Sinha Uttam
Toutios Asterios
Publication venue: CUNY Academic Works
Publication date: 22/04/2022

Individuals who have undergone treatment for oral cancer often exhibit compensatory behavior in consonant production. This pilot study investigates whether the compensatory mechanisms used to produce speech sounds with a given target constriction location vary systematically depending on target manner of articulation. The data reveal that compensatory strategies used to produce target alveolar segments vary systematically as a function of target manner of articulation in subtle yet meaningful ways. When target constriction degree at a particular constriction location cannot be preserved, individuals may leverage their ability to finely modulate constriction degree at multiple constriction locations along the vocal tract.

City University of New York

PubMed Central

Setup for Acoustic-Visual Speech Synthesis by Concatenating Bimodal Units

Author: Berger Marie-Odile
Colotte Vincent
Musti Utpala
Ouni Slim
Toutios Asterios
Wrobel-Dautcourt Brigitte
Publication venue: HAL CCSD
Publication date: 01/01/2010

This paper presents preliminary work on building a system able to synthesize concurrently the speech signal and a 3D animation of the speaker's face. This is done by concatenating bimodal diphone units, that is, units that comprise both acoustic and visual information. The latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on classic target and join costs from acoustic-only synthesis, which are augmented with a visual join cost. Preliminary results indicate the benefits of the approach, since both the synthesized speech signal and the face animation are of good quality. Planned improvements and enhancements to the system are outlined.

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1

HMM-based Automatic Visual Speech Segmentation Using Facial Data

Author: Berger Marie-Odile
Colotte Vincent
Musti Utpala
Ouni Slim
Toutios Asterios
Wrobel-Dautcourt Brigitte
Publication venue: HAL CCSD
Publication date: 01/01/2010

We describe automatic visual speech segmentation using facial data captured by a stereovision technique. The segmentation is performed using an HMM-based forced alignment mechanism widely used in automatic speech recognition. The idea is based on the assumption that using visual speech data alone for training might capture the uniqueness of the facial component of speech articulation, asynchrony (time lags) between visual and acoustic speech segments, and significant coarticulation effects. This should provide valuable information that helps to show the extent to which a phoneme may affect surrounding phonemes visually. It should also provide information valuable in labeling the visual speech segments based on dominant coarticulatory contexts.

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1

Towards a True Acoustic-Visual Speech Synthesis

Author: Berger Marie-Odile
Colotte Vincent
Musti Utpala
Ouni Slim
Toutios Asterios
Wrobel-Dautcourt Brigitte
Publication venue: HAL CCSD
Publication date: 30/09/2010

This paper presents an initial bimodal acoustic-visual synthesis system able to generate concurrently the speech signal and a 3D animation of the speaker's face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. The latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on classic target and join costs from acoustic-only synthesis, which are augmented with a visual join cost. Preliminary results indicate the benefits of this approach, since both the synthesized speech signal and the face animation are of good quality.

INRIA a CCSD electronic archive server

HAL-Rennes 1

A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images

Author: Bliesener Yannick
Byrd Dani
Chen Weiyi
Godinez Bianca
Goldstein Louis
Harper Sarah
Lee Yoonjeong
Lim Yongwan
Lingala Sajan Goud
Montesserin Mairym Lloréns
Narayanan Shrikanth S.
Nayak Krishna S.
Oh Miran
Smith Caitlin
Sorensen Tanner
Tian Ye
Toutios Asterios
Töger Johannes
Vaz Colin
Publication venue: Springer Science and Business Media LLC
Publication date: 15/02/2021

Real-time magnetic resonance imaging (RT-MRI) of human speech production is enabling significant advances in speech science, linguistics, bio-inspired speech technology development, and clinical applications. Easy access to RT-MRI is, however, limited, and comprehensive datasets with broad access are needed to catalyze research across numerous domains. The imaging of the rapidly moving articulators and dynamic airway shaping during speech demands high spatio-temporal resolution and robust reconstruction methods. Further, while reconstructed images have been published, to date there is no open dataset providing raw multi-coil RT-MRI data from an optimized speech production experimental setup. Such datasets could enable new and improved methods for dynamic image reconstruction, artifact correction, feature extraction, and direct extraction of linguistically relevant biomarkers. The present dataset offers a unique corpus of 2D sagittal-view RT-MRI videos along with synchronized audio for 75 subjects performing linguistically motivated speech tasks, alongside the corresponding first-ever public domain raw RT-MRI data. The dataset also includes 3D volumetric vocal tract MRI during sustained speech sounds and high-resolution static anatomical T2-weighted upper airway MRI for each subject.

arXiv.org e-Print Archive

Directory of Open Access Journals

Voice and speech processing and recognition: on the use of stochastic methods for the extraction of phonetic sub-phonetic features from the speech signal

Author: Toutios Asterios
Τούτιος Αστέριος
Publication venue: National Documentation Centre (EKT)
Publication date: 01/01/2006

Hellenic National Archive of Doctoral Dissertations

Learning Articulation from Cepstral Coefficients

Author: Asterios Toutios
Konstantinos Margaritis

We work on a special case of the speech inversion problem, namely the mapping from Mel Frequency Cepstral Coefficients onto articulatory trajectories derived from EMA. We employ Support Vector Regression, and use PCA and ICA as means to account for the spatial structure of the problem. Our results are comparable to those achieved by older attempts on the same task, probably indicating some natural limitation of the mapping itself.

CiteSeerX