Predicting Tongue Positions from Acoustics and Facial Features
We test the hypothesis that adding information regarding the positions of electromagnetic articulograph (EMA) sensors on the lips and jaw can improve the results of a typical acoustic-to-EMA mapping system, based on support vector regression, that targets the tongue sensors. Our initial motivation is to use such a system in the context of adding a tongue animation to a talking head built on the basis of concatenating bimodal acoustic-visual units. For completeness, we also train a system that maps only jaw and lip information to tongue information.
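As a rough illustration of the kind of mapping involved, here is a minimal support-vector-regression sketch using scikit-learn. The feature dimensions and synthetic data below are placeholders, not the study's actual acoustic features or EMA channels.

```python
# Sketch of an acoustic-to-EMA mapping with support vector regression.
# Shapes and data are illustrative stand-ins: 13 acoustic features per
# frame (e.g. MFCC-like) mapped to 4 tongue-sensor coordinates.
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X_acoustic = rng.normal(size=(200, 13))  # 200 frames x 13 acoustic features
# Synthetic target: 2 tongue sensors x (x, y) coordinates per frame
Y_tongue = X_acoustic @ rng.normal(size=(13, 4)) + 0.05 * rng.normal(size=(200, 4))

# One RBF-kernel SVR per output coordinate
model = MultiOutputRegressor(SVR(kernel="rbf", C=1.0))
model.fit(X_acoustic[:150], Y_tongue[:150])

pred = model.predict(X_acoustic[150:])
rmse = float(np.sqrt(np.mean((pred - Y_tongue[150:]) ** 2)))
```

In the study's setting, the input vector would additionally be augmented with the lip and jaw sensor coordinates to test the stated hypothesis.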
Protocol for a Model-based Evaluation of a Dynamic Acoustic-to-Articulatory Inversion Method using Electromagnetic Articulography
Acoustic-to-articulatory maps based on articulatory models have typically been evaluated in terms of acoustic accuracy, that is, the distance between mapped and observed acoustic parameters. In this paper we present a method that would allow for the evaluation of such maps in the articulatory domain. The proposed method estimates the parameters of Maeda's articulatory model on the basis of electromagnetic articulograph data, thus producing full midsagittal views of the vocal tract from the positions of a limited number of sensors attached to the articulators.
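A toy version of the underlying estimation problem: if sensor coordinates are approximately linear in the model parameters (as in a PCA-based articulatory model such as Maeda's), fitting parameters to observed EMA positions reduces to least squares. The matrices below are synthetic stand-ins, not the actual model.

```python
# Toy sketch: estimate articulatory-model parameters from EMA sensor
# positions by linear least squares. A and b stand in for a linearized
# articulatory model (coords = A @ params + b); values are synthetic.
import numpy as np

rng = np.random.default_rng(4)
n_params, n_coords = 7, 8          # 7 Maeda-style params; 4 sensors x (x, y)
A = rng.normal(size=(n_coords, n_params))
b = rng.normal(size=n_coords)

true_params = rng.normal(size=n_params)
sensors = A @ true_params + b      # noiseless "observed" EMA positions

# Recover the parameters that best explain the sensor positions
est, *_ = np.linalg.lstsq(A, sensors - b, rcond=None)
err = float(np.max(np.abs(est - true_params)))
```

With noiseless data and more coordinates than parameters, the recovery is exact; real EMA data would add noise and a nonlinear model, requiring an iterative fit.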
Variation in compensatory strategies as a function of target constriction degree in post-glossectomy speech
Individuals who have undergone treatment for oral cancer often exhibit compensatory behavior in consonant production. This pilot study investigates whether compensatory mechanisms utilized in the production of speech sounds with a given target constriction location vary systematically depending on target manner of articulation. The data reveal that compensatory strategies used to produce target alveolar segments vary systematically as a function of target manner of articulation in subtle yet meaningful ways. When target constriction degree at a particular constriction location cannot be preserved, individuals may leverage their ability to finely modulate constriction degree at multiple constriction locations along the vocal tract.
Setup for Acoustic-Visual Speech Synthesis by Concatenating Bimodal Units
This paper presents preliminary work on building a system able to synthesize concurrently the speech signal and a 3D animation of the speaker's face. This is done by concatenating bimodal diphone units, that is, units that comprise both acoustic and visual information. The latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on classic target and join costs from acoustic-only synthesis, which are augmented with a visual join cost. Preliminary results indicate the benefits of the approach, since both the synthesized speech signal and the face animation are of good quality. Planned improvements and enhancements to the system are outlined.
HMM-based Automatic Visual Speech Segmentation Using Facial Data
We describe automatic visual speech segmentation using facial data captured by a stereo-vision technique. The segmentation is performed using an HMM-based forced alignment mechanism widely used in automatic speech recognition. The idea is based on the assumption that training on visual speech data alone might capture the uniqueness of the facial component of speech articulation, the asynchrony (time lags) between visual and acoustic speech segments, and significant coarticulation effects. This should provide valuable information about the extent to which a phoneme may affect surrounding phonemes visually, which in turn helps in labeling visual speech segments according to dominant coarticulatory contexts.
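The forced-alignment idea can be sketched as a small dynamic program: given per-frame log-likelihoods under a fixed left-to-right state (phone) sequence, find the best monotonic frame-to-state assignment. This toy version omits transition probabilities and trained emission models, which real systems (e.g. HTK-style HMM toolkits) would include.

```python
# Toy sketch of HMM-style forced alignment via dynamic programming.
# loglik[t, s] is the log-likelihood of frame t under state s; states
# must be visited left to right, from the first to the last.
import numpy as np

def force_align(loglik):
    """Return one state index per frame, non-decreasing, starting at
    state 0 and ending at the last state."""
    T, S = loglik.shape
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0, 0] = loglik[0, 0]
    for t in range(1, T):
        for s in range(S):
            stay = dp[t - 1, s]                          # remain in state s
            move = dp[t - 1, s - 1] if s > 0 else -np.inf  # advance from s-1
            if stay >= move:
                dp[t, s], back[t, s] = stay + loglik[t, s], s
            else:
                dp[t, s], back[t, s] = move + loglik[t, s], s - 1
    # Backtrace from the final state
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]
```

The state boundaries in the returned path are the segment boundaries; with visual-only emission scores, they become visual speech segment boundaries.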
Towards a True Acoustic-Visual Speech Synthesis
This paper presents an initial bimodal acoustic-visual synthesis system able to generate concurrently the speech signal and a 3D animation of the speaker's face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. The latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on classic target and join costs from acoustic-only synthesis, which are augmented with a visual join cost. Preliminary results indicate the benefits of this approach, since both the synthesized speech signal and the face animation are of good quality.
A multispeaker dataset of raw and reconstructed speech production real-time MRI video and 3D volumetric images
Real-time magnetic resonance imaging (RT-MRI) of human speech production is
enabling significant advances in speech science, linguistics, bio-inspired
speech technology development, and clinical applications. Easy access to RT-MRI
is however limited, and comprehensive datasets with broad access are needed to
catalyze research across numerous domains. The imaging of the rapidly moving
articulators and dynamic airway shaping during speech demands high
spatio-temporal resolution and robust reconstruction methods. Further, while
reconstructed images have been published, to date there is no open dataset
providing raw multi-coil RT-MRI data from an optimized speech production
experimental setup. Such datasets could enable new and improved methods for
dynamic image reconstruction, artifact correction, feature extraction, and
direct extraction of linguistically-relevant biomarkers. The present dataset
offers a unique corpus of 2D sagittal-view RT-MRI videos along with
synchronized audio for 75 subjects performing linguistically motivated speech
tasks, alongside the corresponding first-ever public domain raw RT-MRI data.
The dataset also includes 3D volumetric vocal tract MRI during sustained speech
sounds and high-resolution static anatomical T2-weighted upper airway MRI for
each subject.
Comment: 27 pages, 6 figures, 5 tables; submitted to Nature Scientific Data.
Prediction of Human Intestinal Absorption by GA Feature Selection and Support Vector Machine Regression
QSAR (Quantitative Structure-Activity Relationship) models for the prediction of human intestinal absorption (HIA) were built with molecular descriptors calculated by ADRIANA.Code, Cerius2, and a combination of them. A dataset of 552 compounds covering a wide range of current drugs with experimental HIA values was investigated. A genetic algorithm (GA) feature selection method was applied to select suitable descriptors. A Kohonen self-organizing neural network (KohNN) map was used to split the whole dataset into a training set of 380 compounds and a test set of 172 compounds. First, the six descriptors selected from ADRIANA.Code and the six selected from Cerius2 were used as input descriptors for building quantitative models using Partial Least Squares (PLS) analysis and Support Vector Machine (SVM) regression. Then, two further models were built on nine descriptors selected from the combined ADRIANA.Code and Cerius2 descriptor pool, using PLS and SVM respectively. The three SVM models achieved correlation coefficients (r) of 0.87, 0.89, and 0.88, with standard deviations (s) of 10.98, 9.72, and 9.14 on the test set.
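The GA-plus-SVM pipeline can be caricatured in a few lines: candidate descriptor subsets are bit masks, fitness is cross-validated SVM regression performance, and selection, crossover, and mutation evolve the population. Everything below (data, population size, operators) is a toy stand-in, not the paper's actual GA settings or descriptor sets.

```python
# Toy GA-style descriptor selection wrapped around SVM regression.
# X and y are synthetic stand-ins for the descriptor matrix and HIA values.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 20))  # 120 compounds x 20 candidate descriptors
y = X[:, :6] @ rng.normal(size=6) + 0.1 * rng.normal(size=120)  # 6 informative

def fitness(mask):
    """Cross-validated R^2 of an SVR trained on the selected descriptors."""
    if not mask.any():
        return -np.inf
    return cross_val_score(SVR(kernel="rbf", C=10.0), X[:, mask], y, cv=3).mean()

pop = rng.random((16, 20)) < 0.5                 # initial population of bit masks
for _ in range(10):                              # generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-8:]]       # keep the 8 fittest (truncation)
    cut = rng.integers(1, 19, size=8)
    children = np.array([np.concatenate([parents[i][:c], parents[(i + 1) % 8][c:]])
                         for i, c in enumerate(cut)])  # one-point crossover
    children ^= rng.random(children.shape) < 0.05      # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
```

The study's GA operated on real ADRIANA.Code/Cerius2 descriptors and a KohNN-based train/test split; this sketch only shows the mask-evolve-evaluate loop.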
Vocal Tract Images Reveal Neural Representations of Sensorimotor Transformation During Speech Imitation
Imitating speech necessitates the transformation from sensory targets to vocal tract motor output, yet little is known about the representational basis of this process in the human brain. Here, we address this question by using real-time MR imaging (rtMRI) of the vocal tract and functional MRI (fMRI) of the brain in a speech imitation paradigm. Participants trained on imitating a native vowel and a similar nonnative vowel that required lip rounding. Later, participants imitated these vowels and an untrained vowel pair during separate fMRI and rtMRI runs. Univariate fMRI analyses revealed that regions including the left inferior frontal gyrus were more active during sensorimotor transformation (ST) and production of nonnative vowels compared with native vowels; further, ST of nonnative vowels activated somatomotor cortex bilaterally compared with ST of native vowels. Using representational similarity analysis (RSA) models constructed from participants' vocal tract images and from stimulus formant distances, searchlight RSA analyses of the fMRI data showed that either type of model could be represented in somatomotor, temporal, cerebellar, and hippocampal neural activation patterns during ST. We thus provide the first evidence of widespread and robust cortical and subcortical neural representation of vocal tract and/or formant parameters during prearticulatory ST.
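The core RSA logic can be sketched briefly: build representational dissimilarity matrices (RDMs) for a model (e.g. vocal tract image features) and for neural activity patterns, then correlate their upper triangles. All data below are synthetic; the actual analysis scans this comparison across searchlight locations in the brain.

```python
# Toy sketch of an RSA model-to-neural comparison on synthetic data.
import numpy as np

rng = np.random.default_rng(3)
model_feats = rng.normal(size=(4, 10))  # 4 vowel conditions x model features
# Synthetic "neural" patterns loosely driven by the model features
neural = model_feats @ rng.normal(size=(10, 50)) + 0.1 * rng.normal(size=(4, 50))

def rdm(patterns):
    """Correlation-distance RDM, upper triangle flattened to a vector."""
    c = np.corrcoef(patterns)               # condition-by-condition correlations
    iu = np.triu_indices_from(c, k=1)
    return 1.0 - c[iu]

# Second-order comparison: how similar are the two dissimilarity structures?
similarity = np.corrcoef(rdm(model_feats), rdm(neural))[0, 1]
```

Published RSA work typically uses a rank (Spearman) correlation for this second-order comparison; Pearson is used here only to keep the sketch NumPy-only.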
A multilinear tongue model derived from speech related MRI data of the human vocal tract
We present a multilinear statistical model of the human tongue that captures
anatomical and tongue pose related shape variations separately. The model is
derived from 3D magnetic resonance imaging data of 11 speakers sustaining
speech related vocal tract configurations. The extraction is performed using a
minimally supervised method built on an image segmentation approach and a
template fitting technique. Furthermore, it uses image denoising to deal
with possibly corrupt data, palate surface information reconstruction to handle
palatal tongue contacts, and a bootstrap strategy to refine the obtained
shapes. Our evaluation concludes that limiting the degrees of freedom for the
anatomical and speech related variations to 5 and 4, respectively, produces a
model that can reliably register unknown data while avoiding overfitting
effects. Furthermore, we show that it can be used to generate a plausible
tongue animation by tracking sparse motion capture data.
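A multilinear model of this kind, with separate anatomy and pose factors, can be sketched as a truncated higher-order SVD (Tucker decomposition) of a speaker-by-pose-by-vertex data tensor. The dimensions and data below are toy stand-ins, not the paper's 11-speaker MRI-derived shapes or its 5 + 4 mode truncation.

```python
# Toy multilinear (Tucker-style) shape model via truncated HOSVD:
# separate low-dimensional weights for speaker (anatomy) and pose.
import numpy as np

rng = np.random.default_rng(2)
speakers, poses, verts = 8, 6, 30
data = rng.normal(size=(speakers, poses, verts))  # stand-in tongue shapes
mean = data.mean(axis=(0, 1))
centered = data - mean

# Mode subspaces: SVD of the speaker-mode and pose-mode unfoldings
U_spk, _, _ = np.linalg.svd(centered.reshape(speakers, -1), full_matrices=False)
U_pose, _, _ = np.linalg.svd(centered.transpose(1, 0, 2).reshape(poses, -1),
                             full_matrices=False)

k_spk, k_pose = 3, 2                              # truncated mode ranks
core = np.einsum('ia,jb,ijv->abv',
                 U_spk[:, :k_spk], U_pose[:, :k_pose], centered)

# Reconstruct one shape from its speaker weights and pose weights
recon = mean + np.einsum('a,b,abv->v',
                         U_spk[0, :k_spk], U_pose[0, :k_pose], core)
```

Registering unknown data then amounts to solving for the anatomy and pose weight vectors that best reproduce the observed shape, with the truncation guarding against overfitting, as the abstract describes.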