4 research outputs found

Audiovisual Generation of Social Attitudes from Neutral Stimuli

Author: Bailly Gérard
Barbulescu Adela
Pouget Maël
Ronfard Rémi
Publication venue: 'The International Fiscal Association of Korea'
Publication date: 11/09/2015

International audience. The focus of this study is the generation of expressive audiovisual speech from neutral utterances for 3D virtual actors. Taking into account the segmental and suprasegmental aspects of audiovisual speech, we propose and compare several computational frameworks for the generation of expressive speech and face animation. We notably compare a standard frame-based conversion approach with two other methods that postulate the existence of global prosodic audiovisual patterns that are characteristic of social attitudes. The proposed approaches are tested on a database of "Exercises in Style" [1] performed by two semi-professional actors, and the results are evaluated using crowdsourced perceptual tests. The first test performs a qualitative validation of the animation platform, while the second is a comparative study of several expressive speech generation methods. We evaluate how the expressiveness of our audiovisual performances is perceived in comparison to resynthesized original utterances and the outputs of a purely frame-based conversion system.

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Adaptive Latency for Part-of-Speech Tagging in Incremental Text-to-Speech Synthesis

Author: Bailly Gérard
Hueber Thomas
Nahorna Olha
Pouget Maël
Publication venue: 'International Speech Communication Association'
Publication date: 01/09/2016

International audience. Incremental text-to-speech systems aim at synthesizing a text 'on-the-fly', while the user is typing a sentence. In this context, this article addresses the problem of part-of-speech (POS, i.e. lexical category) tagging, which is a critical step for accurate grapheme-to-phoneme conversion and prosody estimation. Here, the main challenge is to estimate the POS of a given word without knowing its 'right context' (i.e. the following words, which are not available yet). To address this issue, we propose a method based on a set of decision trees that estimate online whether a given POS tag is likely to be modified when more right-contextual information becomes available. In such a case, the synthesis is delayed until POS stability is guaranteed. This results in delivering the synthetic voice in word chunks of variable length. Objective evaluation on French shows that the proposed method estimates POS tags with more than 92% accuracy (compared to a non-incremental system) while minimizing the synthesis latency (between 1 and 4 words). A perceptual evaluation (ranking test) is then carried out in the context of HMM-based speech synthesis. Experimental results show that the word grouping resulting from the proposed method is rated more acceptable than word-by-word incremental synthesis.
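
The delayed-release idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stability predicate `toy_is_stable` is a hypothetical stand-in for the decision trees, the ambiguous-word set is invented for illustration, and the `max_delay` cap is an assumption loosely inspired by the reported latency of 1 to 4 words.

```python
from typing import Callable, List, Sequence

# Hypothetical set of words whose POS depends on the following word.
AMBIGUOUS = {"la", "porte"}


def toy_is_stable(prefix: Sequence[str], i: int) -> bool:
    """Toy stand-in for the decision trees (assumption, not the paper's model).

    A word is judged stable if it is not ambiguous, or if at least one
    word of right context has already been typed after it.
    """
    return prefix[i] not in AMBIGUOUS or i < len(prefix) - 1


def chunk_by_stability(
    words: Sequence[str],
    is_stable: Callable[[Sequence[str], int], bool],
    max_delay: int = 4,  # assumed cap on how many words may be held back
) -> List[List[str]]:
    """Group words into variable-length chunks released for synthesis."""
    chunks: List[List[str]] = []
    start = 0  # index of the first word not yet released
    for end in range(1, len(words) + 1):
        prefix = words[:end]  # words typed so far
        all_stable = all(is_stable(prefix, i) for i in range(start, end))
        # Release when all pending words are stable, or the cap is reached.
        if all_stable or end - start >= max_delay:
            chunks.append(list(words[start:end]))
            start = end
    # Flush any words still pending when the sentence ends.
    if start < len(words):
        chunks.append(list(words[start:]))
    return chunks


if __name__ == "__main__":
    print(chunk_by_stability(["il", "la", "porte", "souvent"], toy_is_stable))
```

In this toy setting, unambiguous words are released immediately, while ambiguous ones are held until enough right context has arrived, which yields chunks of variable length.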

Crossref

Hal - Université Grenoble Alpes

Diagnosis of acute promyelocytic leukemia based on routine biological parameters using machine learning

Author: Adriana Plesa
Dan Gugenheim
Estelle Cheli
Jenny Pouget
Lydie Andre
Marion Eveillard
Maël Heiblig
Nicolas Autexier
Nicolas Chapuis
Olivier Kosmider
Olivier Theisen
Pascal Mossuz
Pierre Sujobert
Simon Chevalier
Vincent Alcazer
Publication venue: Ferrata Storti Foundation
Publication date: 01/02/2022

Directory of Open Access Journals

PubMed Central

HMM Training Strategy for Incremental Speech Synthesis

Author: Bailly Gérard
Baumann Timo
Hueber Thomas
Pouget Maël
Publication venue: HAL CCSD
Publication date: 06/09/2015

International audience. Incremental speech synthesis aims at delivering the synthetic voice while the sentence is still being typed. One of the main challenges is the online estimation of the target prosody from partial knowledge of the sentence's syntactic structure. In the context of HMM-based speech synthesis, this typically results in missing segmental and suprasegmental features, which describe the linguistic context of each phoneme. This study describes a voice training procedure that explicitly integrates a potential uncertainty on some contextual features. The proposed technique is compared to a previously published baseline approach, which consists in replacing a missing contextual feature with a default value calculated on the training set. Both techniques were implemented in an HMM-based text-to-speech system for French and compared using objective and perceptual measurements. Experimental results show that the proposed strategy outperforms the baseline technique for this language.
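
The baseline mentioned in this abstract (replacing a missing contextual feature with a default computed on the training set) might look roughly like the sketch below. The abstract does not say how the default is calculated; taking the most frequent training value is an assumption made here, and the feature names `pos_next` and `phrase_pos` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

# One phoneme's contextual features; None marks a feature that is not yet known.
Features = Dict[str, Optional[str]]


def compute_defaults(training: List[Features]) -> Dict[str, str]:
    """Pick, for each feature, its most frequent observed training value.

    Using the mode as the default is an assumption of this sketch.
    """
    counts: Dict[str, Counter] = {}
    for feats in training:
        for name, value in feats.items():
            if value is not None:
                counts.setdefault(name, Counter())[value] += 1
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}


def fill_missing(feats: Features, defaults: Dict[str, str]) -> Dict[str, str]:
    """Replace missing features with defaults; known features are kept.

    Raises KeyError if a missing feature was never observed in training.
    """
    return {
        name: value if value is not None else defaults[name]
        for name, value in feats.items()
    }


if __name__ == "__main__":
    train = [
        {"pos_next": "NOUN", "phrase_pos": "2"},
        {"pos_next": "NOUN", "phrase_pos": "1"},
        {"pos_next": "VERB", "phrase_pos": "1"},
    ]
    defaults = compute_defaults(train)
    print(fill_missing({"pos_next": None, "phrase_pos": "3"}, defaults))
```

The proposed training strategy differs from this baseline in that uncertainty on contextual features is modelled during voice training rather than patched at synthesis time; the sketch covers only the baseline side.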

Hal - Université Grenoble Alpes