Optimization-based modeling of Lombard speech articulation:Supraglottal characteristics
This paper shows that a highly simplified model of speech production based on the optimization of articulatory effort versus intelligibility can account for some observed articulatory consequences of signal-to-noise ratio. Simulations of static vowels in the presence of various background noise levels show that the model predicts articulatory and acoustic modifications of the type observed in Lombard speech. These features were obtained only when the constraint applied to articulatory effort decreases as the level of background noise increases. These results support the hypothesis that Lombard speech is listener oriented and that speakers adapt their articulation in noisy environments.
Optimal control of speech with context-dependent articulatory targets
This paper presents a computational implementation of phonetic planning, which consists of choosing the positions of articulatory targets that satisfy conflicting linguistic and extra-linguistic requirements. We present a minimal model that considers intelligibility and least effort as task requirements. To achieve the context-dependent variability of targets, our model approximates intelligibility as a function of target phoneme recognition probability given a vector of articulatory parameters. Preliminary experiments show that our minimal computational model of phonetic planning is able to predict two types of hypoarticulation by adjusting the weight assigned to effort: vowel centralization and stop consonant lenition.
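The effort/intelligibility trade-off described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the one-dimensional articulatory axis, the quadratic effort term, the logistic recognition probability, and the names `cost`, `optimal_target`, and `sharpness` are all assumptions made for illustration only.

```python
import math


def cost(x, effort_weight, sharpness=2.0):
    """Toy planning cost for a target at position x on a 1-D articulatory axis.

    Assumptions (not from the abstract): the neutral position is 0, effort is
    the squared displacement from neutral, and the probability of recognizing
    the intended phoneme is a logistic function of x.
    """
    effort = x ** 2
    p_recognized = 1.0 / (1.0 + math.exp(-sharpness * x))
    # Intelligibility enters as a negative log-probability, so a higher
    # recognition probability lowers the cost.
    return effort_weight * effort - math.log(p_recognized)


def optimal_target(effort_weight, lo=0.0, hi=3.0, steps=3000):
    """Grid search for the target position that minimizes the toy cost."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda x: cost(x, effort_weight))


if __name__ == "__main__":
    for w in (0.1, 1.0, 10.0):
        print(f"effort weight {w:5.1f} -> target at {optimal_target(w):.3f}")
```

Under these assumptions, raising the effort weight pulls the chosen target toward the neutral position, which is a crude one-dimensional analogue of vowel centralization.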
Copy synthesis of phrase-level utterances
This paper presents a simulation framework for synthesizing speech from anatomically realistic data of the vocal tract. The acoustic propagation paradigm is chosen so that it can deal with complex geometries and a time-varying length of the vocal tract. The glottal source model designed in this paper allows partial closure of the glottis by branching a posterior chink in parallel to a classic lumped mass-spring model of the vocal folds. Temporal scenarios for the dynamic shapes of the vocal tract and the glottal configurations may be derived from the simultaneous acquisition of X-ray images and audio recordings. Copy synthesis of a few French sentences shows the accuracy of the simulation framework in reproducing acoustic cues of natural phrase-level utterances containing most of the French natural classes while considering the real geometric shape of the speaker.
A glottal chink model for the synthesis of voiced fricatives
This paper presents a simulation framework that enables a glottal chink model to be integrated into a time-domain continuous speech synthesizer along with self-oscillating vocal folds. The glottis is then made up of two separate components: a self-oscillating part and a constantly open chink. This feature allows the simulation of voiced fricatives, thanks to a self-oscillating model of the vocal folds to generate the voiced source, and the glottal opening that is necessary to generate the frication noise. Numerical simulations show the accuracy of the model in simulating voiced fricatives, as well as phonetic assimilation such as sonorization and devoicing. The simulation framework is also used to show that the phonatory/articulatory space for generating voiced fricatives differs according to the desired sound: for instance, the minimal glottal opening for generating frication noise is shorter for /z/ than for /Z/.
Estimating vocal tract length for acoustic-to-articulatory inversion
The complex geometry of the vocal tract makes acoustic-to-articulatory inversion difficult, notably because the problem is strongly ill-posed. Regularization relies on adding constraints, either articulatory ones (an articulatory model, which requires few parameters but must be adapted to each speaker) or constraints on the values of the area functions. In the latter case, the vocal tract length is generally fixed at an arbitrary value, which makes it impossible to analyze possible protrusions or lengthening/shortening of the pharynx. The study presented here proposes an approach for estimating the vocal tract length of any speaker from a recording of the speech signal. The method is an analysis-by-synthesis method that consists of recovering the area function that generates the formants estimated from the speaker's speech signal. It starts from an initial area function that is modified iteratively using the sensitivity function method, following the theory developed by Fant and Pauli on perturbations of cross-sections inside the vocal tract. Existing work in the literature that uses this method, however, imposes a fixed length on the area functions, and therefore a fixed vocal tract length. Our approach addresses this problem by also taking perturbations of the vocal tract length into account. A numerical and experimental study validates the technique for French oral vowels.
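As a rough point of comparison only, the link between formants and vocal tract length can be sketched with the textbook uniform-tube approximation (closed at the glottis, open at the lips), where the n-th resonance is F_n = (2n - 1) c / (4 L). This is a much simpler technique than the sensitivity-function method the abstract describes and is not the authors' approach; the function name `tube_length_from_formants` and the speed-of-sound value are assumptions.

```python
SPEED_OF_SOUND = 350.0  # m/s; assumed value for warm, humid air in the tract


def tube_length_from_formants(formants_hz, c=SPEED_OF_SOUND):
    """Least-squares length (in metres) of a uniform quarter-wavelength tube.

    Fits F_n = (2n - 1) * a with a = c / (4 L) to the given formants,
    then returns L = c / (4 a). The formants must be ordered F1, F2, ...
    """
    odd = [2 * n - 1 for n in range(1, len(formants_hz) + 1)]
    # Closed-form least-squares slope of a line through the origin.
    a = sum(k * f for k, f in zip(odd, formants_hz)) / sum(k * k for k in odd)
    return c / (4.0 * a)


if __name__ == "__main__":
    length = tube_length_from_formants([500.0, 1500.0, 2500.0])
    print(f"estimated length: {length * 100:.1f} cm")
```

A uniform tube only approximates a neutral vowel, so this estimate ignores exactly the shape-dependent effects that an area-function method is designed to capture.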
Acoustic impact of the gradual glottal abduction on the production of fricatives: A numerical study
The paper presents a numerical study of the acoustic impact of the glottal chink opening on the production of fricatives. Sustained fricatives are simulated by using classic lumped circuit element methods to compute the propagation of the acoustic wave along the vocal tract. A recent glottis model is connected to the wave solver to simulate a partial abduction of the vocal folds during their self-oscillating cycles. Area functions of fricatives at the three places of articulation of French (palato-alveolar, alveolar, and labiodental) have been extracted from static MRI acquisitions. Simulations highlight the existence of three distinct regimes, named A, B, and C, depending on the chink opening. They are characterized by the frication noise level: A exhibits a low frication noise level, B is a mixed noise/voice signal, and C contains only frication noise. They have significant impacts on the first spectral moments. Boundaries of these regimes are defined in terms of minimal abduction of the vocal folds, and simulations show that they depend on articulatory and glottal configurations. Regime B is shown to be unstable: it requires very specific configurations in comparison with other regimes, and acoustic features are very sensitive in this regime.
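The first spectral moments mentioned in this abstract are commonly computed by treating the normalized power spectrum as a probability distribution over frequency. The sketch below uses that standard definition; the function name `spectral_moments` and the example values are illustrative assumptions, not data from the paper.

```python
def spectral_moments(freqs_hz, power):
    """Return (centroid in Hz, variance in Hz^2) of a power spectrum.

    The spectrum is normalized to sum to 1 and treated as a distribution
    over frequency; the centroid is its mean and the variance its second
    central moment.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")
    centroid = sum(f * p for f, p in zip(freqs_hz, power)) / total
    variance = sum((f - centroid) ** 2 * p for f, p in zip(freqs_hz, power)) / total
    return centroid, variance


if __name__ == "__main__":
    # Hypothetical three-bin spectrum, for illustration only.
    centroid, variance = spectral_moments([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0])
    print(f"centroid = {centroid:.0f} Hz, std dev = {variance ** 0.5:.0f} Hz")
```

Frication noise typically concentrates energy at higher frequencies than voicing, so a shift between voiced and noise-dominated signals would be expected to move the centroid.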