Reconnaissance automatique de la parole : génération des prononciations non natives pour l'enrichissement du lexique

Author: Bada Ismael
Fohr Dominique
Illina Irina
Publication venue: AFCP
Publication date: 01/01/2020

In this article we propose a lexicon adaptation method intended to improve automatic speech recognition (ASR) systems for non-native speakers. The performance of automatic recognition drops significantly when it is used to recognize the speech of non-native speakers, because these speakers frequently mispronounce the phonemes of the foreign language. To account for these erroneous pronunciations, our approach integrates non-native pronunciations into the lexicon and then uses this enriched lexicon for recognition. The approach requires a small corpus of non-native speech and its transcription. To generate the non-native pronunciations, we propose to take grapheme-phoneme correspondences into account in order to automatically generate rules for creating new pronunciations. These new pronunciations are added to the lexicon. We present an evaluation of our method on a corpus of French speakers speaking English as a non-native language

INRIA a CCSD electronic archive server

Amélioration des Performances des Systèmes Automatiques de Reconnaissance de la Parole pour la Parole Non Native

Author: Bouselmi Ghazi
Fohr Dominique
Haton Jean-Paul
Illina Irina
Publication venue: HAL CCSD
Publication date: 22/05/2007

In this article, we present an approach for non-native automatic speech recognition (ASR). We propose two methods to adapt existing ASR systems to non-native accents. The first method is based on modifying the acoustic models by integrating acoustic models from the speakers' mother tongue. Non-native speakers tend to pronounce the phonemes of the target language in a manner similar to those of their native language. We propose to combine the models of confused phonemes so that the ASR system can recognize both competing pronunciations. The second method is a refinement of pronunciation error detection through the introduction of graphemic constraints. Indeed, non-native speakers may rely on the spelling of words when uttering them. Thus, pronunciation errors might depend on the characters composing the words. The average error rate reduction that we observed is 22.5% (relative) in sentence error rate and 34.5% (relative) in word error rate

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Reconnaissance de parole non native fondée sur l'utilisation de confusion phonétique et de contraintes graphèmiques

Author: Bouselmi Ghazi
Fohr Dominique
Haton Jean-Paul
Illina Irina
Publication venue: HAL CCSD
Publication date: 12/06/2006

This paper presents a fully automated approach for the recognition of non-native speech based on acoustic model modification. For a native language (LM) and a spoken language (LP), pronunciation variants of the phones of LP are automatically extracted from an existing non-native database. These variants are stored in a confusion matrix between phones of LP and sequences of phones of LM. This confusion concept addresses the absence of a match between some LM and LP phones. The confusion matrix is then used to modify the acoustic models (HMMs) of LP phones by integrating the corresponding LM phone models as alternative HMM paths. We introduce graphemic constraints in the confusion extraction process, on the hypothesis that pronunciation errors may depend on the graphemes related to each phone. The modified ASR system achieved a significant improvement, varying between 20.3% and 43.2% (relative) in "sentence error rate" and between 26.6% and 50.0% (relative) in "word error rate". The introduction of graphemic constraints in the phonetic confusion allowed improvements while using the word-loop grammar

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Towards the automatic processing of Yongning Na (Sino-Tibetan): developing a 'light' acoustic model of the target language and testing 'heavyweight' models from five national languages

Author: Castelli Eric
Do Thi-Ngoc-Diep
Michaud Alexis
Publication venue: HAL CCSD
Publication date: 14/05/2014

Automatic speech processing technologies hold great potential to facilitate the urgent task of documenting the world's languages. The present research aims to explore the application of speech recognition tools to a little-documented language, with a view to facilitating processes of annotation, transcription and linguistic analysis. The target language is Yongning Na (a.k.a. Mosuo), an unwritten Sino-Tibetan language with fewer than 50,000 speakers. An acoustic model of Na was built using CMU Sphinx. In addition to this 'light' model, trained on a small data set (only 4 hours of speech from 1 speaker), 'heavyweight' models from five national languages (English, French, Chinese, Vietnamese and Khmer) were also applied to the same data. Preliminary results are reported, and perspectives for the long road ahead are outlined

Hal - Université Grenoble Alpes

Automatic Feedback for L2 Prosody Learning

Author: Anne Bonneau
Vincent Colotte
Publication venue: 'IntechOpen'
Publication date: 21/06/2011

We have designed automatic feedback for the realisation of the prosody of a foreign language. Besides classical F0 displays, two kinds of feedback are provided to learners, each of them based upon a comparison between a reference and the learner's production. The first feedback, a diagnosis, provided both in the form of a short text and visual displays such as arrows, comes from an acoustic evaluation of the learner's realisation; it deals with two prosodic cues: the melodic curve and phoneme duration. The second feedback is perceptual and consists in replacing the learner's prosodic cues (duration and F0) with those of the reference. A pilot experiment has been undertaken to test the immediate impact of the "advanced" feedback proposed here. We have chosen to test the production of English lexical accent in isolated words by French speakers. It shows that feedback based upon diagnosis and speech modification enables French learners with a low production level to improve their realisations of English lexical accents more than (simple) auditory feedback. By contrast, for the advanced learners involved in this study, auditory feedback appears to be as efficient as more elaborate feedback

IntechOpen

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Automatic prosodic analysis for computer aided pronunciation teaching

Author: Bagshaw Paul Christopher
Publication venue: The University of Edinburgh
Publication date: 01/01/1994

Correct pronunciation of spoken language requires the appropriate modulation of acoustic characteristics of speech to convey linguistic information at a suprasegmental level. Such prosodic modulation is a key aspect of spoken language and is an important component of foreign language learning, for purposes of both comprehension and intelligibility. Computer aided pronunciation teaching involves automatic analysis of the speech of a non-native talker in order to provide a diagnosis of the learner's performance in comparison with the speech of a native talker. This thesis describes research undertaken to automatically analyse the prosodic aspects of speech for computer aided pronunciation teaching. It is necessary to describe the suprasegmental composition of a learner's speech in order to characterise significant deviations from a native-like prosody, and to offer some kind of corrective diagnosis. Phonological theories of prosody aim to describe the suprasegmental composition of speech..

CiteSeerX

Edinburgh Research Archive

Détection et classification de traits paralinguistiques par des métriques rythmiques de la parole

Author: Gharsallaoui Soumaya
Publication date: 01/01/2016

Dépôt numérique de UQTR

Constitution d'un Corpus de Français Langue Etrangère destiné aux Apprenants Allemands

Author: Bonneau Anne
Colotte Vincent
Fauth Camille
Fohr Dominique
Jouvet Denis
Laprie Yves
Mella Odile
Trouvain Jürgen
Publication venue: 'EDP Sciences'
Publication date: 01/01/2014

Most language corpora focus on written linguistic phenomena and concern English (see the website "Learner corpora around the world" of the University of Louvain, Belgium). Phonetic research on L2 acquisition is generally oriented towards the study of segmental phenomena, and most studies also have English as the target language. Current L2 speech models, such as the Speech Learning Model (Flege, 1995) or Best's Perceptual Assimilation Model (Best, 1995), very often neglect prosodic aspects. Our study concerns French as a second language and is part of a larger project conducted in partnership with a German university, one of whose goals is the development of computer-assisted language learning, and in which French and German are the target languages (ANR-DFG project, Agence Nationale de la Recherche and Deutsche Forschungsgemeinschaft, awarded to the Parole team of LORIA UMR 7503, Nancy, France, and to the Computational Linguistics and Phonetics team FR 4.7 of Saarland University, Saarbrücken, Germany). For the German-French pair, few parallel corpora are available. We present here the construction of a corpus of oral productions by native and non-native speakers for the German-French pair. Our corpus aims to bring to light the phonetic and phonological deviations that German speakers produce when they learn French. This work is part of a broader project that aims to study the difficulties French speakers encounter when they learn German, and vice versa. Accordingly, fifty German speakers will be recruited from university and secondary-school (lycée level) settings in Germany, and fifty French speakers from the same settings in France.
Both populations are to produce the corpus in the foreign language (in French for the German speakers and in German for the French speakers) as well as the corpus in their mother tongue (in German for the Germans and in French for the French). The resulting corpora should allow us to identify the difficulties that German or French speakers encounter when they learn French or German. The control data are twofold, since we can refer both to the learners' productions in their mother tongue (here German) and to those of native speakers (here German speakers). We present here only the construction of the French corpus

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

INRIA a CCSD electronic archive server

Automatic Speech recognition, with large vocabulary, robustness, independence of speaker and multilingual processing

Author: CAON D. R. S.
Publication venue: Mestrado em Informática
Publication date: 27/08/2010

Throughout this work, the large-vocabulary continuous speech recognition system Julius is used together with the Hidden Markov Model Toolkit (HTK). The main features of the Julius system are described, and the system was also modified. First, the theory of speech signal recognition is presented. Experiments are carried out with hidden Markov model adaptation and with the K-fold cross-validation technique. Speech recognition results after acoustic adaptation to a specific speaker (and the creation of language models specific to a demonstration scenario of the system) showed an 86.39% sentence accuracy rate for the Dutch acoustic models. The same data show a 94.44% sentence semantic accuracy rate

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositório Institucional da Universidade Federal do Espirito Santo