103 research outputs found

    Extraction automatique de paramĂštres prosodiques pour l'identification automatique des langues

    The aim of this study is to propose a new approach to Automatic Language Identification, based on rhythmic and fundamental-frequency modelling, that does not require any hand-labelled data. We first investigate how prosodic and rhythmic information can be taken into account for Automatic Language Identification. A new, automatically extracted unit, the pseudo-syllable, is introduced. Rhythmic and intonative features are then automatically extracted from this unit, and elementary decision modules are defined with Gaussian mixture models. These prosodic models are combined with a more classical approach, an acoustic model of the vocalic system. Experiments are conducted on the five European languages of the MULTEXT corpus: English, French, German, Italian and Spanish. The relevance of the rhythmic parameters and the efficiency of each system (rhythmic model, fundamental-frequency model and vowel-system model) are evaluated, as is the influence of these approaches on the performance of an automatic language identification system. We obtain 91% correct identification on 21 s utterances using all information sources.
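The abstract's elementary decision modules score an utterance's prosodic features under one model per language and pick the best-scoring language. As a minimal sketch of that idea — using a single diagonal-covariance Gaussian per language (a 1-component GMM) and invented toy rhythm features, not the authors' models — the decision rule can be written as:

```python
import numpy as np

def fit_gaussian(features):
    """Fit a single diagonal-covariance Gaussian (a 1-component GMM)
    to per-utterance prosodic feature vectors."""
    mu = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # variance floor
    return mu, var

def log_likelihood(x, mu, var):
    """Diagonal-Gaussian log-density of one feature vector."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def identify(x, models):
    """Return the language whose model best explains the features."""
    return max(models, key=lambda lang: log_likelihood(x, *models[lang]))

# Hypothetical 2-D rhythm features (e.g. vowel duration, cluster complexity).
rng = np.random.default_rng(0)
models = {
    "French": fit_gaussian(rng.normal([0.10, 1.5], 0.02, size=(50, 2))),
    "German": fit_gaussian(rng.normal([0.14, 2.5], 0.02, size=(50, 2))),
}
print(identify(np.array([0.14, 2.4]), models))  # -> German
```

A full system would use multi-component mixtures per language and fuse this score with the fundamental-frequency and vowel-system models described above.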

    Rhythmic unit extraction and modelling for automatic language identification

    This paper deals with an approach to Automatic Language Identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features for language identification, even if its extraction and modelling are not straightforward; one of the main problems is deciding what to model. In this paper, a rhythm-extraction algorithm is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian mixture. Experiments are performed on read speech for 7 languages (English, French, German, Italian, Japanese, Mandarin and Spanish). Results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and 67 ± 8% correct language identification on average for the 7 languages with utterances of 21 seconds. These results are discussed and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the 7-language identification task).
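The rhythmic-unit step above segments the vowel/consonant stream into syllable-like C&#94;nV units and measures each one. A minimal sketch, assuming a precomputed (label, duration) segmentation rather than the paper's vowel detector:

```python
def pseudo_syllables(segments):
    """Group a (label, duration) sequence into C^nV units: the run of
    consonants preceding a vowel, plus that vowel. Trailing consonants
    with no following vowel are discarded in this sketch."""
    units, cluster = [], []
    for label, dur in segments:
        cluster.append((label, dur))
        if label == "V":
            units.append(cluster)
            cluster = []
    return units

def rhythm_features(unit):
    """Per-unit parameters from the abstract: total consonant duration,
    vowel duration, and cluster complexity (number of consonants)."""
    consonants = [d for lab, d in unit if lab == "C"]
    vowel = sum(d for lab, d in unit if lab == "V")
    return sum(consonants), vowel, len(consonants)

# Hypothetical segmentation: C C C V / C V (durations in seconds).
segs = [("C", 0.05), ("C", 0.06), ("C", 0.04), ("V", 0.12),
        ("C", 0.05), ("V", 0.10)]
for unit in pseudo_syllables(segs):
    print(rhythm_features(unit))
```

The resulting per-unit feature vectors are what the Gaussian mixture then models, one mixture per language.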

    Automatic intelligibility measures applied to speech signals simulating age-related hearing loss

    This research work forms the first part of a long-term project designed to provide a framework for facilitating hearing-aid tuning. The present study focuses on setting up automatic measures of speech intelligibility for the recognition of isolated words and sentences. Both materials were degraded in order to simulate the effects of presbycusis on speech perception. Automatic measures based on an Automatic Speech Recognition (ASR) system were applied to an audio corpus simulating presbycusis at nine severity stages, and the results were compared to reference intelligibility scores collected from 60 French listeners. Since the aim of the system is to produce measures as close as possible to human behaviour, the strong correlations observed between subjective and objective scores indicate good performance.

    Comparaison de mesures perceptives et automatiques de l'intelligibilité : application à de la parole simulant la presbyacousie

    This article presents a comparative study of perceptual and automatic measures of speech intelligibility, on speech degraded by a simulation of presbycusis. The goal is to answer the question: can an automatic speech recognition system approximate a human perceptual measure? To this end, a corpus of degraded speech was specifically constructed, used in perceptual tests, and then processed automatically. Strong correlations between human performance and automatic recognition scores are observed.

    Automatic speech recognition predicts speech intelligibility and comprehension for listeners with simulated age-related hearing loss

    Purpose: To assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an Automatic Speech Recognition (ASR) system. The long-term goal of this research is to develop a system that will assist audiologists/hearing-aid dispensers in the fine-tuning of hearing aids. Method: Sixty young normal-hearing participants listened to speech materials mimicking the perceptual consequences of ARHL at different levels of severity. Two intelligibility tests (repetition of words and sentences) and one comprehension test (responding to oral commands by moving virtual objects) were administered. Several language models were developed and used by the ASR system in order to fit human performance. Results: Strong significant positive correlations were observed between human and ASR scores, with coefficients up to .99. However, the spectral smearing used to simulate losses in frequency selectivity caused larger declines in ASR performance than in human performance. Conclusion: Both intelligibility and comprehension scores for listeners with simulated ARHL are highly correlated with the performances of an ASR-based system. It remains to be determined whether the ASR system is similarly successful in predicting speech processing in noise and by older listeners with ARHL.
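The correlations reported in this and the two preceding abstracts are Pearson coefficients between paired human and ASR scores. A minimal illustration with invented scores (the real per-stage values are not given here):

```python
import numpy as np

# Hypothetical paired scores across nine simulated severity stages:
# human word-repetition accuracy vs. ASR recognition accuracy.
human = np.array([0.98, 0.95, 0.90, 0.82, 0.70, 0.55, 0.40, 0.22, 0.10])
asr   = np.array([0.97, 0.93, 0.87, 0.78, 0.66, 0.50, 0.33, 0.18, 0.07])

# Pearson correlation coefficient between the two score series.
r = np.corrcoef(human, asr)[0, 1]
print(round(r, 3))
```

With scores that decline together across severity stages, r approaches 1, which is the pattern the studies report (coefficients up to .99).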

    Audio Indexing on the Web: a Preliminary Study of Some Audio Descriptors

    Refereed international conference paper. The "Invisible Web" is composed of documents that cannot currently be accessed by Web search engines, because they have a dynamic URL or are not textual, such as video or audio documents. For audio documents, one solution is automatic indexing: finding good descriptors of audio documents that can be used as indexes for archiving and search. This paper presents an overview and recent results of the RAIVES project, a French research project on audio indexing. We present speech/music segmentation, speaker tracking and keyword detection, and give a few perspectives on the RAIVES project.
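Speech/music segmentation ultimately turns a stream of per-frame decisions into timed segments, the descriptor form used for indexing. A minimal sketch of that last step, assuming the per-frame labels already exist (the actual classifier is not specified here):

```python
def frames_to_segments(labels, frame_s=0.01):
    """Collapse per-frame speech/music decisions into
    (label, start_time, end_time) segments for indexing."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        # Close the current segment at the end or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start * frame_s, i * frame_s))
            start = i
    return segments

# Hypothetical decisions over 10 s of audio at a 10 ms frame rate.
labels = ["music"] * 300 + ["speech"] * 500 + ["music"] * 200
print(frames_to_segments(labels))
```

Real systems additionally smooth the frame decisions (e.g. with a minimum segment duration) before emitting descriptors.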

    Projet RAIVES (Recherche Automatique d'Informations Verbales Et Sonores) vers l'extraction et la structuration de données radiophoniques sur Internet

    Contract report. The Internet has become a major communication medium, enabling the dissemination and exchange of a growing volume of data. The challenge is therefore no longer just to collect large amounts of "electronic information", but above all to index and classify it so as to ease access to useful information. A piece of information, however important, on an unindexed site goes unnoticed, so the share of the "invisible Web" must not be neglected. The invisible Web can be defined as the set of information that is not indexed, either because it is not catalogued, because the pages containing it are dynamic, or because its nature makes it hard or impossible to index. Indeed, most search engines rely on a textual analysis of page content and cannot take into account the content of audio or visual documents. A set of content descriptors must therefore be provided to structure the documents so that the information becomes accessible to search engines. For audio documents, the goal of our project is thus, on the one hand, to extract this information and, on the other, to provide a structuring of the documents in order to ease access to their content. Content-based indexing of audio documents relies on techniques used in automatic speech processing, but must be distinguished from the automatic alignment of a text with an audio stream, or from automatic speech recognition, which would reduce the content of an audio document to its verbal component alone. The non-verbal component of an audio document is important and often corresponds to a particular structuring of the document: in radio broadcasts, for example, speech alternates with music, and in particular with jingles announcing the news.
We can thus consider a set of descriptors of the content of a radio document: speech/music segments, "key sounds", language, speaker changes together with a possible identification of those speakers, keywords and topics. This set can of course be enriched. Extracting the whole set of descriptors is probably sufficient to reference a document on the Internet, but it is worth going further and giving access to precise parts of the document: each descriptor must be associated with a time marker that gives direct access to the information. However, since the descriptors belong to different description levels, their organisation is not linear in time: the same speaker may speak two languages within a single speech segment, and several speakers may intervene within a speech segment in a given language. We must therefore also be able to structure the information over several levels of representation.
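The multi-level, non-linear structure described above — time-stamped descriptors on independent layers (speech/music, speaker, language, ...) — can be sketched as a simple data structure. The layer names and query function below are illustrative, not the report's actual format:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    layer: str    # description level: "speech/music", "speaker", "language"...
    label: str
    start: float  # time markers giving direct access to the information
    end: float

def at_time(annotations, t):
    """All descriptors active at time t, one per description level."""
    return {a.layer: a.label for a in annotations if a.start <= t < a.end}

# Hypothetical radio document: one speaker change and one language
# change fall at different times inside a single speech segment.
doc = [
    Annotation("speech/music", "speech", 0.0, 60.0),
    Annotation("speaker", "spk1", 0.0, 25.0),
    Annotation("speaker", "spk2", 25.0, 60.0),
    Annotation("language", "French", 0.0, 40.0),
    Annotation("language", "English", 40.0, 60.0),
]
print(at_time(doc, 30.0))
```

Because each layer has its own boundaries, no single linear segmentation can represent the document; queries must intersect the layers, as `at_time` does.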

    Automatic Assessment of Speech Capability Loss in Disordered Speech

    In this article, we report on the use of an automatic technique to assess pronunciation in the context of several types of speech disorders. Such tools already exist, but they are more widely used in a different context, namely Computer-Assisted Language Learning, where the objective is to assess non-native pronunciation by detecting learners' mispronunciations at segmental and/or suprasegmental levels. In our work, we sought to determine whether the Goodness of Pronunciation (GOP) algorithm, which aims to detect phone-level mispronunciations by means of automatic speech recognition, could also detect segmental deviances in disordered speech. Our main experiment is an analysis of speech from people with unilateral facial palsy, a pathology that may impact the realization of certain phonemes such as bilabial plosives and sibilants. Speech read by 32 speakers at four different clinical severity grades was automatically aligned and GOP scores were computed for each phone realization. The highest scores, which indicate large dissimilarities with standard phone realizations, were obtained for the most severely impaired speakers. The corresponding speech subset was manually transcribed at phone level; 8.3% of the phones differed from the standard pronunciations extracted from our lexicon. The GOP technique allowed the detection of 70.2% of mispronunciations, with an equal rate of about 30% false rejections and false acceptances. Finally, to broaden the scope of the study, we explored the correlation between GOP values and speech comprehensibility scores on a second corpus, composed of sentences recorded by six people with speech impairments due to cancer surgery or neurological disorders. Strong correlations were achieved between GOP scores and subjective comprehensibility scores (about 0.7 absolute).
Results from both experiments tend to validate the use of GOP to measure speech capability loss, a dimension that could complement physiological measures in pathologies causing speech disorders.
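The GOP score for an aligned phone is commonly computed as a per-frame log-likelihood ratio between the best-scoring phone model and the expected one, so that 0 means a canonical realization and large values flag deviance (matching the abstract's "highest scores indicate large dissimilarities"). A minimal numeric sketch with invented acoustic scores, not the study's models:

```python
import numpy as np

def gop(frame_loglik, phone, start, end):
    """GOP sketch for one aligned phone: average per-frame log-likelihood
    gap between the best-scoring phone model (free phone loop) and the
    expected phone (forced alignment) over frames [start, end)."""
    seg = frame_loglik[:, start:end]   # rows: phone models, cols: frames
    best = seg.max(axis=0).sum()       # unconstrained best phone per frame
    expected = seg[phone].sum()        # score of the expected phone
    return (best - expected) / (end - start)

# Hypothetical per-frame log-likelihoods for 3 phone models, 6 frames.
ll = np.array([[-1.0, -1.2, -0.9, -5.0, -5.5, -6.0],   # phone 0
               [-4.0, -4.5, -4.2, -1.1, -1.0, -0.8],   # phone 1
               [-3.0, -3.5, -3.1, -2.0, -2.2, -2.5]])  # phone 2
print(gop(ll, phone=0, start=0, end=3))  # canonical realization -> 0.0
print(gop(ll, phone=0, start=3, end=6))  # deviant realization -> large score
```

Mispronunciation detection then thresholds these scores, trading off false rejections against false acceptances as in the ~30% equal-error operating point reported above.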
