103 research outputs found

    Extraction automatique de paramĂštres prosodiques pour l'identification automatique des langues

    The aim of this study is to propose a new approach to Automatic Language Identification, based on rhythmic and fundamental-frequency modelling, that does not require any hand-labelled data. We first investigate how prosodic and rhythmic information can be taken into account for Automatic Language Identification. A new, automatically extracted unit, the pseudo-syllable, is introduced. Rhythmic and intonative features are then automatically extracted from this unit, and elementary decision modules are defined with Gaussian mixture models. These prosodic models are combined with a more classical approach, an acoustic model of the vocalic system. Experiments are conducted on the five European languages of the MULTEXT corpus: English, French, German, Italian and Spanish. The relevance of the rhythmic parameters and the efficiency of each system (rhythmic model, fundamental-frequency model and vowel-system model) are evaluated, as is the influence of these approaches on the performance of an automatic language identification system. We obtain 91% correct identification on 21 s utterances using all information sources.
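The abstract's elementary decision modules score an utterance's prosodic features under one model per language and pick the best-scoring language. As a minimal sketch of that idea — using a single diagonal-covariance Gaussian per language (a 1-component GMM) and invented toy rhythm features, not the authors' models — the decision rule can be written as:

```python
import numpy as np

def fit_gaussian(features):
    """Fit a single diagonal-covariance Gaussian (a 1-component GMM)
    to per-utterance prosodic feature vectors."""
    mu = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # variance floor
    return mu, var

def log_likelihood(x, mu, var):
    """Diagonal-Gaussian log-density of one feature vector."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def identify(x, models):
    """Return the language whose model best explains the features."""
    return max(models, key=lambda lang: log_likelihood(x, *models[lang]))

# Hypothetical 2-D rhythm features (e.g. vowel duration, cluster complexity).
rng = np.random.default_rng(0)
models = {
    "French": fit_gaussian(rng.normal([0.10, 1.5], 0.02, size=(50, 2))),
    "German": fit_gaussian(rng.normal([0.14, 2.5], 0.02, size=(50, 2))),
}
print(identify(np.array([0.14, 2.4]), models))  # -> German
```

A full system would use multi-component mixtures per language and fuse this score with the fundamental-frequency and vowel-system models described above.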

    Rhythmic unit extraction and modelling for automatic language identification

    This paper deals with an approach to Automatic Language Identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features for language identification, even if its extraction and modelling are not straightforward; one of the main problems is deciding what to model. In this paper, a rhythm-extraction algorithm is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian mixture. Experiments are performed on read speech for 7 languages (English, French, German, Italian, Japanese, Mandarin and Spanish). Results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and 67 ± 8% correct language identification on average for the 7 languages with utterances of 21 seconds. These results are discussed and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the 7-language identification task).
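The rhythmic-unit step above segments the vowel/consonant stream into syllable-like C&#94;nV units and measures each one. A minimal sketch, assuming a precomputed (label, duration) segmentation rather than the paper's vowel detector:

```python
def pseudo_syllables(segments):
    """Group a (label, duration) sequence into C^nV units: the run of
    consonants preceding a vowel, plus that vowel. Trailing consonants
    with no following vowel are discarded in this sketch."""
    units, cluster = [], []
    for label, dur in segments:
        cluster.append((label, dur))
        if label == "V":
            units.append(cluster)
            cluster = []
    return units

def rhythm_features(unit):
    """Per-unit parameters from the abstract: total consonant duration,
    vowel duration, and cluster complexity (number of consonants)."""
    consonants = [d for lab, d in unit if lab == "C"]
    vowel = sum(d for lab, d in unit if lab == "V")
    return sum(consonants), vowel, len(consonants)

# Hypothetical segmentation: C C C V / C V (durations in seconds).
segs = [("C", 0.05), ("C", 0.06), ("C", 0.04), ("V", 0.12),
        ("C", 0.05), ("V", 0.10)]
for unit in pseudo_syllables(segs):
    print(rhythm_features(unit))
```

The resulting per-unit feature vectors are what the Gaussian mixture then models, one mixture per language.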

    Automatic intelligibility measures applied to speech signals simulating age-related hearing loss

    This research work forms the first part of a long-term project designed to provide a framework for facilitating hearing-aid tuning. The present study focuses on setting up automatic measures of speech intelligibility for the recognition of isolated words and sentences. Both materials were degraded in order to simulate the effects of presbycusis on speech perception. Automatic measures based on an Automatic Speech Recognition (ASR) system were applied to an audio corpus simulating presbycusis at nine severity stages, and the results were compared to reference intelligibility scores collected from 60 French listeners. Since the aim of the system is to produce measures as close as possible to human behaviour, the strong correlations observed between subjective and objective scores indicate good performance.

    Comparaison de mesures perceptives et automatiques de l'intelligibilité : application à de la parole simulant la presbyacousie

    This article presents a comparative study of perceptual and automatic measures of speech intelligibility, on speech degraded by a simulation of presbycusis. The goal is to answer the question: can an automatic speech recognition system approximate a human perceptual measure? To this end, a corpus of degraded speech was specifically constructed, used in perceptual tests, and then processed automatically. Strong correlations between human performance and automatic recognition scores are observed.

    Automatic speech recognition predicts speech intelligibility and comprehension for listeners with simulated age-related hearing loss

    Purpose: To assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an Automatic Speech Recognition (ASR) system. The long-term goal of this research is to develop a system that will assist audiologists/hearing-aid dispensers in the fine-tuning of hearing aids. Method: Sixty young normal-hearing participants listened to speech materials mimicking the perceptual consequences of ARHL at different levels of severity. Two intelligibility tests (repetition of words and sentences) and one comprehension test (responding to oral commands by moving virtual objects) were administered. Several language models were developed and used by the ASR system in order to fit human performance. Results: Strong significant positive correlations were observed between human and ASR scores, with coefficients up to .99. However, the spectral smearing used to simulate losses in frequency selectivity caused larger declines in ASR performance than in human performance. Conclusion: Both intelligibility and comprehension scores for listeners with simulated ARHL are highly correlated with the performances of an ASR-based system. It remains to be determined whether the ASR system is similarly successful in predicting speech processing in noise and by older listeners with ARHL.
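The correlations reported in this and the two preceding abstracts are Pearson coefficients between paired human and ASR scores. A minimal illustration with invented scores (the real per-stage values are not given here):

```python
import numpy as np

# Hypothetical paired scores across nine simulated severity stages:
# human word-repetition accuracy vs. ASR recognition accuracy.
human = np.array([0.98, 0.95, 0.90, 0.82, 0.70, 0.55, 0.40, 0.22, 0.10])
asr   = np.array([0.97, 0.93, 0.87, 0.78, 0.66, 0.50, 0.33, 0.18, 0.07])

# Pearson correlation coefficient between the two score series.
r = np.corrcoef(human, asr)[0, 1]
print(round(r, 3))
```

With scores that decline together across severity stages, r approaches 1, which is the pattern the studies report (coefficients up to .99).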

    Audio Indexing on the Web: a Preliminary Study of Some Audio Descriptors

    Refereed international conference paper. The "Invisible Web" is composed of documents that cannot currently be accessed by Web search engines, because they have a dynamic URL or are not textual, such as video or audio documents. For audio documents, one solution is automatic indexing: finding good descriptors of audio documents that can be used as indexes for archiving and search. This paper presents an overview and recent results of the RAIVES project, a French research project on audio indexing. We present speech/music segmentation, speaker tracking and keyword detection, and give a few perspectives on the RAIVES project.
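Speech/music segmentation ultimately turns a stream of per-frame decisions into timed segments, the descriptor form used for indexing. A minimal sketch of that last step, assuming the per-frame labels already exist (the actual classifier is not specified here):

```python
def frames_to_segments(labels, frame_s=0.01):
    """Collapse per-frame speech/music decisions into
    (label, start_time, end_time) segments for indexing."""
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        # Close the current segment at the end or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start * frame_s, i * frame_s))
            start = i
    return segments

# Hypothetical decisions over 10 s of audio at a 10 ms frame rate.
labels = ["music"] * 300 + ["speech"] * 500 + ["music"] * 200
print(frames_to_segments(labels))
```

Real systems additionally smooth the frame decisions (e.g. with a minimum segment duration) before emitting descriptors.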

    Projet RAIVES (Recherche Automatique d'Informations Verbales Et Sonores) vers l'extraction et la structuration de données radiophoniques sur Internet

    Contract report. The Internet has become a major communication medium, enabling the dissemination and exchange of a growing volume of data. The challenge is therefore no longer just to collect large amounts of "electronic information", but above all to index and classify it so as to ease access to useful information. A piece of information, however important, on an unindexed site goes unnoticed, so the share of the "invisible Web" must not be neglected. The invisible Web can be defined as the set of information that is not indexed, either because it is not catalogued, because the pages containing it are dynamic, or because its nature makes it hard or impossible to index. Indeed, most search engines rely on a textual analysis of page content and cannot take into account the content of audio or visual documents. A set of content descriptors must therefore be provided to structure the documents so that the information becomes accessible to search engines. For audio documents, the goal of our project is thus, on the one hand, to extract this information and, on the other, to provide a structuring of the documents in order to ease access to their content. Content-based indexing of audio documents relies on techniques used in automatic speech processing, but must be distinguished from the automatic alignment of a text with an audio stream, or from automatic speech recognition, which would reduce the content of an audio document to its verbal component alone. The non-verbal component of an audio document is important and often corresponds to a particular structuring of the document: in radio broadcasts, for example, speech alternates with music, and in particular with jingles announcing the news.
We can thus consider a set of descriptors of the content of a radio document: speech/music segments, "key sounds", language, speaker changes together with a possible identification of those speakers, keywords and topics. This set can of course be enriched. Extracting the whole set of descriptors is probably sufficient to reference a document on the Internet, but it is worth going further and giving access to precise parts of the document: each descriptor must be associated with a time marker that gives direct access to the information. However, since the descriptors belong to different description levels, their organisation is not linear in time: the same speaker may speak two languages within a single speech segment, and several speakers may intervene within a speech segment in a given language. We must therefore also be able to structure the information over several levels of representation.
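The multi-level, non-linear structure described above — time-stamped descriptors on independent layers (speech/music, speaker, language, ...) — can be sketched as a simple data structure. The layer names and query function below are illustrative, not the report's actual format:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    layer: str    # description level: "speech/music", "speaker", "language"...
    label: str
    start: float  # time markers giving direct access to the information
    end: float

def at_time(annotations, t):
    """All descriptors active at time t, one per description level."""
    return {a.layer: a.label for a in annotations if a.start <= t < a.end}

# Hypothetical radio document: one speaker change and one language
# change fall at different times inside a single speech segment.
doc = [
    Annotation("speech/music", "speech", 0.0, 60.0),
    Annotation("speaker", "spk1", 0.0, 25.0),
    Annotation("speaker", "spk2", 25.0, 60.0),
    Annotation("language", "French", 0.0, 40.0),
    Annotation("language", "English", 40.0, 60.0),
]
print(at_time(doc, 30.0))
```

Because each layer has its own boundaries, no single linear segmentation can represent the document; queries must intersect the layers, as `at_time` does.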

    Automatic Assessment of Speech Capability Loss in Disordered Speech

    In this article, we report on the use of an automatic technique to assess pronunciation in the context of several types of speech disorders. Such tools already exist, but they are more widely used in a different context, namely Computer-Assisted Language Learning, where the objective is to assess non-native pronunciation by detecting learners' mispronunciations at segmental and/or suprasegmental levels. In our work, we sought to determine whether the Goodness of Pronunciation (GOP) algorithm, which aims to detect phone-level mispronunciations by means of automatic speech recognition, could also detect segmental deviances in disordered speech. Our main experiment is an analysis of speech from people with unilateral facial palsy, a pathology that may impact the realization of certain phonemes such as bilabial plosives and sibilants. Speech read by 32 speakers at four different clinical severity grades was automatically aligned and GOP scores were computed for each phone realization. The highest scores, which indicate large dissimilarities with standard phone realizations, were obtained for the most severely impaired speakers. The corresponding speech subset was manually transcribed at phone level; 8.3% of the phones differed from the standard pronunciations extracted from our lexicon. The GOP technique allowed the detection of 70.2% of mispronunciations, with an equal rate of about 30% false rejections and false acceptances. Finally, to broaden the scope of the study, we explored the correlation between GOP values and speech comprehensibility scores on a second corpus, composed of sentences recorded by six people with speech impairments due to cancer surgery or neurological disorders. Strong correlations were achieved between GOP scores and subjective comprehensibility scores (about 0.7 absolute).
Results from both experiments tend to validate the use of GOP to measure speech capability loss, a dimension that could complement physiological measures in pathologies causing speech disorders.
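The GOP score for an aligned phone is commonly computed as a per-frame log-likelihood ratio between the best-scoring phone model and the expected one, so that 0 means a canonical realization and large values flag deviance (matching the abstract's "highest scores indicate large dissimilarities"). A minimal numeric sketch with invented acoustic scores, not the study's models:

```python
import numpy as np

def gop(frame_loglik, phone, start, end):
    """GOP sketch for one aligned phone: average per-frame log-likelihood
    gap between the best-scoring phone model (free phone loop) and the
    expected phone (forced alignment) over frames [start, end)."""
    seg = frame_loglik[:, start:end]   # rows: phone models, cols: frames
    best = seg.max(axis=0).sum()       # unconstrained best phone per frame
    expected = seg[phone].sum()        # score of the expected phone
    return (best - expected) / (end - start)

# Hypothetical per-frame log-likelihoods for 3 phone models, 6 frames.
ll = np.array([[-1.0, -1.2, -0.9, -5.0, -5.5, -6.0],   # phone 0
               [-4.0, -4.5, -4.2, -1.1, -1.0, -0.8],   # phone 1
               [-3.0, -3.5, -3.1, -2.0, -2.2, -2.5]])  # phone 2
print(gop(ll, phone=0, start=0, end=3))  # canonical realization -> 0.0
print(gop(ll, phone=0, start=3, end=6))  # deviant realization -> large score
```

Mispronunciation detection then thresholds these scores, trading off false rejections against false acceptances as in the ~30% equal-error operating point reported above.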
