
2 research outputs found

Hybrid language models for speech transcription

Authors: Jouvet Denis, Orosanu Luiza
Publication venue: HAL CCSD
Publication date: 14/09/2014

This paper analyzes the use of hybrid language models for automatic speech transcription. The goal is to later use such an approach to support communication with deaf people, and to run it on an embedded decoder on a portable device, which constrains the model size. The main linguistic units considered for this task are words and syllables. Various lexicon sizes are studied by setting thresholds on word occurrence frequencies in the training data; words that fall below the threshold are syllabified. With this kind of language model, between 62% and 96% of the units output by the recognizer are words, depending on the frequency threshold; the other recognized lexical units are syllables. By setting different thresholds on the confidence measures associated with the recognized words, the most reliable word hypotheses can be identified, and these have correct recognition rates between 70% and 92%.
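The lexicon construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `_` prefix marking syllable units, and the `toy_syllabify` helper (which merely cuts words into two-character chunks, not real syllables) are assumptions made for illustration only.

```python
from collections import Counter


def toy_syllabify(word):
    # Hypothetical stand-in: fixed two-character chunks, NOT real syllabification.
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def build_hybrid_corpus(sentences, min_count, syllabify=toy_syllabify):
    """Keep words seen at least min_count times; replace the rest by syllables."""
    counts = Counter(w for sent in sentences for w in sent)
    vocab = {w for w, c in counts.items() if c >= min_count}
    hybrid = []
    for sent in sentences:
        units = []
        for w in sent:
            if w in vocab:
                units.append(w)
            else:
                # Syllable units carry a "_" prefix so they stay distinct from words.
                units.extend("_" + syl for syl in syllabify(w))
        hybrid.append(units)
    return vocab, hybrid


def word_ratio(hybrid):
    """Fraction of units in the hybrid corpus that are whole words."""
    units = [u for sent in hybrid for u in sent]
    return sum(not u.startswith("_") for u in units) / len(units)


def reliable_words(hypotheses, threshold):
    """Keep word hypotheses whose confidence reaches the threshold.

    hypotheses is a list of (unit, confidence) pairs; this format is assumed.
    """
    return [u for u, conf in hypotheses
            if not u.startswith("_") and conf >= threshold]
```

Raising `min_count` shrinks the word lexicon and pushes more of the corpus into syllable units, which is the trade-off between model size and word coverage that the abstract describes.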

INRIA a CCSD electronic archive server

Reconnaissance de la parole pour l’aide à la communication pour les sourds et malentendants

Author: Orosanu Luiza
Publication venue: HAL CCSD
Publication date: 11/12/2015

This thesis is part of the RAPSODIE project, which aims to propose a speech recognition device tailored to the needs of deaf and hearing-impaired people. Two aspects are studied: optimizing the lexical models and extracting para-lexical information.

Regarding lexical modeling, we focused on optimizing the choice of lexical units defining the vocabulary and the associated language model. We evaluated various lexical units, such as phonemes and words, and proposed the use of syllables. We also proposed a new approach based on the combination of words and syllables in a hybrid language model. This kind of model aims to ensure proper recognition of the most frequent words and to offer sequences of syllables for speech segments corresponding to out-of-vocabulary words. Another focus was on adding new words to the language model, in order to ensure proper recognition of words specific to a given domain. We proposed and evaluated a new approach based on a principle of similarity between words: two words are similar if they have similar neighbor distributions. The approach involves three steps: using a few example sentences that include the new word, looking for in-vocabulary words similar to the new word, and defining the n-grams associated with the new word based on the n-grams of its similar in-vocabulary words.

Regarding the extraction of para-lexical information, we focused mainly on the detection of questions and statements, in order to inform deaf and hearing-impaired people when a question is addressed to them. In our study, several approaches were analyzed using only prosodic features (extracted from the audio signal), using only linguistic features (extracted from word sequences and sequences of POS tags), or using both types of information.
The evaluation of the classifiers is performed using linguistic and prosodic features (alone or in combination) extracted from automatic transcriptions (to study the performance under real conditions) and from manual transcriptions (to study the performance under ideal conditions).

This thesis is part of the RAPSODIE project, whose objective is to provide speech recognition specialized for the needs of deaf and hearing-impaired people. Two lines of work are studied: lexical modeling and the extraction of para-lexical information.

Regarding lexical modeling, we examined the choice of lexical units defining the lexicon and the associated language model. We evaluated different lexical units, such as phonemes and words, and proposed the use of syllables. We also proposed a new approach based on combining words and syllables in a single language model, called a hybrid model. Such a model aims to ensure correct recognition of the most frequent words and to propose sequences of syllables for speech segments corresponding to out-of-vocabulary words. To ensure good recognition of words specific to a given domain, we studied in depth the addition of new words to the language model. We proposed and evaluated a new approach based on a principle of similarity between words: two words are considered similar if they have similar neighbor distributions. The approach involves several steps: using a few example sentences for the new word, searching the language model for words similar to the new word, then defining the n-grams associated with the new word from the n-grams of the similar words.

Regarding the extraction of para-lexical information, we focused mainly on the detection of questions and statements, in order to signal to deaf or hearing-impaired people when a question is addressed to them. In our study, several approaches were analyzed, relying on prosodic features (extracted from the audio signal), on linguistic features (extracted from sequences of words and of part-of-speech classes), or on both types of information. The information is extracted from the audio signals and from either automatic or manual transcriptions, which makes it possible to compare classifier performance under these two conditions (with or without word errors).
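
The three steps for adding a new word, as summarized in the abstract, might be sketched as follows. This is an illustrative assumption, not the method evaluated in the thesis: cosine similarity over left/right neighbor counts is a stand-in for whatever similarity measure the thesis actually uses, only bigrams are handled, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def neighbor_counts(sentences, target):
    """Count the left and right neighbors of target, tagged by side."""
    counts = Counter()
    for sent in sentences:
        padded = ["<s>"] + list(sent) + ["</s>"]
        for i, w in enumerate(padded):
            if w == target:
                counts[("L", padded[i - 1])] += 1
                counts[("R", padded[i + 1])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(new_word, examples, corpus, vocab, top_k=1):
    """Steps 1 and 2: profile the new word from a few example sentences,
    then rank in-vocabulary words by neighbor-distribution similarity."""
    profile = neighbor_counts(examples, new_word)
    scored = [(cosine(profile, neighbor_counts(corpus, w)), w) for w in vocab]
    scored.sort(reverse=True)
    return [w for _, w in scored[:top_k]]


def borrow_bigrams(new_word, similar_words, bigram_counts):
    """Step 3: derive bigrams for the new word from those of its similar words."""
    borrowed = Counter()
    for (w1, w2), c in bigram_counts.items():
        n1 = new_word if w1 in similar_words else w1
        n2 = new_word if w2 in similar_words else w2
        if (n1, n2) != (w1, w2):
            borrowed[(n1, n2)] += c
    return borrowed
```

In a real system the borrowed counts would still have to be turned into smoothed n-gram probabilities and merged into the existing language model; that step is left out of this sketch.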

Thèses en Ligne

INRIA a CCSD electronic archive server

HAL-Rennes 1