9 research outputs found

    Combinaison de mots et de syllabes pour transcrire la parole

    Combining words and syllables for speech transcription. This paper analyzes the use of hybrid language models for automatic speech transcription. The goal is to later use such an approach to support communication with deaf people, and to run it on an embedded decoder on a portable device, which constrains the model size. The linguistic units considered for this task are words and syllables. Various lexicon sizes are studied by setting thresholds on word occurrence frequencies in the training data; words below the threshold are syllabified. With this kind of language model, the recognizer outputs between 69% and 96% of the words as words (the remaining words are represented by syllables). By setting different thresholds on the confidence measures associated with the recognized words, the most reliable word hypotheses can be identified; these have correct recognition rates between 70% and 92%.
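The frequency-threshold idea in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the toy corpus, and the `TOY_SYLLABLES` table (standing in for a real syllabifier) are all assumptions made for the example.

```python
from collections import Counter

def build_lexicon(corpus, min_count):
    """Keep the words seen at least min_count times in the training sentences."""
    counts = Counter(word for sentence in corpus for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}

def to_hybrid_units(sentence, lexicon, syllabify):
    """Keep in-lexicon words whole; replace every other word with its syllables."""
    units = []
    for word in sentence:
        if word in lexicon:
            units.append(word)
        else:
            units.extend(syllabify(word))
    return units

# Hypothetical syllable table; a real system would use a proper syllabifier.
TOY_SYLLABLES = {"syllabe": ["sy", "llabe"], "parole": ["pa", "role"]}

def toy_syllabify(word):
    """Look the word up in the toy table, falling back to the word itself."""
    return TOY_SYLLABLES.get(word, [word])

corpus = [["la", "parole", "et", "la", "syllabe"], ["la", "parole"]]
lexicon = build_lexicon(corpus, min_count=2)
hybrid = to_hybrid_units(corpus[0], lexicon, toy_syllabify)
```

Raising `min_count` shrinks the lexicon and pushes more words into syllable form, which is the trade-off between model size and word coverage that the abstract describes.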

    TEXT-TO-SPEECH CONVERSION (FOR BAHASA MELAYU)

    Text-to-Speech (TTS) is an application that reads a given text aloud for the user. This project focuses on creating a TTS system that reads text in Standard Malay (Bahasa Melayu). It was motivated by the lack of computer-aided learning (CAL) tools that emphasize Malay linguistics, and by the misconception that English-based TTS can be used to read Bahasa Melayu text. The end result is a TTS conversion prototype for Bahasa Melayu that reads by syllable: it syllabifies text using the Maximum Onset Principle (MOP) and produces syllable-sounding speech using a syllable-to-sound mapping method.
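As a rough illustration of the Maximum Onset Principle, the sketch below assigns to each syllable the longest legal onset available from the consonants between two vowels, and leaves the remaining consonants as the coda of the previous syllable. The onset inventory `LEGAL_ONSETS` is a hypothetical, simplified list, and every vowel letter is treated as its own nucleus (diphthongs are ignored); the project's actual rules may differ.

```python
VOWELS = set("aeiou")
# Hypothetical, simplified onset inventory: single consonants plus a few digraphs.
LEGAL_ONSETS = set("bcdfghjklmnprstwyz") | {"ng", "ny", "sy", "kh"}

def syllabify(word):
    """Split a lowercase word into syllables using the Maximum Onset Principle."""
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not nuclei:
        return [word]
    boundaries = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        cluster = word[prev + 1:nxt]
        # Try the whole cluster as the next onset, then shrink it from the left
        # until what remains is a legal onset (or nothing remains).
        split = 0
        while split < len(cluster) and cluster[split:] not in LEGAL_ONSETS:
            split += 1
        boundaries.append(prev + 1 + split)
    boundaries.append(len(word))
    return [word[a:b] for a, b in zip(boundaries, boundaries[1:])]
```

For example, `syllabify("tempat")` keeps `m` as the coda of the first syllable because `mp` is not in the onset list, while `syllabify("bunga")` moves the whole digraph `ng` into the second syllable.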

    Problemas de compreensão oral inglesa por brasileiros

    Advisor: Michael Watkins. Dissertation (master's), Universidade Federal do Paraná, Setor de Ciências Humanas, Letras e Artes, Curso de Pós-Graduação em Estudos Linguísticos. Defense: Curitiba, 27/08/2004. Includes bibliography. Area of concentration: theories of second language acquisition.
    Abstract: The present thesis reviews some L1 and L2 speech comprehension models and summarizes some problems faced by second language listeners. It also looks at the phonological differences between English and Brazilian Portuguese concerning syllable structure, stress, rhythm and phonological modifications. The research consists of an investigation of some mistakes made by Brazilian students of English as a foreign language when listening to an authentic oral text in L2. An interview with an American speaker was used as the test material; it was cut into 92 phrases with periods of silence inserted between them, so that the participants could listen and transcribe immediately. The analysis of these phrases showed that, although problems caused by phonic aspects were the majority, gaps in syntactic and lexical knowledge were also very important. It was further shown that the participants' mistakes follow a pattern, such as maintaining the metrical structure, maintaining the onset of stressed words and/or syllables, and a tendency to hear a sound whose place of articulation is the same as or close to that of the sound uttered. The research confirms the difficulty students face when listening to English outside the classroom.

    Reconnaissance de la parole pour l’aide à la communication pour les sourds et malentendants

    This thesis is part of the RAPSODIE project, which aims to provide a speech recognition device specialized for the needs of deaf and hearing-impaired people. Two aspects are studied: optimizing the lexical models and extracting para-lexical information.
    Regarding lexical modeling, we focused on optimizing the choice of the lexical units that define the vocabulary and the associated language model. We evaluated various lexical units, such as phonemes and words, and proposed the use of syllables. We also proposed a new approach based on the combination of words and syllables in a single hybrid language model. This kind of model aims to ensure proper recognition of the most frequent words and to offer sequences of syllables for speech segments corresponding to out-of-vocabulary words. Another focus was on adding new words to the language model, in order to ensure proper recognition of words specific to a given domain. We proposed and evaluated a new approach based on a principle of similarity between words: two words are similar if they have similar neighbor distributions. The approach involves three steps: using a few example sentences that include the new word, looking for in-vocabulary words similar to the new word, and defining the n-grams associated with the new word based on the n-grams of its similar in-vocabulary words.
    Regarding the extraction of para-lexical information, we focused mainly on the detection of questions and statements, in order to inform deaf and hearing-impaired people when a question is addressed to them. In our study, several approaches were analyzed using only prosodic features (extracted from the audio signal), only linguistic features (extracted from word sequences and sequences of POS tags), or both types of information. The classifiers are evaluated using linguistic and prosodic features (alone or in combination) extracted from automatic transcriptions (to study the performance under real conditions, with word errors) and from manual transcriptions (to study the performance under ideal conditions, without word errors).
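The second step of the new-word approach, finding in-vocabulary words whose neighbor distributions resemble those of the new word, might look roughly like the sketch below. The abstract does not say how the distributions are compared; cosine similarity over immediate left and right neighbor counts is an assumption made here for illustration, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def neighbor_counts(sentences, target):
    """Count the words immediately to the left and right of target."""
    counts = Counter()
    for sentence in sentences:
        for i, word in enumerate(sentence):
            if word != target:
                continue
            if i > 0:
                counts[("L", sentence[i - 1])] += 1
            if i + 1 < len(sentence):
                counts[("R", sentence[i + 1])] += 1
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(new_word, examples, corpus, vocabulary, top_k=1):
    """Rank in-vocabulary words by neighbor similarity to the new word."""
    new_profile = neighbor_counts(examples, new_word)
    scored = [(cosine(new_profile, neighbor_counts(corpus, w)), w) for w in sorted(vocabulary)]
    scored.sort(key=lambda pair: -pair[0])  # stable sort: ties stay alphabetical
    return [w for _, w in scored[:top_k]]

corpus = [["i", "drink", "tea", "daily"], ["i", "drink", "coffee", "daily"], ["the", "cat", "sleeps"]]
examples = [["i", "drink", "matcha", "daily"]]
similar = most_similar("matcha", examples, corpus, {"tea", "coffee", "cat"}, top_k=2)
```

In the third step, the n-grams of the words returned by `most_similar` would serve as templates for the new word's n-grams; that part is not shown here.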