4 research outputs found

    Real Time Context-Independent Phone Recognition Using a Simplified Statistical Training Algorithm

    In this paper we present our real-time speaker-independent continuous phone recognition system (Spirit), which uses Context-Independent Continuous Density HMMs (CI-CDHMMs) modeled by Gaussian Mixture Models (GMMs). All the parameters of our system are estimated directly from data using an improved Viterbi alignment process instead of the classical Baum-Welch estimation procedure. In the literature, the Viterbi training algorithm is generally used only as a pretreatment to initialize HMM models, which are then most often re-estimated with more complex re-estimation formulas. To evaluate and compare the performance of our system with previous work, we use the TIMIT database. The decoding time of our recognition system ranges from 2 seconds (for short sentences) to 12 seconds (for long sentences) per sentence. By folding the 64 possible phones into 39 phonetic classes, we obtain a phone recognition correct rate of 71.06% and an accuracy rate of 65.25%. These results compare favorably with previously published work.
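The simplified training idea described above, replacing Baum-Welch's soft expectation step with a hard Viterbi alignment followed by direct parameter re-estimation, can be sketched as follows. This is a minimal illustration with a toy left-to-right HMM whose states emit single spherical Gaussians over 1-D observations; all names and values are hypothetical and not taken from the Spirit system.

```python
import numpy as np

def viterbi_align(obs, means, var=1.0):
    """Hard-align each frame to a state of a left-to-right HMM
    (uniform transition scores, one spherical Gaussian per state)."""
    T, S = len(obs), len(means)
    # Log-likelihood of each frame under each state's Gaussian.
    ll = -0.5 * (obs[:, None] - means[None, :]) ** 2 / var
    # Dynamic programming: delta[t, s] = best score ending in state s at t.
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0, 0] = ll[0, 0]
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1, s]
            move = delta[t - 1, s - 1] if s > 0 else -np.inf
            back[t, s] = s if stay >= move else s - 1
            delta[t, s] = max(stay, move) + ll[t, s]
    # Backtrace from the final state.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return np.array(path[::-1])

def viterbi_train(obs, means, iters=5):
    """Segmental (Viterbi) training: align, then re-estimate each
    state's mean directly from the frames assigned to it."""
    means = means.astype(float).copy()
    for _ in range(iters):
        path = viterbi_align(obs, means)
        for s in range(len(means)):
            frames = obs[path == s]
            if len(frames):
                means[s] = frames.mean()
    return means

# Toy data: three clusters of frames, one per HMM state.
obs = np.array([0.1, 0.0, 0.2, 4.9, 5.1, 5.0, 9.8, 10.2])
means = viterbi_train(obs, np.array([0.0, 4.0, 8.0]))
```

Unlike Baum-Welch, each frame contributes to exactly one state, so the updates are simple averages over the aligned segments rather than posterior-weighted sums.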

    Statistical Recognition of Continuous Speech for Laryngeal and Alaryngeal Voice (Reconnaissance Statistique de la Parole Continue pour Voix Laryngée et Alaryngée)

    Automatic Speech Recognition (ASR) has always been a scientific challenge. Many research efforts have been made over recent years to offer solutions and assistive systems that carry out various tasks previously reserved for humans. Speech is the most natural mode of communication and an easy way to exchange information between humans. A laryngectomized person lacks the ability to speak normally because he or she has lost the vocal cords after a surgical ablation of the larynx. Thus, the patient loses the ability to phonate. Only re-education by a speech therapist allows this person to acquire a substitution voice called "esophageal". Unlike laryngeal (normal) speech, esophageal (alaryngeal) speech is hoarse and weak in intensity and intelligibility, which makes it difficult to understand. The goal of this thesis is the implementation of an automatic esophageal (alaryngeal) speech recognition system. This system should be able to recover most of the phonetic information contained in the esophageal speech signal. The decoding part of this system, connected to a text-to-speech synthesizer, should allow the reconstruction of a laryngeal voice. Such a system would permit laryngectomees an easier oral communication with other people. Our first contribution concerns the development of an automatic laryngeal speech recognition system using hidden Markov models. The few existing corpora of esophageal speech are not dedicated to recognition because of a lack of data (only a few dozen sentences are recorded in practice). For this reason, we designed our own database dedicated to esophageal speech recognition, containing 480 sentences spoken by a laryngectomized speaker. In the second part, this laryngeal speech recognition system was adapted and applied to esophageal speech. Our last contribution concerns the realization of a hybrid system (correction = conversion + recognition) based on voice conversion, projecting the acoustic feature vectors of esophageal speech into a less disturbed space related to laryngeal speech. We demonstrate that this hybrid system is able to improve the recognition of alaryngeal speech.

    Improving the recognition of pathological voice using the discriminant HLDA transformation

    In this paper, we propose a simple and fast method for evaluating pathological (esophageal) voice by applying continuous speech recognition, in speaker-dependent mode, to our own pathological-voice database, which we call FPSD (French Pathological Speech Database). The recognition system is implemented with the HTK platform and is based on HMM/GMM monophone models. The acoustic vectors are linearly transformed by HLDA (Heteroscedastic Linear Discriminant Analysis) to project them into a smaller space with good discriminative properties. The obtained phone recognition rate (63.59%) is very promising, given that esophageal voice contains unnatural sounds that are difficult to understand.
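HLDA itself models per-class covariances and is usually estimated by maximum likelihood; as a rough, simpler stand-in, classical Fisher LDA illustrates the same idea of projecting acoustic feature vectors into a smaller space that preserves class discrimination. The sketch below is illustrative only, using hypothetical synthetic data rather than the FPSD setup.

```python
import numpy as np

def lda_projection(X, y, p):
    """Fisher LDA projection to p dimensions (a simpler relative of
    HLDA, which additionally drops the shared-covariance assumption)."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))  # within-class scatter
    Sb = np.zeros_like(Sw)                   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mu)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # Directions maximizing between-class over within-class scatter.
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:p]].real  # columns form the projection basis

rng = np.random.default_rng(0)
# Two classes separated along the first axis; noise elsewhere.
X0 = rng.normal([0, 0, 0], 0.1, (50, 3))
X1 = rng.normal([3, 0, 0], 0.1, (50, 3))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)
W = lda_projection(X, y, 1)  # project 3-D features down to 1-D
z = X @ W
```

After projection the two classes stay well separated in the reduced space, which is the property the abstract exploits for the acoustic vectors.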

    A preliminary study on improving the recognition of esophageal speech using a hybrid system based on statistical voice conversion

    In this paper, we propose a hybrid system based on a modified statistical GMM voice conversion algorithm for improving the recognition of esophageal speech. This hybrid system aims to compensate for the distorted information present in the esophageal acoustic features by means of voice conversion. The esophageal speech is converted toward a "target" laryngeal speech using an iterative statistical estimation of a transformation function. We do not apply a speech synthesizer to reconstruct the converted speech signal: the converted Mel-cepstral vectors are used directly as input to our speech recognition system. Furthermore, the feature vectors are linearly transformed by HLDA (Heteroscedastic Linear Discriminant Analysis) to project them into a smaller space with good discriminative properties. The experimental results demonstrate that the proposed system improves phone recognition accuracy by an absolute 3.40% compared with the accuracy obtained with neither HLDA nor voice conversion.
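The core mapping step in statistical GMM voice conversion is the conditional expectation E[y|x] under a joint model over stacked source/target feature vectors. A minimal sketch of the single-Gaussian special case is shown below, with hypothetical synthetic frames standing in for esophageal (source) and laryngeal (target) features; the full method mixes several such components and estimates them iteratively.

```python
import numpy as np

def convert(x, mu_x, mu_y, S_xx, S_yx):
    """Map a source frame to the target space via the conditional mean
    E[y|x] of a joint Gaussian over stacked (source, target) features:
    the single-component case of GMM-based voice conversion."""
    return mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x)

# Hypothetical paired training frames: the "target" speaker is a
# fixed linear transform of the source, so the mapping is recoverable.
rng = np.random.default_rng(1)
x = rng.normal(0, 1, (500, 2))
y = x @ np.array([[2.0, 0.0], [0.0, 0.5]]) + 1.0

# Fit the joint Gaussian over stacked [source; target] vectors,
# then split its mean and covariance into blocks.
z = np.hstack([x, y])
mu = z.mean(axis=0)
S = np.cov(z.T)
mu_x, mu_y = mu[:2], mu[2:]
S_xx, S_yx = S[:2, :2], S[2:, :2]

out = convert(np.array([1.0, 1.0]), mu_x, mu_y, S_xx, S_yx)
```

Because the converted vectors live in the target feature space, they can be fed directly to a recognizer, as the abstract does with the converted Mel-cepstral vectors, without ever resynthesizing a waveform.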