5 research outputs found

    Time-Domain Isolated Phoneme Classification Using Reconstructed Phase Spaces

    Get PDF
    This paper introduces a novel time-domain approach to modeling and classifying speech phoneme waveforms. The approach is based on statistical models of reconstructed phase spaces, which offer significant theoretical benefits as representations that are known to be topologically equivalent to the state dynamics of the underlying production system. The lag and dimension parameters of the reconstruction process for speech are examined in detail, comparing common estimation heuristics for these parameters with corresponding maximum likelihood recognition accuracy over the TIMIT data set. Overall accuracies are compared with a Mel-frequency cepstral baseline system across five different phonetic classes within TIMIT, and a composite classifier using both cepstral and phase space features is developed. Results indicate that although the accuracy of the phase space approach by itself is still currently below that of baseline cepstral methods, a combined approach is capable of increasing speaker independent phoneme accuracy

    Evolutionary Speech Recognition

    Get PDF
    Automatic speech recognition systems are becoming ever more common and are increasingly deployed in more variable acoustic conditions, by very different speakers. So these systems, generally conceived in a laboratory, must be robust in order to provide optimal performance in real situations. This article explores the possibility of gaining robustness by designing speech recognition systems able to auto-modify in real time, in order to adapt to the changes of acoustic environment. As a starting point, the adaptive capacities of living organisms were considered in relation to their environment. Analogues of these mechanisms were then applied to automatic speech recognition systems. It appeared to be interesting to imagine a system adapting to the changing acoustic conditions in order to remain effective regardless of its conditions of use

    Investigating Stranded GMM for Improving Automatic Speech Recognition

    Get PDF
    International audienceThis paper investigates recently proposed Stranded Gaussian Mixture acoustic Model (SGMM) for Automatic Speech Recognition (ASR). This model extends conventional hidden Markov model (HMM-GMM) by explicitly introducing dependencies between components of the observation Gaussian mixture densities. The main objective of the paper is to experimentally study, how useful SGMM can be for dealing with data, which contains different sources of acoustic variability. First studied sources of variability are age and gender in quiet environment (TIdigits task including child speech). Second, the SGMM modeling is applied on data produced by different speakers and corrupted by non-stationary noise (CHiME 2013 challenge data). Finally, SGMM is applied on the same noisy data, but after performing speech enhancement (i.e., the remaining variability mostly comes from residual noise and different speakers). Although SGMM was originally proposed for robust speech recognition of noisy data, in this work it was found, that the model is more efficient for handling speaker variability in quiet environment

    L'analyse factorielle pour la modélisation acoustique des systèmes de reconnaissance de la parole

    Get PDF
    Dans cette thèse, nous proposons d utiliser des techniques fondées sur l analyse factorielle pour la modélisation acoustique pour le traitement automatique de la parole, notamment pour la Reconnaissance Automatique de la parole. Nous nous sommes, dans un premier temps, intéressés à la réduction de l empreinte mémoire des modèles acoustiques. Notre méthode à base d analyse factorielle a démontré une capacité de mutualisation des paramètres des modèles acoustiques, tout en maintenant des performances similaires à celles des modèles de base. La modélisation proposée nous conduit à décomposer l ensemble des paramètres des modèles acoustiques en sous-ensembles de paramètres indépendants, ce qui permet une grande flexibilité pour d éventuelles adaptations (locuteurs, genre, nouvelles tâches).Dans les modélisations actuelles, un état d un Modèle de Markov Caché (MMC) est représenté par un mélange de Gaussiennes (GMM : Gaussian Mixture Model). Nous proposons, comme alternative, une représentation vectorielle des états : les fac- teur d états. Ces facteur d états nous permettent de mesurer efficacement la similarité entre les états des MMC au moyen d une distance euclidienne, par exemple. Grâce à cette représenation vectorielle, nous proposons une méthode simple et efficace pour la construction de modèles acoustiques avec des états partagés. Cette procédure s avère encore plus efficace dans le cas de langues peu ou très peu dotées en ressouces et enconnaissances linguistiques. Enfin, nos efforts se sont portés sur la robustesse des systèmes de reconnaissance de la parole face aux variabilités acoustiques, et plus particulièrement celles générées par l environnement. Nous nous sommes intéressés, dans nos différentes expérimentations, à la variabilité locuteur, à la variabilité canal et au bruit additif. Grâce à notre approche s appuyant sur l analyse factorielle, nous avons démontré la possibilité de modéliser ces différents types de variabilité acoustique nuisible comme une composante additive dans le domaine cepstral. Nous soustrayons cette composante des vecteurs cepstraux pour annuler son effet pénalisant pour la reconnaissance de la paroleIn this thesis, we propose to use techniques based on factor analysis to build acoustic models for automatic speech processing, especially Automatic Speech Recognition (ASR). Frstly, we were interested in reducing the footprint memory of acoustic models. Our factor analysis-based method demonstrated that it is possible to pool the parameters of acoustic models and still maintain performance similar to the one obtained with the baseline models. The proposed modeling leads us to deconstruct the ensemble of the acoustic model parameters into independent parameter sub-sets, which allow a great flexibility for particular adaptations (speakers, genre, new tasks etc.). With current modeling techniques, the state of a Hidden Markov Model (HMM) is represented by a combination of Gaussians (GMM : Gaussian Mixture Model). We propose as an alternative a vector representation of states : the factors of states. These factors of states enable us to accurately measure the similarity between the states of the HMM by means of an euclidean distance for example. Using this vector represen- tation, we propose a simple and effective method for building acoustic models with shared states. This procedure is even more effective when applied to under-resourced languages. Finally, we concentrated our efforts on the robustness of the speech recognition sys- tems to acoustic variabilities, particularly those generated by the environment. In our various experiments, we examined speaker variability, channel variability and additive noise. Through our factor analysis-based approach, we demonstrated the possibility of modeling these different types of acoustic variability as an additive component in the cepstral domain. By compensation of this component from the cepstral vectors, we are able to cancel out the harmful effect it has on speech recognitionAVIGNON-Bib. numérique (840079901) / SudocSudocFranceF

    Aportación a la extracción paramétrica en reconocimiento de voz robusto basada en la aplicación de conocimiento de fonética acústica

    Full text link
    This thesis is based on the following hypothesis: the introduction of direct knowledge from the acoustic-phonetic field to the speech recognition problem, especially in the feature extraction step, may constitute a solid base of analysis for the determination of the behavior and capabilities of those systems and their improvement, as well. Most of the complexity of this Ph.D. thesis comes from the different subjects related with the speech processing área. The application of acoustic-phonetic information to the speech recognition research área implies a deep knowledge of both subjects. The research carried out in this work has been divided in two main parts: analysis of the current feature extraction methods and a study of several possible procedures about the incorporation of phonetic-acoustic knowledge to those systems. Abundant recognition and related quality measure results are presented for 50 different parameter extraction models. Details about the real-time implementation on a DSP platform (TMS3230C31-60) of two different parameter extraction models are presented. Finally, a set of computer tools developed for building and testing new speech recognition systems has been produced. Besides, the application of several results from this work can be extended to other speech processing áreas, such as computer assisted language learning, linguistic rehabilitation, etc.---ABSTRACT---La hipótesis en la que se basa el desarrollo de esta tesis, se centra en la suposición de que la aportación de conocimiento directo, proveniente del campo de la fonética acústica, al problema del reconocimiento automático de la voz, en concreto a la etapa de extracción de características, puede constituir una base sólida con la que poder analizar el comportamiento y capacidad de discriminación de dichos sistemas, así como una forma de mejorar sus prestaciones. Parte de la complejidad que presenta esta tesis doctoral, viene motivada por las diferentes disciplinas que están relacionadas con el área de procesamiento de la voz. La aplicación de información fonética-acústica al campo de investigación del reconocimiento del habla requiere un amplio conocimiento de ambas materias. Las investigaciones desarrolladas en este trabajo se han dividido en dos bloques fundamentales: análisis de los métodos actuales de extracción de rasgos fonéticos y un estudio de algunas posibles formas de incorporación de conocimiento fonético-acústico a dichos sistemas. En esta tesis se ofrecen abundantes resultados relativos a tasas de reconocimiento y medidas acerca de la calidad de este proceso, para un total de 50 modelos de extracción de parámetros. Así mismo se incluyen los detalles de la implementación en tiempo real para una plataforma DSP, en concreto TMS320C31-60, de dos diferentes modelos de extracción de rasgos. Además, se ha desarrollado un conjunto de las herramientas informáticas que pueden servir de base para construir y validar de forma sencilla, nuevos sistemas de reconocimiento. La aplicación de algunos de los resultados del trabajo puede extenderse también a otras áreas del tratamiento de la voz, tales como la enseñanza de una segunda lengua, logopedia, etc
    corecore