20 research outputs found

    Design of reservoir computing systems for the recognition of noise corrupted speech and handwriting

    Large vocabulary recognition for online Turkish handwriting with sublexical units

    We present a system for large vocabulary recognition of online Turkish handwriting, using hidden Markov models. While using a traditional approach for the recognizer, we have identified and developed solutions for the main problems specific to Turkish handwriting recognition. First, since large amounts of Turkish handwriting samples are not available, the system is trained and optimized using the large UNIPEN dataset of English handwriting, before extending it to Turkish using a small Turkish dataset. The delayed strokes, which pose a significant source of variation in writing order due to the large number of diacritical marks in Turkish, are removed during preprocessing. Finally, as a solution to the high out-of-vocabulary rates encountered when using a fixed size lexicon in general purpose recognition, a lexicon is constructed from sublexical units (stems and endings) learned from a large Turkish corpus. A statistical bigram language model learned from the same corpus is also applied during the decoding process. The system obtains a 91.7% word recognition rate when tested on a small Turkish handwritten word dataset using a medium sized (1950 words) lexicon corresponding to the vocabulary of the test set and 63.8% using a large, general purpose lexicon (130,000 words). However, with the proposed stem+ending lexicon (12,500 words) and bigram language model with lattice expansion, a 67.9% word recognition accuracy is obtained, surpassing the results obtained with the general purpose lexicon while using a much smaller one

    Support Vector Machines for Speech Recognition

    Hidden Markov models (HMM) with Gaussian mixture observation densities are the dominant approach in speech recognition. These systems typically use a representational model for acoustic modeling which can often be prone to overfitting and does not translate to improved discrimination. We propose a new paradigm centered on principles of structural risk minimization using a discriminative framework for speech recognition based on support vector machines (SVMs). SVMs have the ability to simultaneously optimize the representational and discriminative ability of the acoustic classifiers. We have developed the first SVM-based large vocabulary speech recognition system that improves performance over traditional HMM-based systems. This hybrid system achieves a state-of-the-art word error rate of 10.6% on a continuous alphadigit task, a 10% improvement relative to an HMM system. On SWITCHBOARD, a large vocabulary task, the system improves performance over a traditional HMM system from 41.6% word error rate to 40.6%. This dissertation discusses several practical issues that arise when SVMs are incorporated into the hybrid system
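
    The word error rate (WER) figures quoted in the abstract above mix absolute and relative improvements, so a small worked calculation may help when comparing them. The sketch below is illustrative only: the function name is hypothetical, and the alphadigit baseline is not stated in the abstract, so it is inferred here from the quoted "10% relative" figure (which may itself be rounded).

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# SWITCHBOARD figures quoted in the abstract: 41.6% -> 40.6% WER.
switchboard = relative_wer_reduction(41.6, 40.6)
print(f"SWITCHBOARD relative reduction: {switchboard:.1%}")

# Alphadigit task: the abstract gives the new WER (10.6%) and a 10% relative
# improvement. ASSUMPTION: treating that 10% as exact, the implied HMM
# baseline is new_wer / (1 - relative_reduction). This is an inference,
# not a number reported in the abstract.
implied_baseline = 10.6 / (1 - 0.10)
print(f"Implied alphadigit HMM baseline: {implied_baseline:.1f}% WER")
```

    Under these assumptions, the one-point absolute gain on SWITCHBOARD corresponds to a much smaller relative gain than the one reported for the alphadigit task.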

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    Deep Learning for Distant Speech Recognition

    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing some original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be the key for counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called network of deep neural networks. The analysis of the original concepts was based on extensive experimental validations conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noisy conditions, and ASR tasks. Comment: PhD Thesis Unitn, 201

    Hidden Markov Models

    Hidden Markov Models (HMMs), although known for decades, are now widely used and are still under active development. This book presents theoretical issues and a variety of HMM applications in speech recognition and synthesis, medicine, neurosciences, computational biology, bioinformatics, seismology, environment protection and engineering. I hope that readers will find this book useful and helpful for their own research

    Aportaciones al reconocimiento automático de texto manuscrito

    This thesis studies the problem of robustness in off-line automatic handwritten text recognition systems. Automatic handwritten text recognition systems will be mature enough for widespread use when they can offer reasonable productivity to any user, without any kind of preparation or training in their use. It is therefore necessary to build systems that are flexible and robust with respect to their input, so that the writer is not required to make any extra effort beyond what writing for a human reader would take. The purpose of signal preprocessing is to make the system invariant to sources of variability that do not help classification. At present there is no general solution for achieving invariance to writing style, and each system develops its own ad hoc. This thesis explores different methods for normalizing the off-line input signal. To that end, it presents a broad study of preprocessing algorithms, both at the level of the whole image (thresholding, noise reduction, and skew correction) and at the text level (slope, slant, and character size normalization). Writer-dependent systems achieve better recognition rates than writer-independent ones. On the other hand, it is easier to gather training samples for writer-independent systems. This thesis studies the adaptation of writer-independent systems for use by a single writer, with the aim that a few samples produced by that writer improve the system's productivity (for that writer) or, equivalently, that the writer can write in a more relaxed way without the system losing productivity. Handwritten text recognition systems are not free of errors. It is of interest to know not only the number of errors that will be produced
    Pastor Gadea, M. (2007). Aportaciones al reconocimiento automático de texto manuscrito [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1832