2,090 research outputs found

    SVMs for Automatic Speech Recognition: a Survey

    Get PDF
    Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, nowadays, the preponderance of Markov Models is a fact. During the last decade, however, a new tool appeared in the field of machine learning that has proved to be able to cope with hard classification problems in several fields of application: the Support Vector Machines (SVMs). The SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is that with maximum margin; they are capable to deal with samples of a very higher dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weakness in the ASR context and make a review of the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed for original SVMs. Afterwards we explore more sophisticated techniques based on the use of kernels capable to deal with sequences of different length. Among them is the DTAK kernel, simple and effective, which rescues an old technique of speech recognition: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research

    Phoneme Recognition on the TIMIT Database

    Get PDF

    Speech and neural network dynamics

    Get PDF

    An acoustic-phonetic approach in automatic Arabic speech recognition

    Get PDF
    In a large vocabulary speech recognition system the broad phonetic classification technique is used instead of detailed phonetic analysis to overcome the variability in the acoustic realisation of utterances. The broad phonetic description of a word is used as a means of lexical access, where the lexicon is structured into sets of words sharing the same broad phonetic labelling. This approach has been applied to a large vocabulary isolated word Arabic speech recognition system. Statistical studies have been carried out on 10,000 Arabic words (converted to phonemic form) involving different combinations of broad phonetic classes. Some particular features of the Arabic language have been exploited. The results show that vowels represent about 43% of the total number of phonemes. They also show that about 38% of the words can uniquely be represented at this level by using eight broad phonetic classes. When introducing detailed vowel identification the percentage of uniquely specified words rises to 83%. These results suggest that a fully detailed phonetic analysis of the speech signal is perhaps unnecessary. In the adopted word recognition model, the consonants are classified into four broad phonetic classes, while the vowels are described by their phonemic form. A set of 100 words uttered by several speakers has been used to test the performance of the implemented approach. In the implemented recognition model, three procedures have been developed, namely voiced-unvoiced-silence segmentation, vowel detection and identification, and automatic spectral transition detection between phonemes within a word. The accuracy of both the V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic segmentation procedure has been implemented, which exploits information from the above mentioned three procedures. Simple phonological constraints have been used to improve the accuracy of the segmentation process. The resultant sequence of labels are used for lexical access to retrieve the word or a small set of words sharing the same broad phonetic labelling. For the case of having more than one word-candidates, a verification procedure is used to choose the most likely one

    Evaluation of preprocessors for neural network speaker verification

    Get PDF

    Whole Word Phonetic Displays for Speech Articulation Training

    Get PDF
    The main objective of this dissertation is to investigate and develop speech recognition technologies for speech training for people with hearing impairments. During the course of this work, a computer aided speech training system for articulation speech training was also designed and implemented. The speech training system places emphasis on displays to improve children\u27s pronunciation of isolated Consonant-Vowel-Consonant (CVC) words, with displays at both the phonetic level and whole word level. This dissertation presents two hybrid methods for combining Hidden Markov Models (HMMs) and Neural Networks (NNs) for speech recognition. The first method uses NN outputs as posterior probability estimators for HMMs. The second method uses NNs to transform the original speech features to normalized features with reduced correlation. Based on experimental testing, both of the hybrid methods give higher accuracy than standard HMM methods. The second method, using the NN to create normalized features, outperforms the first method in terms of accuracy. Several graphical displays were developed to provide real time visual feedback to users, to help them to improve and correct their pronunciations

    A segment-based speaker verification system using SUMMIT

    Get PDF
    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.Includes bibliographical references (p. 75-79).by Sridevi Vedula Sarma.M.S
    • …
    corecore