30,307 research outputs found

    A Speech Recognition System Based Improved Algorithm of Dual-template HMM

    Get PDF
    AbstractThe hidden Markov (HMM) and speech recognition algorithm based this model were studied in the paper. In addition the model and recognition algorithm of HMM got be improved based on the traditional the HMM. In the process of modeling, through the training of multiple observe sequence to achieve the recognition of non-specific people, and according to the different number of HMM states to establish the double-template of rough and high precision, and through the second matching algorithm to achieve higher recognition rate. A speech recognition system combined MFCC parameters and HMM algorithm was constructed based improved HMM algorithm. Experimental result shown the speech recognition rate of large vocabulary of non-specific people was greatly improved

    Integrate template matching and statistical modeling for continuous speech recognition

    Get PDF
    Title from PDF of title page (University of Missouri--Columbia, viewed on May 30, 2012).The entire thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file; a non-technical public abstract appears in the public.pdf file.Dissertation advisor: Dr. Yunxin ZhaoVita.Ph. D. University of Missouri--Columbia 2011"December 2011"In this dissertation, a novel approach of integrating template matching with statistical modeling is proposed to improve continuous speech recognition. Commonly used Hidden Markov Models (HMMs) are ineffective in modeling details of speech temporal evolutions, which can be overcome by template-based methods. However, template-based methods are difficult to be extended in large vocabulary continuous speech recognition (LVCSR). Our proposed approach takes advantages of both statistical modeling and template matching to overcome the weaknesses of traditional HMMs and conventional template-based methods. We use multiple Gaussian Mixture Model indices to represent each frame of speech templates. The local distances of log likelihood ratio and Kullback-Leibler divergence are proposed for dynamic time warping based template matching. In order to reduce computational complexity and storage space, we propose methods of minimum distance template selection and maximum log-likelihood template selection, and investigate a template compression method on top of template selection to further improve recognition performance. Experimental results on the TIMIT phone recognition task and a LVCSR task of telehealth captioning demonstrated that the proposed approach significantly improved the performance of recognition accuracy over the HMM baselines, and on the TIMIT task, the proposed method showed consistent performance improvements over progressively enhanced HMM baselines. Moreover, the template selection methods largely reduced computation and storage complexities. Finally, an investigation was made to combine acoustic scores in triphone template matching with scores of prosodic features, which showed positive effects on vowels in LVCSR.Includes bibliographical reference

    Design and implementation of a user-oriented speech recognition interface: the synergy of technology and human factors

    Get PDF
    The design and implementation of a user-oriented speech recognition interface are described. The interface enables the use of speech recognition in so-called interactive voice response systems which can be accessed via a telephone connection. In the design of the interface a synergy of technology and human factors is achieved. This synergy is very important for making speech interfaces a natural and acceptable form of human-machine interaction. Important concepts such as interfaces, human factors and speech recognition are discussed. Additionally, an indication is given as to how the synergy of human factors and technology can be realised by a sketch of the interface's implementation. An explanation is also provided of how the interface might be integrated in different applications fruitfully

    Continuous Action Recognition Based on Sequence Alignment

    Get PDF
    Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be simultaneously carried out. We build on the well known dynamic time warping (DTW) framework and devise a novel visual alignment technique, namely dynamic frame warping (DFW), which performs isolated recognition based on per-frame representation of videos, and on aligning a test sequence with a model sequence. Moreover, we propose two extensions which enable to perform recognition concomitant with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous recognition of speech and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets widely used in action recognition (Hollywood-1 and Hollywood-2). We also compare the performances of the proposed isolated and continuous recognition algorithms with several recently published methods

    Statistical assessment of speech system performance

    Get PDF
    Methods for the normalization of performance tests results of speech recognition systems are presented. Technological accomplishments in speech recognition systems, as well as planned research activities are described
    • …
    corecore