122 research outputs found

    Fast speaker independent large vocabulary continuous speech recognition [online]

    Get PDF

    Connectionist model combination for large vocabulary speech recognition

    Get PDF
    Reports in the statistics and neural networks literature have expounded the benefits of merging multiple models to improve classification and prediction performance. The Cambridge University connectionist speech group has developed a hybrid connectionist-hidden Markov model system for large vocabulary talker independent speech recognition. The performance of this system has been greatly enhanced through the merging of connectionist acoustic models. This paper presents and compares a number of different approaches to connectionist model merging and evaluates them on the TIMIT phone recognition and ARPA Wall Street Journal word recognition tasks

    FEATURE AND SCORE LEVEL COMBINATION OF SUBSPACE GAUSSIANS IN LVCSR TASK

    Get PDF
    In this paper, we investigate employment of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, first, we focus on exploiting various types of complex features estimated using neural network combined with conventional cepstral features and modeled by standard HMM/GMMs and SGMMs. Then, outputs (word sequences) from individual recognizers trained using different features are also combined on a score-level using ROVER for the both acoustic modeling techniques. Experimental results indicate three important findings: (1) SGMMs consistently outperform HMM/GMMs (relative improvement on average by about 6% in terms of WER) when both techniques are exploited on single features; (2) SGMMs benefit much less from feature-level combination (1% relative improvement) as opposed to HMM/GMMs (4% relative improvement) which can eventually match the performance of SGMMs; (3) SGMMs can be significantly improved when individual systems are combined on a score-level. This suggests that the SGMM systems provide complementary recognition outputs. Overall relative improvements of the combined SGMM and HMM/GMM systems are 21% and 17% respectively compared to a standard ASR baseline

    Performance Analysis of Advanced Front Ends on the Aurora Large Vocabulary Evaluation

    Get PDF
    Over the past few years, speech recognition technology performance on tasks ranging from isolated digit recognition to conversational speech has dramatically improved. Performance on limited recognition tasks in noiseree environments is comparable to that achieved by human transcribers. This advancement in automatic speech recognition technology along with an increase in the compute power of mobile devices, standardization of communication protocols, and the explosion in the popularity of the mobile devices, has created an interest in flexible voice interfaces for mobile devices. However, speech recognition performance degrades dramatically in mobile environments which are inherently noisy. In the recent past, a great amount of effort has been spent on the development of front ends based on advanced noise robust approaches. The primary objective of this thesis was to analyze the performance of two advanced front ends, referred to as the QIO and MFA front ends, on a speech recognition task based on the Wall Street Journal database. Though the advanced front ends are shown to achieve a significant improvement over an industry-standard baseline front end, this improvement is not operationally significant. Further, we show that the results of this evaluation were not significantly impacted by suboptimal recognition system parameter settings. Without any front end-specific tuning, the MFA front end outperforms the QIO front end by 9.6% relative. With tuning, the relative performance gap increases to 15.8%. Finally, we also show that mismatched microphone and additive noise evaluation conditions resulted in a significant degradation in performance for both front ends

    CONNECTIONIST SPEECH RECOGNITION - A Hybrid Approach

    Get PDF

    Improving wordspotting performance with limited training data

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1995.Includes bibliographical references (leaves 149-155).by Eric I-Chao Chang.Ph.D

    Comparison of four approaches to automatic language identification of telephone speech

    Full text link

    The second 'CHiME' Speech Separation and Recognition Challenge: Datasets, tasks and baselines

    Get PDF
    International audienceDistant-microphone automatic speech recognition (ASR) remains a challenging goal in everyday environments involving multiple background sources and reverberation. This paper is intended to be a reference on the 2nd 'CHiME' Challenge, an initiative designed to analyze and evaluate the performance of ASR systems in a real-world domestic environment. Two separate tracks have been proposed: a small-vocabulary task with small speaker movements and a medium-vocabulary task without speaker movements. We discuss the rationale for the challenge and provide a detailed description of the datasets, tasks and baseline performance results for each track

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes
    • 

    corecore