468 research outputs found

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: representations of speech signals and methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other speech processing applications able to operate in real-world environments, such as mobile communication services and smart homes.

    Utterance verification in large vocabulary spoken language understanding system

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (leaves 87-89). By Huan Yao, M.Eng.

    Enhancing posterior based speech recognition systems

    The use of local phoneme posterior probabilities has been increasingly explored for improving speech recognition systems. Hybrid hidden Markov model / artificial neural network (HMM/ANN) and Tandem are the most successful examples of such systems. In this thesis, we present a principled framework for enhancing the estimation of local posteriors by integrating phonetic and lexical knowledge, as well as long contextual information. This framework allows for hierarchical estimation, integration and use of local posteriors from the phoneme up to the word level. We propose two approaches for enhancing the posteriors. In the first approach, phoneme posteriors estimated with an ANN (particularly a multi-layer perceptron, MLP) are used as emission probabilities in HMM forward-backward recursions. This yields new enhanced posterior estimates integrating HMM topological constraints (encoding specific phonetic and lexical knowledge) and long context. In the second approach, a temporal context of the regular MLP posteriors is post-processed by a secondary MLP, in order to learn inter- and intra-dependencies among the phoneme posteriors. The learned knowledge is integrated into the posterior estimation during the inference (forward pass) of the second MLP, resulting in enhanced posteriors. The use of the resulting local enhanced posteriors is investigated in a wide range of posterior-based speech recognition systems (e.g. Tandem and hybrid HMM/ANN), as a replacement for, or in combination with, the regular MLP posteriors. The enhanced posteriors consistently outperform the regular posteriors in different applications over small and large vocabulary databases.
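The first approach described above, smoothing per-frame MLP phoneme posteriors with HMM forward-backward recursions, can be sketched in plain Python. This is a minimal illustration, not the thesis's exact formulation: it assumes the MLP posteriors are plugged in directly as (scaled) emission scores, and all names are illustrative.

```python
def enhance_posteriors(post, A, pi):
    """Forward-backward smoothing of frame-level phoneme posteriors.

    post: T x N list of per-frame posteriors (used here as emission scores)
    A:    N x N transition matrix encoding the HMM topology
    pi:   length-N initial state distribution
    Returns T x N enhanced posteriors that integrate the HMM constraints
    and the full utterance context.
    """
    T, N = len(post), len(pi)
    # forward pass, with per-frame scaling to avoid underflow
    alpha = [[pi[j] * post[0][j] for j in range(N)]]
    c = [sum(alpha[0])]
    alpha[0] = [a / c[0] for a in alpha[0]]
    for t in range(1, T):
        row = [sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * post[t][j]
               for j in range(N)]
        c.append(sum(row))
        alpha.append([a / c[t] for a in row])
    # backward pass, reusing the forward scaling factors
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * post[t + 1][j] * beta[t + 1][j]
                       for j in range(N)) / c[t + 1]
                   for i in range(N)]
    # per-frame state posteriors (gamma), renormalised per frame
    gamma = []
    for t in range(T):
        g = [alpha[t][j] * beta[t][j] for j in range(N)]
        s = sum(g)
        gamma.append([x / s for x in g])
    return gamma
```

Each output row is a proper distribution over states, so the enhanced posteriors can replace the regular MLP posteriors directly in a Tandem or hybrid HMM/ANN pipeline.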

    Interactive translation of conversational speech


    Decision fusion for multi-modal person authentication.

    Hui Pak Sum Henry. Thesis (M.Phil.)--Chinese University of Hong Kong, 2006. Includes bibliographical references (leaves [147]-152). Abstracts in English and Chinese.

    Chapter 1. Introduction (p.1)
        1.1 Objectives (p.4)
        1.2 Thesis Outline (p.5)
    Chapter 2. Background (p.6)
        2.1 User Authentication Systems (p.6)
        2.2 Biometric Authentication (p.9)
            2.2.1 Speaker Verification System (p.9)
            2.2.2 Face Verification System (p.10)
            2.2.3 Fingerprint Verification System (p.11)
        2.3 Verbal Information Verification (VIV) (p.12)
        2.4 Combining SV and VIV (p.15)
        2.5 Biometric Decision Fusion Techniques (p.17)
        2.6 Fuzzy Logic (p.20)
            2.6.1 Fuzzy Membership Function and Fuzzy Set (p.21)
            2.6.2 Fuzzy Operators (p.22)
            2.6.3 Fuzzy Rules (p.22)
            2.6.4 Defuzzification (p.23)
            2.6.5 Advantage of Using Fuzzy Logic in Biometric Fusion (p.23)
        2.7 Chapter Summary (p.25)
    Chapter 3. Experimental Data (p.26)
        3.1 Data for Multi-biometric Fusion (p.26)
            3.1.1 Speech Utterances (p.30)
            3.1.2 Face Movement Video Frames (p.31)
            3.1.3 Fingerprint Images (p.32)
        3.2 Data for Speech Authentication Fusion (p.33)
            3.2.1 SV Training Data for Speaker Model (p.34)
            3.2.2 VIV Training Data for Speaker Independent Model (p.34)
            3.2.3 Validation Data (p.34)
        3.3 Chapter Summary (p.36)
    Chapter 4. Authentication Modules (p.37)
        4.1 Biometric Authentication (p.38)
            4.1.1 Speaker Verification (p.38)
            4.1.2 Face Verification (p.38)
            4.1.3 Fingerprint Verification (p.39)
            4.1.4 Individual Biometric Performance (p.39)
        4.2 Verbal Information Verification (VIV) (p.42)
        4.3 Chapter Summary (p.44)
    Chapter 5. Weighted Average Fusion for Multi-Modal Biometrics (p.46)
        5.1 Experimental Setup and Results (p.46)
        5.2 Analysis of Weighted Average Fusion Results (p.48)
        5.3 Chapter Summary (p.59)
    Chapter 6. Fully Adaptive Fuzzy Logic Decision Fusion Framework (p.61)
        6.1 Factors Considered in the Estimation of Biometric Sample Quality (p.62)
            6.1.1 Factors for Speech (p.63)
            6.1.2 Factors for Face (p.65)
            6.1.3 Factors for Fingerprint (p.70)
        6.2 Fuzzy Logic Decision Fusion Framework (p.76)
            6.2.1 Speech Fuzzy Sets (p.77)
            6.2.2 Face Fuzzy Sets (p.79)
            6.2.3 Fingerprint Fuzzy Sets (p.80)
            6.2.4 Output Fuzzy Sets (p.81)
            6.2.5 Fuzzy Rules and Other Information (p.83)
        6.3 Experimental Setup and Results (p.84)
        6.4 Comparison Between Weighted Average and Fuzzy Logic Decision Fusion (p.86)
        6.5 Chapter Summary (p.95)
    Chapter 7. Factors Affecting VIV Performance (p.97)
        7.1 Factors from Verbal Messages (p.99)
            7.1.1 Number of Distinct-Unique Responses (p.99)
            7.1.2 Distribution of Distinct-Unique Responses (p.101)
            7.1.3 Inter-person Lexical Choice Variations (p.103)
            7.1.4 Intra-person Lexical Choice Variations (p.106)
        7.2 Factors from Utterance Verification (p.108)
            7.2.1 Thresholding (p.109)
            7.2.2 Background Noise (p.113)
        7.3 VIV Weight Estimation Using PDP (p.115)
        7.4 Chapter Summary (p.119)
    Chapter 8. Adaptive Fusion for SV and VIV (p.121)
        8.1 Weighted Average Fusion of SV and VIV (p.122)
            8.1.1 Scores Normalization (p.123)
            8.1.2 Experimental Setup (p.123)
        8.2 Adaptive Fusion for SV and VIV (p.124)
            8.2.1 Components of Adaptive Fusion (p.126)
            8.2.2 Three Categories Design (p.129)
            8.2.3 Fusion Strategy for Each Category (p.132)
            8.2.4 SV Driven Approach (p.133)
        8.3 SV and Fixed-Pass Phrase VIV Fusion Results (p.133)
        8.4 SV and Key-Pass Phrase VIV Fusion Results (p.136)
        8.5 Chapter Summary (p.141)
    Chapter 9. Conclusions and Future Work (p.143)
        9.1 Conclusions (p.143)
        9.2 Future Work (p.145)
    Bibliography (p.147)
    Appendix A Detail of BSC Speech (p.153)
    Appendix B Fuzzy Rules for Multimodal Biometric Fusion (p.155)
    Appendix C Full Example for Multimodal Biometrics Fusion (p.157)
    Appendix D Reason for Having a Flat Error Surface (p.161)
    Appendix E Reason for Having a Relative Peak Point in the Middle of the Error Surface (p.164)
    Appendix F Illustration on Fuzzy Logic Weight Estimation (p.166)
    Appendix G Examples for SV and Key-Pass Phrase VIV Fusion (p.17)
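The thesis's baseline combines speaker, face and fingerprint match scores by weighted averaging after score normalization. A minimal sketch of that baseline follows; the min-max normalization, the function names and the 0.5 acceptance threshold are illustrative assumptions, not the thesis's actual configuration.

```python
def min_max_normalise(scores):
    # map raw matcher scores onto [0, 1]; one common normalisation choice
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.5 for s in scores]

def weighted_average_fusion(scores, weights):
    # scores: one normalised score per modality (e.g. speech, face, fingerprint)
    # weights: non-negative modality weights summing to 1
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(s * w for s, w in zip(scores, weights))

def accept(fused_score, threshold=0.5):
    # final accept/reject decision on the fused score
    return fused_score >= threshold
```

The adaptive and fuzzy-logic frameworks studied in the thesis go further by varying the weights per sample, based on estimated biometric sample quality, rather than keeping them fixed as here.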

    Searching Spontaneous Conversational Speech: Proceedings of ACM SIGIR Workshop (SSCS2008)


    Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

    Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

    Proceedings: Voice Technology for Interactive Real-Time Command/Control Systems Application

    Topics discussed include speech understanding among researchers and managers, current developments in voice technology, and an exchange of information concerning government voice technology efforts.

    Arabic Continuous Speech Recognition System using Sphinx-4

    Speech is the most natural form of human communication, and speech processing has been one of the most exciting areas of signal processing. Speech recognition technology has made it possible for computers to follow human voice commands and understand human languages. The main goal of the speech recognition field is to develop techniques and systems for speech input to machines and to put that input to use in many applications. Arabic is one of the most widely spoken languages in the world: statistics show that it is the first language (mother tongue) of 206 million native speakers, ranked fourth after Mandarin, Spanish and English. In spite of its importance, research effort on Arabic Automatic Speech Recognition (ASR) is unfortunately still inadequate [7]. This thesis proposes and describes an efficient and effective framework for designing and developing a speaker-independent, continuous, automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The developed Arabic speech recognition system is based on the Carnegie Mellon University Sphinx tools. To build the system, we develop three basic components. The first is the dictionary, which contains all possible phonetic pronunciations of any word in the domain vocabulary. The second is the language model; such a model tries to capture the properties of a sequence of words by means of a probability distribution, and to predict the next word in a speech sequence. The last is the acoustic model, which is created by taking audio recordings of speech together with their text transcriptions, and using software to build statistical representations of the sounds that make up each word. The system uses a rich and balanced database that contains 367 sentences, a total of 14,232 words. The phonetic dictionary contains about 23,841 definitions corresponding to the database words, and the language model contains 14,233 unigrams, 32,813 bigrams and 37,771 trigrams. The engine uses 3-emitting-state Hidden Markov Models (HMMs) for tri-phone-based acoustic models.
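The language-model component described above reduces to n-gram statistics over the training transcriptions. A minimal sketch of raw n-gram counting with maximum-likelihood conditional probabilities follows; real toolkits such as the CMU language-model tools add smoothing and back-off, and the helper names here are illustrative.

```python
from collections import Counter

def ngram_counts(sentences, n):
    # count n-grams over whitespace-tokenised sentences,
    # padding each sentence with <s> start and </s> end markers
    counts = Counter()
    for sent in sentences:
        toks = ["<s>"] * (n - 1) + sent.split() + ["</s>"]
        for i in range(len(toks) - n + 1):
            counts[tuple(toks[i:i + n])] += 1
    return counts

def mle_prob(counts_n, counts_hist, ngram):
    # maximum-likelihood estimate P(w_n | w_1..w_{n-1}) = c(ngram) / c(history)
    hist = ngram[:-1]
    return counts_n[ngram] / counts_hist[hist] if counts_hist[hist] else 0.0
```

For a trigram model like the one in the abstract, the same counting is run with n = 1, 2 and 3, and the trigram probabilities condition on two-word histories.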