9 research outputs found

    Possibilities for integrating linguistic knowledge sources into discriminative segment-based speech recognition systems

    Machine speech recognition currently relies exclusively on statistics-based algorithms. These are built on simple mathematical models whose parameters are tuned automatically on huge databases. Unfortunately, algorithmic considerations push phonetic/linguistic knowledge into the background, so these models make unrealistic simplifying assumptions about the nature of speech communication. A possible alternative is the use of so-called segmental models, which, without abandoning the statistical principle, rest on milder restrictions. In this paper we present OASIS, the segment-based recognizer developed at our department, which operates on a discriminative basis, i.e. by combining posterior probabilities. A further advantage of this is that it offers greater flexibility for integrating linguistic information of various levels (though still statistical in nature) than the conventional hidden Markov model
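The core idea of such a segment-based discriminative decoder can be illustrated with a minimal sketch: each candidate segment carries label posteriors, and a dynamic-programming search picks the segmentation whose combined (log) posterior score is highest. The data layout and function name below are illustrative assumptions, not the actual OASIS implementation.

```python
import math

def best_segmentation(posteriors):
    """Dynamic-programming search over candidate segmentations.

    posteriors[(start, end)] is a dict mapping labels to posterior
    probabilities P(label | segment features) for the segment covering
    frames [start, end). This data layout is a hypothetical sketch.
    """
    n = max(end for _, end in posteriors)
    # best[t] = (best log-score reaching frame t, label sequence so far)
    best = {0: (0.0, [])}
    for end in range(1, n + 1):
        for (s, e), labels in posteriors.items():
            if e != end or s not in best:
                continue
            label, p = max(labels.items(), key=lambda kv: kv[1])
            score = best[s][0] + math.log(p)
            if end not in best or score > best[end][0]:
                best[end] = (score, best[s][1] + [label])
    return best.get(n, (float("-inf"), []))[1]

# Toy example: two frames, with one- and two-frame candidate segments.
posts = {
    (0, 1): {"a": 0.9, "b": 0.1},
    (1, 2): {"b": 0.8, "a": 0.2},
    (0, 2): {"a": 0.5, "b": 0.5},
}
print(best_segmentation(posts))  # → ['a', 'b']
```

Combining log-posteriors additively is one simple way to realize "combining posterior probabilities" discriminatively; a real system would also normalize for the number of segments and integrate language-model scores.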

    The effect of lattice pruning on MMIE training


    Margin Based Learning Framework with Geometric Margin Minimum Classification Error for Robust Speech Recognition

    Statistical learning theory combines empirical risk and a generalization function into a single optimized objective function for margin-based learning. When the margin concept is incorporated into the Hidden Markov Model (HMM) for speech recognition, a margin-based learning framework built on the minimum classification error (MCE) training criterion shows greater capability than other conventional discriminative training (DT) methods in improving the classification robustness (generalization capability) of the acoustic model, by increasing the acoustic model's functional margin. This paper introduces a geometric-margin-based separation measure in the loss function of the margin-based learning framework, in place of the functional-margin separation measure, to develop a mathematical framework with a new optimized objective function for soft margin estimation (SME) in ASR. The derived SME objective function, based on the geometric-margin-based separation (misclassification) measure, captures the strength of the margin-based learning framework in terms of classification robustness by minimizing the classification error probability as well as maximizing the geometric margin
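The distinction between functional and geometric margin can be made concrete with a toy linear classifier: the functional margin of a sample is y(w·x + b), and dividing by ||w|| gives the geometric margin, which is invariant to rescaling w. The sketch below shows an SME-style objective that penalizes samples whose geometric margin falls below a target; the function name, penalty form, and constants are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def geometric_margin_sme_loss(w, b, X, y, rho=1.0, lam=0.1):
    """Toy soft-margin-estimation objective using a geometric margin.

    X: (n, d) sample matrix; y: (n,) labels in {+1, -1}.
    rho is the target margin, lam weights the margin-size term.
    All names and constants here are illustrative, not from the paper.
    """
    norm = np.linalg.norm(w)
    margins = y * (X @ w + b) / norm        # geometric margins
    hinge = np.maximum(0.0, rho - margins)  # shortfall below the target
    return lam / rho + hinge.mean()

# Two well-separated samples: both geometric margins exceed rho,
# so only the margin-size term lam / rho contributes.
w = np.array([1.0, 0.0])
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
print(geometric_margin_sme_loss(w, 0.0, X, y))  # → 0.1
```

In an HMM acoustic model the margin would be a misclassification measure over competing state sequences rather than a linear score, but the structure of the objective (margin-size term plus penalized margin shortfalls) is the same.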

    Articulatory features for conversational speech recognition


    I. Magyar Számítógépes Nyelvészeti Konferencia (1st Hungarian Conference on Computational Linguistics)


    Multi-level acoustic modeling for automatic speech recognition

    Thesis (Ph.D.) -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 183-192).

    Context-dependent acoustic modeling is commonly used in large-vocabulary Automatic Speech Recognition (ASR) systems as a way to model coarticulatory variations that occur during speech production. Typically, the local phoneme context is used as a means to define context-dependent units. Because the number of possible context-dependent units can grow exponentially with the length of the contexts, many units will not have enough training examples to train a robust model, resulting in a data sparsity problem. For nearly two decades, this data sparsity problem has been dealt with by a clustering-based framework which systematically groups different context-dependent units into clusters such that each cluster can have enough data. Although it deals with the data sparsity issue, the clustering-based approach also makes all context-dependent units within a cluster have the same acoustic score, resulting in a quantization effect that can potentially limit the performance of the context-dependent model. In this work, a multi-level acoustic modeling framework is proposed to address both the data sparsity problem and the quantization effect. Under the multi-level framework, each context-dependent unit is associated with classifiers that target multiple levels of contextual resolution, and the outputs of the classifiers are linearly combined for scoring during recognition. By choosing the classifiers judiciously, both the data sparsity problem and the quantization effect can be dealt with. The proposed multi-level framework can also be integrated into existing large-vocabulary ASR systems, such as FST-based ASR systems, and is compatible with state-of-the-art error reduction techniques for ASR systems, such as discriminative training methods.
Multiple sets of experiments have been conducted to compare the performance of the clustering-based acoustic model and the proposed multi-level model. In a phonetic recognition experiment on TIMIT, the multi-level model has about 8% relative improvement in terms of phone error rate, showing that the multi-level framework can help improve phonetic prediction accuracy. In a large-vocabulary transcription task, combining the proposed multi-level modeling framework with discriminative training can provide more than 20% relative improvement over a clustering baseline model in terms of Word Error Rate (WER), showing that the multi-level framework can be integrated into existing large-vocabulary decoding frameworks and that it combines well with discriminative training methods. In a speaker-adaptive transcription task, the multi-level model has about 14% relative WER improvement, showing that the proposed framework can adapt better to new speakers, and potentially to new environments, than the conventional clustering-based approach.

by Hung-An Chang. Ph.D.
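The scoring rule the abstract describes, a linear combination of classifiers at several contextual resolutions, can be sketched as follows. Broad resolutions (e.g. a context-independent phone classifier) pool data across many units, fighting sparsity, while fine resolutions (e.g. a triphone classifier) keep units distinct, avoiding the quantization effect. All names, score values, and weights below are illustrative assumptions, not from the thesis.

```python
def multilevel_score(unit, classifiers, weights):
    """Linearly combine scores from classifiers at several
    levels of contextual resolution (illustrative sketch).

    unit: a triphone tuple (left, center, right).
    classifiers: maps a resolution name to a scoring function.
    weights: maps the same resolution names to combination weights.
    """
    return sum(w * classifiers[level](unit)
               for level, w in weights.items())

# Toy example: score the triphone s-ih+t at three resolutions.
scores = {
    "phone":    lambda u: {"ih": -1.0}[u[1]],
    "diphone":  lambda u: {("s", "ih"): -0.5}[(u[0], u[1])],
    "triphone": lambda u: {("s", "ih", "t"): -0.2}[u],
}
weights = {"phone": 0.2, "diphone": 0.3, "triphone": 0.5}
print(multilevel_score(("s", "ih", "t"), scores, weights))
```

In a real system the per-level weights would themselves be trained (e.g. discriminatively), and an unseen triphone would simply fall back on the broader levels that still have scores for it.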