
    RNN Language Model with Word Clustering and Class-based Output Layer

    The recurrent neural network language model (RNNLM) has shown significant promise for statistical language modeling. In this work, a new class-based output layer method is introduced to further improve the RNNLM. In this method, word class information is incorporated into the output layer by using the Brown clustering algorithm to estimate a class-based language model. Experimental results show that the new output layer with word clustering not only clearly improves convergence but also reduces perplexity and word error rate in large vocabulary continuous speech recognition.
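    The factorization behind such a class-based output layer can be written as P(w | h) = P(c(w) | h) · P(w | c(w), h): the network first predicts the class of the next word, then the word within that class, which shrinks each softmax. Below is a minimal PyTorch sketch of this factorization, assuming the word-to-class mapping comes from a pre-computed clustering such as Brown clusters; all names are illustrative, and the clarity-first masking used here does not exploit the per-class speedup a real implementation would.

        import torch
        import torch.nn as nn

        class ClassFactorizedOutput(nn.Module):
            """Output layer factored as P(w|h) = P(c(w)|h) * P(w|c(w), h)."""

            def __init__(self, hidden_dim, num_words, num_classes, word2class):
                super().__init__()
                self.class_logits = nn.Linear(hidden_dim, num_classes)
                self.word_logits = nn.Linear(hidden_dim, num_words)
                # word2class: LongTensor (num_words,), e.g. from Brown clustering
                self.register_buffer("word2class", word2class)

            def log_prob(self, hidden, word):
                # log P(c(w) | h): softmax over classes
                cls = self.word2class[word]
                log_p_class = torch.log_softmax(self.class_logits(hidden), dim=-1)
                log_p_c = log_p_class.gather(-1, cls.unsqueeze(-1)).squeeze(-1)
                # log P(w | c(w), h): softmax restricted to words of the same class
                logits = self.word_logits(hidden)
                in_class = self.word2class.unsqueeze(0) == cls.unsqueeze(-1)
                logits = logits.masked_fill(~in_class, float("-inf"))
                log_p_w = torch.log_softmax(logits, dim=-1)
                log_p_w = log_p_w.gather(-1, word.unsqueeze(-1)).squeeze(-1)
                return log_p_c + log_p_w

        # Toy usage: a 5-word vocabulary clustered into 3 classes.
        w2c = torch.tensor([0, 0, 1, 1, 2])
        layer = ClassFactorizedOutput(hidden_dim=8, num_words=5,
                                      num_classes=3, word2class=w2c)
        print(layer.log_prob(torch.randn(2, 8), torch.tensor([1, 4])))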

    Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

    Despite the excellent strides made by end-to-end (E2E) models in speech recognition in recent years, named entity recognition remains challenging but critical for semantic understanding. To enhance the ability of E2E models to recognize named entities, previous studies have mainly focused on various rule-based or attention-based contextual biasing algorithms. However, their performance can be sensitive to the biasing weight or degraded by excessive attention to the named entity list, along with a risk of false triggering. Inspired by the success of the class-based language model (LM) in named entity recognition in conventional hybrid systems, and by the effective decoupling of acoustic and linguistic information in the factorized neural Transducer (FNT), we propose a novel E2E model that incorporates class-based LMs into FNT, referred to as C-FNT. In C-FNT, the language model score of a named entity can be associated with its name class instead of its surface form. Experimental results show that the proposed C-FNT achieves significant error reduction on named entities without hurting performance on general word recognition.
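    The central idea, roughly: when the LM component scores a hypothesis, a known entity is collapsed to its class token, so the LM probability attaches to "call <NAME>" rather than to each surface form, and a separate in-class score covers the particular name. A minimal sketch of this scoring scheme, assuming a token-level LM interface; this is not the actual FNT architecture, and ENTITY_CLASS, lm_logprob and entity_logprob are hypothetical stand-ins.

        import math

        # Hypothetical single-token entity list mapped to class tokens.
        ENTITY_CLASS = {"alice": "<NAME>", "bob": "<NAME>"}

        def class_based_score(tokens, lm_logprob, entity_logprob):
            """Score a hypothesis with a class-based LM: the LM sees class
            tokens in place of entity surface forms; P(word | class) is
            scored separately, e.g. uniformly over the entity list."""
            collapsed, in_class = [], 0.0
            for tok in tokens:
                cls = ENTITY_CLASS.get(tok)
                if cls is None:
                    collapsed.append(tok)
                else:
                    collapsed.append(cls)            # LM sees the class token
                    in_class += entity_logprob(tok)  # in-class score
            return lm_logprob(collapsed) + in_class

        # Toy usage with stand-in scorers.
        lm = lambda seq: -1.0 * len(seq)                 # placeholder LM score
        uniform = lambda w: -math.log(len(ENTITY_CLASS)) # uniform over the list
        print(class_based_score(["call", "alice"], lm, uniform))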

    Morphologically motivated word classes for very large vocabulary speech recognition of Finnish and Estonian

    We study class-based n-gram and neural network language models for very large vocabulary speech recognition of two morphologically rich languages: Finnish and Estonian. Due to morphological processes such as derivation, inflection and compounding, the models need to be trained with vocabulary sizes of several millions of word types. Class-based language modelling is in this case a powerful approach to alleviate the data sparsity and reduce the computational load. For a very large vocabulary, bigram statistics may not be an optimal way to derive the classes. We therefore study using the output of a morphological analyzer to derive efficient word classes. We show that efficient classes can be learned by refining the morphological classes into smaller equivalence classes using merging, splitting and exchange procedures with suitable constraints. This type of classification can improve the results, particularly when language model training data is not very large. We also extend the previous analyses by rescoring the hypotheses obtained from a very large vocabulary recognizer using class-based neural network language models. We show that despite the fixed vocabulary, carefully constructed classes for word-based language models can in some cases result in lower error rates than subword-based unlimited vocabulary language models.
    Peer reviewed.
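    The exchange procedure the abstract mentions can be illustrated compactly: starting from the morphological classes, each word type is moved to whichever class most increases the class bigram likelihood, whose assignment-dependent part reduces to sum N(c1,c2) log N(c1,c2) - 2 sum N(c) log N(c). A minimal sketch under those assumptions; the paper's actual procedure also merges and splits classes under constraints from the morphological analyzer, and efficient implementations update counts incrementally rather than recomputing the objective in full, as done here for clarity.

        import math
        from collections import Counter

        def class_bigram_objective(corpus, assign):
            """Assignment-dependent part of the class bigram log-likelihood
            for P(w_i | w_{i-1}) = P(w_i | c_i) * P(c_i | c_{i-1})."""
            cc, c = Counter(), Counter()
            prev = None
            for w in corpus:
                cls = assign[w]
                c[cls] += 1
                if prev is not None:
                    cc[(prev, cls)] += 1
                prev = cls
            return (sum(n * math.log(n) for n in cc.values())
                    - 2 * sum(n * math.log(n) for n in c.values()))

        def exchange(corpus, assign, classes, sweeps=3):
            """Greedy exchange: move each word type to the class that most
            improves the objective (naive full recomputation per move)."""
            for _ in range(sweeps):
                for w in set(corpus):
                    best_cls = assign[w]
                    best_obj = class_bigram_objective(corpus, assign)
                    for cls in classes:
                        assign[w] = cls
                        obj = class_bigram_objective(corpus, assign)
                        if obj > best_obj:
                            best_cls, best_obj = cls, obj
                    assign[w] = best_cls
            return assign

        # Toy usage: the initial assignment would come from morphological classes.
        corpus = "the cat sat on the mat the dog sat on the cat".split()
        init = {w: 0 for w in set(corpus)}
        print(exchange(corpus, init, classes=range(2)))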

    Language modeling for speech recognition of spoken Cantonese.

    Yeung, Yu Ting. Thesis (M.Phil.), Chinese University of Hong Kong, 2009. Includes bibliographical references (leaves 84-93). Abstracts in English and Chinese. Contents:
    Chapter 1: Introduction (Cantonese Speech Recognition; Objectives; Thesis Outline)
    Chapter 2: Fundamentals of Large Vocabulary Continuous Speech Recognition (Problem Formulation; Feature Extraction; Acoustic Models; Decoding; Statistical Language Modeling: N-gram Language Models, N-gram Smoothing, Complexity of Language Model, Class-based Language Model, Language Model Pruning; Performance Evaluation)
    Chapter 3: The Cantonese Dialect (Phonology of Cantonese; Orthographic Representation of Cantonese; Classification of Cantonese Speech; Cantonese-English Code-mixing)
    Chapter 4: Rule-based Translation Method (Motivations; Transformation-based Learning: Algorithm Overview, Learning of Translation Rules; Performance Evaluation: The Learnt Translation Rules, Evaluation of the Rules, Analysis of the Rules; Preparation of Training Data for Language Modeling; Discussion)
    Chapter 5: Language Modeling for Cantonese (Training Data: Text Corpora, Preparation of Formal Cantonese Text Data; Training of Language Models: Language Models for Standard Chinese, Language Models for Formal Cantonese, Language Models for Colloquial Cantonese; Evaluation of Language Models: Speech Corpora for Evaluation, Perplexities of Formal Cantonese Language Models, Perplexities of Colloquial Cantonese Language Models; Speech Recognition Experiments: Speech Corpora, Experimental Setup, Results on Formal Cantonese Models, Results on Colloquial Cantonese Models; Analysis of Results; Discussion: Cantonese Language Modeling, Interpolated Language Models, Class-based Language Models)
    Chapter 6: Towards Language Modeling of Code-mixing Speech (Data Collection: Data Collection, Filtering of Collected Data, Processing of Collected Data; Clustering of Chinese and English Words; Language Modeling for Code-mixing Speech: Language Models from Collected Data, Class-based Language Models, Performance Evaluation of Code-mixing Language Models; Speech Recognition Experiments with Code-mixing Language Models: Experimental Setup, Monolingual Cantonese Recognition, Code-mixing Speech Recognition; Discussion: Data Collection from the Internet, Speech Recognition of Code-mixing Speech)
    Chapter 7: Conclusions and Future Work (Conclusions: Rule-based Translation Method, Cantonese Language Modeling, Code-mixing Language Modeling; Future Work: Rule-based Translation, Training Data, Code-mixing Speech)
    Appendix A: Equation Derivation (A.1 Relationship between Average Mutual Information and Perplexity)
    Bibliography
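    For context on appendix A.1: the link between average mutual information and perplexity is the classic class bigram identity of Brown et al. (1992); a sketch in standard notation, not the thesis's own derivation:

        % Class bigram model: each word w_i belongs to exactly one class c_i.
        \[ P(w_i \mid w_{i-1}) = P(w_i \mid c_i)\, P(c_i \mid c_{i-1}) \]
        % The average per-word log-likelihood splits into the mutual information
        % between adjacent classes minus the unigram word entropy:
        \[ L = I(c_1; c_2) - H(w), \qquad \mathrm{PP} = 2^{\,H(w) - I(c_1; c_2)} \]
        % Maximizing I(c_1; c_2) therefore minimizes the model's perplexity.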