3 research outputs found

    A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents

    Get PDF
    In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. Its retrieval capabilities were verified in tests with word- and syllable-level indexing features and through comparisons with the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were applied during training. Fusion of information from word- and syllable-level indexing features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking corpora (TDT-2 and TDT-3), and very encouraging retrieval performance was obtained.
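    A minimal sketch of the kind of HMM/N-gram document scoring described above, assuming each document is scored by the query's likelihood under a mixture of the document's own unigram/bigram statistics and a corpus background model. The function name, the fixed interpolation weights, and the small probability floor are illustrative only; in the article such weights would be estimated with EM/MCE rather than hand-set.

```python
import math
from collections import Counter

def score_query(query_terms, doc_terms, corpus_unigram, weights=(0.4, 0.3, 0.2, 0.1)):
    """Score one spoken document for a query under a simple HMM/N-gram-style
    mixture: document unigram + document bigram + corpus unigram + floor.
    The weights here are illustrative placeholders, not trained values."""
    doc_uni = Counter(doc_terms)
    doc_bi = Counter(zip(doc_terms, doc_terms[1:]))
    n = len(doc_terms)
    w_du, w_db, w_cu, w_floor = weights
    log_p, prev = 0.0, None
    for t in query_terms:
        p_du = doc_uni[t] / n if n else 0.0                          # document unigram
        p_db = doc_bi[(prev, t)] / max(doc_uni[prev], 1) if prev else 0.0  # document bigram
        p_cu = corpus_unigram.get(t, 0.0)                            # corpus background
        p = w_du * p_du + w_db * p_db + w_cu * p_cu + w_floor * 1e-6
        log_p += math.log(p)
        prev = t
    return log_p
```

    Documents would then be ranked by this score for each query; the word- and syllable-level indexing variants mentioned in the abstract would differ only in how `query_terms` and `doc_terms` are tokenized.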

    Mixture Language Models for Call Routing

    Get PDF
    Abstract: Our goal is to extract information from a telephone call in order to route the call to one of a number of destinations. We assume that we do not know a priori the vocabulary used in the application, so we use phonetic recognition followed by identification of salient phone sequences. In previous work, we showed that using a separate language model during recognition for each route gave improved performance over using a single model. However, that technique decodes each utterance in terms of the salient sequences of every call route, which leads to insertion and substitution errors that degrade performance. In this paper, we introduce the use of mixture language models for speech recognition in the context of call route classification. On the one hand, this technique retains the efficiency of multiple language models for accurate recognition of salient phoneme sequences; on the other hand, it helps classification even when some call routes have only 50-60 training utterances. It also avoids building HMMs for individual salient phoneme sequences to verify whether they actually occur in the utterance.
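    A rough sketch of the classification side of this idea, assuming per-route phone models have already been trained and the recognized phone string is assigned to the route whose model (interpolated with a shared background model) best explains it. The function names, add-alpha smoothing, and interpolation weight are assumptions for illustration; the paper's mixture models operate inside the recognizer itself rather than as a post-hoc classifier.

```python
import math
from collections import Counter

def train_route_model(utterances, vocab_size, alpha=0.5):
    """Build an add-alpha-smoothed unigram phone model for one call route.
    `utterances` is a list of phone-token sequences (illustrative setup)."""
    counts = Counter(p for u in utterances for p in u)
    total = sum(counts.values())
    return lambda p: (counts[p] + alpha) / (total + alpha * vocab_size)

def route_call(phones, route_models, background, lam=0.8):
    """Pick the route whose mixture of route model and shared background
    model gives the recognized phone sequence the highest likelihood."""
    def log_lik(model):
        return sum(math.log(lam * model(p) + (1 - lam) * background(p)) for p in phones)
    return max(route_models, key=lambda r: log_lik(route_models[r]))
```

    Interpolating each small per-route model with a background model is one simple way to keep routes with only a few dozen training utterances from being swamped by data sparsity, which is the regime the abstract highlights.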

    Optimal Mixture Models in IR

    No full text
    We explore the use of Optimal Mixture Models to represent topics. We analyze two broad classes of mixture models: set-based and weighted. We provide an original proof that estimation of set-based models is NP-hard, and therefore not feasible in practice. We argue that weighted models are superior to set-based models, and that their solution can be estimated by a simple gradient descent technique. We demonstrate that Optimal Mixture Models can be successfully applied to the task of document retrieval. Our experiments show that weighted mixtures outperform a simple language-modeling baseline. We also observe that weighted mixtures are more robust than other approaches to estimating topical models.
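    A small sketch of the weighted-mixture idea, assuming the topic models are fixed unigram distributions and the mixture weights are fit to a document by gradient steps on the log-likelihood (equivalently, gradient descent on the negative log-likelihood), with a crude projection back onto the simplex. The step size, iteration count, and projection are illustrative assumptions, not the authors' exact estimator.

```python
import numpy as np

def fit_mixture_weights(doc_counts, topic_dists, steps=200, lr=0.1):
    """Estimate simplex weights w so that the mixture sum_k w[k] * topic_dists[k]
    maximizes the document's log-likelihood.
    doc_counts: (V,) term counts; topic_dists: (K, V) topic unigram models."""
    K = topic_dists.shape[0]
    w = np.full(K, 1.0 / K)                      # start from the uniform mixture
    for _ in range(steps):
        p = w @ topic_dists                      # (V,) mixture probability of each term
        grad = topic_dists @ (doc_counts / np.maximum(p, 1e-12))  # d loglik / dw
        w = w + lr * grad / doc_counts.sum()     # gradient step on log-likelihood
        w = np.maximum(w, 1e-12)
        w /= w.sum()                             # project back onto the simplex
    return w
```

    Because the objective is concave in the weights, simple gradient updates of this kind converge to the optimal weighted mixture, which is what makes the weighted class tractable where the set-based class is NP-hard.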