
5 research outputs found

Use of contexts in language model interpolation and adaptation

Author: Bahl, Bellegarda, Bengio, Blei, Brants, Bulyko, Caseiro, Chen, Cheng, Chien, Clarkson, Darroch, Della Pietra, Doumpiotis, Federico, Gildea, Gopalakrishnan, Hermansky, Hieronymus, Hinton, Hsu, Iyer, Jelinek, Kaiser, Katz, Kneser, Liu, M.J.F. Gales, McDonough, Mohri, Mrva, Och, Oonishi, P.C. Woodland, Povey, Rosenfeld, Schwenk, Seymore, Sinha, Stolcke, Tam, Woodland, X. Liu
Publication venue: Elsevier BV

Crowd-supervised training of spoken language systems

Author: McGraw, Ian C. (Ian Carmichael)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2012

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 155-166).

Spoken language systems are often deployed with static speech recognizers. Only rarely are parameters in the underlying language, lexical, or acoustic models updated on the fly. In the few instances where parameters are learned in an online fashion, developers traditionally resort to unsupervised training techniques, which are known to be inferior to their supervised counterparts. These realities make the development of spoken language interfaces a difficult and somewhat ad hoc engineering task, since models for each new domain must be built from scratch or adapted from a previous domain. This thesis explores an alternative approach that makes use of human computation to provide crowd-supervised training for spoken language systems. We explore human-in-the-loop algorithms that leverage the collective intelligence of crowds of non-expert individuals to provide valuable training data at a very low cost for actively deployed spoken language systems. We also show that in some domains the crowd can be incentivized to provide training data for free, as a byproduct of interacting with the system itself. Through the automation of crowdsourcing tasks, we construct and demonstrate organic spoken language systems that grow and improve without the aid of an expert. Techniques that rely on collecting data remotely from non-expert users, however, are subject to the problem of noise. This noise can sometimes be heard in audio collected from poor microphones or muddled acoustic environments. Alternatively, noise can take the form of corrupt data from a worker trying to game the system; for example, a paid worker tasked with transcribing audio may leave transcripts blank in hopes of receiving a speedy payment.

We develop strategies to mitigate the effects of noise in crowd-collected data and analyze their efficacy. This research spans a number of different application domains of widely deployed spoken language interfaces, but maintains the common thread of improving the speech recognizer's underlying models with crowd-supervised training algorithms. We experiment with three central components of a speech recognizer: the language model, the lexicon, and the acoustic model. For each component, we demonstrate the utility of a crowd-supervised training framework. For the language model and lexicon, we explicitly show that this framework can be used hands-free, in two organic spoken language systems.

by Ian C. McGraw. Ph.D.

DSpace@MIT

Dynamic Language Model Adaptation using Variational Bayes Inference

Author: Schultz, Tanja; Tam, Yik-Cheung
Publication date: 16/06/2008

KITopen

Dynamic Language Model Adaptation using Variational Bayes Inference

Author: Yik-Cheung Tam And

We propose an unsupervised dynamic language model (LM) adaptation framework using long-distance latent topic mixtures. The framework employs the Latent Dirichlet Allocation (LDA) model, which models the latent topics of a document collection in an unsupervised and Bayesian fashion. In the LDA model, each word is modeled as a mixture of latent topics. Varying topics within a context can be modeled by re-sampling the mixture weights of the latent topics from a prior Dirichlet distribution. The model can be trained using the variational Bayes Expectation Maximization algorithm. During decoding, mixture weights of the latent topics are adapted dynamically using the hypotheses of previously decoded utterances. In our work, the LDA model is combined with the trigram language model using linear interpolation. We evaluated the approach on the CCTV episode of the RT04 Mandarin Broadcast News test set. Results show that the proposed approach reduces the perplexity by up to 15.4% relative and the character error rate by 4.9% relative, depending on the size and setup of the training set.
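
The combination step described in the abstract above (a topic-mixture word probability linearly interpolated with a trigram probability) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the interpolation weight `lam`, and the toy probabilities are all assumptions made for the example, and the variational Bayes training and the adaptation of the topic weights are not shown.

```python
from typing import Dict, Sequence


def topic_mixture_prob(
    word: str,
    topic_weights: Sequence[float],
    topic_word_probs: Sequence[Dict[str, float]],
) -> float:
    """Word probability under a topic mixture: sum_k weight_k * p(word | topic k).

    topic_weights are assumed to be already adapted (e.g. from earlier
    hypotheses) and to sum to 1; how they are estimated is out of scope here.
    """
    return sum(
        weight * probs.get(word, 0.0)
        for weight, probs in zip(topic_weights, topic_word_probs)
    )


def interpolate(p_trigram: float, p_topic: float, lam: float) -> float:
    """Linear interpolation: lam * p_topic + (1 - lam) * p_trigram."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_topic + (1.0 - lam) * p_trigram


# Toy numbers, purely illustrative (hypothetical two-topic model).
weights = [0.75, 0.25]
topics = [{"news": 0.2}, {"news": 0.04}]
p_topic = topic_mixture_prob("news", weights, topics)
p_combined = interpolate(p_trigram=0.1, p_topic=p_topic, lam=0.5)
```

Because both inputs are probabilities and `lam` lies in [0, 1], the interpolated value stays a valid probability; the choice of `lam` would normally be tuned on held-out data.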

CiteSeerX