Distributed discriminative language models for Google voice-search

By Preethi Jyothi, Leif Johnson, Ciprian Chelba, Brian Strope (Google Inc)

Abstract

This paper considers large-scale linear discriminative language models trained using a distributed perceptron algorithm. The algorithm is implemented efficiently using a MapReduce/SSTable framework. This work also introduces the use of large amounts of unsupervised data (confidence-filtered Google voice-search logs) in conjunction with a novel training procedure: word lattices for the given data are regenerated with a weaker acoustic model than the one used to generate the unsupervised transcriptions of the logged data. We observe small but statistically significant improvements in recognition performance after reranking N-best lists of a standard Google voice-search data set.
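
As a rough illustration of the kind of training the abstract describes, the sketch below reranks N-best lists with a linear model over n-gram count features and trains it with a perceptron that is split across data shards. Several things here are assumptions rather than details taken from the paper: the shards are processed in a single Python process instead of a MapReduce/SSTable pipeline, the per-shard weights are combined by uniform averaging after every epoch (one common way to distribute a perceptron, not necessarily the authors' choice), the features are plain unigram and bigram counts, and the function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def ngram_features(hypothesis, max_order=2):
    """Count word n-grams (orders 1..max_order) in a hypothesis string."""
    words = hypothesis.split()
    feats = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(words) - n + 1):
            feats[" ".join(words[i:i + n])] += 1
    return feats


def score(weights, feats):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def rerank(weights, nbest):
    """Return the index of the highest-scoring hypothesis (ties keep the earlier one)."""
    scores = [score(weights, ngram_features(h)) for h in nbest]
    return max(range(len(nbest)), key=lambda i: scores[i])


def perceptron_epoch(weights, shard):
    """'Map' step (assumed): one perceptron pass over a shard, starting from the shared weights."""
    w = dict(weights)
    for nbest, oracle in shard:
        guess = rerank(w, nbest)
        if guess != oracle:
            # Promote the oracle hypothesis, demote the current top choice.
            for f, v in ngram_features(nbest[oracle]).items():
                w[f] = w.get(f, 0.0) + v
            for f, v in ngram_features(nbest[guess]).items():
                w[f] = w.get(f, 0.0) - v
    return w


def mix(shard_weights):
    """'Reduce' step (assumed): uniform average of the per-shard weight vectors."""
    total = defaultdict(float)
    for w in shard_weights:
        for f, v in w.items():
            total[f] += v
    return {f: v / len(shard_weights) for f, v in total.items()}


def train(shards, epochs=5):
    """Alternate per-shard perceptron passes with weight averaging."""
    weights = {}
    for _ in range(epochs):
        weights = mix([perceptron_epoch(weights, s) for s in shards])
    return weights


# Hypothetical toy data: each example is (N-best list, index of the oracle hypothesis).
shards = [
    [(["call hum", "call home"], 1)],
    [(["directions to boss ton", "directions to boston"], 1)],
]
weights = train(shards)
best = rerank(weights, ["navigate to boss ton", "navigate to boston"])
```

In this toy setup the averaged weights penalize n-grams seen only in the rejected hypotheses and reward those seen only in the oracle ones, so the learned preferences carry over to an unseen N-best list that shares those n-grams. A production system would also need the recognizer's baseline score as a feature and some form of weight averaging over time, both of which are omitted here.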

Topics: Discriminative language models, Distributed perceptron, MapReduce
Year: 2012
OAI identifier: oai:CiteSeerX.psu:10.1.1.353.4208
Provided by: CiteSeerX
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://www.mirlab.org/conferen... (external link)