The Unsupervised Acquisition of a Lexicon from Continuous Speech

de Marcken, Carl

Authors: Carl de Marcken
Publication date: 1 January 1995

Abstract

We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.

Comment: 27-page technical report.
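
As a rough illustration of the MDL idea mentioned in the abstract (prefer the lexicon that minimizes the combined cost of describing the lexicon itself and the data encoded with it), the following Python sketch scores a candidate lexicon on text. It is not the report's algorithm: it uses a flat lexicon rather than the hierarchical representation, works on character strings rather than speech, and the helper names (`code_lengths`, `segment`, `description_length`) and the fixed per-character spelling cost are assumptions made only for this example.

```python
import math


def code_lengths(counts):
    """Map each lexicon word to its optimal code length in bits, -log2 p(w)."""
    total = sum(counts.values())
    return {w: -math.log2(c / total) for w, c in counts.items()}


def segment(text, lengths):
    """Split text into lexicon words minimizing total code length (dynamic programming).

    Returns (bits, words), or (math.inf, None) if the lexicon cannot cover the text.
    """
    n = len(text)
    best = [math.inf] * (n + 1)  # best[i]: fewest bits needed to encode text[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last word in that encoding
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            w = text[start:end]
            if w in lengths and best[start] + lengths[w] < best[end]:
                best[end] = best[start] + lengths[w]
                back[end] = start
    if best[n] == math.inf:
        return math.inf, None
    # Walk the back-pointers to recover the chosen words.
    words, end = [], n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return best[n], words[::-1]


def description_length(corpus, counts, alphabet_size):
    """Total bits = cost of spelling out the lexicon + cost of the corpus given the lexicon."""
    lengths = code_lengths(counts)
    # Assumption (hypothetical scheme): each entry is spelled with a fixed-length code
    # over the alphabet plus one end-of-word symbol.
    char_bits = math.log2(alphabet_size + 1)
    lexicon_bits = sum((len(w) + 1) * char_bits for w in counts)
    corpus_bits = 0.0
    for utterance in corpus:
        bits, _ = segment(utterance, lengths)
        corpus_bits += bits  # math.inf if the lexicon cannot encode this utterance
    return lexicon_bits + corpus_bits


lengths = code_lengths({"the": 2, "dog": 1, "a": 1})
print(segment("thedog", lengths))  # (3.0, ['the', 'dog'])
```

Under this scoring, a candidate lexicon with a lower total is preferred, and adding an entry pays off only when the bits it saves in encoding the corpus exceed the bits needed to spell the entry out.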

Available Versions

CiteSeerX: oai:CiteSeerX.psu:10.1.1.53.52...

DSpace@MIT: oai:dspace.mit.edu:1721.1/7191