Unsupervised Language Acquisition

de Marcken, Carl

thesis

Publication date: 1 January 1996

Abstract

This thesis presents a computational theory of unsupervised language acquisition, precisely defining procedures for learning language from ordinary spoken or written utterances, with no explicit help from a teacher. The theory is based heavily on concepts borrowed from machine learning and statistical estimation. In particular, learning takes place by fitting a stochastic, generative model of language to the evidence. Much of the thesis is devoted to explaining the conditions that must hold for this general learning strategy to arrive at linguistically desirable grammars. The thesis introduces a variety of technical innovations, among them a common representation for evidence and grammars, and a learning strategy that separates the "content" of linguistic parameters from their representation. Algorithms based on this strategy suffer from few of the search problems that have plagued other computational approaches to language acquisition. The theory has been tested on the problems of learning vocabularies and grammars from unsegmented text and continuous speech, and of learning mappings between sound and representations of meaning. It performs extremely well on various objective criteria, acquiring knowledge that leads it to assign almost exactly the same structure to utterances as humans do. This work has applications to data compression, language modeling, speech recognition, machine translation, information retrieval, and other tasks that rely on either structural or stochastic descriptions of language.

Comment: PhD thesis, 133 pages
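The abstract describes learning as fitting a stochastic, generative model to the evidence, and mentions learning vocabularies from unsegmented text. The sketch below illustrates that general idea in its simplest form. It is not the thesis's procedure, which is considerably more elaborate. Instead, it uses a unigram word model, Viterbi segmentation, and hard-EM re-estimation, all of which are assumptions made for illustration. The names `viterbi_segment`, `learn_lexicon`, `max_len`, and `CHAR_FALLBACK` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical fallback log probability for a single character missing from
# the lexicon; it guarantees that every string has at least one segmentation.
CHAR_FALLBACK = math.log(1e-6)


def viterbi_segment(text, logprob, max_len):
    """Most probable segmentation of `text` under a unigram word model."""
    n = len(text)
    best = [0.0] + [-math.inf] * n  # best[i]: best log prob of text[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            lp = logprob.get(word)
            if lp is None:
                if len(word) > 1:
                    continue        # unknown multi-character strings are not words
                lp = CHAR_FALLBACK
            score = best[start] + lp
            if score > best[end]:
                best[end], back[end] = score, start
    # Follow back-pointers from the end of the string to recover the words.
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


def normalize(counts):
    """Turn word counts into a dictionary of log probabilities."""
    total = sum(counts.values())
    return {w: math.log(c / total) for w, c in counts.items()}


def learn_lexicon(utterances, max_len=6, iterations=5):
    """Hard-EM (Viterbi) re-estimation of a unigram lexicon (illustrative)."""
    # Initialize with every substring up to max_len, weighted by frequency.
    counts = Counter(
        u[i:j]
        for u in utterances
        for i in range(len(u))
        for j in range(i + 1, min(len(u), i + max_len) + 1)
    )
    for _ in range(iterations):
        logprob = normalize(counts)
        # Hard E-step: segment each utterance with the current model.
        # M-step: re-estimate probabilities from the words actually used.
        counts = Counter(
            w for u in utterances for w in viterbi_segment(u, logprob, max_len)
        )
    return normalize(counts)


corpus = ["thedogsaw", "thecatsaw", "thedogran", "acatran"]
lexicon = learn_lexicon(corpus)
print(viterbi_segment("thecatran", lexicon, max_len=6))
```

A purely maximum-likelihood criterion like this one tends to favor long, memorized chunks over linguistically meaningful words. That limitation is the kind of issue the abstract refers to when it discusses the conditions under which this learning strategy arrives at desirable grammars.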

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.591.9...

Last time updated on 29/10/2017

DSpace@MIT

oai:dspace.mit.edu:1721.1/1064...

Last time updated on 11/06/2012

CiteSeerX

oai:CiteSeerX.psu:10.1.1.97.36...

Last time updated on 23/10/2014