Unsupervised extraction of recurring words from infant-directed speech

Goldwater, Sharon; McInnes, Fergus R.

Authors: Sharon Goldwater, Fergus R. McInnes
Publication date: 1 January 2011

Abstract

To date, most computational models of infant word segmentation have worked from phonemic or phonetic input, or have used toy datasets. In this paper, we present an algorithm for word extraction that works directly from naturalistic acoustic input: infant-directed speech from the CHILDES corpus. The algorithm identifies recurring acoustic patterns that are candidates for identification as words or phrases, and then clusters together the most similar patterns. The recurring patterns are found in a single pass through the corpus using an incremental method, where only a small number of utterances are considered at once. Despite this limitation, we show that the algorithm is able to extract a number of recurring words, including some that infants learn earliest, such as Mommy and the child’s name. We also introduce a novel information-theoretic evaluation measure.
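
The two stages described in the abstract (a single incremental pass that compares only a small number of utterances at a time, followed by clustering of the matched patterns) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes dynamic time warping as the similarity measure, fixed-length candidate segments, and one-dimensional toy "frames" standing in for acoustic feature vectors; the abstract specifies none of these. All function names, parameters, and data below are hypothetical.

```python
from collections import deque


def dtw_distance(a, b):
    """Length-normalised dynamic time warping distance (assumed similarity measure)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def segments(utt_id, frames, seg_len):
    """Yield ((utt_id, start), frames) for every fixed-length window (a simplification)."""
    for start in range(len(frames) - seg_len + 1):
        yield (utt_id, start), frames[start:start + seg_len]


def find_recurring(utterances, window=2, seg_len=3, threshold=0.05):
    """Single pass: each utterance is compared only with the previous `window` utterances."""
    recent = deque(maxlen=window)  # bounded buffer; older utterances are forgotten
    matches = []
    for utt_id, frames in enumerate(utterances):
        current = list(segments(utt_id, frames, seg_len))
        for past in recent:
            for key_a, seg_a in past:
                for key_b, seg_b in current:
                    if dtw_distance(seg_a, seg_b) <= threshold:
                        matches.append((key_a, key_b))
        recent.append(current)
    return matches


def cluster(matches):
    """Group matched segments with union-find (single linkage, an assumed choice)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in matches:
        parent[find(a)] = find(b)
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


# Toy data: the pattern [1.0, 5.0, 2.0] recurs in utterances 0, 1 and 3.
utterances = [
    [0.0, 1.0, 5.0, 2.0, 9.0],
    [7.0, 7.5, 1.0, 5.0, 2.0],
    [3.0, 3.0, 3.0, 8.0],
    [1.0, 5.0, 2.0, 6.0],
]
matches = find_recurring(utterances)
clusters = cluster(matches)
print(clusters)
```

With a window of two utterances, the occurrences in utterances 0 and 3 are never compared directly. They end up in the same cluster only because each matches the occurrence in utterance 1, which shows how a clustering step can recover recurrences that the limited window misses.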
