
1,364 research outputs found

Learning OT constraint rankings using a maximum entropy model

Authors: Goldwater Sharon; Johnson M
Publication date: 01/01/2003

Abstract. A weakness of standard Optimality Theory is its inability to account for grammar

CiteSeerX

Edinburgh Research Explorer

Improving Statistical MT through Morphological Analysis

Authors: Goldwater Sharon; McClosky David
Publication date: 01/01/2005

Crossref

Edinburgh Research Explorer

Minimally-Supervised Morphological Segmentation using Adaptor Grammars

Authors: Goldwater Sharon; Sirts Kairit
Publication date: 01/01/2013

This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.

12 page(s)
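The model selection step can be pictured with a small sketch. It assumes, purely for illustration, that each boundary the unsupervised Adaptor Grammar proposes carries a type label from the metagrammar (the labels `prefix`, `suffix`, `stem_internal` and the helper names are hypothetical), and that "selecting boundaries" means choosing the subset of types whose union scores the best boundary F1 on the labelled data. This is one plausible reading of the described procedure, not the paper's code.

```python
from itertools import combinations


def boundary_f1(predicted, gold):
    """Boundary-level F1 between two sets of (word, position) pairs."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def select_boundary_types(candidates, gold):
    """Return the subset of boundary types whose union scores best on labelled data.

    candidates: dict mapping a hypothetical boundary-type label to the set of
                (word, position) boundaries the unsupervised model proposed.
    gold:       set of (word, position) boundaries from the small labelled set.
    """
    types = sorted(candidates)
    best_subset, best_score = (), -1.0
    # Exhaustive search over subsets is fine for a handful of boundary types.
    for k in range(1, len(types) + 1):
        for subset in combinations(types, k):
            predicted = set().union(*(candidates[t] for t in subset))
            score = boundary_f1(predicted, gold)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Toy example (invented data): positions are character offsets of boundaries.
candidates = {
    "prefix": {("unkind", 2)},
    "suffix": {("walked", 4), ("cats", 3)},
    "stem_internal": {("walked", 2)},
}
gold = {("unkind", 2), ("walked", 4), ("cats", 3)}
chosen, f1 = select_boundary_types(candidates, gold)
print(chosen, f1)  # ('prefix', 'suffix') 1.0
```

Swapping `boundary_f1` for another scorer is how such a setup could be tuned toward a different evaluation metric, in the spirit of the abstract's last sentence.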

Edinburgh Research Explorer

Macquarie University ResearchOnline

Improving morphology induction by learning spelling rules

Authors: Goldwater Sharon; Naradowsky Jason
Publication date: 01/01/2009

Edinburgh Research Explorer

Unsupervised syntactic chunking with acoustic cues: Computational models for prosodic bootstrapping

Authors: Goldwater Sharon; Pate John K.
Publication date: 01/01/2011

Learning to group words into phrases without supervision is a hard task for NLP systems, but infants routinely accomplish it. We hypothesize that infants use acoustic cues to prosody, which NLP systems typically ignore. To evaluate the utility of prosodic information for phrase discovery, we present an HMM-based unsupervised chunker that learns from only transcribed words and raw acoustic correlates to prosody. Unlike previous work on unsupervised parsing and chunking, we use neither gold standard part-of-speech tags nor punctuation in the input. Evaluated on the Switchboard corpus, our model outperforms several baselines that exploit either lexical or prosodic information alone, and, despite producing a flat structure, performs competitively with a state-of-the-art unsupervised lexicalized parser, with a substantial advantage in precision. Our results support the hypothesis that acoustic-prosodic cues provide useful evidence about syntactic phrases for language-learning infants.

10 page(s)
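As a rough illustration of how an HMM chunker can draw on prosody, the sketch below runs standard Viterbi decoding over two hypothetical chunk tags, `B` (word begins a chunk) and `I` (word continues one), where each observation is a word paired with a coarse prosodic bin (`pause` or `none`). The probabilities are invented and set by hand; the paper's model learns its parameters without supervision from raw acoustic correlates, and its actual state space and features are not reproduced here. For brevity the emission uses only the prosody bin; a fuller model would also multiply in a word probability.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely hidden-state sequence for an HMM, computed in log space.

    log_start[s] and log_trans[s][t] are log-probabilities;
    log_emit(s, obs) returns log P(obs | s).
    """
    # best[s]: log-prob of the best path that ends in state s at the current word
    best = {s: log_start[s] + log_emit(s, observations[0]) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for t in states:
            prev = max(states, key=lambda s: best[s] + log_trans[s][t])
            new_best[t] = best[prev] + log_trans[prev][t] + log_emit(t, obs)
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hand-set toy parameters (hypothetical, not learned).
states = ["B", "I"]
log_start = {"B": math.log(0.9), "I": math.log(0.1)}
log_trans = {
    "B": {"B": math.log(0.2), "I": math.log(0.8)},
    "I": {"B": math.log(0.4), "I": math.log(0.6)},
}
# P(prosody bin | tag): a pause before a word makes a chunk start more likely.
prosody = {"B": {"pause": 0.7, "none": 0.3}, "I": {"pause": 0.1, "none": 0.9}}


def log_emit(state, obs):
    _word, prosody_bin = obs  # the word is ignored in this toy emission model
    return math.log(prosody[state][prosody_bin])


words = [("the", "pause"), ("dog", "none"), ("barked", "none"),
         ("then", "pause"), ("it", "none")]
tags = viterbi(words, states, log_start, log_trans, log_emit)
print(tags)  # ['B', 'I', 'I', 'B', 'I'] -> [the dog barked] [then it]
```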

Edinburgh Research Explorer

Macquarie University ResearchOnline

Unsupervised extraction of recurring words from infant-directed speech

Authors: Goldwater Sharon; McInnes Fergus R.
Publication date: 01/01/2011

To date, most computational models of infant word segmentation have worked from phonemic or phonetic input, or have used toy datasets. In this paper, we present an algorithm for word extraction that works directly from naturalistic acoustic input: infant-directed speech from the CHILDES corpus. The algorithm identifies recurring acoustic patterns that are candidates for identification as words or phrases, and then clusters together the most similar patterns. The recurring patterns are found in a single pass through the corpus using an incremental method, where only a small number of utterances are considered at once. Despite this limitation, we show that the algorithm is able to extract a number of recurring words, including some that infants learn earliest, such as Mommy and the child’s name. We also introduce a novel information-theoretic evaluation measure
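The abstract does not say how pattern similarity is measured. A common choice for comparing variable-length acoustic segments is dynamic time warping (DTW), so the sketch below assumes DTW over frame-level feature vectors, followed by a naive greedy clustering with a hypothetical distance `threshold`. It illustrates the general idea of grouping recurring patterns, not the paper's algorithm or its incremental search.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Extend the cheapest path arriving from above, left, or diagonal.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cluster_segments(segments, threshold):
    """Greedy single pass: add each segment to the first cluster whose first
    member is within `threshold` (a hypothetical tuning parameter)."""
    clusters = []
    for seg in segments:
        for cluster in clusters:
            if dtw_distance(cluster[0], seg) <= threshold:
                cluster.append(seg)
                break
        else:
            clusters.append([seg])
    return clusters


# Toy 1-D "acoustic" segments; the second is a time-stretched copy of the first.
seg1 = [(0.0,), (1.0,), (2.0,)]
seg2 = [(0.0,), (1.0,), (1.0,), (2.0,)]
seg3 = [(5.0,), (5.0,)]
print(dtw_distance(seg1, seg2))  # 0.0
print(dtw_distance(seg1, seg3))  # 12.0
clusters = cluster_segments([seg1, seg2, seg3], threshold=1.0)
print([len(c) for c in clusters])  # [2, 1]
```

DTW tolerates differences in speaking rate, which is why the stretched copy `seg2` lands in the same cluster as `seg1`.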

CiteSeerX

Edinburgh Research Explorer

eScholarship - University of California

Predictability effects in adult-directed and infant-directed speech: Does the listener matter?

Authors: Goldwater Sharon; Pate John K.
Publication date: 01/01/2011

A well-known effect in speech production is that more predictable words tend to be phonetically reduced. Recent work has suggested that predictability effects result from hardwired properties of the language production system, rather than active modulation by the talker to accommodate the listener. However, these studies investigated only minor manipulations of listener characteristics. Here, we examine predictability effects with two very different listener populations: adults and preverbal infants. Using mixed effects regressions on spontaneous speech corpora, we compare the effect of word frequency, probability in context, and previous mention on word duration in adult-directed and infant-directed speech. We find that the effects of preceding context and word frequency differ according to listener. Contrary to previous work, these results suggest that talkers do modulate the phonetic effects of predictability based on listener characteristics. To our knowledge, this study is also the first published analysis of predictability effects in infant-directed speech

CiteSeerX

Edinburgh Research Explorer

eScholarship - University of California

Macquarie University ResearchOnline

Edge-Based Best-First Chart Parsing

Authors: Charniak Eugene; Goldwater Sharon; Johnson Mark
Publication date: 01/01/1998

Best-first probabilistic chart parsing attempts to parse efficiently by working on edges that are judged 'best' by some probabilistic figure of merit (FOM). Recent work has used probabilistic context-free grammars (PCFGs) to assign probabilities to constituents, and has used these probabilities as the starting point for the FOM. This paper extends this approach to using a probabilistic FOM to judge edges (incomplete constituents), thereby giving much finer-grained control over parsing effort. We show how this can be accomplished in a particularly simple way using the common idea of binarizing the PCFG. The results obtained are about a factor of twenty improvement over the best prior results; that is, our parser achieves equivalent results using one twentieth the number of edges. Furthermore, we show that this improvement is obtained with parsing precision and recall levels superior to those achieved by exhaustive parsing.
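The agenda mechanism behind best-first parsing can be sketched as a priority queue of chart items ordered by a figure of merit. After binarization, an incomplete constituent is just a constituent of an intermediate binarized nonterminal, so the same loop covers edges. As a deliberately simple stand-in, the FOM here is each item's inside log-probability; the paper's own FOM is not reproduced, unary rules are omitted, and the toy grammar and sentence are invented for illustration.

```python
import heapq
import math
from collections import defaultdict


def best_first_parse(words, lexicon, binary_rules, goal="S"):
    """Agenda-driven best-first parsing with a binarized PCFG (no unary rules).

    lexicon[word] -> list of (label, prob); binary_rules[(B, C)] -> list of
    (A, prob) for rules A -> B C. The figure of merit is simply each item's
    inside log-probability (an illustrative stand-in, not the paper's FOM).
    Returns (log-prob of the goal item, number of items popped).
    """
    agenda = []  # heapq is a min-heap, so push negated log-probabilities
    for i, w in enumerate(words):
        for label, p in lexicon[w]:
            heapq.heappush(agenda, (-math.log(p), label, i, i + 1))
    chart = {}                     # (label, start, end) -> best log-prob
    ends_at = defaultdict(list)    # end position -> [(label, start, lp)]
    starts_at = defaultdict(list)  # start position -> [(label, end, lp)]
    popped = 0
    while agenda:
        neg_lp, label, i, j = heapq.heappop(agenda)
        if (label, i, j) in chart:
            continue  # an equal or better analysis was already popped
        popped += 1
        lp = -neg_lp
        chart[(label, i, j)] = lp
        if label == goal and (i, j) == (0, len(words)):
            return lp, popped
        # Combine the new item with chart items adjacent on the left and right.
        for left, k, left_lp in ends_at[i]:
            for parent, p in binary_rules.get((left, label), []):
                heapq.heappush(agenda, (-(left_lp + lp + math.log(p)), parent, k, j))
        for right, k, right_lp in starts_at[j]:
            for parent, p in binary_rules.get((label, right), []):
                heapq.heappush(agenda, (-(lp + right_lp + math.log(p)), parent, i, k))
        ends_at[j].append((label, i, lp))
        starts_at[i].append((label, j, lp))
    return None, popped


# Invented toy grammar: "saw" is usually a verb, occasionally a noun.
lexicon = {"she": [("NP", 1.0)], "saw": [("V", 0.7), ("NP", 0.3)],
           "stars": [("NP", 1.0)]}
binary_rules = {("V", "NP"): [("VP", 1.0)], ("NP", "VP"): [("S", 1.0)],
                ("NP", "NP"): [("NP", 0.2)]}
lp, popped = best_first_parse(["she", "saw", "stars"], lexicon, binary_rules)
print(round(math.exp(lp), 3), popped)  # 0.7 5
```

Here the parser stops after popping 5 items: the low-probability NP reading of "saw" stays on the agenda and is never expanded, which is the kind of saving best-first search aims for. A better FOM (for example, one that also estimates outside probability) would rank items more accurately than inside probability alone.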

CiteSeerX

Edinburgh Research Explorer

Bayesian Inference for PCFGs via Markov Chain Monte Carlo

Authors: Goldwater Sharon; Griffiths Thomas; Johnson Mark
Publication date: 01/01/2007

8 page(s)

Edinburgh Research Explorer

Macquarie University ResearchOnline