1,364 research outputs found

    Learning OT constraint rankings using a maximum entropy model

    Get PDF
    Abstract. A weakness of standard Optimality Theory is its inability to account for grammar

    Minimally-Supervised Morphological Segmentation using Adaptor Grammars

    Get PDF
    This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.12 page(s

    Unsupervised syntactic chunking with acoustic cues: Computational models for prosodic bootstrapping

    Get PDF
    Learning to group words into phrases without supervision is a hard task for NLP systems, but infants routinely accomplish it. We hypothesize that infants use acoustic cues to prosody, which NLP systems typically ignore. To evaluate the utility of prosodic information for phrase discovery, we present an HMM-based unsupervised chunker that learns from only transcribed words and raw acoustic correlates to prosody. Unlike previous work on unsupervised parsing and chunking, we use neither gold standard part-of-speech tags nor punctuation in the input. Evaluated on the Switchboard corpus, our model outperforms several baselines that exploit either lexical or prosodic information alone, and, despite producing a flat structure, performs competitively with a state-of-the-art unsupervised lexicalized parser, with a substantial advantage in precision. Our results support the hypothesis that acoustic-prosodic cues provide useful evidence about syntactic phrases for language-learning infants.10 page(s

    Unsupervised extraction of recurring words from infant-directed speech

    Get PDF
    To date, most computational models of infant word segmentation have worked from phonemic or phonetic input, or have used toy datasets. In this paper, we present an algorithm for word extraction that works directly from naturalistic acoustic input: infant-directed speech from the CHILDES corpus. The algorithm identifies recurring acoustic patterns that are candidates for identification as words or phrases, and then clusters together the most similar patterns. The recurring patterns are found in a single pass through the corpus using an incremental method, where only a small number of utterances are considered at once. Despite this limitation, we show that the algorithm is able to extract a number of recurring words, including some that infants learn earliest, such as Mommy and the child’s name. We also introduce a novel information-theoretic evaluation measure

    Predictability effects in adult-directed and infant-directed speech: Does the listener matter?

    Get PDF
    A well-known effect in speech production is that more predictable words tend to be phonetically reduced. Recent work has suggested that predictability effects result from hardwired properties of the language production system, rather than active modulation by the talker to accommodate the listener. However, these studies investigated only minor manipulations of listener characteristics. Here, we examine predictability effects with two very different listener populations: adults and preverbal infants. Using mixed effects regressions on spontaneous speech corpora, we compare the effect of word frequency, probability in context, and previous mention on word duration in adult-directed and infant-directed speech. We find that the effects of preceding context and word frequency differ according to listener. Contrary to previous work, these results suggest that talkers do modulate the phonetic effects of predictability based on listener characteristics. To our knowledge, this study is also the first published analysis of predictability effects in infant-directed speech

    Edge-Based Best-First Chart Parsing

    Get PDF
    Best-first probabilistic chart parsing attempts to parse efficiently by working on edges that are judged 'best' by some probabilistic figure of merit (FOM). Recent work has used proba- bilistic context-free grammars (PCFGs) to sign probabilities to constituents, and to use these probabilities as the starting point for the FOM. This paper extends this approach to us- ing a probabilistic FOM to judge edges (incomplete constituents), thereby giving a much finergrained control over parsing effort. We show how this can be accomplished in a particularly simple way using the common idea of binarizing the PCFG. The results obtained are about a factor of twenty improvement over the best prior results -- that is, our parser achieves equivalent results using one twentieth the number of edges. Furthermore we show that this improvement is obtained with parsing precision and recall levels superior to those achieved by exhaustive parsing
    corecore