7,036 research outputs found

Multi-dimensional Boltzmann Sampling of Languages

Author: Bodini Olivier
Ponty Yann
Publication date: 01/01/2010

This paper addresses the uniform random generation of words from a context-free language (over an alphabet of size $k$), while constraining every letter to a targeted frequency of occurrence. Our approach consists in a multidimensional extension of Boltzmann samplers \cite{Duchon2004}. We show that, under mostly \emph{strong-connectivity} hypotheses, our samplers return a word of size in $[(1-\varepsilon)n, (1+\varepsilon)n]$ and exact frequency in $\mathcal{O}(n^{1+k/2})$ expected time. Moreover, if we accept tolerance intervals of width in $\Omega(\sqrt{n})$ for the number of occurrences of each letter, our samplers perform an approximate-size generation of words in expected $\mathcal{O}(n)$ time. We illustrate these techniques on the generation of Tetris tessellations with uniform statistics in the different types of tetraminoes.

Comment: 12p

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Polytechnique

HAL-Rennes 1

Unsupervised syntactic chunking with acoustic cues: Computational models for prosodic bootstrapping

Author: Goldwater Sharon
Pate John K.
Publication date: 01/01/2011

Learning to group words into phrases without supervision is a hard task for NLP systems, but infants routinely accomplish it. We hypothesize that infants use acoustic cues to prosody, which NLP systems typically ignore. To evaluate the utility of prosodic information for phrase discovery, we present an HMM-based unsupervised chunker that learns from only transcribed words and raw acoustic correlates to prosody. Unlike previous work on unsupervised parsing and chunking, we use neither gold standard part-of-speech tags nor punctuation in the input. Evaluated on the Switchboard corpus, our model outperforms several baselines that exploit either lexical or prosodic information alone, and, despite producing a flat structure, performs competitively with a state-of-the-art unsupervised lexicalized parser, with a substantial advantage in precision. Our results support the hypothesis that acoustic-prosodic cues provide useful evidence about syntactic phrases for language-learning infants.

10 page(s)

Edinburgh Research Explorer

Macquarie University ResearchOnline

Improved Constituent Context Model with Features

Author: Huang Yun
Tan Chew Lim
Zhang Min
Publication venue: 'Faculty of Computer Science, Universitas Indonesia'
Publication date: 01/01/2012

Waseda University Repository