Predictability effects in language acquisition
Human language has two fundamental requirements: it must allow competent speakers
to exchange messages efficiently, and it must be readily learned by children. Recent
work has examined effects of language predictability on language production, with
many researchers arguing that so-called "predictability effects" serve the
efficiency requirement. Specifically, recent work has found that talkers tend to reduce
more probable linguistic forms more heavily. This dissertation proposes the
"Predictability Bootstrapping Hypothesis" that predictability effects also make language
more learnable. There is a great deal of evidence that adult grammars have
substantial statistical components. Since predictability effects result in heavier reduction
for more probable words and hidden structure, they provide infants with direct
cues to the statistical components of the grammars they are trying to learn.
The corpus studies and computational modeling experiments in this dissertation
show that predictability effects could be a substantial source of information to language-learning
infants, focusing on the potential utility of phonetic reduction in terms of word
duration for syntax acquisition. First, corpora of spontaneous adult-directed and child-directed
speech (ADS and CDS, respectively) are compared to verify that predictability
effects actually exist in CDS. While revealing some differences, mixed effects regressions
on those corpora indicate that predictability effects in CDS are largely similar
(in kind and magnitude) to predictability effects in ADS. This result indicates that predictability
effects are available to infants, however useful they may be.
Second, this dissertation builds probabilistic, unsupervised, and lexicalized models
for learning about syntax from words and durational cues. One series of models is
based on Hidden Markov Models and learns shallow constituency structure, while the
other series is based on the Dependency Model with Valence and learns dependency
structure. These models are then used to measure how useful durational cues are for
syntax acquisition, and to what extent their utility in this task can be attributed to
effects of syntactic predictability on word duration. As part of this investigation, these
models are also used to explore the venerable "Prosodic Bootstrapping Hypothesis"
that prosodic structure, which is cued in part by word duration, may be useful for
syntax acquisition. The empirical evaluations of these models provide evidence that
effects of syntactic predictability on word duration are easier to discover and exploit
than effects of prosodic structure, and that even gold-standard annotations of prosodic
structure provide at most a relatively small improvement in parsing performance over raw word duration.
Taken together, this work indicates that predictability effects provide useful information
about syntax to infants, showing that the Predictability Bootstrapping Hypothesis
for syntax acquisition is computationally plausible and motivating future behavioural
investigation. Additionally, as talkers consider the probability of many different
aspects of linguistic structure when reducing according to predictability effects,
this result also motivates investigation of Predictability Bootstrapping of other aspects
of linguistic knowledge.
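As a rough illustration of what "more probable" can mean for a word in context, the sketch below estimates bigram surprisal from a toy corpus. The bigram model, the add-one smoothing, the function name `bigram_surprisal`, and the toy sentences are all assumptions made for illustration; the abstract does not specify which predictability measures the dissertation uses.

```python
import math
from collections import Counter


def bigram_surprisal(sentences):
    """Return a function giving -log2 P(word | previous word), add-one smoothed."""
    unigrams = Counter()   # counts of each token used as a context
    bigrams = Counter()    # counts of (context, word) pairs
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, word)] += 1
    vocab_size = len(vocab)

    def surprisal(prev, word):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        return -math.log2(p)

    return surprisal


# Hypothetical toy corpus, not data from the dissertation.
toy_corpus = [
    ["the", "dog", "barks"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "sleeps"],
]
surprisal = bigram_surprisal(toy_corpus)
high_pred = surprisal("the", "dog")  # more frequent continuation, lower surprisal
low_pred = surprisal("the", "cat")   # rarer continuation, higher surprisal
```

Under the predictability effects described above, the lower-surprisal word would be expected to show heavier reduction (for example, shorter duration) than the higher-surprisal one.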
Analysis by Synthesis: A (Re-)Emerging Program of Research for Language and Vision
This contribution reviews (some of) the history of analysis by synthesis, an approach to perception and comprehension articulated in the 1950s. Whereas much research has focused on bottom-up, feed-forward, inductive mechanisms, analysis by synthesis as a heuristic model emphasizes a balance of bottom-up and knowledge-driven, top-down, predictive steps in speech perception and language comprehension. This idea aligns well with contemporary Bayesian approaches to perception (in language and other domains), which are illustrated with examples from different aspects of perception and comprehension. Results from psycholinguistics, the cognitive neuroscience of language, and visual object recognition suggest that analysis by synthesis can provide a productive way of structuring biolinguistic research. Current evidence suggests that such a model is theoretically well motivated, biologically sensible, and becomes computationally tractable by borrowing from Bayesian formalizations.
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error.
Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling
changed
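The combination of a dialogue act n-gram with per-act evidence can be pictured as standard HMM decoding. The following is a minimal sketch, assuming a bigram over act labels as the transition model and made-up per-utterance likelihoods standing in for the lexical and prosodic models; the function name `viterbi` and every probability below are hypothetical, and this is not the authors' implementation.

```python
import math


def viterbi(acts, log_start, log_trans, log_emit):
    """Return the most likely act sequence.

    log_emit holds one dict per utterance: act -> log P(evidence | act).
    """
    best = [{a: log_start[a] + log_emit[0][a] for a in acts}]
    back = []
    for t in range(1, len(log_emit)):
        scores, pointers = {}, {}
        for a in acts:
            # Best predecessor act for arriving at act `a` at step t.
            prev = max(acts, key=lambda p: best[-1][p] + log_trans[p][a])
            scores[a] = best[-1][prev] + log_trans[prev][a] + log_emit[t][a]
            pointers[a] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final act.
    path = [max(acts, key=lambda a: best[-1][a])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def logs(table):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in table.items()}


acts = ["Statement", "Question", "Backchannel"]
# All numbers below are invented for illustration.
log_start = logs({"Statement": 0.5, "Question": 0.4, "Backchannel": 0.1})
log_trans = {
    "Statement": logs({"Statement": 0.5, "Question": 0.2, "Backchannel": 0.3}),
    "Question": logs({"Statement": 0.7, "Question": 0.2, "Backchannel": 0.1}),
    "Backchannel": logs({"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1}),
}
log_emit = [
    logs({"Statement": 0.1, "Question": 0.8, "Backchannel": 0.1}),
    logs({"Statement": 0.45, "Question": 0.45, "Backchannel": 0.1}),
    logs({"Statement": 0.1, "Question": 0.1, "Backchannel": 0.8}),
]
decoded = viterbi(acts, log_start, log_trans, log_emit)
```

In this toy run the second utterance is ambiguous on its own evidence, and the act bigram is what tips the decision, which is the role the abstract assigns to the statistical dialogue grammar.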
Unsupervised learning for text-to-speech synthesis
This thesis introduces a general method for incorporating the distributional analysis
of textual and linguistic objects into text-to-speech (TTS) conversion systems.
Conventional TTS conversion uses intermediate layers of representation to bridge
the gap between text and speech. Collecting the annotated data needed to produce
these intermediate layers is a far from trivial task, possibly prohibitively so
for languages in which no such resources are in existence. Distributional analysis,
in contrast, proceeds in an unsupervised manner, and so enables the creation of
systems using textual data that are not annotated. The method therefore aids
the building of systems for languages in which conventional linguistic resources
are scarce, but is not restricted to these languages.
The distributional analysis proposed here places the textual objects analysed
in a continuous-valued space, rather than specifying a hard categorisation of those
objects. This space is then partitioned during the training of acoustic models for
synthesis, so that the models generalise over objects' surface forms in a way that
is acoustically relevant.
The method is applied to three levels of textual analysis: to the characterisation
of sub-syllabic units, word units and utterances. Entire systems for three
languages (English, Finnish and Romanian) are built with no reliance on manually
labelled data or language-specific expertise. Results of a subjective evaluation
are presented
Learning with Joint Inference and Latent Linguistic Structure in Graphical Models
Constructing end-to-end NLP systems requires the processing of many types of linguistic information prior to solving the desired end task. A common approach to this problem is to construct a pipeline, one component for each task, with each system's output becoming input for the next. This approach poses two problems. First, errors propagate, and, much like the childhood game of telephone, combining systems in this manner can lead to unintelligible outcomes. Second, each component task requires annotated training data to act as supervision for training the model. These annotations are often expensive and time-consuming to produce, may differ from each other in genre and style, and may not match the intended application.
In this dissertation we present a general framework for constructing and reasoning on joint graphical model formulations of NLP problems. Individual models are composed using weighted Boolean logic constraints, and inference is performed using belief propagation. The systems we develop are composed of two parts: one a representation of syntax, the other a desired end task (semantic role labeling, named entity recognition, or relation extraction). By modeling these problems jointly, both models are trained in a single, integrated process, with uncertainty propagated between them. This mitigates the accumulation of errors typical of pipelined approaches.
Additionally we propose a novel marginalization-based training method in which the error signal from end task annotations is used to guide the induction of a constrained latent syntactic representation. This allows training in the absence of syntactic training data, where the latent syntactic structure is instead optimized to best support the end task predictions. We find that across many NLP tasks this training method offers performance comparable to fully supervised training of each individual component, and in some instances improves upon it by learning latent structures which are more appropriate for the task
Identifying prosodic prominence patterns for English text-to-speech synthesis
This thesis proposes to improve and enrich the expressiveness of English Text-to-Speech (TTS) synthesis by identifying and generating natural patterns of prosodic
prominence.
In most state-of-the-art TTS systems the prediction from text of prosodic prominence
relations between words in an utterance relies on features that very loosely account
for the combined effects of syntax, semantics, word informativeness and salience,
on prosodic prominence.
To improve prosodic prominence prediction we first follow up the classic approach
in which prosodic prominence patterns are flattened into binary sequences of pitch-accented
and pitch-unaccented words. We propose and motivate statistical and syntactic-dependency-based
features that are complementary to the most predictive features proposed
in previous work on automatic pitch accent prediction and show their utility on
both read and spontaneous speech.
Different accentuation patterns can be associated with the same sentence. Such variability
raises the question of how to evaluate pitch accent predictors when more than one pattern
is allowed. We carry out a study of prosodic symbol variability on a speech corpus
where different speakers read the same text and propose an information-theoretic definition
of optionality of symbolic prosodic events that leads to a novel evaluation metric
in which prosodic variability is incorporated as a factor affecting prediction accuracy.
We additionally propose a method to take advantage of the optionality of prosodic
events in unit-selection speech synthesis.
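One plausible reading of an information-theoretic notion of optionality is the Shannon entropy of the accent labels that different speakers assign to the same word. The sketch below follows that reading only; the thesis's actual definition is not given in this abstract, and the function name `label_entropy` and the toy labels are assumptions.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the labels different speakers gave one word."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical labels from four speakers reading the same word.
optional_word = ["accent", "accent", "none", "none"]        # speakers disagree
obligatory_word = ["accent", "accent", "accent", "accent"]  # speakers agree

optional_score = label_entropy(optional_word)
obligatory_score = label_entropy(obligatory_word)
```

A word with high entropy is one where several realisations are acceptable, so a predictor that misses the majority label there could reasonably be penalised less than one that errs on a zero-entropy word.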
To better account for the tight links between the prosodic prominence of a word and
the discourse/sentence context, part of this thesis goes beyond the accent/no-accent dichotomy
and is devoted to a novel task, the automatic detection of contrast, where
contrast is meant as a (Information Structure's) relation that ties two words that explicitly
contrast with each other. This task is mainly motivated by the fact that contrastive
words tend to be prosodically marked with particularly prominent pitch accents.
The identification of contrastive word pairs is achieved by combining lexical information,
syntactic information (which mainly aims to identify the syntactic parallelism
that often activates contrast) and semantic information (mainly drawn from the WordNet
semantic lexicon), within a Support Vector Machines classifier.
Once we have identified patterns of prosodic prominence we propose methods to
incorporate such information in TTS synthesis and test its impact on synthetic speech
naturalness through some large-scale perceptual experiments. The results of these experiments cast some doubt on the utility of a simple accent/no-accent
distinction in Hidden Markov Model based speech synthesis while highlighting the
importance of contrastive accents.
Information Structure, Grammar and Strategy in Discourse
This dissertation examines two information-structural phenomena, Givenness and Focus, from the perspective of both syntax and pragmatics. Evidence from English, German and other languages suggests a split analysis of information structure--the notions of Focus and Givenness, often thought to be closely related, exist independently at two different levels of linguistic representation. Givenness is encoded as a syntactic feature which presupposes salience in prior discourse and either (1) prevents prosodic prominence (in languages like English and German), or (2) drives syntactic movement (in languages like Italian). On the other hand, Focus, which introduces strong prosodic prominence and a contrastive interpretation, exhibits none of the expected properties of a syntactic feature, and is therefore analyzed quite differently. I argue that Focus is the result of purely pragmatic principles which determine utterance choice in the face of grammatical optionality. The syntactic and phonological systems often generate multiple possible formulations of an utterance, and communicative principles can be invoked to explain the correspondences between certain kinds of discourse contexts and certain patterns of linguistic form. The application of communicative principles to problems of utterance choice is modeled mathematically using the tools of game-theoretic pragmatics. From this perspective, utterances are taken to be strategically chosen in order to maximize communicative effectiveness. Ultimately, the strong differences between Focus and Givenness emphasize a methodological point: both syntactic and pragmatic perspectives are necessary to fully determine the space of possibilities in natural language. Neither perspective should be ignored
Statistical Knowledge and Learning in Phonology
This thesis deals with the theory of the phonetic component of grammar in a formal probabilistic inference framework: (1) it has been recognized since the beginning of generative phonology that some language-specific phonetic implementation is actually context-dependent, and thus it can be said that there are gradient "phonetic processes" in grammar in addition to categorical "phonological processes." However, no explicit theory has been developed to characterize these processes. Meanwhile, (2) it is understood that language acquisition and perception are both really informed guesswork: the result of both types of inference can be reasonably thought to be a less-than-perfect commitment, with multiple candidate grammars or parses considered and each associated with some degree of credence. Previous research has used probability theory to formalize these inferences in implemented computational models, especially in phonetics and phonology. In this role, computational models serve to demonstrate the existence of working learning/perception/parsing systems assuming a faithful implementation of one particular theory of human language, and are not intended to adjudicate whether that theory is correct. The current thesis (1) develops a theory of the phonetic component of grammar and how it relates to the greater phonological system and (2) uses a formal Bayesian treatment of learning to evaluate this theory of the phonological architecture and to make predictions about how the resulting grammars will be organized. The coarse description of the consequence for linguistic theory is that the processes we think of as "allophonic" are actually language-specific, gradient phonetic processes, assigned to the phonetic component of grammar; strict allophones have no representation in the output of the categorical phonological grammar.
Greek Meter : An Approach Using Metrical Grids and Maxent
Standard presentations of ancient Greek poetic meter typically focus on identifying and classifying the repeatable syllable-weight-based patterns found in Greek poetry. This dissertation, by contrast, seeks to understand why selected Greek poets arranged their words in just those patterns instead of some others. Counter to the prevailing approach in classics, which defines meters as strings of short and long positions, meters are here viewed as abstract rhythmic patterns, made concrete by the phonological representations of verses. A main goal is to explicitly characterize the well-formedness conditions on the correspondences between these abstract patterns and actual lines. The study is couched in the framework of generative metrics.
Chapter 1 sets the scope and context of the study and provides a brief rationale for the proposed approach by comparing it with traditional Greek metrics and demonstrating the built-in limitations of the latter in explaining the metrical choices of Greek poets. In addition, the chapter examines some basic features of Greek meter from the perspective of comparative metrics. Chapter 2 discusses the key background assumptions about the structure of meter and defends the view that poetic meters are musical objects rather than purely phonological ones, as some scholars have suggested. Chapter 3 presents the statistical method used in the dissertation to model the metrical intuitions of poets (maximum entropy density estimation). The chapter also introduces a new method for examining the extent to which the inherent rhythms of the relevant language explain the regularities observed in verses.
Chapters 4-6 contain the main contributions to the study of Greek meter and the theory of metrics. Chapter 4 presents statistical analyses of four different meters (trochaic tetrameter, archaic and tragic iambic trimeter, comic iambic trimeter, and anapestic dimeter). According to the analyses, the quantitative patterns in these meters can be plausibly described using hierarchical metrical grids and natural metrical constraints. Chapter 5 examines the rhythmically more complex verses of Sappho and Alcaeus in the light of Paul Kiparsky's recent proposal that the rhythmic aperiodicity that characterizes much early Greek verse is due to syncopation. It is shown that Kiparsky's theory, with some revisions, can be applied to the analysis of the metrical forms used by Sappho and Alcaeus. Chapter 6 argues against the theory of "Prosodic metrics", which seeks to analyze Greek meters (and those of other languages) by using phonological markedness constraints alone.
Chapter 7 summarizes the main results of the dissertation, places them in the context of the recent history of metrical scholarship, and considers directions for further research.