94 research outputs found

    Predictability effects in language acquisition

    Human language has two fundamental requirements: it must allow competent speakers to exchange messages efficiently, and it must be readily learned by children. Recent work has examined effects of language predictability on language production, with many researchers arguing that so-called “predictability effects” serve the efficiency requirement. Specifically, recent work has found that talkers tend to reduce more probable linguistic forms more heavily. This dissertation proposes the “Predictability Bootstrapping Hypothesis” that predictability effects also make language more learnable. There is a great deal of evidence that adult grammars have substantial statistical components. Since predictability effects result in heavier reduction for more probable words and hidden structure, they provide infants with direct cues to the statistical components of the grammars they are trying to learn. The corpus studies and computational modeling experiments in this dissertation show that predictability effects could be a substantial source of information to language-learning infants, focusing on the potential utility of phonetic reduction in terms of word duration for syntax acquisition. First, corpora of spontaneous adult-directed and child-directed speech (ADS and CDS, respectively) are compared to verify that predictability effects actually exist in CDS. While revealing some differences, mixed effects regressions on those corpora indicate that predictability effects in CDS are largely similar (in kind and magnitude) to predictability effects in ADS. This result indicates that predictability effects are available to infants, however useful they may be. Second, this dissertation builds probabilistic, unsupervised, and lexicalized models for learning about syntax from words and durational cues.
One series of models is based on Hidden Markov Models and learns shallow constituency structure, while the other series is based on the Dependency Model with Valence and learns dependency structure. These models are then used to measure how useful durational cues are for syntax acquisition, and to what extent their utility in this task can be attributed to effects of syntactic predictability on word duration. As part of this investigation, these models are also used to explore the venerable “Prosodic Bootstrapping Hypothesis” that prosodic structure, which is cued in part by word duration, may be useful for syntax acquisition. The empirical evaluations of these models provide evidence that effects of syntactic predictability on word duration are easier to discover and exploit than effects of prosodic structure, and that even gold-standard annotations of prosodic structure provide at most a relatively small improvement in parsing performance over raw word duration. Taken together, this work indicates that predictability effects provide useful information about syntax to infants, showing that the Predictability Bootstrapping Hypothesis for syntax acquisition is computationally plausible and motivating future behavioural investigation. Additionally, as talkers consider the probability of many different aspects of linguistic structure when reducing according to predictability effects, this result also motivates investigation of Predictability Bootstrapping of other aspects of linguistic knowledge.

    Analysis by Synthesis: A (Re-)Emerging Program of Research for Language and Vision

    This contribution reviews (some of) the history of analysis by synthesis, an approach to perception and comprehension articulated in the 1950s. Whereas much research has focused on bottom-up, feed-forward, inductive mechanisms, analysis by synthesis as a heuristic model emphasizes a balance of bottom-up and knowledge-driven, top-down, predictive steps in speech perception and language comprehension. This idea aligns well with contemporary Bayesian approaches to perception (in language and other domains), which are illustrated with examples from different aspects of perception and comprehension. Results from psycholinguistics, the cognitive neuroscience of language, and visual object recognition suggest that analysis by synthesis can provide a productive way of structuring biolinguistic research. Current evidence suggests that such a model is theoretically well motivated, biologically sensible, and becomes computationally tractable by borrowing from Bayesian formalizations.

    Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

    We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error. Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed

    Unsupervised learning for text-to-speech synthesis

    This thesis introduces a general method for incorporating the distributional analysis of textual and linguistic objects into text-to-speech (TTS) conversion systems. Conventional TTS conversion uses intermediate layers of representation to bridge the gap between text and speech. Collecting the annotated data needed to produce these intermediate layers is a far-from-trivial task, possibly prohibitively so for languages in which no such resources exist. Distributional analysis, in contrast, proceeds in an unsupervised manner, and so enables the creation of systems using textual data that are not annotated. The method therefore aids the building of systems for languages in which conventional linguistic resources are scarce, but is not restricted to these languages. The distributional analysis proposed here places the textual objects analysed in a continuous-valued space, rather than specifying a hard categorisation of those objects. This space is then partitioned during the training of acoustic models for synthesis, so that the models generalise over objects' surface forms in a way that is acoustically relevant. The method is applied to three levels of textual analysis: to the characterisation of sub-syllabic units, word units and utterances. Entire systems for three languages (English, Finnish and Romanian) are built with no reliance on manually labelled data or language-specific expertise. Results of a subjective evaluation are presented.

    Identifying prosodic prominence patterns for English text-to-speech synthesis

    This thesis proposes to improve and enrich the expressiveness of English Text-to-Speech (TTS) synthesis by identifying and generating natural patterns of prosodic prominence. In most state-of-the-art TTS systems, the prediction from text of prosodic prominence relations between words in an utterance relies on features that account only very loosely for the combined effects of syntax, semantics, word informativeness and salience on prosodic prominence. To improve prosodic prominence prediction, we first follow the classic approach in which prosodic prominence patterns are flattened into binary sequences of pitch-accented and pitch-unaccented words. We propose and motivate statistical and syntactic-dependency-based features that are complementary to the most predictive features proposed in previous work on automatic pitch accent prediction, and show their utility on both read and spontaneous speech. Different accentuation patterns can be associated with the same sentence. Such variability raises the question of how to evaluate pitch accent predictors when more than one pattern is allowed. We carry out a study of the variability of prosodic symbols on a speech corpus in which different speakers read the same text, and propose an information-theoretic definition of the optionality of symbolic prosodic events that leads to a novel evaluation metric in which prosodic variability is incorporated as a factor affecting prediction accuracy. We additionally propose a method to take advantage of the optionality of prosodic events in unit-selection speech synthesis. To better account for the tight links between the prosodic prominence of a word and the discourse/sentence context, part of this thesis goes beyond the accent/no-accent dichotomy and is devoted to a novel task, the automatic detection of contrast, where contrast is understood as an Information Structure relation that ties together two words that explicitly contrast with each other.
This task is mainly motivated by the fact that contrastive words tend to be prosodically marked with particularly prominent pitch accents. The identification of contrastive word pairs is achieved by combining lexical information, syntactic information (which mainly aims to identify the syntactic parallelism that often activates contrast) and semantic information (mainly drawn from the WordNet semantic lexicon) within a Support Vector Machine classifier. Once we have identified patterns of prosodic prominence, we propose methods to incorporate such information in TTS synthesis and test its impact on synthetic speech naturalness through large-scale perceptual experiments. The results of these experiments cast some doubt on the utility of a simple accent/no-accent distinction in Hidden Markov Model based speech synthesis, while highlighting the importance of contrastive accents.

    Information Structure, Grammar and Strategy in Discourse

    This dissertation examines two information-structural phenomena, Givenness and Focus, from the perspective of both syntax and pragmatics. Evidence from English, German and other languages suggests a split analysis of information structure: the notions of Focus and Givenness, often thought to be closely related, exist independently at two different levels of linguistic representation. Givenness is encoded as a syntactic feature which presupposes salience in prior discourse and either (1) prevents prosodic prominence (in languages like English and German), or (2) drives syntactic movement (in languages like Italian). On the other hand, Focus, which introduces strong prosodic prominence and a contrastive interpretation, exhibits none of the expected properties of a syntactic feature, and is therefore analyzed quite differently. I argue that Focus is the result of purely pragmatic principles which determine utterance choice in the face of grammatical optionality. The syntactic and phonological systems often generate multiple possible formulations of an utterance, and communicative principles can be invoked to explain the correspondences between certain kinds of discourse contexts and certain patterns of linguistic form. The application of communicative principles to problems of utterance choice is modeled mathematically using the tools of game-theoretic pragmatics. From this perspective, utterances are taken to be strategically chosen in order to maximize communicative effectiveness. Ultimately, the strong differences between Focus and Givenness emphasize a methodological point: both syntactic and pragmatic perspectives are necessary to fully determine the space of possibilities in natural language. Neither perspective should be ignored.

    Statistical Knowledge and Learning in Phonology

    This thesis deals with the theory of the phonetic component of grammar in a formal probabilistic inference framework: (1) it has been recognized since the beginning of generative phonology that some language-specific phonetic implementation is actually context-dependent, and thus it can be said that there are gradient "phonetic processes" in grammar in addition to categorical "phonological processes." However, no explicit theory has been developed to characterize these processes. Meanwhile, (2) it is understood that language acquisition and perception are both really informed guesswork: the result of both types of inference can be reasonably thought to be a less-than-perfect commitment, with multiple candidate grammars or parses considered and each associated with some degree of credence. Previous research has used probability theory to formalize these inferences in implemented computational models, especially in phonetics and phonology. In this role, computational models serve to demonstrate the existence of working learning/perception/parsing systems assuming a faithful implementation of one particular theory of human language, and are not intended to adjudicate whether that theory is correct. The current thesis (1) develops a theory of the phonetic component of grammar and how it relates to the greater phonological system and (2) uses a formal Bayesian treatment of learning to evaluate this theory of the phonological architecture and to make predictions about how the resulting grammars will be organized. The coarse description of the consequence for linguistic theory is that the processes we think of as "allophonic" are actually language-specific, gradient phonetic processes, assigned to the phonetic component of grammar; strict allophones have no representation in the output of the categorical phonological grammar.

    Greek Meter: An Approach Using Metrical Grids and Maxent

    Standard presentations of ancient Greek poetic meter typically focus on identifying and classifying the repeatable syllable-weight-based patterns found in Greek poetry. This dissertation, by contrast, seeks to understand why selected Greek poets arranged their words in just those patterns instead of some others. Counter to the prevailing approach in classics, which defines meters as strings of short and long positions, meters are here viewed as abstract rhythmic patterns, made concrete by the phonological representations of verses. A main goal is to explicitly characterize the well-formedness conditions on the correspondences between these abstract patterns and actual lines. The study is couched in the framework of generative metrics. Chapter 1 sets the scope and context of the study and provides a brief rationale for the proposed approach by comparing it with traditional Greek metrics and demonstrating the built-in limitations of the latter in explaining the metrical choices of Greek poets. In addition, the chapter examines some basic features of Greek meter from the perspective of comparative metrics. Chapter 2 discusses the key background assumptions about the structure of meter and defends the view that poetic meters are musical objects rather than purely phonological ones, as some scholars have suggested. Chapter 3 presents the statistical method used in the dissertation to model the metrical intuitions of poets (maximum entropy density estimation). The chapter also introduces a new method for examining the extent to which the inherent rhythms of the relevant language explain the regularities observed in verses. Chapters 4-6 contain the main contributions to the study of Greek meter and the theory of metrics. Chapter 4 presents statistical analyses of four different meters (trochaic tetrameter, archaic and tragic iambic trimeter, comic iambic trimeter, and anapestic dimeter).
According to the analyses, the quantitative patterns in these meters can be plausibly described using hierarchical metrical grids and natural metrical constraints. Chapter 5 examines the rhythmically more complex verses of Sappho and Alcaeus in the light of Paul Kiparsky’s recent proposal that the rhythmic aperiodicity that characterizes much early Greek verse is due to syncopation. It is shown that Kiparsky's theory, with some revisions, can be applied to the analysis of the metrical forms used by Sappho and Alcaeus. Chapter 6 argues against the theory of “Prosodic metrics”, which seeks to analyze Greek meters (and those of other languages) by using phonological markedness constraints alone. Chapter 7 summarizes the main results of the dissertation, places them in the context of the recent history of metrical scholarship, and considers directions for further research.