
94 research outputs found

Evaluation of automatic break insertion for an agglutinative and inflected language

Author: Agresti
Allen
Bachenko
Blum
Breiman
Carletta
Eva Navas
Frazier
Hirschberg
Inmaculada Hernáez
Iñaki Sainz
Liberman
Maragoudakis
Oparin
Ostendorf
Read
Salton
Sangho
Siegel
Stone
Taylor
van Rijsbergen
Wang
Yoon
Zellner
Zervas
Publication venue: Elsevier BV

Predictability effects in language acquisition

Author: Pate John Kenton
Publication venue: The University of Edinburgh
Publication date: 02/07/2013

Human language has two fundamental requirements: it must allow competent speakers to exchange messages efficiently, and it must be readily learned by children. Recent work has examined effects of language predictability on language production, with many researchers arguing that so-called “predictability effects” serve the efficiency requirement. Specifically, recent work has found that talkers tend to reduce more probable linguistic forms more heavily. This dissertation proposes the “Predictability Bootstrapping Hypothesis” that predictability effects also make language more learnable. There is a great deal of evidence that adult grammars have substantial statistical components. Since predictability effects result in heavier reduction for more probable words and hidden structure, they provide infants with direct cues to the statistical components of the grammars they are trying to learn. The corpus studies and computational modeling experiments in this dissertation show that predictability effects could be a substantial source of information to language-learning infants, focusing on the potential utility of phonetic reduction in terms of word duration for syntax acquisition. First, corpora of spontaneous adult-directed and child-directed speech (ADS and CDS, respectively) are compared to verify that predictability effects actually exist in CDS. While revealing some differences, mixed effects regressions on those corpora indicate that predictability effects in CDS are largely similar (in kind and magnitude) to predictability effects in ADS. This result indicates that predictability effects are available to infants, however useful they may be. Second, this dissertation builds probabilistic, unsupervised, and lexicalized models for learning about syntax from words and durational cues. One series of models is based on Hidden Markov Models and learns shallow constituency structure, while the other series is based on the Dependency Model with Valence and learns dependency structure. These models are then used to measure how useful durational cues are for syntax acquisition, and to what extent their utility in this task can be attributed to effects of syntactic predictability on word duration. As part of this investigation, these models are also used to explore the venerable “Prosodic Bootstrapping Hypothesis” that prosodic structure, which is cued in part by word duration, may be useful for syntax acquisition. The empirical evaluations of these models provide evidence that effects of syntactic predictability on word duration are easier to discover and exploit than effects of prosodic structure, and that even gold-standard annotations of prosodic structure provide at most a relatively small improvement in parsing performance over raw word duration. Taken together, this work indicates that predictability effects provide useful information about syntax to infants, showing that the Predictability Bootstrapping Hypothesis for syntax acquisition is computationally plausible and motivating future behavioural investigation. Additionally, as talkers consider the probability of many different aspects of linguistic structure when reducing according to predictability effects, this result also motivates investigation of Predictability Bootstrapping of other aspects of linguistic knowledge.

Edinburgh Research Archive

Analysis by Synthesis: A (Re-)Emerging Program of Research for Language and Vision

Author: Bever Thomas G.
Poeppel David
Publication venue: BIOLINGUISTICS
Publication date: 01/09/2010

This contribution reviews (some of) the history of analysis by synthesis, an approach to perception and comprehension articulated in the 1950s. Whereas much research has focused on bottom-up, feed-forward, inductive mechanisms, analysis by synthesis as a heuristic model emphasizes a balance of bottom-up and knowledge-driven, top-down, predictive steps in speech perception and language comprehension. This idea aligns well with contemporary Bayesian approaches to perception (in language and other domains), which are illustrated with examples from different aspects of perception and comprehension. Results from psycholinguistics, the cognitive neuroscience of language, and visual object recognition suggest that analysis by synthesis can provide a productive way of structuring biolinguistic research. Current evidence suggests that such a model is theoretically well motivated, biologically sensible, and becomes computationally tractable by borrowing from Bayesian formalizations.

Directory of Open Access Journals

Biolinguistics (E-Journal)

Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech

Author: Andreas Stolcke
Berger Adam L
Carletta Jean
Carol Van Ess-Dykema
Daniel Jurafsky
Dermatas Evangelos
Elizabeth Shriberg
Grosz Barbara J
Hirschberg Julia B
Klaus Ries
Marie Meteer
Noah Coccaro
Paul Taylor
Rachel Martin
Rebecca Bates
Publication date: 01/01/2000

We describe a statistical approach for modeling dialogue acts in conversational speech, i.e., speech-act-like units such as Statement, Question, Backchannel, Agreement, Disagreement, and Apology. Our model detects and predicts dialogue acts based on lexical, collocational, and prosodic cues, as well as on the discourse coherence of the dialogue act sequence. The dialogue model is based on treating the discourse structure of a conversation as a hidden Markov model and the individual dialogue acts as observations emanating from the model states. Constraints on the likely sequence of dialogue acts are modeled via a dialogue act n-gram. The statistical dialogue grammar is combined with word n-grams, decision trees, and neural networks modeling the idiosyncratic lexical and prosodic manifestations of each dialogue act. We develop a probabilistic integration of speech recognition with dialogue modeling, to improve both speech recognition and dialogue act classification accuracy. Models are trained and evaluated using a large hand-labeled database of 1,155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We achieved good dialogue act labeling accuracy (65% based on errorful, automatically recognized words and prosody, and 71% based on word transcripts, compared to a chance baseline accuracy of 35% and human accuracy of 84%) and a small reduction in word recognition error.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Edinburgh Research Archive

Institutional Repository for Minnesota State University, Mankato

Unsupervised learning for text-to-speech synthesis

Author: Watts Oliver Samuel
Publication venue: The University of Edinburgh
Publication date: 02/07/2013

This thesis introduces a general method for incorporating the distributional analysis of textual and linguistic objects into text-to-speech (TTS) conversion systems. Conventional TTS conversion uses intermediate layers of representation to bridge the gap between text and speech. Collecting the annotated data needed to produce these intermediate layers is a far from trivial task, possibly prohibitively so for languages in which no such resources exist. Distributional analysis, in contrast, proceeds in an unsupervised manner, and so enables the creation of systems using textual data that are not annotated. The method therefore aids the building of systems for languages in which conventional linguistic resources are scarce, but is not restricted to these languages. The distributional analysis proposed here places the textual objects analysed in a continuous-valued space, rather than specifying a hard categorisation of those objects. This space is then partitioned during the training of acoustic models for synthesis, so that the models generalise over objects' surface forms in a way that is acoustically relevant. The method is applied to three levels of textual analysis: to the characterisation of sub-syllabic units, word units and utterances. Entire systems for three languages (English, Finnish and Romanian) are built with no reliance on manually labelled data or language-specific expertise. Results of a subjective evaluation are presented.

Edinburgh Research Archive


Learning with Joint Inference and Latent Linguistic Structure in Graphical Models

Author: Narad Jason
Publication venue: ScholarWorks@UMass Amherst
Publication date: 17/03/2015

Constructing end-to-end NLP systems requires the processing of many types of linguistic information prior to solving the desired end task. A common approach to this problem is to construct a pipeline, one component for each task, with each system's output becoming input for the next. This approach poses two problems. First, errors propagate, and, much like the childhood game of telephone, combining systems in this manner can lead to unintelligible outcomes. Second, each component task requires annotated training data to act as supervision for training the model. These annotations are often expensive and time-consuming to produce, may differ from each other in genre and style, and may not match the intended application. In this dissertation we present a general framework for constructing and reasoning on joint graphical model formulations of NLP problems. Individual models are composed using weighted Boolean logic constraints, and inference is performed using belief propagation. The systems we develop are composed of two parts: one a representation of syntax, the other a desired end task (semantic role labeling, named entity recognition, or relation extraction). By modeling these problems jointly, both models are trained in a single, integrated process, with uncertainty propagated between them. This mitigates the accumulation of errors typical of pipelined approaches. Additionally, we propose a novel marginalization-based training method in which the error signal from end task annotations is used to guide the induction of a constrained latent syntactic representation. This allows training in the absence of syntactic training data, where the latent syntactic structure is instead optimized to best support the end task predictions. We find that across many NLP tasks this training method offers performance comparable to fully supervised training of each individual component, and in some instances improves upon it by learning latent structures which are more appropriate for the task.

ScholarWorks@UMass Amherst

Macquarie University ResearchOnline

Identifying prosodic prominence patterns for English text-to-speech synthesis

Author: Badino Leonardo
Publication venue: The University of Edinburgh
Publication date: 01/01/2010

This thesis proposes to improve and enrich the expressiveness of English Text-to-Speech (TTS) synthesis by identifying and generating natural patterns of prosodic prominence. In most state-of-the-art TTS systems the prediction from text of prosodic prominence relations between words in an utterance relies on features that very loosely account for the combined effects of syntax, semantics, word informativeness and salience on prosodic prominence. To improve prosodic prominence prediction we first follow up the classic approach in which prosodic prominence patterns are flattened into binary sequences of pitch-accented and pitch-unaccented words. We propose and motivate statistical and syntactic-dependency-based features that are complementary to the most predictive features proposed in previous work on automatic pitch accent prediction and show their utility on both read and spontaneous speech. Different accentuation patterns can be associated with the same sentence. Such variability raises the question of how to evaluate pitch accent predictors when more than one pattern is allowed. We carry out a study of the variability of prosodic symbols on a speech corpus where different speakers read the same text and propose an information-theoretic definition of optionality of symbolic prosodic events that leads to a novel evaluation metric in which prosodic variability is incorporated as a factor affecting prediction accuracy. We additionally propose a method to take advantage of the optionality of prosodic events in unit-selection speech synthesis. To better account for the tight links between the prosodic prominence of a word and the discourse/sentence context, part of this thesis goes beyond the accent/no-accent dichotomy and is devoted to a novel task, the automatic detection of contrast, where contrast is understood as an Information Structure relation that ties two words that explicitly contrast with each other. This task is mainly motivated by the fact that contrastive words tend to be prosodically marked with particularly prominent pitch accents. The identification of contrastive word pairs is achieved by combining lexical information, syntactic information (which mainly aims to identify the syntactic parallelism that often activates contrast) and semantic information (mainly drawn from the WordNet semantic lexicon) within a Support Vector Machine classifier. Once we have identified patterns of prosodic prominence we propose methods to incorporate such information in TTS synthesis and test its impact on synthetic speech naturalness through some large-scale perceptual experiments. The results of these experiments cast some doubt on the utility of a simple accent/no-accent distinction in Hidden Markov Model based speech synthesis while highlighting the importance of contrastive accents.

Edinburgh Research Archive

Information Structure, Grammar and Strategy in Discourse

Author: Stevens Jon Scott
Publication venue: ScholarlyCommons
Publication date: 01/01/2013

This dissertation examines two information-structural phenomena, Givenness and Focus, from the perspective of both syntax and pragmatics. Evidence from English, German and other languages suggests a split analysis of information structure: the notions of Focus and Givenness, often thought to be closely related, exist independently at two different levels of linguistic representation. Givenness is encoded as a syntactic feature which presupposes salience in prior discourse and either (1) prevents prosodic prominence (in languages like English and German), or (2) drives syntactic movement (in languages like Italian). On the other hand, Focus, which introduces strong prosodic prominence and a contrastive interpretation, exhibits none of the expected properties of a syntactic feature, and is therefore analyzed quite differently. I argue that Focus is the result of purely pragmatic principles which determine utterance choice in the face of grammatical optionality. The syntactic and phonological systems often generate multiple possible formulations of an utterance, and communicative principles can be invoked to explain the correspondences between certain kinds of discourse contexts and certain patterns of linguistic form. The application of communicative principles to problems of utterance choice is modeled mathematically using the tools of game-theoretic pragmatics. From this perspective, utterances are taken to be strategically chosen in order to maximize communicative effectiveness. Ultimately, the strong differences between Focus and Givenness emphasize a methodological point: both syntactic and pragmatic perspectives are necessary to fully determine the space of possibilities in natural language. Neither perspective should be ignored.

ScholarlyCommons@Penn

Statistical Knowledge and Learning in Phonology

Author: Dunbar Ewan
Publication date: 01/01/2013

This thesis deals with the theory of the phonetic component of grammar in a formal probabilistic inference framework: (1) it has been recognized since the beginning of generative phonology that some language-specific phonetic implementation is actually context-dependent, and thus it can be said that there are gradient "phonetic processes" in grammar in addition to categorical "phonological processes." However, no explicit theory has been developed to characterize these processes. Meanwhile, (2) it is understood that language acquisition and perception are both really informed guesswork: the result of both types of inference can be reasonably thought to be a less-than-perfect commitment, with multiple candidate grammars or parses considered and each associated with some degree of credence. Previous research has used probability theory to formalize these inferences in implemented computational models, especially in phonetics and phonology. In this role, computational models serve to demonstrate the existence of working learning/perception/parsing systems assuming a faithful implementation of one particular theory of human language, and are not intended to adjudicate whether that theory is correct. The current thesis (1) develops a theory of the phonetic component of grammar and how it relates to the greater phonological system and (2) uses a formal Bayesian treatment of learning to evaluate this theory of the phonological architecture and to make predictions about how the resulting grammars will be organized. The coarse description of the consequence for linguistic theory is that the processes we think of as "allophonic" are actually language-specific, gradient phonetic processes, assigned to the phonetic component of grammar; strict allophones have no representation in the output of the categorical phonological grammar.

Digital Repository at the University of Maryland

Greek Meter : An Approach Using Metrical Grids and Maxent

Author: Henriksson Erik
Publication venue: University of Helsinki Libraries
Publication date: 25/03/2022

Standard presentations of ancient Greek poetic meter typically focus on identifying and classifying the repeatable syllable-weight-based patterns found in Greek poetry. This dissertation, by contrast, seeks to understand why selected Greek poets arranged their words in just those patterns instead of some others. Counter to the prevailing approach in classics, which defines meters as strings of short and long positions, meters are here viewed as abstract rhythmic patterns, made concrete by the phonological representations of verses. A main goal is to explicitly characterize the well-formedness conditions on the correspondences between these abstract patterns and actual lines. The study is couched in the framework of generative metrics. Chapter 1 sets the scope and context of the study and provides a brief rationale for the proposed approach by comparing it with traditional Greek metrics and demonstrating the built-in limitations of the latter in explaining the metrical choices of Greek poets. In addition, the chapter examines some basic features of Greek meter from the perspective of comparative metrics. Chapter 2 discusses the key background assumptions about the structure of meter and defends the view that poetic meters are musical objects rather than purely phonological ones, as some scholars have suggested. Chapter 3 presents the statistical method used in the dissertation to model the metrical intuitions of poets (maximum entropy density estimation). The chapter also introduces a new method for examining the extent to which the inherent rhythms of the relevant language explain the regularities observed in verses. Chapters 4-6 contain the main contributions to the study of Greek meter and the theory of metrics. Chapter 4 presents statistical analyses of four different meters (trochaic tetrameter, archaic and tragic iambic trimeter, comic iambic trimeter, and anapestic dimeter).
According to the analyses, the quantitative patterns in these meters can be plausibly described using hierarchical metrical grids and natural metrical constraints. Chapter 5 examines the rhythmically more complex verses of Sappho and Alcaeus in the light of Paul Kiparsky’s recent proposal that the rhythmic aperiodicity that characterizes much early Greek verse is due to syncopation. It is shown that Kiparsky's theory, with some revisions, can be applied to the analysis of the metrical forms used by Sappho and Alcaeus. Chapter 6 argues against the theory of “Prosodic metrics”, which seeks to analyze Greek meters (and those of other languages) by using phonological markedness constraints alone. Chapter 7 summarizes the main results of the dissertation, places them in the context of the recent history of metrical scholarship, and considers directions for further research.

Helsingin yliopiston digitaalinen arkisto