A Factored Language Model for Prosody Dependent Speech Recognition

Authors: Jennifer S. Cole; Ken Chen; Mark A. Hasegawa-Johnson
Publication venue: 'IntechOpen'
Publication date: 01/06/2007

IntechOpen

Chapter 2: The Original ToBI System and the Evolution of the ToBI Framework

Authors: Beckman Mary E.; Hirschberg Julia Bell; Shattuck-Hufnagel Stefanie
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2004

In this chapter, the authors will try to identify the essential properties of a ToBI framework annotation system by describing the development and design of the original ToBI conventions. In this description, the authors will give an overview of the general phonological theory and the specific theory of Mainstream American English intonation and prosody that they decided to incorporate in the original ToBI tags. The authors will also state the practical principles that led them to make the decisions that they did. The chapter is organised as follows. Section 2.2 briefly chronicles how the MAE_ToBI system came into being. Section 2.3 briefly describes the consensus account of English intonation and prosody on which the MAE_ToBI system is based. Section 2.4 catalogues the different components of a MAE_ToBI transcription and lists the salient rules which constrain the relationships between different components. This section also expands upon the theoretical foundations and practical consequences of adopting the general structure of multiple labelling tiers, and particularly the separation of the labels for tones from the labels for indexing prosodic boundary strength. Section 2.5 then describes some of the extensions of the basic ToBI tiers that have been adopted by some sites. This section also compares the authors' decisions about the number of tiers and about inter-tier constraints with the analogous decisions for some of the other ToBI systems described in this book. Section 2.6 discusses the status of the symbolic labels relative to the continuous phonetic records that are also an obligatory component of the MAE_ToBI transcription. Section 2.7 then closes by listing several open research questions that the authors would like to see addressed by MAE_ToBI users and the larger ToBI community

Columbia University Academic Commons

Generating Tailored, Comparative Descriptions with Contextually Appropriate Intonation

Authors: Johanna D. Moore; Michael White; Robert A. J. Clark; Roberts Craige
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2010

Generating responses that take user preferences into account requires adaptation at all levels of the generation process. This article describes a multi-level approach to presenting user-tailored information in spoken dialogues which brings together for the first time multi-attribute decision models, strategic content planning, surface realization that incorporates prosody prediction, and unit selection synthesis that takes the resulting prosodic structure into account. The system selects the most important options to mention and the attributes that are most relevant to choosing between them, based on the user model. Multiple options are selected when each offers a compelling trade-off. To convey these trade-offs, the system employs a novel presentation strategy which straightforwardly lends itself to the determination of information structure, as well as the contents of referring expressions. During surface realization, the prosodic structure is derived from the information structure using Combinatory Categorial Grammar in a way that allows phrase boundaries to be determined in a flexible, data-driven fashion. This approach to choosing pitch accents and edge tones is shown to yield prosodic structures with significantly higher acceptability than baseline prosody prediction models in an expert evaluation. These prosodic structures are then shown to enable perceptibly more natural synthesis using a unit selection voice that aims to produce the target tunes, in comparison to two baseline synthetic voices. An expert evaluation and f0 analysis confirm the superiority of the generator-driven intonation and its contribution to listeners' ratings

CiteSeerX

Crossref

Edinburgh Research Archive

Prosody and sentence disambiguation in European Portuguese

Author: Vigário Marina
Publication venue: 'Universitat Autonoma de Barcelona'
Publication date: 01/01/2003

Our investigation focuses on several types of structural ambiguity in European Portuguese. The materials include sentences with set-divider adverbs ambiguous as to the direction of syntactic attachment, adjunct and complement PPs ambiguous as to the level of syntactic embedding, nonrestrictive clauses with local and non-local possible antecedents, and relative clauses ambiguous as to their restrictive/non-restrictive meaning. Besides providing a prosodic description of sentences with these various sorts of ambiguity, the relation between prosody and syntactic structure is addressed. It is concluded that structural ambiguity is not always cued by prosody, and it may be resolved by prosodic means that are optional. Additionally, some options on sentence partition in intonational phrases are only available under some interpretations, and in specific configurations I-breaks may not be inserted (namely, between a head and an adjacent complement or modifier). In all cases studied, intonational phrase level properties play a crucial role in sentence disambiguation. An intonational phrase boundary after set-divider adverbs indicates left attachment, and one between a constituent and the preceding material implies non-local attachment. These facts are seen to follow in a principled way from the conditions on the formation of intonational phrases

CiteSeerX

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

Revistes Catalanes amb Accés Obert

Diposit Digital de Documents de la UAB

Secretaría de Estado de Cultura

Information structure in linguistic theory and in speech production : validation of a cross-linguistic data set

Authors: Hellmuth Sam; Skopeteas Stavros
Publication date: 01/01/2007

The aim of this paper is to validate a dataset collected by means of production experiments which are part of the Questionnaire on Information Structure. The experiments generate a range of information structure contexts that have been observed in the literature to induce specific constructions. This paper compares the speech production results from a subset of these experiments with specific claims about the reflexes of information structure in four different languages. The results allow us to evaluate and in most cases validate the efficacy of our elicitation paradigms, to identify potentially fruitful avenues of future research, and to highlight issues involved in interpreting speech production data of this kind

Publications at Bielefeld University

Hochschulschriftenserver - Universität Frankfurt am Main

In defence of underlying representations: Latin rhotacism, French liaison, Romanian palatalization

Author: Bermúdez-Otero Ricardo
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 22/08/2018

Crossref

The University of Manchester - Institutional Repository

Intonation, word order and focus projection in Serbo-Croatian

Author: Godjevac S.
Publication venue: Ohio State Univ.
Publication date: 01/01/2000

LoC Class: PG1224.7, LoC Subject Headings: Serbo-Croatian language--Intonation, Serbo-Croatian language--Word order

OhioLINK Electronic Thesis and Dissertation Center

MPG.PuRe

Prosodic phrase break prediction: problems in the evaluation of models against a gold standard

Authors: Atwell E; Brierley C
Publication venue: Association pour le Traitement Automatique des Langues
Publication date: 01/01/2007

The goal of automatic phrase break prediction is to identify prosodic-syntactic boundaries in text which correspond to the way a native speaker might process or chunk that same text as speech. This is treated as a classification task in machine learning and output predictions from language models are evaluated against a ‘gold standard’: human-labelled prosodic phrase break annotations in transcriptions of recorded speech - the speech corpus. Despite the introduction of rigorous metrics such as precision and recall, the evaluation of phrase break models is still problematic because prosody is inherently variable; morphosyntactic analysis and prosodic annotations for a given text are not representative of the range of parsing and phrasing strategies available to, and exhibited by, native speakers. This article recommends creating automatically-generated POS tagged and prosodically annotated variants of a text to enrich the gold standard and enable more robust ‘noise-tolerant’ evaluation of language models
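
As a rough illustration of the evaluation setup this abstract describes, the sketch below scores predicted phrase breaks with precision and recall against one reference, and then against several reference variants. It is not the authors' method: the function names, the 0/1 encoding of word junctures, and the choice to keep the best-matching variant (by F1) are all assumptions made for this example.

```python
def break_precision_recall(gold, predicted):
    """Precision and recall for phrase-break labels.

    gold, predicted: equal-length sequences with one 0/1 label per word
    juncture (1 = phrase break). This encoding is an assumption.
    """
    if len(gold) != len(predicted):
        raise ValueError("sequences must cover the same junctures")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # breaks found
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # spurious breaks
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # missed breaks
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_over_variants(gold_variants, predicted):
    """Score against each reference variant and keep the best F1.

    One hypothetical way to make the evaluation tolerant of prosodic
    variability: a prediction counts as good if it matches any
    acceptable phrasing of the text.
    """
    best = (0.0, 0.0, 0.0)  # (precision, recall, f1)
    for gold in gold_variants:
        p, r = break_precision_recall(gold, predicted)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best[2]:
            best = (p, r, f1)
    return best
```

With a single reference, a prediction that places a break one juncture away from the annotated one is penalised twice (one spurious break, one missed break); scoring against several plausible phrasings is one way to soften that.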

White Rose Research Online

On Left and Right Dislocation: A Dynamic Perspective

Authors: Cann Ronnie; Kempson Ruth; Otsuka Masayuki
Publication date: 01/01/2002

The paper argues that by modelling the incremental and left-right process of interpretation as a process of growth of logical form (representing logical forms as trees), an integrated typology of left-dislocation and right-dislocation phenomena becomes available, bringing out not merely the similarities between these types of phenomena, but also their asymmetry. The data covered include hanging topic left dislocation, clitic left dislocation, left dislocation, pronoun doubling, expletives, extraposition, and right node raising, with each set of data analysed in terms of general principles of tree growth. In the light of the success in providing a characterisation of the asymmetry between left and right periphery phenomena, a result not achieved in more well-known formalisms, the paper concludes that grammar formalisms should model the dynamics of language processing in time.

SAS-SPACE