374 research outputs found
Robust Estimation of Tone Break Indices from Speech Signal using Multi-Scale Analysis and their Applications
The aim of this study is to develop a robust algorithm to automatically detect Tones and Break Indices (ToBI) from the speech signal and to explore their applications. iLAST was introduced to analyze the acoustic and prosodic features used to detect the ToBI indices. Both expert and data-driven rules were used to improve robustness. The integration of multi-scale signal analysis with rule-based classification has helped to robustly identify tones that can be used in applications such as identifying the vowel triangle and recognizing emotions from speech. Empirical analyses using a labeled dataset were performed to illustrate the utility of the proposed approach. Further analyses were conducted to identify inefficiencies in the proposed approach and to address them through co-analysis of prosodic features, identifying the major contributors to robust ToBI detection. It was demonstrated that the proposed approach performs robustly and can be used to develop a wide variety of applications.
Chapter 2: The Original ToBI System and the Evolution of the ToBI Framework
In this chapter, the authors try to identify the essential properties of a ToBI framework annotation system by describing the development and design of the original ToBI conventions. In this description, the authors give an overview of the general phonological theory and the specific theory of Mainstream American English intonation and prosody that they decided to incorporate in the original ToBI tags. The authors also state the practical principles that led them to make the decisions they did. The chapter is organised as follows. Section 2.2 briefly chronicles how the MAE_ToBI system came into being. Section 2.3 briefly describes the consensus account of English intonation and prosody on which the MAE_ToBI system is based. Section 2.4 catalogues the different components of an MAE_ToBI transcription and lists the salient rules that constrain the relationships between different components. This section also expands upon the theoretical foundations and practical consequences of adopting the general structure of multiple labelling tiers, particularly the separation of the labels for tones from the labels indexing prosodic boundary strength. Section 2.5 then describes some of the extensions of the basic ToBI tiers that have been adopted by some sites. This section also compares the authors' decisions about the number of tiers and about inter-tier constraints with the analogous decisions for some of the other ToBI systems described in this book. Section 2.6 discusses the status of the symbolic labels relative to the continuous phonetic records that are also an obligatory component of the MAE_ToBI transcription. Section 2.7 closes by listing several open research questions that the authors would like to see addressed by MAE_ToBI users and the larger ToBI community.
RRL: A Rich Representation Language for the Description of Agent Behaviour in NECA
In this paper, we describe the Rich Representation Language (RRL), which is used in the NECA system. The NECA system generates interactions between two or more animated characters. The RRL is a formal framework for representing the information exchanged at the interfaces between the various NECA system modules.
Prosodic Cues "and" Syntactic Disambiguation
This work was supported in part by a Summer Graduate Research Fellowship in Cognitive Science provided by the Center for Cognitive Science at The Ohio State University.
Deep Learning for Automatic Assessment and Feedback of Spoken English
Growing global demand for learning a second language (L2), particularly English, has led to considerable interest in automatic spoken language assessment, whether for use in computer-assisted language learning (CALL) tools or for grading candidates for formal qualifications.
This thesis presents research conducted into the automatic assessment of spontaneous non-native English speech, with a view to providing meaningful feedback to learners. One of the challenges in automatic spoken language assessment is giving candidates feedback on particular aspects, or views, of their spoken language proficiency in addition to the overall holistic score normally provided. Another is detecting pronunciation and other types of errors at the word or utterance level and feeding them back to the learner in a useful way.
It is usually difficult to obtain accurate training data with separate scores for different views, and, as examiners are often trained to give holistic grades, single-view scores can suffer from inconsistency. Conversely, holistic scores are available for various standard assessment tasks such as Linguaskill. This thesis therefore investigates whether assessment scores linked to particular views of the speaker's ability can be obtained from systems trained using only holistic scores.
End-to-end neural systems are designed with structures and forms of input tuned to single views, specifically pronunciation, rhythm, intonation and text. Training each system on large quantities of candidate data should make it possible to extract individual-view information. The relationships between the predictions of the systems are evaluated to examine whether they are, in fact, extracting different information about the speaker. Three methods of combining the systems to predict the holistic score are investigated: averaging their predictions, concatenating their intermediate representations, and attending over their intermediate representations. The combined graders are compared to each other and to baseline approaches.
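The three combination strategies named above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the thesis's architecture: each per-view representation (pronunciation, rhythm, intonation, text) is a plain list of floats of equal length, and `query` is a hypothetical stand-in for a learned attention parameter. In a trained system, the concatenated or pooled vector would feed a learned output layer that predicts the holistic score.

```python
import math
from typing import List

Vector = List[float]


def average_predictions(preds: List[float]) -> float:
    """Combine per-view graders by averaging their scalar score predictions."""
    return sum(preds) / len(preds)


def concatenate(reps: List[Vector]) -> Vector:
    """Join per-view intermediate representations into one feature vector."""
    return [x for rep in reps for x in rep]


def attention_pool(reps: List[Vector], query: Vector) -> Vector:
    """Softmax-weighted sum of equal-length per-view representations.

    `query` is a hypothetical placeholder for a learned attention vector;
    each view is scored by its dot product with `query`.
    """
    scores = [sum(q * h for q, h in zip(query, rep)) for rep in reps]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(reps[0])
    return [sum(w * rep[d] for w, rep in zip(weights, reps)) for d in range(dim)]
```

With a zero `query` every view receives the same weight, so attention pooling reduces to an element-wise mean of the representations. A learned query lets the combined grader emphasise whichever view is most informative.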
The tasks of error detection and error tendency diagnosis become particularly challenging when the speech is spontaneous, and they are further complicated by the inconsistency of human annotation of pronunciation errors. An approach to these tasks is presented that distinguishes between lexical errors, in which the speaker does not know how a particular word is pronounced, and accent errors, in which the candidate's speech exhibits consistent patterns of phone substitution, deletion and insertion. Three annotated corpora of non-native English speech by speakers of multiple L1s are analysed, the consistency of human annotation is investigated, and a method is presented for detecting individual accent and lexical errors and for diagnosing accent error tendencies at the speaker level.
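To make the notion of phone substitution, deletion and insertion concrete, one generic approach is a minimum-edit-distance alignment between a word's canonical phone sequence and the phones a speaker actually produced. The resulting edits can then be aggregated per speaker as a crude accent-tendency profile. This is a swapped-in textbook technique, not the detection method presented in the thesis. The phone symbols and the function names `align_phones` and `accent_tendencies` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# (operation, canonical phone, realised phone); None marks a gap.
Edit = Tuple[str, Optional[str], Optional[str]]


def align_phones(ref: Sequence[str], hyp: Sequence[str]) -> List[Edit]:
    """Levenshtein-align canonical phones `ref` with realised phones `hyp`."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            op = "match" if ref[i - 1] == hyp[j - 1] else "sub"
            edits.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            edits.append(("del", ref[i - 1], None))
            i -= 1
        else:
            edits.append(("ins", None, hyp[j - 1]))
            j -= 1
    edits.reverse()
    return edits


def accent_tendencies(pairs: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> Counter:
    """Count non-match edits across one speaker's (canonical, realised) pairs."""
    counts: Counter = Counter()
    for ref, hyp in pairs:
        for edit in align_phones(ref, hyp):
            if edit[0] != "match":
                counts[edit] += 1
    return counts
```

Ties between equally cheap alignments are broken in favour of the diagonal (match or substitution), so other optimal alignments may exist. An edit that recurs across many different words for the same speaker is the kind of consistent pattern the abstract associates with accent errors, whereas an isolated deviation on a single word is more consistent with a lexical error.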
A Prosodic Turkish text-to-speech synthesizer
Naturalness is essential for achieving a high-quality waveform in Text-to-Speech systems. The naturalness of the waveform is highly correlated with phonetic coverage and with prosodic features such as duration and F0 contour. Duration determines the timing of each synthesized phoneme, whereas the F0 contour determines the fundamental frequency of the waveform. This thesis presents the development of a prosodic Text-to-Speech system for the Turkish language using the Festival tool [31]. We describe a complete realization of a new male voice, covering the allophones of Turkish, using duration and F0 parameters. The duration of the allophones and word stress have been studied extensively; sentence stress and phrasal stress are also discussed, but in less detail. Carrier words are designed for approximately all allophone-allophone combinations, and 1680 carrier words are recorded in a soundproof recording studio. LPC (linear predictive coding) and RES (residual) parameters are computed. A text normalisation module is implemented for abbreviations and numbers. Durations for the allophones are entered, and sentence-level and word-level F0 generation modules are implemented. By increasing the number of phonemes and adding prosody, we obtained a more natural-sounding Text-to-Speech system for the Turkish language.
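As a rough illustration of the LPC and residual step, the sketch below applies the autocorrelation method with the Levinson-Durbin recursion to a single frame, then inverse-filters the frame to obtain the residual. It is a generic textbook formulation in pure Python, not the thesis's or Festival's actual analysis code. Framing, windowing, pre-emphasis and the predictor `order` are assumptions left to the caller.

```python
from typing import List, Tuple


def autocorrelation(frame: List[float], max_lag: int) -> List[float]:
    """r[k] = sum over n of frame[n] * frame[n - k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the Yule-Walker equations for predictor coefficients a[1..order].

    The predictor is x_hat[n] = sum_k a[k] * x[n - k].
    Returns (coefficients, final prediction-error energy).
    """
    if r[0] == 0.0:  # silent frame: nothing to predict
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(frame: List[float], coeffs: List[float]) -> List[float]:
    """Inverse-filter the frame: e[n] = x[n] - sum_k a[k] * x[n - k] (zero history)."""
    residual = []
    for n, x in enumerate(frame):
        pred = sum(a * frame[n - k] for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(x - pred)
    return residual
```

For a decaying exponential `frame = [0.9 ** n for n in range(200)]`, an order-2 analysis yields coefficients close to `[0.9, 0.0]`. The residual is then essentially an impulse at the first sample, which is why an LPC filter driven by its residual can reconstruct the original frame.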
Function of intonation in task-oriented dialogue
This thesis addresses the question of how intonation functions in conversation. It examines the intonation and discourse function of single-word utterances in spontaneous and read-aloud task-oriented dialogue (the HCRC Map Task Corpus of Scottish English; see Anderson et al., 1991). To avoid a pitfall of previous studies, in which comparisons of intonation and discourse structure tend to lack balance and focus more heavily on one analysis at the expense of the other, it employs two independently developed analyses: the Conversational Games Analysis (as introduced in Kowtko, Isard and Doherty, 1992) and a simple target-level representation of intonation. Correlations between categories of intonation and of discourse function in spontaneous dialogue suggest that intonation reflects the function of an utterance. Contrary to what one might expect from reading the literature, these categories are in some cases categories of exclusion rather than inclusion.
Similar patterns result from the study of read-aloud dialogue, where discourse function and intonation categories also show a measure of correlation. One difference that does appear between the two speech modes is that, for many discourse functions, intonation categories in the read-aloud version shift toward tunes ending low in the speaker's pitch range (e.g. a falling tune). This result is in accord with other contemporary studies (e.g. Blaauw, 1995). The difference between the spontaneous and read results suggests that read-aloud dialogue, even when based on scripts that include hesitations and false starts, cannot substitute for spontaneous dialogue in eliciting the same intonation strategies.
The applicability of the O'Connor and Arnold model of English intonation to the analysis of Luxembourgish intonation
The aim of this dissertation is to examine whether the well-known O'Connor and Arnold model for the analysis of English intonation can be applied to the analysis of Luxembourgish intonation. After the concepts of intonation used in this dissertation are defined, the main points of several other important studies on English intonation are discussed. This is followed by a detailed explanation of why the O'Connor and Arnold model was deemed most appropriate and an overview of the main work on Luxembourgish intonation published to date. The research methodology is outlined in detail, and the findings are discussed in depth. Finally, the adaptability of the O'Connor and Arnold model to the analysis of Luxembourgish intonation is discussed, and possible alterations are suggested.
- …