374 research outputs found

    Robust Estimation of Tone Break Indices from Speech Signal using Multi-Scale Analysis and their Applications

    The aim of this study is to develop a robust algorithm to automatically detect the Tone and Break Indices (ToBI) from the speech signal and to explore their applications. iLAST was introduced to analyze the acoustic and prosodic features to detect the ToBI indices. Both expert and data-driven rules were used to improve the robustness. The integration of multi-scale signal analysis with rule-based classification has helped in robustly identifying tones that can be used in applications such as identifying the vowel triangle, emotions from speech, etc. Empirical analyses using a labeled dataset were performed to illustrate the utility of the proposed approach. Further analyses were conducted to identify the inefficiencies of the proposed approach and to address those issues through co-analyses of prosodic features, identifying the major contributors to robust detection of ToBI. It was demonstrated that the proposed approach performs robustly and can be used for developing a wide variety of applications.

    Chapter 2: The Original ToBI System and the Evolution of the ToBI Framework

    In this chapter, the authors will try to identify the essential properties of a ToBI framework annotation system by describing the development and design of the original ToBI conventions. In this description, the authors will give an overview of the general phonological theory and the specific theory of Mainstream American English intonation and prosody that they decided to incorporate in the original ToBI tags. The authors will also state the practical principles that led them to make the decisions that they did. The chapter is organised as follows. Section 2.2 briefly chronicles how the MAE_ToBI system came into being. Section 2.3 briefly describes the consensus account of English intonation and prosody on which the MAE_ToBI system is based. Section 2.4 catalogues the different components of a MAE_ToBI transcription and lists the salient rules which constrain the relationships between different components. This section also expands upon the theoretical foundations and practical consequences of adopting the general structure of multiple labelling tiers, and particularly the separation of the labels for tones from the labels for indexing prosodic boundary strength. Section 2.5 then describes some of the extensions of the basic ToBI tiers that have been adopted by some sites. This section also compares the authors' decisions about the number of tiers and about inter-tier constraints with the analogous decisions for some of the other ToBI systems described in this book. Section 2.6 discusses the status of the symbolic labels relative to the continuous phonetic records that are also an obligatory component of the MAE_ToBI transcription. Section 2.7 then closes by listing several open research questions that the authors would like to see addressed by MAE_ToBI users and the larger ToBI community.

    RRL: A Rich Representation Language for the Description of Agent Behaviour in NECA

    In this paper, we describe the Rich Representation Language (RRL) which is used in the NECA system. The NECA system generates interactions between two or more animated characters. The RRL is a formal framework for representing the information that is exchanged at the interfaces between the various NECA system modules

    Prosodic Cues "and" Syntactic Disambiguation

    This work was supported in part by a Summer Graduate Research Fellowship in Cognitive Science provided by the Center for Cognitive Science at The Ohio State University

    Deep Learning for Automatic Assessment and Feedback of Spoken English

    Growing global demand for learning a second language (L2), particularly English, has led to considerable interest in automatic spoken language assessment, whether for use in computer-assisted language learning (CALL) tools or for grading candidates for formal qualifications. This thesis presents research conducted into the automatic assessment of spontaneous non-native English speech, with a view to providing meaningful feedback to learners. One of the challenges in automatic spoken language assessment is giving candidates feedback on particular aspects, or views, of their spoken language proficiency, in addition to the overall holistic score normally provided. Another is detecting pronunciation and other types of errors at the word or utterance level and feeding them back to the learner in a useful way. It is usually difficult to obtain accurate training data with separate scores for different views and, as examiners are often trained to give holistic grades, single-view scores can suffer issues of consistency. Conversely, holistic scores are available for various standard assessment tasks such as Linguaskill. An investigation is thus conducted into whether assessment scores linked to particular views of the speaker's ability can be obtained from systems trained using only holistic scores. End-to-end neural systems are designed with structures and forms of input tuned to single views, specifically each of pronunciation, rhythm, intonation and text. By training each system on large quantities of candidate data, it should be possible to extract individual-view information. The relationships between the predictions of each system are evaluated to examine whether they are, in fact, extracting different information about the speaker. Three methods of combining the systems to predict the holistic score are investigated, namely averaging their predictions, concatenating their intermediate representations, and attending over their intermediate representations.
    The combined graders are compared to each other and to baseline approaches. The tasks of error detection and error tendency diagnosis become particularly challenging when the speech in question is spontaneous, and particularly given the challenges posed by the inconsistency of human annotation of pronunciation errors. An approach to these tasks is presented by distinguishing between lexical errors, wherein the speaker does not know how a particular word is pronounced, and accent errors, wherein the candidate's speech exhibits consistent patterns of phone substitution, deletion and insertion. Three annotated corpora of non-native English speech by speakers of multiple L1s are analysed, the consistency of human annotation is investigated, and a method is presented for detecting individual accent and lexical errors and diagnosing accent error tendencies at the speaker level.

    A Prosodic Turkish text-to-speech synthesizer

    Naturalness in Text-to-Speech systems is very important in achieving a high-quality waveform. The naturalness of the waveform is highly correlated with phonetic coverage and prosodic features such as duration and F0 contour. Duration determines the timing of the synthesized phoneme, whereas the F0 contour determines the fundamental frequency component of the waveform. This thesis presents the development of a prosodic Text-to-Speech system for the Turkish language using the Festival Tool [31]. We describe a complete realization of a new male voice, covering allophones of Turkish using duration and F0 parameters. The duration of the allophones and the word stress have been studied extensively. Sentence stress and phrasal stress are also discussed, but in less detail. Carrier words are designed for approximately all allophone-allophone combinations. 1680 carrier words are recorded in a sound-proof recording studio. LPC (linear predictive coding) and RES (residual) parameters are computed. The text normalisation module is implemented for abbreviations and numbers. Durations for the allophones are entered. Sentence-level and word-level F0 generation modules are implemented. By increasing the number of phonemes and adding prosody, we obtained a more natural-sounding Text-to-Speech system for the Turkish language.

    Function of intonation in task-oriented dialogue

    This thesis addresses the question of how intonation functions in conversation. It examines the intonation and discourse function of single-word utterances in spontaneous and read-aloud task-oriented dialogue (HCRC Map Task Corpus containing Scottish English; see Anderson et al., 1991). To avoid some of the pitfalls of previous studies in which such comparisons of intonation and discourse structure tend to lack balance and focus more heavily on one analysis at the expense of the other, it employs independently developed analyses. They are the Conversational Games Analysis (as introduced in Kowtko, Isard and Doherty, 1992) and a simple target level representation of intonation. Correlations between categories of intonation and of discourse function in spontaneous dialogue suggest that intonation reflects the function of an utterance. Contrary to what one might expect from reading the literature, these categories are in some cases categories of exclusion rather than inclusion. Similar patterns result from the study of read-aloud dialogue. Discourse function and intonation categories show a measure of correlation. One difference that does appear between patterns across speech modes is that in many instances of discourse function intonation categories shift toward tunes ending low in the speaker's pitch range (e.g. a falling tune) for the read-aloud version. This result is in accord with other contemporary studies (e.g. Blaauw, 1995). The difference between spontaneous and read results suggests that read-aloud dialogue, even that based on scripts which include hesitations and false starts, is not a substitute for eliciting the same intonation strategies that are found in spontaneous dialogue.

    The applicability of the O'Connor and Arnold model of English intonation to the analysis of Luxembourgish intonation

    The aim of this dissertation is to examine whether the well-known O'Connor and Arnold model for the analysis of English intonation can be applied to the analysis of Luxembourgish intonation. After defining the concepts of intonation as applied in this dissertation, the main points of several other important studies on English intonation are discussed. This is followed by a detailed explanation of why the O'Connor and Arnold model was deemed most appropriate and an overview of the main work on Luxembourgish intonation published to date. The methodology of the research is outlined in detail, and the research findings are discussed in depth. The adaptability of the O'Connor and Arnold model to the analysis of Luxembourgish intonation is discussed, and possible alterations are suggested