766 research outputs found

    Extending automatic transcripts in a unified data representation towards a prosodic-based metadata annotation and evaluation

    This paper describes a framework that extends automatic speech transcripts in order to accommodate relevant information coming from manual transcripts, the speech signal itself, and other resources, like lexica. The proposed framework automatically collects, relates, computes, and stores all relevant information together in a self-contained data source, making it possible to easily provide a wide range of interconnected information suitable for speech analysis, training, and evaluating a number of automatic speech processing tasks. The main goal of this framework is to integrate different linguistic and paralinguistic layers of knowledge for a more complete view of their representation and interactions in several domains and languages. The processing chain is composed of two main stages, where the first consists of integrating the relevant manual annotations in the speech recognition data, and the second consists of further enriching the previous output in order to accommodate prosodic information. The described framework has been used for the identification and analysis of structural metadata in automatic speech transcripts. Initially put to use for automatic detection of punctuation marks and for capitalization recovery from speech data, it has also been recently used for studying the characterization of disfluencies in speech. It has already been applied to several domains of Portuguese corpora, and also to English and Spanish Broadcast News corpora.

    Adapting Prosody in a Text-to-Speech System


    French Face-to-Face Interaction: Repetition as a Multimodal Resource

    In this chapter, after presenting the corpus as well as some of the annotations developed in the OTIM project, we then focus on the specific phenomenon of repetition. After briefly discussing this notion, we show that different degrees of convergence can be achieved by speakers depending on the multimodal complexity of the repetition and on the timing in between the repeated element and the model. Although we focus more specifically on the gestural level, we present a multimodal analysis of gestural repetitions in which we met several issues linked to multimodal annotations of any type. This gives an overview of crucial issues in cross-level linguistic annotation, such as the definition of a phenomenon including formal and/or functional categorization.

    A Study of Accomodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly, it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each on that of the other(s). Implementation of this behaviour in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour.
    Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
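
    The time-aligned moving average idea described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `tama_series` and `pearson`, the frame length, the step, and the use of a Pearson correlation to compare the two speakers are all assumptions made for illustration, since the abstract does not state these details.

```python
from statistics import mean

def tama_series(samples, frame_len, step, t_end):
    """Average one speaker's feature values in overlapping frames.

    samples: list of (time_sec, value) pairs, e.g. pitch measurements.
    The frame grid starts at 0 and depends only on frame_len, step and
    t_end, so calling this for each speaker yields time-aligned series.
    Frames with no samples for this speaker are reported as None.
    """
    series = []
    start = 0.0
    while start + frame_len <= t_end:
        in_frame = [v for t, v in samples if start <= t < start + frame_len]
        series.append(mean(in_frame) if in_frame else None)
        start += step
    return series

def pearson(xs, ys):
    """Pearson correlation over frames where both speakers have a value."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return None
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x, _ in pairs)
    var_y = sum((y - my) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / (var_x * var_y) ** 0.5

# Hypothetical pitch samples (seconds, Hz) for two speakers.
speaker_a = [(1, 100), (6, 110), (11, 120), (16, 130)]
speaker_b = [(2, 200), (7, 220), (12, 240), (17, 260)]

# Assumed settings: 10 s frames advanced in 5 s steps over a 20 s dialogue.
series_a = tama_series(speaker_a, frame_len=10.0, step=5.0, t_end=20.0)
series_b = tama_series(speaker_b, frame_len=10.0, step=5.0, t_end=20.0)
correlation = pearson(series_a, series_b)
```

    A high positive correlation between the two frame series would be one simple indication that the speakers' averages move together over the dialogue; the time series modeling mentioned in the abstract goes beyond this single number.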

    Methods in prosody

    This book presents a collection of pioneering papers reflecting current methods in prosody research with a focus on Romance languages. The rapid expansion of the field of prosody research in the last decades has given rise to a proliferation of methods that has left little room for the critical assessment of these methods. The aim of this volume is to bridge this gap by embracing original contributions, in which experts in the field assess, reflect on, and discuss different methods of data gathering and analysis. The book might thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore the topic of prosody, an expanding and promising area of study.

    ISCAN: a System for Integrated Phonetic Analyses Across Speech Corpora

    Speech corpora of many languages, styles, and formats exist in the world, representing significant potential for the phonetic sciences. However, in practice there are significant practical and methodological barriers to conducting the “same study” across corpora, including necessary technical skills and non-comparability of results using non-standardized measures. We introduce an open-source software system for Integrated Speech Corpus ANalysis (ISCAN), which enables automated acoustic phonetic analysis across spoken corpora of diverse formats and sizes. A web-browser-based GUI and Python package allow for different user backgrounds. The system is a major update of core functionality for fully automated speech corpus analysis (importing, enriching, querying) from a previous version, to meet new goals: different user configurations, working with restricted datasets, and interacting with data (visualization and correction). The system’s flexibility for different projects is shown in two use cases: large-scale automatic segmental analysis of spontaneous speech across English dialects, and smaller-scale semi-automatic prosodic analysis.