
    Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

    In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories, coupled with an attention-based recurrent neural network that takes both text and prosodic features as input. We find that different types of acoustic-prosodic features are individually helpful and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, that attachment decisions are most improved, and that transcription errors obscure gains from prosody. Comment: Accepted in NAACL HLT 201
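
    The abstract does not give filter sizes, filter counts, or how frames are aligned to words, so the following is only a minimal sketch of the general idea: a 1-D convolution with ReLU and max-pooling over time turns a variable-length run of frame-level [pitch, energy] values for one word into a fixed-size vector, which is then concatenated with a word embedding as input to a recurrent encoder. The names `conv1d_maxpool` and `word_input`, the frame layout, and the kernel values are all hypothetical; the attention-based recurrent network itself is not sketched.

```python
from typing import List, Sequence

Frame = Sequence[float]              # one frame: [pitch, energy] (assumed layout)
Kernel = Sequence[Sequence[float]]   # kernel[offset][feature]


def conv1d_maxpool(frames: Sequence[Frame], kernels: Sequence[Kernel]) -> List[float]:
    """Slide each kernel over the frames (no padding), apply ReLU,
    and max-pool over time, giving one value per kernel."""
    width = len(kernels[0])
    if len(frames) < width:
        raise ValueError("need at least as many frames as the kernel width")
    pooled = []
    for kernel in kernels:
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting point
        for start in range(len(frames) - width + 1):
            # Dot product between the kernel and the current window of frames.
            acc = 0.0
            for offset in range(width):
                acc += sum(w * x for w, x in zip(kernel[offset], frames[start + offset]))
            best = max(best, max(acc, 0.0))
        pooled.append(best)
    return pooled


def word_input(embedding: Sequence[float], prosody: Sequence[float]) -> List[float]:
    """Concatenate a word embedding with its pooled prosodic vector,
    forming a per-word input that a recurrent encoder could consume."""
    return list(embedding) + list(prosody)


# Hypothetical frames for one word: [pitch, energy], already normalized.
frames = [[0.1, 0.5], [0.3, 0.6], [0.9, 0.2], [0.4, 0.4]]
kernels = [
    [[1.0, 0.0], [1.0, 0.0]],   # sums pitch over two adjacent frames
    [[0.0, 1.0], [0.0, -1.0]],  # responds to a drop in energy
]
pooled = conv1d_maxpool(frames, kernels)
print(pooled)
print(word_input([0.5, -0.2], pooled))
```

    Max-pooling over time is what makes the prosodic vector length-independent: a word spanning 4 frames and one spanning 40 frames both yield one value per kernel.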

    Hierarchical Representation and Estimation of Prosody using Continuous Wavelet Transform

    Prominences and boundaries are the essential constituents of prosodic structure in speech. They provide a means to chunk the speech stream into linguistically relevant units by assigning them relative salience and demarcating them within utterance structures. Prominences and boundaries have been widely used both in basic research on prosody and in text-to-speech synthesis. However, there are no representation schemes that provide for both estimating and modelling them in a unified fashion. Here we present an unsupervised, unified account for estimating and representing prosodic prominences and boundaries using a scale-space analysis based on the continuous wavelet transform. The methods are evaluated and compared to earlier work using the Boston University Radio News corpus. The results show that the proposed method is comparable with the best published supervised annotation methods. Peer reviewed
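
    To illustrate what a scale-space analysis with the continuous wavelet transform looks like, here is a simplified sketch, not the authors' procedure: the abstract does not say which wavelet, scales, input signal, or peak-tracking rule they use. The sketch assumes a Mexican hat (Ricker) wavelet, computes the transform by direct summation at a few scales, averages the coefficients across scales, and reports positive local maxima as candidate prominences. The function names, scales, and synthetic input signal are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ricker(t: float, scale: float) -> float:
    """Mexican hat (Ricker) wavelet evaluated at offset t for a given scale."""
    x = t / scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x)


def cwt(signal: Sequence[float], scales: Sequence[float]) -> List[List[float]]:
    """Continuous wavelet transform by direct summation: one row per scale.
    The wavelet is symmetric, so correlation and convolution coincide.
    Samples outside the signal are treated as zero."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(math.ceil(5 * s))  # truncate the wavelet's support
        row = []
        for i in range(n):
            acc = 0.0
            for k in range(-half, half + 1):
                if 0 <= i + k < n:
                    acc += signal[i + k] * ricker(k, s)
            row.append(acc)
        rows.append(row)
    return rows


def prominence_candidates(signal: Sequence[float],
                          scales: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Average the CWT over scales and return positive local maxima as
    candidate prominence positions (a simplified peak-picking rule)."""
    coeffs = cwt(signal, scales)
    n = len(signal)
    summary = [sum(row[i] for row in coeffs) / len(coeffs) for i in range(n)]
    peaks = [i for i in range(1, n - 1)
             if summary[i] > 0
             and summary[i] > summary[i - 1]
             and summary[i] >= summary[i + 1]]
    return summary, peaks


# Hypothetical prosodic signal: two Gaussian bumps standing in for two
# prominent syllables, sampled at 40 frames.
signal = [math.exp(-0.5 * ((i - 10) / 2) ** 2) + math.exp(-0.5 * ((i - 30) / 2) ** 2)
          for i in range(40)]
summary, peaks = prominence_candidates(signal, scales=[1.0, 2.0, 4.0])
print(peaks)
```

    Small scales respond to local, syllable-sized events and large scales to broader, phrase-sized ones, which is why a multi-scale representation can, in principle, describe both prominences and boundaries in one structure.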

    Detecting Prominence in Conversational Speech: Pitch Accent, Givenness and Focus

    The variability and reduction that characterize talk in natural interaction make it very difficult to detect prominence in conversational speech. In this paper, we present analytic studies and automatic detection results for pitch accent, as well as for the realization of information-structure phenomena such as givenness and focus. For pitch accent, our conditional random field model combining acoustic and textual features has an accuracy of 78%, substantially better than the chance performance of 58%. For givenness and focus, our analysis demonstrates that even in conversational speech there are measurable differences in acoustic properties, and that an automatic detector for these categories can perform significantly above chance.
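
    As a rough illustration of how a linear-chain conditional random field labels each word as accented or not, the sketch below shows only the decoding step (Viterbi search), assuming hand-set weights; training is omitted. Each word's label score is a weighted sum of acoustic and textual features, and transition scores couple neighbouring labels. The feature names (`f0_range`, `energy`, `content_word`, `bias`), weights, and transition values are hypothetical and are not the paper's feature set.

```python
from typing import Dict, List, Sequence, Tuple

LABELS = ("accent", "none")
Features = Dict[str, float]
Weights = Dict[str, Dict[str, float]]          # label -> feature -> weight
Transitions = Dict[Tuple[str, str], float]     # (previous, current) -> score


def emission_scores(features: Features, weights: Weights) -> Dict[str, float]:
    """Linear score of each label for one word's features."""
    return {lab: sum(weights[lab].get(name, 0.0) * value
                     for name, value in features.items())
            for lab in LABELS}


def viterbi(words: Sequence[Features], weights: Weights,
            transitions: Transitions) -> List[str]:
    """Highest-scoring label sequence under a linear-chain CRF."""
    if not words:
        return []
    best = [emission_scores(words[0], weights)]
    back: List[Dict[str, str]] = [{}]
    for feats in words[1:]:
        emit = emission_scores(feats, weights)
        scores, pointers = {}, {}
        for cur in LABELS:
            # Best previous label for reaching `cur` at this position.
            prev = max(LABELS, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emit[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back-pointers from the best final label.
    label = max(LABELS, key=lambda lab: best[-1][lab])
    path = [label]
    for t in range(len(words) - 1, 0, -1):
        label = back[t][label]
        path.append(label)
    return path[::-1]


# Hypothetical weights: acoustic cues and content-word status favour "accent".
weights = {
    "accent": {"f0_range": 2.0, "energy": 1.0, "content_word": 1.5},
    "none": {"bias": 1.5},
}
# Hypothetical mild penalty on two accents in a row.
transitions = {("accent", "accent"): -0.5, ("accent", "none"): 0.0,
               ("none", "accent"): 0.0, ("none", "none"): 0.0}
words = [
    {"bias": 1.0, "f0_range": 0.9, "energy": 0.8, "content_word": 1.0},
    {"bias": 1.0, "f0_range": 0.1, "energy": 0.2, "content_word": 0.0},
    {"bias": 1.0, "f0_range": 0.7, "energy": 0.6, "content_word": 1.0},
]
print(viterbi(words, weights, transitions))
```

    Because the transition scores are shared across positions, the decoder can trade a word's own acoustic evidence against the labels of its neighbours, which a per-word classifier cannot do.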