
    Prosodic Event Recognition using Convolutional Neural Networks with Context Information

    This paper demonstrates the potential of convolutional neural networks (CNNs) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features. Typical approaches use not only feature representations of the word in question but also its surrounding context. We show that adding position features indicating the current word benefits the CNN. In addition, this paper discusses the generalization from a speaker-dependent modelling approach to a speaker-independent setup. The proposed method is simple and efficient and yields strong results in both speaker-dependent and speaker-independent cases.
    Comment: Interspeech 2017, 4 pages, 1 figure
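    As a rough illustration only: the sketch below shows how a 1-D CNN over frame-based acoustic features might take an extra position-indicator channel marking the frames of the current word, in the spirit of the position features described above. All layer sizes, feature counts, and names here are assumptions for the sketch, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ProsodicEventCNN(nn.Module):
    """1-D CNN over frame-based acoustic features; one extra input
    channel flags the frames of the word being classified (assumed setup)."""
    def __init__(self, n_acoustic=6, n_classes=2, n_filters=64):
        super().__init__()
        # +1 input channel for the binary position-indicator feature
        self.conv = nn.Conv1d(n_acoustic + 1, n_filters, kernel_size=5, padding=2)
        self.pool = nn.AdaptiveMaxPool1d(1)  # max-pool over time
        self.out = nn.Linear(n_filters, n_classes)

    def forward(self, feats, position_mask):
        # feats: (batch, n_acoustic, frames); position_mask: (batch, frames)
        x = torch.cat([feats, position_mask.unsqueeze(1)], dim=1)
        x = torch.relu(self.conv(x))
        return self.out(self.pool(x).squeeze(-1))

# Toy usage: pitch-accent presence/absence for an 8-utterance batch
model = ProsodicEventCNN()
feats = torch.randn(8, 6, 120)   # e.g. F0, energy and their deltas
mask = torch.zeros(8, 120)
mask[:, 40:80] = 1.0             # frames belonging to the current word
logits = model(feats, mask)      # shape (8, 2)
```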

    Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

    In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features, using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features lies in sentences containing disfluencies, that attachment decisions improve the most, and that transcription errors obscure gains from prosody.
    Comment: Accepted at NAACL HLT 2018
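    The following sketch is one plausible reading of the fusion step: a small CNN summarises each word's energy and pitch trajectory, and the summary is concatenated with the word embedding before a recurrent encoder. Component sizes, the two-channel input, and all names are assumptions; the attention-based parsing decoder from the paper is not shown.

```python
import torch
import torch.nn as nn

class LexicalProsodicEncoder(nn.Module):
    """Encode each word by concatenating its embedding with a CNN
    summary of the energy/pitch trajectory aligned to that word."""
    def __init__(self, vocab=10_000, d_word=100, d_prosody=32, d_hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_word)
        # CNN over two channels (energy, pitch) of each word's frames
        self.prosody_cnn = nn.Sequential(
            nn.Conv1d(2, d_prosody, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.rnn = nn.LSTM(d_word + d_prosody, d_hidden,
                           batch_first=True, bidirectional=True)

    def forward(self, words, trajectories):
        # words: (batch, n_words); trajectories: (batch, n_words, 2, frames)
        b, n, c, t = trajectories.shape
        pros = self.prosody_cnn(trajectories.reshape(b * n, c, t))
        pros = pros.reshape(b, n, -1)        # (batch, n_words, d_prosody)
        x = torch.cat([self.embed(words), pros], dim=-1)
        enc, _ = self.rnn(x)                 # would feed an attention decoder
        return enc
```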

    Articulatory features for speech-driven head motion synthesis

    This study investigates the use of articulatory features for speech-driven head motion synthesis, as opposed to the prosodic features such as F0 and energy that have mainly been used in the literature. In the proposed approach, multi-stream HMMs are trained jointly on synchronous streams of speech and head motion data. Articulatory features can be regarded as an intermediate parametrisation of speech that is expected to have a close link with head movement. Head and articulatory movements were acquired by electromagnetic articulography (EMA) and recorded synchronously with speech. The measured articulatory data were compared with those predicted from speech by an HMM-based inversion mapping system trained in a semi-supervised fashion. Canonical correlation analysis (CCA) on a data set of free speech from 12 people shows that the articulatory features are more strongly correlated with head rotation than prosodic and/or cepstral speech features. It is also shown that head motion synthesised from articulatory features correlates more highly with the original head motion than head motion synthesised from prosodic features alone.
    Index Terms: head motion synthesis, articulatory features, canonical correlation analysis, acoustic-to-articulatory mapping
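    As a minimal sketch of the CCA comparison the abstract reports, using scikit-learn and synthetic stand-ins for the real recordings (the array shapes and the random data are assumptions, not the study's corpus):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Hypothetical frame-aligned features: EMA articulator positions
# versus head rotation angles (roll, pitch, yaw) for one speaker.
rng = np.random.default_rng(0)
articulatory = rng.standard_normal((5000, 14))  # e.g. 7 coils x 2 coordinates
head_rotation = rng.standard_normal((5000, 3))

cca = CCA(n_components=3)
art_c, head_c = cca.fit_transform(articulatory, head_rotation)

# Canonical correlation for each component pair; higher values would
# indicate a stronger linear link between the two feature streams.
corrs = [np.corrcoef(art_c[:, i], head_c[:, i])[0, 1] for i in range(3)]
print(corrs)
```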