Parsing Speech: A Neural Approach to Integrating Lexical and
  Acoustic-Prosodic Information

Bansal, Mohit; Gimpel, Kevin; Livescu, Karen; Ostendorf, Mari; Toshniwal, Shubham; Tran, Trang

Authors: Mohit Bansal, Kevin Gimpel, Karen Livescu, Mari Ostendorf, Shubham Toshniwal, Trang Tran
Publication date: 1 January 2018

Abstract

In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, attachment decisions are most improved, and transcription errors obscure gains from prosody.

Comment: Accepted in NAACL HLT 201
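
To make the convolutional part of the abstract concrete, here is a minimal toy sketch of a 1-D convolution with max-over-time pooling applied to the pitch and energy frames of a single word, with the pooled values appended to that word's embedding. It is not the authors' implementation: the frame layout, filter widths, filter weights, and the names `conv1d_max_pool` and `word_input` are all assumptions made for illustration, and the attention-based recurrent network that would consume these vectors is left out.

```python
from typing import List, Sequence

# One frame holds (pitch, energy). This two-channel layout is an assumption.
Frame = Sequence[float]


def conv1d_max_pool(frames: Sequence[Frame], kernel: Sequence[Frame],
                    bias: float = 0.0) -> float:
    """Slide one filter over the frames, apply ReLU, and max-pool over time."""
    width = len(kernel)
    if len(frames) < width:
        raise ValueError("need at least as many frames as the filter width")
    best = 0.0  # ReLU output is never below zero
    for start in range(len(frames) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(len(kernel[offset])):
                total += kernel[offset][channel] * frames[start + offset][channel]
        best = max(best, total)
    return best


def word_input(embedding: Sequence[float], frames: Sequence[Frame],
               kernels: Sequence[Sequence[Frame]]) -> List[float]:
    """Concatenate a word embedding with one pooled value per filter."""
    prosody = [conv1d_max_pool(frames, kernel) for kernel in kernels]
    return list(embedding) + prosody


if __name__ == "__main__":
    # Hypothetical, hand-picked values; real filters would be learned.
    frames = [(0.1, 0.5), (0.4, 0.6), (0.9, 0.2), (0.3, 0.1)]
    kernels = [
        [(-1.0, 0.0), (1.0, 0.0)],  # responds to a rise in pitch
        [(0.0, 0.5), (0.0, 0.5)],   # averages energy over two frames
    ]
    print(word_input([1.0, 2.0], frames, kernels))
```

Max pooling gives every word a fixed-length prosodic vector regardless of how many frames it spans, which is what allows the result to be concatenated with a text embedding before being passed to a sequence model.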

DOI: 10.18653/v1/n18-1...
