Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information
In conversational speech, the acoustic signal provides cues that help
listeners disambiguate difficult parses. For automatically parsing spoken
utterances, we introduce a model that integrates transcribed text and
acoustic-prosodic features using a convolutional neural network over energy and
pitch trajectories coupled with an attention-based recurrent neural network
that accepts text and prosodic features. We find that different types of
acoustic-prosodic features are individually helpful, and together give
statistically significant improvements in parse and disfluency detection F1
scores over a strong text-only baseline. For this study with known sentence
boundaries, error analyses show that the main benefit of acoustic-prosodic
features is in sentences with disfluencies, attachment decisions are most
improved, and transcription errors obscure gains from prosody.
Comment: Accepted in NAACL HLT 201
Increase Apparent Public Speaking Fluency By Speech Augmentation
Fluent and confident speech is desirable to every speaker, but professional
speech delivery requires a great deal of experience and practice. In this
paper, we propose a speech stream manipulation system which can help
non-professional speakers to produce fluent, professional-like speech content,
in turn contributing towards better listener engagement and comprehension. We
propose to achieve this task by manipulating the disfluencies in human speech,
like the sounds 'uh' and 'um', the filler words and awkward long silences.
Given any unrehearsed speech, we segment and silence the filled pauses and
adjust the duration of the imposed silences, as well as other long
('disfluent') pauses, using a predictive model learned from a professional
speech dataset. Finally, we output an audio stream in which the speaker sounds
more fluent, confident and practiced compared to the original speech he or she
recorded.
According to our quantitative evaluation, we significantly increase the fluency
of speech by reducing the rate of pauses and fillers
- …