Increase Apparent Public Speaking Fluency By Speech Augmentation
Fluent and confident speech is desirable to every speaker, but professional
speech delivery requires a great deal of experience and practice. In this
paper, we propose a speech-stream manipulation system that helps
non-professional speakers produce fluent, professional-sounding speech,
in turn contributing to better listener engagement and comprehension. We
achieve this by manipulating the disfluencies in human speech, such as the
filled pauses 'uh' and 'um', filler words, and awkwardly long silences.
Given any unrehearsed speech, we segment and silence the filled pauses and
adjust the duration of the imposed silences, as well as other long
('disfluent') pauses, using a predictive model learned from a professional
speech dataset. Finally, we output an audio stream in which the speaker
sounds more fluent, confident, and practiced than in the original recording.
Our quantitative evaluation shows that we significantly increase the fluency
of speech by reducing the rate of pauses and fillers.
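A minimal sketch of the editing step this abstract describes, assuming the disfluent segments have already been detected. The segment format, the sample representation, and the fixed 0.5 s target pause (standing in for the paper's learned predictive model of pause duration) are all illustrative assumptions:

```python
def edit_disfluencies(samples, rate, segments, target_pause_s=0.5):
    """samples: audio as a list of floats; segments: (start_s, end_s, kind)
    tuples sorted by time, with kind in {'filler', 'pause'}."""
    out, cursor = [], 0
    for start_s, end_s, kind in segments:
        start, end = int(start_s * rate), int(end_s * rate)
        out.extend(samples[cursor:start])      # keep the fluent speech
        if kind == 'filler':
            # silence the filled pause ('uh'/'um') instead of cutting it,
            # leaving a short natural-sounding gap
            out.extend([0.0] * int(target_pause_s * rate))
        else:
            # shrink an overly long pause toward the target duration
            keep = min(end - start, int(target_pause_s * rate))
            out.extend([0.0] * keep)
        cursor = end
    out.extend(samples[cursor:])               # remainder of the utterance
    return out
```

For example, a 2 s pause in a recording sampled at 10 Hz collapses to the 0.5 s target, shortening the output stream accordingly.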
Parsing Speech: A Neural Approach to Integrating Lexical and Acoustic-Prosodic Information
In conversational speech, the acoustic signal provides cues that help
listeners disambiguate difficult parses. For automatically parsing spoken
utterances, we introduce a model that integrates transcribed text and
acoustic-prosodic features using a convolutional neural network over energy and
pitch trajectories coupled with an attention-based recurrent neural network
that accepts text and prosodic features. We find that different types of
acoustic-prosodic features are individually helpful, and together give
statistically significant improvements in parse and disfluency detection F1
scores over a strong text-only baseline. For this study with known sentence
boundaries, error analyses show that the main benefit of acoustic-prosodic
features is in sentences with disfluencies, that attachment decisions are most
improved, and that transcription errors obscure gains from prosody.
Comment: Accepted in NAACL HLT 201
Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks
The stream of words produced by Automatic Speech Recognition (ASR) systems is
typically devoid of punctuation and formatting. Most natural language
processing applications expect segmented and well-formatted texts as input,
which is not available in ASR output. This paper proposes a novel technique of
jointly modeling multiple correlated tasks such as punctuation and
capitalization using bidirectional recurrent neural networks, which leads to
improved performance on each of these tasks. This method could be extended to
the joint modeling of other correlated sequence-labeling tasks.
Comment: Accepted in Interspeech 201
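One way to see how the two tasks share supervision is to derive both label sequences from the same formatted text; the sketch below uses an illustrative tag scheme ('O', ',', '?', 'CAP') that is an assumption, not the paper's:

```python
import re

def make_labels(sentence):
    """Derive per-token punctuation and capitalization tags from formatted
    text, giving the two correlated target sequences for joint training."""
    tokens, punct, caps = [], [], []
    for raw in sentence.split():
        m = re.match(r"^(\w+)([.,?]?)$", raw)
        if not m:                            # skip tokens outside the toy scheme
            continue
        word, p = m.groups()
        tokens.append(word.lower())          # ASR-style lowercased input
        punct.append(p if p else 'O')        # punctuation attached after token
        caps.append('CAP' if word[0].isupper() else 'O')
    return tokens, punct, caps
```

A bidirectional RNN would then read `tokens` once and predict `punct` and `caps` from the same shared hidden states, which is what couples the two tasks.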
Multi-Task Self-Supervised Learning for Disfluency Detection
Most existing approaches to disfluency detection heavily rely on
human-annotated data, which is expensive to obtain in practice. To tackle the
training-data bottleneck, we investigate methods for combining multiple
self-supervised tasks, i.e., supervised tasks where data can be collected
without manual labeling. First, we construct large-scale pseudo training data
by randomly adding or deleting words from unlabeled news data, and propose two
self-supervised pre-training tasks: (i) a tagging task to detect the added
noise words, and (ii) sentence classification to distinguish original sentences
from grammatically incorrect ones. We then combine these two tasks to jointly
train a network. The pre-trained network is then fine-tuned using
human-annotated disfluency detection training data. Experimental results on the
commonly used English Switchboard test set show that our approach can achieve
competitive performance compared to the previous systems (trained using the
full dataset) by using less than 1% (1000 sentences) of the training data. Our
method trained on the full dataset significantly outperforms previous methods,
reducing the error by 21% on English Switchboard.
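The pseudo-data construction can be sketched as follows; the corruption probabilities, the filler-style vocabulary, and the fixed random seed are illustrative assumptions rather than the paper's settings:

```python
import random

def corrupt(sentence, vocab, p_add=0.15, p_del=0.1, rng=None):
    """Build pseudo training data for both self-supervised tasks: per-token
    tags (1 = randomly inserted word) and a sentence-level corruption label."""
    rng = rng or random.Random(0)            # seeded for reproducibility
    tokens, tags = [], []
    for word in sentence.split():
        if rng.random() < p_add:
            tokens.append(rng.choice(vocab)) # inserted noise word -> tag 1
            tags.append(1)
        if rng.random() < p_del:
            continue                         # simulate a random deletion
        tokens.append(word)
        tags.append(0)
    label = int(tokens != sentence.split())  # 1 = sentence was corrupted
    return tokens, tags, label
```

The tagging head learns from `tags` and the classification head from `label`, so both pre-training tasks come from the same unlabeled text at no annotation cost.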