Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech
Rapid population aging has stimulated the development of assistive
devices that provide personalized medical support to people suffering from
conditions of various etiologies. One prominent clinical application is a computer-assisted
speech training system that enables personalized speech therapy for patients
with communication disorders in their home environment. Such a
system relies on robust automatic speech recognition (ASR) technology to
provide accurate articulation feedback. With the long-term aim of
developing off-the-shelf ASR systems that can be incorporated in a clinical
context without prior speaker information, we compare the ASR performance of
speaker-independent bottleneck and articulatory features on dysarthric speech,
used in conjunction with dedicated neural network-based acoustic models that
have been shown to be robust against spectrotemporal deviations. We report ASR
performance of these systems on two dysarthric speech datasets with different
characteristics to quantify the achieved performance gains. Despite the
remaining performance gap between dysarthric and normal speech, significant
improvements are reported on both datasets using speaker-independent ASR
architectures.
Comment: to appear in Computer Speech & Language -
https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial
text overlap with arXiv:1807.1094
A Cross-media Retrieval System for Lecture Videos
We propose a cross-media lecture-on-demand system, in which users can
selectively view specific segments of lecture videos by submitting text
queries. Users can easily formulate queries by using the textbook associated
with a target lecture, even if they cannot come up with effective keywords. Our
system extracts the audio track from a target lecture video, generates a
transcription by large vocabulary continuous speech recognition, and produces a
text index. Experimental results showed that adapting speech recognition to
the topic of the lecture increased recognition accuracy, and that retrieval
accuracy was comparable to that obtained with human transcription.
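The abstract does not say which retrieval model scores transcript segments against a query. As a hedged illustration only, the sketch below assumes a plain TF-IDF vector-space model with cosine similarity, where each "segment" is a chunk of the recognizer's transcript and the query may be a keyword or a textbook passage. The function names (`build_index`, `search`) and the segment granularity are hypothetical, not the authors' system.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(segments: list[str]) -> tuple[list[Counter], dict[str, float]]:
    """Hypothetical index: term frequencies per transcript segment plus smoothed IDF weights."""
    tfs = [Counter(tokenize(s)) for s in segments]
    n = len(segments)
    df = Counter(term for tf in tfs for term in tf)  # document frequency of each term
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    return tfs, idf


def tfidf(tf: Counter, idf: dict[str, float]) -> dict[str, float]:
    # Terms never seen in the index get weight 0.
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, segments: list[str], top_k: int = 3) -> list[int]:
    """Return indices of the best-matching segments, dropping zero-score matches."""
    tfs, idf = build_index(segments)
    q = tfidf(Counter(tokenize(query)), idf)
    scored = [(cosine(q, tfidf(tf, idf)), i) for i, tf in enumerate(tfs)]
    scored.sort(key=lambda x: (-x[0], x[1]))  # highest score first, ties by position
    return [i for score, i in scored[:top_k] if score > 0]


segments = [
    "today we define the fourier transform of a signal",
    "next we discuss sampling and aliasing",
    "the fast fourier transform reduces computation",
]
print(search("fourier transform", segments, top_k=2))  # → [2, 0]
```

Cosine normalization favors the shorter segment that concentrates on the query terms; a real system would also have to cope with recognition errors in the transcript, which this toy ignores.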
TheanoLM - An Extensible Toolkit for Neural Network Language Modeling
We present a new tool for training neural network language models (NNLMs),
scoring sentences, and generating text. The tool has been written using the Python
library Theano, which allows researchers to easily extend it and tune any aspect
of the training process. Despite this flexibility, Theano is able to
generate extremely fast native code that can utilize a GPU or multiple CPU
cores in order to parallelize the heavy numerical computations. The tool has
been evaluated on difficult Finnish and English conversational speech
recognition tasks, and significant improvement was obtained over our best
back-off n-gram models. The results that we obtained in the Finnish task were
compared to those from the existing RNNLM and RWTHLM toolkits, and found to be as
good or better, while training times were an order of magnitude shorter.
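"Scoring sentences" with a language model generally means summing the log-probabilities the model assigns to each word given its history. The sketch below illustrates only that idea, using a toy add-one (Laplace) smoothed bigram model as a stand-in; it is neither a neural network language model nor a back-off n-gram model, and the helper names (`train_bigram`, `score_sentence`, `perplexity`) are hypothetical, not TheanoLM's interface.

```python
import math
from collections import Counter


def train_bigram(sentences: list[str]):
    """Toy model: count bigrams and their histories, with <s> and </s> boundary tokens."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        histories.update(tokens[:-1])  # every token that precedes a prediction
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Predictable vocabulary: all training words plus the end-of-sentence token.
    vocab = {t for s in sentences for t in s.split()} | {"</s>"}
    return histories, bigrams, vocab


def score_sentence(sentence: str, model) -> float:
    """Natural-log probability of a sentence under the add-one smoothed bigram model.

    Assumes every word was seen in training; unseen words are not handled in this toy.
    """
    histories, bigrams, vocab = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    logp = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        logp += math.log((bigrams[(prev, word)] + 1) / (histories[prev] + v))
    return logp


def perplexity(sentence: str, model) -> float:
    # Number of predicted tokens: the words plus </s>.
    n = len(sentence.split()) + 1
    return math.exp(-score_sentence(sentence, model) / n)


model = train_bigram(["the cat sat", "the dog sat"])
print(score_sentence("the cat sat", model) > score_sentence("sat the cat", model))  # → True
```

A fluent word order receives a higher log-probability (lower perplexity) than a scrambled one; a neural model replaces the count-based conditional probability with a learned one, but the sentence score is accumulated the same way.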