Non-native children speech recognition through transfer learning
This work deals with non-native children's speech and investigates both
multi-task and transfer learning approaches to adapt a multi-language Deep
Neural Network (DNN) to speakers, specifically children, learning a foreign
language. The application scenario is characterized by young students learning
English and German and reading sentences in these second languages, as well as
in their native language. The paper analyzes and discusses techniques for
training effective DNN-based acoustic models starting from children's native
speech and performing adaptation with limited non-native audio material. A
multi-lingual model is adopted as baseline, where a common phonetic lexicon,
defined in terms of the units of the International Phonetic Alphabet (IPA), is
shared across the three languages at hand (Italian, German and English); DNN
adaptation methods based on transfer learning are evaluated on significant
non-native evaluation sets. Results show that the resulting non-native models
allow a significant improvement with respect to a mono-lingual system adapted
to speakers of the target language.
Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech
Rapid population aging has stimulated the development of assistive
devices that provide personalized medical support to people in need who suffer
from conditions of various etiologies. One prominent clinical application is a
computer-assisted speech training system that delivers personalized speech
therapy to patients with communicative disorders in their home environment.
Such a system relies on robust automatic speech recognition (ASR) technology to
provide accurate articulation feedback. With the long-term aim of
developing off-the-shelf ASR systems that can be incorporated in a clinical
context without prior speaker information, we compare the ASR performance of
speaker-independent bottleneck and articulatory features on dysarthric speech
used in conjunction with dedicated neural network-based acoustic models that
have been shown to be robust against spectrotemporal deviations. We report ASR
performance of these systems on two dysarthric speech datasets of different
characteristics to quantify the achieved performance gains. Despite the
remaining performance gap between dysarthric and normal speech, significant
improvements are reported on both datasets using speaker-independent ASR
architectures.
Comment: to appear in Computer Speech & Language -
https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial
text overlap with arXiv:1807.1094
Pronunciation variations and context-dependent model to improve ASR performance for dyslexic children’s read speech
Focusing on the key element of an ASR-based application for dyslexic children reading isolated words in Bahasa Melayu, this paper provides evidence that a carefully designed acoustic model is needed to reach a satisfactory recognition accuracy of 79.17% on the test dataset. Pronunciation variations and a context-dependent model are the two main components of such an acoustic model. The model incorporates the most frequent errors made in reading a selected vocabulary, which are obtained from primary data collection and analysis. The analysis identifies vowel substitution as the most frequent spelling and reading error, accounting for over 20% of total errors made
Automatic assessment of spoken language proficiency of non-native children
This paper describes technology developed to automatically grade Italian
students (ages 9-16) on their English and German spoken language proficiency.
The students' spoken answers are first transcribed by an automatic speech
recognition (ASR) system and then scored using a feedforward neural network
(NN) that processes features extracted from the automatic transcriptions.
In-domain acoustic models, employing deep neural networks (DNNs), are derived
by adapting the parameters of an original out-of-domain DNN.
Acoustic variability and automatic recognition of children’s speech