Multilingual Speech Recognition With A Single End-To-End Model
Training a conventional automatic speech recognition (ASR) system to support
multiple languages is challenging because the sub-word unit, lexicon and word
inventories are typically language specific. In contrast, sequence-to-sequence
models are well suited for multilingual ASR because they encapsulate an
acoustic, pronunciation and language model jointly in a single network. In this
work we present a single sequence-to-sequence ASR model trained on 9 different
Indian languages, which have very little overlap in their scripts.
Specifically, we take a union of language-specific grapheme sets and train a
grapheme-based sequence-to-sequence model jointly on data from all languages.
We find that this model, which is not explicitly given any information about
language identity, improves recognition performance by 21% relative compared to
analogous sequence-to-sequence models trained on each language individually. By
modifying the model to accept a language identifier as an additional input
feature, we further improve performance by an additional 7% relative and
eliminate confusion between different languages.
Comment: Accepted in ICASSP 201
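The two ideas in this abstract, a shared output vocabulary built as the union of per-language grapheme sets and a language identifier supplied as an extra input feature, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that each Unicode code point counts as one grapheme (a simplification for Indic scripts, which use combining marks) and that the language identifier is a one-hot vector appended to every acoustic frame. The function names and toy data are hypothetical.

```python
from typing import Dict, List


def build_joint_vocab(texts_by_lang: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every symbol seen in any language to one shared integer id.

    Assumption: each Unicode code point is treated as a grapheme.
    """
    graphemes = set()
    for transcripts in texts_by_lang.values():
        for text in transcripts:
            graphemes.update(text)
    # Sorting makes the id assignment deterministic across runs.
    return {g: i for i, g in enumerate(sorted(graphemes))}


def language_one_hot(lang: str, langs: List[str]) -> List[float]:
    """One-hot encoding of the language identity (assumed encoding)."""
    return [1.0 if lang == candidate else 0.0 for candidate in langs]


def add_language_feature(
    frames: List[List[float]], lang: str, langs: List[str]
) -> List[List[float]]:
    """Append the language one-hot vector to every acoustic frame."""
    one_hot = language_one_hot(lang, langs)
    return [frame + one_hot for frame in frames]


# Toy data: one transcript per language (hypothetical examples).
texts = {"hi": ["नमस्ते"], "ta": ["வணக்கம்"]}
vocab = build_joint_vocab(texts)

# Two fake 2-dimensional acoustic frames for a Tamil utterance.
frames = [[0.1, 0.2], [0.3, 0.4]]
augmented = add_language_feature(frames, "ta", ["hi", "ta"])
```

Because the scripts barely overlap, the joint vocabulary is close to the sum of the individual grapheme sets, and a single softmax over it lets one model emit text in any of the languages.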
Affect Recognition in Human Emotional Speech using Probabilistic Support Vector Machines
The problem of inferring human emotional state automatically from speech has become one of the central problems in Man Machine Interaction (MMI). Though Support Vector Machines (SVMs) have been used in several works for emotion recognition from speech, the potential of probabilistic SVMs for this task has not been explored. The emphasis of the current work is on how to use probabilistic SVMs for the efficient recognition of emotions from speech. Emotional speech corpora for two Dravidian languages, Telugu and Tamil, were constructed for assessing the recognition accuracy of probabilistic SVMs. The recognition accuracy of the proposed model is analyzed using both the Telugu and Tamil emotional speech corpora and compared with three existing works. Experimental results indicate that the proposed model performs significantly better than the existing methods.
SPRING-INX: A Multilingual Indian Language Speech Corpus by SPRING Lab, IIT Madras
India is home to a multitude of languages of which 22 languages are
recognised by the Indian Constitution as official. Building speech based
applications for the Indian population is a difficult problem owing to limited
data and the number of languages and accents to accommodate. To encourage the
language technology community to build speech based applications in Indian
languages, we are open-sourcing the SPRING-INX data, which has about 2000 hours of
legally sourced and manually transcribed speech data for ASR system building in
Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi
and Tamil. This endeavor is by SPRING Lab, Indian Institute of Technology
Madras, and is part of the National Language Translation Mission (NLTM), funded by
the Indian Ministry of Electronics and Information Technology (MeitY),
Government of India. We describe the data collection and data cleaning process
along with the data statistics in this paper.
Comment: 3 pages, About SPRING-INX Dat