4,069 research outputs found
Multilingual Speech Recognition With A Single End-To-End Model
Training a conventional automatic speech recognition (ASR) system to support
multiple languages is challenging because the sub-word unit, lexicon and word
inventories are typically language specific. In contrast, sequence-to-sequence
models are well suited for multilingual ASR because they encapsulate an
acoustic, pronunciation and language model jointly in a single network. In this
work we present a single sequence-to-sequence ASR model trained on 9 different
Indian languages, which have very little overlap in their scripts.
Specifically, we take a union of language-specific grapheme sets and train a
grapheme-based sequence-to-sequence model jointly on data from all languages.
We find that this model, which is not explicitly given any information about
language identity, improves recognition performance by 21% relative compared to
analogous sequence-to-sequence models trained on each language individually. By
modifying the model to accept a language identifier as an additional input
feature, we further improve performance by an additional 7% relative and
eliminate confusion between different languages.Comment: Accepted in ICASSP 201
Handwritten Character Recognition of South Indian Scripts: A Review
Handwritten character recognition is always a frontier area of research in
the field of pattern recognition and image processing and there is a large
demand for OCR on hand written documents. Even though, sufficient studies have
performed in foreign scripts like Chinese, Japanese and Arabic characters, only
a very few work can be traced for handwritten character recognition of Indian
scripts especially for the South Indian scripts. This paper provides an
overview of offline handwritten character recognition in South Indian Scripts,
namely Malayalam, Tamil, Kannada and Telungu.Comment: Paper presented on the "National Conference on Indian Language
Computing", Kochi, February 19-20, 2011. 6 pages, 5 figure
- …