Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi
In this paper we discuss in-progress work on the development of a speech
corpus for four low-resource Indo-Aryan languages -- Awadhi, Bhojpuri, Braj and
Magahi -- using the field methods of linguistic data collection. The corpus
currently stands at approximately 18 hours (approx. 4-5 hours per language)
and is transcribed and annotated with grammatical information such as
part-of-speech tags, morphological features and Universal Dependencies
relations. We discuss our methodology for data collection in these languages,
most of which was carried out in the middle of the COVID-19 pandemic; one of
the aims was to generate additional income for low-income groups speaking
these languages. We also discuss the results of baseline experiments with
automatic speech recognition systems in these languages.

Comment: Speech for Social Good Workshop, 2022, Interspeech 202
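Annotations of this kind (part-of-speech tags, morphological features, dependency relations) are conventionally stored in the CoNLL-U format used by Universal Dependencies. A minimal reader for that field layout might look like the sketch below; the field order is the CoNLL-U standard, but the sample sentence is an invented placeholder, not data from this corpus.

```python
# Minimal CoNLL-U reader: each token line has 10 tab-separated fields
# (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC).

def parse_conllu(text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    fields = ["id", "form", "lemma", "upos", "xpos",
              "feats", "head", "deprel", "deps", "misc"]
    sentences, tokens = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # blank line ends a sentence
            if tokens:
                sentences.append(tokens)
                tokens = []
        elif line.startswith("#"):        # comment/metadata line
            continue
        else:
            tokens.append(dict(zip(fields, line.split("\t"))))
    if tokens:
        sentences.append(tokens)
    return sentences

# Invented two-token sample, for illustration only.
sample = "1\tram\tram\tPROPN\t_\t_\t2\tnsubj\t_\t_\n" \
         "2\tgayo\tja\tVERB\t_\t_\t0\troot\t_\t_\n"
for sent in parse_conllu(sample):
    print([(t["form"], t["upos"], t["deprel"]) for t in sent])
    # → [('ram', 'PROPN', 'nsubj'), ('gayo', 'VERB', 'root')]
```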
Targeted Subset Selection for Limited-data ASR Accent Adaptation
We study the task of adapting an existing ASR model to a non-native accent
while being constrained by a transcription budget on the duration of utterances
selected from a large unlabeled corpus. We propose a subset selection approach
using the recently proposed submodular mutual information functions, in which
we identify a diverse set of utterances that match the target accent. This is
specified through a few target utterances and achieved by modelling the
relationship between the target and the selected subsets using these functions.
The model adapts to the accent through fine-tuning with utterances selected and
transcribed from the unlabeled corpus. We also use an accent classifier to
learn accent-aware feature representations. Our method is also able to exploit
samples from other accents to perform out-of-domain selections for low-resource
accents which are not available in these corpora. We show that the targeted
subset selection approach improves significantly upon random sampling - by
around 5% to 10% (absolute) in most cases, and is around 10x more
label-efficient. We also compare with an oracle method that picks specifically
from the target accent; our method is comparable to the oracle in both its
selections and its WER performance.

Comment: Under review (INTERSPEECH 2022)
Transfer learning of language-independent end-to-end ASR with language model fusion
This work explores better adaptation methods to low-resource languages using
an external language model (LM) under the framework of transfer learning. We
first build a language-independent ASR system in a unified sequence-to-sequence
(S2S) architecture with a shared vocabulary among all languages. During
adaptation, we perform LM fusion transfer, where an external LM is integrated
into the decoder network of the attention-based S2S model in the whole
adaptation stage, to effectively incorporate linguistic context of the target
language. We also investigate various seed models for transfer learning.
Experimental evaluations using the IARPA BABEL data set show that LM fusion
transfer improves performance on all five target languages over simple
transfer learning when external text data is available. Our final system
drastically reduces the performance gap from the hybrid systems.

Comment: Accepted at ICASSP201
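LM fusion combines the S2S decoder's token distribution with the external LM's at each decoding step. The sketch below shows the simplest variant, shallow fusion, where the two log-probabilities are interpolated with a weight; this is a simpler scheme than the decoder-integrated fusion described above, and the toy distributions and weight are illustrative assumptions.

```python
import numpy as np

def shallow_fusion_step(asr_logprobs, lm_logprobs, lam=0.3):
    """One decoding step of shallow fusion: choose the token maximising
    log p_asr(y | x, y_<t) + lam * log p_lm(y | y_<t)."""
    scores = asr_logprobs + lam * lm_logprobs
    return int(np.argmax(scores)), scores

# Toy 4-token vocabulary: the ASR model is uncertain between tokens 1
# and 2, and the external LM breaks the tie toward token 2.
asr = np.log(np.array([0.05, 0.45, 0.45, 0.05]))
lm  = np.log(np.array([0.10, 0.10, 0.70, 0.10]))
best, _ = shallow_fusion_step(asr, lm, lam=0.3)
print(best)  # → 2
```

In a beam-search decoder the same interpolated score would be accumulated along each hypothesis rather than argmax-ed per step.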
Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages
Copyright © 2014 ISCA. In recent years there has been significant interest in Automatic Speech Recognition (ASR) and KeyWord Spotting (KWS) systems for low-resource languages, one of the driving forces for this research direction being the IARPA Babel project. This paper examines the performance gains that can be obtained by combining two forms of deep neural network ASR system, Tandem and Hybrid, for both ASR and KWS using data released under the Babel project. Baseline systems are described for the five option period 1 languages: Assamese, Bengali, Haitian Creole, Lao and Zulu. All the ASR systems share common attributes, for example deep neural network configurations, and decision trees based on rich phonetic questions and state-position root nodes. The baseline ASR and KWS performance of Hybrid and Tandem systems is compared for both the "full" (approximately 80 hours of training data) and "limited" (approximately 10 hours) language packs. By combining the two systems, consistent performance gains can be obtained for KWS in all configurations.
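One simple way to combine two such systems is frame-level log-linear interpolation of their state posteriors before decoding; the sketch below shows that scheme with an assumed equal weight and invented toy posteriors, and is not the specific combination method of the paper (Babel-era systems also combined at the lattice and keyword-score level).

```python
import numpy as np

def combine_posteriors(p_tandem, p_hybrid, w=0.5):
    """Log-linear frame-level combination of two systems' state
    posteriors: p ∝ p_tandem**w * p_hybrid**(1 - w), renormalised."""
    log_p = w * np.log(p_tandem) + (1 - w) * np.log(p_hybrid)
    p = np.exp(log_p - log_p.max(axis=-1, keepdims=True))
    return p / p.sum(axis=-1, keepdims=True)

# Toy frame over 3 HMM states: the two systems disagree on the best
# state, and the combination settles between their opinions.
tandem = np.array([0.7, 0.2, 0.1])
hybrid = np.array([0.3, 0.6, 0.1])
print(combine_posteriors(tandem, hybrid).round(3))
```

The weight w would normally be tuned on held-out data, and the combined posteriors fed to the usual HMM decoder.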
North East Indian Linguistics 8 (NEIL 8)
This is the eighth volume of North East Indian Linguistics, a series of volumes for publishing
current research on the languages of North East India, the first volume of which was
published in 2008. The papers in this volume were presented at the 9th conference of the
North East Indian Linguistics Society (NEILS), held at Tezpur University in February 2016.
The papers for this anniversary volume continue the NEILS tradition of research by both local
and international scholars on a wide range of languages and topics. This eighth volume
includes papers on small community languages and large regional languages from across
North East India, presenting detailed phonological, semantic and morphosyntactic studies of
structures that are characteristic of particular languages or language groups, alongside
sociolinguistic studies that explore language attitudes in contexts of language shift.