4 research outputs found
XLS-R Deep Learning Model for Multilingual ASR on Low-Resource Languages: Indonesian, Javanese, and Sundanese
This research paper focuses on the development and evaluation of Automatic
Speech Recognition (ASR) technology using the XLS-R 300m model. The study aims
to improve ASR performance in converting spoken language into written text,
specifically for Indonesian, Javanese, and Sundanese languages. The paper
discusses the testing procedures, datasets used, and methodology employed in
training and evaluating the ASR systems. The results show that the XLS-R 300m
model achieves competitive Word Error Rate (WER) measurements, with a slight
compromise in performance for Javanese and Sundanese languages. The integration
of a 5-gram KenLM language model significantly reduces WER and enhances ASR
accuracy. The research contributes to the advancement of ASR technology by
addressing linguistic diversity and improving performance across various
languages. The findings provide insights into optimizing ASR accuracy and
applicability for diverse linguistic contexts.
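Since the abstract above reports its results as Word Error Rate, a minimal sketch of how WER is conventionally computed may help: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from the paper, and the paper's own evaluation may apply text normalization that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, for illustration only: one of four words is dropped.
print(word_error_rate("saya makan nasi goreng", "saya makan nasi"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.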
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
We present NusaCrowd, a collaborative initiative to collect and unify
existing resources for Indonesian languages, including opening access to
previously non-public resources. Through this initiative, we have brought
together 137 datasets and 118 standardized data loaders. The quality of the
datasets has been assessed manually and automatically, and their value is
demonstrated through multiple experiments. NusaCrowd's data collection enables
the creation of the first zero-shot benchmarks for natural language
understanding and generation in Indonesian and the local languages of
Indonesia. Furthermore, NusaCrowd supports the creation of the first multilingual
automatic speech recognition benchmark in Indonesian and the local languages of
Indonesia. Our work strives to advance natural language processing (NLP)
research for languages that are under-represented despite being widely spoken.
Natural Language Processing: Emerging Neural Approaches and Applications
This Special Issue highlights the most recent research in the NLP field and discusses related open issues, with a particular focus on emerging approaches for language learning, understanding, production, and grounding, whether interactive or autonomous, from data in cognitive and neural systems, as well as on their potential or real applications in different domains.