163,714 research outputs found
Breaking Language Barriers: A Question Answering Dataset for Hindi and Marathi
Recent advances in deep learning have led to the development of highly
sophisticated systems with an unquenchable appetite for data. On the other
hand, building good deep learning models for low-resource languages remains a
challenging task. This paper focuses on developing a Question Answering dataset
for two such languages: Hindi and Marathi. Despite Hindi being the 3rd most
spoken language worldwide, with 345 million speakers, and Marathi being the
11th most spoken language globally, with 83.2 million speakers, both languages
face limited resources for building efficient Question Answering systems. To
tackle the challenge of data scarcity, we have developed a novel approach for
translating the SQuAD 2.0 dataset into Hindi and Marathi. We release the
largest Question-Answering dataset available for these languages, with each
dataset containing 28,000 samples. We evaluate the dataset on various
architectures and release the best-performing models for both Hindi and
Marathi, which will facilitate further research in these languages. Leveraging
similarity tools, our method holds the potential to create datasets in diverse
languages, thereby enhancing the understanding of natural language across
varied linguistic contexts. Our fine-tuned models, code, and dataset will be
made publicly available.
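The abstract does not name the specific similarity tools used, but a common difficulty when translating SQuAD-style data is that the translated answer rarely appears verbatim in the translated context, so the answer span has to be re-located by approximate matching. A minimal sketch of that alignment step, assuming the texts have already been machine-translated and using Python's standard-library difflib as a stand-in similarity measure (the Hindi example strings are purely illustrative, not from the dataset):

from difflib import SequenceMatcher

def align_answer(translated_context: str, translated_answer: str) -> tuple[int, str]:
    """Locate the best-matching span for a translated answer inside a
    translated context, since machine translation rarely preserves the
    answer string verbatim. Returns (start_char, matched_span)."""
    words = translated_context.split()
    ans_len = max(len(translated_answer.split()), 1)
    best_score, best_span = 0.0, ""
    # Slide a window roughly the size of the answer over the context.
    for i in range(len(words)):
        for width in (ans_len, ans_len + 1, ans_len + 2):
            candidate = " ".join(words[i:i + width])
            if not candidate:
                continue
            score = SequenceMatcher(None, candidate, translated_answer).ratio()
            if score > best_score:
                best_score, best_span = score, candidate
    return translated_context.find(best_span), best_span

# Illustrative Hindi context/answer pair:
context_hi = "ताजमहल आगरा में स्थित है और इसे शाहजहाँ ने बनवाया था"
answer_hi = "शाहजहाँ ने बनवाया"
print(align_answer(context_hi, answer_hi))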
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features
Recent advances in eXplainable AI (XAI) have provided new insights into how
models for vision, language, and tabular data operate. However, few approaches
exist for understanding speech models. Existing work focuses on a few spoken
language understanding (SLU) tasks, and explanations are difficult to interpret
for most users. We introduce a new approach to explain speech classification
models. We generate easy-to-interpret explanations via input perturbation on
two information levels. 1) Word-level explanations reveal how each word-related
audio segment impacts the outcome. 2) Paralinguistic features (e.g., prosody
and background noise) answer the counterfactual: "What would the model
prediction be if we edited the audio signal in this way?" We validate our
approach by explaining two state-of-the-art SLU models on two speech
classification tasks in English and Italian. Our findings demonstrate that the
explanations are faithful to the model's inner workings and plausible to
humans. Our method and findings pave the way for future research on
interpreting speech models.
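The word-level perturbation idea described above can be made concrete with a short sketch: silence the audio segment aligned to each word and record how much the classifier's score for the target class drops. The classify callable and the word timestamps (e.g., from a forced aligner) are assumptions for illustration, not the authors' actual implementation:

import numpy as np

def word_level_attributions(waveform, sr, word_timestamps, classify, target_class):
    """Perturbation-based word-level attributions for a speech classifier.

    waveform: 1-D numpy float array sampled at rate sr (assumed).
    word_timestamps: list of (word, start_sec, end_sec), e.g. from a forced
    aligner or an ASR system with word timings (assumed available).
    classify: callable mapping a waveform to class probabilities (assumed).
    Returns (word, importance) pairs, where importance is the drop in the
    target-class probability when that word's audio segment is silenced.
    """
    base_prob = classify(waveform)[target_class]
    attributions = []
    for word, start, end in word_timestamps:
        perturbed = waveform.copy()
        perturbed[int(start * sr):int(end * sr)] = 0.0  # silence this word
        drop = base_prob - classify(perturbed)[target_class]
        attributions.append((word, float(drop)))
    return attributions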
DRAGON: A Dialogue-Based Robot for Assistive Navigation with Visual Language Grounding
Persons with visual impairments (PwVI) have difficulties understanding and
navigating spaces around them. Current wayfinding technologies either focus
solely on navigation or provide limited communication about the environment.
Motivated by recent advances in visual-language grounding and semantic
navigation, we propose DRAGON, a guiding robot that combines a dialogue system
with the ability to associate the environment with natural language. By
understanding the commands from the user, DRAGON is able to guide the user to
the desired landmarks on the map, describe the environment, and answer
questions from visual observations. Through effective utilization of dialogue,
the robot can ground the user's free-form descriptions to landmarks in the
environment, and give the user semantic information through spoken language. We
conduct a user study with blindfolded participants in an everyday indoor
environment. Our results demonstrate that DRAGON is able to communicate with
the user smoothly, provide a good guiding experience, and connect users with
their surrounding environment in an intuitive manner.
Comment: Webpage and videos are at
https://sites.google.com/view/dragon-wayfinding/hom
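The abstract does not state how DRAGON grounds free-form user descriptions to map landmarks, but one plausible reading is embedding-based matching between the user's utterance and short landmark descriptions. A rough sketch under that assumption, using the sentence-transformers library; the landmark inventory and model choice below are illustrative, not taken from the paper:

from sentence_transformers import SentenceTransformer, util

# Hypothetical landmark inventory; the real system's map and labels are not given here.
landmarks = {
    "water fountain": "a drinking fountain next to the elevators",
    "couch area": "a lounge with two couches and a coffee table",
    "front desk": "the reception desk near the main entrance",
}

model = SentenceTransformer("all-MiniLM-L6-v2")
landmark_names = list(landmarks)
landmark_embs = model.encode([landmarks[n] for n in landmark_names])

def ground_utterance(utterance: str) -> str:
    """Return the landmark whose description best matches the user's request."""
    query_emb = model.encode(utterance)
    scores = util.cos_sim(query_emb, landmark_embs)[0]
    return landmark_names[int(scores.argmax())]

print(ground_utterance("take me somewhere I can sit down and rest"))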
Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home
Enriching the quality of early childhood education with interactive at-home
math learning systems, empowered by recent advances in conversational AI
technologies, is slowly becoming a reality. With this motivation, we implement
a multimodal dialogue system to support play-based learning experiences at
home, guiding kids to master basic math concepts. This work explores the Spoken
Language Understanding (SLU) pipeline within a task-oriented dialogue system
developed for Kid Space, with cascading Automatic Speech Recognition (ASR) and
Natural Language Understanding (NLU) components evaluated on our home
deployment data with kids going through gamified math learning activities. We
validate the advantages of a multi-task architecture for NLU and experiment
with a diverse set of pretrained language representations for Intent
Recognition and Entity Extraction tasks in the math learning domain. To
recognize kids' speech in realistic home environments, we investigate several
ASR systems, including the commercial Google Cloud and the latest open-source
Whisper solutions with varying model sizes. We evaluate the SLU pipeline by
testing our best-performing NLU models on noisy ASR output to inspect the
challenges of understanding children for math learning in authentic homes.
Comment: Proceedings of the 18th Workshop on Innovative Use of NLP for
Building Educational Applications (BEA) at ACL 202
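A cascaded SLU pipeline of the kind evaluated here can be sketched as Whisper transcription followed by intent prediction. Since the paper's multi-task NLU model is not included in this listing, a zero-shot classifier stands in for intent recognition, and the intent labels and audio path below are invented for illustration:

import whisper
from transformers import pipeline

# Cascaded SLU sketch: ASR (Whisper) followed by intent classification.
asr_model = whisper.load_model("base")          # open-source Whisper ASR
intent_clf = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
INTENTS = ["answer_number", "ask_for_help", "off_task_chatter", "done_with_activity"]

def understand(audio_path: str) -> dict:
    """Transcribe a child's utterance, then predict its intent."""
    transcript = asr_model.transcribe(audio_path)["text"].strip()
    result = intent_clf(transcript, candidate_labels=INTENTS)
    return {"transcript": transcript, "intent": result["labels"][0]}

# print(understand("kid_utterance.wav"))  # placeholder path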