SGGNet²: Speech-Scene Graph Grounding Network for Speech-guided Navigation
Spoken language serves as an accessible and efficient interface, enabling
non-experts and users with disabilities to interact with complex assistant robots.
However, accurately grounding language utterances poses a significant challenge
due to acoustic variability in speakers' voices and environmental noise. In
this work, we propose a novel speech-scene graph grounding network (SGGNet²)
that robustly grounds spoken utterances by leveraging the acoustic similarity
between correctly recognized and misrecognized words obtained from automatic
speech recognition (ASR) systems. To incorporate this acoustic similarity, we
extend our previous grounding model, the scene-graph-based grounding network
(SGGNet), with the ASR model from NVIDIA NeMo. We accomplish this by feeding
the latent vector of speech pronunciations into the BERT-based grounding
network within SGGNet. We evaluate the effectiveness of using latent vectors of
speech commands for grounding through qualitative and quantitative studies. We
also demonstrate the capability of SGGNet² in a speech-based navigation task
using a real quadruped robot, RBQ-3, from Rainbow Robotics.
Comment: 7 pages, 6 figures, Paper accepted for the Special Session at the
2023 International Symposium on Robot and Human Interactive Communication
(RO-MAN), [Dohyun Kim, Yeseung Kim, Jaehwi Jang, and Minjae Song] contributed
equally to this work.
Towards an Indexical Model of Situated Language Comprehension for Cognitive Agents in Physical Worlds
We propose a computational model of situated language comprehension, based on
the Indexical Hypothesis, that generates meaning representations by translating
amodal linguistic symbols into modal representations of beliefs, knowledge, and
experience external to the linguistic system. This Indexical Model incorporates
multiple information sources, including perceptions, domain knowledge, and
short-term and long-term experiences during comprehension. We show that
exploiting diverse information sources can alleviate ambiguities that arise
from the contextual use of underspecified referring expressions and unexpressed
argument alternations of verbs. The model is being used to support linguistic
interactions in Rosie, an agent implemented in Soar that learns from
instruction.
Comment: Advances in Cognitive Systems 3 (2014)