36,781 research outputs found
Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
Comprehension of spoken natural language is an essential component for robots
to communicate with human effectively. However, handling unconstrained spoken
instructions is challenging due to (1) complex structures including a wide
variety of expressions used in spoken language and (2) inherent ambiguity in
interpretation of human instructions. In this paper, we propose the first
comprehensive system that can handle unconstrained spoken language and is able
to effectively resolve ambiguity in spoken instructions. Specifically, we
integrate deep-learning-based object detection together with natural language
processing technologies to handle unconstrained spoken instructions, and
propose a method for robots to resolve instruction ambiguity through dialogue.
Through our experiments on both a simulated environment as well as a physical
industrial robot arm, we demonstrate the ability of our system to understand
natural instructions from human operators effectively, and how higher success
rates of the object picking task can be achieved through an interactive
clarification process.Comment: 9 pages. International Conference on Robotics and Automation (ICRA)
2018. Accompanying videos are available at the following links:
https://youtu.be/_Uyv1XIUqhk (the system submitted to ICRA-2018) and
http://youtu.be/DGJazkyw0Ws (with improvements after ICRA-2018 submission
Towards an Indexical Model of Situated Language Comprehension for Cognitive Agents in Physical Worlds
We propose a computational model of situated language comprehension based on
the Indexical Hypothesis that generates meaning representations by translating
amodal linguistic symbols to modal representations of beliefs, knowledge, and
experience external to the linguistic system. This Indexical Model incorporates
multiple information sources, including perceptions, domain knowledge, and
short-term and long-term experiences during comprehension. We show that
exploiting diverse information sources can alleviate ambiguities that arise
from contextual use of underspecific referring expressions and unexpressed
argument alternations of verbs. The model is being used to support linguistic
interactions in Rosie, an agent implemented in Soar that learns from
instruction.Comment: Advances in Cognitive Systems 3 (2014
Recurrent Multimodal Interaction for Referring Image Segmentation
In this paper we are interested in the problem of image segmentation given
natural language descriptions, i.e. referring expressions. Existing works
tackle this problem by first modeling images and sentences independently and
then segment images by combining these two types of representations. We argue
that learning word-to-image interaction is more native in the sense of jointly
modeling two modalities for the image segmentation task, and we propose
convolutional multimodal LSTM to encode the sequential interactions between
individual words, visual information, and spatial information. We show that our
proposed model outperforms the baseline model on benchmark datasets. In
addition, we analyze the intermediate output of the proposed multimodal LSTM
approach and empirically explain how this approach enforces a more effective
word-to-image interaction.Comment: To appear in ICCV 2017. See http://www.cs.jhu.edu/~cxliu/ for code
and supplementary materia
Don't Blame Distributional Semantics if it can't do Entailment
Distributional semantics has had enormous empirical success in Computational
Linguistics and Cognitive Science in modeling various semantic phenomena, such
as semantic similarity, and distributional models are widely used in
state-of-the-art Natural Language Processing systems. However, the theoretical
status of distributional semantics within a broader theory of language and
cognition is still unclear: What does distributional semantics model? Can it
be, on its own, a fully adequate model of the meanings of linguistic
expressions? The standard answer is that distributional semantics is not fully
adequate in this regard, because it falls short on some of the central aspects
of formal semantic approaches: truth conditions, entailment, reference, and
certain aspects of compositionality. We argue that this standard answer rests
on a misconception: These aspects do not belong in a theory of expression
meaning, they are instead aspects of speaker meaning, i.e., communicative
intentions in a particular context. In a slogan: words do not refer, speakers
do. Clearing this up enables us to argue that distributional semantics on its
own is an adequate model of expression meaning. Our proposal sheds light on the
role of distributional semantics in a broader theory of language and cognition,
its relationship to formal semantics, and its place in computational models.Comment: To appear in Proceedings of the 13th International Conference on
Computational Semantics (IWCS 2019), Gothenburg, Swede
- …