VirtualHome: Simulating Household Activities via Programs
In this paper, we are interested in modeling complex activities that occur in
a typical household. We propose to use programs, i.e., sequences of atomic
actions and interactions, as a high-level representation of complex tasks.
Programs are interesting because they provide an unambiguous representation of
a task and allow agents to execute them. However, no existing database
provides this type of information. Towards this goal, we first
crowd-source programs for a variety of activities that happen in people's
homes, via a game-like interface used for teaching kids how to code. Using the
collected dataset, we show how we can learn to extract programs directly from
natural language descriptions or from videos. We then implement the most common
atomic (inter)actions in the Unity3D game engine, and use our programs to
"drive" an artificial agent to execute tasks in a simulated household
environment. Our VirtualHome simulator allows us to create a large activity
video dataset with rich ground truth, enabling training and testing of video
understanding models. We further showcase examples of our agent performing
tasks in VirtualHome based on language descriptions.
Comment: CVPR 2018 (Oral)
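To make the idea of a program as an executable sequence of atomic (inter)actions concrete, here is a minimal Python sketch. It assumes a toy household state and four hypothetical actions (`walk`, `grab`, `open`, `put`); the `Step`, `ToyHome`, and `execute` names are illustrative only and are not VirtualHome's actual script format or its Unity3D interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


# Hypothetical program step: one atomic action plus the object(s) it acts on.
# This is NOT VirtualHome's real script syntax, just an illustration.
@dataclass(frozen=True)
class Step:
    action: str
    args: Tuple[str, ...] = ()


# Hypothetical toy household state that the steps update.
@dataclass
class ToyHome:
    agent_room: str = "bedroom"
    holding: Set[str] = field(default_factory=set)
    opened: Set[str] = field(default_factory=set)
    placed: Dict[str, str] = field(default_factory=dict)


def execute(program: List[Step], home: ToyHome) -> List[str]:
    """Run the steps in order, mutating `home`; return a readable trace."""
    trace = []
    for step in program:
        if step.action == "walk":
            (room,) = step.args
            home.agent_room = room
        elif step.action == "open":
            (obj,) = step.args
            home.opened.add(obj)
        elif step.action == "grab":
            (obj,) = step.args
            home.holding.add(obj)
        elif step.action == "put":
            obj, target = step.args
            if obj not in home.holding:
                raise ValueError(f"cannot put {obj}: agent is not holding it")
            home.holding.remove(obj)
            home.placed[obj] = target
        else:
            raise ValueError(f"unknown action: {step.action}")
        trace.append(f"[{step.action}] " + " ".join(f"<{a}>" for a in step.args))
    return trace


if __name__ == "__main__":
    # "Put the cup in the dishwasher" written as a hypothetical program.
    program = [
        Step("walk", ("kitchen",)),
        Step("grab", ("cup",)),
        Step("open", ("dishwasher",)),
        Step("put", ("cup", "dishwasher")),
    ]
    for line in execute(program, ToyHome()):
        print(line)
```

Because every step names exactly one action and its arguments, the same program always drives the agent through the same state changes, which is the "non-ambiguous, executable" property the abstract highlights.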
A corpus-based analysis of route instructions in human-robot interaction
This paper investigates how users employ spatial descriptions to navigate a speech-enabled robot. We created a simulated environment in which users gave route instructions in a dialogic, real-time interaction with a robot, which was operated by naïve participants. The ability to monitor the robot was also manipulated in two experimental conditions. The results provide evidence that the content of the instructions and the users' strategies vary depending on the conditions and demands of the interaction. As expected, the route instructions were frequently underspecified and arbitrary. The findings of this study elucidate the complexity of interpreting spatial language in HRI. However, they also point to the need to endow mobile robots with richer dialogue resources to compensate for the uncertainties arising from language as well as from the environment.
Improving Natural Language Interaction with Robots Using Advice
Over the last few years, there has been growing interest in learning models
for physically grounded language understanding tasks, such as the popular
blocks world domain. These works typically view this problem as a single-step
process, in which a human operator gives an instruction and an automated agent
is evaluated on its ability to execute it. In this paper we take the first step
towards increasing the bandwidth of this interaction, and suggest a protocol
for including advice (high-level observations about the task), which can help
constrain the agent's prediction. We evaluate our approach on the blocks world
task and show that even simple advice can lead to significant performance
improvements. To help reduce the effort involved in supplying the advice, we
also explore advice self-generated by the model, which can still improve results.
Comment: Accepted as a short paper at NAACL 2019 (8 pages)
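One simple way to picture advice constraining a prediction is as a filter over candidate outputs. The sketch below is a hypothetical illustration, not the paper's model: it assumes the agent assigns a score to each candidate target cell on a grid, and treats advice such as "the target is in the top half of the board" as a predicate that masks out disallowed cells before taking the argmax. The names `predict`, `top_half`, `Cell`, and `Advice` are invented for this example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical grid cell (row, col) where the agent could place a block.
Cell = Tuple[int, int]
# Advice modeled as a yes/no predicate over candidate cells (an assumption).
Advice = Callable[[Cell], bool]


def predict(scores: Dict[Cell, float], advice: Optional[Advice] = None) -> Cell:
    """Return the highest-scoring cell that satisfies the advice.

    If the advice rules out every candidate, fall back to the
    unconstrained argmax instead of failing.
    """
    allowed = {c: s for c, s in scores.items() if advice is None or advice(c)}
    if not allowed:
        allowed = scores
    return max(allowed, key=allowed.get)


def top_half(rows: int) -> Advice:
    """Hypothetical advice: 'the target is in the top half of the board'."""
    return lambda cell: cell[0] < rows // 2


if __name__ == "__main__":
    # Made-up model scores on a 4x4 board, for illustration only.
    scores = {(0, 1): 0.6, (1, 2): 0.4, (3, 3): 0.9, (2, 0): 0.5}
    print(predict(scores))               # unconstrained argmax
    print(predict(scores, top_half(4)))  # advice-constrained argmax
```

Here the unconstrained prediction is the bottom-right cell `(3, 3)`, while the top-half advice redirects it to `(0, 1)`. The fallback branch is a design choice for the sketch: it keeps a wrong or overly strict piece of advice from leaving the agent with no prediction at all.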
Exploring miscommunication and collaborative behaviour in human-robot interaction
This paper presents the first step in designing a speech-enabled robot that is capable of managing miscommunication naturally. It describes the methods and results of two Wizard-of-Oz (WOz) studies, in which dyads of naïve participants interacted in a collaborative task. The first WOz study explored human miscommunication management. The second study investigated how shared visual space and monitoring shape the processes of feedback and communication in task-oriented interactions. The results provide insights for the development of human-inspired and robust natural language interfaces in robots.
Towards an Indexical Model of Situated Language Comprehension for Cognitive Agents in Physical Worlds
We propose a computational model of situated language comprehension based on
the Indexical Hypothesis that generates meaning representations by translating
amodal linguistic symbols to modal representations of beliefs, knowledge, and
experience external to the linguistic system. This Indexical Model incorporates
multiple information sources, including perceptions, domain knowledge, and
short-term and long-term experiences during comprehension. We show that
exploiting diverse information sources can alleviate ambiguities that arise
from contextual use of underspecified referring expressions and unexpressed
argument alternations of verbs. The model is being used to support linguistic
interactions in Rosie, an agent implemented in Soar that learns from
instruction.
Comment: Advances in Cognitive Systems 3 (2014)
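The idea that combining several information sources can resolve an underspecified referring expression can be sketched as a cascade of filters. This is a hypothetical Python illustration, not the Indexical Model or Rosie's Soar implementation: the `PerceivedObject` fields, the verb-to-shape `affordances` table, and the recency tie-break are all assumptions standing in for perception, domain knowledge, and short-term experience.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


# Hypothetical perceived object: an id plus two perceived attributes.
@dataclass(frozen=True)
class PerceivedObject:
    obj_id: str
    color: str
    shape: str


def resolve(
    properties: Dict[str, str],
    scene: List[PerceivedObject],
    recent_referents: List[str],
    verb: Optional[str] = None,
    affordances: Optional[Dict[str, Set[str]]] = None,
) -> Optional[str]:
    """Resolve a referring expression like 'the red one' to an object id."""
    # 1. Perception: keep objects whose attributes match every stated property.
    matches = [
        o for o in scene
        if all(getattr(o, k) == v for k, v in properties.items())
    ]
    # 2. Domain knowledge (assumed table): keep objects whose shape supports the verb.
    if verb is not None and affordances is not None:
        matches = [o for o in matches if verb in affordances.get(o.shape, set())]
    if len(matches) == 1:
        return matches[0].obj_id
    # 3. Short-term experience: prefer the most recently mentioned remaining candidate.
    candidate_ids = {o.obj_id for o in matches}
    for ref in reversed(recent_referents):
        if ref in candidate_ids:
            return ref
    # Still ambiguous (or nothing matched): the caller could ask for clarification.
    return None


if __name__ == "__main__":
    scene = [
        PerceivedObject("o1", "red", "block"),
        PerceivedObject("o2", "red", "cylinder"),
        PerceivedObject("o3", "blue", "block"),
    ]
    affordances = {"block": {"stack", "pick-up"}, "cylinder": {"pick-up"}}
    # "Stack the red one": domain knowledge rules out the cylinder.
    print(resolve({"color": "red"}, scene, [], verb="stack", affordances=affordances))
    # "Pick up the red one" after o2 was just discussed: recency decides.
    print(resolve({"color": "red"}, scene, ["o3", "o2"]))
```

In this toy scene, perception alone leaves two red candidates; either the verb's assumed affordances or the recent discourse history narrows them to one, which mirrors, in a very simplified form, the abstract's claim that diverse information sources can alleviate referential ambiguity.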