Search CORE

26,132 research outputs found

Embodied Question Answering

Author: Batra Dhruv
Das Abhishek
Datta Samyak
Gkioxari Georgia
Lee Stefan
Parikh Devi
Publication date: 01/12/2017

We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?"). In order to answer, the agent must first intelligently navigate to explore the environment, gather information through first-person (egocentric) vision, and then answer the question ("orange"). This challenging task requires a range of AI skills -- active perception, language understanding, goal-driven navigation, commonsense reasoning, and grounding of language into actions. In this work, we develop the environments, end-to-end-trained reinforcement learning agents, and evaluation protocols for EmbodiedQA.

Comment: 20 pages, 13 figures, Webpage: https://embodiedqa.org

arXiv.org e-Print Archive

Crossref

ANGELICA : choice of output modality in an embodied agent

Author: Theune Mariët
Publication date: 01/01/2001

The ANGELICA project addresses the problem of modality choice in information presentation by embodied, humanlike agents. The output modalities available to such agents include both language and various nonverbal signals such as pointing and gesturing. For each piece of information to be presented by the agent, it must be decided whether it should be expressed using language, a nonverbal signal, or both. In the ANGELICA project, a model of the different factors influencing this choice will be developed and integrated into a natural language generation system. The application domain is the presentation of route descriptions by an embodied agent in a 3D environment. Evaluation and testing form an integral part of the project. In particular, we will investigate the effect of different modality choices on the effectiveness and naturalness of the generated presentations and on the user's perception of the agent's personality.

University of Twente Research Information

Maps, agents and dialogue for exploring a virtual world

Author: Dijk E.M.A.G. van
Nijholt A.
Zwiers J.
Publication venue: International Institute of Informatics and Systemics (IIIS)
Publication date: 01/01/2001

In previous years we have been involved in several projects in which users (or visitors) had to find their way in information-rich virtual environments. 'Information-rich' means that the users do not know beforehand what is available in the environment or where to go in the environment to find the information; moreover, users or visitors do not necessarily know exactly what they are looking for. 'Information-rich' also means that the information may change over time. A second visit to the same environment will require different behavior from the visitor in order for him or her to obtain information similar to what was available during a previous visit. In this paper we report on two projects and discuss our attempts to generalize from the different approaches and application domains to obtain a library of methods and tools for designing and implementing intelligent agents that inhabit virtual environments and support the navigation of the user/visitor.

CiteSeerX

University of Twente Research Information

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Author: Anderson Peter
Bruce Jake
Gould Stephen
Hengel Anton van den
Johnson Mark
Reid Ian
Sünderhauf Niko
Teney Damien
Wu Qi
Publication date: 01/01/2018

A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset.

Comment: CVPR 2018 Spotlight presentation

arXiv.org e-Print Archive

Crossref

Adelaide Research & Scholarship

Queensland University of Technology ePrints Archive

The Australian National University