Embodied Question Answering
We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where
an agent is spawned at a random location in a 3D environment and asked a
question ("What color is the car?"). In order to answer, the agent must first
intelligently navigate to explore the environment, gather information through
first-person (egocentric) vision, and then answer the question ("orange").
This challenging task requires a range of AI skills -- active perception,
language understanding, goal-driven navigation, commonsense reasoning, and
grounding of language into actions. In this work, we develop the environments,
end-to-end-trained reinforcement learning agents, and evaluation protocols for
EmbodiedQA. Comment: 20 pages, 13 figures. Webpage: https://embodiedqa.org
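The episode structure described above (spawn, navigate, perceive, answer) can be caricatured in a few lines. The grid world, the scripted greedy policy, and every name below are illustrative assumptions, not the paper's environment or trained agent:

```python
# Minimal sketch of an EmbodiedQA-style episode. The grid world, target,
# and greedy "policy" are hypothetical stand-ins for the paper's 3D
# environments and end-to-end-trained RL agents.
def run_episode(grid_size=5, target=(4, 4), target_color="orange", max_steps=20):
    """Navigate toward the target object, then answer the color question."""
    pos = (0, 0)  # agent spawns at a fixed corner for simplicity
    for _ in range(max_steps):
        if pos == target:
            # Egocentric "perception": observe the object's attribute.
            return target_color
        # Greedy navigation: step toward the target one axis at a time.
        x, y = pos
        tx, ty = target
        if x != tx:
            x += 1 if tx > x else -1
        elif y != ty:
            y += 1 if ty > y else -1
        pos = (x, y)
    return "unknown"  # question unanswered within the step budget

answer = run_episode()
```

The real task replaces the greedy walk with a learned navigation policy and the attribute lookup with visual question answering over first-person frames.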
Role of three-dimensional virtual environments in the globalisation of science education
In this poster, we illustrate how 3D virtual environments can facilitate science education in distance and blended education contexts, and can support collaboration amongst students and educators in geographically distributed settings and in different institutions.
Three-dimensional (3D) virtual environments, also called synthetic worlds, are multimedia, simulated environments, often delivered over the web, which users can ‘inhabit’ and interact in via their graphical self-representations, known as ‘avatars’. In a 3D virtual environment, users represented as avatars experience others as being present in the same environment even though they may be geographically distributed. Users converse in real time through gestures, audio, text-based chat, and instant messaging. Three-dimensional virtual environments support synchronous communication and collaboration more effectively than two-dimensional (2D) web-based environments: they extend the user’s ability to employ the traditional communication cues of face-to-face interaction, and they convey a sense of presence and a sense of place in a way that 2D environments do not.
A 3D environment can enable students to carry out a range of authentic and practical scientific enquiries: interacting with 3D models; participating in virtual field trips; learning to control instruments; assembling apparatus and instruments; and creating 3D models. The social aspects of a 3D environment support scientific discourse and dialogue at different levels. For example, in an avatar-based 3D virtual world, avatars can navigate, encounter other avatars, and communicate with them in real time through gestures, voice, text, and instant messaging. Students can critique experimental designs, compare results, share good practice, and look over each other’s work just as they would in a real-life laboratory. The sense of working together in a place with other avatars provides an immersive experience that drives sustained engagement and aids visual memory.
ViZDoom Competitions: Playing Doom from Pixels
This paper presents the first two editions of Visual Doom AI Competition,
held in 2016 and 2017. The challenge was to create bots that compete in a
multi-player deathmatch in a first-person shooter (FPS) game, Doom. The bots
had to make their decisions based solely on visual information, i.e., a raw
screen buffer. To play well, the bots needed to understand their surroundings,
navigate, explore, and handle the opponents at the same time. These aspects,
together with the competitive multi-agent aspect of the game, make the
competition a unique platform for evaluating state-of-the-art reinforcement
learning algorithms. The paper discusses the rules, solutions, results, and
statistics that give insight into the agents' behaviors. Best-performing agents
are described in more detail. The results of the competition lead to the
conclusion that, although reinforcement learning can produce capable Doom
bots, these bots are not yet able to compete successfully against humans in
this game. The paper also revisits the ViZDoom environment, a flexible,
easy-to-use, and efficient 3D platform for research on vision-based
reinforcement learning, based on the well-recognized first-person perspective
game Doom.
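The competition's central constraint, acting from the raw screen buffer alone, can be illustrated with a toy sketch. The frame representation and the brightness heuristic below are assumptions for illustration only; actual entries trained deep RL policies on ViZDoom's real buffers:

```python
# Illustrative sketch of the "decide from a raw screen buffer" constraint.
# The toy 2-D frame and the brightness heuristic are hypothetical; real
# competition bots learned policies over full RGB screen buffers.
def choose_action(screen_buffer):
    """Pick TURN_LEFT / TURN_RIGHT / MOVE_FORWARD from pixels alone."""
    width = len(screen_buffer[0])
    left = sum(row[c] for row in screen_buffer for c in range(width // 2))
    right = sum(row[c] for row in screen_buffer for c in range(width // 2, width))
    # Heuristic: steer toward the brighter half (e.g., a lit corridor).
    if left > right:
        return "TURN_LEFT"
    if right > left:
        return "TURN_RIGHT"
    return "MOVE_FORWARD"

frame = [[0, 0, 9, 9],
         [0, 0, 9, 9]]  # brighter on the right half
action = choose_action(frame)
```

The point is that nothing besides the pixel grid enters the decision, mirroring the competition's rule that bots receive no game-state variables about opponents.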
Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning
Robots that navigate among pedestrians use collision avoidance algorithms to
enable safe and efficient operation. Recent works present deep reinforcement
learning as a framework to model the complex interactions and cooperation among agents.
However, they are implemented using key assumptions about other agents'
behavior that deviate from reality as the number of agents in the environment
increases. This work extends our previous approach to develop an algorithm that
learns collision avoidance among a variety of types of dynamic agents without
assuming they follow any particular behavior rules. This work also introduces a
strategy using LSTM that enables the algorithm to use observations of an
arbitrary number of other agents, unlike previous methods, which require a
fixed observation size. The proposed algorithm outperforms our previous
approach in simulation as the number of agents increases, and is demonstrated
on a fully autonomous robotic vehicle traveling at human walking speed,
without the use of a 3D Lidar.
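The fixed-size-state idea behind the LSTM strategy can be sketched as follows. A scalar vanilla RNN stands in for the LSTM, and the weights are arbitrary assumptions; the point is that the encoder accepts any number of agent observations yet always emits a state of the same shape:

```python
import math

# Sketch of the key idea: fold observations of an arbitrary number of nearby
# agents through a recurrent cell so the policy always sees a fixed-size
# state. A scalar vanilla RNN stands in for the LSTM; the weights are
# arbitrary illustrative values, not learned parameters.
def encode_agents(observations, w_in=0.5, w_rec=0.8):
    """Fold a variable-length list of agent observations into one scalar state."""
    h = 0.0  # fixed-size hidden state, independent of len(observations)
    for obs in observations:
        h = math.tanh(w_in * obs + w_rec * h)
    return h

# The encoder accepts 2 agents or 5 agents and returns the same-shaped state.
state_two = encode_agents([0.3, -0.1])
state_five = encode_agents([0.3, -0.1, 0.7, 0.2, -0.4])
```

A downstream policy network can then take this fixed-size state as input regardless of how crowded the scene is, which is what removes the fixed-observation-size assumption.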
How does the design of landmarks on a mobile map influence wayfinding experts’ spatial learning during a real-world navigation task?
Humans increasingly rely on GPS-enabled mobile maps to navigate novel environments. However, this reliance can negatively affect spatial learning, which can be detrimental even for expert navigators such as search and rescue personnel. Landmark visualization has been shown to improve spatial learning in general populations by facilitating object identification between the map and the environment. How landmark visualization supports expert users’ spatial learning during map-assisted navigation is still an open research question. We thus conducted a real-world study with wayfinding experts in an unknown residential neighborhood. We aimed to assess how two different landmark visualization styles (abstract 2D vs. realistic 3D buildings) would affect experts’ spatial learning in a map-assisted navigation task during an emergency scenario. Using a between-subjects design, we asked Swiss military personnel to follow a given route using a mobile map, and to identify five task-relevant landmarks along the route. We recorded experts’ gaze behavior while navigating and examined their spatial learning after the navigation task. We found that experts’ spatial learning improved when they focused their visual attention on the environment, but the direction of attention between the map and the environment was not affected by the landmark visualization style. Further, there was no difference in spatial learning between the 2D and 3D groups. Contrary to previous research with general populations, this study suggests that the landmark visualization style does not enhance expert navigators’ navigation or spatial learning abilities, thus highlighting the need for population-specific mobile map design solutions.
Two-stage visual navigation by deep neural networks and multi-goal reinforcement learning
In this paper, we propose a two-stage learning framework for visual navigation in which the experience the agent gains while exploring one goal is shared to learn to navigate to other goals. We first train a deep neural network to estimate the robot's position in the environment, using ground-truth information provided by a classical localization and mapping approach. In the second stage, a simpler multi-goal Q-function learns to traverse the environment using the provided discretized map. Transfer learning is applied to the multi-goal Q-function from a maze structure to a 2D simulator, and the function is finally deployed in a 3D simulator, where the robot uses the locations estimated by the position-estimator deep network. In the experiments, we first compare different architectures to select the best deep network for location estimation, and then compare the effects of the multi-goal reinforcement learning method against traditional reinforcement learning. The results show a significant improvement when multi-goal reinforcement learning is used. Furthermore, the results of the location estimator show that a deep network can learn and generalize across different environments using camera images, with high accuracy in both position and orientation.
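A toy version of the multi-goal Q-function can be written as a single table indexed by (state, goal, action), so that experience gathered while pursuing one goal shapes the policy for every goal. The 1-D chain, rewards, and hyperparameters below are illustrative assumptions, not the paper's setup:

```python
import random

# Toy sketch of a multi-goal Q-function on a 1-D chain of cells. One table
# Q[(state, goal, action)] is trained for all goals at once; everything here
# (chain length, rewards, hyperparameters) is an illustrative assumption.
random.seed(0)
N = 5                  # cells 0..4
ACTIONS = (-1, +1)     # step left / step right
Q = {}                 # Q[(state, goal, action)]

def q(s, g, a):
    return Q.get((s, g, a), 0.0)

def train(episodes=3000, alpha=0.5, gamma=0.9, eps=0.2):
    for _ in range(episodes):
        goal = random.randrange(N)   # a fresh goal each episode
        s = random.randrange(N)
        for _ in range(20):
            if s == goal:
                break
            a = random.choice(ACTIONS) if random.random() < eps else \
                max(ACTIONS, key=lambda x: q(s, goal, x))
            s2 = min(N - 1, max(0, s + a))          # clip to the chain
            r = 1.0 if s2 == goal else -0.1         # step cost, goal reward
            target = r + gamma * max(q(s2, goal, x) for x in ACTIONS)
            Q[(s, goal, a)] = q(s, goal, a) + alpha * (target - q(s, goal, a))
            s = s2

train()
# Greedy policy from cell 0 toward goal 4 should step right.
best = max(ACTIONS, key=lambda a: q(0, 4, a))
```

Because the goal is part of the table's key, transitions observed en route to one goal also update value estimates that other goals reuse, which is the transfer effect the paper exploits.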
Machine Learning in Robotic Navigation: Deep Visual Localization and Adaptive Control
The work conducted in this thesis contributes to the field of robotic navigation by focusing on different machine learning solutions: supervised learning with (deep) neural networks, unsupervised learning, and reinforcement learning. First, we propose a semi-supervised machine learning approach that can dynamically update the robot controller's parameters using situational analysis through feature extraction and unsupervised clustering. The results show that the robot can adapt to changes in its surroundings, resulting in a thirty percent improvement in navigation speed and stability. Then, we train multiple deep neural networks to estimate the robot's position in the environment, using ground-truth information provided by a classical localization and mapping approach. We prepare two image-based localization datasets in 3D simulation and compare the results of a traditional multilayer perceptron, a stacked denoising autoencoder, and a convolutional neural network (CNN). The experimental results show that our proposed inception-based CNNs without pooling layers perform very well in all the environments. Finally, we propose a two-stage learning framework for visual navigation in which the experience the agent gains while exploring one goal is shared to learn to navigate to other goals. The multi-goal Q-function learns to traverse the environment using the provided discretized map. Transfer learning is applied to the multi-goal Q-function from a maze structure to a 2D simulator, and the function is finally deployed in a 3D simulator, where the robot uses the locations estimated by the position-estimator deep CNNs. The results show a significant improvement when multi-goal reinforcement learning is used.
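The "situational analysis through feature extraction and unsupervised clustering" step can be sketched with a 1-D k-means that switches controller gains per detected situation. The features, cluster count, and gain table are hypothetical, chosen only to illustrate the idea:

```python
import random

# Hypothetical sketch of situational controller adaptation: cluster extracted
# features of the surroundings, then pick controller gains per cluster. The
# 1-D features, two-cluster k-means, and gain table are illustrative
# assumptions, not the thesis's actual method.
random.seed(1)

def kmeans_1d(samples, k=2, iters=20):
    """Plain 1-D k-means; returns sorted cluster centers."""
    centers = random.sample(samples, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in samples:
            groups[min(range(k), key=lambda i: abs(x - centers[i]))].append(x)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

# Feature = mean obstacle distance; tight corridors vs. open space.
features = [0.2, 0.3, 0.25, 2.0, 2.2, 1.9]
centers = kmeans_1d(features)

def controller_gain(feature, centers, gains=(0.4, 1.0)):
    """Pick a speed gain for the situation whose center is closest."""
    i = min(range(len(centers)), key=lambda j: abs(feature - centers[j]))
    return gains[i]

gain_corridor = controller_gain(0.28, centers)  # cautious in tight spaces
gain_open = controller_gain(2.1, centers)       # faster in open space
```

Clustering here plays the role of the thesis's unsupervised situational analysis: no labels are needed, yet the controller's parameters change as the environment does.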