Semi-parametric Topological Memory for Navigation
We introduce a new memory architecture for navigation in previously unseen
environments, inspired by landmark-based navigation in animals. The proposed
semi-parametric topological memory (SPTM) consists of a (non-parametric) graph
with nodes corresponding to locations in the environment and a (parametric)
deep network capable of retrieving nodes from the graph based on observations.
The graph stores no metric information, only connectivity of locations
corresponding to the nodes. We use SPTM as a planning module in a navigation
system. Given only 5 minutes of footage of a previously unseen maze, an
SPTM-based navigation agent can build a topological map of the environment and
use it to confidently navigate towards goals. The average success rate of the
SPTM agent in goal-directed navigation across test environments is higher than
the best-performing baseline by a factor of three. A video of the agent is
available at https://youtu.be/vRF7f4lhswo
Comment: Published at International Conference on Learning Representations
(ICLR) 2018. Project website at https://sites.google.com/view/SPT
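As a rough illustration of the SPTM pipeline, not the authors' implementation, the sketch below assumes a pretrained retrieval network exposed as `similarity(obs_a, obs_b)`; node granularity and thresholds are invented for the example:

```python
# Minimal SPTM-style sketch (illustrative; `similarity` is an assumed
# pretrained retrieval network returning a score in [0, 1]).
import networkx as nx

def build_topological_map(frames, similarity, shortcut_threshold=0.9):
    """One node per exploration frame; edges store connectivity only."""
    g = nx.Graph()
    g.add_nodes_from(range(len(frames)))
    # Temporal edges: consecutive frames are connected by construction.
    g.add_edges_from((i, i + 1) for i in range(len(frames) - 1))
    # Shortcut edges: visually similar, temporally distant frames are
    # assumed to show the same location.
    for i in range(len(frames)):
        for j in range(i + 5, len(frames)):
            if similarity(frames[i], frames[j]) > shortcut_threshold:
                g.add_edge(i, j)
    return g

def plan(g, frames, similarity, current_obs, goal_obs):
    """Localize the agent and the goal, then plan over the graph."""
    locate = lambda obs: max(g.nodes, key=lambda n: similarity(obs, frames[n]))
    # No metric information is stored, so the plan is a fewest-hop path.
    return nx.shortest_path(g, locate(current_obs), locate(goal_obs))
```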
Learning to Navigate in Indoor Environments: from Memorizing to Reasoning
Autonomous navigation is an essential capability of smart mobility for mobile
robots. Traditional methods require a map of the environment to plan a
collision-free path in the workspace. Deep reinforcement learning (DRL) is a
promising technique for realizing autonomous navigation without a map: a deep
neural network learns the mapping from observations to reasonable actions
through exploration. The planner should not only memorize the targets it was
trained on but, more importantly, reason its way to unseen goals. We propose a
new motion planner based on deep reinforcement learning that can reach targets
it has never been trained on in indoor environments, using only an RGB image
and odometry. The model is built from stacked Long Short-Term Memory (LSTM)
layers. Experiments were conducted in both simulated and real environments.
The source code is available at:
https://github.com/marooncn/navbot
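The abstract does not spell out the architecture; a minimal sketch of a stacked-LSTM planner over RGB features and odometry might look as follows (all sizes and the fusion scheme are assumptions, not the paper's specification):

```python
# Illustrative stacked-LSTM motion planner in PyTorch; layer sizes and
# the RGB/odometry fusion are assumptions, not the paper's specification.
import torch
import torch.nn as nn

class StackedLSTMPlanner(nn.Module):
    def __init__(self, feat_dim=512, odom_dim=3, hidden=256, n_actions=4):
        super().__init__()
        self.encoder = nn.Sequential(  # small CNN standing in for the RGB encoder
            nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        # Two stacked LSTM layers integrate observations over time.
        self.lstm = nn.LSTM(feat_dim + odom_dim, hidden,
                            num_layers=2, batch_first=True)
        self.policy = nn.Linear(hidden, n_actions)

    def forward(self, rgb_seq, odom_seq, state=None):
        b, t = rgb_seq.shape[:2]
        feats = self.encoder(rgb_seq.flatten(0, 1)).view(b, t, -1)
        out, state = self.lstm(torch.cat([feats, odom_seq], dim=-1), state)
        return self.policy(out), state  # action logits per timestep
```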
Expert-augmented actor-critic for ViZDoom and Montezuma's Revenge
We propose an expert-augmented actor-critic algorithm, which we evaluate on
two environments with sparse rewards: Montezuma's Revenge and a demanding maze
from the ViZDoom suite. In the case of Montezuma's Revenge, an agent trained
with our method achieves very good results, consistently scoring above 27,000
points (in many experiments beating the first world). With an appropriate
choice of hyperparameters, our algorithm surpasses the performance of the
expert data. In a number of experiments, we observed an unreported bug in
Montezuma's Revenge that allowed the agent to score more than 800,000 points.
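The expert augmentation can be pictured as a standard actor-critic objective plus a supervised term on expert transitions; the additive form and the coefficients below are assumptions for illustration:

```python
# Sketch of an expert-augmented actor-critic loss (assumed form): the
# usual actor and critic terms plus behavioral cloning on expert data.
import torch.nn.functional as F

def expert_augmented_loss(log_probs, advantages, values, returns,
                          expert_logits, expert_actions, expert_coef=0.1):
    policy_loss = -(log_probs * advantages.detach()).mean()       # actor
    value_loss = F.mse_loss(values, returns)                      # critic
    expert_loss = F.cross_entropy(expert_logits, expert_actions)  # imitation
    return policy_loss + 0.5 * value_loss + expert_coef * expert_loss
```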
On Evaluation of Embodied Navigation Agents
Skillful mobile operation in three-dimensional environments is a primary
topic of study in Artificial Intelligence. The past two years have seen a surge
of creative work on navigation. This creative output has produced a plethora of
sometimes incompatible task definitions and evaluation protocols. To coordinate
ongoing and future research in this area, we have convened a working group to
study empirical methodology in navigation research. The present document
summarizes the consensus recommendations of this working group. We discuss
different problem statements and the role of generalization, present evaluation
measures, and provide standard scenarios that can be used for benchmarking.
Comment: Report of a working group on empirical methodology in navigation
research. Authors are listed in alphabetical order.
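Among the evaluation measures this report proposes is Success weighted by Path Length (SPL), which has since become a standard navigation metric; over N test episodes:

```latex
% S_i = 1 if episode i succeeded (else 0); l_i = shortest-path distance
% from start to goal; p_i = length of the path the agent actually took.
\[
  \mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{l_i}{\max(p_i,\, l_i)}
\]
```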
Floyd-Warshall Reinforcement Learning: Learning from Past Experiences to Reach New Goals
Consider multi-goal tasks that involve static environments and dynamic goals.
Examples of such tasks, including goal-directed navigation and pick-and-place in
robotics, abound. Two types of Reinforcement Learning (RL) algorithms are used
for such tasks: model-free or model-based. Each of these approaches has
limitations. Model-free RL struggles to transfer learned information when the
goal location changes, but achieves high asymptotic accuracy in single goal
tasks. Model-based RL can transfer learned information to new goal locations by
retaining the explicitly learned state-dynamics, but is limited by the fact
that small errors in modelling these dynamics accumulate over long-term
planning. In this work, we improve upon the limitations of model-free RL in
multi-goal domains. We do this by adapting the Floyd-Warshall algorithm for RL
and call the adaptation Floyd-Warshall RL (FWRL). The proposed algorithm learns
a goal-conditioned action-value function by constraining the value of the
optimal path between any two states to be greater than or equal to the value of
paths via intermediary states. Experimentally, we show that FWRL is more
sample-efficient and learns higher-reward strategies in multi-goal tasks
compared to Q-learning, model-based RL, and other relevant baselines in a
tabular domain.
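The constraint reads as a Floyd-Warshall relaxation applied to a goal-conditioned value table; a tabular caricature (names illustrative, additive composition of values assumed):

```python
# Floyd-Warshall-style relaxation on a goal-conditioned value table
# V[s][g]: reaching g from s must be worth at least as much as any
# route through an intermediate state w. Illustrative sketch only.
def floyd_warshall_relax(V, states):
    for w in states:            # intermediate state
        for s in states:        # source state
            for g in states:    # goal state
                V[s][g] = max(V[s][g], V[s][w] + V[w][g])
    return V
```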
Playing hard exploration games by watching YouTube
Deep reinforcement learning methods traditionally struggle with tasks where
environment rewards are particularly sparse. One successful method of guiding
exploration in these domains is to imitate trajectories provided by a human
demonstrator. However, these demonstrations are typically collected under
artificial conditions, i.e. with access to the agent's exact environment setup
and the demonstrator's action and reward trajectories. Here we propose a
two-stage method that overcomes these limitations by relying on noisy,
unaligned footage without access to such data. First, we learn to map unaligned
videos from multiple sources to a common representation using self-supervised
objectives constructed over both time and modality (i.e. vision and sound).
Second, we embed a single YouTube video in this representation to construct a
reward function that encourages an agent to imitate human gameplay. This method
of one-shot imitation allows our agent to convincingly exceed human-level
performance on the infamously hard exploration games Montezuma's Revenge,
Pitfall! and Private Eye for the first time, even if the agent is not presented
with any environment rewards.
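One way to picture the resulting reward is as a sequence of checkpoints along the embedded demonstration; the sketch below assumes this checkpoint mechanic, with `embed` standing in for the learned cross-modal encoder and all thresholds invented:

```python
# Checkpoint-style imitation reward (assumed mechanics): embed one
# demonstration video, keep every `skip`-th frame as a checkpoint, and
# reward the agent for reaching successive checkpoints in order.
import numpy as np

def make_checkpoint_reward(demo_frames, embed, skip=16, threshold=0.5):
    checkpoints = [embed(f) for f in demo_frames[::skip]]
    state = {"idx": 0}

    def reward(agent_obs):
        if state["idx"] >= len(checkpoints):
            return 0.0
        z, c = embed(agent_obs), checkpoints[state["idx"]]
        # Cosine similarity in the shared embedding space.
        sim = float(z @ c / (np.linalg.norm(z) * np.linalg.norm(c)))
        if sim > threshold:
            state["idx"] += 1  # checkpoint reached; advance to the next
            return 1.0
        return 0.0

    return reward
```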
Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning
Learning is an inherently continuous phenomenon. When humans learn a new task
there is no explicit distinction between training and inference. As we learn a
task, we keep learning about it while performing the task. What we learn and
how we learn it varies during different stages of learning. Learning how to
learn and adapt is a key property that enables us to generalize effortlessly to
new settings. This is in contrast with conventional settings in machine
learning where a trained model is frozen during inference. In this paper we
study the problem of learning to learn at both training and test time in the
context of visual navigation. A fundamental challenge in navigation is
generalization to unseen scenes. In this paper we propose a self-adaptive
visual navigation method (SAVN) which learns to adapt to new environments
without any explicit supervision. Our solution is a meta-reinforcement learning
approach where an agent learns a self-supervised interaction loss that
encourages effective navigation. Our experiments, performed in the AI2-THOR
framework, show major improvements in both success rate and SPL for visual
navigation in novel scenes. Our code and data are available at:
https://github.com/allenai/savn
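The adaptation loop can be sketched as periodic gradient steps on the learned interaction loss while the agent acts, with no reward signal at test time; every name below is illustrative rather than the released API:

```python
# SAVN-style test-time adaptation (assumed form): every k steps, take a
# gradient step on a learned self-supervised interaction loss.
import torch

def navigate_with_adaptation(policy, interaction_loss, env, k=6, lr=1e-4):
    opt = torch.optim.SGD(policy.parameters(), lr=lr)
    obs, trajectory, done = env.reset(), [], False
    while not done:
        action = policy(obs).sample()    # policy returns a distribution
        obs, done = env.step(action)     # hypothetical env interface
        trajectory.append((obs, action))
        if len(trajectory) % k == 0:     # keep learning while navigating
            opt.zero_grad()
            interaction_loss(policy, trajectory[-k:]).backward()
            opt.step()
```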
Visual Semantic Navigation using Scene Priors
How do humans navigate to target objects in novel scenes? Do we use the
semantic/functional priors we have built over years to efficiently search and
navigate? For example, to search for mugs, we search cabinets near the coffee
machine and for fruits we try the fridge. In this work, we focus on
incorporating semantic priors in the task of semantic navigation. We propose to
use Graph Convolutional Networks for incorporating the prior knowledge into a
deep reinforcement learning framework. The agent uses the features from the
knowledge graph to predict the actions. For evaluation, we use the AI2-THOR
framework. Our experiments show how semantic knowledge improves performance
significantly. More importantly, we show improvement in generalization to
unseen scenes and/or objects. The supplementary video can be accessed at the
following link: https://youtu.be/otKjuO805dE
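A schematic of how GCN-propagated priors might feed the policy (not the authors' exact architecture; layer sizes, the two-hop depth, and fusion by concatenation are assumptions):

```python
# Knowledge-graph priors fused with visual features (illustrative).
# `A` is a normalized adjacency matrix over object/relation nodes.
import torch
import torch.nn as nn

class SceneGCNPolicy(nn.Module):
    def __init__(self, n_nodes, node_dim, vis_dim, hidden=512, n_actions=6):
        super().__init__()
        self.w1 = nn.Linear(node_dim, 128)  # graph-convolution layer 1
        self.w2 = nn.Linear(128, 32)        # graph-convolution layer 2
        self.policy = nn.Sequential(
            nn.Linear(vis_dim + n_nodes * 32, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, vis_feat, node_feat, A):
        h = torch.relu(A @ self.w1(node_feat))  # propagate priors, hop 1
        h = torch.relu(A @ self.w2(h))          # propagate priors, hop 2
        # Concatenate graph knowledge with the visual observation.
        return self.policy(torch.cat([vis_feat, h.flatten()], dim=-1))
```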
SafeRoute: Learning to Navigate Streets Safely in an Urban Environment
Recent studies show that 85% of women have changed their traveled route to
avoid harassment and assault. Despite this, current mapping tools do not
empower users with information to take charge of their personal safety. We
propose SafeRoute, a novel solution to the problem of navigating cities and
avoiding street harassment and crime. Unlike other street navigation
applications, SafeRoute introduces a new type of path generation via deep
reinforcement learning. This enables us to successfully optimize for
multi-criteria path-finding and incorporate representation learning within our
framework. Our agent learns to pick favorable streets to create a safe and
short path with a reward function that incorporates safety and efficiency.
Given access to recent crime reports in many urban cities, we train our model
for experiments in Boston, New York, and San Francisco. We test our model on
areas of these cities, specifically the populated downtown regions where
tourists and those unfamiliar with the streets walk. We evaluate SafeRoute and
successfully improve over state-of-the-art methods by up to 17% in local
average distance from crimes while decreasing path length by up to 7%.
Comment: 8 pages
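The reward that "incorporates safety and efficiency" could take many forms; one hedged sketch, with an invented per-segment crime-density feature and weighting:

```python
# Illustrative multi-criteria reward: penalize street segments for nearby
# reported crime and for length, weighted by a trade-off coefficient.
def saferoute_reward(segment, crime_density, alpha=0.7):
    safety_penalty = crime_density[segment.id]   # e.g. recent crimes nearby
    length_penalty = segment.length_m / 1000.0   # kilometres walked
    return -(alpha * safety_penalty + (1 - alpha) * length_penalty)
```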
Plan2Vec: Unsupervised Representation Learning by Latent Plans
In this paper we introduce plan2vec, an unsupervised representation learning
approach that is inspired by reinforcement learning. Plan2vec constructs a
weighted graph on an image dataset using near-neighbor distances, and then
extrapolates this local metric to a global embedding by distilling the path
integral over planned paths. When applied to control, plan2vec offers a
compute- and sample-efficient way to learn goal-conditioned value estimates
that remain accurate over long horizons. We demonstrate the effectiveness of
plan2vec on one simulated and two challenging real-world image datasets.
Experimental results show that plan2vec successfully amortizes the planning
cost, enabling reactive planning that is linear in memory and computation
complexity rather than exhaustive over the entire state space.
Comment: code available at https://geyang.github.io/plan2ve
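The graph-building half of the pipeline might look like the sketch below (illustrative, not the released code); an embedding network would then be trained so that Euclidean distances between embedded images match these planned distances:

```python
# Plan2vec-style global metric (sketch): k-NN graph over local embedding
# distances, then shortest-path lengths as distillation targets.
import numpy as np
import networkx as nx

def plan2vec_targets(images, local_embed, k=5):
    z = np.stack([local_embed(x) for x in images])
    g = nx.Graph()
    for i in range(len(z)):
        d = np.linalg.norm(z - z[i], axis=1)
        for j in np.argsort(d)[1:k + 1]:  # connect the k nearest neighbors
            g.add_edge(i, int(j), weight=float(d[j]))
    # "Planned" distances: shortest paths through the local-metric graph.
    return dict(nx.all_pairs_dijkstra_path_length(g))
```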