Adaptive Variance for Changing Sparse-Reward Environments
Robots that are trained to perform a task in a fixed environment often fail
when facing unexpected changes to the environment due to a lack of exploration.
We propose a principled way to adapt the policy for better exploration in
changing sparse-reward environments. Unlike previous works which explicitly
model environmental changes, we analyze the relationship between the value
function and the optimal exploration for a Gaussian-parameterized policy and
show that our theory leads to an effective strategy for adjusting the variance
of the policy, enabling fast adaptation to changes in a variety of sparse-reward
environments.
Comment: Accepted as a conference paper at the International Conference on
Robotics and Automation (ICRA) 201
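The abstract ties exploration variance to the value function; as a minimal sketch of that idea (not the authors' derived rule), one can widen a Gaussian policy's standard deviation whenever the value estimate drops relative to its best-seen level, signalling an environment change. The class and schedule below are hypothetical:

```python
import numpy as np

class AdaptiveGaussianPolicy:
    """Hypothetical sketch: tie the std of a Gaussian policy to the
    current value estimate, widening exploration when value drops."""

    def __init__(self, action_dim, sigma_min=0.05, sigma_max=1.0):
        self.mean = np.zeros(action_dim)   # placeholder policy mean
        self.sigma_min = sigma_min
        self.sigma_max = sigma_max
        self.sigma = sigma_min
        self.v_best = 1e-8                 # best value estimate seen so far

    def update_sigma(self, v_estimate):
        # A low value relative to the best seen suggests the environment
        # changed, so widen the variance to explore more broadly.
        self.v_best = max(self.v_best, v_estimate)
        ratio = max(0.0, v_estimate) / self.v_best
        self.sigma = self.sigma_max - ratio * (self.sigma_max - self.sigma_min)

    def act(self, rng):
        # Sample an action from the Gaussian policy at the current width.
        return self.mean + self.sigma * rng.standard_normal(self.mean.shape)
```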
Policy Consolidation for Continual Reinforcement Learning
We propose a method for tackling catastrophic forgetting in deep
reinforcement learning that is \textit{agnostic} to the timescale of changes in
the distribution of experiences, does not require knowledge of task boundaries,
and can adapt in \textit{continuously} changing environments. In our
\textit{policy consolidation} model, the policy network interacts with a
cascade of hidden networks that simultaneously remember the agent's policy at a
range of timescales and regularise the current policy by its own history,
thereby improving its ability to learn without forgetting. We find that the
model improves continual learning relative to baselines on a number of
continuous control tasks in single-task, alternating two-task, and multi-agent
competitive self-play settings.
Comment: Accepted at ICML 201
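A minimal PyTorch sketch of the consolidation mechanism: a cascade of policy networks is chained by KL penalties so each network is regularised toward its slower neighbour. The one-directional pull and the coefficients here are simplifications of the paper's scheme:

```python
import torch
import torch.nn.functional as F

def consolidation_loss(logits_chain, betas):
    """KL chain over a cascade of policy networks.

    logits_chain[0] is the visible policy; logits_chain[k] remembers the
    policy at a progressively longer timescale, so the visible policy is
    regularised by its own history. Coefficients `betas` are hypothetical.
    """
    loss = 0.0
    for k in range(len(logits_chain) - 1):
        log_p_fast = F.log_softmax(logits_chain[k], dim=-1)
        p_slow = F.softmax(logits_chain[k + 1], dim=-1)
        # pull the faster policy toward its slower neighbour
        loss = loss + betas[k] * F.kl_div(log_p_fast, p_slow,
                                          reduction="batchmean")
    return loss
```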
ANS: Adaptive Network Scaling for Deep Rectifier Reinforcement Learning Models
This work provides a thorough study on how reward scaling can affect
performance of deep reinforcement learning agents. In particular, we would like
to answer the question: how does reward scaling affect non-saturating ReLU
networks in RL? This question matters because ReLU is one of the most effective
activation functions for deep learning models. We also propose an Adaptive
Network Scaling framework to find a suitable scale of the rewards during
learning for better performance. We conducted empirical studies to justify the
solution.
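As a generic illustration of adapting the reward scale online (not the ANS criterion itself), one can normalise rewards by a running estimate of their magnitude:

```python
class RunningRewardScaler:
    """Scale rewards by a running estimate of their magnitude.

    A generic stand-in for choosing a reward scale adaptively during
    learning; the ANS framework selects its scale with its own criterion,
    which this sketch does not reproduce.
    """

    def __init__(self, decay=0.999, eps=1e-8):
        self.second_moment = 1.0
        self.decay = decay
        self.eps = eps

    def scale(self, reward):
        # exponential moving estimate of the reward's second moment
        self.second_moment = (self.decay * self.second_moment
                              + (1 - self.decay) * reward ** 2)
        return reward / (self.second_moment ** 0.5 + self.eps)
```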
Reinforcement Learning for Robotics and Control with Active Uncertainty Reduction
Model-free reinforcement learning methods such as Proximal Policy
Optimization or Q-learning typically require thousands of interactions with
the environment to approximate the optimum controller which may not always be
feasible in robotics due to safety and time consumption. Model-based methods
such as PILCO or BlackDrops, while data-efficient, provide solutions with
limited robustness and complexity. To address this tradeoff, we introduce
active uncertainty reduction-based virtual environments, which are formed
through limited trials conducted in the original environment. We provide an
efficient method for uncertainty management, which is used as a metric for
self-improvement by identification of the points with maximum expected
improvement through adaptive sampling. Capturing the uncertainty also allows
for better mimicking of the reward responses of the original system. Our
approach enables the use of complex policy structures and reward functions
through a unique combination of model-based and model-free methods, while still
retaining the data efficiency. We demonstrate the validity of our method on
several classic reinforcement learning problems in OpenAI Gym. We show that
our approach offers better modeling capacity for complex system dynamics
compared to established methods.
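A sketch of the adaptive-sampling loop, under the assumption that ensemble disagreement stands in for the paper's uncertainty metric: fit several dynamics models to the limited real trials, then run the next trial where they disagree most.

```python
import numpy as np

def ensemble_uncertainty(models, candidates):
    # Disagreement across an ensemble of fitted dynamics models
    # (e.g. scikit-learn regressors) as a cheap uncertainty proxy.
    preds = np.stack([m.predict(candidates) for m in models])  # (M, N, d)
    return preds.std(axis=0).mean(axis=-1)                     # (N,)

def pick_next_trial(models, candidates):
    # Query the real environment where the learned model is most
    # uncertain; a stand-in for "maximum expected improvement".
    scores = ensemble_uncertainty(models, candidates)
    return candidates[np.argmax(scores)]
```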
COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration
Data efficiency and robustness to task-irrelevant perturbations are
long-standing challenges for deep reinforcement learning algorithms. Here we
introduce a modular approach to addressing these challenges in a continuous
control environment, without using hand-crafted or supervised information. Our
Curious Object-Based seaRch Agent (COBRA) uses task-free intrinsically
motivated exploration and unsupervised learning to build object-based models of
its environment and action space. Subsequently, it can learn a variety of tasks
through model-based search in very few steps and excel on structured hold-out
tests of policy robustness.
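A minimal stand-in for the exploration phase, assuming a learned world model in PyTorch: the intrinsic reward is the model's prediction error, so the agent seeks transitions it cannot yet predict. COBRA's unsupervised object-based scene decomposition is omitted here:

```python
import torch

def curiosity_reward(world_model, obs, action, next_obs):
    """Intrinsic reward = prediction error of a learned world model.

    A generic curiosity signal in the spirit of COBRA's task-free
    exploration phase; `world_model` is any module predicting the next
    observation from (obs, action).
    """
    with torch.no_grad():
        pred_next = world_model(obs, action)
    # per-transition mean squared prediction error
    return torch.mean((pred_next - next_obs) ** 2, dim=-1)
```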
Context-Dependent Upper-Confidence Bounds for Directed Exploration
Directed exploration strategies for reinforcement learning are critical for
learning an optimal policy in a minimal number of interactions with the
environment. Many algorithms use optimism to direct exploration, either through
visitation estimates or upper confidence bounds, as opposed to data-inefficient
strategies like $\epsilon$-greedy that use random, undirected exploration. Most
data-efficient exploration methods require significant computation, typically
relying on a learned model to guide exploration. Least-squares methods have the
potential to provide some of the data-efficiency benefits of model-based
approaches -- because they summarize past interactions -- with the computation
closer to that of model-free approaches. In this work, we provide a novel,
computationally efficient, incremental exploration strategy, leveraging this
property of least-squares temporal difference learning (LSTD). We derive upper
confidence bounds on the action-values learned by LSTD, with context-dependent
(or state-dependent) noise variance. Such context-dependent noise focuses
exploration on a subset of variable states, and allows for reduced exploration
in other states. We empirically demonstrate that our algorithm can converge
more quickly than other incremental exploration strategies using confidence
estimates on action-values.
Comment: Neural Information Processing Systems 201
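A sketch of the optimistic action selection this implies, assuming linear features and the standard elliptical confidence width from the LSTD feature matrix; the paper's context-dependent noise variance replaces the fixed multiplier used below:

```python
import numpy as np

def ucb_action(theta, A_inv, phi_sa, beta=1.0):
    """Optimistic action selection from LSTD-style estimates.

    theta  : LSTD weight vector (action-value estimate q = phi . theta)
    A_inv  : inverse of the accumulated LSTD feature matrix
    phi_sa : features for each candidate action, shape (n_actions, d)
    beta   : confidence multiplier (a hypothetical constant; the paper
             derives a context-dependent variance instead)
    """
    q = phi_sa @ theta
    # elliptical confidence width: wide where features are rarely seen
    bonus = beta * np.sqrt(np.einsum("ad,dk,ak->a", phi_sa, A_inv, phi_sa))
    return int(np.argmax(q + bonus))
```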
Generative predecessor models for sample-efficient imitation learning
We propose Generative Predecessor Models for Imitation Learning (GPRIL), a
novel imitation learning algorithm that matches the state-action distribution
to the distribution observed in expert demonstrations, using generative models
to reason probabilistically about alternative histories of demonstrated states.
We show that this approach allows an agent to learn robust policies using only
a small number of expert demonstrations and self-supervised interactions with
the environment. We derive this approach from first principles and compare it
empirically to a state-of-the-art imitation learning method, showing that it
matches or outperforms that method on two simulated robot manipulation
tasks, and demonstrate significantly higher sample efficiency by applying the
algorithm on a real robot.
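Schematically, the training loop looks like the following, assuming a generator that samples predecessor state-action pairs for a given state (its training from self-supervised interaction is omitted, and all names are hypothetical):

```python
def gpril_style_update(generator, policy, demo_states, policy_loss_fn):
    """Schematic GPRIL-style policy update.

    `generator.sample_predecessor(s)` is assumed to return a (state,
    action) pair that leads to state s. The synthetic pairs augment the
    expert data so the policy learns to funnel back toward demonstrated
    states; this sketch omits all generator training details.
    """
    losses = []
    for s_demo in demo_states:
        # imagine an alternative history reaching the demonstrated state
        s_prev, a_prev = generator.sample_predecessor(s_demo)
        # supervise the policy to take the action that leads back to it
        losses.append(policy_loss_fn(policy(s_prev), a_prev))
    return sum(losses) / len(losses)
```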
Model-Based Stochastic Search for Large Scale Optimization of Multi-Agent UAV Swarms
Recent work from the reinforcement learning community has shown that
Evolution Strategies are a fast and scalable alternative to other reinforcement
learning methods. In this paper we show that Evolution Strategies are a special
case of model-based stochastic search methods. This class of algorithms has
nice asymptotic convergence properties and known convergence rates. We show how
these methods can be used to solve both cooperative and competitive multi-agent
problems in an efficient manner. We demonstrate the effectiveness of this
approach on two complex multi-agent UAV swarm combat scenarios: one where a
team of fixed-wing aircraft must attack a well-defended base, and one where two
teams of agents go head to head to defeat each other.
Comment: Video at http://goo.gl/dWvQi7 Code freely available at
http://github.com/ddfan/swarm_evolv
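For reference, the Evolution Strategies update that the paper casts as model-based stochastic search can be written as a Gaussian-perturbation gradient estimate; the rank normalisation below is common ES practice rather than part of the paper's analysis:

```python
import numpy as np

def es_step(theta, evaluate, rng, pop_size=32, sigma=0.1, lr=0.02):
    """One Evolution Strategies update on parameter vector `theta`.

    `evaluate` maps a parameter vector to an episodic return. The search
    distribution is a Gaussian centred at theta; its mean moves toward
    perturbations that earned high returns.
    """
    eps = rng.standard_normal((pop_size, theta.size))
    returns = np.array([evaluate(theta + sigma * e) for e in eps])
    # rank-normalise returns for stability (centred in [-0.5, 0.5])
    ranks = returns.argsort().argsort() / (pop_size - 1) - 0.5
    grad = (ranks[:, None] * eps).sum(axis=0) / (pop_size * sigma)
    return theta + lr * grad
```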
EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
Deep reinforcement learning algorithms have been shown to learn complex tasks
using highly general policy classes. However, sparse reward problems remain a
significant challenge. Exploration methods based on novelty detection have been
particularly successful in such settings but typically require generative or
predictive models of the observations, which can be difficult to train when the
observations are very high-dimensional and complex, as in the case of raw
images. We propose a novelty detection algorithm for exploration that is based
entirely on discriminatively trained exemplar models, where classifiers are
trained to discriminate each visited state against all others. Intuitively,
novel states are easier to distinguish against other states seen during
training. We show that this kind of discriminative modeling corresponds to
implicit density estimation, and that it can be combined with count-based
exploration to produce competitive results on a range of popular benchmark
tasks, including state-of-the-art results on challenging egocentric
observations in the VizDoom benchmark.
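A sketch of turning a trained exemplar discriminator into an exploration bonus, using the implicit-density view the paper establishes; the constants and exact bonus shape below are illustrative:

```python
import numpy as np

def novelty_bonus(discriminator_prob):
    """Exploration bonus from an exemplar discriminator's output.

    `discriminator_prob` is D(s): the probability that state s is the
    exemplar rather than a previously visited state. Up to constants,
    the visitation density is recovered as p(s) = (1 - D(s)) / D(s);
    a count-style bonus then rewards low-density (novel) states.
    """
    d = np.clip(discriminator_prob, 1e-6, 1 - 1e-6)
    p = (1.0 - d) / d              # implicit density estimate
    return 1.0 / np.sqrt(p + 1e-8) # larger bonus for rarer states
```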
Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments
Ability to continuously learn and adapt from limited experience in
nonstationary environments is an important milestone on the path towards
general intelligence. In this paper, we cast the problem of continuous
adaptation into the learning-to-learn framework. We develop a simple
gradient-based meta-learning algorithm suitable for adaptation in dynamically
changing and adversarial scenarios. Additionally, we design a new multi-agent
competitive environment, RoboSumo, and define iterated adaptation games for
testing various aspects of continuous adaptation strategies. We demonstrate
that meta-learning enables significantly more efficient adaptation than
reactive baselines in the few-shot regime. Our experiments with a population of
agents that learn and compete suggest that meta-learners are the fittest.
Comment: Published as a conference paper at ICLR 201
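The inner step of such gradient-based meta-learning follows the MAML pattern; a minimal PyTorch sketch with hypothetical hyperparameters computes the adapted parameters with a graph so the meta-objective can differentiate through the adaptation:

```python
import torch

def adapt(policy_params, inner_loss_fn, batch, lr_inner=0.1):
    """One inner-loop adaptation step (MAML-style).

    Returns adapted parameters theta' = theta - lr * grad L(theta).
    create_graph=True keeps the computation graph so the outer (meta)
    objective can backpropagate through this update. In the paper's
    setting, `batch` would come from the previous round of an iterated
    adaptation game against a changing opponent.
    """
    loss = inner_loss_fn(policy_params, batch)
    grads = torch.autograd.grad(loss, policy_params, create_graph=True)
    return [p - lr_inner * g for p, g in zip(policy_params, grads)]
```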