Monte Carlo Bayesian Reinforcement Learning
Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in
a model and represents uncertainty in model parameters by maintaining a
probability distribution over them. This paper presents Monte Carlo BRL
(MC-BRL), a simple and general approach to BRL. MC-BRL samples a priori a
finite set of hypotheses for the model parameter values and forms a discrete
partially observable Markov decision process (POMDP) whose state space is a
cross product of the state space for the reinforcement learning task and the
sampled model parameter space. The POMDP does not require conjugate
distributions for belief representation, as earlier works do, and can be solved
relatively easily with point-based approximation algorithms. MC-BRL naturally
handles both fully and partially observable worlds. Theoretical and
experimental results show that the discrete POMDP approximates the underlying
BRL task well with guaranteed performance.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
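To make the construction concrete, here is a minimal sketch of the MC-BRL idea: sample a finite set of parameter hypotheses a priori, treat the hypothesis index as the hidden part of a cross-product POMDP state, and update a belief over hypotheses by Bayes' rule. The toy two-state chain and all names are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of the MC-BRL construction (the toy chain dynamics
# and names are illustrative assumptions, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

# Sample a priori a finite set of hypotheses for the unknown model
# parameter (here: the success probability of action "forward").
K = 5
thetas = rng.uniform(0.2, 0.9, size=K)          # parameter hypotheses
belief = np.full(K, 1.0 / K)                    # uniform prior over hypotheses

def transition_prob(s_next, s, a, theta):
    """Toy 2-state chain: action 0 moves right with probability theta."""
    p_right = theta if a == 0 else 1.0 - theta
    return p_right if s_next == min(s + 1, 1) else 1.0 - p_right

def belief_update(belief, s, a, s_next):
    """Bayes update over hypotheses after observing a transition.
    The POMDP state is the cross product (s, theta_k); only theta is hidden."""
    likelihood = np.array([transition_prob(s_next, s, a, th) for th in thetas])
    posterior = belief * likelihood
    return posterior / posterior.sum()

belief = belief_update(belief, s=0, a=0, s_next=1)
print("posterior over sampled hypotheses:", np.round(belief, 3))
```

A point-based POMDP solver would then plan directly over beliefs of this form.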
Probabilistic Inference for Fast Learning in Control
We provide a novel framework for very fast model-based reinforcement learning in continuous state and action spaces. The framework requires probabilistic models that explicitly characterize their levels of confidence. Within this framework, we use flexible, non-parametric models to describe the world based on previously collected experience. We demonstrate learning on the cart-pole problem in a setting where we provide very limited prior knowledge about the task. Learning progresses rapidly, and a good policy is found after only a handful of iterations.
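The key requirement here is a model that "characterizes its level of confidence". The sketch below shows one way to get this with Gaussian-process regression over collected transitions: the predictive variance is small near data and large away from it. The kernel choice and the stand-in data are illustrative assumptions.

```python
# Minimal sketch of a confidence-aware dynamics model: GP regression
# over one state dimension (kernel and data are illustrative assumptions).
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    d = A[:, None] - B[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Previously collected experience: inputs x_t, next-state targets x_{t+1}.
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
Y = np.sin(X)                                  # stand-in transition data
noise = 1e-2

K = rbf(X, X) + noise * np.eye(len(X))
K_inv = np.linalg.inv(K)

def predict(x_star):
    """GP posterior: mean prediction plus a variance that grows far
    from data -- the confidence signal the framework relies on."""
    k_star = rbf(np.atleast_1d(x_star), X)
    mean = k_star @ K_inv @ Y
    var = rbf(np.atleast_1d(x_star), np.atleast_1d(x_star)) - k_star @ K_inv @ k_star.T
    return mean.item(), var.item()

print(predict(0.5))    # near data: low variance
print(predict(5.0))    # far from data: high variance
```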
Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
To rapidly learn a new task, it is often essential for agents to explore
efficiently -- especially when performance matters from the first timestep. One
way to learn such behaviour is via meta-learning. Many existing methods however
rely on dense rewards for meta-training, and can fail catastrophically if the
rewards are sparse. Without a suitable reward signal, the need for exploration
during meta-training is exacerbated. To address this, we propose HyperX, which
uses novel reward bonuses for meta-training to explore in approximate
hyper-state space (where hyper-states represent the environment state and the
agent's task belief). We show empirically that HyperX meta-learns better
task-exploration and adapts more successfully to new tasks than existing
methods.
Comment: Published at the International Conference on Machine Learning (ICML) 2021.
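One way to picture a reward bonus over approximate hyper-states is a count-based novelty signal on the discretized (state, belief) pair. The hashing scheme and bonus form below are illustrative assumptions, not HyperX's actual bonus estimators.

```python
# Minimal sketch of a meta-training bonus over hyper-states (state, belief).
from collections import defaultdict
import numpy as np

visit_counts = defaultdict(int)

def hyper_state_key(state, belief, n_bins=10):
    """Discretize the (state, belief) pair so rarely visited regions of
    approximate hyper-state space can be detected by counting."""
    s_bin = tuple(np.floor(np.asarray(state) * n_bins).astype(int))
    b_bin = tuple(np.floor(np.asarray(belief) * n_bins).astype(int))
    return s_bin + b_bin

def exploration_bonus(state, belief, scale=0.1):
    key = hyper_state_key(state, belief)
    visit_counts[key] += 1
    return scale / np.sqrt(visit_counts[key])   # decays with familiarity

# During meta-training, the agent optimizes env_reward + bonus, which
# drives it toward unfamiliar (state, task-belief) regions even when
# the environment reward is sparse.
r_total = 0.5 + exploration_bonus(state=[0.3, 0.7], belief=[0.9, 0.1])
print(r_total)
```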
Verifying Controllers Against Adversarial Examples with Bayesian Optimization
Recent successes in reinforcement learning have led to the development of
complex controllers for real-world robots. As these robots are deployed in
safety-critical applications and interact with humans, it becomes critical to
ensure safety in order to avoid causing harm. A first step in this direction is
to test the controllers in simulation. To be able to do this, we need to
capture what we mean by safety and then efficiently search the space of all
behaviors to see if they are safe. In this paper, we present an active-testing
framework based on Bayesian Optimization. We specify safety constraints using
logic and exploit structure in the problem in order to test the system for
adversarial counterexamples that violate the safety specifications. These
specifications are defined as complex boolean combinations of smooth functions
on the trajectories and, unlike reward functions in reinforcement learning, are
expressive and impose hard constraints on the system. In our framework, we
exploit regularity assumptions on individual functions in the form of a Gaussian
Process (GP) prior. We combine these into a coherent optimization framework
using problem structure. The resulting algorithm is able to provably verify
complex safety specifications or, alternatively, find counterexamples.
Experimental results show that the proposed method is able to find adversarial
examples quickly.
Comment: Proc. of the IEEE International Conference on Robotics and Automation, 2018.
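The active-testing loop can be pictured as Bayesian optimization of a robustness function rho(x), where rho(x) < 0 marks a safety violation: fit a GP surrogate to tested disturbances and query where a lower confidence bound is most optimistic about finding a violation. The controller stand-in rho() and kernel settings below are illustrative assumptions.

```python
# Minimal sketch of searching for counterexamples with a GP surrogate.
import numpy as np

rng = np.random.default_rng(1)

def rho(x):
    """Stand-in robustness of a trajectory under disturbance x."""
    return np.sin(3 * x) + 0.5 * x      # dips below 0 for some x

def rbf(A, B, ls=0.5):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

X = list(rng.uniform(-2, 2, size=3))    # initial test disturbances
Y = [rho(x) for x in X]

for _ in range(10):
    Xa, Ya = np.array(X), np.array(Y)
    K_inv = np.linalg.inv(rbf(Xa, Xa) + 1e-5 * np.eye(len(Xa)))
    cand = rng.uniform(-2, 2, size=200)          # candidate disturbances
    k = rbf(cand, Xa)
    mean = k @ K_inv @ Ya
    var = 1.0 - np.sum((k @ K_inv) * k, axis=1)
    lcb = mean - 2.0 * np.sqrt(np.maximum(var, 0))   # optimistic for violations
    x_next = cand[np.argmin(lcb)]
    X.append(x_next)
    Y.append(rho(x_next))
    if Y[-1] < 0:
        print(f"counterexample found: x={x_next:.3f}, rho={Y[-1]:.3f}")
        break
```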
Robotic manipulation of multiple objects as a POMDP
This paper investigates manipulation of multiple unknown objects in a crowded
environment. Because of incomplete knowledge due to unknown objects and
occlusions in visual observations, object observations are imperfect and action
success is uncertain, making planning challenging. We model the problem as a
partially observable Markov decision process (POMDP), which allows a general
reward based optimization objective and takes uncertainty in temporal evolution
and partial observations into account. In addition to occlusion dependent
observation and action success probabilities, our POMDP model also
automatically adapts object specific action success probabilities. To cope with
the changing system dynamics and performance constraints, we present a new
online POMDP method based on particle filtering that produces compact policies.
The approach is validated both in simulation and in physical experiments in a
scenario of moving dirty dishes into a dishwasher. The results indicate that:
1) a greedy heuristic manipulation approach is not sufficient; multi-object
manipulation requires multi-step POMDP planning, and 2) on-line planning is
beneficial since it allows the adaptation of the system dynamics model based on
actual experience.
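The belief machinery behind such online planning can be sketched as a particle filter over hidden object states, reweighted by an occlusion-dependent observation model. All models below are illustrative assumptions, not the paper's.

```python
# Minimal sketch of a particle-filter belief update for manipulation.
import numpy as np

rng = np.random.default_rng(2)

N = 100
particles = rng.uniform(0, 1, size=N)           # hidden object poses
weights = np.full(N, 1.0 / N)

def observation_likelihood(obs, particle, occluded_prob=0.3):
    """Imperfect observation: with some probability the object is
    occluded and the measurement carries little information."""
    visible = 1.0 - occluded_prob
    return occluded_prob * 1.0 + visible * np.exp(-((obs - particle) ** 2) / 0.02)

def update_belief(particles, weights, obs):
    weights = weights * np.array([observation_likelihood(obs, p) for p in particles])
    weights /= weights.sum()
    # Resample to keep the particle set effective (systematic resampling
    # would be the usual choice; multinomial keeps the sketch short).
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles, weights = update_belief(particles, weights, obs=0.42)
print("posterior mean pose estimate:", particles.mean())
```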
Bayesian Nonparametric Methods for Partially-Observable Reinforcement Learning
Making intelligent decisions from incomplete information is critical in many applications: for example, robots must choose actions based on imperfect sensors, and speech-based interfaces must infer a user’s needs from noisy microphone inputs. What makes these tasks hard is that often we do not have a natural representation with which to model the domain and use for choosing actions; we must learn about the domain’s properties while simultaneously performing the task. Learning a representation also involves trade-offs between modeling the data that we have seen previously and being able to make predictions about new data. This article explores learning representations of stochastic systems using Bayesian nonparametric statistics. Bayesian nonparametric methods allow the sophistication of a representation to scale gracefully with the complexity in the data. Our main contribution is a careful empirical evaluation of how representations learned using Bayesian nonparametric methods compare to other standard learning approaches, especially in support of planning and control. We show that the Bayesian aspects of the methods result in achieving state-of-the-art performance in decision making with relatively few samples, while the nonparametric aspects often result in fewer computations. These results hold across a variety of different techniques for choosing actions given a representation
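The core nonparametric idea, that representation complexity scales gracefully with the data, can be illustrated with a Chinese-restaurant-process prior, under which the number of latent clusters grows as observations arrive rather than being fixed in advance. The concentration value below is an assumed setting for illustration.

```python
# Minimal sketch of a Chinese-restaurant-process (CRP) prior.
import numpy as np

rng = np.random.default_rng(3)

def crp_assignments(n_points, alpha=1.0):
    """Sequentially assign points to clusters; a new cluster opens with
    probability proportional to alpha, so model size scales with data."""
    assignments = [0]
    counts = [1]
    for _ in range(1, n_points):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)        # sophistication grows gracefully
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

_, counts = crp_assignments(200)
print("clusters discovered:", len(counts), "sizes:", counts)
```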
Optimal treatment allocations in space and time for on-line control of an emerging infectious disease
A key component in controlling the spread of an epidemic is deciding where, when, and to whom to apply an intervention. We develop a framework for using data to inform these decisions in real time. We formalize a treatment allocation strategy as a sequence of functions, one per treatment period, that map up-to-date information on the spread of an infectious disease to a subset of locations where treatment should be allocated. An optimal allocation strategy optimizes some cumulative outcome, e.g. the number of uninfected locations, the geographic footprint of the disease, or the cost of the epidemic. Estimation of an optimal allocation strategy for an emerging infectious disease is challenging because spatial proximity induces interference between locations, the number of possible allocations is exponential in the number of locations, and disease dynamics and intervention effectiveness are unknown at outbreak. We derive a Bayesian on-line estimator of the optimal allocation strategy that combines simulation-optimization with Thompson sampling. The proposed estimator performs favourably in simulation experiments. This work is motivated by and illustrated using data on the spread of white-nose syndrome, a highly fatal infectious disease devastating bat populations in North America.
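The Thompson-sampling step can be sketched as: draw disease-dynamics parameters from the current posterior, then pick the allocation that looks best under that draw. The Beta posterior, the toy one-step spread model, and the tiny brute-force search below are illustrative assumptions (the paper handles the exponential allocation space with simulation-optimization rather than enumeration).

```python
# Minimal sketch of one Thompson-sampling treatment-allocation step.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)

n_locations, budget = 6, 2
infected = np.array([1, 1, 0, 0, 0, 0], dtype=bool)
# Posterior over per-contact transmission probability (Beta-Bernoulli).
alpha_post, beta_post = 3.0, 7.0

def simulate_new_infections(infected, treated, beta_draw):
    """One-step toy spread: untreated, uninfected sites become infected
    with the sampled probability while any site is infectious."""
    at_risk = ~infected & ~treated
    return rng.random(n_locations) < beta_draw * at_risk * infected.any()

def thompson_allocation():
    beta_draw = rng.beta(alpha_post, beta_post)        # posterior sample
    best, best_cost = None, np.inf
    for alloc in combinations(range(n_locations), budget):
        treated = np.zeros(n_locations, dtype=bool)
        treated[list(alloc)] = True
        cost = simulate_new_infections(infected, treated, beta_draw).sum()
        if cost < best_cost:
            best, best_cost = alloc, cost
    return best

print("locations to treat this period:", thompson_allocation())
```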
Efficient methods for near-optimal sequential decision making under uncertainty
This chapter discusses decision making under uncertainty. More specifically, it offers an overview of efficient Bayesian and distribution-free algorithms for making near-optimal sequential decisions under uncertainty about the environment. Due to the uncertainty, such algorithms must not only learn from their interaction with the environment but also perform as well as possible while learning is taking place.
A Practical and Conceptual Framework for Learning in Control
We propose a fully Bayesian approach for efficient reinforcement learning (RL) in Markov decision processes with continuous-valued state and action spaces when no expert knowledge is available. Our framework is based on well-established ideas from statistics and machine learning and learns fast since it carefully models, quantifies, and incorporates available knowledge when making decisions. The key ingredient of our framework is a probabilistic model, which is implemented using a Gaussian process (GP), a distribution over functions. In the context of dynamic systems, the GP models the transition function. By considering all plausible transition functions simultaneously, we reduce model bias, a problem that frequently occurs when deterministic models are used. Due to its generality and efficiency, our RL framework can be considered a conceptual and practical approach to learning models and controllers when expert knowledge is not available.
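"Considering all plausible transition functions simultaneously" can be pictured as drawing several functions from the GP posterior over the dynamics and evaluating outcomes under each, rather than committing to a single deterministic model. The kernel and stand-in data below are illustrative assumptions.

```python
# Minimal sketch of averaging over plausible transition functions.
import numpy as np

rng = np.random.default_rng(5)

def rbf(A, B, ls=1.0):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)

X = np.array([-1.0, 0.0, 1.0])                 # observed transitions
Y = np.array([-0.8, 0.1, 0.9])
K_inv = np.linalg.inv(rbf(X, X) + 1e-4 * np.eye(len(X)))

x_star = np.linspace(-2, 2, 50)
k = rbf(x_star, X)
mean = k @ K_inv @ Y
cov = rbf(x_star, x_star) - k @ K_inv @ k.T

# Each sample is one plausible transition function consistent with the
# data; evaluating a controller against all of them reduces model bias.
samples = rng.multivariate_normal(mean, cov + 1e-6 * np.eye(50), size=5)
print("predictive spread at x=2:", samples[:, -1].round(2))
```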