2,467 research outputs found
Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes
Information-theoretic principles for learning and acting have been proposed
to solve particular classes of Markov Decision Problems. Mathematically, such
approaches are governed by a variational free energy principle and allow
solving MDP planning problems with information-processing constraints expressed
in terms of a Kullback-Leibler divergence with respect to a reference
distribution. Here we consider a generalization of such MDP planners by taking
model uncertainty into account. As model uncertainty can also be formalized as
an information-processing constraint, we can derive a unified solution from a
single generalized variational principle. We provide a generalized value
iteration scheme together with a convergence proof. As limit cases, this
generalized scheme includes standard value iteration with a known model,
Bayesian MDP planning, and robust planning. We demonstrate the benefits of this
approach in a grid world simulation.
Comment: 16 pages, 3 figures
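As a concrete illustration of the kind of planner this abstract generalizes, below is a minimal sketch (our own, not the paper's code) of KL-constrained value iteration against a reference policy pi0: the hard max of standard value iteration is replaced by a soft log-partition backup, and letting beta grow without bound recovers the standard scheme. All names and default values are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def free_energy_value_iteration(R, P, pi0, beta=5.0, gamma=0.95,
                                tol=1e-8, max_iter=10_000):
    """KL-regularized value iteration sketch (illustrative).

    R    : (S, A) reward array
    P    : (A, S, S) transitions, P[a, s, s'] = Pr(s' | s, a)
    pi0  : (S, A) reference policy; the information-processing
           cost is KL(pi || pi0)
    beta : inverse temperature; beta -> infinity recovers standard
           value iteration, small beta keeps the policy close to pi0
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # One-step look-ahead: Q[s, a] = R[s, a] + gamma * E[V(s')]
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        # Soft (log-partition) backup instead of a hard max over actions
        V_new = logsumexp(beta * Q, b=pi0, axis=1) / beta
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    # Constrained-optimal policy: pi(a|s) proportional to pi0 * exp(beta*Q)
    Q = R + gamma * np.einsum("asn,n->sa", P, V)
    logits = np.log(pi0) + beta * Q
    pi = np.exp(logits - logsumexp(logits, axis=1, keepdims=True))
    return V, pi
```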
Action and behavior: a free-energy formulation
We have previously tried to explain perceptual inference and learning under a free-energy principle that pursues Helmholtz's agenda to understand the brain in terms of energy minimization. It is fairly easy to show that making inferences about the causes of sensory data can be cast as the minimization of a free-energy bound on the likelihood of sensory inputs, given an internal model of how they were caused. In this article, we consider what would happen if the data themselves were sampled to minimize this bound. It transpires that the ensuing active sampling or inference is mandated by ergodic arguments based on the very existence of adaptive agents. Furthermore, it accounts for many aspects of motor behavior, from retinal stabilization to goal-seeking. In particular, it suggests that motor control can be understood as fulfilling prior expectations about proprioceptive sensations. This formulation can explain why adaptive behavior emerges in biological agents and suggests a simple alternative to optimal control theory. We illustrate these points using simulations of oculomotor control and then apply the same principles to cued and goal-directed movements. In short, the free-energy formulation may provide an alternative perspective on motor control that places it in an intimate relationship with perception.
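The bound referred to here is the standard variational free-energy identity; for reference (a textbook decomposition, not quoted from the article):

```latex
% F(q) upper-bounds the negative log evidence (surprise) of sensory
% data y under internal model m; q is the recognition density.
\begin{aligned}
F(q) &= \mathbb{E}_{q(\vartheta)}\bigl[\ln q(\vartheta) - \ln p(y,\vartheta \mid m)\bigr] \\
     &= -\ln p(y \mid m)
        + \mathrm{KL}\bigl[q(\vartheta)\,\|\,p(\vartheta \mid y, m)\bigr]
     \;\ge\; -\ln p(y \mid m).
\end{aligned}
```

Perceptual inference minimizes F over q; the active step the article considers is minimizing the same bound over the data y themselves, by acting.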
Bayesian multitask inverse reinforcement learning
We generalise the problem of inverse reinforcement learning to multiple
tasks, from multiple demonstrations. Each one may represent one expert trying
to solve a different task, or different experts trying to solve the same
task. Our main contribution is to formalise the problem as statistical
preference elicitation, via a number of structured priors, whose form captures
our biases about the relatedness of different tasks or expert policies. In
doing so, we introduce a prior on policy optimality, which is more natural to
specify. We show that our framework allows us not only to learn efficiently
from multiple experts but also to differentiate effectively between the goals
of each. Possible applications include analysing the intrinsic motivations of
subjects in behavioural experiments and learning from multiple teachers.
Comment: Corrected version. 13 pages, 8 figures
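To ground "preference elicitation via a prior on policy optimality", here is a generic softmax-optimality likelihood of the kind widely used in Bayesian IRL; the paper's structured multitask priors are richer than this, and every name below is illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def demo_log_likelihood(demos, Q, eta=2.0):
    """Log-likelihood of (state, action) demonstrations under a
    softmax-optimal expert: p(a | s) proportional to exp(eta * Q(s, a)).
    eta expresses how close to optimal we believe the expert is.
    """
    logp = eta * Q - logsumexp(eta * Q, axis=1, keepdims=True)
    return sum(logp[s, a] for s, a in demos)

def reward_log_posterior(demos, candidate_Qs, log_prior):
    """Normalized log-posterior over a discrete set of candidate reward
    functions, each summarized by its optimal Q-values."""
    log_post = np.array([lp + demo_log_likelihood(demos, Q)
                         for Q, lp in zip(candidate_Qs, log_prior)])
    return log_post - logsumexp(log_post)  # normalize
```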
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution on multivariate Gaussian piecewise-linear models, which
can be updated in closed form. The tree structure itself is constructed using
the cover tree method, which remains efficient in high dimensional spaces. We
combine the model with Thompson sampling and approximate dynamic programming to
obtain effective exploration policies in unknown environments. The flexibility
and computational simplicity of the model render it suitable for many
reinforcement learning problems in continuous state spaces. We demonstrate this
in an experimental comparison with least squares policy iteration.
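The interaction loop combining a posterior over models with Thompson sampling is simple to state; a schematic version (abstracting the generalized context tree behind a posterior object with invented method names) might look like:

```python
def thompson_sampling_episode(posterior, env, plan, horizon=200):
    """One episode of model-based Thompson sampling (schematic).

    posterior : belief over environment models; `sample_model` and
                `update` are hypothetical method names standing in
                for the closed-form context-tree updates in the paper
    plan      : maps a sampled model to a policy, e.g. via
                approximate dynamic programming
    """
    model = posterior.sample_model()    # one draw from the current belief
    policy = plan(model)                # plan as if that draw were true
    s = env.reset()
    for _ in range(horizon):
        a = policy(s)
        s_next, r, done = env.step(a)
        posterior.update(s, a, r, s_next)   # Bayesian belief update
        s = s_next
        if done:
            break
```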
Bayesian Reinforcement Learning via Deep, Sparse Sampling
We address the problem of Bayesian reinforcement learning using efficient
model-based online planning. We propose an optimism-free Bayes-adaptive
algorithm to induce deeper and sparser exploration, with a theoretical bound
on its performance relative to the Bayes-optimal policy at lower
computational complexity. The main novelty is the use of a candidate policy
generator, to generate long-term options in the planning tree (over beliefs),
which allows us to create much sparser and deeper trees. Experimental results
on different environments show that in comparison to the state-of-the-art, our
algorithm is both computationally more efficient, and obtains significantly
higher reward in discrete environments.
Comment: Published in AISTATS 202
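For orientation, plain sparse sampling (Kearns et al.) estimates root action values by recursively sampling a few successors per action; the paper's contribution is to make such trees, built over beliefs, much deeper and sparser via a candidate policy generator, which this sketch does not include. Here `sim` is an assumed generative model.

```python
def sparse_sampling_q(sim, s, depth, width=4, gamma=0.95, actions=(0, 1)):
    """Estimate Q(s, a) by sampling `width` successors per action and
    recursing to `depth` (classic sparse sampling, shown as background).

    sim(s, a) -> (next_state, reward) is an assumed generative model.
    """
    if depth == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            s_next, r = sim(s, a)
            v_next = max(sparse_sampling_q(sim, s_next, depth - 1,
                                           width, gamma, actions).values())
            total += r + gamma * v_next
        q[a] = total / width
    return q

# Usage: act greedily with respect to the root estimates.
# q = sparse_sampling_q(sim, s0, depth=3)
# a_star = max(q, key=q.get)
```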
Certifiable Robustness to Adversarial State Uncertainty in Deep Reinforcement Learning
Deep Neural Network-based systems are now the state-of-the-art in many
robotics tasks, but their application in safety-critical domains remains
dangerous without formal guarantees on network robustness. Small perturbations
to sensor inputs (from noise or adversarial examples) are often enough to
change network-based decisions, which was recently shown to cause an autonomous
vehicle to swerve into another lane. In light of these dangers, numerous
algorithms have been developed as defensive mechanisms from these adversarial
inputs, some of which provide formal robustness guarantees or certificates.
This work leverages research on certified adversarial robustness to develop an
online certifiably robust defense for deep reinforcement learning algorithms. The
proposed defense computes guaranteed lower bounds on state-action values during
execution to identify and choose a robust action under a worst-case deviation
in input space due to possible adversaries or noise. Moreover, the resulting
policy comes with a certificate of solution quality, even though the true state
and optimal action are unknown to the certifier due to the perturbations. The
approach is demonstrated on a Deep Q-Network policy and is shown to increase
robustness to noise and adversaries in pedestrian collision avoidance scenarios
and a classic control task. This work extends one of our prior works with new
performance guarantees, extensions to other RL algorithms, expanded results
aggregated across more scenarios, an extension into scenarios with adversarial
behavior, comparisons with a more computationally expensive method, and
visualizations that provide intuition about the robustness algorithm.
Comment: arXiv admin note: text overlap with arXiv:1910.1290
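The action-selection rule itself is compact: given certified lower bounds on state-action values over an eps-ball of observations (supplied by a network-verification routine that this sketch does not implement), choose the action whose guaranteed value is highest.

```python
import numpy as np

def certified_robust_action(q_lower_bound, s_obs, eps, actions):
    """Select the action with the best guaranteed value under any
    observation perturbation of norm <= eps (illustrative sketch).

    q_lower_bound(s_obs, a, eps) is a placeholder for a certified
    bound on min over {s : ||s - s_obs|| <= eps} of Q(s, a), e.g.
    from interval or linear-relaxation verification of the network.
    """
    lb = np.array([q_lower_bound(s_obs, a, eps) for a in actions])
    best = int(np.argmax(lb))        # worst-case-optimal action
    return actions[best], lb[best]   # chosen action plus its certificate
```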