Representation Discovery for Kernel-Based Reinforcement Learning
Recent years have seen increased interest in non-parametric reinforcement learning. There are now practical kernel-based algorithms for approximating value functions; however, kernel regression requires that the underlying function being approximated be smooth on its domain. Few problems of interest satisfy this requirement in their natural representation. In this paper we define the Value-Consistent Pseudometric (VCPM), the distance function corresponding to a transformation of the domain into a space where the target function is maximally smooth and thus well approximated by kernel regression. We then present DKBRL, an iterative batch RL algorithm interleaving steps of Kernel-Based Reinforcement Learning and distance-metric adjustment. We evaluate its performance on Acrobot and PinBall, continuous-space reinforcement learning domains with discontinuous value functions.
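The smoothness requirement the abstract refers to can be illustrated with a minimal Nadaraya-Watson kernel-regression sketch (an illustration of plain kernel regression, not the paper's DKBRL or VCPM; all names and parameters are illustrative):

```python
import numpy as np

def gaussian_kernel(x, xi, bandwidth):
    """Gaussian (RBF) kernel between a query point and a sample."""
    return np.exp(-np.sum((x - xi) ** 2) / (2 * bandwidth ** 2))

def kernel_regression(x, samples, values, bandwidth=0.05):
    """Nadaraya-Watson estimate: a kernel-weighted average of observed
    values; accurate only where the target function is smooth."""
    weights = np.array([gaussian_kernel(x, xi, bandwidth) for xi in samples])
    return float(np.dot(weights, values) / np.sum(weights))

# Smooth target on [0, 1]: kernel regression recovers it well.
samples = np.linspace(0.0, 1.0, 21).reshape(-1, 1)
values = np.sin(2 * np.pi * samples.ravel())
est = kernel_regression(np.array([0.25]), samples, values)  # true value: 1.0
```

A discontinuous value function breaks this local-averaging assumption near the discontinuity, which is what motivates transforming the domain before regressing.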
Human Apprenticeship Learning via Kernel-based Inverse Reinforcement Learning
It has been well demonstrated that inverse reinforcement learning (IRL) is an
effective technique for teaching machines to perform tasks at human skill
levels given human demonstrations (i.e., human-to-machine apprenticeship
learning). This paper seeks to show that a similar application can be
demonstrated with human learners. That is, given demonstrations from human
experts, inverse reinforcement learning techniques can be used to teach other
humans to perform at higher skill levels (i.e., human-to-human apprenticeship
learning). To show this, two experiments were conducted using a simple,
real-time web game where players were asked to touch targets in order to earn
as many points as possible. For the experiments, player performance was
defined as the number of targets a player touched, irrespective of the points
that a player actually earned. This allowed in-game points to be modified and
the effect of these alterations on performance to be measured. At no time
were participants told the true performance metric. To determine the point
modifications, IRL was applied to demonstrations of human experts playing the
game. The results show with significance that performance improved over the
control for select treatment groups. Finally, in addition to the experiments,
we also detail the algorithmic challenges we faced when conducting them and
the techniques we used to overcome them.
Comment: 31 pages, 23 figures. Submitted to the Journal of Artificial
Intelligence Research; for source code, see
https://github.com/mrucker/kpirl-kla
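The core IRL step of recovering a reward from expert demonstrations can be sketched with classic feature-expectation matching (a generic apprenticeship-learning illustration, not the paper's KPIRL algorithm; the toy features and trajectories are invented for illustration):

```python
import numpy as np

def feature_expectations(trajectories, feature_fn, gamma=0.9):
    """Average discounted feature counts over demonstrated trajectories."""
    mus = [sum((gamma ** t) * feature_fn(s) for t, s in enumerate(traj))
           for traj in trajectories]
    return np.mean(mus, axis=0)

def reward_direction(expert_mu, learner_mu):
    """Projection-style IRL step: the reward weights that most separate
    expert behaviour from the current learner's behaviour."""
    w = expert_mu - learner_mu
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Toy 1-D states; the expert visits high-feature states, the learner does not.
feat = lambda s: np.array([s, 1.0])
expert_trajs = [[3.0, 3.0, 2.0]]
learner_trajs = [[0.0, 1.0, 1.0]]
w = reward_direction(feature_expectations(expert_trajs, feat),
                     feature_expectations(learner_trajs, feat))
```

The recovered weight vector points along the feature the expert exploits, which is the kind of signal the paper then turns into point modifications shown to human learners.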
Kernelizing LSPE(λ)
We propose the use of kernel-based methods as the underlying function approximator in the least-squares-based policy evaluation framework of LSPE(λ) and LSTD(λ). In particular, we present the ‘kernelization’ of model-free LSPE(λ). The ‘kernelization’ is made computationally feasible by using the subset-of-regressors approximation, which approximates the kernel using a vastly reduced number of basis functions. The core of our proposed solution is an efficient recursive implementation with automatic supervised selection of the relevant basis functions. The LSPE method is well suited to optimistic policy iteration and can thus be used in the context of online reinforcement learning. We use the high-dimensional Octopus benchmark to demonstrate this.
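The subset-of-regressors idea can be sketched as a Nyström-style low-rank approximation of the full Gram matrix (an illustration of the general technique, not the paper's recursive implementation or its basis-selection rule; sizes and lengthscales are illustrative):

```python
import numpy as np

def rbf(X, Y, ell=1.0):
    """RBF kernel matrix between two sets of points."""
    d2 = (np.sum(X ** 2, 1)[:, None] + np.sum(Y ** 2, 1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-d2 / (2.0 * ell ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
subset = X[::20]                       # m = 10 representative basis points

# Approximate the 200 x 200 Gram matrix using only the m basis points:
K_nm = rbf(X, subset)
K_mm = rbf(subset, subset)
K_approx = K_nm @ np.linalg.solve(K_mm + 1e-8 * np.eye(len(subset)), K_nm.T)

K_full = rbf(X, X)
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
```

All downstream least-squares quantities then involve only the m basis functions, which is what makes a recursive online implementation tractable.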
Agile and Versatile Robot Locomotion via Kernel-based Residual Learning
This work developed a kernel-based residual learning framework for
quadrupedal robotic locomotion. Initially, a kernel neural network is trained
with data collected from an MPC controller. Alongside a frozen kernel network,
a residual controller network is trained via reinforcement learning to acquire
generalized locomotion skills and resilience against external perturbations.
With this proposed framework, a robust quadrupedal locomotion controller is
learned with high sample efficiency and controllability, providing
omnidirectional locomotion at continuous velocities. Its versatility and
robustness are validated on unseen terrains that the expert MPC controller
fails to traverse. Furthermore, the learned kernel can produce a range of
functional locomotion behaviors and can generalize to unseen gaits.
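The frozen-base-plus-residual structure described above can be sketched in a few lines (a generic residual-policy skeleton, not the paper's architecture; the base function, update rule, and dimensions are stand-ins):

```python
import numpy as np

class ResidualPolicy:
    """Frozen base controller plus a trainable residual correction.
    The action is base(s) + residual(s); only the residual is updated."""
    def __init__(self, base_fn, dim):
        self.base = base_fn              # frozen (e.g., distilled from MPC)
        self.W = np.zeros((dim, dim))    # residual parameters, trained by RL

    def action(self, s):
        return self.base(s) + self.W @ s

    def update(self, s, grad, lr=0.1):
        # Placeholder gradient step standing in for an RL update.
        self.W -= lr * np.outer(grad, s)

base = lambda s: 0.5 * s                 # stand-in for the frozen kernel network
pi = ResidualPolicy(base, dim=2)
s = np.array([1.0, -1.0])
a0 = pi.action(s)                        # initially equals the base action
pi.update(s, grad=a0 - np.array([1.0, 0.0]))
a1 = pi.action(s)                        # nudged toward the target action
```

Keeping the base frozen means the residual only has to learn corrections, which is one common explanation for the sample efficiency such schemes report.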
Multi-Agent Learning in Contextual Games under Unknown Constraints
We consider the problem of learning to play a repeated contextual game with
unknown reward and constraint functions. Such games arise in
applications where each agent's action needs to belong to a feasible set, but
the feasible set is a priori unknown. For example, in constrained multi-agent
reinforcement learning, the constraints on the agents' policies are a function
of the unknown dynamics and hence, are themselves unknown. Under kernel-based
regularity assumptions on the unknown functions, we develop a no-regret,
no-violation approach which exploits similarities among different reward and
constraint outcomes. The no-violation property ensures that the time-averaged
sum of constraint violations converges to zero as the game is repeated. We show
that our algorithm, referred to as c.z.AdaNormalGP, obtains kernel-dependent
regret bounds and that the cumulative constraint violations have sublinear
kernel-dependent upper bounds. In addition, we introduce the notion of
constrained contextual coarse correlated equilibria (c.z.CCE) and show that
ε-c.z.CCEs can be approached whenever players follow a no-regret,
no-violation strategy. Finally, we experimentally demonstrate the
effectiveness of c.z.AdaNormalGP on an instance of multi-agent reinforcement
learning.
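The no-violation property (time-averaged constraint violations tending to zero) can be illustrated numerically; the decay rate below is purely illustrative and is not a bound from the paper:

```python
import numpy as np

def time_averaged_violation(violations):
    """Running mean of per-round positive constraint violations.
    The 'no-violation' property asks that this sequence tend to zero."""
    v = np.maximum(violations, 0.0)
    return np.cumsum(v) / np.arange(1, len(v) + 1)

# Illustrative: per-round violations shrinking like O(1/sqrt(t)),
# so the cumulative sum is sublinear and the time average vanishes.
t = np.arange(1, 10001)
avg = time_averaged_violation(1.0 / np.sqrt(t))
```

Here the cumulative violation grows like 2√T, so the average decays like 2/√T; a sublinear cumulative bound of this shape is exactly what makes the time average converge to zero.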
From Supervised to Reinforcement Learning: a Kernel-based Bayesian Filtering Framework
In a large number of applications, engineers have to estimate a function linked to the state of a dynamic system. To do so, a sequence of samples drawn from this unknown function is observed while the system transits from state to state, and the problem is to generalize these observations to unvisited states. Several solutions can be envisioned, among which is regressing a family of parameterized functions so as to best fit the observed samples. This is the first problem addressed with the proposed kernel-based Bayesian filtering approach, which also allows quantifying the uncertainty reduction that occurs when acquiring more samples. Classical methods cannot handle the case where the actual samples are not directly observable and only a nonlinear mapping of them is available, which happens when a special sensor has to be used or when solving the Bellman equation in order to control the system. However, the approach proposed in this paper can be extended to this tricky case. Moreover, an application of this indirect function-approximation scheme to reinforcement learning is presented. A set of experiments is also proposed in order to demonstrate the efficiency of this kernel-based Bayesian approach.
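The filtering view of function approximation can be sketched with a standard recursive conjugate-Gaussian (Kalman-style) update over kernel-induced features; this is a generic stand-in for the paper's framework, and all features, priors, and targets below are illustrative:

```python
import numpy as np

class RecursiveBayesianRegressor:
    """Recursive Bayesian linear regression on feature vectors: tracks a
    posterior mean and covariance, so each new sample both refines the
    estimate and reduces its uncertainty."""
    def __init__(self, dim, prior_var=10.0, noise_var=0.1):
        self.mean = np.zeros(dim)
        self.cov = prior_var * np.eye(dim)
        self.noise_var = noise_var

    def update(self, phi, y):
        # Standard conjugate Gaussian update (equivalently a Kalman step).
        k = self.cov @ phi / (phi @ self.cov @ phi + self.noise_var)
        self.mean = self.mean + k * (y - phi @ self.mean)
        self.cov = self.cov - np.outer(k, phi) @ self.cov

    def predict(self, phi):
        return phi @ self.mean, phi @ self.cov @ phi + self.noise_var

centers = np.linspace(0.0, 1.0, 5)
features = lambda x: np.exp(-(x - centers) ** 2 / 0.02)  # narrow RBF features

model = RecursiveBayesianRegressor(dim=5)
rng = np.random.default_rng(1)
for x in rng.uniform(0.0, 1.0, 100):
    model.update(features(x), np.sin(2.0 * np.pi * x) + 0.01 * rng.normal())
mu, var = model.predict(features(0.25))   # query near the function's peak
```

The posterior covariance is what quantifies the uncertainty reduction the abstract mentions: the predictive variance at well-sampled states falls far below its prior value as samples accumulate.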