Self-organizing developmental reinforcement learning
This paper presents a developmental reinforcement learning framework aimed at exploring rich, complex, and large sensorimotor spaces. The core of this architecture is a function approximator based on a Dynamic Self-Organizing Map (DSOM). The life-long online learning property of the DSOM allows us to take a developmental approach to learning a robotic task: the perception and motor skills of the robot can grow in richness and complexity during learning. This architecture is tested on a robotic task that looks simple but is still challenging for reinforcement learning.
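As an illustrative sketch only (the function name and parameters are assumptions, not taken from the paper), a DSOM update in the style of Rougier and Boniface's dynamic SOM scales its neighbourhood by the current quantization error, which is what keeps learning "life-long": the map keeps adapting whenever inputs are poorly represented, instead of freezing as a classic SOM's schedule decays.

```python
import numpy as np

def dsom_update(weights, grid, x, elasticity=1.0, lr=0.1):
    """One online DSOM update for input x.

    weights: (n, d) codebook vectors; grid: (n, 2) fixed node positions.
    Unlike a classic SOM, the neighbourhood width scales with the current
    quantization error, so adaptation never stops (life-long learning).
    """
    dists = np.linalg.norm(weights - x, axis=1)
    win = int(np.argmin(dists))                 # best-matching unit
    err = dists[win]                            # quantization error
    if err == 0.0:
        return weights                          # input already represented
    grid_d2 = np.sum((grid - grid[win]) ** 2, axis=1)
    h = np.exp(-grid_d2 / (elasticity ** 2 * err ** 2))  # error-scaled neighbourhood
    return weights + lr * dists[:, None] * h[:, None] * (x - weights)
```

When the input is far from every prototype, `err` is large, the neighbourhood widens, and many units move; when the map already fits the data, updates vanish.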
Empowerment for Continuous Agent-Environment Systems
This paper develops generalizations of empowerment to continuous states.
Empowerment is a recently introduced information-theoretic quantity motivated
by hypotheses about the efficiency of the sensorimotor loop in biological
organisms, but also from considerations stemming from curiosity-driven
learning. Empowemerment measures, for agent-environment systems with stochastic
transitions, how much influence an agent has on its environment, but only that
influence that can be sensed by the agent sensors. It is an
information-theoretic generalization of joint controllability (influence on
environment) and observability (measurement by sensors) of the environment by
the agent, both controllability and observability being usually defined in
control theory as the dimensionality of the control/observation spaces. Earlier
work has shown that empowerment has various interesting and relevant
properties, e.g., it allows us to identify salient states using only the
dynamics, and it can act as intrinsic reward without requiring an external
reward. However, in this previous work empowerment was limited to the case of
small-scale and discrete domains and furthermore state transition probabilities
were assumed to be known. The goal of this paper is to extend empowerment to
the significantly more important and relevant case of continuous vector-valued
state spaces and initially unknown state transition probabilities. The
continuous state space is addressed by Monte-Carlo approximation; the unknown
transitions are addressed by model learning and prediction for which we apply
Gaussian processes regression with iterated forecasting. In a number of
well-known continuous control tasks we examine the dynamics induced by
empowerment and include an application to exploration and online model
learning.
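Empowerment is the channel capacity of the channel from action sequences to sensed successor states. A discrete stand-in for the paper's continuous Monte-Carlo/Gaussian-process setting is the classical Blahut-Arimoto iteration (the code below is a generic capacity solver, not the paper's estimator; in the continuous case each row of the channel matrix would itself be estimated from model rollouts):

```python
import numpy as np

def channel_capacity(p_y_given_x, iters=200):
    """Blahut-Arimoto channel capacity (in nats) of rows p(y|x).

    For empowerment: x ranges over action sequences, y over sensed
    successor states; capacity = max_q(x) I(X; Y).
    """
    n_x = p_y_given_x.shape[0]
    q = np.full(n_x, 1.0 / n_x)          # distribution over action sequences
    eps = 1e-12                          # guards log of zero entries
    for _ in range(iters):
        r = q @ p_y_given_x              # marginal over outcomes
        # per-action KL divergence D(p(y|x) || r)
        d = np.sum(p_y_given_x * np.log((p_y_given_x + eps) / (r + eps)), axis=1)
        q = q * np.exp(d)                # exponentiated-gradient reweighting
        q /= q.sum()
    r = q @ p_y_given_x
    d = np.sum(p_y_given_x * np.log((p_y_given_x + eps) / (r + eps)), axis=1)
    return float(q @ d)
```

A deterministic, fully distinguishable channel over two outcomes has capacity log 2; a channel whose outcome is independent of the action has capacity 0, i.e., zero empowerment.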
Revisiting natural actor-critics with value function approximation
Reinforcement learning (RL) is generally considered the machine learning answer to the optimal control problem. In this paradigm, an agent learns to optimally control a dynamic system through interactions. At each time step i, the dynamic system is in a given state si and receives from the agent a command (or action) ai. According to its own dynamics, the system transitions to a new state si+1, and a reward ri is given to the agent. The objective is to learn a control policy maximizing the expected cumulative discounted reward. Actor-critic approaches were among the first proposed for handling the RL problem [1]. In this setting, two structures are maintained: one for the actor (the control organ) and one for the critic (the value function, which models the expected cumulative reward to be maximized). One advantage of such an approach is that it does not require knowledge of the system dynamics to learn an optimal policy. However, the introduction of the state-action value (or Q-) function [6] led the research community to focus on pure critic methods, in which the control policy is derived from the Q-function and no longer has a specific representation. Indeed, in contrast with the value function, the state-action value function allows deriving a greedy policy without knowing the system dynamics, and function approximation (a way to handle large problems) is easier to combine with pure critic approaches. Pure critic algorithms therefore aim at learning this Q-function. However, actor-critics have numerous advantages over pure critics: a separat
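The two-structure idea can be sketched with a minimal tabular actor-critic (this is an illustrative vanilla version, not the paper's natural-gradient variant): the critic maintains a value function updated by the TD error, and the actor maintains its own softmax policy parameters pushed in the direction of that same TD error.

```python
import numpy as np

def actor_critic_step(theta, v, s, a, r, s_next, gamma=0.99, alpha=0.1, beta=0.1):
    """One tabular actor-critic update.

    v: (n_states,) critic value estimates;
    theta: (n_states, n_actions) actor softmax preferences.
    """
    delta = r + gamma * v[s_next] - v[s]      # TD error computed by the critic
    v[s] += alpha * delta                     # critic update
    probs = np.exp(theta[s]) / np.exp(theta[s]).sum()
    grad = -probs
    grad[a] += 1.0                            # gradient of log softmax at action a
    theta[s] += beta * delta * grad           # actor follows the TD error
    return delta
```

Note that neither update touches the system dynamics: the critic learns from sampled transitions and the actor keeps an explicit policy representation, which is exactly the contrast with pure critic (Q-function) methods drawn above.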
MapReduce for Parallel Reinforcement Learning
Abstract. We investigate the parallelization of reinforcement learning algorithms using MapReduce, a popular parallel computing framework. We present parallel versions of several dynamic programming algorithms, including policy evaluation, policy iteration, and off-policy updates. Furthermore, we design parallel reinforcement learning algorithms to deal with large scale problems using linear function approximation, including model-based projection, least squares policy iteration, temporal difference learning and recent gradient temporal difference learning algorithms. We give time and space complexity analysis of the proposed algorithms. This study demonstrates how parallelization opens new avenues for solving large scale reinforcement learning problems.
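The MapReduce decomposition of policy evaluation can be sketched as follows (a sequential, single-machine stand-in for illustration: the map tasks are independent per-state Bellman backups and could be distributed, while the reduce task assembles the new value vector; function names are assumptions, not the paper's):

```python
import numpy as np

def eval_map(state, V, P, R, gamma):
    """Map task: one Bellman backup for a single state (embarrassingly parallel)."""
    return (state, R[state] + gamma * P[state] @ V)

def eval_reduce(pairs, n_states):
    """Reduce task: assemble per-state backups into the next value vector."""
    V_new = np.zeros(n_states)
    for s, v in pairs:
        V_new[s] = v
    return V_new

def policy_evaluation(P, R, gamma=0.9, sweeps=100):
    """Iterate map/reduce sweeps until V approximates (I - gamma P)^-1 R."""
    n = len(R)
    V = np.zeros(n)
    for _ in range(sweeps):
        V = eval_reduce((eval_map(s, V, P, R, gamma) for s in range(n)), n)
    return V
```

Because each map task reads the previous sweep's V and writes only its own state's entry, sweeps parallelize cleanly; the reduce step is a trivial gather.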
Learning Graph-based Representations for Continuous Reinforcement Learning Domains
Abstract. Graph-based domain representations have been used in discrete reinforcement learning domains as a basis for, e.g., autonomous skill discovery and representation learning. These abilities are also highly relevant for learning in domains with structured, continuous state spaces, as they allow complex problems to be decomposed into simpler ones and reduce the burden of hand-engineering features. However, since graphs are inherently discrete structures, the extension of these approaches to continuous domains is not straightforward. We argue that graphs should be seen as discrete, generative models of continuous domains. Based on this intuition, we define the likelihood of a graph for a given set of observed state transitions and derive a heuristic method, entitled FIGE, that learns graph-based representations of continuous domains with high likelihood. Based on FIGE, we present a new skill discovery approach for continuous domains. Furthermore, we show that the learning of representations can be considerably improved by using FIGE.
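To make the "graph as a discrete model of a continuous domain" idea concrete, here is a deliberately crude baseline (a nearest-node assignment, not FIGE itself, whose likelihood-based objective is more refined): each observed continuous state is mapped to its nearest graph node, and observed transitions become weighted edges.

```python
import numpy as np

def transition_graph(transitions, nodes):
    """Build a discrete graph over continuous transitions.

    transitions: iterable of (s, s_next) continuous state pairs;
    nodes: (n, d) array of node positions. Returns (n, n) edge counts.
    """
    n = len(nodes)
    W = np.zeros((n, n))
    for s, s_next in transitions:
        i = int(np.argmin(np.linalg.norm(nodes - s, axis=1)))       # nearest node to s
        j = int(np.argmin(np.linalg.norm(nodes - s_next, axis=1)))  # nearest node to s'
        W[i, j] += 1.0
    return W
```

The paper's point is that hard nearest-node quantization like this discards how well the graph explains the observed transitions; scoring candidate graphs by a likelihood over the same data is what FIGE adds.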
Min max generalization for deterministic batch mode reinforcement learning: relaxation schemes
We study the min max optimization problem introduced in Fonteneau et al. [Towards min max reinforcement learning, ICAART 2010, Springer, Heidelberg, 2011, pp. 61–77] for computing policies for batch mode reinforcement learning in a deterministic setting with a fixed, finite time horizon. First, we show that the min part of this problem is NP-hard. We then provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, can also be solved in polynomial time. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [Fonteneau et al., 2011, as cited above].
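Schematically (symbols here are illustrative, not the paper's exact notation), the problem pairs an outer maximization over the open-loop action sequence with an inner worst-case minimization over the system models consistent with the batch:

\[
\max_{u_0,\dots,u_{T-1}} \;\; \min_{(f,\rho)\in\mathcal{C}} \;\; \sum_{t=0}^{T-1} \rho(x_t, u_t)
\quad \text{s.t.} \quad x_{t+1} = f(x_t, u_t),
\]

where \(\mathcal{C}\) is the set of (Lipschitz-continuous) dynamics \(f\) and reward functions \(\rho\) compatible with the observed transitions. The NP-hardness result and both relaxation schemes concern this inner minimization.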
Compositional Models for Reinforcement Learning
Abstract. Innovations such as optimistic exploration, function approximation, and hierarchical decomposition have helped scale reinforcement learning to more complex environments, but these three ideas have rarely been studied together. This paper develops a unified framework that formalizes these algorithmic contributions as operators on learned models of the environment. Our formalism reveals some synergies among these innovations, and it suggests a straightforward way to compose them. The resulting algorithm, Fitted R-MAXQ, is the first to combine the function approximation of fitted algorithms, the efficient model-based exploration of R-MAX, and the hierarchical decomposition of MAXQ.
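The R-MAX ingredient, optimistic model-based exploration, can be sketched in one line (an illustrative fragment, not Fitted R-MAXQ itself): any state-action pair visited fewer than m times is assumed to yield the maximum achievable value, so the planner is drawn toward under-explored regions.

```python
import numpy as np

def optimistic_q(counts, q_model, v_max, m=5):
    """R-MAX-style optimism: "unknown" state-action pairs (visited fewer
    than m times) are replaced by the optimistic ceiling v_max.

    counts, q_model: arrays of the same shape over state-action pairs.
    """
    return np.where(counts >= m, q_model, v_max)
```

Planning with these optimistic values makes the agent either obtain near-optimal return or visit an unknown pair, which is the core of R-MAX's sample-efficiency guarantee.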