14 research outputs found

Structural Return Maximization for Reinforcement Learning

Authors: Joseph Joshua; Roy Nicholas; Velez Javier
Publication date: 11/05/2014

Batch Reinforcement Learning (RL) algorithms attempt to choose a policy from a designer-provided class of policies given a fixed set of training data. Choosing the policy which maximizes an estimate of return often leads to over-fitting when only limited data is available, due to the size of the policy class in relation to the amount of data available. In this work, we focus on learning policy classes that are appropriately sized to the amount of data available. We accomplish this by using the principle of Structural Risk Minimization, from Statistical Learning Theory, which uses Rademacher complexity to identify a policy class that maximizes a bound on the return of the best policy in the chosen policy class, given the available data. Unlike similar batch RL approaches, our bound on return requires only extremely weak assumptions on the true system.

arXiv.org e-Print Archive

CiteSeerX

Simultaneous Perturbation Algorithms for Batch Off-Policy Search

Authors: Fonteneau Raphael; Prashanth L. A.
Publication date: 01/01/2014

We propose novel policy search algorithms in the context of off-policy, batch mode reinforcement learning (RL) with continuous state and action spaces. Given a batch collection of trajectories, we perform off-line policy evaluation using an algorithm similar to that of [Fonteneau et al., 2010]. Using this Monte-Carlo-like policy evaluator, we perform policy search in a class of parameterized policies. We propose both first-order policy gradient and second-order policy Newton algorithms. All our algorithms incorporate simultaneous perturbation estimates for the gradient as well as the Hessian of the cost-to-go vector, since the latter is unknown and only biased estimates are available. We demonstrate their practicality on a simple 1-dimensional continuous state space problem.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Open Repository and Bibliography - Liège

A few reinforcement learning stories

Author: Fonteneau Raphaël
Publication date: 05/05/2017

Open Repository and Bibliography - Liège

Reinforcement learning with misspecified model classes

Authors: Geramifard Alborz; How Jonathan P.; Joseph Joshua Mason; Roberts John W.; Roy Nicholas
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2013

Real-world robots commonly have to act in complex, poorly understood environments where the true world dynamics are unknown. To compensate for the unknown world dynamics, we often provide a class of models to a learner so it may select a model, typically using a minimum prediction error metric over a set of training data. Often in real-world domains the model class is unable to capture the true dynamics, due to either limited domain knowledge or a desire to use a small model. In these cases we call the model class misspecified, and an unfortunate consequence of misspecification is that even with unlimited data and computation there is no guarantee the model with minimum prediction error leads to the best performing policy. In this work, our approach improves upon the standard maximum likelihood model selection metric by explicitly selecting the model which achieves the highest expected reward, rather than the most likely model. We present an algorithm for which the highest performing model from the model class is guaranteed to be found given unlimited data and computation. Empirically, we demonstrate that our algorithm is often superior to the maximum likelihood learner in a batch learning setting for two common RL benchmark problems and a third real-world system, the hydrodynamic cart-pole, a domain whose complex dynamics cannot be known exactly.

Sponsorship: United States. Office of Naval Research. Multidisciplinary University Research Initiative (N00014-11-1-0688)

DSpace@MIT

Crossref

Efficient reinforcement learning through variance reduction and trajectory synthesis

Author: Zhao Xiaoming
Publication date: 01/05/2019

Reinforcement learning is a general and unified framework that has proven promising for many important AI applications, such as robotics and self-driving vehicles. However, current reinforcement learning algorithms suffer from large variance and sampling inefficiency, which lead to slow convergence and unstable performance. In this thesis, we alleviate these two related problems. To address the large variance, we combine variance-reduced optimization with deep Q-learning. To address inefficient sampling, we propose a novel framework that integrates self-imitation learning with an artificial synthesis procedure. Our approaches, which are flexible and could be extended to many tasks, demonstrate their effectiveness in experiments on Atari and MuJoCo environments.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Putting reaction-diffusion systems into port-Hamiltonian framework

Authors: Scherpen Jacquelien M.A.; Seslija Marko; van der Schaft Abraham
Publication date: 30/03/2010

ARTS repository - University of Groningen

29th Benelux Meeting on Systems and Control: March 30 – April 1, 2010, Heeze, The Netherlands: Book of Abstracts

Publication venue: Wageningen University and Research
Publication date: 01/01/2010

University of Twente Research Information