
    Following Newton direction in Policy Gradient with parameter exploration

    This paper investigates the use of second-order methods to solve Markov Decision Processes (MDPs). Despite the popularity of second-order methods in the optimization literature, little attention has so far been paid to extending such techniques to sequential decision-making problems. Here we provide a model-free Reinforcement Learning method that estimates the Newton direction by sampling directly in the parameter space. To compute the Newton direction, we provide a formulation of the Hessian of the expected return, a variance-reduction technique for the sample-based estimate, and a finite-sample analysis for the case of a Normal distribution. Besides discussing the theoretical properties, we empirically evaluate the method on an instructional linear-quadratic regulator and on a complex quadrotor dynamical system.
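    A minimal Python sketch of the parameter-exploration idea described above: treat the expected return J(mu) = E[R(theta)] under theta ~ N(mu, sigma2 I) as a function of the hyper-parameters mu, estimate its gradient and Hessian with score-function estimators from sampled parameters, and take a damped Newton step. This is an illustration under stated assumptions, not the paper's exact estimator; `ret_fn`, the mean-return baseline, and the damping constant are assumptions.

```python
import numpy as np

def newton_pgpe_step(ret_fn, mu, sigma2, n_samples=200, damping=1e-3, rng=None):
    """One Newton step on J(mu) = E_{theta ~ N(mu, sigma2 I)}[R(theta)].

    Sketch: score-function estimates of the gradient and Hessian of J with a
    mean-return baseline for variance reduction, then a damped Newton update.
    `ret_fn(theta)` is assumed to roll out the policy with parameters theta
    and return its (estimated) return.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = mu.shape[0]
    thetas = mu + np.sqrt(sigma2) * rng.standard_normal((n_samples, d))
    returns = np.array([ret_fn(th) for th in thetas])
    adv = returns - returns.mean()        # baseline: unbiased, lower variance

    scores = (thetas - mu) / sigma2       # grad_mu log N(theta; mu, sigma2 I)
    grad = (adv[:, None] * scores).mean(axis=0)

    # Hessian of J: E[R * (score score^T + Hessian of the log-density)],
    # where the Gaussian log-density Hessian w.r.t. mu is -I / sigma2.
    outer = np.einsum('ni,nj->nij', scores, scores)
    hess = (adv[:, None, None] * (outer - np.eye(d) / sigma2)).mean(axis=0)

    # Damped Newton ascent; fall back to plain gradient ascent if singular.
    try:
        return mu - np.linalg.solve(hess - damping * np.eye(d), grad)
    except np.linalg.LinAlgError:
        return mu + 0.01 * grad
```

    Subtracting the mean return leaves both estimators unbiased (the expected score is zero and E[score score^T - I/sigma2] vanishes) while reducing their variance, in the spirit of the variance-reduction technique the paper develops.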

    Compatible Reward Inverse Reinforcement Learning

    Inverse Reinforcement Learning (IRL) is an effective approach to recovering a reward function that explains the behavior of an expert from a set of demonstrations. This paper presents a novel model-free IRL approach that, unlike most existing IRL algorithms, does not require specifying a function space in which to search for the expert's reward function. Leveraging the fact that the policy gradient must be zero for any optimal policy, the algorithm generates a set of basis functions that span the subspace of reward functions making the policy gradient vanish. Within this subspace, a second-order criterion is used to search for the reward function that most penalizes deviations from the expert's policy. After introducing the approach for finite domains, we extend it to continuous ones. The proposed approach is empirically compared to other IRL methods in the (finite) Taxi domain and in the (continuous) Linear Quadratic Gaussian (LQG) and Car on the Hill environments.
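    A minimal Python sketch of the first-order step, under the common linear-reward assumption r_w(s, a) = w^T phi(s, a): the policy gradient under weights w is G w, where column j of G is the gradient of the expected discounted j-th feature, so any w in the null space of G makes the expert's policy stationary. The helper names (`score_fn`, `feat_fn`) and the REINFORCE-style estimator of G are illustrative assumptions; the paper's second-order selection among the returned basis is not shown.

```python
import numpy as np

def estimate_grad_jacobian(trajectories, score_fn, feat_fn, gamma=0.99):
    """REINFORCE-style estimate of G, the Jacobian of the expected discounted
    reward features w.r.t. the policy parameters (hypothetical helpers:
    score_fn(s, a) -> grad_theta log pi(a|s), feat_fn(s, a) -> feature vector).
    """
    G = None
    for traj in trajectories:                  # traj: list of (state, action)
        score = sum(score_fn(s, a) for s, a in traj)
        feats = sum(gamma**t * feat_fn(s, a) for t, (s, a) in enumerate(traj))
        term = np.outer(score, feats)          # (d_theta, d_features)
        G = term if G is None else G + term
    return G / len(trajectories)

def vanishing_gradient_basis(G, tol=1e-8):
    """Orthonormal basis of reward weights w with G @ w = 0, i.e. rewards for
    which the expert's policy gradient vanishes and the expert is stationary."""
    _, s, vt = np.linalg.svd(G)
    rank = int(np.sum(s > tol * s.max()))
    return vt[rank:].T                         # columns span the null space
```

    Within the subspace spanned by the returned columns, the paper's second-order criterion then singles out the reward function that most penalizes deviations from the expert's policy.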