
    Apprentissage par imitation dans un cadre batch, off-policy et sans modèle

    This paper addresses the imitation learning problem, that is, solving an optimal control problem from data drawn from an expert demonstration. Inverse reinforcement learning (IRL) offers an effective framework for this problem. Starting from the assumption that the expert maximizes some criterion, IRL attempts to learn, from example trajectories, the reward that defines this criterion. Many IRL algorithms assume the existence of a good linear approximator for the reward function and compute the feature expectation (the expected discounted cumulative sum of the basis functions of the assumed linear reward parameterization, evaluated at the states of a trajectory generated by a given policy) via a Monte Carlo estimate. This requires access to complete expert trajectories as well as to at least a generative model in order to evaluate intermediate policies. In this paper we introduce a temporal-difference method, LSTD-µ, for computing this feature expectation, which extends imitation learning to the batch and off-policy settings.
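    As a rough illustration of the idea, here is a minimal Python sketch of an LSTD-style estimator of the feature expectation; the function name, the choice of basis functions and the batch format are assumptions made for the example, not the paper's implementation.

        import numpy as np

        def lstd_mu(transitions, phi, psi, gamma=0.99, reg=1e-6):
            # The feature expectation mu^pi(s) = E[ sum_t gamma^t phi(s_t) | s_0 = s, pi ]
            # satisfies a Bellman-like equation mu^pi(s) = phi(s) + gamma * E[mu^pi(s')],
            # so each of its components can be estimated by LSTD from a batch of
            # transitions (s, s') collected under the evaluated policy.
            # phi(s) -> reward basis, shape (p,); psi(s) -> value basis, shape (q,)
            # Returns Omega of shape (q, p) such that mu^pi(s) ~= psi(s) @ Omega.
            s0 = transitions[0][0]
            q, p = len(psi(s0)), len(phi(s0))
            A = reg * np.eye(q)                      # regularized LSTD matrix
            b = np.zeros((q, p))                     # one right-hand side per reward feature
            for s, s_next in transitions:
                ps, ps_next = psi(s), psi(s_next)
                A += np.outer(ps, ps - gamma * ps_next)
                b += np.outer(ps, phi(s))
            return np.linalg.solve(A, b)

    Evaluating psi(s) @ Omega at the expert's states would then give an estimate of the expert's feature expectation without requiring complete trajectories or a generative model, which is what the batch, off-policy extension calls for.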

    Inverse reinforcement learning to control a robotic arm using a Brain-Computer Interface

    The goal of this project is to use inverse reinforcement learning to better control a JACO robotic arm, developed by Kinova, through a Brain-Computer Interface (BCI). A self-paced BCI, such as a motor imagery-based BCI, allows the subject to give orders at any time to freely control a device. Even after long training, however, the accuracy of the classifier used to recognize the order in this paradigm is not 100%. While many studies try to improve accuracy with a preprocessing stage that improves feature extraction, we work on a post-processing solution. The classifier that recognizes the mental commands outputs a value for each command, such as a posterior probability, but the executed action does not depend on this information alone: a decision process also takes into account the position of the robotic arm and previous trajectories. More precisely, the decision process is obtained by applying inverse reinforcement learning (IRL) to a subset of trajectories specified by an expert. By the end of the workshop, convergence of the inverse reinforcement learning algorithm had not been achieved. Nevertheless, we developed a complete processing chain based on OpenViBE for controlling 2D movements, and we present how to deal with this high-dimensional, noisy time-series problem, which is unusual for the IRL community.
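    The post-processing idea described above (combining the classifier's posteriors with a value function derived from the IRL reward) could look something like the following sketch; the weighting scheme, names and signatures are hypothetical and not taken from the paper.

        import numpy as np

        def select_command(posteriors, q_values, beta=1.0):
            # posteriors : shape (n_commands,), classifier posterior for each mental command
            # q_values   : shape (n_commands,), value of each command in the current arm
            #              state under the reward recovered by IRL (hypothetical coupling)
            # beta       : trade-off between the BCI classifier and the decision process
            scores = np.log(posteriors + 1e-12) + beta * q_values
            return int(np.argmax(scores))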

    Contributions à l'apprentissage par renforcement inverse

    This thesis, "Contributions à l’apprentissage par renforcement inverse", brings three major contributions to the community. The first one is a method for estimating the feature expectation, a quantity involved in most of state-of-the-art approaches which were thus extended to a batch off-policy setting. The second major contribution is an Inverse Reinforcement Learning algorithm, structured classification for inverse reinforcement learning (SCIRL), which relaxes a standard constraint in the field, the repeated solving of a Markov Decision Process, by introducing the temporal structure (using the feature expectation) of this process into a structured margin classification algorithm. The afferent theoretical guarantee and the good empirical performance it exhibited allowed it to be presented in a good international conference : NIPS. Finally, the third contribution is cascaded supervised learning for inverse reinforcement learning (CSI) a method consisting in learning the expert’s behavior via a supervised learning approach, and then introducing the temporal structure of the MDP via a regression involving the score function of the classifier. This method presents the same type of theoretical guarantee as SCIRL, but uses standard components for classification and regression, which makes its use simpler. This work will be presented in another good international conference : ECML

    A cascaded supervised learning approach to inverse reinforcement learning

    This paper considers the Inverse Reinforcement Learning (IRL) problem, that is, inferring a reward function for which a demonstrated expert policy is optimal. We propose to break the IRL problem down into two generic Supervised Learning steps: this is the Cascaded Supervised IRL (CSI) approach. A classification step that defines a score function is followed by a regression step that provides a reward function. A theoretical analysis shows that the demonstrated expert policy is near-optimal for the computed reward function. Not needing to repeatedly solve a Markov Decision Process (MDP) and the ability to leverage existing techniques for classification and regression are two important advantages of the CSI approach. It is furthermore empirically demonstrated to compare positively to state-of-the-art approaches when using only transitions sampled according to the expert policy, up to the use of some heuristics. This is exemplified on two classical benchmarks (the mountain car problem and a highway driving simulator).
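    A minimal sketch of the two-step structure on a discrete-action problem, assuming expert transitions are available as feature vectors; the scikit-learn models, the state-action features and the Bellman-inversion target are illustrative choices, not the paper's exact components.

        import numpy as np
        from sklearn.linear_model import LogisticRegression, Ridge

        def csi_sketch(expert_transitions, n_actions, gamma=0.99):
            # expert_transitions : list of (s, a, s_next), s and s_next as feature vectors,
            #                      a an integer action index (assumes n_actions > 2 so that
            #                      decision_function returns one score per action)
            X = np.array([s for s, _, _ in expert_transitions])
            y = np.array([a for _, a, _ in expert_transitions])

            # Step 1 (classification): the classifier's decision scores play the role of q(s, a)
            clf = LogisticRegression(max_iter=1000).fit(X, y)
            q = lambda s: clf.decision_function(s.reshape(1, -1)).ravel()

            # Step 2 (regression): build reward targets by inverting the Bellman equation on q,
            # r(s, a) ~= q(s, a) - gamma * max_a' q(s', a'), then regress them onto
            # simple state-action features (state concatenated with a one-hot action)
            feats, targets = [], []
            for s, a, s_next in expert_transitions:
                targets.append(q(s)[a] - gamma * np.max(q(s_next)))
                one_hot = np.zeros(n_actions)
                one_hot[a] = 1.0
                feats.append(np.concatenate([s, one_hot]))
            reward_model = Ridge(alpha=1.0).fit(np.array(feats), np.array(targets))
            return reward_model   # predicts r(s, a) for a concatenated state-action vector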

    Surveillance acoustique des cavités à risque de fontis et d'effondrements localisés

    It is very difficult to monitor sinkholes and local collapses from underground using classical geotechnical instrumentation, since the location of such pre-existing phenomena cannot be easily approached or forecast in time in wide and complex underground cavities. INERIS developed and tested an acoustic method to detect, localize and characterize rock falls with the help of a few sensors. Shallow underground cavities, whether natural or man-made, can give rise to ground-movement hazards through sinkholes or localized collapses. This phenomenon affects the whole of the national territory. Pending remediation, monitoring can help manage the risk. Until now, such monitoring was mainly carried out through visual inspection and conventional geotechnical instrumentation. Since this approach has several limitations regarding continuous monitoring of the phenomena and the exposure of field teams, it was important to examine new instrumental solutions.