
    Apprentissage par imitation dans un cadre batch, off-policy et sans modèle

    This paper addresses the imitation learning problem, that is, solving an optimal control problem from data drawn from an expert demonstration. Inverse reinforcement learning (IRL) offers an effective framework for this problem. Starting from the assumption that the expert maximizes some criterion, IRL attempts to learn, from example trajectories, the reward that defines this criterion. Many IRL algorithms assume the existence of a good linear approximator for the reward function and compute the feature expectation (the expected discounted cumulative sum of the basis functions of the assumed linear reward parameterization, evaluated at the states of a trajectory generated by a given policy) via a Monte Carlo estimate. This requires access to complete expert trajectories as well as to at least a generative model in order to evaluate intermediate policies. In this paper we introduce a temporal-difference method, LSTD-µ, for computing this feature expectation, which extends imitation learning to the batch and off-policy settings.
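    As a rough illustration of the idea, here is a minimal Python sketch of an LSTD-style estimator of the feature expectation; the function name, the choice of basis functions and the batch format are assumptions made for the example, not the paper's implementation.

        import numpy as np

        def lstd_mu(transitions, phi, psi, gamma=0.99, reg=1e-6):
            # The feature expectation mu^pi(s) = E[ sum_t gamma^t phi(s_t) | s_0 = s, pi ]
            # satisfies a Bellman-like equation mu^pi(s) = phi(s) + gamma * E[mu^pi(s')],
            # so each of its components can be estimated by LSTD from a batch of
            # transitions (s, s') collected under the evaluated policy.
            # phi(s) -> reward basis, shape (p,); psi(s) -> value basis, shape (q,)
            # Returns Omega of shape (q, p) such that mu^pi(s) ~= psi(s) @ Omega.
            s0 = transitions[0][0]
            q, p = len(psi(s0)), len(phi(s0))
            A = reg * np.eye(q)                      # regularized LSTD matrix
            b = np.zeros((q, p))                     # one right-hand side per reward feature
            for s, s_next in transitions:
                ps, ps_next = psi(s), psi(s_next)
                A += np.outer(ps, ps - gamma * ps_next)
                b += np.outer(ps, phi(s))
            return np.linalg.solve(A, b)

    Evaluating psi(s) @ Omega at the expert's states would then give an estimate of the expert's feature expectation without requiring complete trajectories or a generative model, which is what the batch, off-policy extension calls for.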

    Inverse reinforcement learning to control a robotic arm using a Brain-Computer Interface

    The goal of this project is to use inverse reinforcement learning to better control a JACO robotic arm, developed by Kinova, through a Brain-Computer Interface (BCI). A self-paced BCI, such as a motor imagery-based BCI, allows the subject to give orders at any time to freely control a device. Even after long training, however, the accuracy of the classifier used to recognize the order in this paradigm is not 100%. While many studies try to improve accuracy with a preprocessing stage that improves feature extraction, we work on a post-processing solution. The classifier that recognizes the mental commands outputs a value for each command, such as a posterior probability, but the executed action does not depend on this information alone: a decision process also takes into account the position of the robotic arm and previous trajectories. More precisely, the decision process is obtained by applying inverse reinforcement learning (IRL) to a subset of trajectories specified by an expert. By the end of the workshop, convergence of the inverse reinforcement learning algorithm had not been achieved. Nevertheless, we developed a complete processing chain based on OpenViBE for controlling 2D movements, and we present how to deal with this high-dimensional, noisy time-series problem, which is unusual for the IRL community.
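    The post-processing idea described above (combining the classifier's posteriors with a value function derived from the IRL reward) could look something like the following sketch; the weighting scheme, names and signatures are hypothetical and not taken from the paper.

        import numpy as np

        def select_command(posteriors, q_values, beta=1.0):
            # posteriors : shape (n_commands,), classifier posterior for each mental command
            # q_values   : shape (n_commands,), value of each command in the current arm
            #              state under the reward recovered by IRL (hypothetical coupling)
            # beta       : trade-off between the BCI classifier and the decision process
            scores = np.log(posteriors + 1e-12) + beta * q_values
            return int(np.argmax(scores))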

    Contributions à l'apprentissage par renforcement inverse

    This thesis, "Contributions à l’apprentissage par renforcement inverse", brings three major contributions to the community. The first one is a method for estimating the feature expectation, a quantity involved in most of state-of-the-art approaches which were thus extended to a batch off-policy setting. The second major contribution is an Inverse Reinforcement Learning algorithm, structured classification for inverse reinforcement learning (SCIRL), which relaxes a standard constraint in the field, the repeated solving of a Markov Decision Process, by introducing the temporal structure (using the feature expectation) of this process into a structured margin classification algorithm. The afferent theoretical guarantee and the good empirical performance it exhibited allowed it to be presented in a good international conference : NIPS. Finally, the third contribution is cascaded supervised learning for inverse reinforcement learning (CSI) a method consisting in learning the expert’s behavior via a supervised learning approach, and then introducing the temporal structure of the MDP via a regression involving the score function of the classifier. This method presents the same type of theoretical guarantee as SCIRL, but uses standard components for classification and regression, which makes its use simpler. This work will be presented in another good international conference : ECML

    A cascaded supervised learning approach to inverse reinforcement learning

    This paper considers the Inverse Reinforcement Learning (IRL) problem, that is, inferring a reward function for which a demonstrated expert policy is optimal. We propose to break the IRL problem down into two generic Supervised Learning steps: this is the Cascaded Supervised IRL (CSI) approach. A classification step that defines a score function is followed by a regression step that provides a reward function. A theoretical analysis shows that the demonstrated expert policy is near-optimal for the computed reward function. Not needing to repeatedly solve a Markov Decision Process (MDP) and the ability to leverage existing techniques for classification and regression are two important advantages of the CSI approach. It is furthermore empirically demonstrated to compare positively to state-of-the-art approaches when using only transitions sampled according to the expert policy, up to the use of some heuristics. This is exemplified on two classical benchmarks (the mountain car problem and a highway driving simulator).
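    A minimal sketch of the two-step structure on a discrete-action problem, assuming expert transitions are available as feature vectors; the scikit-learn models, the state-action features and the Bellman-inversion target are illustrative choices, not the paper's exact components.

        import numpy as np
        from sklearn.linear_model import LogisticRegression, Ridge

        def csi_sketch(expert_transitions, n_actions, gamma=0.99):
            # expert_transitions : list of (s, a, s_next), s and s_next as feature vectors,
            #                      a an integer action index (assumes n_actions > 2 so that
            #                      decision_function returns one score per action)
            X = np.array([s for s, _, _ in expert_transitions])
            y = np.array([a for _, a, _ in expert_transitions])

            # Step 1 (classification): the classifier's decision scores play the role of q(s, a)
            clf = LogisticRegression(max_iter=1000).fit(X, y)
            q = lambda s: clf.decision_function(s.reshape(1, -1)).ravel()

            # Step 2 (regression): build reward targets by inverting the Bellman equation on q,
            # r(s, a) ~= q(s, a) - gamma * max_a' q(s', a'), then regress them onto
            # simple state-action features (state concatenated with a one-hot action)
            feats, targets = [], []
            for s, a, s_next in expert_transitions:
                targets.append(q(s)[a] - gamma * np.max(q(s_next)))
                one_hot = np.zeros(n_actions)
                one_hot[a] = 1.0
                feats.append(np.concatenate([s, one_hot]))
            reward_model = Ridge(alpha=1.0).fit(np.array(feats), np.array(targets))
            return reward_model   # predicts r(s, a) for a concatenated state-action vector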

    Surveillance acoustique des cavités à risque de fontis et d'effondrements localisés

    It is very difficult to monitor sinkholes and local collapses from underground using classical geotechnical instrumentation, since the location of such pre-existing phenomena cannot be easily approached or forecast in time in wide and complex underground cavities. INERIS developed and tested an acoustic method to detect, localize and characterize rock falls with the help of a few sensors. Shallow underground cavities, whether natural or man-made, can give rise to ground-movement hazards through sinkholes or localized collapses. This phenomenon affects the whole of the national territory. Pending remediation, monitoring can help manage the risk. Until now, such monitoring was mainly carried out through visual inspection and conventional geotechnical instrumentation. Since this approach has several limitations regarding continuous monitoring of the phenomena and the exposure of field teams, it was important to examine new instrumental solutions.