12 research outputs found

    Direct Value Learning: a Rank-Invariant Approach to Reinforcement Learning

    Taking inspiration from inverse reinforcement learning, the proposed Direct Value Learning for Reinforcement Learning (DIVA) approach uses light priors to generate inappropriate behaviors, and uses the corresponding state sequences to directly learn a value function. When the transition model is known, this value function directly defines a (nearly) optimal controller. Otherwise, the value function is extended to the state-action space using off-policy learning. The experimental validation of DIVA on the mountain car problem shows the robustness of the approach compared to SARSA, under the assumption that the target state is known. The experimental validation on the bicycle problem shows that DIVA still finds good policies when this assumption is relaxed.
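    The central trick, learning a value function directly from state sequences of deliberately bad behaviours, can be sketched in a few lines. This is an illustrative reconstruction under our own assumptions (the linear value features, the steps-elapsed target, and both function names are ours, not DIVA's actual implementation):

```python
import numpy as np

def fit_value_from_bad_trajectories(trajectories):
    """Fit a linear value function V(s) = w . phi(s) from feature-vector
    sequences that end in failure: the target value drops by one at each
    step, so states closer to failure receive lower values."""
    X, y = [], []
    for traj in trajectories:
        for t, phi in enumerate(traj):
            X.append(phi)
            y.append(-float(t))  # value decays toward the failure state
    w, *_ = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)
    return w

def greedy_action(w, successor_features):
    """With a known transition model, act by moving to the successor
    state with the highest learned value."""
    return int(np.argmax([phi @ w for phi in successor_features]))
```

    On a toy chain whose failure state is at position 0, with features phi(s) = (position, 1), the fit recovers a value that increases with distance from failure, and the greedy controller steers away from it.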

    Doubly robust Bayesian inference for non-stationary streaming data with β-divergences

    We present the very first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with β-divergences. The resulting inference procedure is doubly robust for both the predictive and the changepoint (CP) posterior, with linear time and constant space complexity. We provide a construction for exponential models and demonstrate it on the Bayesian Linear Regression model. In so doing, we make two additional contributions: firstly, we make GBI scalable using Structural Variational approximations that are exact as β→0; secondly, we give a principled way of choosing the divergence parameter β by minimizing expected predictive loss online. We outperform the state of the art and improve the False Discovery Rate of CPs by more than 80% on real-world data.
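    The robustness hinges on replacing the log-likelihood loss of standard Bayesian updating with the β-divergence loss, which stays bounded on outliers. For a Gaussian predictive the loss has a closed form; the sketch below is our own illustration (the Gaussian family and the function name are assumptions, not the paper's construction for exponential models):

```python
import numpy as np

def beta_loss_gaussian(x, mu, sigma2, beta):
    """GBI beta-divergence loss for a Gaussian predictive N(mu, sigma2):
    loss(x) = -(1/beta) * p(x)**beta + (1/(beta+1)) * integral of p**(beta+1),
    where the integral has the closed form (2*pi*sigma2)**(-beta/2) / sqrt(beta+1).
    As beta -> 0 this recovers the negative log-likelihood up to constants."""
    p = np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    integral = (2 * np.pi * sigma2) ** (-beta / 2) / np.sqrt(beta + 1)
    return -(p ** beta) / beta + integral / (beta + 1)
```

    Unlike the log loss, which diverges as x moves away from mu, this loss saturates at the constant integral term, so a single gross outlier cannot dominate the posterior.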

    APRIL: Active Preference-learning based Reinforcement Learning

    This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, ruling out both standard RL and inverse reinforcement learning. Despite this limited expertise, the human expert is still often able to emit preferences and rank the agent's demonstrations. Earlier work has presented an iterative preference-based RL framework: expert preferences are exploited to learn an approximate policy return, thus enabling the agent to achieve direct policy search. Iteratively, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration against the previous best one; the expert's ranking feedback enables the agent to refine the approximate policy return, and the process is iterated. In this paper, preference-based reinforcement learning is combined with active ranking in order to decrease the number of ranking queries to the expert needed to yield a satisfactory policy. Experiments on the mountain car and cancer treatment testbeds show that a couple dozen rankings are enough to learn a competent policy.
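    The ranking-feedback step can be sketched as follows: learn a surrogate policy return from pairwise expert preferences over policy feature vectors. This is a deliberately simplified stand-in (the linear utility and perceptron-style updates are our assumptions, not APRIL's actual estimator):

```python
import numpy as np

def learn_utility_from_rankings(preferences, n_features, epochs=50, lr=0.1):
    """Learn a linear surrogate policy return u(x) = w . x from pairwise
    expert preferences, given as (better, worse) pairs of policy feature
    vectors. Violated rankings trigger a perceptron-style update."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        for better, worse in preferences:
            if w @ better <= w @ worse:     # expert's ranking is violated
                w += lr * (better - worse)  # push the two utilities apart
    return w
```

    In the iterative loop described above, the agent would pick the next candidate policy by maximizing this surrogate (plus an exploration term), demonstrate it, and add the expert's new ranking to `preferences`.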

    Movement primitives as a robotic tool to interpret trajectories through learning-by-doing

    Articulated movements are fundamental in many human and robotic tasks. While humans can learn and generalise arbitrarily long sequences of movements, and in particular can optimise them to fit the constraints and features of their body, robots are often programmed to execute point-to-point, precise but fixed patterns. This study proposes a new approach to interpreting and reproducing articulated and complex trajectories as a set of known robot-based primitives. Instead of aiming for accurate reproduction, the proposed approach interprets data in an agent-centred fashion, according to the agent's primitive movements. The method improves the accuracy of a reproduction with an incremental process that first seeks a rough approximation capturing the most essential features of a demonstrated trajectory. Observing the discrepancy between the demonstrated and reproduced trajectories, the process then proceeds with incremental decompositions and new searches in suboptimal parts of the trajectory. The aim is an agent-centred interpretation and progressive learning that fits, first and foremost, the robot's capabilities, as opposed to a data-centred decomposition analysis. Tests on both geometric and human-generated trajectories reveal that the use of the agent's own primitives gives the method remarkable robustness and generalisation properties. In particular, because trajectories are understood and abstracted by means of agent-optimised primitives, the method has two main features: 1) reproduced trajectories are general and represent an abstraction of the data; 2) the algorithm can reconstruct highly noisy or corrupted data without pre-processing, thanks to implicit and emergent noise suppression and feature detection. This study suggests a novel bio-inspired approach to interpreting, learning and reproducing articulated movements and trajectories. Possible applications include drawing, writing, movement generation, object manipulation, and other tasks where performance requires human-like interpretation and generalisation capabilities.

    Conceptive Artificial Intelligence: Insights from design theory

    The current paper offers a perspective on what we term conceptive intelligence: the capacity of an agent to continuously think of new object definitions (tasks, problems, physical systems, etc.) and to look for methods to realize them. The framework, called a Brouwer machine, is inspired by previous research in design theory and modeling, with its roots in the constructivist mathematics of intuitionism. The dual constructivist perspective we describe makes it possible to create novelty both in the types of objects and in the methods for constructing objects. More generally, the theoretical work on which Brouwer machines are based is called imaginative constructivism. Based on the framework and the theory, we discuss many paradigms and techniques omnipresent in AI research, together with their merits and shortcomings for modeling aspects of design as described by imaginative constructivism. To demonstrate and explain the type of creative process expressed by the notion of a Brouwer machine, we compare this concept with a system using genetic algorithms for scientific law discovery.

    Restarted Bayesian Online Change-point Detector achieves Optimal Detection Delay

    In this paper, we consider the problem of sequential change-point detection where both the change-points and the distributions before and after each change are assumed to be unknown. For this problem of primary importance in statistical and sequential learning theory, we derive a variant of the Bayesian Online Change Point Detector proposed by Fearnhead & Liu (2007) which is easier to analyze than the original version while keeping its powerful message-passing algorithm. We provide a non-asymptotic analysis of the false-alarm rate and the detection delay that matches the existing lower bound. We further provide the first explicit high-probability control of the detection delay for such an approach. Experiments on synthetic and real-world data show that this proposal outperforms the state-of-the-art change-point detection strategy, namely the Improved Generalized Likelihood Ratio (Improved GLR), while comparing favorably with the original Bayesian Online Change Point Detection strategy.
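    For reference, the message-passing run-length recursion behind this family of detectors fits in a few lines. The sketch below is the plain (non-restarted) filter, assuming Gaussian observations with known variance, a conjugate Normal prior on the mean, and a constant hazard rate; it illustrates the mechanism, not the restarted variant analyzed in the paper:

```python
import numpy as np

def bocpd_run_length(data, hazard=0.01, mu0=0.0, kappa0=1.0, sigma2=1.0):
    """Bayesian Online Change Point Detection run-length filter for
    Gaussian data with known variance sigma2 and a Normal prior on the
    mean. Returns the MAP run length after each observation."""
    log_r = np.array([0.0])       # log P(run length); start at r = 0
    mu = np.array([mu0])          # posterior mean per run-length hypothesis
    kappa = np.array([kappa0])    # pseudo-counts per run-length hypothesis
    maps = []
    for x in data:
        # predictive per hypothesis: Normal(mu, sigma2 * (1 + 1/kappa))
        var = sigma2 * (1.0 + 1.0 / kappa)
        log_pred = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        grow = log_r + log_pred + np.log(1 - hazard)            # run continues
        birth = np.logaddexp.reduce(log_r + log_pred) + np.log(hazard)
        log_r = np.concatenate(([birth], grow))
        log_r -= np.logaddexp.reduce(log_r)                     # normalize
        # conjugate update of the sufficient statistics; index 0 is the prior
        mu = np.concatenate(([mu0], (kappa * mu + x) / (kappa + 1)))
        kappa = np.concatenate(([kappa0], kappa + 1))
        maps.append(int(np.argmax(log_r)))
    return maps
```

    `maps[t]` is the most probable number of observations since the last change; a sharp drop flags a detected change-point.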

    Development of a Scripting Language for Interactive Robot Control

    This thesis develops a scripting language for interactive robot control. It makes it possible to control the PR2 through a Python command interpreter, so that the planning of robot behaviour can be taken over by the user and unforeseen events can be handled. The usability of the developed scripting language is evaluated in a user study, which concludes that the robot can be controlled intuitively through the scripting language, while still offering potential for further development.

    Constructing skill trees for reinforcement learning agents from demonstration trajectories

    We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a changepoint detection method to segment each trajectory into a skill chain, by detecting either a change in the appropriate abstraction or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator into chains of skills, where each skill is assigned an appropriate abstraction.
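    The segmentation idea can be illustrated with a much simpler stand-in: split a trajectory wherever a single model is too complex, i.e. wherever one fit leaves too much residual. CST itself runs an online changepoint detector over value-function models; the linear models, the top-down search, and the thresholds below are our assumptions:

```python
import numpy as np

def segment_trajectory(t, y, tol=0.05, min_len=5):
    """Top-down illustrative stand-in for changepoint segmentation:
    recursively split [lo, hi) wherever a single linear fit leaves a
    mean squared residual above tol. Returns (lo, hi) index pairs."""
    def fit_error(lo, hi):
        A = np.stack([t[lo:hi], np.ones(hi - lo)], axis=1)
        coef, *_ = np.linalg.lstsq(A, y[lo:hi], rcond=None)
        return np.mean((A @ coef - y[lo:hi]) ** 2)
    def split(lo, hi):
        if hi - lo <= 2 * min_len or fit_error(lo, hi) < tol:
            return [(lo, hi)]  # simple enough: one skill-like segment
        # choose the split point minimizing the combined residual
        cands = range(lo + min_len, hi - min_len)
        best = min(cands, key=lambda m: fit_error(lo, m) + fit_error(m, hi))
        return split(lo, best) + split(best, hi)
    return split(0, len(t))
```

    On a piecewise-linear "tent" trajectory, the recursion recovers the two underlying pieces and places the boundary at the apex.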