3,248 research outputs found

    Safety-Aware Apprenticeship Learning

    Full text link
    Apprenticeship learning (AL) is a class of Learning from Demonstration techniques in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting their learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure safety while retaining the performance of the learnt policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
    Comment: Accepted by International Conference on Computer Aided Verification (CAV) 201
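    The verification step this abstract relies on is checking a PCTL-style reachability property against the Markov chain induced by the current policy. The following is a minimal sketch of that check only, on a hypothetical 4-state chain with made-up transition probabilities; it is not the authors' tool, which embeds a full probabilistic model checker.

```python
# Minimal sketch (not the authors' tool): checking a bounded PCTL-style
# property "P<=delta [ F<=k unsafe ]" for a fixed policy, on a small,
# hypothetical 4-state Markov chain. All numbers are made-up illustration data.
import numpy as np

# Markov chain induced by some policy: states 0..3, state 3 is "unsafe".
P = np.array([
    [0.90, 0.05, 0.00, 0.05],
    [0.10, 0.80, 0.05, 0.05],
    [0.00, 0.10, 0.80, 0.10],
    [0.00, 0.00, 0.00, 1.00],   # unsafe state is absorbing
])
unsafe = np.array([0.0, 0.0, 0.0, 1.0])

def prob_reach_unsafe(P, unsafe, horizon):
    """P(reach an unsafe state within `horizon` steps), per start state."""
    p = unsafe.copy()
    for _ in range(horizon):
        # Unsafe states stay at probability 1; others take one expectation step.
        p = np.maximum(unsafe, P @ p)
    return p

delta, k = 0.3, 10
p0 = prob_reach_unsafe(P, unsafe, k)[0]        # from start state 0
print(f"P(F<={k} unsafe | s0) = {p0:.3f}, property holds: {p0 <= delta}")
```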

    Online Apprenticeship Learning

    Full text link
    In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. Instead, we observe trajectories sampled by an expert that acts according to some policy. The goal is to find a policy that matches the expert's performance on some predefined set of cost functions. We introduce an online variant of AL (Online Apprenticeship Learning; OAL), where the agent is expected to perform comparably to the expert while interacting with the environment. We show that the OAL problem can be effectively solved by combining two mirror-descent-based no-regret algorithms: one for policy optimization and another for learning the worst-case cost. To this end, we derive a convergent algorithm with O(\sqrt{K}) regret, where K is the number of interactions with the MDP, and an additional linear error term that depends on the number of expert trajectories available. Importantly, our algorithm avoids the need to solve an MDP at each iteration, making it more practical than prior AL methods. Finally, we implement a deep variant of our algorithm which shares some similarities with GAIL \cite{ho2016generative}, but where the discriminator is replaced with the costs learned by the OAL problem. Our simulations demonstrate that our theoretically grounded approach outperforms the baselines.
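    The core primitive here is a pair of mirror-descent no-regret learners playing a min-max game (policy player vs. worst-case-cost player). Below is a self-contained toy illustration of that idea on a random matrix game, using entropic mirror descent (multiplicative weights) for both players; the payoff matrix, step size, and horizon are made up, and this is not the OAL algorithm itself.

```python
# Toy illustration (not the OAL algorithm): two entropic mirror descent
# (multiplicative-weights) no-regret players on a random matrix game.
# The row player ("policy") minimizes x^T A y; the column player
# ("worst-case cost") maximizes it. Averaged iterates approach a saddle point.
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(5, 4))       # made-up cost matrix
x = np.ones(5) / 5                        # policy player on the simplex
y = np.ones(4) / 4                        # cost player on the simplex
x_avg, y_avg = np.zeros_like(x), np.zeros_like(y)

K, eta = 2000, 0.05
for _ in range(K):
    # Mirror descent with entropy regularizer = multiplicative weights.
    x = x * np.exp(-eta * (A @ y))        # minimizer: move away from high cost
    x /= x.sum()
    y = y * np.exp(+eta * (A.T @ x))      # maximizer: move toward high cost
    y /= y.sum()
    x_avg += x / K
    y_avg += y / K

# Duality gap of the averaged strategies (>= 0, small near equilibrium).
gap = (A.T @ x_avg).max() - (A @ y_avg).min()
print(f"game value ~ {x_avg @ A @ y_avg:.3f}, duality gap ~ {gap:.3f}")
```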

    Semi-Supervised Apprenticeship Learning

    Get PDF
    In apprenticeship learning we aim to learn a good policy by observing the behavior of an expert or a set of experts. In particular, we consider the case where the expert acts so as to maximize an unknown reward function defined as a linear combination of a set of state features. In this paper, we consider the setting where we observe many sample trajectories (i.e., sequences of states) but only one or a few of them are labeled as expert trajectories. We investigate the conditions under which the remaining unlabeled trajectories can help in learning a policy with good performance. In particular, we define an extension of the max-margin inverse reinforcement learning algorithm proposed by Abbeel and Ng (2004) where, at each iteration, the max-margin optimization step is replaced by a semi-supervised optimization problem which favors classifiers separating clusters of trajectories. Finally, we report empirical results on two grid-world domains showing that the semi-supervised algorithm is able to output a better policy in fewer iterations than the related algorithm that does not take the unlabeled trajectories into account.
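    Both the max-margin step and its semi-supervised replacement operate on discounted feature expectations of trajectories. The sketch below shows only that representation step, with a made-up feature map and made-up labeled/unlabeled trajectories, plus a simple cosine-similarity comparison to the expert; it is not the paper's semi-supervised optimization.

```python
# Minimal sketch of the representation both variants use: the discounted
# feature expectation of a trajectory, mu(tau) = sum_t gamma^t * phi(s_t).
# Trajectories and the feature map are made-up placeholders.
import numpy as np

gamma = 0.95

def phi(state):
    """Hypothetical state features (here: one-hot over 3 grid regions)."""
    return np.eye(3)[state % 3]

def feature_expectation(trajectory):
    return sum(gamma**t * phi(s) for t, s in enumerate(trajectory))

expert_traj = [0, 3, 6, 6, 3]                      # labeled expert trajectory
unlabeled = [[1, 4, 7, 1], [0, 3, 3, 6], [2, 5, 8, 2]]

mu_expert = feature_expectation(expert_traj)
for i, traj in enumerate(unlabeled):
    mu = feature_expectation(traj)
    sim = mu @ mu_expert / (np.linalg.norm(mu) * np.linalg.norm(mu_expert))
    print(f"unlabeled trajectory {i}: cosine similarity to expert = {sim:.2f}")
```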

    Safety-aware apprenticeship learning

    Full text link
    It is well acknowledged in the AI community that finding a good reward function for reinforcement learning is extremely challenging. Apprenticeship learning (AL) is a class of “learning from demonstration” techniques where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent uses inverse reinforcement learning (IRL) methods to recover the expert's policy from a set of expert demonstrations. However, because the agent learns exclusively from observations, there is no verification of, or guarantee on, whether the learnt policy satisfies a given constraint on the probability of the agent running into unwanted situations. In this dissertation, we study the problem of how to guide AL to learn a policy that is inherently safe while still meeting its learning objective. By combining formal methods with imitation learning, a Counterexample-Guided Apprenticeship Learning algorithm is proposed. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure both safety and performance of the learnt policy. The algorithm guarantees that the learnt policy satisfies the given formal safety specification expressed in probabilistic temporal logic. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
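    The setting assumed here (and in the paper above) is a reward that is a linear combination of state features, from which the learner derives a policy for the MDP. The sketch below only illustrates that setup: given hypothetical feature weights, it builds R(s) = w·phi(s) and computes a greedy policy by value iteration on a made-up MDP; it is not the counterexample-guided algorithm itself.

```python
# Minimal sketch of the assumed setting: reward linear in state features,
# R(s) = w . phi(s), with a policy derived by value iteration.
# The MDP, features, and weights are made-up placeholders.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
phi = np.eye(n_states)                  # trivial one-hot state features
w = np.array([0.1, 0.2, 0.9, -1.0])     # hypothetical reward weights
R = phi @ w                             # R(s) = w . phi(s)

V = np.zeros(n_states)
for _ in range(500):                    # value iteration
    Q = R[:, None] + gamma * P @ V      # Q[s, a]
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)               # greedy policy for the candidate reward
print("greedy policy:", policy, "state values:", np.round(V, 2))
```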

    Difference of Convex Functions Programming Applied to Control with Expert Data

    Get PDF
    This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. This is made possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is a DC function. Improvement in performance is demonstrated on two specific algorithms, namely Reward-regularized Classification for Apprenticeship Learning (RCAL) and Reinforcement Learning with Expert Demonstrations (RLED), through experiments on generic Markov Decision Processes (MDPs) known as Garnets.
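    The technique named here, DC programming, minimizes f = g - h (both g and h convex) by repeatedly linearizing h and solving the resulting convex surrogate (the DCA scheme). Below is a tiny, self-contained illustration of that idea on a one-dimensional toy function; it has nothing to do with the paper's actual OBR objective.

```python
# Tiny illustration of the DC-programming (DCA) idea, not the paper's OBR
# objective: minimize f(x) = g(x) - h(x) with g(x) = x^4 and h(x) = 2x^2,
# both convex. Each step linearizes h at x_k and minimizes the convex
# surrogate g(x) - h'(x_k) * x, which here has a closed-form solution.
import math

def dca_step(x):
    grad_h = 4.0 * x                            # h'(x) for h(x) = 2x^2
    # argmin_x x^4 - grad_h * x  =>  4x^3 = grad_h  =>  x = cbrt(x_k)
    return math.copysign(abs(grad_h / 4.0) ** (1.0 / 3.0), grad_h)

x = 0.2
for _ in range(25):
    x = dca_step(x)                             # f decreases monotonically
print(f"x* = {x:.4f}, f(x*) = {x**4 - 2*x**2:.4f}")   # -> 1.0000 and -1.0000
```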

    A Sociocultural Approach to Recognition and Learning

    Get PDF
    This is a case study of goldsmith craft apprenticeship learning and recognition. The study includes 13 participants in a goldsmith's workshop. The theoretical approach to recognition and learning is inspired by sociocultural theory. In this article, recognition is defined with reference to Hegel’s understanding of the concept as a transformed struggle of granting acknowledgement to another person and receiving acknowledgement as a person. It is argued that the notion of recognition can enhance sociocultural notions of learning. In analysing the case study of apprenticeship learning, the article suggests that recognition is expressed in the act of participants staking their lives to prove their autonomy, in work activity through the role of artefacts, and in the form of abstract and concrete recognition. Finally, recognition is discussed in relation to learning and development. The study concludes that recognition is an important category not only for explaining apprenticeship learning but also for giving a sociocultural explanation of learning in general.

    Eggpreneur Enhanced Apprenticeship Learning Experience

    Get PDF
    Our recommendations present the feedback received from the Sisters throughout our interviews. A major component of this process was revising existing financial models to advance students’ financial literacy. Eggpreneur was able to adjust its playbook to better meet the needs of its students.

    An apprenticeship learning hyper-heuristic for vehicle routing in HyFlex

    Get PDF
    Apprenticeship learning occurs via observations while an expert is in action. A hyper-heuristic is a search method or a learning mechanism that controls a set of low-level heuristics or combines different heuristic components to generate heuristics for solving a given computationally hard problem. In this study, we investigate a novel apprenticeship-learning-based approach which is used to automatically generate a hyper-heuristic for vehicle routing. This approach can itself be considered a hyper-heuristic which operates in a train-and-test fashion. A state-of-the-art hyper-heuristic, the winner of a previous hyper-heuristic competition, is chosen as the expert. Trained on small vehicle routing instances, the learning approach yields various classifiers, each capturing different actions that the expert hyper-heuristic performs during the search process. Those classifiers are then used to produce a hyper-heuristic which is potentially capable of generalizing the actions of the expert hyper-heuristic while solving unseen instances. The experimental results on vehicle routing using the Hyper-heuristic Flexible (HyFlex) framework show that the apprenticeship-learning-based hyper-heuristic delivers outstanding performance compared to the expert and some other previously proposed hyper-heuristics.
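    The train-and-test idea described here amounts to fitting classifiers that map observed search-state features to the low-level heuristic the expert hyper-heuristic chose, and then using the learned classifier to drive the search on unseen instances. Below is a minimal sketch of that step with purely synthetic placeholder features, labels, and heuristic set, and a decision tree as an arbitrary classifier choice; it is not the paper's feature set or learning setup.

```python
# Minimal sketch of the train-and-test idea: learn a classifier that maps
# search-state features to the low-level heuristic the expert hyper-heuristic
# chose. Features, labels, and the heuristic set are synthetic placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Pretend log of the expert on small training instances:
# columns = [normalized cost delta, time since improvement, iteration fraction]
X_train = rng.uniform(0, 1, size=(200, 3))
# Expert's chosen heuristic (0 = mutation, 1 = ruin-recreate, 2 = local search),
# generated here by an arbitrary synthetic rule standing in for the real expert.
y_train = np.where(X_train[:, 1] > 0.6, 1,
                   np.where(X_train[:, 0] > 0.5, 0, 2))

clf = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)

# At solve time on an unseen instance, the learned policy picks the next
# low-level heuristic from the current search-state features.
state = np.array([[0.3, 0.8, 0.5]])
print("apply low-level heuristic:", clf.predict(state)[0])
```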