3,248 research outputs found

    Safety-Aware Apprenticeship Learning

    Full text link
    Apprenticeship learning (AL) is a class of Learning from Demonstration techniques in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent has to derive a good policy by observing an expert's demonstrations. In this paper, we study the problem of how to make AL algorithms inherently safe while still meeting their learning objective. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure safety while retaining the performance of the learnt policy. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
    Comment: Accepted by International Conference on Computer Aided Verification (CAV) 201
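    The verification step this abstract relies on is checking a PCTL-style reachability property against the Markov chain induced by the current policy. The following is a minimal sketch of that check only, on a hypothetical 4-state chain with made-up transition probabilities; it is not the authors' tool, which embeds a full probabilistic model checker.

```python
# Minimal sketch (not the authors' tool): checking a bounded PCTL-style
# property "P<=delta [ F<=k unsafe ]" for a fixed policy, on a small,
# hypothetical 4-state Markov chain. All numbers are made-up illustration data.
import numpy as np

# Markov chain induced by some policy: states 0..3, state 3 is "unsafe".
P = np.array([
    [0.90, 0.05, 0.00, 0.05],
    [0.10, 0.80, 0.05, 0.05],
    [0.00, 0.10, 0.80, 0.10],
    [0.00, 0.00, 0.00, 1.00],   # unsafe state is absorbing
])
unsafe = np.array([0.0, 0.0, 0.0, 1.0])

def prob_reach_unsafe(P, unsafe, horizon):
    """P(reach an unsafe state within `horizon` steps), per start state."""
    p = unsafe.copy()
    for _ in range(horizon):
        # Unsafe states stay at probability 1; others take one expectation step.
        p = np.maximum(unsafe, P @ p)
    return p

delta, k = 0.3, 10
p0 = prob_reach_unsafe(P, unsafe, k)[0]        # from start state 0
print(f"P(F<={k} unsafe | s0) = {p0:.3f}, property holds: {p0 <= delta}")
```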

    Online Apprenticeship Learning

    Full text link
    In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. Instead, we observe trajectories sampled by an expert that acts according to some policy. The goal is to find a policy that matches the expert's performance on some predefined set of cost functions. We introduce an online variant of AL (Online Apprenticeship Learning; OAL), where the agent is expected to perform comparably to the expert while interacting with the environment. We show that the OAL problem can be effectively solved by combining two mirror-descent-based no-regret algorithms: one for policy optimization and another for learning the worst-case cost. To this end, we derive a convergent algorithm with O(\sqrt{K}) regret, where K is the number of interactions with the MDP, and an additional linear error term that depends on the number of expert trajectories available. Importantly, our algorithm avoids the need to solve an MDP at each iteration, making it more practical than prior AL methods. Finally, we implement a deep variant of our algorithm which shares some similarities with GAIL \cite{ho2016generative}, but where the discriminator is replaced with the costs learned by the OAL problem. Our simulations demonstrate that our theoretically grounded approach outperforms the baselines.
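    The core primitive here is a pair of mirror-descent no-regret learners playing a min-max game (policy player vs. worst-case-cost player). Below is a self-contained toy illustration of that idea on a random matrix game, using entropic mirror descent (multiplicative weights) for both players; the payoff matrix, step size, and horizon are made up, and this is not the OAL algorithm itself.

```python
# Toy illustration (not the OAL algorithm): two entropic mirror descent
# (multiplicative-weights) no-regret players on a random matrix game.
# The row player ("policy") minimizes x^T A y; the column player
# ("worst-case cost") maximizes it. Averaged iterates approach a saddle point.
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(5, 4))       # made-up cost matrix
x = np.ones(5) / 5                        # policy player on the simplex
y = np.ones(4) / 4                        # cost player on the simplex
x_avg, y_avg = np.zeros_like(x), np.zeros_like(y)

K, eta = 2000, 0.05
for _ in range(K):
    # Mirror descent with entropy regularizer = multiplicative weights.
    x = x * np.exp(-eta * (A @ y))        # minimizer: move away from high cost
    x /= x.sum()
    y = y * np.exp(+eta * (A.T @ x))      # maximizer: move toward high cost
    y /= y.sum()
    x_avg += x / K
    y_avg += y / K

# Duality gap of the averaged strategies (>= 0, small near equilibrium).
gap = (A.T @ x_avg).max() - (A @ y_avg).min()
print(f"game value ~ {x_avg @ A @ y_avg:.3f}, duality gap ~ {gap:.3f}")
```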

    Semi-Supervised Apprenticeship Learning

    Get PDF
    In apprenticeship learning we aim to learn a good policy by observing the behavior of an expert or a set of experts. In particular, we consider the case where the expert acts so as to maximize an unknown reward function defined as a linear combination of a set of state features. In this paper, we consider the setting where we observe many sample trajectories (i.e., sequences of states) but only one or a few of them are labeled as expert trajectories. We investigate the conditions under which the remaining unlabeled trajectories can help in learning a policy with good performance. In particular, we define an extension of the max-margin inverse reinforcement learning algorithm proposed by Abbeel and Ng (2004) where, at each iteration, the max-margin optimization step is replaced by a semi-supervised optimization problem which favors classifiers separating clusters of trajectories. Finally, we report empirical results on two grid-world domains showing that the semi-supervised algorithm is able to output a better policy in fewer iterations than the related algorithm that does not take the unlabeled trajectories into account.
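    Both the max-margin step and its semi-supervised replacement operate on discounted feature expectations of trajectories. The sketch below shows only that representation step, with a made-up feature map and made-up labeled/unlabeled trajectories, plus a simple cosine-similarity comparison to the expert; it is not the paper's semi-supervised optimization.

```python
# Minimal sketch of the representation both variants use: the discounted
# feature expectation of a trajectory, mu(tau) = sum_t gamma^t * phi(s_t).
# Trajectories and the feature map are made-up placeholders.
import numpy as np

gamma = 0.95

def phi(state):
    """Hypothetical state features (here: one-hot over 3 grid regions)."""
    return np.eye(3)[state % 3]

def feature_expectation(trajectory):
    return sum(gamma**t * phi(s) for t, s in enumerate(trajectory))

expert_traj = [0, 3, 6, 6, 3]                      # labeled expert trajectory
unlabeled = [[1, 4, 7, 1], [0, 3, 3, 6], [2, 5, 8, 2]]

mu_expert = feature_expectation(expert_traj)
for i, traj in enumerate(unlabeled):
    mu = feature_expectation(traj)
    sim = mu @ mu_expert / (np.linalg.norm(mu) * np.linalg.norm(mu_expert))
    print(f"unlabeled trajectory {i}: cosine similarity to expert = {sim:.2f}")
```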

    Safety-aware apprenticeship learning

    Full text link
    It is well acknowledged in the AI community that finding a good reward function for reinforcement learning is extremely challenging. Apprenticeship learning (AL) is a class of “learning from demonstration” techniques where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent uses inverse reinforcement learning (IRL) methods to recover the expert's policy from a set of expert demonstrations. However, because the agent learns exclusively from observations, there is no verification of, or guarantee on, whether the learnt policy satisfies a given constraint on the probability of the agent running into unwanted situations. In this dissertation, we study the problem of how to guide AL to learn a policy that is inherently safe while still meeting its learning objective. By combining formal methods with imitation learning, a Counterexample-Guided Apprenticeship Learning algorithm is proposed. We consider a setting where the unknown reward function is assumed to be a linear combination of a set of state features, and the safety property is specified in Probabilistic Computation Tree Logic (PCTL). By embedding probabilistic model checking inside AL, we propose a novel counterexample-guided approach that can ensure both safety and performance of the learnt policy. The algorithm guarantees that the learnt policy satisfies the given formal safety specification expressed in probabilistic temporal logic. We demonstrate the effectiveness of our approach on several challenging AL scenarios where safety is essential.
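    The setting assumed here (and in the paper above) is a reward that is a linear combination of state features, from which the learner derives a policy for the MDP. The sketch below only illustrates that setup: given hypothetical feature weights, it builds R(s) = w·phi(s) and computes a greedy policy by value iteration on a made-up MDP; it is not the counterexample-guided algorithm itself.

```python
# Minimal sketch of the assumed setting: reward linear in state features,
# R(s) = w . phi(s), with a policy derived by value iteration.
# The MDP, features, and weights are made-up placeholders.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
phi = np.eye(n_states)                  # trivial one-hot state features
w = np.array([0.1, 0.2, 0.9, -1.0])     # hypothetical reward weights
R = phi @ w                             # R(s) = w . phi(s)

V = np.zeros(n_states)
for _ in range(500):                    # value iteration
    Q = R[:, None] + gamma * P @ V      # Q[s, a]
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)               # greedy policy for the candidate reward
print("greedy policy:", policy, "state values:", np.round(V, 2))
```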

    Difference of Convex Functions Programming Applied to Control with Expert Data

    Get PDF
    This paper reports applications of Difference of Convex functions (DC) programming to Learning from Demonstrations (LfD) and Reinforcement Learning (RL) with expert data. This is made possible because the norm of the Optimal Bellman Residual (OBR), which is at the heart of many RL and LfD algorithms, is a DC function. Improvement in performance is demonstrated on two specific algorithms, namely Reward-regularized Classification for Apprenticeship Learning (RCAL) and Reinforcement Learning with Expert Demonstrations (RLED), through experiments on generic Markov Decision Processes (MDPs) known as Garnets.
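    The technique named here, DC programming, minimizes f = g - h (both g and h convex) by repeatedly linearizing h and solving the resulting convex surrogate (the DCA scheme). Below is a tiny, self-contained illustration of that idea on a one-dimensional toy function; it has nothing to do with the paper's actual OBR objective.

```python
# Tiny illustration of the DC-programming (DCA) idea, not the paper's OBR
# objective: minimize f(x) = g(x) - h(x) with g(x) = x^4 and h(x) = 2x^2,
# both convex. Each step linearizes h at x_k and minimizes the convex
# surrogate g(x) - h'(x_k) * x, which here has a closed-form solution.
import math

def dca_step(x):
    grad_h = 4.0 * x                            # h'(x) for h(x) = 2x^2
    # argmin_x x^4 - grad_h * x  =>  4x^3 = grad_h  =>  x = cbrt(x_k)
    return math.copysign(abs(grad_h / 4.0) ** (1.0 / 3.0), grad_h)

x = 0.2
for _ in range(25):
    x = dca_step(x)                             # f decreases monotonically
print(f"x* = {x:.4f}, f(x*) = {x**4 - 2*x**2:.4f}")   # -> 1.0000 and -1.0000
```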

    A Sociocultural Approach to Recognition and Learning

    Get PDF
    This is a case study of goldsmith craft apprenticeship learning and recognition. The study includes 13 participants in a goldsmith's workshop. The theoretical approach to recognition and learning is inspired by sociocultural theory. In this article, recognition is defined with reference to Hegel’s understanding of the concept as a transformed struggle of granting acknowledgement to another person and receiving acknowledgement as a person. It is argued that the notion of recognition can enhance sociocultural notions of learning. In analysing the case study of apprenticeship learning, the article suggests that recognition is expressed in the act of participants staking their lives to prove their autonomy, in work activity through the role of artefacts, and in the form of abstract and concrete recognition. Finally, recognition is discussed in relation to learning and development. The study concludes that recognition is an important category not only for explaining apprenticeship learning but also for giving a sociocultural explanation of learning in general.

    Eggpreneur Enhanced Apprenticeship Learning Experience

    Get PDF
    Our recommendations present the feedback received from the Sisters throughout our interviews. A major component of this process was revising existing financial models to advance students’ financial literacy. Eggpreneur was able to adjust its playbook to better meet the needs of its students.

    An apprenticeship learning hyper-heuristic for vehicle routing in HyFlex

    Get PDF
    Apprenticeship learning occurs via observations while an expert is in action. A hyper-heuristic is a search method or a learning mechanism that controls a set of low-level heuristics or combines different heuristic components to generate heuristics for solving a given computationally hard problem. In this study, we investigate a novel apprenticeship-learning-based approach which is used to automatically generate a hyper-heuristic for vehicle routing. This approach can itself be considered a hyper-heuristic which operates in a train-and-test fashion. A state-of-the-art hyper-heuristic, the winner of a previous hyper-heuristic competition, is chosen as the expert. Trained on small vehicle routing instances, the learning approach yields various classifiers, each capturing different actions that the expert hyper-heuristic performs during the search process. Those classifiers are then used to produce a hyper-heuristic which is potentially capable of generalizing the actions of the expert hyper-heuristic while solving unseen instances. The experimental results on vehicle routing using the Hyper-heuristic Flexible (HyFlex) framework show that the apprenticeship-learning-based hyper-heuristic delivers outstanding performance compared to the expert and some other previously proposed hyper-heuristics.
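    The train-and-test idea described here amounts to fitting classifiers that map observed search-state features to the low-level heuristic the expert hyper-heuristic chose, and then using the learned classifier to drive the search on unseen instances. Below is a minimal sketch of that step with purely synthetic placeholder features, labels, and heuristic set, and a decision tree as an arbitrary classifier choice; it is not the paper's feature set or learning setup.

```python
# Minimal sketch of the train-and-test idea: learn a classifier that maps
# search-state features to the low-level heuristic the expert hyper-heuristic
# chose. Features, labels, and the heuristic set are synthetic placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Pretend log of the expert on small training instances:
# columns = [normalized cost delta, time since improvement, iteration fraction]
X_train = rng.uniform(0, 1, size=(200, 3))
# Expert's chosen heuristic (0 = mutation, 1 = ruin-recreate, 2 = local search),
# generated here by an arbitrary synthetic rule standing in for the real expert.
y_train = np.where(X_train[:, 1] > 0.6, 1,
                   np.where(X_train[:, 0] > 0.5, 0, 2))

clf = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)

# At solve time on an unseen instance, the learned policy picks the next
# low-level heuristic from the current search-state features.
state = np.array([[0.3, 0.8, 0.5]])
print("apply low-level heuristic:", clf.predict(state)[0])
```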