Learning Task Specifications from Demonstrations

Author: Ho Mark K.
Jha Susmit
Seshia Sanjit A.
Tiwari Ashish
Vazquez-Chanlatte Marcell
Publication date: 01/01/2018

Real-world applications often naturally decompose into several sub-tasks. In many settings (e.g., robotics), demonstrations provide a natural way to specify the sub-tasks. However, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely recombined or limit the types of composition available. Motivated by this deficit, we consider the problem of inferring Boolean non-Markovian rewards (also known as logical trace properties or specifications) from demonstrations provided by an agent operating in an uncertain, stochastic environment. Crucially, specifications admit well-defined composition rules that are typically easy to interpret. In this paper, we formulate the specification inference task as a maximum a posteriori (MAP) probability inference problem, apply the principle of maximum entropy to derive an analytic demonstration likelihood model, and give an efficient approach to search for the most likely specification in a large candidate pool of specifications. In our experiments, we demonstrate how learning specifications can help avoid common problems that often arise due to ad-hoc reward composition. Comment: NIPS 201

arXiv.org e-Print Archive

eScholarship - University of California
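
The abstract above describes specifications as Boolean, history-dependent properties of whole traces that compose by well-defined rules. The following is a minimal sketch of that idea only, not the paper's inference method: the state labels, the helper names (`eventually`, `never`, `conj`, `satisfaction_rate`), and the demonstrations are all hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence

# A trace is a finite sequence of states; the labels used here are hypothetical.
Trace = Sequence[str]
# A specification maps a whole trace to True/False, so it may depend on history.
Spec = Callable[[Trace], bool]


def eventually(label: str) -> Spec:
    """Specification satisfied when `label` occurs somewhere in the trace."""
    return lambda trace: label in trace


def never(label: str) -> Spec:
    """Specification satisfied when `label` occurs nowhere in the trace."""
    return lambda trace: label not in trace


def conj(*specs: Spec) -> Spec:
    """Compose specifications by conjunction: all must hold on the same trace."""
    return lambda trace: all(spec(trace) for spec in specs)


def satisfaction_rate(spec: Spec, demos: List[Trace]) -> float:
    """Fraction of demonstrations that satisfy the specification."""
    return sum(spec(d) for d in demos) / len(demos)


reach_goal = eventually("goal")
avoid_lava = never("lava")
task = conj(reach_goal, avoid_lava)  # composed sub-tasks

# Hypothetical demonstrations, made up for this sketch.
demos: List[Trace] = [
    ["start", "path", "goal"],
    ["start", "lava", "goal"],
    ["start", "path", "path"],
]

for name, spec in [("reach_goal", reach_goal), ("avoid_lava", avoid_lava), ("task", task)]:
    print(name, satisfaction_rate(spec, demos))
```

Because each specification is just a Boolean function of the trace, the composed `task` has an unambiguous meaning, which is the property the abstract contrasts with ad-hoc reward composition.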

A nonparametric Bayesian approach toward robot learning by demonstration

Author: Antoniak
Argall
Billard
Bishop
Blackwell
Blei
Celeux
Chandler
Chatzis
Demiris
Dimitrios Korkinof
Ferguson
Ghahramani
Jordan
Leroux
Lopes
Muller
Myersand
Neal
Pearlmutter
Qi
Rasmussen
Schwarz
Sethuraman
Skoglund
Sotirios P. Chatzis
Ude
Vapnik
Walker
Yiannis Demiris
Zegers
Publication venue: 'Elsevier BV'
Publication date: 01/06/2012

In recent years, many authors have considered applying machine learning methodologies to robot learning by demonstration. Gaussian mixture regression (GMR) is one of the most successful methodologies used for this purpose. A major limitation of GMR models concerns automatic selection of the proper number of model states, i.e., the number of model component densities. Existing methods, including likelihood- or entropy-based criteria, tend to yield noisy model size estimates while imposing heavy computational requirements. Recently, Dirichlet process (infinite) mixture models have emerged within nonparametric Bayesian statistics as promising candidates for clustering applications where the number of clusters is unknown a priori. Motivated by this, and to resolve the aforementioned issues of GMR-based methods for robot learning by demonstration, in this paper we introduce a nonparametric Bayesian formulation for the GMR model, the Dirichlet process GMR model. We derive an efficient variational Bayesian inference algorithm for the proposed model, and we experimentally investigate its efficacy as a robot learning by demonstration methodology, considering a number of demanding robot learning by demonstration scenarios.

Crossref

Ktisis

Spiral - Imperial College Digital Repository
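
The abstract above relies on Dirichlet process mixtures leaving the number of clusters unspecified in advance. As a rough illustration of that property only, the sketch below samples cluster sizes from a Chinese restaurant process, a standard sequential view of the Dirichlet process prior. It is not the variational inference algorithm the paper derives; the function name `crp_partition` and the parameter values are assumptions made for this example.

```python
import random
from typing import List


def crp_partition(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample cluster sizes for n items from a Chinese restaurant process.

    Item i (0-based) joins existing cluster k with probability
    sizes[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha). `alpha` is the concentration parameter.
    """
    rng = random.Random(seed)
    sizes: List[int] = []
    for i in range(n):
        # Draw a point in [0, i + alpha); the first i units belong to existing clusters.
        r = rng.random() * (i + alpha)
        acc = 0.0
        for k, size in enumerate(sizes):
            acc += size
            if r < acc:
                sizes[k] += 1
                break
        else:
            # No existing cluster was chosen (or none exist yet): open a new one.
            sizes.append(1)
    return sizes


# The number of clusters is an outcome of sampling, not an input.
for alpha in (0.5, 2.0, 10.0):
    sizes = crp_partition(200, alpha, seed=1)
    print(f"alpha={alpha}: {len(sizes)} clusters")
```

Larger values of `alpha` tend to produce more clusters, which is how the prior lets model size adapt to the data instead of being fixed by a separate selection criterion.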

Few-Shot Bayesian Imitation Learning with Logical Program Policies

Author: Allen Kelsey R.
Kaelbling Leslie Pack
Lew Alex K.
Silver Tom
Tenenbaum Josh
Publication date: 16/11/2019

Humans can learn many novel tasks from a very small number (1–5) of demonstrations, in stark contrast to the data requirements of nearly tabula rasa deep learning methods. We propose an expressive class of policies, a strong but general prior, and a learning algorithm that, together, can learn interesting policies from very few examples. We represent policies as logical combinations of programs drawn from a domain-specific language (DSL), define a prior over policies with a probabilistic grammar, and derive an approximate Bayesian inference algorithm to learn policies from demonstrations. In experiments, we study five strategy games played on a 2D grid with one shared DSL. After a few demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. Our policy learning is 20–1,000x more data efficient than convolutional and fully convolutional policy learning and many orders of magnitude more computationally efficient than vanilla program induction. We argue that the proposed method is an apt choice for tasks that have scarce training data and feature significant, structured variation between task instances. Comment: AAAI 202

arXiv.org e-Print Archive

DSpace@MIT

Association for the Advancement of Artificial Intelligence: AAAI Publications

Maximum Causal Entropy Specification Inference from Demonstrations

Author: A Ignatiev
B Farwer
C De la Higuera
D Angluin
ET Jaynes
M Kwiatkowska
MJH Heule
RE Bryant
Publication date: 01/01/2020

In many settings (e.g., robotics), demonstrations provide a natural way to specify tasks; however, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the tasks, such as rewards or policies, can be safely composed and/or do not explicitly capture history dependencies. Motivated by this deficit, recent works have proposed learning Boolean task specifications, a class of Boolean non-Markovian rewards that admit well-defined composition and explicitly handle historical dependencies. This work continues this line of research by adapting maximum causal entropy inverse reinforcement learning to estimate the posterior probability of a specification given a multi-set of demonstrations. The key algorithmic insight is to leverage the extensive literature and tooling on reduced ordered binary decision diagrams to efficiently encode a time-unrolled Markov Decision Process. This enables transforming a naive exponential-time algorithm into a polynomial-time algorithm. Comment: Computer Aided Verification, 202

arXiv.org e-Print Archive

Crossref

eScholarship - University of California