Environment-Independent Task Specifications via GLTL
We propose a new task-specification language for Markov decision processes
that is designed to be an improvement over reward functions by being
environment independent. The language is a variant of Linear Temporal Logic
(LTL) that is extended to probabilistic specifications in a way that permits
approximations to be learned in finite time. We provide several small
environments that demonstrate the advantages of our geometric LTL (GLTL)
language and illustrate how it can be used to specify standard
reinforcement-learning tasks straightforwardly.
Temporal Logic Guided Safe Reinforcement Learning Using Control Barrier Functions
Using reinforcement learning to learn control policies is a challenge when
the task is complex with potentially long horizons. Ensuring adequate but safe
exploration is also crucial for controlling physical systems. In this paper, we
use temporal logic to facilitate specification and learning of complex tasks.
We combine temporal logic with control Lyapunov functions to improve
exploration. We incorporate control barrier functions to safeguard the
exploration and deployment process. We develop a flexible and learnable system
that allows users to specify task objectives and constraints in different forms
and at various levels. The framework is also able to take advantage of known
system dynamics and handle unknown environmental dynamics by integrating
model-free learning with model-based planning.
Learning Task Specifications from Demonstrations
Real world applications often naturally decompose into several sub-tasks. In
many settings (e.g., robotics) demonstrations provide a natural way to specify
the sub-tasks. However, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the sub-tasks can be
safely recombined or limit the types of composition available. Motivated by
this deficit, we consider the problem of inferring Boolean non-Markovian
rewards (also known as logical trace properties or specifications) from
demonstrations provided by an agent operating in an uncertain, stochastic
environment. Crucially, specifications admit well-defined composition rules
that are typically easy to interpret. In this paper, we formulate the
specification inference task as a maximum a posteriori (MAP) probability
inference problem, apply the principle of maximum entropy to derive an analytic
demonstration likelihood model and give an efficient approach to search for the
most likely specification in a large candidate pool of specifications. In our
experiments, we demonstrate how learning specifications can help avoid common
problems that often arise due to ad-hoc reward composition.
Comment: NIPS 201
Learning to Compose Skills
We present a differentiable framework capable of learning a wide variety of
compositions of simple policies that we call skills. By recursively composing
skills with themselves, we can create hierarchies that display complex
behavior. Skill networks are trained to generate skill-state embeddings that
are provided as inputs to a trainable composition function, which in turn
outputs a policy for the overall task. Our experiments on an environment
consisting of multiple collect and evade tasks show that this architecture is
able to quickly build complex skills from simpler ones. Furthermore, the
learned composition function displays some transfer to unseen combinations of
skills, allowing for zero-shot generalizations.
Comment: Presented at NIPS 2017 Deep RL Symposium
Towards Sharing Task Environments to Support Reproducible Evaluations of Interactive Recommender Systems
Beyond sharing datasets or simulations, we believe the Recommender Systems
(RS) community should share Task Environments. In this work, we propose a
high-level logical architecture that will help to reason about the core
components of a RS Task Environment, identify the differences between
Environments, datasets and simulations; and most importantly, understand what
needs to be shared about Environments to achieve reproducible experiments. The
work presents itself as valuable initial groundwork, open to discussion and
extensions.
Comment: Included in the Offline Evaluation for Recommender Systems Workshop
(REVEAL'19), collocated with ACM RecSys 2019. REVEAL'19, September 20th,
2019, Copenhagen, Denmark
Interactive Robot Training for Non-Markov Tasks
Defining sound and complete specifications for robots using formal languages
is challenging, while learning formal specifications directly from
demonstrations can lead to over-constrained task policies. In this paper, we
propose a Bayesian interactive robot training framework that allows the robot
to learn from both demonstrations provided by a teacher and that teacher's
assessments of the robot's task executions. We also present an active learning
approach -- inspired by uncertainty sampling -- to identify the task execution
with the most uncertain degree of acceptability. Through a simulated
experiment, we demonstrate that our active learning approach identifies a
teacher's intended task specification with an equivalent or greater similarity
when compared to an approach that learns purely from demonstrations. Finally,
we demonstrate the efficacy of our approach in a real-world setting through a
user study based on teaching a robot to set a dinner table.
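The selection step described in this abstract can be illustrated with a minimal sketch of uncertainty sampling. It assumes the robot already holds, for each candidate task execution, a belief that the teacher would accept it; the names `binary_entropy`, `most_uncertain`, and the example beliefs are hypothetical and are not taken from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def most_uncertain(acceptability: dict) -> str:
    """Return the candidate whose acceptability belief has maximum entropy."""
    return max(acceptability, key=lambda name: binary_entropy(acceptability[name]))


# Hypothetical beliefs that each candidate execution is acceptable.
beliefs = {"exec_a": 0.95, "exec_b": 0.55, "exec_c": 0.10}
query = most_uncertain(beliefs)  # the execution to show the teacher next
```

Entropy peaks at a belief of 0.5, so the candidate closest to an even split is the one selected for the teacher's assessment.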
Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines
Natural and formal languages provide an effective mechanism for humans to
specify instructions and reward functions. We investigate how to generate
policies via RL when reward functions are specified in a symbolic language
captured by Reward Machines, an increasingly popular automaton-inspired
structure. We are interested in the case where the mapping of environment state
to a symbolic (here, Reward Machine) vocabulary -- commonly known as the
labelling function -- is uncertain from the perspective of the agent. We
formulate the problem of policy learning in Reward Machines with noisy symbolic
abstractions as a special class of POMDP optimization problem, and investigate
several methods to address the problem, building on existing and new
techniques, the latter focused on predicting Reward Machine state, rather than
on grounding of individual symbols. We analyze these methods and evaluate them
experimentally under varying degrees of uncertainty in the correct
interpretation of the symbolic vocabulary. We verify the strength of our
approach and the limitation of existing methods via an empirical investigation
on both illustrative, toy domains and partially observable, deep RL domains.
Comment: NeurIPS Deep Reinforcement Learning Workshop 202
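For readers unfamiliar with the structure, a Reward Machine can be sketched as a small automaton that reads symbolic labels and emits rewards. The sketch below is a noise-free illustration only: it assumes the labelling function has already produced one correct label per step, which is exactly the assumption the abstract relaxes. The class, state names, and the coffee-then-office task are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    initial: str
    # Maps (machine state, label) to (next machine state, reward).
    transitions: dict
    state: str = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial

    def step(self, label: str) -> float:
        """Advance on one label; unknown labels leave the state unchanged."""
        next_state, reward = self.transitions.get(
            (self.state, label), (self.state, 0.0)
        )
        self.state = next_state
        return reward


def total_reward(machine: RewardMachine, labels: list) -> float:
    """Reset the machine and sum the rewards along a label sequence."""
    machine.state = machine.initial
    return sum(machine.step(label) for label in labels)


# Hypothetical task: get coffee first, then reach the office.
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("done", 1.0),
    },
)
```

With this machine, visiting the office before picking up coffee earns nothing, which is the kind of non-Markovian reward the automaton state makes trackable.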
Embedding Symbolic Temporal Knowledge into Deep Sequential Models
Sequences and time-series often arise in robot tasks, e.g., in activity
recognition and imitation learning. In recent years, deep neural networks
(DNNs) have emerged as an effective data-driven methodology for processing
sequences given sufficient training data and compute resources. However, when
data is limited, simpler models such as logic/rule-based methods work
surprisingly well, especially when relevant prior knowledge is applied in their
construction. However, unlike DNNs, these "structured" models can be difficult
to extend, and do not work well with raw unstructured data. In this work, we
seek to learn flexible DNNs, yet leverage prior temporal knowledge when
available. Our approach is to embed symbolic knowledge expressed as linear
temporal logic (LTL) and use these embeddings to guide the training of deep
models. Specifically, we construct semantic-based embeddings of automata
generated from LTL formulas via a Graph Neural Network. Experiments show that
these learnt embeddings can lead to improvements in downstream robot tasks such
as sequential action recognition and imitation learning.
Supervised Bayesian Specification Inference from Demonstrations
When observing task demonstrations, human apprentices are able to identify
whether a given task is executed correctly long before they gain expertise in
actually performing that task. Prior research into learning from demonstrations
(LfD) has failed to capture this notion of the acceptability of a task's
execution; meanwhile, temporal logics provide a flexible language for
expressing task specifications. Inspired by this, we present Bayesian
specification inference, a probabilistic model for inferring task specification
as a temporal logic formula. We incorporate methods from probabilistic
programming to define our priors, along with a domain-independent likelihood
function to enable sampling-based inference. We demonstrate the efficacy of our
model for inferring specifications, with over 90% similarity observed between
the inferred specification and the ground truth, both within a synthetic domain
and during a real-world table-setting task.
Verifiable and Compositional Reinforcement Learning Systems
We propose a novel framework for verifiable and compositional reinforcement
learning (RL) in which a collection of RL sub-systems, each of which learns to
accomplish a separate sub-task, are composed to achieve an overall task. The
framework consists of a high-level model, represented as a parametric Markov
decision process (pMDP) which is used to plan and to analyze compositions of
sub-systems, and of the collection of low-level sub-systems themselves. By
defining interfaces between the sub-systems, the framework enables automatic
decompositions of task specifications, e.g., reach a target set of states with a
probability of at least 0.95, into individual sub-task specifications, i.e.,
achieve the sub-system's exit conditions with at least some minimum
probability, given that its entry conditions are met. This in turn allows for
the independent training and testing of the sub-systems; if they each learn a
policy satisfying the appropriate sub-task specification, then their
composition is guaranteed to satisfy the overall task specification.
Conversely, if the sub-task specifications cannot all be satisfied by the
learned policies, we present a method, formulated as the problem of finding an
optimal set of parameters in the pMDP, to automatically update the sub-task
specifications to account for the observed shortcomings. The result is an
iterative procedure for defining sub-task specifications, and for training the
sub-systems to meet them. As an additional benefit, this procedure allows for
particularly challenging or important components of an overall task to be
determined automatically, and focused on, during training. Experimental results
demonstrate the presented framework's novel capabilities.
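The compositional guarantee can be illustrated with a deliberately simplified case. The sketch below assumes the sub-systems run in one fixed sequence and that each success probability is conditioned on the sub-system's entry conditions being met, so the overall success probability is bounded below by the product of the sub-task probabilities. The paper's pMDP model is more general than this chain; the function names and numbers are hypothetical.

```python
import math


def composed_lower_bound(sub_task_probs: list) -> float:
    """Lower bound on overall success for sub-systems executed in sequence."""
    return math.prod(sub_task_probs)


def meets_requirement(sub_task_probs: list, required: float) -> bool:
    """Check whether the sequential composition meets the overall threshold."""
    return composed_lower_bound(sub_task_probs) >= required


# Hypothetical sub-task guarantees checked against an overall 0.95 target.
strong_chain = [0.99, 0.98, 0.98]
weak_chain = [0.97, 0.97]
```

Under these assumptions the first chain meets the 0.95 target and the second does not, which is the situation where the abstract's procedure would revise the sub-task specifications.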