Learning Task Specifications from Demonstrations
Real world applications often naturally decompose into several sub-tasks. In
many settings (e.g., robotics) demonstrations provide a natural way to specify
the sub-tasks. However, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the sub-tasks can be
safely recombined or limit the types of composition available. Motivated by
this deficit, we consider the problem of inferring Boolean non-Markovian
rewards (also known as logical trace properties or specifications) from
demonstrations provided by an agent operating in an uncertain, stochastic
environment. Crucially, specifications admit well-defined composition rules
that are typically easy to interpret. In this paper, we formulate the
specification inference task as a maximum a posteriori (MAP) probability
inference problem, apply the principle of maximum entropy to derive an analytic
demonstration likelihood model and give an efficient approach to search for the
most likely specification in a large candidate pool of specifications. In our
experiments, we demonstrate how learning specifications can help avoid common
problems that often arise due to ad-hoc reward composition.
Comment: NIPS 2018
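The MAP search over a candidate pool described in the abstract can be sketched as a toy scoring loop. This is only an illustration of the size-principle intuition behind maximum-entropy demonstration likelihoods, not the paper's actual model; the trace representation, the specification predicates, and the smoothed likelihood are all hypothetical simplifications:

```python
import math

def map_specification(candidates, demos, random_traces):
    """Return the highest-scoring specification from a candidate pool.

    Size-principle intuition: demonstrations satisfying a spec are strong
    evidence when the spec is rarely satisfied by chance, while demos
    violating the spec count against it.
    """
    best, best_score = None, -math.inf
    for spec in candidates:
        # Smoothed probability that a random trace satisfies the spec.
        p_rand = (sum(map(spec, random_traces)) + 1) / (len(random_traces) + 2)
        score = sum(-math.log(p_rand) if spec(d) else math.log(p_rand)
                    for d in demos)
        if score > best_score:
            best, best_score = spec, score
    return best

def always_safe(trace):   # never visit lava
    return "lava" not in trace

def reaches_goal(trace):  # eventually reach the goal
    return "goal" in trace
```

With demonstrations that both avoid lava and reach the goal, the search prefers whichever consistent specification is rarer under random traces, which is exactly what makes the inferred specification informative.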
STL: Surprisingly Tricky Logic (for System Validation)
Much of the recent work developing formal methods techniques to specify or
learn the behavior of autonomous systems is predicated on a belief that formal
specifications are interpretable and useful for humans when checking systems.
Though frequently asserted, this assumption is rarely tested. We performed a
human experiment (N = 62) with a mix of people who were and were not familiar
with formal methods beforehand, asking them to validate whether a set of signal
temporal logic (STL) constraints would keep an agent out of harm and allow it
to complete a task in a gridworld capture-the-flag setting. Validation accuracy
was (mean ± standard deviation). The ground-truth validity
of a specification, subjects' familiarity with formal methods, and subjects'
level of education were found to be significant factors in determining
validation correctness. Participants exhibited an affirmation bias, causing
significantly increased accuracy on valid specifications, but significantly
decreased accuracy on invalid specifications. Additionally, participants,
particularly those familiar with formal methods, tended to be overconfident in
their answers, and be similarly confident regardless of actual correctness.
Our data do not support the belief that formal specifications are inherently
human-interpretable to a meaningful degree for system validation. We recommend
ergonomic improvements to data presentation and validation training, which
should be tested before claims of interpretability make their way back into the
formal methods literature.
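The kind of check the participants performed can be illustrated with Boolean temporal operators over a trace. The gridworld trace, hazard cells, and flag position below are hypothetical stand-ins, and only Boolean (not quantitative-robustness) STL semantics is sketched:

```python
def always(pred, trace):
    """G pred: the predicate holds at every step (Boolean semantics)."""
    return all(pred(s) for s in trace)

def eventually(pred, trace):
    """F pred: the predicate holds at some step."""
    return any(pred(s) for s in trace)

# Hypothetical gridworld trace of (x, y) agent positions.
trace = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
hazards = {(1, 1)}   # cells the agent must avoid
flag = (2, 2)        # cell the agent must reach

valid = (always(lambda s: s not in hazards, trace)
         and eventually(lambda s: s == flag, trace))
```

Deciding whether a full set of such constraints keeps the agent out of harm on *all* behaviors, rather than on one trace, is what made the participants' validation task so error-prone.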
Learning Interpretable Temporal Properties from Positive Examples Only
We consider the problem of explaining the temporal behavior of black-box
systems using human-interpretable models. To this end, based on recent
research trends, we rely on the fundamental yet interpretable models of
deterministic finite automata (DFAs) and linear temporal logic (LTL) formulas.
In contrast to most existing works for learning DFAs and LTL formulas, we rely
on only positive examples. Our motivation is that negative examples are
generally difficult to observe, in particular, from black-box systems. To
learn meaningful models from positive examples only, we design algorithms that
rely on conciseness and language minimality of models as regularizers. To this
end, our algorithms adopt two approaches: a symbolic and a
counterexample-guided one. While the symbolic approach exploits an efficient
encoding of language minimality as a constraint satisfaction problem, the
counterexample-guided one relies on generating suitable negative examples to
prune the search. Both approaches provide us with effective algorithms with
theoretical guarantees on the learned models. To assess the effectiveness of
our algorithms, we evaluate all of them on synthetic data.
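The language-minimality regularizer can be sketched with a finite proxy: among hypotheses consistent with the positive examples, prefer the one accepting the fewest strings up to a bounded length. The hypothesis predicates and the bounded enumeration below are illustrative assumptions, not the paper's symbolic CSP encoding or counterexample-guided procedure:

```python
import re
from itertools import product

def language_size(accepts, alphabet, max_len):
    """Count accepted strings up to max_len: a finite proxy for
    language minimality."""
    return sum(accepts("".join(w))
               for n in range(max_len + 1)
               for w in product(alphabet, repeat=n))

def learn_from_positives(hypotheses, positives, alphabet, max_len=4):
    """Among hypotheses consistent with every positive example, return
    the one with the smallest language; no negative examples needed."""
    consistent = [h for h in hypotheses if all(h(p) for p in positives)]
    return min(consistent,
               key=lambda h: language_size(h, alphabet, max_len),
               default=None)

accept_all = lambda s: True                                  # trivial language
starts_a   = lambda s: s.startswith("a")                     # a · Σ*
a_then_b   = lambda s: re.fullmatch(r"a+b+", s) is not None  # a+ b+
```

Without the minimality bias, the trivial hypothesis accepting everything is always consistent with positive data; the regularizer is what rules it out.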
Learning Probabilistic Temporal Safety Properties from Examples in Relational Domains
We propose a framework for learning a fragment of probabilistic computation
tree logic (pCTL) formulae from a set of states that are labeled as safe or
unsafe. We work in a relational setting and combine ideas from relational
Markov Decision Processes with pCTL model-checking. More specifically, we
assume that there is an unknown relational pCTL target formula that is
satisfied by only safe states, and has a bounded horizon (a maximum number of
steps) and a threshold probability. The task then consists of learning this unknown
formula from states that are labeled as safe or unsafe by a domain expert. We
apply principles of relational learning to induce a pCTL formula that is
satisfied by all safe states and none of the unsafe ones. This formula can then
be used as a safety specification for this domain, so that the system can avoid
getting into dangerous situations in the future. Following relational learning
principles, we introduce a candidate formula generation process, as well as a
method for deciding which candidate formula is a satisfactory specification for
the given labeled states. We treat both the case where the expert knows the
system policy and the case where they do not; much of the learning process is the
same for both cases. We evaluate our approach on a synthetic relational domain.
Comment: 25 pages, 3 figures, 5 tables, 2 algorithms, preprint
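The candidate-generation-and-selection loop can be sketched propositionally: search conjunctions of candidate literals, smallest first, for one satisfied by every safe state and no unsafe state. The state encoding and the literals below are hypothetical, and this stand-in omits the pCTL model-checking that the actual framework performs:

```python
from itertools import combinations

def induce_specification(literals, safe, unsafe, max_size=2):
    """Return the smallest conjunction of candidate literals that holds
    in every safe state and in no unsafe state, or None."""
    for size in range(1, max_size + 1):
        for clause in combinations(literals, size):
            holds = lambda state: all(lit(state) for lit in clause)
            if all(holds(s) for s in safe) and not any(holds(s) for s in unsafe):
                return clause
    return None

# Hypothetical candidate literals over dict-encoded states.
has_fuel = lambda s: s["fuel"] > 0
no_fire  = lambda s: not s["fire"]
```

Searching smaller clauses first mirrors the usual relational-learning bias toward concise specifications: a single literal is returned only if it already separates safe from unsafe states.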
Generating Multi-Agent Trajectories using Programmatic Weak Supervision
We study the problem of training sequential generative models for capturing
coordinated multi-agent trajectory behavior, such as offensive basketball
gameplay. When modeling such settings, it is often beneficial to design
hierarchical models that can capture long-term coordination using intermediate
variables. Furthermore, these intermediate variables should capture interesting
high-level behavioral semantics in an interpretable and manipulatable way. We
present a hierarchical framework that can effectively learn such sequential
generative models. Our approach is inspired by recent work on leveraging
programmatically produced weak labels, which we extend to the spatiotemporal
regime. In addition to synthetic settings, we show how to instantiate our
framework to effectively model complex interactions between basketball players
and generate realistic multi-agent trajectories of basketball gameplay over
long time periods. We validate our approach using both quantitative and
qualitative evaluations, including a user study comparison conducted with
professional sports analysts.
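A programmatic weak label of the kind the abstract extends to the spatiotemporal regime can be sketched as a heuristic labeling function: tag each timestep with a macro-goal region inferred from where the agent is shortly afterwards. The region names, the lookahead heuristic, and the coordinate encoding are illustrative assumptions, not the paper's labeling functions:

```python
def nearest_region(pos, regions):
    """Name of the region whose center is closest to pos."""
    return min(regions, key=lambda name: (regions[name][0] - pos[0]) ** 2
                                         + (regions[name][1] - pos[1]) ** 2)

def weak_macro_goal_labels(trajectory, regions, horizon=1):
    """Heuristic labeling function: tag each timestep with the region the
    agent occupies `horizon` steps later, as a weak macro-goal label."""
    return [nearest_region(trajectory[min(t + horizon, len(trajectory) - 1)],
                           regions)
            for t in range(len(trajectory))]
```

Such noisy but cheap labels give the hierarchical model's intermediate variables their interpretable semantics without any manual annotation.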