Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework that synthesises policies for unknown, continuous-state Markov
Decision Processes (MDPs) such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on the fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and empirically show that the
algorithm finds satisfying policies whenever such policies exist. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
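As a concrete illustration of the reward-shaping idea, the sketch below runs
tabular Q-learning on the product of the MDP state and the LDBA state, with a
reward emitted whenever the automaton reaches an accepting state. This is a
minimal sketch under assumed interfaces, not the authors' implementation:
env.reset, env.step, env.actions, env.label, ldba.step, and ldba.is_accepting
are hypothetical names, and a finite state space is assumed.

    # Minimal sketch (hypothetical interfaces, not the authors' code):
    # Q-learning on the product of MDP states and LDBA states, with an
    # on-the-fly reward that fires when the automaton enters an accepting set.
    import random
    from collections import defaultdict

    def q_learning_with_ldba(env, ldba, episodes=1000, horizon=200,
                             alpha=0.1, gamma=0.99, eps=0.1, r_acc=1.0):
        Q = defaultdict(float)                      # Q[((s, q), a)] on the product
        for _ in range(episodes):
            s, q = env.reset(), ldba.initial_state
            for _ in range(horizon):
                acts = env.actions(s)
                a = (random.choice(acts) if random.random() < eps
                     else max(acts, key=lambda b: Q[((s, q), b)]))
                s2 = env.step(s, a)
                q2 = ldba.step(q, env.label(s2))    # advance automaton on the label
                r = r_acc if ldba.is_accepting(q2) else 0.0  # synchronous reward
                best = max(Q[((s2, q2), b)] for b in env.actions(s2))
                Q[((s, q), a)] += alpha * (r + gamma * best - Q[((s, q), a)])
                s, q = s2, q2
        return Q

Maximising accumulated reward on this product then aligns with maximising the
probability of satisfying the temporal property, which is the quantity the
paper certifies.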
Learning Task Specifications from Demonstrations
Real world applications often naturally decompose into several sub-tasks. In
many settings (e.g., robotics) demonstrations provide a natural way to specify
the sub-tasks. However, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the sub-tasks can be
safely recombined or limit the types of composition available. Motivated by
this deficit, we consider the problem of inferring Boolean non-Markovian
rewards (also known as logical trace properties or specifications) from
demonstrations provided by an agent operating in an uncertain, stochastic
environment. Crucially, specifications admit well-defined composition rules
that are typically easy to interpret. In this paper, we formulate the
specification inference task as a maximum a posteriori (MAP) probability
inference problem, apply the principle of maximum entropy to derive an analytic
demonstration likelihood model, and give an efficient approach to search for the
most likely specification in a large candidate pool of specifications. In our
experiments, we demonstrate how learning specifications can help avoid common
problems that often arise due to ad-hoc reward composition.
Comment: NIPS 201
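To make the inference recipe concrete, the toy sketch below scores each
candidate specification by a log-likelihood ratio against random behaviour (a
crude stand-in for the paper's maximum-entropy likelihood) plus a log-prior,
and returns the MAP specification. Specifications are modelled as callables
from a trace to a Boolean; the competence parameter and the random-trace
baseline are illustrative assumptions, not the paper's exact model.

    # Toy sketch (illustrative, not the paper's exact likelihood): pick the
    # specification that the demonstrations support most strongly relative to
    # random behaviour, weighted by a prior over the candidate pool.
    import math

    def map_specification(candidates, demos, random_traces,
                          competence=0.95, log_prior=lambda spec: 0.0):
        best, best_score = None, -math.inf
        for spec in candidates:                       # spec: trace -> bool
            n = len(demos)
            n_sat = sum(1 for t in demos if spec(t))
            # chance that random behaviour satisfies the spec, clamped away from 0/1
            p_rand = sum(1 for t in random_traces if spec(t)) / len(random_traces)
            p_rand = min(max(p_rand, 1e-6), 1.0 - 1e-6)
            # demonstrations satisfying a spec that is rarely satisfied by
            # chance are strong evidence for that spec
            llr = (n_sat * math.log(competence / p_rand)
                   + (n - n_sat) * math.log((1.0 - competence) / (1.0 - p_rand)))
            score = llr + log_prior(spec)
            if score > best_score:
                best, best_score = spec, score
        return best

Because specifications admit well-defined composition rules (e.g., conjunction),
the same scoring loop can be reused to rank composed sub-task specifications.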
NNgTL: Neural Network Guided Optimal Temporal Logic Task Planning for Mobile Robots
In this work, we investigate task planning for mobile robots under linear
temporal logic (LTL) specifications. This problem is particularly challenging
when robots navigate in continuous workspaces due to the high computational
complexity involved. Sampling-based methods have emerged as a promising avenue
for addressing this challenge by incrementally constructing random trees,
thereby sidestepping the need to explicitly explore the entire state space.
However, the performance of this sampling-based approach hinges crucially on
the chosen sampling strategy, and a well-informed heuristic can notably enhance
sample efficiency. In this work, we propose a novel neural-network guided
(NN-guided) sampling strategy tailored for LTL planning. Specifically, we
employ a multi-modal neural network capable of extracting features concurrently
from both the workspace and the Büchi automaton. The network's predictions
guide the construction of the random tree, steering the sampling process
toward more promising directions. Through
numerical experiments, we compare our approach with existing methods and
demonstrate its superior efficiency, requiring less than 15% of the time of the
existing methods to find a feasible solution.
Comment: submitte
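A minimal sketch of the guided-sampling loop is given below, under assumed
interfaces (model, workspace, and tree are hypothetical names, not the NNgTL
API): with probability bias the next sample is drawn from the network's
predicted distribution conditioned on the current Büchi progress, and
otherwise uniformly, after which the tree is extended in the usual RRT style.

    # Minimal sketch (hypothetical interfaces, not the NNgTL code): biased
    # sampling for tree-based LTL planning. The network proposes workspace
    # regions likely to advance the Büchi automaton, while the uniform
    # fallback preserves the coverage of the underlying sampling planner.
    import random

    def nn_guided_sample(model, workspace, buchi_state, bias=0.7):
        if random.random() < bias:
            return model.sample(workspace, buchi_state)  # learned proposal
        return workspace.sample_uniform()                # uniform fallback

    def build_tree(tree, model, workspace, max_iters=10000):
        for _ in range(max_iters):
            x_rand = nn_guided_sample(model, workspace, tree.current_buchi_state())
            x_near = tree.nearest(x_rand)
            x_new = tree.steer(x_near, x_rand)
            if workspace.collision_free(x_near, x_new):
                tree.add(x_near, x_new)
                if tree.reaches_accepting(x_new):        # accepting product state
                    return tree.extract_path(x_new)
        return None                                      # no feasible plan found

Keeping a uniform component in the sampler is a common design choice in
informed sampling-based planners, since a purely learned proposal could miss
regions the network underestimates.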