Correct-by-synthesis reinforcement learning with temporal logic constraints
We consider a problem on the synthesis of reactive controllers that optimize
some a priori unknown performance criterion while interacting with an
uncontrolled environment such that the system satisfies a given temporal logic
specification. We decouple the problem into two subproblems. First, we extract
a (maximally) permissive strategy for the system, which encodes multiple
(possibly all) ways in which the system can react to the adversarial
environment and satisfy the specifications. Then, we quantify the a priori
unknown performance criterion as a (still unknown) reward function and compute
an optimal strategy for the system within the operating envelope allowed by the
permissive strategy by using the so-called maximin-Q learning algorithm. We
establish both correctness (with respect to the temporal logic specifications)
and optimality (with respect to the a priori unknown performance criterion) of
this two-step technique for a fragment of temporal logic specifications. For
specifications beyond this fragment, correctness can still be preserved, but
the learned strategy may be sub-optimal. We present an algorithm for the overall
problem, and demonstrate its use and computational requirements on a set of
robot motion planning examples.
Comment: 8 pages, 3 figures, 2 tables, submitted to IROS 201
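The second step described above (learning inside the envelope allowed by a permissive strategy) can be illustrated with a minimal sketch. It is not the authors' implementation: the state names, the `permitted` map, the environment moves, and the learning parameters are all hypothetical, and the update is the generic maximin-Q rule, where the value of a state is the best permitted action against the worst-case environment move.

```python
from collections import defaultdict

def maximin_value(Q, state, permitted, env_moves):
    """Best permitted action's value under the worst-case environment move."""
    return max(
        min(Q[(state, a, e)] for e in env_moves)
        for a in permitted[state]
    )

def maximin_q_update(Q, s, a, e, reward, s_next, permitted, env_moves,
                     alpha=0.5, gamma=0.9):
    """One maximin-Q update for the observed transition (s, a, e) -> s_next."""
    target = reward + gamma * maximin_value(Q, s_next, permitted, env_moves)
    Q[(s, a, e)] = (1 - alpha) * Q[(s, a, e)] + alpha * target
    return Q[(s, a, e)]

def greedy_action(Q, state, permitted, env_moves):
    """Pick the action with the best worst-case value, among permitted actions only."""
    return max(permitted[state],
               key=lambda a: min(Q[(state, a, e)] for e in env_moves))

# Hypothetical toy data: the permissive strategy is given as a state -> actions map.
Q = defaultdict(float)
permitted = {"s0": ["left", "right"], "s1": ["stay"]}
env_moves = ["block", "pass"]

new_q = maximin_q_update(Q, "s0", "left", "pass", 1.0, "s1", permitted, env_moves)
choice = greedy_action(Q, "s0", permitted, env_moves)
```

Because both the update and the greedy choice range only over `permitted[state]`, the learner never selects an action outside the permissive strategy, which is the property the two-step decomposition relies on.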
Towards adaptive multi-robot systems: self-organization and self-adaptation
This publication is freely accessible with the permission of the rights owner under an Alliance licence and a national licence, respectively (funded by the DFG, German Research Foundation).
The development of complex system ensembles that operate in uncertain environments is a major challenge, because system designers cannot fully specify the system during specification and development, before it is deployed. Natural swarm systems share similar characteristics, yet, being self-adaptive and able to self-organize, they show beneficial emergent behaviour. Similar concepts can be extremely helpful for artificial systems, especially in multi-robot scenarios, which require such solutions in order to be applicable to highly uncertain real-world applications. In this article, we present a comprehensive overview of state-of-the-art solutions in emergent systems, self-organization, self-adaptation, and robotics. We discuss these approaches in the light of a framework for multi-robot systems and identify similarities, differences, missing links, and open gaps that have to be addressed in order to make this framework possible.
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if there exist such policies. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
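As a rough illustration of shaping a reward from an automaton that tracks the property, the sketch below steps a small hand-written automaton alongside a trace of state labels and pays a reward whenever the automaton is in an accepting state. The automaton, its state names, the labels, and the reward values are all hypothetical; the abstract does not give the paper's actual reward design, and this toy automaton is deterministic rather than a general LDBA.

```python
# Hypothetical automaton for "reach 'goal' and never visit 'unsafe'".
TRANSITIONS = {
    ("q0", "none"): "q0",
    ("q0", "goal"): "q_acc",
    ("q0", "unsafe"): "q_sink",
    ("q_acc", "none"): "q_acc",
    ("q_acc", "goal"): "q_acc",
    ("q_acc", "unsafe"): "q_sink",
}
ACCEPTING = {"q_acc"}

def step_automaton(q, label):
    """Advance the automaton; undefined moves fall into the rejecting sink."""
    return TRANSITIONS.get((q, label), "q_sink")

def shaped_reward(q, label):
    """Return the next automaton state and the reward for this step."""
    q_next = step_automaton(q, label)
    return q_next, (1.0 if q_next in ACCEPTING else 0.0)

def run(labels):
    """Accumulate the shaped reward along a trace of labels."""
    q, total = "q0", 0.0
    for label in labels:
        q, r = shaped_reward(q, label)
        total += r
    return q, total

good = run(["none", "goal", "none"])
bad = run(["none", "unsafe", "goal"])
```

An RL agent would learn over pairs of (environment state, automaton state), so the reward depends on progress toward satisfying the property rather than on the environment state alone.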
SOTER: A Runtime Assurance Framework for Programming Safe Robotics Systems
The recent drive towards achieving greater autonomy and intelligence in
robotics has led to high levels of complexity. Autonomous robots increasingly
depend on third party off-the-shelf components and complex machine-learning
techniques. This trend makes it challenging to provide strong design-time
certification of correct operation.
To address these challenges, we present SOTER, a robotics programming
framework with two key components: (1) a programming language for implementing
and testing high-level reactive robotics software and (2) an integrated runtime
assurance (RTA) system that helps enable the use of uncertified components,
while still providing safety guarantees. SOTER provides language primitives to
declaratively construct an RTA module consisting of an advanced,
high-performance controller (uncertified), a safe, lower-performance controller
(certified), and the desired safety specification. The framework provides a
formal guarantee that a well-formed RTA module always satisfies the safety
specification, without completely sacrificing performance by using higher
performance uncertified components whenever safe. SOTER allows the complex
robotics software stack to be constructed as a composition of RTA modules,
where each uncertified component is protected using an RTA module.
To demonstrate the efficacy of our framework, we consider a real-world
case-study of building a safe drone surveillance system. Our experiments both
in simulation and on actual drones show that the SOTER-enabled RTA ensures the
safety of the system, including when untrusted third-party components have bugs
or deviate from the desired behavior.
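The RTA idea of pairing an uncertified controller with a certified fallback can be sketched as follows. This is written in Python rather than in SOTER's own language, and it is not SOTER's API: the `RTAModule` class, the one-dimensional state, the one-step lookahead check, and both controllers are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RTAModule:
    advanced: Callable[[float], float]  # uncertified controller: state -> command
    safe: Callable[[float], float]      # certified fallback controller
    is_safe: Callable[[float], bool]    # safety specification on a predicted state

    def step(self, state: float) -> float:
        """Use the advanced command if the predicted next state is safe,
        otherwise fall back to the safe controller."""
        proposed = state + self.advanced(state)
        if self.is_safe(proposed):
            return proposed
        return state + self.safe(state)

# Hypothetical setup: stay within |x| <= 5 while an aggressive controller pushes outward.
rta = RTAModule(
    advanced=lambda x: 3.0,
    safe=lambda x: -1.0 if x > 0 else 1.0,
    is_safe=lambda x: abs(x) <= 5.0,
)

trajectory = [0.0]
for _ in range(4):
    trajectory.append(rta.step(trajectory[-1]))
```

In this toy run the advanced controller is used whenever its proposed move keeps the state inside the bound, and the fallback takes over only on the steps where it would not.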