632 research outputs found
Decomposing GR(1) Games with Singleton Liveness Guarantees for Efficient Synthesis
Temporal logic based synthesis approaches are often used to find trajectories
that are correct-by-construction for tasks in systems with complex behavior.
Examples of such tasks include synchronization in multi-agent hybrid
systems and reactive motion planning for robots. However, the scalability of such
approaches is of concern and at times a bottleneck when transitioning from
theory to practice. In this paper, we identify a class of problems in the GR(1)
fragment of linear-time temporal logic (LTL) where the synthesis problem allows
for a decomposition that enables easy parallelization. This decomposition also
reduces the alternation depth, resulting in more efficient synthesis. A
multi-agent robot gridworld example with coordination tasks is presented to
demonstrate the application of the developed ideas and also to perform
empirical analysis for benchmarking the decomposition-based synthesis approach.
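The decomposition idea described in the abstract can be illustrated with a toy sketch: each singleton liveness guarantee induces its own Büchi subgame, the subgames are solved independently (and hence in parallel), and the per-goal winning regions are combined afterwards. The game encoding below (a tiny one-player graph) and all names are illustrative stand-ins, not the paper's actual GR(1) formulation.

```python
# Toy sketch: one Büchi subgame per singleton liveness guarantee, solved
# in parallel, with the per-goal winning regions intersected afterwards.
# The graph and encoding are illustrative, not the paper's formulation.
from concurrent.futures import ThreadPoolExecutor

# A tiny one-player game graph: state -> set of successor states.
TRANSITIONS = {0: {1}, 1: {2, 3}, 2: {0}, 3: {3}}

def buchi_winning_region(goal):
    """States from which infinitely many visits to `goal` can be forced
    (the standard nested fixpoint, specialised to a one-player graph)."""
    states = set(TRANSITIONS)
    recur = {goal}
    while True:
        # Attractor: states that can reach the current recurrence set.
        attr = set(recur)
        changed = True
        while changed:
            changed = False
            for s in states - attr:
                if TRANSITIONS[s] & attr:
                    attr.add(s)
                    changed = True
        # Keep only goal states that can re-enter the attractor.
        new_recur = {g for g in recur if TRANSITIONS[g] & attr}
        if new_recur == recur:
            return attr
        recur = new_recur

def decomposed_synthesis(goals):
    # Each singleton liveness guarantee is an independent subproblem;
    # solve them in parallel and intersect the winning regions.
    with ThreadPoolExecutor() as pool:
        regions = list(pool.map(buchi_winning_region, goals))
    return set.intersection(*regions)

print(sorted(decomposed_synthesis([0, 2])))  # states winning for GF(0) & GF(2)
```

Intersecting per-goal regions is sound here only because the toy graph lets a single strategy cycle through all goals; the paper's contribution is identifying a class of GR(1) specifications for which such a decomposition is justified.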
Cooperative Task Planning of Multi-Agent Systems Under Timed Temporal Specifications
In this paper, the problem of cooperative task planning of multi-agent systems
under timed constraints imposed on the system is investigated. We consider
timed constraints given by Metric Interval Temporal Logic (MITL). We propose a
method for automatic control synthesis in a two-stage systematic procedure.
With this method, we guarantee that each agent satisfies its own individual
task specification and that the team satisfies a global team task
specification.

Comment: Submitted to American Control Conference 201
Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown, and continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit-Deterministic Büchi
Automaton (LDBA), i.e. a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if there exist such policies. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches, whenever available.

Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
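The reward-shaping mechanism the abstract describes can be sketched in miniature: a tabular Q-learner on the product of a small deterministic grid MDP with a trivial automaton that accepts "eventually reach the goal cell", where the shaped reward fires exactly when the accepting condition does. The grid, the one-state automaton, and all hyperparameters below are illustrative assumptions, not the paper's actual setup or architecture.

```python
# Minimal sketch of LDBA-shaped reward: reward 1 exactly when the
# (here trivial) automaton's accepting condition fires, 0 otherwise.
# Grid, automaton, and hyperparameters are illustrative stand-ins.
import random

random.seed(0)
SIZE, GOAL = 4, (3, 3)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def step(s, a):
    x, y = s[0] + a[0], s[1] + a[1]
    nxt = (min(max(x, 0), SIZE - 1), min(max(y, 0), SIZE - 1))
    # Automaton-shaped reward: accepting transition iff GOAL is reached.
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = {}
for _ in range(2000):  # epsilon-greedy tabular Q-learning episodes
    s = (0, 0)
    for _ in range(50):
        a = (random.choice(ACTIONS) if random.random() < 0.2
             else max(ACTIONS, key=lambda a: Q.get((s, a), 0.0)))
        nxt, r, done = step(s, a)
        best = max(Q.get((nxt, b), 0.0) for b in ACTIONS)
        Q[(s, a)] = Q.get((s, a), 0.0) + 0.5 * (r + 0.9 * best - Q.get((s, a), 0.0))
        s = nxt
        if done:
            break

# Greedy rollout from the start: the learned policy should reach the goal.
s, steps = (0, 0), 0
while s != GOAL and steps < 20:
    s = step(s, max(ACTIONS, key=lambda a: Q.get((s, a), 0.0)))[0]
    steps += 1
print(s == GOAL)
```

In the paper's general setting the automaton has more than one state, so the learner operates on product states (MDP state, LDBA state) and the satisfaction probability is computed alongside learning; this sketch collapses that to the simplest possible instance.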
- …