Certified Reinforcement Learning with Logic Guidance
This paper proposes the first model-free Reinforcement Learning (RL)
framework to synthesise policies for unknown and continuous-state Markov
Decision Processes (MDPs), such that a given linear temporal property is
satisfied. We convert the given property into a Limit Deterministic Büchi
Automaton (LDBA), namely a finite-state machine expressing the property.
Exploiting the structure of the LDBA, we shape a synchronous reward function
on-the-fly, so that an RL algorithm can synthesise a policy resulting in traces
that probabilistically satisfy the linear temporal property. This probability
(certificate) is also calculated in parallel with policy learning when the
state space of the MDP is finite: as such, the RL algorithm produces a policy
that is certified with respect to the property. Under the assumption of finite
state space, theoretical guarantees are provided on the convergence of the RL
algorithm to an optimal policy, maximising the above probability. We also show
that our method produces "best available" control policies when the logical
property cannot be satisfied. In the general case of a continuous state space,
we propose a neural network architecture for RL and we empirically show that
the algorithm finds satisfying policies, if such policies exist. The
performance of the proposed framework is evaluated via a set of numerical
examples and benchmarks, where we observe an improvement of one order of
magnitude in the number of iterations required for the policy synthesis,
compared to existing approaches, whenever available.
Comment: This article draws from arXiv:1801.08099, arXiv:1809.0782
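As a rough illustration of the reward-shaping idea described in this abstract (not the authors' implementation), the sketch below runs tabular Q-learning on the product of the MDP state and the automaton state, granting a reward whenever the LDBA visits an accepting state. The toy LDBA, the env interface (reset/actions/step returning a labelled next state), and all parameter values are assumptions made for the example.

```python
import random
from collections import defaultdict

class ToyLDBA:
    """Toy automaton for the property "eventually goal" (F goal);
    a stand-in for a general LDBA, assumed for illustration only."""
    def __init__(self):
        self.state = 0                     # 0: goal not yet seen, 1: accepting sink

    def step(self, label):
        if self.state == 0 and "goal" in label:
            self.state = 1
        return self.state

    def accepting(self):
        return self.state == 1

def q_learn(env, episodes=500, alpha=0.1, gamma=0.99, eps=0.1, r_accept=1.0):
    """Tabular Q-learning on the product of MDP state and automaton state.
    env is assumed to expose reset(), actions(s) and step(a) -> (s', label, done),
    where label is the set of atomic propositions holding in s'."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, aut, done = env.reset(), ToyLDBA(), False
        while not done:
            q = aut.state
            acts = env.actions(s)
            a = (random.choice(acts) if random.random() < eps
                 else max(acts, key=lambda b: Q[(s, q, b)]))
            s2, label, done = env.step(a)
            q2 = aut.step(label)
            r = r_accept if aut.accepting() else 0.0   # reward shaped on-the-fly from the automaton
            best_next = max(Q[(s2, q2, b)] for b in env.actions(s2))
            Q[(s, q, a)] += alpha * (r + gamma * best_next - Q[(s, q, a)])
            s = s2
    return Q
```

In the finite-state setting the abstract describes, maximising the expected return of such shaped rewards biases the learned policy towards traces that satisfy the temporal property.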
Cautious Reinforcement Learning with Logical Constraints
This paper presents the concept of an adaptive safe padding that forces
Reinforcement Learning (RL) to synthesise optimal control policies while
ensuring safety during the learning process. Policies are synthesised to
satisfy a goal, expressed as a temporal logic formula, with maximal
probability. Requiring the RL agent to stay safe during learning might limit
exploration; however, we show that the proposed architecture automatically
handles the trade-off between efficient progress in exploration
(towards goal satisfaction) and ensuring safety. Theoretical guarantees are
available on the optimality of the synthesised policies and on the convergence
of the learning algorithm. Experimental results are provided to showcase the
performance of the proposed method.
Comment: Accepted to AAMAS 2020. arXiv admin note: text overlap with arXiv:1902.0077
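The abstract leaves the padding mechanism itself abstract; the sketch below is a much-simplified, hypothetical reading of the idea rather than the paper's algorithm. Exploration is restricted to actions that the agent's own frequency-based statistics currently deem safe, and the restriction is relaxed once an action has been tried often enough. The class name, the thresholds, and the assumption of a known set of unsafe states are all illustrative.

```python
import random
from collections import defaultdict

class SafePaddingAgent:
    """Illustrative action filter: keep exploration inside actions believed safe."""
    def __init__(self, actions, unsafe, min_visits=10, safety_threshold=0.99):
        self.actions = list(actions)
        self.unsafe = set(unsafe)            # states assumed known to be unsafe (illustrative)
        self.counts = defaultdict(int)       # (s, a) visit counts
        self.unsafe_hits = defaultdict(int)  # (s, a) transitions observed to reach an unsafe state
        self.min_visits = min_visits
        self.safety_threshold = safety_threshold

    def safe_actions(self, s):
        """Return the actions currently allowed in state s."""
        allowed = []
        for a in self.actions:
            n = self.counts[(s, a)]
            if n < self.min_visits:
                # Little data yet: allow the action only if it has never been
                # observed to lead to an unsafe state.
                if self.unsafe_hits[(s, a)] == 0:
                    allowed.append(a)
            else:
                p_safe = 1.0 - self.unsafe_hits[(s, a)] / n
                if p_safe >= self.safety_threshold:
                    allowed.append(a)
        return allowed or list(self.actions)  # never leave the agent with no action

    def record(self, s, a, s_next):
        """Update the statistics after observing a transition."""
        self.counts[(s, a)] += 1
        if s_next in self.unsafe:
            self.unsafe_hits[(s, a)] += 1

    def explore(self, s):
        return random.choice(self.safe_actions(s))
```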
DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning
This paper proposes DeepSynth, a method for effective training of deep
Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian,
but at the same time progress towards the reward requires achieving an unknown
sequence of high-level objectives. Our method employs a novel algorithm for
synthesis of compact automata to uncover this sequential structure
automatically. We synthesise a human-interpretable automaton from trace data
collected by exploring the environment. The state space of the environment is
then enriched with the synthesised automaton so that the generation of a
control policy by deep RL is guided by the discovered structure encoded in the
automaton. The proposed approach is able to cope with both high-dimensional,
low-level features and unknown sparse non-Markovian rewards. We have evaluated
DeepSynth's performance in a set of experiments that includes the Atari game
Montezuma's Revenge. Compared to existing approaches, we obtain a reduction of
two orders of magnitude in the number of iterations required for policy
synthesis, and also a significant improvement in scalability.
Comment: Extended version of AAAI 2021 paper
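One way to read "the state space of the environment is then enriched with the synthesised automaton" is as a product-style observation wrapper; the gym-style sketch below is an illustrative assumption, not DeepSynth's implementation. The automaton transition table, the number of automaton states, and the detect_event function mapping raw observations to high-level events are all placeholders.

```python
import numpy as np

class AutomatonWrapper:
    """Appends a one-hot encoding of the current automaton state to every
    observation, so a deep RL agent sees the discovered task structure."""

    def __init__(self, env, transitions, n_states, detect_event):
        self.env = env
        self.transitions = transitions      # dict: (q, event) -> q'
        self.n_states = n_states
        self.detect_event = detect_event    # raw observation -> high-level event or None
        self.q = 0                          # current automaton state

    def _augment(self, obs):
        one_hot = np.zeros(self.n_states, dtype=np.float32)
        one_hot[self.q] = 1.0
        return np.concatenate([np.asarray(obs, dtype=np.float32).ravel(), one_hot])

    def reset(self):
        self.q = 0
        return self._augment(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        event = self.detect_event(obs)
        if event is not None:
            # Advance the synthesised automaton on the detected high-level event.
            self.q = self.transitions.get((self.q, event), self.q)
        return self._augment(obs), reward, done, info
```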
Multi-agent Learning in Coverage Control Games
Multi-agent systems have found a variety of industrial applications, from economics to robotics. With the increasing complexity of multi-agent systems, multi-agent control has become a challenging problem in many areas. While studying multi-agent systems is not identical to studying game theory, there is no doubt that game theory can be a key tool for managing such complex systems. Game-theoretic multi-agent learning is one of the relatively new solutions to the complex problem of multi-agent control. In such a learning scheme, each agent eventually discovers a solution on its own. The main focus of this thesis is on the enhancement of game-theoretic multi-agent learning and its application to multi-robot control. Each algorithm proposed in this thesis relaxes and imposes different assumptions to fit a class of multi-robot learning problems. Numerical experiments are also conducted to verify each algorithm's robustness and performance.
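As a toy illustration of what game-theoretic multi-agent learning can look like in a coverage setting (not an algorithm from the thesis), the sketch below lets each robot repeatedly best-respond to the others' positions, using its marginal contribution to the covered area as its utility. The grid, sensing radius, and update schedule are arbitrary choices for the example.

```python
import itertools
import random

def covered(positions, radius=1):
    """Set of grid cells covered by the given robot positions."""
    cells = set()
    for (x, y) in positions:
        for dx, dy in itertools.product(range(-radius, radius + 1), repeat=2):
            cells.add((x + dx, y + dy))
    return cells

def marginal_utility(i, positions, radius=1):
    """Agent i's marginal contribution to coverage (a standard potential-game utility)."""
    others = positions[:i] + positions[i + 1:]
    return len(covered(positions, radius)) - len(covered(others, radius))

def best_response_dynamics(grid, n_agents=3, rounds=50, radius=1):
    """One randomly chosen robot revises its position per round by best response."""
    positions = [random.choice(grid) for _ in range(n_agents)]
    for _ in range(rounds):
        i = random.randrange(n_agents)
        positions[i] = max(grid, key=lambda p: marginal_utility(
            i, positions[:i] + [p] + positions[i + 1:], radius))
    return positions

grid = [(x, y) for x in range(10) for y in range(10)]
print(best_response_dynamics(grid))
```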
Towards verifiable and safe model-free reinforcement learning
Reinforcement Learning (RL) is a widely employed machine learning paradigm
that has been applied to a variety of decision-making problems, from resource
management to robot locomotion, from recommendation systems to systems biology,
and from traffic control to superhuman-level gaming. However, RL has experienced
limited success beyond rigidly controlled or constrained applications, and successful
employment of RL in safety-critical scenarios is yet to be achieved. A principal
reason for this limitation is the lack of formal approaches to specify requirements
as tasks and learning constraints, and to provide guarantees with respect to these
requirements and constraints, during and after learning. This line of work addresses
these issues by proposing a general framework that leverages the success of RL in
learning high-performance controllers, while guaranteeing the satisfaction of given
requirements and guiding the learning process within safe configurations.
Shielding Atari games with bounded prescience
Deep reinforcement learning (DRL) is applied in safety-critical domains such
as robotics and autonomous driving. It achieves superhuman abilities in many
tasks; however, whether DRL agents can be shown to act safely is an open
problem. Atari games are a simple yet challenging exemplar for evaluating the
safety of DRL agents and feature a diverse portfolio of game mechanics. The
safety of neural agents has been studied before using methods that either
require a model of the system dynamics or an abstraction; unfortunately, these
are unsuitable for Atari games because their low-level dynamics are complex and
hidden inside their emulator. We present the first exact method for analysing
and ensuring the safety of DRL agents for Atari games. Our method only requires
access to the emulator. First, we give a set of 43 properties that characterise
"safe behaviour" for 30 games. Second, we develop a method for exploring all
traces induced by an agent and a game, considering a variety of sources of game
non-determinism. We observe that the best available DRL agents reliably satisfy
only very few properties; several critical properties are violated by all
agents. Finally, we propose a countermeasure that combines bounded
explicit-state exploration with shielding. We demonstrate that our method
improves the safety of all agents over multiple properties.
Comment: To appear at AAMAS 202
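A possible shape for such a shield (a sketch under assumed emulator hooks, not the paper's tool) is given below: before an action is executed, action sequences up to a bounded horizon are explored from a saved emulator snapshot, and the agent's choice is overridden whenever it makes a property violation unavoidable within that horizon. The emu.save/emu.restore/emu.step interface, the is_unsafe predicate, and the horizon are assumptions for illustration.

```python
def violation_unavoidable(emu, snapshot, first_action, actions, is_unsafe, horizon):
    """True iff, after taking first_action from snapshot, every continuation of
    length < horizon reaches an observation violating the safety property."""
    emu.restore(snapshot)
    obs = emu.step(first_action)            # assumed: advances the emulator, returns the observation
    if is_unsafe(obs):
        return True
    if horizon == 1:
        return False
    next_snapshot = emu.save()
    return all(
        violation_unavoidable(emu, next_snapshot, a, actions, is_unsafe, horizon - 1)
        for a in actions
    )

def shielded_action(emu, agent, obs, actions, is_unsafe, horizon=5):
    """Return the agent's preferred action unless it makes a violation unavoidable,
    in which case fall back to the first action that keeps a safe continuation open."""
    snapshot = emu.save()
    preferred = agent.act(obs)
    ranked = [preferred] + [a for a in actions if a != preferred]
    for a in ranked:
        if not violation_unavoidable(emu, snapshot, a, actions, is_unsafe, horizon):
            emu.restore(snapshot)
            return a
    emu.restore(snapshot)
    return preferred                         # no action is provably safe within the horizon
```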