Discovering logical knowledge in non-symbolic domains
Deep learning and symbolic artificial intelligence remain the two main paradigms in Artificial Intelligence (AI), each with its own strengths and weaknesses. Artificial agents should integrate both in order to show general intelligence and solve complex problems in real-world scenarios, much as humans use both the analytical left side and the intuitive right side of their brain. However, one of the main obstacles hindering this integration is the Symbol Grounding Problem [144], which is the problem of mapping physical-world observations to a set of symbols. In this thesis, we combine symbolic reasoning and deep learning in order to better represent and reason with abstract knowledge. In particular, we focus on solving non-symbolic-state Reinforcement Learning (RL) environments using a symbolic logical domain. We consider three configurations: (i) both the symbol grounding function and the symbolic logical domain are unknown, (ii) the symbol grounding function is unknown and the domain is known a priori, (iii) the symbol grounding function is imperfectly known and the domain is unknown. We develop algorithms and neural network architectures that are general enough to be applied to different kinds of environments, and we test them on both continuous-state control problems and image-based environments. Specifically, we develop two kinds of architectures: one for Markovian RL tasks and one for non-Markovian RL domains. The first is based on model-based RL and representation learning, and is inspired by the substantial prior work on state abstraction for RL [115]. The second is mainly based on recurrent neural networks and continuous relaxations of temporal logic domains. In particular, the first approach extracts a symbolic STRIPS-like abstraction for control problems.
For the second approach, we explore connections between recurrent neural networks and finite state machines, and we define Visual Reward Machines, an extension to non-symbolic domains of Reward Machines [27], which are a popular approach to non-Markovian RL tasks.
Robust satisfaction of temporal logic specifications via reinforcement learning
We consider the problem of steering a system with unknown, stochastic dynamics to satisfy a rich, temporally layered task given as a signal temporal logic formula. We represent the system as a finite-memory Markov decision process with unknown transition probabilities and whose states are built from a partition of the state space. We present provably convergent reinforcement learning algorithms to maximize the probability of satisfying a given specification and to maximize the average expected robustness, i.e., a measure of how strongly the formula is satisfied. Robustness allows us to quantify progress towards satisfying a given specification. We demonstrate via a pair of robot navigation simulation case studies that, because it quantifies progress towards satisfaction, reinforcement learning with robustness maximization performs better than probability maximization in terms of both probability of satisfaction and expected robustness when only a small number of training examples is available.
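To make "how strongly the formula is satisfied" concrete, the following is a minimal sketch of the standard quantitative semantics for two simple signal temporal logic operators over a finite, scalar trace. It is not the algorithm from the abstract; the function names, the trace and the thresholds are all illustrative assumptions.

```python
from typing import Sequence


def rob_always_greater(signal: Sequence[float], threshold: float) -> float:
    """Robustness of 'always (x > threshold)': the worst margin along the trace."""
    return min(x - threshold for x in signal)


def rob_eventually_greater(signal: Sequence[float], threshold: float) -> float:
    """Robustness of 'eventually (x > threshold)': the best margin along the trace."""
    return max(x - threshold for x in signal)


# Hypothetical trace of a single signal (e.g. distance to an obstacle).
trace = [0.5, 1.0, 2.5, 1.5]

always_margin = rob_always_greater(trace, 0.0)       # positive: satisfied with margin
reach_margin = rob_eventually_greater(trace, 2.0)    # positive: satisfied with margin
missed_margin = rob_eventually_greater(trace, 3.0)   # negative: violated
```

A positive value means the trace satisfies the formula and a negative value means it violates it; the magnitude indicates by how much, which is what makes robustness usable as a graded learning signal rather than a binary one.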
Robust Satisfaction of Temporal Logic Specifications via Reinforcement Learning
We consider the problem of steering a system with unknown, stochastic
dynamics to satisfy a rich, temporally layered task given as a signal temporal
logic formula. We represent the system as a Markov decision process in which
the states are built from a partition of the state space and the transition
probabilities are unknown. We present provably convergent reinforcement
learning algorithms to maximize the probability of satisfying a given formula
and to maximize the average expected robustness, i.e., a measure of how
strongly the formula is satisfied. We demonstrate via a pair of robot
navigation simulation case studies that reinforcement learning with robustness
maximization performs better than probability maximization in terms of both
probability of satisfaction and expected robustness. Comment: 8 pages, 4 figures
LTL Control in Uncertain Environments with Probabilistic Satisfaction Guarantees
We present a method to generate a robot control strategy that maximizes the
probability of accomplishing a task. The task is given as a Linear Temporal Logic
(LTL) formula over a set of properties that can be satisfied at the regions of
a partitioned environment. We assume that the probabilities with which the
properties are satisfied at the regions are known, and the robot can determine
the truth value of a proposition only at the current region. Motivated by
several results on partition-based abstractions, we assume that the motion is
performed on a graph. To account for noisy sensors and actuators, we assume
that a control action enables several transitions with known probabilities. We
show that this problem can be reduced to the problem of generating a control
policy for a Markov Decision Process (MDP) such that the probability of
satisfying an LTL formula over its states is maximized. We provide a complete
solution for the latter problem that builds on existing results from
probabilistic model checking. We include an illustrative case study. Comment: Technical Report accompanying IFAC 201
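As a rough illustration of the MDP side of such a reduction, the sketch below runs value iteration for the maximum probability of reaching a goal set. It assumes the goal set is already given; in an LTL setting it would typically be derived from a product of the MDP with an automaton for the formula, a step omitted here. The state names, actions and probabilities are hypothetical and not taken from the report.

```python
from typing import Dict, List, Set, Tuple

# mdp[state][action] is a list of (probability, next_state) pairs.
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def max_reach_probability(mdp: MDP, goal: Set[str], sweeps: int = 100) -> Dict[str, float]:
    """Approximate the maximum probability of eventually reaching `goal`."""
    value = {s: (1.0 if s in goal else 0.0) for s in mdp}
    for _ in range(sweeps):
        for s in mdp:
            if s in goal or not mdp[s]:
                continue  # goal states and absorbing states keep their value
            value[s] = max(
                sum(p * value[t] for p, t in outcomes)
                for outcomes in mdp[s].values()
            )
    return value


# Hypothetical 4-state example with noisy actions.
example: MDP = {
    "start": {"go": [(0.5, "mid"), (0.5, "start")]},
    "mid": {
        "a": [(0.75, "goal"), (0.25, "trap")],
        "b": [(0.5, "goal"), (0.5, "trap")],
    },
    "goal": {},
    "trap": {},
}

values = max_reach_probability(example, {"goal"})
```

In this example the best choice in `mid` is action `a`, so the value of both `mid` and `start` approaches 0.75. A maximizing policy can be read off by picking, in each state, an action that attains the maximum.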
Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines
Natural and formal languages provide an effective mechanism for humans to
specify instructions and reward functions. We investigate how to generate
policies via RL when reward functions are specified in a symbolic language
captured by Reward Machines, an increasingly popular automaton-inspired
structure. We are interested in the case where the mapping of environment state
to a symbolic (here, Reward Machine) vocabulary -- commonly known as the
labelling function -- is uncertain from the perspective of the agent. We
formulate the problem of policy learning in Reward Machines with noisy symbolic
abstractions as a special class of POMDP optimization problem, and investigate
several methods to address the problem, building on existing and new
techniques, the latter focused on predicting Reward Machine state, rather than
on grounding of individual symbols. We analyze these methods and evaluate them
experimentally under varying degrees of uncertainty in the correct
interpretation of the symbolic vocabulary. We verify the strength of our
approach and the limitation of existing methods via an empirical investigation
on both illustrative, toy domains and partially observable, deep RL domains. Comment: NeurIPS Deep Reinforcement Learning Workshop 202
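For readers unfamiliar with the structure, a Reward Machine can be sketched as a finite-state machine that reads symbols and emits rewards. The class below is a minimal illustration, assuming the symbols arrive exactly, which is precisely the assumption the abstract relaxes. The class name, the task and the symbol names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RewardMachine:
    initial_state: int
    # (machine state, observed symbol) -> (next machine state, reward)
    transitions: Dict[Tuple[int, str], Tuple[int, float]]
    state: int = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial_state

    def step(self, symbol: str) -> float:
        """Advance on one symbol; unlisted pairs self-loop with zero reward."""
        next_state, reward = self.transitions.get(
            (self.state, symbol), (self.state, 0.0)
        )
        self.state = next_state
        return reward


# Hypothetical task: visit "coffee" first, then "office", to earn reward 1.
rm = RewardMachine(
    initial_state=0,
    transitions={(0, "coffee"): (1, 0.0), (1, "office"): (2, 1.0)},
)
rewards = [rm.step(s) for s in ["office", "coffee", "none", "office"]]
```

Visiting the office before getting coffee earns nothing, which is the non-Markovian aspect: the reward depends on the machine state, not only on the current symbol. With a noisy labelling function, the symbol passed to `step` may be wrong, so the agent can no longer track `rm.state` exactly.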
Formal Controller Synthesis for Markov Jump Linear Systems with Uncertain Dynamics
Automated synthesis of provably correct controllers for cyber-physical
systems is crucial for deployment in safety-critical scenarios. However, hybrid
features and stochastic or unknown behaviours make this problem challenging. We
propose a method for synthesising controllers for Markov jump linear systems
(MJLSs), a class of discrete-time models for cyber-physical systems, so that
they certifiably satisfy probabilistic computation tree logic (PCTL) formulae.
An MJLS consists of a finite set of stochastic linear dynamics and discrete
jumps between these dynamics that are governed by a Markov decision process
(MDP). We consider the cases where the transition probabilities of this MDP are
either known up to an interval or completely unknown. Our approach is based on
a finite-state abstraction that captures both the discrete (mode-jumping) and
continuous (stochastic linear) behaviour of the MJLS. We formalise this
abstraction as an interval MDP (iMDP) for which we compute intervals of
transition probabilities using sampling techniques from the so-called 'scenario
approach', resulting in a probabilistically sound approximation. We apply our
method to multiple realistic benchmark problems, in particular, a temperature
control and an aerial vehicle delivery problem. Comment: 14 pages, 6 figures, under review at QES
Process mining meets model learning: Discovering deterministic finite state automata from event logs for business process analysis
Within the process mining field, Deterministic Finite State Automata (DFAs) are widely employed as foundational mechanisms for formal reasoning over the information contained in event logs, in tasks such as conformance checking, compliance monitoring and cross-organization process analysis. To support these use cases, in this paper we investigate how to leverage Model Learning (ML) algorithms for the automated discovery of DFAs from event logs. DFAs can be used as a fundamental building block to support not only the development of process analysis techniques, but also the implementation of instruments that support other phases of the Business Process Management (BPM) lifecycle, such as business process design and enactment. The quality of the discovered DFAs is assessed with respect to customized definitions of fitness, precision and generalization, and a standard notion of DFA simplicity. Finally, we use these metrics to benchmark ML algorithms on real-life and synthetically generated datasets, with the aim of studying their performance and investigating their suitability for the development of BPM tools.
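As a simple point of reference, the sketch below builds a prefix tree acceptor from a small event log, i.e. a DFA that accepts exactly the observed traces. This is not one of the Model Learning algorithms benchmarked in the paper; it is only the naive baseline that passive learners usually start from and then generalize. The activity names and the log are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Transitions = Dict[Tuple[int, str], int]


def build_prefix_tree(log: Iterable[List[str]]) -> Tuple[Transitions, Set[int]]:
    """Build a DFA (rooted at state 0) that accepts exactly the traces in `log`."""
    transitions: Transitions = {}
    accepting: Set[int] = set()
    next_id = 1  # state 0 is the root
    for trace in log:
        state = 0
        for activity in trace:
            key = (state, activity)
            if key not in transitions:
                transitions[key] = next_id
                next_id += 1
            state = transitions[key]
        accepting.add(state)  # the state reached by a full trace is accepting
    return transitions, accepting


def accepts(transitions: Transitions, accepting: Set[int], trace: List[str]) -> bool:
    """Replay a trace on the DFA; a missing transition means rejection."""
    state = 0
    for activity in trace:
        key = (state, activity)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


# Hypothetical event log with three traces.
log = [
    ["register", "check", "approve"],
    ["register", "check", "reject"],
    ["register", "approve"],
]
transitions, accepting = build_prefix_tree(log)
```

Such an automaton replays every logged trace but does not generalize to unseen behaviour, which is why quality is judged on several criteria together (fitness, precision, generalization, simplicity) rather than on replay alone.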