239 research outputs found
A robust genetic algorithm for learning temporal specifications from data
We consider the problem of mining signal temporal logical requirements from a dataset of regular (good) and anomalous (bad) trajectories of a dynamical system. We assume the training set to be labeled by human experts and that we have access only to a limited amount of data, typically noisy. We provide a systematic approach to synthesize both the syntactical structure and the parameters of the temporal logic formula using a two-steps procedure: first, we leverage a novel evolutionary algorithm for learning the structure of the formula; second, we perform the parameter synthesis operating on the statistical emulation of the average robustness for a candidate formula w.r.t. its parameters. We compare our results with our previous work [9] and with a recently proposed decision-tree [8] based method. We present experimental results on two case studies: an anomalous trajectory detection problem of a naval surveillance system and the characterization of an Ineffective Respiratory effort, showing the usefulness of our work
Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples
Despite the fact that deep reinforcement learning (RL) has surpassed
human-level performances in various tasks, it still has several fundamental
challenges. First, most RL methods require intensive data from the exploration
of the environment to achieve satisfactory performance. Second, the use of
neural networks in RL renders it hard to interpret the internals of the system
in a way that humans can understand. To address these two challenges, we
propose a framework that enables an RL agent to reason over its exploration
process and distill high-level knowledge for effectively guiding its future
explorations. Specifically, we propose a novel RL algorithm that learns
high-level knowledge in the form of a finite reward automaton by using the L*
learning algorithm. We prove that in episodic RL, a finite reward automaton can
express any non-Markovian bounded reward functions with finitely many reward
values and approximate any non-Markovian bounded reward function (with
infinitely many reward values) with arbitrary precision. We also provide a
lower bound for the episode length such that the proposed RL approach almost
surely converges to an optimal policy in the limit. We test this approach on
two RL environments with non-Markovian reward functions, choosing a variety of
tasks with increasing complexity for each environment. We compare our algorithm
with the state-of-the-art RL algorithms for non-Markovian reward functions,
such as Joint Inference of Reward machines and Policies for RL (JIRP), Learning
Reward Machine (LRM), and Proximal Policy Optimization (PPO2). Our results show
that our algorithm converges to an optimal policy faster than other baseline
methods
- …