118 research outputs found

    A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs

    Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes. Unlike prior approaches, our algorithm learns from sampled trajectories of the system and does not require prior knowledge of the system's topology.
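
    As a rough illustration of the model-based ingredient above (learning the MDP from sampled trajectories), the following sketch estimates empirical transition probabilities from trajectory data. The function name and trajectory format are hypothetical, and the confidence bounds and automaton product that a full PAC algorithm needs are omitted.

```python
from collections import defaultdict

def estimate_transitions(trajectories):
    """Empirically estimate P(s' | s, a) from sampled trajectories.

    Each trajectory is a sequence of (state, action, next_state) triples.
    Counting visits like this is the standard model-based building block;
    a PAC algorithm would additionally track confidence intervals and
    compose the learned model with an omega-regular automaton.
    """
    counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> {s': count}
    totals = defaultdict(int)                        # (s, a) -> total visits
    for trajectory in trajectories:
        for s, a, s_next in trajectory:
            counts[(s, a)][s_next] += 1
            totals[(s, a)] += 1
    return {sa: {s2: c / totals[sa] for s2, c in succ.items()}
            for sa, succ in counts.items()}

# Two short trajectories over states {0, 1} and a single action "a".
trajs = [[(0, "a", 1), (1, "a", 0)], [(0, "a", 0), (0, "a", 1)]]
print(estimate_transitions(trajs))  # {(0, 'a'): {1: ~0.67, 0: ~0.33}, (1, 'a'): {0: 1.0}}
```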

    Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning

    A novel reinforcement learning scheme to synthesize policies for continuous-space Markov decision processes (MDPs) is proposed. This scheme enables one to apply model-free, off-the-shelf reinforcement learning algorithms for finite MDPs to compute optimal strategies for the corresponding continuous-space MDPs without explicitly constructing the finite-state abstraction. The proposed approach is based on abstracting the system with a finite MDP (without constructing it explicitly) with unknown transition probabilities, synthesizing strategies over the abstract MDP, and then mapping the results back over the concrete continuous-space MDP with approximate optimality guarantees. The properties of interest for the system belong to a fragment of linear temporal logic known as syntactically co-safe linear temporal logic (scLTL), and the synthesis requirement is to maximize the probability of satisfaction within a given bounded time horizon. A key contribution of the paper is to leverage the classical convergence results for reinforcement learning on finite MDPs and provide control strategies that maximize the probability of satisfaction over unknown, continuous-space MDPs while providing probabilistic closeness guarantees. Automata-based reward functions are often sparse; we present a novel potential-based reward shaping technique to produce dense rewards and speed up learning. The effectiveness of the proposed approach is demonstrated by applying it to three physical benchmarks: regulation of a room's temperature, control of a road traffic cell, and control of a 7-dimensional nonlinear model of a BMW 320i car. Comment: This work is accepted at the 11th ACM/IEEE Conference on Cyber-Physical Systems (ICCPS).
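
    Potential-based shaping classically takes the form F(s, s') = gamma * Phi(s') - Phi(s), which densifies a sparse automaton-based reward without changing which policies are optimal. The sketch below only illustrates that general recipe with hypothetical names and values; the paper's concrete shaping technique is not reproduced.

```python
def shaped_reward(base_reward, phi_s, phi_s_next, gamma=0.99):
    """Classical potential-based shaping: F(s, s') = gamma * Phi(s') - Phi(s).

    Adding F to the sparse automaton-based reward produces a denser signal
    while preserving the set of optimal policies (Ng, Harada & Russell, 1999).
    Phi is any heuristic potential, e.g. progress toward an accepting state
    of the scLTL automaton; the concrete potential used in the paper is not
    reproduced here.
    """
    return base_reward + gamma * phi_s_next - phi_s

# Hypothetical step: the sparse reward is 0, but the agent moved one automaton
# state closer to acceptance (potential 1.0 -> 2.0), so the shaped reward is positive.
print(shaped_reward(0.0, phi_s=1.0, phi_s_next=2.0))  # 0.98
```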

    Policy Synthesis and Reinforcement Learning for Discounted LTL

    The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition probabilities, and show how to reduce discounted LTL to discounted-sum reward via a reward machine when all discount factors are identical.
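
    As a minimal illustration of the reduction mentioned above, the sketch below encodes a toy reward machine and evaluates an ordinary discounted-sum return along a label sequence, i.e. the case where all discount factors are identical. The machine, labels, and numbers are illustrative and not taken from the paper.

```python
class RewardMachine:
    """Minimal reward machine: finite states, label-driven transitions, and a
    reward attached to each transition. With a single discount factor the
    return is an ordinary discounted sum, so standard RL machinery applies.
    Illustrative sketch only, not the paper's construction."""

    def __init__(self, delta, rewards, initial):
        self.delta = delta        # (rm_state, label) -> next rm_state
        self.rewards = rewards    # (rm_state, label) -> reward
        self.state = initial

    def step(self, label):
        r = self.rewards.get((self.state, label), 0.0)
        self.state = self.delta.get((self.state, label), self.state)
        return r

def discounted_return(rm, labels, gamma=0.9):
    """Discounted-sum return of a label sequence under the reward machine."""
    return sum((gamma ** t) * rm.step(label) for t, label in enumerate(labels))

# Toy machine: reward 1 the first time the label "goal" is seen, 0 afterwards.
rm = RewardMachine(delta={("u0", "goal"): "u1"},
                   rewards={("u0", "goal"): 1.0},
                   initial="u0")
print(discounted_return(rm, ["step", "step", "goal", "goal"]))  # gamma**2, i.e. ~0.81
```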

    Omega-Regular Reward Machines

    Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but designing an appropriate reward mechanism is critical to its success. However, in many cases the complexity of the learning objectives goes beyond the capabilities of the Markovian assumption, necessitating a more sophisticated reward mechanism. Reward machines and omega-regular languages are two formalisms used to express non-Markovian rewards for quantitative and qualitative objectives, respectively. This paper introduces omega-regular reward machines, which integrate reward machines with omega-regular languages to enable an expressive and effective reward mechanism for RL. We present a model-free RL algorithm to compute epsilon-optimal strategies against omega-regular reward machines and evaluate the effectiveness of the proposed algorithm through experiments. Comment: To appear in ECAI-202
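
    A common way to learn against a machine-defined, non-Markovian reward is to run a standard algorithm on the product of the environment state and the machine state; the sketch below shows tabular Q-learning over such product states. All callables and parameters are hypothetical placeholders, and the paper's epsilon-optimality analysis is not reflected here.

```python
from collections import defaultdict
import random

def q_learning_on_product(env_reset, env_step, rm_reset, rm_step, actions,
                          episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning over the product state (env_state, machine_state).

    Tracking the machine state alongside the environment state restores the
    Markov property for the otherwise non-Markovian, automaton-defined reward.
    All callables (env_reset, env_step, rm_reset, rm_step) are hypothetical
    placeholders for a concrete environment and reward machine.
    """
    Q = defaultdict(float)  # (env_state, machine_state, action) -> value
    for _ in range(episodes):
        s, u = env_reset(), rm_reset()
        done = False
        while not done:
            # Epsilon-greedy action selection over the product state.
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, u, act)])
            s2, label, done = env_step(s, a)   # environment transition + observed label
            u2, r = rm_step(u, label)          # machine transition + machine-defined reward
            best_next = 0.0 if done else max(Q[(s2, u2, act)] for act in actions)
            Q[(s, u, a)] += alpha * (r + gamma * best_next - Q[(s, u, a)])
            s, u = s2, u2
    return Q
```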

    Good-for-MDPs Automata for Probabilistic Analysis and Reinforcement Learning

    We characterize the class of nondeterministic ω-automata that can be used for the analysis of finite Markov decision processes (MDPs). We call these automata 'good-for-MDPs' (GFM). We show that GFM automata are closed under classic simulation as well as under more powerful simulation relations that leverage properties of optimal control strategies for MDPs. This closure enables us to exploit state-space reduction techniques, such as those based on direct and delayed simulation, that guarantee simulation equivalence. We demonstrate the promise of GFM automata by defining a new class of automata with favorable properties - they are Büchi automata with low branching degree obtained through a simple construction - and show that going beyond limit-deterministic automata may significantly benefit reinforcement learning.
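
    For orientation, the sketch below encodes a toy nondeterministic Büchi automaton and computes its branching degree (the maximum number of successors per state-letter pair), the quantity the abstract's "low branching degree" refers to. The representation is illustrative only; the GFM property itself is a semantic condition on how an automaton composes with MDPs and is not checked here.

```python
class BuchiAutomaton:
    """Minimal nondeterministic Buchi automaton: a transition relation mapping
    (state, letter) to a set of successors, a set of accepting states, and an
    initial state. Purely illustrative; no GFM check is performed."""

    def __init__(self, transitions, accepting, initial):
        self.transitions = transitions   # (state, letter) -> set of successor states
        self.accepting = set(accepting)
        self.initial = initial

    def branching_degree(self):
        """Maximum number of nondeterministic successors over all (state, letter) pairs."""
        return max((len(succ) for succ in self.transitions.values()), default=0)

# Toy automaton: from q0 on letter "a" it may stay in q0 or move to the
# accepting state q1; accepting runs must visit q1 infinitely often.
nba = BuchiAutomaton(
    transitions={("q0", "a"): {"q0", "q1"},
                 ("q0", "b"): {"q0"},
                 ("q1", "a"): {"q0"}},
    accepting={"q1"},
    initial="q0",
)
print(nba.branching_degree())  # 2
```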

    Runs of homozygosity in the Italian goat breeds: impact of management practices in low‑input systems

    Background: Climate and farming systems, several of which are considered as low-input agricultural systems, vary between goat populations from Northern and Southern Italy and have led to different management practices. These processes have shaped the genome in terms of inbreeding and regions under selection and resulted in differences between the northern and southern populations. Both inbreeding and signatures of selection can be pinpointed by the analysis of runs of homozygosity (ROH), which provides useful information to assist the management of this species in different rural areas.
    Results: We analyzed the ROH distribution and inbreeding (F_ROH) in 902 goats from the Italian Goat Consortium2 dataset. We evaluated the differences in individual ROH number and length between goat breeds from Northern (NRD) and Central-southern (CSD) Italy. Then, we identified the signatures of selection that differentiate these two groups using three methods: ROH, ΔROH, and averaged F_ST. ROH analyses showed that some Italian goat breeds have a lower inbreeding coefficient, which is attributable to their management and history. ROH are longer in breeds that are undergoing non-optimal management or have a small population size. In several small breeds, the ROH length classes are balanced, reflecting more accurate mating planning. The differences in climate and management between the NRD and CSD groups have resulted in different ROH lengths and numbers: the NRD populations, bred in isolated valleys, present more and shorter ROH segments, while the CSD populations have fewer and longer ROH, likely because they have undergone more admixture events during the horizontal transhumance practice followed by a more recent standardization. We identified four genes within signatures of selection on chromosome 11 related to fertility in the NRD group, and 23 genes on chromosomes 5 and 6 related to growth in the CSD group. Finally, we identified 17 genes on chromosome 12 related to environmental adaptation and body size with high homozygosity in both groups.
    Conclusions: These results show how different management practices have impacted the level of genomic inbreeding in two Italian goat groups and could be useful to assist management in a low-input system while safeguarding the diversity of small populations.
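
    For context, the genomic inbreeding coefficient named above is conventionally computed as the fraction of the autosomal genome covered by ROH segments, F_ROH = (total ROH length) / (autosomal genome length). The sketch below shows that standard calculation with made-up numbers; it is not the paper's pipeline, and the genome length used is only a placeholder.

```python
def f_roh(roh_segment_lengths_bp, autosome_length_bp):
    """Genomic inbreeding coefficient from runs of homozygosity:
    F_ROH = (total length of ROH segments) / (autosomal genome length).
    Lengths are in base pairs. Values used below are illustrative
    placeholders, not figures from the paper."""
    return sum(roh_segment_lengths_bp) / autosome_length_bp

# Three hypothetical ROH segments totalling 150 Mb against a placeholder
# 2.5 Gb autosomal length give F_ROH = 0.06.
print(f_roh([50_000_000, 60_000_000, 40_000_000], 2_500_000_000))  # 0.06
```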