A PAC Learning Algorithm for LTL and Omega-regular Objectives in MDPs
Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL
-- have seen recent use as a way to express non-Markovian objectives in
reinforcement learning. We introduce a model-based probably approximately
correct (PAC) learning algorithm for omega-regular objectives in Markov
decision processes. Unlike prior approaches, our algorithm learns from sampled
trajectories of the system and does not require prior knowledge of the system's
topology.
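The model-based idea sketched in the abstract can be illustrated with a toy estimate of a single transition probability from sampled trajectories, together with a standard Hoeffding-style bound on how many samples suffice for (epsilon, delta) accuracy. The bound, the seed, and the toy transition below are illustrative assumptions, not the paper's algorithm:

```python
import math
import random
from collections import Counter

def hoeffding_samples(epsilon, delta):
    # n >= ln(2/delta) / (2 * epsilon^2) samples suffice to estimate one
    # Bernoulli transition probability to within epsilon, w.p. 1 - delta.
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

random.seed(0)
true_p = 0.7                       # unknown P(s' | s, a), illustrative
n = hoeffding_samples(0.05, 0.05)  # 738 samples
counts = Counter(random.random() < true_p for _ in range(n))
p_hat = counts[True] / n           # empirical estimate from sampled transitions
print(n, p_hat)
```

The same sampling argument, applied to every state-action pair, is the usual route from trajectory data to a PAC-accurate model.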
Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning
A novel reinforcement learning scheme to synthesize policies for
continuous-space Markov decision processes (MDPs) is proposed. This scheme
enables one to apply model-free, off-the-shelf reinforcement learning
algorithms for finite MDPs to compute optimal strategies for the corresponding
continuous-space MDPs without explicitly constructing the finite-state
abstraction. The proposed approach is based on abstracting the system with a
finite MDP (without constructing it explicitly) with unknown transition
probabilities, synthesizing strategies over the abstract MDP, and then mapping
the results back over the concrete continuous-space MDP with approximate
optimality guarantees. The properties of interest for the system belong to a
fragment of linear temporal logic, known as syntactically co-safe linear
temporal logic (scLTL), and the synthesis requirement is to maximize the
probability of satisfaction within a given bounded time horizon. A key
contribution of the paper is to leverage the classical convergence results for
reinforcement learning on finite MDPs and provide control strategies maximizing
the probability of satisfaction over unknown, continuous-space MDPs while
providing probabilistic closeness guarantees. Automata-based reward functions
are often sparse; we present a novel potential-based reward shaping technique
to produce dense rewards to speed up learning. The effectiveness of the
proposed approach is demonstrated by applying it to three physical benchmarks
concerning the regulation of a room's temperature, control of a road traffic
cell, and of a 7-dimensional nonlinear model of a BMW 320i car.
Comment: This work is accepted at the 11th ACM/IEEE Conference on Cyber-Physical Systems (ICCPS).
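The reward-shaping idea mentioned in the abstract follows the classical potential-based scheme, in which the shaped reward r' = r + gamma * phi(s') - phi(s) densifies a sparse automaton reward without changing the optimal policies. A minimal sketch, with a hypothetical potential function over automaton states (the state encoding is an assumption for illustration):

```python
GAMMA = 0.99

def phi(state):
    # Hypothetical potential: fewer automaton transitions remaining to an
    # accepting state means a higher (less negative) potential.
    return -state["steps_to_accept"]

def shaped_reward(reward, state, next_state):
    # Classical potential-based shaping: r' = r + gamma*phi(s') - phi(s).
    return reward + GAMMA * phi(next_state) - phi(state)

s, s2 = {"steps_to_accept": 3}, {"steps_to_accept": 2}
print(shaped_reward(0.0, s, s2))  # positive signal even with zero base reward
```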
Policy Synthesis and Reinforcement Learning for Discounted LTL
The difficulty of manually specifying reward functions has led to an interest
in using linear temporal logic (LTL) to express objectives for reinforcement
learning (RL). However, LTL has the downside that it is sensitive to small
perturbations in the transition probabilities, which prevents probably
approximately correct (PAC) learning without additional assumptions. Time
discounting provides a way of removing this sensitivity, while retaining the
high expressivity of the logic. We study the use of discounted LTL for policy
synthesis in Markov decision processes with unknown transition probabilities,
and show how to reduce discounted LTL to discounted-sum reward via a reward
machine when all discount factors are identical.
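As a toy illustration of such a reduction: a reward machine is a finite automaton whose transitions emit scalar rewards, and with a single discount factor, running it alongside the MDP turns a temporal objective into an ordinary discounted-sum reward. The states, labels, and rewards below are illustrative, not taken from the paper:

```python
GAMMA = 0.9

class RewardMachine:
    def __init__(self, transitions, start):
        # transitions: (machine_state, label) -> (next_machine_state, reward)
        self.transitions = transitions
        self.state = start

    def step(self, label):
        self.state, reward = self.transitions[(self.state, label)]
        return reward

# "observe a, then b": reward 1 on completing the sequence
rm = RewardMachine(
    {("u0", "a"): ("u1", 0.0), ("u0", "b"): ("u0", 0.0),
     ("u1", "a"): ("u1", 0.0), ("u1", "b"): ("u2", 1.0),
     ("u2", "a"): ("u2", 0.0), ("u2", "b"): ("u2", 0.0)},
    start="u0",
)

labels = ["b", "a", "b"]  # a trajectory's atomic-proposition labels
ret = sum(GAMMA ** t * rm.step(lab) for t, lab in enumerate(labels))
print(round(ret, 2))  # 0.81: the sequence a-then-b completes at step 2
```

An off-the-shelf discounted RL algorithm can then be run on the product of the MDP state and the machine state.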
Omega-Regular Reward Machines
Reinforcement learning (RL) is a powerful approach for training agents to
perform tasks, but designing an appropriate reward mechanism is critical to its
success. However, in many cases, the complexity of the learning objectives goes
beyond the capabilities of the Markovian assumption, necessitating a more
sophisticated reward mechanism. Reward machines and omega-regular languages are
two formalisms used to express non-Markovian rewards for quantitative and
qualitative objectives, respectively. This paper introduces omega-regular
reward machines, which integrate reward machines with omega-regular languages
to enable an expressive and effective reward mechanism for RL. We present a
model-free RL algorithm to compute epsilon-optimal strategies against
omega-regular reward machines and evaluate the effectiveness of the proposed
algorithm through experiments.
Comment: To appear in ECAI-202
Good-for-MDPs Automata for Probabilistic Analysis and Reinforcement Learning
We characterize the class of nondeterministic omega-automata that can be
used for the analysis of finite Markov decision processes (MDPs). We call these
automata `good-for-MDPs' (GFM). We show that GFM automata are closed under
classic simulation as well as under more powerful simulation relations that
leverage properties of optimal control strategies for MDPs. This closure
enables us to exploit state-space reduction techniques, such as those based on
direct and delayed simulation, that guarantee simulation equivalence. We
demonstrate the promise of GFM automata by defining a new class of automata
with favorable properties - they are B\"uchi automata with low branching degree
obtained through a simple construction - and show that going beyond
limit-deterministic automata may significantly benefit reinforcement learning.
Runs of homozygosity in the Italian goat breeds: impact of management practices in low‑input systems
Background
Climate and farming systems, several of which are considered as low-input agricultural systems, vary between goat populations from Northern and Southern Italy and have led to different management practices. These processes have impacted genome shaping in terms of inbreeding and regions under selection and resulted in differences between the northern and southern populations. Both inbreeding and signatures of selection can be pinpointed by the analysis of runs of homozygosity (ROH), which provides useful information to assist the management of this species in different rural areas.
Results
We analyzed the ROH distribution and inbreeding (FROH) in 902 goats from the Italian Goat Consortium dataset. We evaluated the differences in individual ROH number and length between goat breeds from Northern (NRD) and Central-southern (CSD) Italy. Then, we identified the signatures of selection that differentiate these two groups using three methods: ROH, ΔROH, and averaged FST. ROH analyses showed that some Italian goat breeds have a lower inbreeding coefficient, which is attributable to their management and history. ROH are longer in breeds that are undergoing non-optimal management or with small population size. In several small breeds, the ROH length classes are balanced, reflecting more accurate mating planning. The differences in climate and management between the NRD and CSD groups have resulted in different ROH lengths and numbers: the NRD populations bred in isolated valleys present more and shorter ROH segments, while the CSD populations have fewer and longer ROH, which is likely due to the fact that they have undergone more admixture events during the horizontal transhumance practice followed by a more recent standardization. We identified four genes within signatures of selection on chromosome 11 related to fertility in the NRD group, and 23 genes on chromosomes 5 and 6 related to growth in the CSD group. Finally, we identified 17 genes on chromosome 12 related to environmental adaptation and body size with high homozygosity in both groups.
Conclusions
These results show how different management practices have impacted the level of genomic inbreeding in two Italian goat groups and could be useful to assist management in a low-input system while safeguarding the diversity of small populations.
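The ROH-based inbreeding coefficient used throughout this abstract is conventionally computed as F_ROH = (total length of an individual's ROH segments) / (autosomal genome length covered by the SNP panel). A minimal sketch of that arithmetic; the goat autosomal length and the segment lengths below are illustrative assumptions, not values from the study:

```python
AUTOSOME_LENGTH_MB = 2466.2  # assumed goat autosomal length in Mb, illustrative

def f_roh(roh_lengths_mb, genome_length_mb=AUTOSOME_LENGTH_MB):
    # F_ROH: fraction of the autosomal genome lying in runs of homozygosity
    return sum(roh_lengths_mb) / genome_length_mb

# one animal's detected ROH segments in Mb (illustrative values)
segments = [12.4, 3.1, 8.7, 25.0]
print(round(f_roh(segments), 4))  # about 2% of the genome in ROH
```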