
    Reliable Off-policy Evaluation for Reinforcement Learning

    In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy using logged trajectory data generated from a different behavior policy, without executing the target policy. Reinforcement learning in high-stakes environments, such as healthcare and education, is often limited to off-policy settings due to safety or ethical concerns, or the inability to explore. Hence it is imperative to quantify the uncertainty of the off-policy estimate before deployment of the target policy. In this paper, we propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged trajectories. Leveraging methodologies from distributionally robust optimization, we show that with proper selection of the size of the distributional uncertainty set, these estimates serve as confidence bounds with non-asymptotic and asymptotic guarantees under stochastic or adversarial environments. Our results are also generalized to batch reinforcement learning and are supported by empirical analysis. (Comment: 39 pages, 4 figures)
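    The abstract does not spell out the estimator, but the general recipe it points to (importance-weight each logged trajectory's return, then take a worst-case mean over a distributional neighborhood of the empirical distribution) can be sketched briefly. The Python sketch below assumes a KL-ball uncertainty set, which is one common choice rather than the paper's specific construction, and placeholder pi_target/pi_behavior probability functions:

```python
import numpy as np

def per_trajectory_returns(trajectories, pi_target, pi_behavior, gamma=0.99):
    """Importance-sampled return of each logged trajectory.

    `trajectories` is a list of [(state, action, reward), ...] lists;
    `pi_target(s, a)` and `pi_behavior(s, a)` return the probability of taking
    action a in state s under the target and behavior policies (assumed known)."""
    weighted = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_target(s, a) / pi_behavior(s, a)
            ret += (gamma ** t) * r
        weighted.append(weight * ret)
    return np.array(weighted)

def kl_robust_lower_bound(weighted_returns, radius, alphas=np.logspace(-3, 3, 200)):
    """Worst-case mean over a KL ball of size `radius` around the empirical
    distribution of the weighted returns, via the standard dual formula
        sup_{alpha > 0}  -alpha * log E[exp(-X / alpha)] - alpha * radius,
    evaluated here by a crude grid search over alpha."""
    best = -np.inf
    for a in alphas:
        val = -a * np.log(np.mean(np.exp(-weighted_returns / a))) - a * radius
        best = max(best, val)
    return best

# Usage (with your own logged data and policies):
# rets = per_trajectory_returns(logged_trajectories, pi_target, pi_behavior)
# print(rets.mean(), kl_robust_lower_bound(rets, radius=0.1))
```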

    Episodic Bayesian Optimal Control with Unknown Randomness Distributions

    Stochastic optimal control with unknown randomness distributions has been studied for a long time, encompassing robust control, distributionally robust control, and adaptive control. We propose a new episodic Bayesian approach that integrates Bayesian learning with optimal control. In each episode, the approach learns the randomness distribution through a Bayesian posterior and subsequently solves the corresponding Bayesian-average estimate of the true problem. The resulting policy is exercised during the episode, while additional data/observations of the randomness are collected to update the Bayesian posterior for the next episode. We show that the resulting episodic value functions and policies converge almost surely to their optimal counterparts of the true problem if the parametrized model of the randomness distribution is correctly specified. We further show that the asymptotic convergence rate of the episodic value functions is of the order $O(N^{-1/2})$. We develop an efficient computational method based on stochastic dual dynamic programming for a class of problems that have convex value functions. Our numerical results on a classical inventory control problem verify the theoretical convergence results and demonstrate the effectiveness of the proposed computational method.
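    The paper's computational method is based on stochastic dual dynamic programming; as a much smaller stand-in, the episodic learn-then-solve loop can be illustrated with a single-period newsvendor whose Poisson demand rate is unknown and given a conjugate Gamma prior. All names and parameters below (c_under, c_over, alpha0, beta0) are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from scipy.stats import nbinom

def run_episodes(true_rate=7.0, n_episodes=20, steps_per_episode=30,
                 c_under=4.0, c_over=1.0, alpha0=1.0, beta0=1.0, seed=0):
    """Episodic Bayesian control of a toy newsvendor with unknown Poisson demand.

    Each episode: (1) form the Gamma(alpha, beta) posterior over the demand rate,
    (2) order the critical-ratio quantile of the posterior-predictive demand
    (a negative binomial), (3) run the episode with that fixed order quantity,
    (4) fold the observed demands back into the posterior."""
    rng = np.random.default_rng(seed)
    alpha, beta = alpha0, beta0
    critical_ratio = c_under / (c_under + c_over)
    for ep in range(n_episodes):
        # Poisson demand with a Gamma(alpha, beta) posterior has a NegBin(alpha,
        # beta / (beta + 1)) posterior predictive; order its critical-ratio quantile.
        q = nbinom.ppf(critical_ratio, alpha, beta / (beta + 1.0))
        demands = rng.poisson(true_rate, size=steps_per_episode)
        cost = np.mean(c_under * np.maximum(demands - q, 0.0)
                       + c_over * np.maximum(q - demands, 0.0))
        # Conjugate update: Gamma(alpha + sum of demands, beta + number of observations).
        alpha += demands.sum()
        beta += steps_per_episode
        print(f"episode {ep:2d}: order {q:5.1f}, mean cost {cost:6.2f}, "
              f"posterior mean rate {alpha / beta:5.2f}")

run_episodes()
```

    As the posterior concentrates, the ordered quantity drifts toward the newsvendor solution under the true demand rate, mirroring (at toy scale) the almost-sure convergence of the episodic policies described above.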

    Probabilistic Guarantees for Safe Deep Reinforcement Learning

    Deep reinforcement learning has been successfully applied to many control tasks, but the application of such agents in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning agents in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller's execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on agents trained for several benchmark control problems.
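    MOSAIC itself constructs a formal abstraction and runs a probabilistic model checker over it; the sketch below takes the abstraction as given (a small Markov chain induced by the fixed trained policy over abstract states, here a hard-coded toy matrix) and only illustrates the finite-horizon reach-unsafe computation that such a check boils down to:

```python
import numpy as np

def bounded_unsafe_probability(P, unsafe, horizon):
    """Probability of reaching an unsafe abstract state within `horizon` steps of
    the row-stochastic chain P, by the standard bounded-reachability recursion
    (unsafe states are treated as absorbing)."""
    p = unsafe.astype(float)             # p_0(s) = 1 if s is unsafe, else 0
    for _ in range(horizon):
        p = np.maximum(unsafe, P @ p)    # p_{k+1}(s) = 1 if unsafe, else sum_s' P[s, s'] p_k(s')
    return p

# Toy 3-state abstraction: states 0 and 1 are safe, state 2 is unsafe and absorbing.
P = np.array([[0.90, 0.08, 0.02],
              [0.10, 0.85, 0.05],
              [0.00, 0.00, 1.00]])
unsafe = np.array([0, 0, 1])
print(bounded_unsafe_probability(P, unsafe, horizon=50))   # per-initial-state unsafe probability
```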

    From Infinite to Finite Programs: Explicit Error Bounds with Applications to Approximate Dynamic Programming

    We consider linear programming (LP) problems in infinite-dimensional spaces that are in general computationally intractable. Under suitable assumptions, we develop an approximation bridge from the infinite-dimensional LP to tractable finite convex programs in which the performance of the approximation is quantified explicitly. To this end, we adopt recent developments in two areas, randomized optimization and first-order methods, leading to a priori as well as a posteriori performance guarantees. We illustrate the generality and implications of our theoretical results in the special case of long-run average-cost and discounted-cost optimal control problems for Markov decision processes on Borel spaces. The applicability of the theoretical results is demonstrated through a constrained linear quadratic optimal control problem and a fisheries management problem. (Comment: 30 pages, 5 figures)
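    For the Markov-decision-process application, the underlying object is the exact LP over value functions; a toy illustration of the passage from that LP to a finite convex program, restricting the value function to a feature span and sampling constraints, is sketched below. The function and variable names (approximate_lp_value, phi, mu, w_bound) are illustrative, and nothing here reproduces the paper's Borel-space setting or its explicit error bounds:

```python
import numpy as np
from scipy.optimize import linprog

def approximate_lp_value(P, c, gamma, phi, mu, rng, n_sampled=40, w_bound=100.0):
    """Approximate-LP sketch for a discounted-cost finite MDP.

    The exact LP maximizes  sum_s mu(s) V(s)  subject to
        V(s) <= c(s, a) + gamma * sum_s' P[a, s, s'] V(s')   for every (s, a).
    Here V is restricted to the span of the feature columns of `phi`, and only a
    random subset of the (s, a) constraints is kept."""
    n_actions, n_states, _ = P.shape
    k = phi.shape[1]
    pairs = [(s, a) for s in range(n_states) for a in range(n_actions)]
    chosen = rng.choice(len(pairs), size=min(n_sampled, len(pairs)), replace=False)
    A, b = [], []
    for i in chosen:
        s, a = pairs[i]
        A.append(phi[s] - gamma * (P[a, s] @ phi))   # (phi(s) - gamma E[phi(s')])^T w <= c(s, a)
        b.append(c[s, a])
    res = linprog(-(phi.T @ mu), A_ub=np.array(A), b_ub=np.array(b),
                  bounds=[(-w_bound, w_bound)] * k, method="highs")
    return phi @ res.x                               # approximate value function on the states

# Toy 3-state, 2-action MDP with identity features.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))           # P[a, s, :] is a distribution over next states
c = rng.uniform(0.0, 1.0, size=(3, 2))               # cost c[s, a]
print(approximate_lp_value(P, c, 0.9, np.eye(3), mu=np.ones(3) / 3, rng=rng))
```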