11 research outputs found

    Decision-Making Under Uncertainty: Beyond Probabilities

    Full text link
    This position paper reflects on the state of the art in decision-making under uncertainty. A classical assumption is that probabilities can sufficiently capture all uncertainty in a system. In this paper, the focus is on the uncertainty that goes beyond this classical interpretation, particularly by employing a clear distinction between aleatoric and epistemic uncertainty. The paper features an overview of Markov decision processes (MDPs) and extensions to account for partial observability and adversarial behavior. These models sufficiently capture aleatoric uncertainty but fail to account for epistemic uncertainty robustly. Consequently, we present a thorough overview of so-called uncertainty models that capture uncertainty under a more robust interpretation. We show several solution techniques for both discrete and continuous models, ranging from formal verification via control-based abstractions to reinforcement learning. As an integral part of this paper, we list and discuss several key challenges that arise when dealing with rich types of uncertainty in a model-based fashion.
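
    As a minimal, hypothetical illustration of the uncertainty models discussed above (not code from the paper), the sketch below runs robust value iteration on a tiny interval MDP, where each transition probability is only known up to an interval and the Bellman backup takes the worst case over that set.

```python
# Illustrative sketch (not from the paper): robust value iteration on a small
# interval MDP, where each transition probability is only known to lie in an
# interval (epistemic uncertainty) rather than being a point estimate.

# Hypothetical 2-state, 2-action interval MDP: transitions[s][a][s'] = (lo, hi)
transitions = {
    0: {0: {0: (0.6, 0.8), 1: (0.2, 0.4)}, 1: {0: (0.1, 0.3), 1: (0.7, 0.9)}},
    1: {0: {0: (0.4, 0.6), 1: (0.4, 0.6)}, 1: {0: (0.0, 0.2), 1: (0.8, 1.0)}},
}
rewards = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.5, 1: 2.0}}
gamma = 0.9

def worst_case_expectation(intervals, values):
    """Adversarially pick a distribution inside the intervals that minimizes
    the expected value (inner problem of the robust Bellman backup)."""
    # Start every successor at its lower bound, then give the remaining
    # probability mass to the successors with the *lowest* value first.
    probs = {s: lo for s, (lo, hi) in intervals.items()}
    slack = 1.0 - sum(probs.values())
    for s in sorted(intervals, key=lambda s: values[s]):
        lo, hi = intervals[s]
        bump = min(hi - lo, slack)
        probs[s] += bump
        slack -= bump
    return sum(p * values[s] for s, p in probs.items())

def robust_value_iteration(n_iter=100):
    V = {s: 0.0 for s in transitions}
    for _ in range(n_iter):
        V = {
            s: max(
                rewards[s][a] + gamma * worst_case_expectation(transitions[s][a], V)
                for a in transitions[s]
            )
            for s in transitions
        }
    return V

print(robust_value_iteration())
```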

    Reinforcement Learning by Guided Safe Exploration

    Full text link
    Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster. Comment: Accepted at ECAI 202
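
    The sketch below is an assumed, simplified rendering of the transfer idea described above (names and decay schedule are illustrative, not the paper's implementation): the student's action distribution is blended with the safe guide's, and the guide's weight decays as training progresses.

```python
# Illustrative sketch (assumptions, not the paper's implementation): regularize
# a student policy towards a pre-trained safe guide and decay the guide's
# influence as training progresses.
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4

def guide_policy(obs):
    # Stand-in for the safely-exploring guide trained in the controlled
    # environment; here just a fixed "cautious" action distribution.
    return np.array([0.7, 0.1, 0.1, 0.1])

def student_policy(obs, params):
    # Hypothetical softmax student policy being trained on the target task.
    logits = params @ obs
    e = np.exp(logits - logits.max())
    return e / e.sum()

def behaviour_policy(obs, params, step, total_steps):
    """Blend student and guide; the guide's weight beta decays to zero,
    mirroring the idea of gradually eliminating the guide's influence."""
    beta = max(0.0, 1.0 - step / (0.5 * total_steps))  # assumed linear decay
    return beta * guide_policy(obs) + (1.0 - beta) * student_policy(obs, params)

# Usage: sample actions from the blended policy during early training.
params = rng.normal(size=(n_actions, 3))
obs = rng.normal(size=3)
for step in range(0, 1000, 250):
    probs = behaviour_policy(obs, params, step, total_steps=1000)
    action = rng.choice(n_actions, p=probs)
    print(step, np.round(probs, 2), action)
```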

    Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring

    No full text
    We study Markov decision processes (MDPs), where agents control when and how they gather information, as formalized by action-contingent noiselessly observable MDPs (ACNO-MDPs). In these models, actions have two components: a control action that influences how the environment changes and a measurement action that affects the agent's observation. To solve ACNO-MDPs, we introduce the act-then-measure (ATM) heuristic, which assumes that we can ignore future state uncertainty when choosing control actions. To decide whether or not to measure, we introduce the concept of measuring value. We show how following this heuristic may lead to shorter policy computation times and prove a bound on the performance loss it incurs. We develop a reinforcement learning algorithm based on the ATM heuristic, using a Dyna-Q variant adapted for partially observable domains, and showcase its superior performance compared to prior methods on a number of partially observable environments.
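
    A minimal sketch of the measure/skip decision suggested by the abstract, under assumed semantics for the measuring value (the exact definition is in the paper): measure only when resolving the state uncertainty would improve the control choice by more than the measurement cost.

```python
# Illustrative sketch (assumed semantics, not the paper's exact definition):
# decide whether to pay a measurement cost based on a simple "measuring value",
# i.e. how much the best control action could improve if the belief over
# states were resolved by measuring.
import numpy as np

def measuring_value(belief, Q):
    """Q[s, a]: estimated return of control action a in state s.
    Value of acting on the belief vs. acting after observing the true state."""
    value_without_measuring = np.max(belief @ Q)       # best action under the belief
    value_with_measuring = belief @ np.max(Q, axis=1)  # best action per resolved state
    return value_with_measuring - value_without_measuring

# Hypothetical 3-state, 2-action example.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
belief = np.array([0.5, 0.4, 0.1])
measurement_cost = 0.2

mv = measuring_value(belief, Q)
print("measuring value:", round(mv, 3), "-> measure" if mv > measurement_cost else "-> skip")
```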

    Safe Policy Improvement for POMDPs via Finite-State Controllers

    No full text
    We study safe policy improvement (SPI) for partially observable Markov decision processes (POMDPs). SPI is an offline reinforcement learning (RL) problem that assumes access to (1) historical data about an environment, and (2) the so-called behavior policy that previously generated this data by interacting with the environment. SPI methods neither require access to a model nor the environment itself, and aim to reliably improve upon the behavior policy in an offline manner. Existing methods make the strong assumption that the environment is fully observable. In our novel approach to the SPI problem for POMDPs, we assume that a finite-state controller (FSC) represents the behavior policy and that finite memory is sufficient to derive optimal policies. This assumption allows us to map the POMDP to a finite-state fully observable MDP, the history MDP. We estimate this MDP by combining the historical data and the memory of the FSC, and compute an improved policy using an off-the-shelf SPI algorithm. The underlying SPI method constrains the policy space according to the available data, such that the newly computed policy only differs from the behavior policy when sufficient data is available. We show that this new policy, converted into a new FSC for the (unknown) POMDP, outperforms the behavior policy with high probability. Experimental results on several well-established benchmarks show the applicability of the approach, even in cases where finite memory is not sufficient.
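
    The following is an illustrative sketch, under an assumed data layout, of the central construction described above: pairing the behavior FSC's memory node with the latest observation to obtain states of a finite, fully observable history MDP that can be estimated from the historical data.

```python
# Illustrative sketch (assumed data layout, not the paper's code): estimate the
# finite "history MDP" whose states pair the behavior FSC's memory node with
# the latest observation, by counting transitions in the historical data.
from collections import defaultdict

# Hypothetical FSC memory update for the behavior policy.
fsc_update = {("n0", "obs_a"): "n0", ("n0", "obs_b"): "n1",
              ("n1", "obs_a"): "n0", ("n1", "obs_b"): "n1"}

def history_mdp_counts(episodes):
    """episodes: list of [(observation, action, reward), ...] produced by the FSC."""
    counts = defaultdict(lambda: defaultdict(int))
    rewards = defaultdict(list)
    for episode in episodes:
        node = "n0"  # initial memory node of the FSC
        prev = None  # previous (state, action) of the history MDP
        for obs, action, reward in episode:
            state = (node, obs)
            if prev is not None:
                counts[prev][state] += 1
            rewards[(state, action)].append(reward)
            prev = (state, action)
            node = fsc_update[(node, obs)]
    return counts, rewards

# Usage: the estimated counts/rewards would then be fed to an off-the-shelf
# SPI algorithm (e.g. SPIBB) on this fully observable surrogate MDP.
data = [[("obs_a", 0, 1.0), ("obs_b", 1, 0.0), ("obs_b", 0, 2.0)]]
counts, rewards = history_mdp_counts(data)
print({k: dict(v) for k, v in counts.items()})
```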

    Robust Anytime Learning of Markov Decision Processes

    Full text link
    Markov decision processes (MDPs) are formal models commonly used in sequential decision-making. MDPs capture the stochasticity that may arise, for instance, from imprecise actuators via probabilities in the transition function. However, in data-driven applications, deriving precise probabilities from (limited) data introduces statistical errors that may lead to unexpected or undesirable outcomes. Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions, accounting for such limited data. Tools from the formal verification community efficiently compute robust policies that provably adhere to formal specifications, like safety constraints, under the worst-case instance in the uncertainty set. We continuously learn the transition probabilities of an MDP in a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies. In particular, our method (1) approximates probabilities as intervals, (2) adapts to new data that may be inconsistent with an intermediate model, and (3) may be stopped at any time to compute a robust policy on the uMDP that faithfully captures the data so far. We show the effectiveness of our approach and compare it to robust policies computed on uMDPs learned by the UCRL2 reinforcement learning algorithm in an experimental evaluation on several benchmarks.
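
    As a hedged illustration of step (1) above (not the authors' tool chain), the sketch below derives probability intervals from transition counts via the Beta marginals of a Dirichlet posterior; such intervals would form the uncertainty sets of the uMDP on which a robust policy is then computed.

```python
# Illustrative sketch (assumptions, not the authors' tool chain): turn transition
# counts into probability intervals via a Dirichlet posterior, the kind of
# uncertain-MDP input on which a robust policy would then be computed.
from scipy.stats import beta as beta_dist

def posterior_intervals(counts, prior=1.0, credibility=0.95):
    """counts[s']: observed transitions to s' from a fixed state-action pair.
    Returns a {s': (lo, hi)} interval for each successor probability, using the
    Beta marginals of the Dirichlet posterior."""
    total = sum(counts.values()) + prior * len(counts)
    lo_q, hi_q = (1 - credibility) / 2, 1 - (1 - credibility) / 2
    intervals = {}
    for s, c in counts.items():
        a = c + prior  # posterior pseudo-count for successor s
        b = total - a  # pseudo-count of all other successors combined
        intervals[s] = (beta_dist.ppf(lo_q, a, b), beta_dist.ppf(hi_q, a, b))
    return intervals

# Usage: as more data arrives, the counts grow and the intervals shrink, so a
# robust policy recomputed on the updated uMDP reflects all data so far.
print(posterior_intervals({"s0": 8, "s1": 2}))
print(posterior_intervals({"s0": 80, "s1": 20}))
```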

    More for Less: Safe Policy Improvement With Stronger Performance Guarantees

    Full text link
    In an offline reinforcement learning setting, the safe policy improvement (SPI) problem aims to improve the performance of a behavior policy according to which sample data has been generated. State-of-the-art approaches to SPI require a high number of samples to provide practical probabilistic guarantees on the improved policy's performance. We present a novel approach to the SPI problem that provides the means to require less data for such guarantees. Specifically, to prove the correctness of these guarantees, we devise implicit transformations on the data set and the underlying environment model that serve as theoretical foundations to derive tighter improvement bounds for SPI. Our empirical evaluation, using the well-established SPI with baseline bootstrapping (SPIBB) algorithm, on standard benchmarks shows that our method indeed significantly reduces the sample complexity of the SPIBB algorithm. Comment: Accepted at IJCAI 202
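
    For context, the sketch below shows the generic SPIBB bootstrapping rule that such guarantees constrain (not the paper's new transformations): the improved policy may only shift probability mass away from the behavior policy on actions with enough samples, so tighter bounds that justify a smaller threshold free up more actions from the same data.

```python
# Illustrative sketch (the generic SPIBB bootstrapping rule, not the paper's new
# transformations): the improved policy may only move probability mass away from
# the behavior policy on state-action pairs with enough samples.
import numpy as np

def spibb_step(q_values, behaviour_probs, counts, n_wedge):
    """One policy-improvement step in a single state.
    q_values[a], behaviour_probs[a], counts[a]: estimates, baseline policy and
    sample counts for each action; n_wedge: the count threshold."""
    new_policy = np.array(behaviour_probs, dtype=float)
    bootstrapped = counts < n_wedge          # actions we must not touch
    free_mass = new_policy[~bootstrapped].sum()
    new_policy[~bootstrapped] = 0.0
    if free_mass > 0:
        # Put all freed-up probability on the best sufficiently-sampled action.
        best = np.argmax(np.where(bootstrapped, -np.inf, q_values))
        new_policy[best] += free_mass
    return new_policy

# Usage: with a lower n_wedge (as enabled by tighter bounds), more actions are
# eligible for improvement from the same data set.
q = np.array([1.0, 2.0, 0.5])
pi_b = np.array([0.5, 0.3, 0.2])
counts = np.array([12, 9, 3])
print(spibb_step(q, pi_b, counts, n_wedge=10))
print(spibb_step(q, pi_b, counts, n_wedge=5))
```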

    Scalable Safe Policy Improvement via Monte Carlo Tree Search

    No full text
    Algorithms for safely improving policies are important to deploy reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB converges, as the number of simulations grows, to the optimal safely improved policy generated by Safe Policy Improvement with Baseline Bootstrapping (SPIBB), a popular algorithm based on policy iteration. Moreover, our empirical analysis performed on three standard benchmark domains shows that MCTS-SPIBB scales to significantly larger problems than SPIBB because it computes the policy online and locally, i.e., only in the states actually visited by the agent.
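
    The sketch below is an assumed simplification of the online idea described above (not the MCTS-SPIBB code): during the search, actions that are under-represented in the data set fall back to the baseline policy, while sufficiently sampled actions are selected by a standard UCT rule.

```python
# Illustrative sketch (assumed simplification of the MCTS-SPIBB idea): during
# the online search, actions with too few samples in the data set follow the
# baseline policy, while well-sampled actions are optimized via UCT.
import math
import random

def select_action(node_stats, counts, baseline_probs, n_wedge, c_uct=1.4):
    """node_stats[a] = (visits, total_return) accumulated by the search at this
    node; counts[a] = samples of (state, a) in the offline data set."""
    candidates = [a for a in node_stats if counts[a] >= n_wedge]
    if not candidates:
        # No action is sufficiently sampled: fall back to the baseline policy.
        actions = list(baseline_probs)
        return random.choices(actions, weights=[baseline_probs[a] for a in actions])[0]
    total_visits = sum(node_stats[a][0] for a in candidates) + 1
    def uct(a):
        visits, total_return = node_stats[a]
        if visits == 0:
            return float("inf")
        return total_return / visits + c_uct * math.sqrt(math.log(total_visits) / visits)
    return max(candidates, key=uct)

# Usage inside one simulation step of the tree search.
node_stats = {0: (10, 6.0), 1: (4, 3.5), 2: (0, 0.0)}
counts = {0: 20, 1: 15, 2: 2}
baseline = {0: 0.5, 1: 0.3, 2: 0.2}
print(select_action(node_stats, counts, baseline, n_wedge=10))
```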