Search CORE

66,393 research outputs found

Probabilistic Guarantees for Safe Deep Reinforcement Learning

Author: E Ohn-Bar
EM Hahn
G Katz
J Garcia
J Kemeny
M Kattenbelt
M Kwiatkowska
M Lahijania
MC Machado
R Ehlers
S Junges
SEZ Soudjani
T Brázdil
V Mnih
X Huang
Publication venue
Publication date: 29/06/2020
Field of study

Deep reinforcement learning has been successfully applied to many control tasks, but the application of such agents in safety-critical scenarios has been limited due to safety concerns. Rigorous testing of these controllers is challenging, particularly when they operate in probabilistic environments due to, for example, hardware faults or noisy sensors. We propose MOSAIC, an algorithm for measuring the safety of deep reinforcement learning agents in stochastic settings. Our approach is based on the iterative construction of a formal abstraction of a controller's execution in an environment, and leverages probabilistic model checking of Markov decision processes to produce probabilistic guarantees on safe behaviour over a finite time horizon. It produces bounds on the probability of safe operation of the controller for different initial configurations and identifies regions where correct behaviour can be guaranteed. We implement and evaluate our approach on agents trained for several benchmark control problems

arXiv.org e-Print Archive

Crossref

University of Birmingham Research Portal

Synaptic mechanisms of interference in working memory

Author: Kilpatrick Zachary P
Publication venue
Publication date: 18/07/2017
Field of study

Information from preceding trials of cognitive tasks can bias performance in the current trial, a phenomenon referred to as interference. Subjects performing visual working memory tasks exhibit interference in their trial-to-trial response correlations: the recalled target location in the current trial is biased in the direction of the target presented on the previous trial. We present modeling work that (a) develops a probabilistic inference model of this history-dependent bias, and (b) links our probabilistic model to computations of a recurrent network wherein short-term facilitation accounts for the dynamics of the observed bias. Network connectivity is reshaped dynamically during each trial, providing a mechanism for generating predictions from prior trial observations. Applying timescale separation methods, we can obtain a low-dimensional description of the trial-to-trial bias based on the history of target locations. The model has response statistics whose mean is centered at the true target location across many trials, typical of such visual working memory tasks. Furthermore, we demonstrate task protocols for which the plastic model performs better than a model with static connectivity: repetitively presented targets are better retained in working memory than targets drawn from uncorrelated sequences.Comment: 28 pages, 7 figure

arXiv.org e-Print Archive

CU Scholar Institutional Repository

Crossref