66,393 research outputs found
Probabilistic Guarantees for Safe Deep Reinforcement Learning
Deep reinforcement learning has been successfully applied to many control
tasks, but the application of such agents in safety-critical scenarios has been
limited due to safety concerns. Rigorous testing of these controllers is
challenging, particularly when they operate in probabilistic environments due
to, for example, hardware faults or noisy sensors. We propose MOSAIC, an
algorithm for measuring the safety of deep reinforcement learning agents in
stochastic settings. Our approach is based on the iterative construction of a
formal abstraction of a controller's execution in an environment, and leverages
probabilistic model checking of Markov decision processes to produce
probabilistic guarantees on safe behaviour over a finite time horizon. It
produces bounds on the probability of safe operation of the controller for
different initial configurations and identifies regions where correct behaviour
can be guaranteed. We implement and evaluate our approach on agents trained for
several benchmark control problems.
Synaptic mechanisms of interference in working memory
Information from preceding trials of cognitive tasks can bias performance in
the current trial, a phenomenon referred to as interference. Subjects
performing visual working memory tasks exhibit interference in their
trial-to-trial response correlations: the recalled target location in the
current trial is biased in the direction of the target presented on the
previous trial. We present modeling work that (a) develops a probabilistic
inference model of this history-dependent bias, and (b) links our probabilistic
model to computations of a recurrent network wherein short-term facilitation
accounts for the dynamics of the observed bias. Network connectivity is
reshaped dynamically during each trial, providing a mechanism for generating
predictions from prior trial observations. Applying timescale separation
methods, we can obtain a low-dimensional description of the trial-to-trial bias
based on the history of target locations. The model has response statistics
whose mean is centered at the true target location across many trials, typical
of such visual working memory tasks. Furthermore, we demonstrate task protocols
for which the plastic model performs better than a model with static
connectivity: repetitively presented targets are better retained in working
memory than targets drawn from uncorrelated sequences.
Comment: 28 pages, 7 figures
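The low-dimensional description mentioned above reduces the bias to a simple dependence of the current response on the previous trial's target. The sketch below is an assumed minimal form of such an attractive history bias, not the paper's derived model: each recall is pulled toward the preceding target, so across many trials with symmetric target sequences the mean response remains centred on the true location.

```python
# Hedged toy sketch (assumed form, not the paper's network model):
# attractive trial-to-trial bias in recalled target location.
import random

def simulate(targets, gain=0.1, noise=0.0, seed=0):
    """Recalled location = true target pulled toward the previous trial's
    target by `gain`, plus optional Gaussian recall noise."""
    rng = random.Random(seed)
    prev, responses = None, []
    for theta in targets:
        bias = gain * (prev - theta) if prev is not None else 0.0
        responses.append(theta + bias + rng.gauss(0.0, noise))
        prev = theta
    return responses

# Repeated targets incur no bias; a change of target biases the response
# toward the previous location, as in the trial-to-trial correlations above.
repeated = simulate([1.0, 1.0, 1.0], gain=0.2)   # -> [1.0, 1.0, 1.0]
switched = simulate([0.0, 1.0], gain=0.2)        # second response: 0.8
```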