Simulation-Based Inference for Global Health Decisions
The COVID-19 pandemic has highlighted the importance of in-silico
epidemiological modelling in predicting the dynamics of infectious diseases to
inform health policy and decision makers about suitable prevention and
containment strategies. Work in this setting involves solving challenging
inference and control problems in individual-based models of ever increasing
complexity. Here we discuss recent breakthroughs in machine learning,
specifically in simulation-based inference, and explore its potential as a
novel avenue for model calibration to support the design and evaluation of
public health interventions. To further stimulate research, we are developing
software interfaces that turn two cornerstone COVID-19 and malaria epidemiology
models, COVID-sim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria
(https://github.com/SwissTPH/openmalaria), into probabilistic programs, enabling
efficient, interpretable Bayesian inference within those simulators.
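The calibration task described above can be illustrated with the simplest form of simulation-based inference: approximate Bayesian computation (ABC) by rejection sampling. The sketch below is a hypothetical toy, not COVID-sim's or OpenMalaria's actual interface; the `simulate_outbreak` model and its parameters are invented stand-ins for a real epidemic simulator.

```python
import random

def simulate_outbreak(beta, n_steps=30, pop=1000, i0=10):
    """Toy discrete-time SIR-style simulator: returns the cumulative
    recovered count, used as a summary statistic. A stand-in for a
    full individual-based simulator such as COVID-sim."""
    s, i, r = pop - i0, i0, 0
    for _ in range(n_steps):
        new_inf = min(s, int(beta * s * i / pop))  # new infections
        new_rec = int(0.1 * i)                     # new recoveries
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
    return r

def abc_rejection(observed, n_samples=2000, tol=50, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary
    statistic lands within `tol` of the observed one. The accepted
    draws approximate the posterior over the transmission rate."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        beta = rng.uniform(0.0, 1.0)  # uniform prior on transmission rate
        if abs(simulate_outbreak(beta) - observed) < tol:
            accepted.append(beta)
    return accepted

obs = simulate_outbreak(0.3)       # pretend data generated at beta = 0.3
posterior = abc_rejection(obs)     # approximate posterior samples of beta
```

Probabilistic-programming interfaces to the real simulators make this loop far more efficient by exploiting the simulator's internal structure rather than treating it as a black box.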
Controlled Sequential Monte Carlo
Sequential Monte Carlo methods, also known as particle methods, are a popular
set of techniques for approximating high-dimensional probability distributions
and their normalizing constants. These methods have found numerous applications
in statistics and related fields; e.g. for inference in non-linear non-Gaussian
state space models, and in complex static models. Like many Monte Carlo
sampling schemes, they rely on proposal distributions which crucially impact
their performance. We introduce here a class of controlled sequential Monte
Carlo algorithms, where the proposal distributions are determined by
approximating the solution to an associated optimal control problem using an
iterative scheme. This method builds upon a number of existing algorithms in
econometrics, physics, and statistics for inference in state space models, and
generalizes these methods so as to accommodate complex static models. We
provide a theoretical analysis concerning the fluctuation and stability of this
methodology that also provides insight into the properties of related
algorithms. We demonstrate significant gains over state-of-the-art methods at a
fixed computational complexity on a variety of applications.
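The baseline that controlled SMC improves upon can be sketched as a bootstrap particle filter, which uses the transition density as the proposal distribution. The model, parameter values, and function names below are illustrative assumptions; controlled SMC would replace the bootstrap proposal with an approximately optimal "twisted" proposal learned iteratively.

```python
import math
import random

def bootstrap_particle_filter(ys, n_particles=500, phi=0.9, sigma=1.0, tau=1.0, seed=0):
    """Bootstrap SMC for the linear-Gaussian state-space model
        x_t = phi * x_{t-1} + N(0, sigma^2),   y_t = x_t + N(0, tau^2).
    Returns an estimate of the log normalizing constant log p(y_{1:T})."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, sigma) for _ in range(n_particles)]  # draw from the prior
    log_z = 0.0
    for y in ys:
        # weight each particle by the observation likelihood (in log space)
        logw = [-0.5 * ((y - x) / tau) ** 2 - math.log(tau * math.sqrt(2 * math.pi))
                for x in xs]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        tot = sum(w)
        log_z += m + math.log(tot / n_particles)  # running normalizing-constant estimate
        # multinomial resampling, then propagation through the (bootstrap) proposal
        xs = rng.choices(xs, weights=[wi / tot for wi in w], k=n_particles)
        xs = [phi * x + rng.gauss(0.0, sigma) for x in xs]
    return log_z
```

Because the bootstrap proposal ignores the observations, its weights degenerate quickly in informative or high-dimensional models; that variance is precisely what choosing the proposal by approximate optimal control aims to reduce.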
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
In this paper we introduce the idea of improving the performance of
parametric temporal-difference (TD) learning algorithms by selectively
emphasizing or de-emphasizing their updates on different time steps. In
particular, we show that varying the emphasis of linear TD(λ)'s updates
particular, we show that varying the emphasis of linear TD()'s updates
in a particular way causes its expected update to become stable under
off-policy training. The only prior model-free TD methods to achieve this with
per-step computation linear in the number of function approximation parameters
are the gradient-TD family of methods including TDC, GTD(λ), and
GQ(λ). Compared to these methods, our _emphatic TD(λ)_ is
simpler and easier to use; it has only one learned parameter vector and one
step-size parameter. Our treatment includes general state-dependent discounting
and bootstrapping functions, and a way of specifying varying degrees of
interest in accurately valuing different states.
Comment: 29 pages. This is a significant revision based on the first set of
reviews. The most important change was to signal early that the main result
is about stability, not convergence.
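The update rules described above can be sketched for the linear, fixed-interest case. This is a toy rendering of emphatic TD(λ) under assumed constant γ, λ, and interest i_t = 1, not the authors' reference implementation; the transition format is invented for illustration.

```python
import numpy as np

def emphatic_td_lambda(transitions, n_features, alpha=0.02, gamma=0.9, lam=0.5):
    """Linear emphatic TD(lambda) over a stream of transitions
    (phi, reward, phi_next, rho), where rho = pi(a|s) / mu(a|s) is the
    importance-sampling ratio and the interest i_t is fixed at 1."""
    theta = np.zeros(n_features)   # the single learned parameter vector
    e = np.zeros(n_features)       # eligibility trace
    F = 0.0                        # follow-on trace
    rho_prev = 1.0
    for phi, r, phi_next, rho in transitions:
        F = gamma * rho_prev * F + 1.0        # follow-on trace accumulates emphasis
        M = lam + (1.0 - lam) * F             # emphasis weighting for this step
        e = rho * (gamma * lam * e + M * phi) # emphasis-weighted trace update
        delta = r + gamma * theta @ phi_next - theta @ phi  # TD error
        theta = theta + alpha * delta * e
        rho_prev = rho
    return theta
```

Note the contrast with gradient-TD methods: there is only one learned vector `theta` and one step size `alpha`; the follow-on and emphasis scalars are computed, not learned.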
Motion Planning of Uncertain Ordinary Differential Equation Systems
This work presents a novel motion planning framework, rooted in nonlinear programming theory, that treats uncertain fully- and under-actuated dynamical systems described by ordinary differential equations. Uncertainty in multibody dynamical systems comes from various sources, such as system parameters, initial conditions, sensor and actuator noise, and external forcing. Treatment of uncertainty in design is of paramount practical importance because all real-life systems are affected by it, and poor robustness and suboptimal performance result if it is not accounted for in a given design. In this work uncertainties are modeled using Generalized Polynomial Chaos and are solved quantitatively using a least-squares collocation method. The computational efficiency of this approach enables the inclusion of uncertainty statistics in the nonlinear programming optimization process. As such, the proposed framework allows the user to pose, and answer, new design questions related to uncertain dynamical systems.
Specifically, the new framework is explained in the context of forward, inverse, and hybrid dynamics formulations. The forward dynamics formulation, applicable to both fully- and under-actuated systems, prescribes deterministic actuator inputs which yield uncertain state trajectories. The inverse dynamics formulation is the dual of the forward dynamics formulation and is only applicable to fully-actuated systems; deterministic state trajectories are prescribed and yield uncertain actuator inputs. The inverse dynamics formulation is more computationally efficient, as it requires only algebraic evaluations and completely avoids numerical integration. Finally, the hybrid dynamics formulation is applicable to under-actuated systems, where it leverages the benefits of inverse dynamics for actuated joints and forward dynamics for unactuated joints; it prescribes actuated state and unactuated input trajectories which yield uncertain unactuated states and actuated inputs.
The benefits of the ability to quantify uncertainty when planning the motion of multibody dynamic systems are illustrated through several case studies. The resulting designs determine optimal motion plans, subject to deterministic and statistical constraints, for all possible systems within the probability space.
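The core numerical ingredient, least-squares collocation for a generalized polynomial chaos (gPC) expansion, can be sketched on a scalar example. The decay ODE, its parameter values, and the function name below are illustrative assumptions; a real motion-planning problem would evaluate a numerical integrator of the multibody dynamics at each collocation node instead of a closed-form solution.

```python
import numpy as np

def pce_collocation(x0=1.0, t=1.0, k_mean=1.0, k_half=0.3, order=6, n_nodes=40):
    """Least-squares collocation for a gPC expansion of the uncertain
    decay ODE  dx/dt = -k x,  with  k = k_mean + k_half * xi  and
    xi ~ Uniform(-1, 1). Legendre polynomials are the matching gPC
    basis for a uniform random input. Returns (mean, variance) of x(t)."""
    xi = np.linspace(-1.0, 1.0, n_nodes)            # collocation nodes
    x = x0 * np.exp(-(k_mean + k_half * xi) * t)    # model response at each node
    # Vandermonde-like matrix of Legendre polynomials P_0 .. P_order
    V = np.polynomial.legendre.legvander(xi, order)
    c, *_ = np.linalg.lstsq(V, x, rcond=None)       # least-squares gPC coefficients
    mean = c[0]                                     # E[P_i] = 0 for i >= 1
    # E[P_i^2] = 1 / (2i + 1) under the uniform density on [-1, 1]
    var = sum(c[i] ** 2 / (2 * i + 1) for i in range(1, order + 1))
    return mean, var
```

Because the statistics fall out as cheap algebraic functions of the coefficients, they can be evaluated inside a nonlinear-programming loop as constraints or objective terms, which is what makes the framework's statistical constraints tractable.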
Addressing Function Approximation Error in Actor-Critic Methods
In value-based reinforcement learning methods such as deep Q-learning,
function approximation errors are known to lead to overestimated value
estimates and suboptimal policies. We show that this problem persists in an
actor-critic setting and propose novel mechanisms to minimize its effects on
both the actor and the critic. Our algorithm builds on Double Q-learning, by
taking the minimum value between a pair of critics to limit overestimation. We
draw the connection between target networks and overestimation bias, and
suggest delaying policy updates to reduce per-update error and further improve
performance. We evaluate our method on the suite of OpenAI Gym tasks,
outperforming the state of the art in every environment tested.
Comment: Accepted at ICML 2018.
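The central mechanism described above, taking the minimum over a pair of critics to curb overestimation, reduces to a one-line target computation. The function name and array-based interface below are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def clipped_double_q_target(rewards, q1_next, q2_next, gamma=0.99, done=None):
    """TD targets with clipped double Q-learning: bootstrap from the
    minimum of two target-critic estimates so that a critic's positive
    approximation error cannot inflate its own target. Inputs are
    per-transition batches; `done` masks terminal transitions."""
    if done is None:
        done = np.zeros_like(rewards)
    return rewards + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

# The paper's second mechanism, delayed policy updates, amounts to
# refreshing the actor (and target networks) only every few critic steps,
# e.g.:  if step % policy_delay == 0: update_actor_and_targets()
```

Both critics regress toward this shared target, while the actor is trained less frequently against just one of them, which is where the reduced per-update error comes from.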