Fingerprint Policy Optimisation for Robust Reinforcement Learning
Policy gradient methods ignore the potential value of adjusting environment
variables: unobservable state features that are randomly determined by the
environment in a physical setting, but are controllable in a simulator. This
can lead to slow learning, or convergence to suboptimal policies, if the
environment variable has a large impact on the transition dynamics. In this
paper, we present fingerprint policy optimisation (FPO), which finds a policy
that is optimal in expectation across the distribution of environment
variables. The central idea is to use Bayesian optimisation (BO) to actively
select the distribution of the environment variable that maximises the
improvement generated by each iteration of the policy gradient method. To make
this BO practical, we contribute two easy-to-compute low-dimensional
fingerprints of the current policy. Our experiments show that FPO can
efficiently learn policies that are robust to significant rare events, which
are unlikely to be observable under random sampling, but are key to learning
good policies.
Comment: ICML 201
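A minimal sketch of the idea, assuming a toy setting: a scalar policy parameter, a quadratic per-episode reward, and a Gaussian training distribution over a single environment variable. None of the function names, constants, or the plain UCB acquisition below come from the paper; they only illustrate the structure of using BO to pick the training distribution that maximises one-step improvement.

```python
# Toy sketch of the FPO idea: Bayesian optimisation chooses the training
# distribution of an environment variable so that each policy-gradient step
# yields maximal improvement of the true (robust) objective.
# The quadratic reward, scalar policy, and Gaussian env variable are
# illustrative assumptions, not the paper's experimental setup.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def true_return(theta, n=2000):
    """Expected return under the *real* env-variable distribution,
    which contains a rare but significant event (z = 4 w.p. 0.05)."""
    z = np.where(rng.random(n) < 0.05, 4.0, rng.normal(0.0, 0.5, n))
    return np.mean(-(theta - z) ** 2)

def pg_step(theta, mu, lr=0.1, n=200):
    """One policy-gradient-style step, training on env variables from N(mu, 0.5)."""
    z = rng.normal(mu, 0.5, n)
    grad = np.mean(-2.0 * (theta - z))          # d/dtheta of -(theta - z)^2
    return theta + lr * grad

theta = -2.0
X, y = [], []                                    # BO data: (fingerprint, mu) -> improvement
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2, normalize_y=True)
candidates = np.linspace(-1.0, 5.0, 61)          # candidate training means

for it in range(30):
    fp = theta                                   # toy "fingerprint" of the current policy
    if len(X) < 5:                               # a few random queries to seed the GP
        mu = rng.uniform(-1.0, 5.0)
    else:
        gp.fit(np.array(X), np.array(y))
        Xc = np.column_stack([np.full_like(candidates, fp), candidates])
        m, s = gp.predict(Xc, return_std=True)
        mu = candidates[np.argmax(m + 1.0 * s)]  # UCB acquisition
    before = true_return(theta)
    theta = pg_step(theta, mu)
    X.append([fp, mu]); y.append(true_return(theta) - before)

print(f"final policy parameter: {theta:.3f}, robust return: {true_return(theta):.3f}")
```

The sketch keeps the structure described in the abstract: the GP input pairs a low-dimensional fingerprint of the current policy with a candidate training distribution, and the BO target is the improvement generated by one policy-gradient iteration.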
Estimating the Expected Value of Partial Perfect Information in Health Economic Evaluations using Integrated Nested Laplace Approximation
The Expected Value of Partial Perfect Information (EVPPI) is a
decision-theoretic measure of the "cost" of parametric uncertainty, used
principally in health economic decision making. Despite this
decision-theoretic grounding, the uptake of EVPPI calculations in practice has
been slow. This is in part due to the prohibitive computational time required
to estimate the EVPPI via Monte Carlo simulations. However, recent developments
have demonstrated that the EVPPI can be estimated by non-parametric regression
methods, which have significantly decreased the computation time required to
approximate the EVPPI. Under certain circumstances, high-dimensional Gaussian
Process regression is suggested, but this can still be prohibitively expensive.
Applying fast computation methods developed in spatial statistics using
Integrated Nested Laplace Approximations (INLA) and projecting from a
high-dimensional into a low-dimensional input space allows us to decrease the
computation time for fitting these high-dimensional Gaussian Processes, often
substantially. We demonstrate that the EVPPI calculated using our method for
Gaussian Process regression is in line with the standard Gaussian Process
regression method and that despite the apparent methodological complexity of
this new method, R functions are available in the package BCEA to implement it
simply and efficiently.
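The regression-based idea can be illustrated in a few lines. The sketch below uses a plain Gaussian-process regression in Python rather than the INLA/BCEA implementation referenced above, and the two-option decision model and its parameters are invented toy assumptions.

```python
# Illustrative regression-based EVPPI calculation: regress each option's net
# benefit on the parameter of interest, then compare the value of deciding
# with and without that conditional information.
# The decision model below is a toy assumption, not from the paper.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
n = 500

# Probabilistic sensitivity analysis samples of the model parameters.
phi = rng.normal(0.0, 1.0, n)          # parameter of interest
psi = rng.normal(0.0, 1.0, n)          # remaining (nuisance) parameters

# Net benefit of two hypothetical options as functions of the parameters.
nb = np.column_stack([
    1000 + 150 * phi + 100 * psi,      # option 0
    1050 + 50 * phi + 120 * psi,       # option 1
])

# Baseline: value of the decision made under current uncertainty.
value_current = nb.mean(axis=0).max()

# Regress each option's net benefit on phi to estimate E[NB_d | phi].
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
fitted = np.column_stack([
    GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    .fit(phi.reshape(-1, 1), nb[:, d])
    .predict(phi.reshape(-1, 1))
    for d in range(nb.shape[1])
])

# EVPPI = E_phi[ max_d E[NB_d | phi] ] - max_d E[NB_d].
evppi = fitted.max(axis=1).mean() - value_current
print(f"estimated EVPPI: {evppi:.1f}")
```

In the approach described above, the Gaussian-process fit is replaced by an INLA-based approximation after projecting the inputs into a low-dimensional space, which is what keeps the multi-parameter case computationally tractable.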
Revisiting maximum-a-posteriori estimation in log-concave models
Maximum-a-posteriori (MAP) estimation is the main Bayesian estimation
methodology in imaging sciences, where high dimensionality is often addressed
by using Bayesian models that are log-concave and whose posterior mode can be
computed efficiently by convex optimisation. Despite its success and wide
adoption, MAP estimation is not theoretically well understood yet. The
prevalent view in the community is that MAP estimation is not proper Bayesian
estimation in a decision-theoretic sense because it does not minimise a
meaningful expected loss function (unlike the minimum mean squared error (MMSE)
estimator that minimises the mean squared loss). This paper addresses this
theoretical gap by presenting a decision-theoretic derivation of MAP estimation
in Bayesian models that are log-concave. A main novelty is that our analysis is
based on differential geometry, and proceeds as follows. First, we use the
underlying convex geometry of the Bayesian model to induce a Riemannian
geometry on the parameter space. We then use differential geometry to identify
the so-called natural or canonical loss function to perform Bayesian point
estimation in that Riemannian manifold. For log-concave models, this canonical
loss is the Bregman divergence associated with the negative log posterior
density. We then show that the MAP estimator is the only Bayesian estimator
that minimises the expected canonical loss, and that the posterior mean or MMSE
estimator minimises the dual canonical loss. We also study the question of MAP
and MMSE estimation performance at large scale and establish a universal bound
on the expected canonical error as a function of dimension, offering new
insights into the good performance observed in convex problems. These results
provide a new understanding of MAP and MMSE estimation in log-concave settings,
and of the multiple roles that convex geometry plays in imaging problems.
Comment: Accepted for publication in SIAM Imaging Science
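For concreteness, the canonical loss mentioned above can be written out. The notation below (posterior density p(x|y), convex potential phi) is ours, chosen to match the standard definition of a Bregman divergence; it is a restatement of the abstract's claims, not an excerpt from the paper.

```latex
% Canonical (Bregman) loss associated with a log-concave posterior p(x|y).
\[
  \phi(x) \;=\; -\log p(x \mid y), \qquad
  D_{\phi}(u, v) \;=\; \phi(u) - \phi(v) - \langle \nabla\phi(v),\, u - v \rangle .
\]
% In this notation, the two results stated above read: the MAP estimate
% minimises the expected canonical loss, while the posterior mean (MMSE)
% minimises the dual loss obtained by swapping the arguments of D_phi.
\[
  \hat{x}_{\mathrm{MAP}} \;=\; \arg\min_{u} \, \mathbb{E}\!\left[ D_{\phi}(u, x) \mid y \right],
  \qquad
  \hat{x}_{\mathrm{MMSE}} \;=\; \arg\min_{u} \, \mathbb{E}\!\left[ D_{\phi}(x, u) \mid y \right].
\]
```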
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
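A minimal sketch of the core algorithm on a toy subtraction game (players alternately remove 1-3 tokens; whoever takes the last token wins). The game, constants, and class layout are illustrative assumptions, not taken from the survey; only the four-phase structure is the point.

```python
# Minimal UCT (MCTS with a UCB1 tree policy) on a toy subtraction game.
import math, random

class Node:
    def __init__(self, tokens, player, parent=None, move=None):
        self.tokens, self.player = tokens, player     # tokens left, player to move
        self.parent, self.move = parent, move
        self.children, self.visits, self.wins = [], 0, 0.0
        self.untried = [m for m in (1, 2, 3) if m <= tokens]

def uct_search(root_tokens, root_player, n_iter=2000, c=1.4):
    root = Node(root_tokens, root_player)
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend via UCB1 until a node with untried moves.
        while not node.untried and node.children:
            node = max(node.children,
                       key=lambda ch: ch.wins / ch.visits +
                                      c * math.sqrt(math.log(node.visits) / ch.visits))
        # 2. Expansion: add one child for a randomly chosen untried move.
        if node.untried:
            m = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(node.tokens - m, 1 - node.player, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation: play random moves to the end of the game.
        tokens, player = node.tokens, node.player
        while tokens > 0:
            tokens -= random.choice([m for m in (1, 2, 3) if m <= tokens])
            player = 1 - player
        winner = 1 - player           # the player who just took the last token
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if winner != node.player else 0.0
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move

print("best first move from 10 tokens:", uct_search(10, root_player=0))
```

With the UCB1 tree policy this is the UCT variant, the most widely used form of MCTS.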
Active learning for feasible region discovery
Often in an engineer's design process, the design specifications of the system are not completely known initially. However, some physical constraints are usually known from the start, corresponding to a region of interest in the design space that is called feasible. These constraints often have no analytical form but must be characterised through expensive simulations or measurements. It is therefore important that the feasible region can be modelled sufficiently accurately using only a limited number of samples. This can be achieved with active learning techniques, which minimise the number of samples needed to model the quantity of interest. Most active learning strategies focus on classification models or regression models, targeting classification accuracy or regression accuracy respectively. In this work, regression models of the constraints are used, but only the (in)feasibility is of interest. To tackle this problem, an information-theoretic sampling strategy is constructed to discover these regions. The proposed method is tested on two synthetic examples and one engineering example, and proves to outperform the current state of the art.
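A sketch of the general pattern under stated assumptions: a Gaussian-process regression models an expensive constraint, and the next sample is chosen where the feasibility classification is most uncertain. The constraint function, bounds, and the simple entropy acquisition below are illustrative stand-ins, not the paper's information-theoretic criterion.

```python
# Illustrative active-learning loop for feasible-region discovery: a GP regression
# models an expensive constraint g(x) <= 0, and the next sample is the candidate
# whose feasibility label is most ambiguous (maximum entropy of P(feasible)).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)

def constraint(x):
    """Expensive black-box constraint; feasible where constraint(x) <= 0."""
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2 - 0.5

# A few initial samples in the 2-D design space [-1, 1]^2.
X = rng.uniform(-1, 1, (5, 2))
y = constraint(X)
grid = rng.uniform(-1, 1, (2000, 2))          # candidate pool for the acquisition

for _ in range(25):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
    gp.fit(X, y)
    mean, std = gp.predict(grid, return_std=True)
    p = norm.cdf((0.0 - mean) / np.maximum(std, 1e-9))      # P(g(x) <= 0)
    entropy = -p * np.log(p + 1e-12) - (1 - p) * np.log(1 - p + 1e-12)
    x_next = grid[np.argmax(entropy)]                        # most ambiguous point
    X = np.vstack([X, x_next])
    y = np.append(y, constraint(x_next[None, :]))

gp.fit(X, y)                                   # refit on all collected samples
pred_feasible = gp.predict(grid) <= 0.0
true_feasible = constraint(grid) <= 0.0
print("feasibility agreement:", np.mean(pred_feasible == true_feasible))
```

The paper's criterion is information-theoretic rather than this simple entropy-of-feasibility rule, but the loop structure (fit a regression model of the constraint, score candidates, evaluate the most informative one) is the same.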