Efficient construction of Bayes optimal designs for stochastic process models
Stochastic process models are now commonly used to analyse complex
biological, ecological and industrial systems. Increasingly there is a need to
deliver accurate estimates of model parameters and assess model fit by
optimizing the timing of measurement of these processes. Standard methods to
construct Bayes optimal designs, such as the well-known Müller algorithm, are
computationally intensive even for relatively simple models. A key issue is
that, in determining the merit of a design, the utility function typically
requires summaries of many parameter posterior distributions, each determined
via a computer-intensive scheme such as MCMC. This paper describes a fast and
computationally efficient scheme to determine optimal designs for stochastic
process models. The algorithm compares favourably with other methods for
determining optimal designs and can require up to an order of magnitude fewer
utility function evaluations for the same accuracy in the optimal design
solution. It benefits from being embarrassingly parallel and is ideal for
running on multi-core computers. The method is illustrated by determining
different-sized optimal designs for three problems of increasing complexity.
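As a rough illustration of the kind of computation involved, the sketch below approximates the expected utility of a candidate design (a vector of measurement times) by Monte Carlo; the toy decay model, grid posterior and negative-posterior-variance utility are assumptions for illustration, not the paper's algorithm. Each prior draw requires its own posterior summary, which is exactly the embarrassingly parallel workload the method targets.

```python
# Sketch only: Monte Carlo estimate of U(d) = E_{theta, y | d}[ u(d, theta, y) ]
# for a design d = vector of measurement times. The decay model, grid posterior
# and negative-posterior-variance utility are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
theta_grid = np.linspace(0.05, 2.0, 400)            # grid over a 1-d decay rate
prior = np.ones_like(theta_grid) / len(theta_grid)  # uniform prior on the grid
sigma = 0.1                                         # known observation noise

def simulate(times, theta):
    return np.exp(-theta * times) + rng.normal(0.0, sigma, size=len(times))

def posterior_variance(times, y):
    """Grid posterior of theta given one simulated data set."""
    means = np.exp(-np.outer(theta_grid, times))          # (grid, n_times)
    loglik = -0.5 * np.sum((y - means) ** 2, axis=1) / sigma**2
    w = prior * np.exp(loglik - loglik.max())
    w /= w.sum()
    m = np.sum(w * theta_grid)
    return np.sum(w * (theta_grid - m) ** 2)

def expected_utility(times, n_draws=200):
    # Independent draws: this loop is embarrassingly parallel across cores
    # (e.g. multiprocessing/joblib over draws or over candidate designs).
    thetas = rng.uniform(0.05, 2.0, n_draws)              # prior draws
    u = [-posterior_variance(times, simulate(times, t)) for t in thetas]
    return np.mean(u)

# Compare two candidate three-point measurement schedules.
for d in (np.array([0.5, 1.0, 1.5]), np.array([0.2, 1.0, 5.0])):
    print(d, round(expected_utility(d), 5))
```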
Design Issues for Generalized Linear Models: A Review
Generalized linear models (GLMs) have been used quite effectively in the
modeling of a mean response under nonstandard conditions, where discrete as
well as continuous data distributions can be accommodated. The choice of design
for a GLM is a very important task in the development and building of an
adequate model. However, one major problem that handicaps the construction of a
GLM design is its dependence on the unknown parameters of the fitted model.
Several approaches have been proposed in the past 25 years to solve this
problem. These approaches, however, have provided only partial solutions that
apply in only some special cases, and the problem, in general, remains largely
unresolved. The purpose of this article is to focus attention on the
aforementioned dependence problem. We provide a survey of various existing
techniques dealing with the dependence problem. This survey includes
discussions concerning locally optimal designs, sequential designs, Bayesian
designs and the quantile dispersion graph approach for comparing designs for
GLMs.
Comment: Published at http://dx.doi.org/10.1214/088342306000000105 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
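A small sketch of the parameter-dependence problem the survey is organized around (an illustration under assumed parameter values, not material from the article): for a logistic regression, the GLM weights in the Fisher information depend on the unknown coefficients, so a locally D-optimal two-point design shifts with the parameter guess it is computed at.

```python
# The Fisher information of a logistic GLM depends on the unknown (b0, b1),
# so any "locally D-optimal" design must be computed at a parameter guess.
import numpy as np
from scipy.optimize import minimize

def neg_log_det_info(x_pts, beta):
    """Negative log|M| for an equally weighted two-point design at x_pts."""
    b0, b1 = beta
    M = np.zeros((2, 2))
    for x in x_pts:
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)                 # GLM weight: depends on unknown beta
        f = np.array([1.0, x])
        M += 0.5 * w * np.outer(f, f)
    sign, logdet = np.linalg.slogdet(M)
    return -logdet if sign > 0 else np.inf

for beta_guess in [(0.0, 1.0), (0.0, 3.0)]:
    res = minimize(neg_log_det_info, x0=np.array([-1.0, 1.0]),
                   args=(beta_guess,), method="Nelder-Mead")
    print("guess", beta_guess, "-> locally D-optimal support points",
          np.round(np.sort(res.x), 2))
```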
Active Learning Via Sequential Design and Uncertainty Sampling
Classification is an important task in many fields including biomedical
research and machine learning. Traditionally, a classification rule is
constructed from a set of labeled data. Recently, owing to technological
innovation and automatic data collection schemes, we frequently encounter data
sets containing large numbers of unlabeled samples. Because labeling each of
them is usually costly and inefficient, how to use these unlabeled data in
the construction of a classifier becomes an important problem. In the machine
learning literature, active learning and semi-supervised learning are popular
frameworks for this situation: classification algorithms sequentially recruit
new unlabeled subjects based on the information learned in previous stages, and
these new subjects are then labeled and included as additional training
samples. From a statistical perspective, such methods can be viewed as a hybrid
of sequential design and stochastic approximation. In this paper, we study
sequential learning procedures for building efficient and effective
classifiers, in which only the selected subjects are labeled and included at
each learning stage. The proposed algorithm
combines the ideas of Bayesian sequential optimal design and uncertainty
sampling. Computational issues of the algorithm are discussed. Numerical
results using both synthetic data and real examples are reported.
Comment: 25 pages, 8 figures
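A minimal uncertainty-sampling loop in the spirit described above (an illustrative sketch, not the paper's Bayesian sequential design procedure; the synthetic data and logistic model are assumptions): at each stage the classifier is refit on the labeled pool and the unlabeled point whose predicted probability is closest to 0.5 is queried, labeled, and added to the training set.

```python
# Sketch of uncertainty sampling for sequential classifier construction.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=500) > 0).astype(int)

# Small initial labeled pool containing both classes; the rest are "unlabeled".
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
unlabeled = [i for i in range(500) if i not in labeled]

for stage in range(30):
    clf = LogisticRegression().fit(X[labeled], y[labeled])
    p = clf.predict_proba(X[unlabeled])[:, 1]
    query = unlabeled[int(np.argmin(np.abs(p - 0.5)))]  # most uncertain subject
    labeled.append(query)                               # oracle supplies its label
    unlabeled.remove(query)

print("labels used:", len(labeled), " accuracy on all data:", clf.score(X, y))
```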
Optimal Experimental Design for Constrained Inverse Problems
In this paper, we address the challenging problem of optimal experimental
design (OED) of constrained inverse problems. We consider two OED formulations
that reduce experimental costs by minimizing the number of
measurements. The first formulation assumes a fine discretization of the design
parameter space and uses sparsity promoting regularization to obtain an
efficient design. The second formulation parameterizes the design and seeks
optimal placement for these measurements by solving a small-dimensional
optimization problem. We consider both problems in a Bayes risk as well as an
empirical Bayes risk minimization framework. For the unconstrained inverse
state problem, we exploit the closed form solution for the inner problem to
efficiently compute derivatives for the outer OED problem. The empirical
formulation does not require an explicit solution of the inverse problem and
therefore allows constraints to be integrated efficiently. A key contribution is an
efficient optimization method for solving the resulting, typically
high-dimensional, bilevel optimization problem using derivative-based methods.
To overcome the non-differentiability of active set methods for
inequality-constrained problems, we use a relaxed interior point method. To
address the growing computational complexity of empirical Bayes OED, we
parallelize the computation over the training models. Numerical examples and
illustrations from tomographic reconstruction, for various data sets and under
different constraints, demonstrate the impact of constraints on the optimal
design and highlight the importance of OED for constrained problems.
Comment: 19 pages, 8 figures
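The following toy sketch illustrates the first (sparsity-promoting) formulation on a small unconstrained linear problem; the matrices, the Tikhonov inner solve, the l1 weight penalty and all sizes are assumptions for illustration, and the paper's constraints, bilevel solver and interior point treatment are not reproduced.

```python
# Sketch: design weights w_i >= 0 per candidate measurement, an inner weighted
# Tikhonov reconstruction with a closed-form solution, and an l1 penalty on w
# so that only a few measurements keep nonzero weight. The risk is "empirical":
# averaged over a set of training models.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, m, n_train = 20, 40, 15
A = rng.normal(size=(m, n))                        # forward operator
X_train = rng.normal(size=(n, n_train))            # training models
Y_train = A @ X_train + 0.05 * rng.normal(size=(m, n_train))
beta, alpha = 1e-2, 5e-2                           # Tikhonov and l1 strengths

def empirical_risk(w):
    W = np.diag(w)
    H = A.T @ W @ A + beta * np.eye(n)             # inner problem (closed form)
    risk = 0.0
    for j in range(n_train):
        x_hat = np.linalg.solve(H, A.T @ W @ Y_train[:, j])
        risk += np.mean((x_hat - X_train[:, j]) ** 2)
    return risk / n_train + alpha * np.sum(w)      # l1 term since w >= 0

res = minimize(empirical_risk, x0=np.ones(m), bounds=[(0.0, 5.0)] * m,
               method="L-BFGS-B")
print("measurements kept:", int(np.sum(res.x > 1e-3)), "of", m)
```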
Bayesian Modeling of Inconsistent Plastic Response due to Material Variability
The advent of fabrication techniques such as additive manufacturing has
focused attention on the considerable variability of material response due to
defects and other microstructural aspects. This variability motivates the
development of an enhanced design methodology that incorporates inherent
material variability to provide robust predictions of performance. In this
work, we develop plasticity models capable of representing the distribution of
mechanical responses observed in experiments using traditional plasticity
models of the mean response and recently developed uncertainty quantification
(UQ) techniques. We demonstrate that the new method provides predictive
realizations that are superior to more traditional ones, and we show how these
UQ techniques can be used in model selection and in assessing the quality of
calibrated physical parameters.
Comment: 21 pages, 6 composite figures. arXiv admin note: substantial text overlap with arXiv:1802.0148
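As a schematic of what "representing the distribution of mechanical responses" can look like in practice (purely illustrative numbers and a linear-hardening law assumed here, not the models or calibration of the paper), one can draw material parameters from a fitted distribution and propagate each draw to a stress-strain realization, yielding a predictive band rather than a single mean curve.

```python
# Illustrative only: material variability as a distribution over the parameters
# of a simple elastic / linear-hardening law, propagated to response curves.
import numpy as np

rng = np.random.default_rng(0)
E = 200e3                                   # elastic modulus [MPa], held fixed
strain = np.linspace(0.0, 0.02, 100)

def stress(strain, sigma_y, H):
    """Elastic up to the yield strain, then linear isotropic hardening."""
    eps_y = sigma_y / E
    return np.where(strain < eps_y, E * strain, sigma_y + H * (strain - eps_y))

# Assumed parameter distribution standing in for calibrated variability.
sigma_y_draws = rng.normal(350.0, 25.0, size=200)      # yield stress [MPa]
H_draws = rng.lognormal(np.log(2.0e3), 0.2, size=200)  # hardening modulus [MPa]

curves = np.array([stress(strain, sy, h)
                   for sy, h in zip(sigma_y_draws, H_draws)])
lo, hi = np.percentile(curves, [5, 95], axis=0)        # predictive band
print("stress at 2% strain, 5th-95th percentile [MPa]:",
      round(lo[-1], 1), "-", round(hi[-1], 1))
```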
Efficient Bayesian experimentation using an expected information gain lower bound
Experimental design is crucial for inference where limitations in the data
collection procedure are present due to cost or other restrictions. Optimal
experimental designs determine parameters that in some appropriate sense make
the data the most informative possible. In a Bayesian setting this is
translated to updating to the best possible posterior. Information theoretic
arguments have led to the formulation of the expected information gain as a
design criterion. This can be evaluated mainly by Monte Carlo sampling and
maximized by using stochastic approximation methods, both known for being
computationally expensive tasks. We propose a framework where a lower bound of
the expected information gain is used as an alternative design criterion. In
addition to alleviating the computational burden, this also addresses issues
concerning estimation bias. The problem of permeability inference in a large
contaminated area is used to demonstrate the validity of our approach, in which
we employ the massively parallel version of the multiphase multicomponent
simulator TOUGH2 to simulate contaminant transport, together with a Polynomial
Chaos approximation of the forward model that further accelerates the objective
function evaluations. The proposed methodology is demonstrated in a setting
where field measurements are available.
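For context, the expected information gain criterion referred to above, and the standard nested Monte Carlo estimator whose cost and bias motivate lower-bound surrogates, are (generic definitions, not the specific bound proposed in the paper):

```latex
% Expected information gain of a design d (generic definition):
\mathrm{EIG}(d)
  = \int p(y \mid d) \int p(\theta \mid y, d)
      \log \frac{p(\theta \mid y, d)}{p(\theta)} \, d\theta \, dy
  = \mathbb{E}_{\theta, y \mid d}\!\left[
      \log p(y \mid \theta, d) - \log p(y \mid d) \right],
% and its standard nested Monte Carlo estimator (biased for finite M):
\widehat{\mathrm{EIG}}(d)
  = \frac{1}{N} \sum_{n=1}^{N}
    \left[ \log p\!\left(y^{(n)} \mid \theta^{(n)}, d\right)
         - \log \frac{1}{M} \sum_{m=1}^{M}
             p\!\left(y^{(n)} \mid \tilde{\theta}^{(n,m)}, d\right) \right],
\qquad \theta^{(n)}, \tilde{\theta}^{(n,m)} \sim p(\theta),\;
       y^{(n)} \sim p\!\left(y \mid \theta^{(n)}, d\right).
```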
Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi-supervised learning strategy
The research of metamaterials has achieved enormous success in the
manipulation of light in an artificially prescribed manner using delicately
designed sub-wavelength structures, so-called meta-atoms. Even though modern
numerical methods allow accurate calculation of the optical response of complex
structures, the inverse design of metamaterials is still a challenging task due
to the non-intuitive and non-unique relationship between physical structures
and optical responses. To better unveil this implicit relationship and thus
facilitate metamaterial design, we propose to represent metamaterials and model
the inverse design problem in a probabilistically generative manner. By
employing an encoder-decoder configuration, our deep generative model
compresses the meta-atom design and optical response into a latent space, where
similar designs and similar optical responses are automatically clustered
together. Therefore, by sampling in the latent space, the stochastic latent
variables function as codes, from which the candidate designs are generated
upon given requirements in a decoding process. With the effective latent
representation of metamaterials, we can elegantly model the complex
structure-performance relationship in an interpretable way, and solve the
one-to-many mapping issue that is intractable in a deterministic model.
Moreover, to alleviate the burden of numerical calculation in data collection,
we develop a semi-supervised learning strategy that allows our model to utilize
unlabeled data in addition to labeled data during training, simultaneously
optimizing the generative inverse design and deterministic forward prediction
in an end-to-end manner. On a data-driven basis, the proposed model can serve
as a comprehensive and efficient tool that accelerates the design,
characterization and even new discoveries in the research domain of metamaterials
and photonics in general.
Comment: 28 pages, 5 figures
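A minimal variational encoder-decoder in the spirit described above (a sketch only; the layer sizes, the 64-dimensional design vector, the 31-point spectrum and the loss weights are assumptions, not the paper's architecture): the encoder maps a design and its optical response to a latent Gaussian, and the decoder generates candidate designs from latent samples conditioned on a target response.

```python
# Sketch of a conditional VAE-style latent representation for inverse design.
import torch
import torch.nn as nn

D_DESIGN, D_SPEC, D_LAT = 64, 31, 8    # assumed sizes for illustration

class MetaVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(D_DESIGN + D_SPEC, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * D_LAT))       # mean, logvar
        self.dec = nn.Sequential(nn.Linear(D_LAT + D_SPEC, 128), nn.ReLU(),
                                 nn.Linear(128, D_DESIGN), nn.Sigmoid())

    def forward(self, design, spec):
        mu, logvar = self.enc(torch.cat([design, spec], -1)).chunk(2, -1)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterize
        recon = self.dec(torch.cat([z, spec], -1))
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl

model = MetaVAE()
design = torch.rand(16, D_DESIGN)      # toy batch of meta-atom layouts
spec = torch.rand(16, D_SPEC)          # toy optical responses
recon, kl = model(design, spec)
loss = nn.functional.binary_cross_entropy(recon, design) + 0.1 * kl
loss.backward()

# Inverse design: sample latent codes and decode candidates for one target spectrum.
z = torch.randn(5, D_LAT)
candidates = model.dec(torch.cat([z, spec[:1].repeat(5, 1)], -1))
```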
Unbiased MLMC stochastic gradient-based optimization of Bayesian experimental designs
In this paper we propose an efficient stochastic optimization algorithm to
search for Bayesian experimental designs such that the expected information
gain is maximized. The gradient of the expected information gain with respect
to experimental design parameters is given by a nested expectation, for which
the standard Monte Carlo method using a fixed number of inner samples yields a
biased estimator. In this paper, applying the idea of randomized multilevel
Monte Carlo (MLMC) methods, we introduce an unbiased Monte Carlo estimator for
the gradient of the expected information gain with finite expected squared
$\ell_2$-norm and finite expected computational cost per sample. Our unbiased
estimator can be combined well with stochastic gradient descent algorithms,
which results in our proposal of an optimization algorithm to search for an
optimal Bayesian experimental design. Numerical experiments confirm that our
proposed algorithm works well not only for a simple test problem but also for a
more realistic pharmacokinetic problem.
Comment: major revision, 26 pages, 6 figures
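The debiasing idea behind such randomized MLMC estimators can be written compactly (this is the generic single-term construction; the paper's particular coupling, level distribution and estimator details are not reproduced here):

```latex
% Let I_l denote a nested Monte Carlo gradient estimate with 2^l inner samples
% (an assumed level definition) and Delta_l = I_l - I_{l-1}, Delta_0 = I_0.
% Draw a random level L with P(L = l) = p_l > 0 and output
Z \;=\; \frac{\Delta_L}{p_L},
\qquad
\mathbb{E}[Z]
  \;=\; \sum_{\ell=0}^{\infty} p_\ell \,\frac{\mathbb{E}[\Delta_\ell]}{p_\ell}
  \;=\; \sum_{\ell=0}^{\infty} \mathbb{E}[\Delta_\ell]
  \;=\; \lim_{\ell \to \infty} \mathbb{E}[I_\ell]
  \;=\; \nabla_d \,\mathrm{EIG}(d),
% so Z is an unbiased gradient estimate that can be fed directly into a
% stochastic gradient update d <- d + gamma_t Z.
```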
Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times
Dynamic treatment regimes in oncology and other disease areas often can be
characterized by an alternating sequence of treatments or other actions and
transition times between disease states. The sequence of transition states may
vary substantially from patient to patient, depending on how the regime plays
out, and in practice there often are many possible counterfactual outcome
sequences. For evaluating the regimes, the mean final overall time may be
expressed as a weighted average of the means of all possible sums of successive
transition times. A common example arises in cancer therapies where the
transition times between various sequences of treatments, disease remission,
disease progression, and death characterize overall survival time. For the
general setting, we propose estimating mean overall outcome time by assuming a
Bayesian nonparametric regression model for the logarithm of each transition
time. A dependent Dirichlet process prior with Gaussian process base measure
(DDP-GP) is assumed, and a joint posterior is obtained by Markov chain Monte
Carlo (MCMC) sampling. We provide general guidelines for constructing a prior
using empirical Bayes methods. We compare the proposed approach with inverse
probability of treatment weighting. These comparisons are done by simulation
studies of both single-stage and multi-stage regimes, with treatment assignment
depending on baseline covariates. The method is applied to analyze a dataset
arising from a clinical trial involving multi-stage chemotherapy regimes for
acute leukemia. An R program for implementing the DDP-GP-based Bayesian
nonparametric analysis is freely available at
https://www.ma.utexas.edu/users/yxu/
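Schematically, the weighted-average representation of the mean overall time mentioned above can be written as follows (the notation is ours for illustration, not the paper's):

```latex
% S = set of possible transition-state paths under a regime, p_s = probability
% of path s, tau_{s,k} = k-th transition time along that path. Then
\mathbb{E}[T]
  \;=\; \sum_{s \in S} p_s \,
        \mathbb{E}\!\left[\sum_{k=1}^{K_s} \tau_{s,k}\right]
  \;=\; \sum_{s \in S} p_s \sum_{k=1}^{K_s} \mathbb{E}\!\left[\tau_{s,k}\right],
% i.e. a weighted average of the means of all possible sums of successive
% transition times, with each E[tau_{s,k}] obtained from the regression model
% for the log transition times.
```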
Computer emulation with non-stationary Gaussian processes
Gaussian process (GP) models are widely used to emulate propagation of
uncertainty in computer experiments. GP emulation sits comfortably within an
analytically tractable Bayesian framework. Apart from propagating uncertainty
of the input variables, a GP emulator trained on finitely many runs of the
experiment also offers error bars for response surface estimates at unseen
input values. This helps select future input values where the experiment should
be run to minimize the uncertainty in the response surface estimation. However,
traditional GP emulators use stationary covariance functions, which perform
poorly and lead to sub-optimal selection of future input points when the
response surface has sharp local features, such as a jump discontinuity or an
isolated tall peak. We propose an easily implemented non-stationary GP
emulator, based on two stationary GPs, one nested into the other, and
demonstrate its superior ability in handling local features and selecting
future input points from the boundaries of such features.
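One easily implemented way to see what composing two GPs can buy (an illustrative input-warping toy under the stated assumptions, not the construction of the paper): a pilot stationary GP is fit, the magnitude of its fitted derivative defines a cumulative warping of the inputs, and a second stationary GP fit on the warped inputs concentrates resolution where the response changes rapidly, such as at a jump.

```python
# Toy: non-stationary behaviour from two stationary GP stages (input warping).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 40))[:, None]
y = np.where(x[:, 0] < 0.5, 0.0, 2.0) + 0.3 * np.sin(6 * x[:, 0])  # jump at 0.5
kernel = 1.0 * RBF(0.1) + WhiteKernel(1e-3)

# Stage 1: pilot stationary GP and the magnitude of its fitted derivative.
pilot = GaussianProcessRegressor(kernel=kernel).fit(x, y)
grid = np.linspace(0, 1, 200)[:, None]
slope = np.abs(np.gradient(pilot.predict(grid), grid[:, 0]))

# Stage 2: cumulative warping w(x), then a stationary GP on the warped inputs.
warp = np.cumsum(slope + 1e-3)
warp = (warp - warp[0]) / (warp[-1] - warp[0])
w_of = lambda xs: np.interp(xs[:, 0], grid[:, 0], warp)[:, None]
nonstat = GaussianProcessRegressor(kernel=kernel).fit(w_of(x), y)

true = np.where(grid[:, 0] < 0.5, 0.0, 2.0) + 0.3 * np.sin(6 * grid[:, 0])
print("RMSE, stationary GP:",
      round(float(np.sqrt(np.mean((pilot.predict(grid) - true) ** 2))), 3))
print("RMSE, warped-input GP:",
      round(float(np.sqrt(np.mean((nonstat.predict(w_of(grid)) - true) ** 2))), 3))
```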