
    Efficient construction of Bayes optimal designs for stochastic process models

    Stochastic process models are now commonly used to analyse complex biological, ecological and industrial systems. Increasingly there is a need to deliver accurate estimates of model parameters and assess model fit by optimizing the timing of measurements of these processes. Standard methods to construct Bayes optimal designs, such as the well-known Müller algorithm, are computationally intensive even for relatively simple models. A key issue is that, in determining the merit of a design, the utility function typically requires summaries of many parameter posterior distributions, each determined via a computer-intensive scheme such as MCMC. This paper describes a fast and computationally efficient scheme to determine optimal designs for stochastic process models. The algorithm compares favourably with other methods for determining optimal designs and can require up to an order of magnitude fewer utility function evaluations for the same accuracy in the optimal design solution. It benefits from being embarrassingly parallel and is ideal for running on multi-core computers. The method is illustrated by determining different-sized optimal designs for three problems of increasing complexity.
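
    As a rough illustration of the loop such a scheme must accelerate, the sketch below estimates a Monte Carlo expected utility (negative posterior variance, with a cheap importance-sampling posterior in place of MCMC) for a few candidate observation-time designs under a toy exponential-decay model. The model, prior and utility are illustrative assumptions, not the paper's algorithm; the loop over candidate designs is the embarrassingly parallel part.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, times, sigma=0.1):
    # Noisy observations of a decay process x(t) = exp(-theta * t).
    return np.exp(-theta * times) + sigma * rng.normal(size=times.shape)

def expected_utility(times, n_outer=200, n_inner=500, sigma=0.1):
    # Monte Carlo estimate of the expected utility of observing at `times`,
    # with utility = negative posterior variance of theta.
    utils = np.empty(n_outer)
    for i in range(n_outer):
        theta_true = rng.exponential(1.0)            # draw from the prior
        y = simulate(theta_true, times, sigma)       # prior predictive data
        # Cheap importance-sampling posterior in place of a full MCMC run.
        thetas = rng.exponential(1.0, size=n_inner)
        resid = y - np.exp(-np.outer(thetas, times))
        logw = -0.5 * np.sum(resid**2, axis=1) / sigma**2
        w = np.exp(logw - logw.max()); w /= w.sum()
        post_mean = np.sum(w * thetas)
        utils[i] = -np.sum(w * (thetas - post_mean) ** 2)
    return utils.mean()

# Compare candidate designs (three observation times each); this outer loop
# is the part that parallelizes trivially across cores.
candidates = [np.array([0.5, 1.0, 1.5]),
              np.array([0.2, 1.0, 3.0]),
              np.array([2.0, 2.5, 3.0])]
best = max(candidates, key=expected_utility)
print("best design:", best)
```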

    Design Issues for Generalized Linear Models: A Review

    Generalized linear models (GLMs) have been used quite effectively in the modeling of a mean response under nonstandard conditions, where discrete as well as continuous data distributions can be accommodated. The choice of design for a GLM is a very important task in the development and building of an adequate model. However, one major problem that handicaps the construction of a GLM design is its dependence on the unknown parameters of the fitted model. Several approaches have been proposed in the past 25 years to solve this problem. These approaches, however, have provided only partial solutions that apply in only some special cases, and the problem, in general, remains largely unresolved. The purpose of this article is to focus attention on the aforementioned dependence problem. We provide a survey of various existing techniques dealing with the dependence problem. This survey includes discussions concerning locally optimal designs, sequential designs, Bayesian designs and the quantile dispersion graph approach for comparing designs for GLMs.
    Comment: Published at http://dx.doi.org/10.1214/088342306000000105 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
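
    To make the dependence problem concrete, the sketch below computes a locally D-optimal two-point design for a simple logistic model: the optimal design points depend on the very parameter guess the experiment is meant to estimate. The model, grid search and equal-weight symmetric two-point restriction are illustrative assumptions, not taken from the review.

```python
import numpy as np

def fisher_info(points, weights, beta):
    # Fisher information of a logistic model P(y=1|x) = 1/(1+exp(-(b0+b1*x)))
    # for a design putting `weights` on `points`; note that it depends on beta.
    X = np.column_stack([np.ones_like(points), points])
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ ((weights * p * (1 - p))[:, None] * X)

beta_guess = np.array([0.0, 1.0])   # the "local" parameter guess
grid = np.linspace(0.05, 4.0, 80)
# Search equal-weight, symmetric two-point designs for the D-optimum.
best = max(grid, key=lambda a: np.linalg.det(
    fisher_info(np.array([a, -a]), np.array([0.5, 0.5]), beta_guess)))
print("locally D-optimal points: +/-", round(best, 2))  # ~ +/-1.54 here
```

    Changing beta_guess moves the optimal points, which is exactly why locally optimal designs are only a partial answer.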

    Active Learning Via Sequential Design and Uncertainty Sampling

    Classification is an important task in many fields, including biomedical research and machine learning. Traditionally, a classification rule is constructed from a set of labeled data. Recently, due to technological innovation and automatic data collection schemes, we often encounter data sets containing large numbers of unlabeled samples. Because labeling each of them is usually costly and inefficient, how to utilize these unlabeled data in the construction of a classifier becomes an important problem. In the machine learning literature, active learning and semi-supervised learning are popular concepts for this situation, where classification algorithms recruit new unlabeled subjects sequentially based on the information learned from previous stages of the learning process, and these new subjects are then labeled and included as new training samples. From a statistical perspective, these methods can be recognized as a hybrid of sequential design and stochastic approximation procedures. In this paper, we study sequential learning procedures for building efficient and effective classifiers, where only the selected subjects are labeled and included in the learning stage. The proposed algorithm combines the ideas of Bayesian sequential optimal design and uncertainty sampling. Computational issues of the algorithm are discussed. Numerical results using both synthesized data and real examples are reported.
    Comment: 25 pages, 8 figures
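
    A minimal sketch of the uncertainty-sampling half of such a procedure, assuming a toy two-Gaussian pool and a logistic classifier (the paper's Bayesian sequential design component is not reproduced): at each stage the classifier queries the unlabeled point it is least sure about.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Pool of mostly unlabeled data: two Gaussian classes in the plane.
X = np.vstack([rng.normal(-1, 1, (500, 2)), rng.normal(1, 1, (500, 2))])
y = np.repeat([0, 1], 500)
labeled = [*rng.choice(500, 5, replace=False),          # small seed set,
           *(500 + rng.choice(500, 5, replace=False))]  # both classes present
labeled = [int(i) for i in labeled]
unlabeled = [i for i in range(len(X)) if i not in labeled]

clf = LogisticRegression()
for _ in range(20):                                  # sequential design loop
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[unlabeled])[:, 1]
    # Uncertainty sampling: query the pool point nearest the boundary.
    i = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]
    labeled.append(i)        # in practice an oracle would supply y[i] here
    unlabeled.remove(i)
print("training set grew to", len(labeled), "labeled points")
```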

    Optimal Experimental Design for Constrained Inverse Problems

    In this paper, we address the challenging problem of optimal experimental design (OED) for constrained inverse problems. We consider two OED formulations that allow reducing the experimental costs by minimizing the number of measurements. The first formulation assumes a fine discretization of the design parameter space and uses sparsity-promoting regularization to obtain an efficient design. The second formulation parameterizes the design and seeks optimal placement for these measurements by solving a small-dimensional optimization problem. We consider both problems in a Bayes risk as well as an empirical Bayes risk minimization framework. For the unconstrained inverse state problem, we exploit the closed-form solution of the inner problem to efficiently compute derivatives for the outer OED problem. The empirical formulation does not require an explicit solution of the inverse problem and therefore allows constraints to be integrated efficiently. A key contribution is an efficient optimization method for solving the resulting, typically high-dimensional, bilevel optimization problem using derivative-based methods. To overcome the non-differentiability that inequality constraints introduce in active set methods, we use a relaxed interior point method. To address the growing computational complexity of empirical Bayes OED, we parallelize the computation over the training models. Numerical examples and illustrations from tomographic reconstruction, for various data sets and under different constraints, demonstrate the impact of constraints on the optimal design and highlight the importance of OED for constrained problems.
    Comment: 19 pages, 8 figures
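
    A toy linear-Gaussian version of the first (sparsified) formulation, as an assumption-laden sketch rather than the paper's method: A-optimal Bayes risk over continuous design weights with an l1 penalty, solved by a box-constrained quasi-Newton method. The operator, prior and penalty weight are made up, and a quasi-Newton solve only drives weights near zero (exact sparsity would need a dedicated solver or the paper's interior point treatment).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
m, n = 40, 10                    # candidate measurements, unknown parameters
F = rng.normal(size=(m, n))      # rows = candidate measurement operators
sigma2, alpha = 0.1, 0.5         # noise variance, sparsity penalty weight

def design_objective(w):
    # A-optimal Bayes risk for a linear model with N(0, I) prior: trace of
    # the posterior covariance under design weights w, plus an l1 penalty
    # (w >= 0, so the l1 norm is just the sum) to switch measurements off.
    H = np.eye(n) + (F.T * (w / sigma2)) @ F
    return np.trace(np.linalg.inv(H)) + alpha * w.sum()

res = minimize(design_objective, x0=np.full(m, 0.5),
               bounds=[(0.0, 1.0)] * m, method="L-BFGS-B")
print("measurements kept:", np.where(res.x > 1e-3)[0])
```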

    Bayesian Modeling of Inconsistent Plastic Response due to Material Variability

    The advent of fabrication techniques such as additive manufacturing has focused attention on the considerable variability of material response due to defects and other microstructural aspects. This variability motivates the development of an enhanced design methodology that incorporates inherent material variability to provide robust predictions of performance. In this work, we develop plasticity models capable of representing the distribution of mechanical responses observed in experiments, using traditional plasticity models of the mean response together with recently developed uncertainty quantification (UQ) techniques. We demonstrate that the new method provides predictive realizations that are superior to those of more traditional approaches, and show how these UQ techniques can be used in model selection and in assessing the quality of calibrated physical parameters.
    Comment: 21 pages, 6 composite figures. arXiv admin note: substantial text overlap with arXiv:1802.0148
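
    A minimal forward-UQ sketch of the general idea, under loudly hypothetical assumptions: specimen-to-specimen parameter distributions (made-up numbers) pushed through a bilinear elastic/linear-hardening stand-in model to produce a band of stress-strain realizations. The paper's embedded and calibrated models are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

def stress(strain, E, sy, H):
    # Bilinear response: linear elastic up to yield, then linear hardening.
    eps_y = sy / E
    return np.where(strain < eps_y, E * strain, sy + H * (strain - eps_y))

# Specimen-to-specimen parameter variability (all numbers hypothetical).
n_real = 1000
strain = np.linspace(0.0, 0.05, 200)
E  = rng.normal(200e3, 5e3, n_real)    # Young's modulus [MPa]
sy = rng.normal(250.0, 20.0, n_real)   # yield stress    [MPa]
H  = rng.normal(2e3, 300.0, n_real)    # hardening slope [MPa]

# Push the parameter distribution through the model to get a response band.
curves = np.array([stress(strain, *p) for p in zip(E, sy, H)])
lo, hi = np.percentile(curves, [5, 95], axis=0)
print("90%% band at 5%% strain: [%.0f, %.0f] MPa" % (lo[-1], hi[-1]))
```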

    Efficient Bayesian experimentation using an expected information gain lower bound

    Experimental design is crucial for inference where limitations in the data collection procedure are present due to cost or other restrictions. Optimal experimental designs determine parameters that, in some appropriate sense, make the data as informative as possible. In a Bayesian setting this translates to updating to the best possible posterior. Information-theoretic arguments have led to the formulation of the expected information gain as a design criterion, which is typically evaluated by Monte Carlo sampling and maximized using stochastic approximation methods, both computationally expensive tasks. We propose a framework where a lower bound of the expected information gain is used as an alternative design criterion. In addition to alleviating the computational burden, this also addresses issues concerning estimation bias. The problem of permeability inference in a large contaminated area is used to demonstrate the validity of our approach, where we employ the massively parallel version of the multiphase multicomponent simulator TOUGH2 to simulate contaminant transport, and a Polynomial Chaos approximation of the forward model further accelerates the objective function evaluations. The proposed methodology is demonstrated in a setting where field measurements are available.
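
    The sketch below evaluates one well-known expected-information-gain lower bound (a prior-contrastive bound) on a toy scalar experiment and scans a one-dimensional design. This particular bound and model are illustrative assumptions and not necessarily the bound derived in the paper; the point is that a cheap lower bound can be scanned or optimized where nested Monte Carlo EIG would be expensive.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

def eig_lower_bound(d, n_outer=500, L=100, sigma=0.5):
    # Prior-contrastive lower bound on the expected information gain for the
    # toy experiment y ~ N(theta * d, sigma^2) with prior theta ~ N(0, 1).
    vals = np.empty(n_outer)
    for i in range(n_outer):
        theta0 = rng.normal()                       # generating parameter
        y = theta0 * d + sigma * rng.normal()       # simulated observation
        thetas = np.concatenate([[theta0], rng.normal(size=L)])  # contrasts
        logp = norm.logpdf(y, loc=thetas * d, scale=sigma)
        # log p(y|theta0,d) minus the log of an (L+1)-sample marginal
        # estimate; including theta0 in the average makes this a lower bound.
        vals[i] = logp[0] - (np.logaddexp.reduce(logp) - np.log(L + 1))
    return vals.mean()

for d in [0.1, 0.5, 1.0, 2.0]:       # larger |d| -> more informative design
    print("d =", d, " EIG lower bound ~", round(eig_lower_bound(d), 3))
```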

    Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi-supervised learning strategy

    The research of metamaterials has achieved enormous success in the manipulation of light in an artificially prescribed manner using delicately designed sub-wavelength structures, so-called meta-atoms. Even though modern numerical methods allow accurate calculation of the optical response of complex structures, the inverse design of metamaterials is still a challenging task due to the non-intuitive and non-unique relationship between physical structures and optical responses. To better unveil this implicit relationship and thus facilitate metamaterial design, we propose to represent metamaterials and model the inverse design problem in a probabilistically generative manner. By employing an encoder-decoder configuration, our deep generative model compresses the meta-atom design and optical response into a latent space, where similar designs and similar optical responses are automatically clustered together. Therefore, by sampling in the latent space, the stochastic latent variables function as codes from which candidate designs meeting given requirements are generated in a decoding process. With the effective latent representation of metamaterials, we can elegantly model the complex structure-performance relationship in an interpretable way and solve the one-to-many mapping issue that is intractable in a deterministic model. Moreover, to alleviate the burden of numerical calculation in data collection, we develop a semi-supervised learning strategy that allows our model to utilize unlabeled data in addition to labeled data during training, simultaneously optimizing the generative inverse design and deterministic forward prediction in an end-to-end manner. On a data-driven basis, the proposed model can serve as a comprehensive and efficient tool that accelerates the design, characterization and even new discovery in the research domain of metamaterials and photonics in general.
    Comment: 28 pages, 5 figures
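
    A bare-bones conditional VAE sketch of the encoder-decoder idea, in PyTorch. All dimensions and layer sizes are hypothetical, and the training loop and the paper's semi-supervised component are omitted; the snippet only shows how sampling latent codes for one target response yields multiple candidate designs, i.e. the one-to-many inverse mapping.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    # Conditional VAE: encode a design x with its optical response c into a
    # latent z; decode (z, c) back into a design. Sampling several z for one
    # target response c yields multiple candidate designs.
    def __init__(self, x_dim=64, c_dim=16, z_dim=8, h=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h), nn.ReLU(),
                                 nn.Linear(h, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, h), nn.ReLU(),
                                 nn.Linear(h, x_dim), nn.Sigmoid())

    def forward(self, x, c):
        mu, logvar = self.enc(torch.cat([x, c], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, c], -1)), mu, logvar

def elbo_loss(x_hat, x, mu, logvar):
    # Reconstruction + KL(q(z|x,c) || N(0, I)): the standard VAE objective.
    rec = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# Inverse design after training: sample 5 candidate designs for one target
# response c (random placeholders here; training loop omitted).
model, c = CVAE(), torch.rand(5, 16)
candidates = model.dec(torch.cat([torch.randn(5, 8), c], -1))
print(candidates.shape)   # torch.Size([5, 64])
```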

    Unbiased MLMC stochastic gradient-based optimization of Bayesian experimental designs

    In this paper we propose an efficient stochastic optimization algorithm to search for Bayesian experimental designs such that the expected information gain is maximized. The gradient of the expected information gain with respect to experimental design parameters is given by a nested expectation, for which the standard Monte Carlo method using a fixed number of inner samples yields a biased estimator. Applying the idea of randomized multilevel Monte Carlo (MLMC) methods, we introduce an unbiased Monte Carlo estimator for the gradient of the expected information gain with finite expected squared ℓ2-norm and finite expected computational cost per sample. Our unbiased estimator combines well with stochastic gradient descent algorithms, which leads to our proposed optimization algorithm for finding an optimal Bayesian experimental design. Numerical experiments confirm that the proposed algorithm works well not only for a simple test problem but also for a more realistic pharmacokinetic problem.
    Comment: major revision, 26 pages, 6 figures
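
    The randomized-MLMC debiasing trick, sketched on a toy scalar nested expectation rather than the EIG gradient: the log of an inner sample mean is biased at any fixed inner sample size, but a geometrically randomized level difference removes the bias while keeping both variance and expected cost finite. The model, the antithetic coupling and the level distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def rmlmc_sample(y, n0=4, r=1 - 2**-1.5):
    # Single-term randomized MLMC estimate of log E[exp(theta*y)] with
    # theta ~ N(0,1) (exact value y^2/2). A plug-in log-of-inner-mean is
    # biased for any fixed n; randomizing over levels removes the bias.
    k = rng.geometric(r) - 1            # level K with P(K=k) = r(1-r)^k
    pk = r * (1 - r) ** k
    n = n0 * 2**k                       # inner samples at the fine level
    vals = np.exp(rng.normal(size=n) * y)
    fine = np.log(vals.mean())
    if k == 0:
        delta = fine
    else:
        # Antithetic coarse term from the two halves of the same samples.
        coarse = 0.5 * (np.log(vals[: n // 2].mean())
                        + np.log(vals[n // 2:].mean()))
        delta = fine - coarse
    # The geometric rate r is chosen so that both Var(delta/pk) and the
    # expected cost sum_k pk * n_k remain finite.
    return delta / pk

y = 1.0
est = np.mean([rmlmc_sample(y) for _ in range(50000)])
print("randomized-MLMC estimate:", round(est, 3), "  exact:", y**2 / 2)
```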

    Bayesian Nonparametric Estimation for Dynamic Treatment Regimes with Sequential Transition Times

    Dynamic treatment regimes in oncology and other disease areas often can be characterized by an alternating sequence of treatments or other actions and transition times between disease states. The sequence of transition states may vary substantially from patient to patient, depending on how the regime plays out, and in practice there often are many possible counterfactual outcome sequences. For evaluating the regimes, the mean final overall time may be expressed as a weighted average of the means of all possible sums of successive transition times. A common example arises in cancer therapies, where the transition times between various sequences of treatments, disease remission, disease progression, and death characterize overall survival time. For the general setting, we propose estimating the mean overall outcome time by assuming a Bayesian nonparametric regression model for the logarithm of each transition time. A dependent Dirichlet process prior with a Gaussian process base measure (DDP-GP) is assumed, and a joint posterior is obtained by Markov chain Monte Carlo (MCMC) sampling. We provide general guidelines for constructing a prior using empirical Bayes methods. We compare the proposed approach with inverse probability of treatment weighting. These comparisons are done by simulation studies of both single-stage and multi-stage regimes, with treatment assignment depending on baseline covariates. The method is applied to analyze a dataset arising from a clinical trial involving multi-stage chemotherapy regimes for acute leukemia. An R program for implementing the DDP-GP-based Bayesian nonparametric analysis is freely available at https://www.ma.utexas.edu/users/yxu/
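
    To show the structure of the estimand (mean overall time as a path-probability-weighted sum of mean transition times), here is a two-stage toy sketch in which a simple conjugate normal model on log times stands in for the DDP-GP, and all data are simulated with hypothetical numbers.

```python
import numpy as np

rng = np.random.default_rng(6)

def mean_time(logT, n_draws=2000):
    # Posterior-predictive mean time from a simple conjugate normal model on
    # log times (flat prior on the mean, plug-in variance).
    mu = rng.normal(logT.mean(), logT.std(ddof=1) / np.sqrt(len(logT)), n_draws)
    return np.exp(mu + 0.5 * logT.var(ddof=1)).mean()

# Hypothetical two-stage regime: T1 (treatment -> evaluation), then either
# T2r (response -> progression) or T2n (no response -> salvage), in months.
logT1  = rng.normal(np.log(3.0), 0.4, 80)
logT2r = rng.normal(np.log(10.0), 0.5, 50)
logT2n = rng.normal(np.log(4.0), 0.5, 30)
p_resp = 50 / 80   # observed response rate, i.e. the path weight

# Mean overall time = weighted average over paths of summed transition means.
overall = mean_time(logT1) + p_resp * mean_time(logT2r) \
                           + (1 - p_resp) * mean_time(logT2n)
print("estimated mean overall time: %.1f months" % overall)
```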

    Computer emulation with non-stationary Gaussian processes

    Gaussian process (GP) models are widely used to emulate computer experiments and to propagate uncertainty through them. GP emulation sits comfortably within an analytically tractable Bayesian framework. Apart from propagating uncertainty in the input variables, a GP emulator trained on finitely many runs of the experiment also offers error bars for response surface estimates at unseen input values. This helps select future input values where the experiment should be run to minimize the uncertainty in the response surface estimation. However, traditional GP emulators use stationary covariance functions, which perform poorly and lead to sub-optimal selection of future input points when the response surface has sharp local features, such as a jump discontinuity or an isolated tall peak. We propose an easily implemented non-stationary GP emulator, based on two stationary GPs, one nested into the other, and demonstrate its superior ability in handling local features and selecting future input points from the boundaries of such features.
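
    The sketch below mimics the nesting idea with a fixed, hand-chosen input warp standing in for the learned inner GP, and compares where a stationary kernel versus the warped (non-stationary) kernel would place the next design point for a response with a jump at x = 0.5. Everything here, from the warp to the kernel length-scale, is a toy assumption.

```python
import numpy as np

rng = np.random.default_rng(7)

def k_rbf(a, b, ls=0.2):
    # Stationary squared-exponential kernel on 1-d inputs.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def warp(x):
    # Fixed input warp standing in for the learned inner GP: it stretches
    # space around x = 0.5, so a stationary kernel applied to warped inputs
    # acts non-stationarily on the original scale.
    return x + 2.0 * np.tanh(10.0 * (x - 0.5))

def gp_post_var(Xtr, Xte, kernel, jitter=1e-6):
    # GP posterior predictive variance; for a GP this depends only on the
    # input locations, not on the observed responses.
    K = kernel(Xtr, Xtr) + jitter * np.eye(len(Xtr))
    Ks = kernel(Xte, Xtr)
    return kernel(Xte, Xte).diagonal() - np.einsum(
        "ij,jk,ik->i", Ks, np.linalg.inv(K), Ks)

X = np.sort(rng.uniform(0.0, 1.0, 30))     # inputs of existing runs
Xs = np.linspace(0.0, 1.0, 201)            # candidate new run locations

v_stat = gp_post_var(X, Xs, k_rbf)
v_warp = gp_post_var(warp(X), warp(Xs), k_rbf)
# Sequential design: run the experiment next where variance is largest.
print("next run, stationary GP: x =", round(Xs[np.argmax(v_stat)], 3))
print("next run, warped GP:     x =", round(Xs[np.argmax(v_warp)], 3))
```

    The warped emulator concentrates predictive uncertainty around the jump, so variance-based selection sends the next runs to the feature's boundary, which is the behaviour the abstract describes.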