Comparison of Gaussian process modeling software
Gaussian process fitting, or kriging, is often used to create a model from a
set of data. Many available software packages do this, but we show that very
different results can be obtained from different packages even when using the
same data and model. We describe the parameterization, features, and
optimization used by eight different fitting packages that run on four
different platforms. We then compare these eight packages using various data
functions and data sets, revealing that there are stark differences between the
packages. In addition to comparing the prediction accuracy, the predictive
variance--which is important for evaluating precision of predictions and is
often used in stopping criteria--is also evaluated.
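The two quantities compared in the abstract above, the prediction and its predictive variance, can be illustrated with a minimal sketch of Gaussian process prediction. Everything here is an assumption made for illustration: the zero-mean prior, the squared-exponential kernel, the `length_scale`/`signal_var` parameterization, and the fixed `nugget` are not taken from any of the eight packages, which also estimate such parameters by optimization rather than fixing them.

```python
import math

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return signal_var * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_lower_transpose(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(x_train, y_train, x_new,
               length_scale=1.0, signal_var=1.0, nugget=1e-8):
    """Posterior mean and variance of a zero-mean GP at one new input."""
    n = len(x_train)
    # Covariance of the training inputs, with a small nugget on the diagonal.
    K = [[sq_exp_kernel(x_train[i], x_train[j], length_scale, signal_var)
          + (nugget if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transpose(L, solve_lower(L, y_train))  # K^{-1} y
    k_star = [sq_exp_kernel(x, x_new, length_scale, signal_var) for x in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = signal_var - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)  # clamp tiny negative values from rounding

mean_at_data, var_at_data = gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 1.0)
mean_far, var_far = gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 100.0)
```

At a training input the sketch nearly interpolates with variance close to zero, while far from the data it reverts to the prior mean and variance. Changing `length_scale` or `nugget` changes both outputs, which is one plausible reason packages with different parameterizations and optimizers can disagree on the same data.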
Bayesian optimization for materials design
We introduce Bayesian optimization, a technique developed for optimizing
time-consuming engineering simulations and for fitting machine learning models
on large datasets. Bayesian optimization guides the choice of experiments
during materials design and discovery to find good material designs in as few
experiments as possible. We focus on the case when materials designs are
parameterized by a low-dimensional vector. Bayesian optimization is built on a
statistical technique called Gaussian process regression, which allows
predicting the performance of a new design based on previously tested designs.
After providing a detailed introduction to Gaussian process regression, we
introduce two Bayesian optimization methods: expected improvement, for design
problems with noise-free evaluations; and the knowledge-gradient method, which
generalizes expected improvement and may be used in design problems with noisy
evaluations. Both methods are derived using a value-of-information analysis,
and enjoy one-step Bayes-optimality.
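For noise-free evaluations, expected improvement has a well-known closed form under a Gaussian posterior. The sketch below assumes maximization of a performance measure; the function and argument names are illustrative and do not come from the paper.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mean, std, best_so_far):
    """E[max(f - best_so_far, 0)] when f ~ Normal(mean, std^2) (maximization)."""
    if std <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)

# A candidate whose predicted mean equals the best tested design, with unit
# posterior standard deviation, still has positive expected improvement.
ei_tie = expected_improvement(mean=0.0, std=1.0, best_so_far=0.0)
ei_certain = expected_improvement(mean=1.0, std=0.0, best_so_far=0.5)
```

In a Bayesian optimization loop, the next design to test would be the candidate with the largest such value, with `mean` and `std` supplied by the Gaussian process regression model.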
Global sensitivity analysis of stochastic computer models with joint metamodels
The global sensitivity analysis method used to quantify the influence of uncertain input variables on the variability in numerical model responses has already been applied to deterministic computer codes; deterministic means here that the same set of input variables always gives the same output value. This paper proposes a global sensitivity analysis methodology for stochastic computer codes, for which the result of each code run is itself random. The framework of the joint modeling of the mean and dispersion of heteroscedastic data is used. To deal with the complexity of computer experiment outputs, nonparametric joint models are discussed and a new Gaussian process-based joint model is proposed. The relevance of these models is analyzed based upon two case studies. Results show that the joint modeling approach yields accurate sensitivity index estimators even when heteroscedasticity is strong.
Effects of eight neuropsychiatric copy number variants on human brain structure
Peer reviewed. Many copy number variants (CNVs) confer risk for the same range of neurodevelopmental symptoms and psychiatric conditions including autism and schizophrenia. Yet, to date neuroimaging studies have typically been carried out one mutation at a time, showing that CNVs have large effects on brain anatomy. Here, we aimed to characterize and quantify the distinct brain morphometry effects and latent dimensions across 8 neuropsychiatric CNVs. We analyzed T1-weighted MRI data from clinically and non-clinically ascertained CNV carriers (deletion/duplication) at the 1q21.1 (n = 39/28), 16p11.2 (n = 87/78), 22q11.2 (n = 75/30), and 15q11.2 (n = 72/76) loci as well as 1296 non-carriers (controls). Case-control contrasts of all examined genomic loci demonstrated effects on brain anatomy, with deletions and duplications showing mirror effects at the global and regional levels. Although CNVs mainly showed distinct brain patterns, principal component analysis (PCA) loaded subsets of CNVs on two latent brain dimensions, which explained 32 and 29% of the variance of the 8 Cohen’s d maps. The cingulate gyrus, insula, supplementary motor cortex, and cerebellum were identified by PCA and multi-view pattern learning as top regions contributing to latent dimensions shared across subsets of CNVs. The large proportion of distinct CNV effects on brain morphology may explain the small neuroimaging effect sizes reported in polygenic psychiatric conditions. Nevertheless, latent gene brain morphology dimensions will help subgroup the rapidly expanding landscape of neuropsychiatric variants and dissect the heterogeneity of idiopathic conditions. © 2021, The Author(s)
Enhancing stochastic kriging metamodels with gradient estimators
Stochastic kriging is a new metamodeling technique for effectively representing the mean response surface implied by a stochastic simulation; it takes into account both stochastic simulation noise and uncertainty about the underlying response surface of interest. We show theoretically, through some simplified models, that incorporating gradient estimators into stochastic kriging tends to significantly improve surface prediction. To address the issue of which type of gradient estimator to use, when there is a choice, we briefly review stochastic gradient estimation techniques; we then focus on the properties of infinitesimal perturbation analysis and likelihood ratio/score function gradient estimators and make recommendations. To conclude, we use simulation experiments with no simplifying assumptions to demonstrate that the use of stochastic kriging with gradient estimators provides more reliable prediction results than stochastic kriging alone.
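As a reference point for how stochastic kriging separates the two sources of uncertainty named above, the commonly cited form of the predictor without gradient information can be sketched as follows. The notation is assumed here and is not taken from the abstract: $\Sigma_M$ is the covariance of the response surface across the $k$ design points, $\Sigma_\varepsilon$ is the covariance of the simulation noise in the sample means $\bar{\mathcal{Y}}$, $V(x_i)$ is the simulation output variance at design point $x_i$, and $n_i$ is the number of replications there.

```latex
\hat{Y}(x_0) = \beta_0
  + \Sigma_M(x_0,\cdot)^{\top}
    \left[\Sigma_M + \Sigma_\varepsilon\right]^{-1}
    \left(\bar{\mathcal{Y}} - \beta_0 \mathbf{1}_k\right),
\qquad
\Sigma_\varepsilon = \operatorname{diag}\!\left(
  \frac{V(x_1)}{n_1}, \dots, \frac{V(x_k)}{n_k}
\right)
```

The diagonal form of $\Sigma_\varepsilon$ assumes independent sampling across design points. The gradient-enhanced version described in the abstract augments the observation vector and covariance matrices with gradient estimates, and is not reproduced here.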
Data from fitting Gaussian process models to various data sets using eight Gaussian process software packages
The article of record as published may be found at http://dx.doi.org/10.1016/j.dib.2017.12.012. This data article provides the summary data from tests comparing various Gaussian process software packages. Each spreadsheet represents a single function or type of function using a particular input sample size. In each spreadsheet, a row gives the results for a particular replication using a single package. Within each spreadsheet there are the results from eight Gaussian process model-fitting packages on five replicates of the surface. There is also one spreadsheet comparing the results from two packages performing stochastic kriging. These data enable comparisons between the packages to determine which package will give users the best results. Office of Naval Research via NPS's CRUSER. Naval Supply Systems Command Fleet Logistics. Grant number N00244-15-2-000
Gradient Based Criteria for Sequential Design
Computer simulation experiments are commonly used as an inexpensive alternative to real-world experiments to form a metamodel that approximates the input-output relationship of the real-world experiment. While a user may want to understand the entire response surface, they may also want to focus on interesting regions of the design space, such as where the gradient is large. In this paper we present an algorithm that adaptively runs a simulation experiment, focusing on areas of the response surface with a large gradient while also gathering an understanding of the entire surface. We consider the scenario where small batches of points can be run simultaneously, such as with multi-core processors.