5,644 research outputs found
Building nonparametric n-body force fields using Gaussian process regression
Constructing a classical potential suited to simulate a given atomic system
is a remarkably difficult task. This chapter presents a framework under which
this problem can be tackled, based on the Bayesian construction of
nonparametric force fields of a given order using Gaussian process (GP) priors.
The formalism of GP regression is first reviewed, particularly in relation to
its application in learning local atomic energies and forces. For accurate
regression it is essential to incorporate prior knowledge into the GP kernel
function. To this end, this chapter details how properties of smoothness,
invariance and interaction order of a force field can be encoded into
corresponding kernel properties. A range of kernels is then proposed,
possessing all the required properties and an adjustable parameter n
governing the interaction order modelled. The order n best suited to describe
a given system can be found automatically within the Bayesian framework by
maximisation of the marginal likelihood. The procedure is first tested on a toy
model of known interaction and later applied to two real materials described at
the DFT level of accuracy. The models automatically selected for the two
materials were found to be in agreement with physical intuition. More
generally, it was found that lower-order (simpler) models should be chosen when
the data are not sufficient to resolve more complex interactions. Low-n GPs
can be further sped up by orders of magnitude by constructing the corresponding
tabulated force field, here named "MFF".
Comment: 31 pages, 11 figures, book chapter
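The selection procedure the abstract describes can be sketched in a few lines. The chapter's n-body kernels are not reproduced here; as an illustration under that caveat, the following Python snippet uses scikit-learn to compare two generic GP priors of different complexity by their log marginal likelihood and selects the one the data support best. All kernels, data and names below are toy stand-ins.

```python
# Sketch: Bayesian model selection for GP regression via the marginal
# likelihood, as described in the abstract. The paper's n-body kernels are
# not reproduced; fixed-scale RBF kernels stand in for models of different
# complexity. All choices here are illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(40, 1))                     # toy configurations
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.standard_normal(40)   # toy targets

# Candidate priors: a smoother (simpler) and a wigglier (more complex) model.
candidates = {
    "simple":  RBF(length_scale=2.0, length_scale_bounds="fixed") + WhiteKernel(1e-2),
    "complex": RBF(length_scale=0.2, length_scale_bounds="fixed") + WhiteKernel(1e-2),
}

fits = {
    name: GaussianProcessRegressor(kernel=k, normalize_y=True).fit(X, y)
    for name, k in candidates.items()
}
# The model best supported by the data maximises the log marginal likelihood.
best = max(fits, key=lambda name: fits[name].log_marginal_likelihood_value_)
for name, gp in fits.items():
    print(f"{name}: log p(y|X) = {gp.log_marginal_likelihood_value_:.2f}")
print("selected:", best)
```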
Nonparametric regression analysis
Nonparametric regression uses nonparametric and flexible methods to analyze complex data with unknown regression relationships, imposing minimal assumptions on the regression function. This report reviews the theory and applications of nonparametric regression methods with an emphasis on kernel regression, smoothing splines and Gaussian process regression. Two datasets are analyzed to demonstrate and compare the three nonparametric regression models in R.
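As a companion to the review, here is a minimal sketch of one of the three methods it covers, the Nadaraya-Watson kernel regression estimator. The report's analyses are in R; this illustrative version is in Python, with a Gaussian kernel and a bandwidth fixed for demonstration only.

```python
# Sketch: Nadaraya-Watson kernel regression, one of the nonparametric
# estimators the report reviews (the report itself works in R; this is an
# illustrative Python version with a Gaussian kernel).
import numpy as np

def nadaraya_watson(x_query, x_train, y_train, bandwidth=0.3):
    """Estimate m(x) = E[Y | X = x] as a kernel-weighted average of the y's."""
    # Gaussian kernel weights between each query point and each training point.
    d = (x_query[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * d**2)
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(100)
grid = np.linspace(0, 1, 50)
m_hat = nadaraya_watson(grid, x, y)   # smooth estimate of the regression curve
```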
Bayesian Approximate Kernel Regression with Variable Selection
Nonlinear kernel regression models are often used in statistics and machine
learning because they are more accurate than linear models. Variable selection
for kernel regression models is a challenge partly because, unlike the linear
regression setting, there is no clear concept of an effect size for regression
coefficients. In this paper, we propose a novel framework that provides an
effect size analog of each explanatory variable for Bayesian kernel regression
models when the kernel is shift-invariant (for example, the Gaussian kernel).
We use function analytic properties of shift-invariant reproducing kernel
Hilbert spaces (RKHS) to define a linear vector space that: (i) captures
nonlinear structure, and (ii) can be projected onto the original explanatory
variables. The projection onto the original explanatory variables serves as an
analog of effect sizes. The specific function analytic property we use is that
shift-invariant kernel functions can be approximated via random Fourier bases.
Based on the random Fourier expansion we propose a computationally efficient
class of Bayesian approximate kernel regression (BAKR) models for both
nonlinear regression and binary classification for which one can compute an
analog of effect sizes. We illustrate the utility of BAKR by examining two
important problems in statistical genetics: genomic selection (i.e. phenotypic
prediction) and association mapping (i.e. inference of significant variants or
loci). State-of-the-art methods for genomic selection and association mapping
are based on kernel regression and linear models, respectively. BAKR is the
first method that is competitive in both settings.
Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations presented; references added
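The construction described above can be sketched compactly. The following snippet is an illustrative reconstruction, not the authors' BAKR implementation: it approximates a Gaussian (shift-invariant) kernel with random Fourier features, fits a ridge regression in the feature space as a stand-in for the Bayesian posterior mean, and projects the fitted function onto the original variables to obtain effect-size analogs. All sizes and constants are arbitrary.

```python
# Sketch of the BAKR idea under stated assumptions: random Fourier features
# approximate a shift-invariant kernel; a linear fit in feature space is
# projected back onto the original variables to yield effect-size analogs.
import numpy as np

rng = np.random.default_rng(2)
n, p, D = 200, 10, 300                  # samples, variables, Fourier features
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.5, -1.0]             # only the first two variables matter
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Random Fourier features for the Gaussian kernel k(x, x') = exp(-||x-x'||^2/2):
# z(x) = sqrt(2/D) * cos(W x + b), with W ~ N(0, I) and b ~ Uniform(0, 2*pi).
W = rng.standard_normal((p, D))
b = rng.uniform(0, 2 * np.pi, D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Ridge fit in feature space stands in for the Bayesian posterior mean.
lam = 1e-2
theta = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)
f_hat = Z @ theta                        # fitted nonlinear function values

# Effect-size analogs: project f_hat onto the span of the original X columns.
effect_sizes = np.linalg.pinv(X) @ f_hat
print(np.round(effect_sizes, 2))         # large entries flag important variables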
Stochastic expansions using continuous dictionaries: Lévy adaptive regression kernels
This article describes a new class of prior distributions for nonparametric
function estimation. The unknown function is modeled as a limit of weighted
sums of kernels or generator functions indexed by continuous parameters that
control local and global features such as their translation, dilation,
modulation and shape. Lévy random fields and their stochastic integrals are
employed to induce prior distributions for the unknown functions or,
equivalently, for the number of kernels and for the parameters governing their
features. Scaling, shape, and other features of the generating functions are
location-specific to allow quite different function properties in different
parts of the space, as with wavelet bases and other methods employing
overcomplete dictionaries. We provide conditions under which the stochastic
expansions converge in specified Besov or Sobolev norms. Under a Gaussian error
model, this may be viewed as a sparse regression problem, with regularization
induced via the Lévy random field prior distribution. Posterior inference
for the unknown functions is based on a reversible jump Markov chain Monte
Carlo algorithm. We compare the Lévy Adaptive Regression Kernel (LARK)
method to wavelet-based methods using some of the standard test functions, and
illustrate its flexibility and adaptability in nonstationary applications.
Comment: Published at http://dx.doi.org/10.1214/11-AOS889 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
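To make the prior concrete, the following sketch draws one random function from a LARK-style prior under simplifying assumptions: the number of Gaussian generator kernels is Poisson, and each kernel has a random translation, dilation and signed weight, mimicking a compound-Poisson approximation of the Lévy random field. The reversible jump MCMC posterior sampler from the paper is not shown; all distributions and constants here are illustrative.

```python
# Sketch: one draw from a LARK-style prior under simplifying assumptions.
# The unknown function is a weighted sum of Gaussian generator kernels whose
# number, locations, scales and weights are all random.
import numpy as np

def draw_lark_prior(x, rate=10.0, rng=None):
    """Return f(x) = sum_j w_j * exp(-(x - chi_j)^2 / (2 * lam_j^2))."""
    if rng is None:
        rng = np.random.default_rng()
    J = rng.poisson(rate)                      # random number of kernels
    chi = rng.uniform(x.min(), x.max(), J)     # translations (locations)
    lam = rng.gamma(2.0, 0.05, J)              # dilations (local scales)
    w = rng.normal(0.0, 1.0, J)                # signed weights
    return np.sum(w * np.exp(-(x[:, None] - chi) ** 2 / (2 * lam**2)), axis=1)

x = np.linspace(0, 1, 400)
f = draw_lark_prior(x)   # one random function; local scales vary by location
```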