A Kernel Approach for PDE Discovery and Operator Learning
This article presents a three-step framework for learning and solving partial
differential equations (PDEs) using kernel methods. Given a training set
consisting of pairs of noisy PDE solutions and source/boundary terms on a mesh,
kernel smoothing is utilized to denoise the data and approximate derivatives of
the solution. This information is then used in a kernel regression model to
learn the algebraic form of the PDE. The learned PDE is then used within a
kernel based solver to approximate the solution of the PDE with a new
source/boundary term, thereby constituting an operator learning framework.
Numerical experiments compare the method to state-of-the-art algorithms and
demonstrate its competitive performance.
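The first step described above (kernel smoothing to denoise the data and approximate derivatives) can be sketched in one dimension as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the ridge-style regularization, and the names `fit_smoother` and `smooth_and_differentiate` are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def fit_smoother(x_train, u_noisy, lengthscale=1.0, reg=1e-3):
    """Solve (K + reg * I) alpha = u for the smoother's coefficients.

    reg is an assumed regularization strength, not a value from the abstract.
    """
    K = gaussian_kernel(x_train, x_train, lengthscale)
    return np.linalg.solve(K + reg * np.eye(len(x_train)), u_noisy)

def smooth_and_differentiate(x_eval, x_train, alpha, lengthscale=1.0):
    """Evaluate the smoothed solution and its first derivative at x_eval."""
    K = gaussian_kernel(x_eval, x_train, lengthscale)
    diff = x_eval[:, None] - x_train[None, :]
    # Analytic derivative of the Gaussian kernel with respect to x_eval.
    dK = -(diff / lengthscale**2) * K
    return K @ alpha, dK @ alpha
```

Because the smoother is a linear combination of kernel functions, its derivative is obtained by differentiating the kernel analytically, so no finite differences on the noisy samples are needed.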
Approximate inference of the bandwidth in multivariate kernel density estimation
Kernel density estimation is a popular and widely used non-parametric method for data-driven density estimation. Its appeal lies in its simplicity and ease of implementation, as well as its strong asymptotic results regarding its convergence to the true data distribution. However, a major difficulty is the setting of the bandwidth, particularly in high dimensions and with a limited amount of data. An approximate Bayesian method is proposed, based on the Expectation–Propagation algorithm with a likelihood obtained from a leave-one-out cross-validation approach. The proposed method yields an iterative procedure to approximate the posterior distribution of the inverse bandwidth. The approximate posterior can be used to estimate the model evidence for selecting the structure of the bandwidth and to approach online learning. Extensive experimental validation shows that the proposed method is competitive in terms of performance with state-of-the-art plug-in methods.
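The leave-one-out likelihood mentioned above can be illustrated with a short sketch. It assumes a Gaussian kernel with a single scalar bandwidth and picks the bandwidth by a plain grid search. The grid search is a simple stand-in, not the Expectation–Propagation procedure the abstract proposes, and the names `loo_log_likelihood` and `select_bandwidth` are hypothetical.

```python
import numpy as np

def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of a Gaussian KDE with scalar bandwidth h."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    log_k = -0.5 * sq_dist / h**2 - 0.5 * d * np.log(2.0 * np.pi * h**2)
    # Exclude each point from its own density estimate.
    np.fill_diagonal(log_k, -np.inf)
    # Stable log of the average kernel value over the other n - 1 points.
    row_max = log_k.max(axis=1, keepdims=True)
    log_density = (row_max[:, 0]
                   + np.log(np.exp(log_k - row_max).sum(axis=1))
                   - np.log(n - 1))
    return log_density.sum()

def select_bandwidth(x, grid):
    """Return the grid value with the highest leave-one-out log-likelihood."""
    scores = [loo_log_likelihood(x, h) for h in grid]
    return grid[int(np.argmax(scores))]
```

Leaving each point out of its own estimate matters: without it, the likelihood increases without bound as the bandwidth shrinks to zero.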
Bayesian Approximate Kernel Regression with Variable Selection
Nonlinear kernel regression models are often used in statistics and machine
learning because they are more accurate than linear models. Variable selection
for kernel regression models is a challenge partly because, unlike the linear
regression setting, there is no clear concept of an effect size for regression
coefficients. In this paper, we propose a novel framework that provides an
effect size analog of each explanatory variable for Bayesian kernel regression
models when the kernel is shift-invariant (for example, the Gaussian kernel).
We use function analytic properties of shift-invariant reproducing kernel
Hilbert spaces (RKHS) to define a linear vector space that: (i) captures
nonlinear structure, and (ii) can be projected onto the original explanatory
variables. The projection onto the original explanatory variables serves as an
analog of effect sizes. The specific function analytic property we use is that
shift-invariant kernel functions can be approximated via random Fourier bases.
Based on the random Fourier expansion we propose a computationally efficient
class of Bayesian approximate kernel regression (BAKR) models for both
nonlinear regression and binary classification for which one can compute an
analog of effect sizes. We illustrate the utility of BAKR by examining two
important problems in statistical genetics: genomic selection (i.e. phenotypic
prediction) and association mapping (i.e. inference of significant variants or
loci). State-of-the-art methods for genomic selection and association mapping
are based on kernel regression and linear models, respectively. BAKR is the
first method that is competitive in both settings.
Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations
presented; references added
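The property the abstract relies on, that a shift-invariant kernel can be approximated with random Fourier bases, can be checked numerically for the Gaussian kernel. The sketch below covers only that approximation step, not the BAKR model or its effect-size projection. The function names and the `lengthscale` parameterization are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, lengthscale):
    """Exact Gaussian kernel matrix between the rows of X and Y."""
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def random_fourier_features(X, num_features, lengthscale, rng):
    """Map X to features Z such that Z @ Z.T approximates the Gaussian kernel.

    Frequencies are drawn from the kernel's spectral density, which for the
    Gaussian kernel is a normal distribution with scale 1 / lengthscale.
    """
    d = X.shape[1]
    W = rng.normal(scale=1.0 / lengthscale, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)
```

A linear model fitted on the features `Z` then behaves approximately like a kernel model, at a cost that scales with the number of features rather than with the square of the sample size.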
Implicit Kernel Meta-Learning Using Kernel Integral Forms
Meta-learning algorithms have made significant progress in image classification, but less attention has been given to the regression setting. In this paper we propose to learn the probability distribution representing a random feature kernel that we wish to use within kernel ridge regression (KRR). We introduce two instances of this meta-learning framework, learning a neural network pushforward for a translation-invariant kernel and an affine pushforward for a neural network random feature kernel, both mapping from a Gaussian latent distribution. We learn the parameters of the pushforward by minimizing a meta-loss associated to the KRR objective. Since the resulting kernel does not admit an analytical form, we adopt a random feature sampling approach to approximate it. We call the resulting method Implicit Kernel Meta-Learning (IKML). We derive a meta-learning bound for IKML, which shows the role played by the number of tasks T, the task sample size n, and the number of random features M. In particular the bound implies that M can be chosen independently of T and only mildly dependent on n. We introduce one synthetic and two real-world meta-learning regression benchmark datasets. Experiments on these datasets show that IKML performs best or close to best when compared against competitive meta-learning methods.
Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior
Choosing a proper set of kernel functions is an important problem in learning
Gaussian Process (GP) models since each kernel structure has different model
complexity and data fitness. Recently, automatic kernel composition methods
have provided not only accurate prediction but also attractive interpretability
through search-based methods. However, existing methods suffer from slow kernel
composition learning. To tackle large-scale data, we propose a new sparse
approximate posterior for GPs, MultiSVGP, constructed from groups of inducing
points associated with individual additive kernels in compositional kernels. We
demonstrate that this approximation provides a better fit to learn
compositional kernels given empirical observations. We also provide
theoretical justification for an error bound when compared to the traditional
sparse GP. In contrast to the search-based approach, we present a novel
probabilistic algorithm to learn a kernel composition by handling the sparsity
in the kernel selection with a Horseshoe prior. We demonstrate that our model can
capture characteristics of time series with significant reductions in
computational time and has competitive regression performance on real-world
data sets.
Comment: AAAI 202
Preconditioning Kernel Matrices
The computational and storage complexity of kernel machines presents the
primary barrier to their scaling to large, modern datasets. A common way to
tackle the scalability issue is to use the conjugate gradient algorithm, which
relieves the constraints on both storage (the kernel matrix need not be stored)
and computation (both stochastic gradients and parallelization can be used).
Even so, conjugate gradient is not without its own issues: the conditioning of
kernel matrices is often such that conjugate gradients will have poor
convergence in practice. Preconditioning is a common approach to alleviating
this issue. Here we propose preconditioned conjugate gradients for kernel
machines, and develop a broad range of preconditioners particularly useful for
kernel matrices. We describe a scalable approach to both solving kernel
machines and learning their hyperparameters. We show this approach is exact in
the limit of iterations and outperforms state-of-the-art approximations for a
given computational budget.
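As a rough illustration of preconditioned conjugate gradients on a kernel system, the sketch below solves (K + D) x = y, where K is a Gaussian kernel matrix and D is a diagonal of per-point noise variances. The diagonal (Jacobi) preconditioner in the usage note is a deliberately simple stand-in and is not one of the preconditioners developed in the paper. The helper `kernel_system` forms the matrix explicitly only for brevity; the solver itself needs just a matrix-vector product.

```python
import numpy as np

def pcg(matvec, b, precond=None, tol=1e-8, max_iter=1000):
    """Preconditioned conjugate gradients for A x = b, A symmetric positive definite.

    matvec(v) returns A @ v; precond(r) returns an approximation of A^{-1} r.
    Returns the solution and the number of iterations used.
    """
    apply_precond = precond if precond is not None else (lambda r: r)
    x = np.zeros_like(b)
    r = b.copy()              # residual for the zero initial guess
    z = apply_precond(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for iteration in range(1, max_iter + 1):
        Ap = matvec(p)
        step = rz / (p @ Ap)
        x += step * p
        r -= step * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, iteration
        z = apply_precond(r)
        rz_next = r @ z
        p = z + (rz_next / rz) * p
        rz = rz_next
    return x, max_iter

def kernel_system(X, noise_var, lengthscale=1.0):
    """Gaussian kernel matrix plus a diagonal of per-point noise variances."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2) + np.diag(noise_var)
```

With `A = kernel_system(X, noise_var)` and `diag = np.diag(A)`, a call such as `pcg(lambda v: A @ v, y, precond=lambda r: r / diag)` applies the Jacobi preconditioner. A stronger preconditioner changes only the `precond` callable.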
Convolutional Kernel Networks
An important goal in visual recognition is to devise image representations
that are invariant to particular transformations. In this paper, we address
this goal with a new type of convolutional neural network (CNN) whose
invariance is encoded by a reproducing kernel. Unlike traditional approaches
where neural networks are learned either to represent data or for solving a
classification task, our network learns to approximate the kernel feature map
on training data. Such an approach enjoys several benefits over classical ones.
First, by teaching CNNs to be invariant, we obtain simple network architectures
that achieve a similar accuracy to more complex ones, while being easy to train
and robust to overfitting. Second, we bridge a gap between the neural network
literature and kernels, which are natural tools to model invariance. We
evaluate our methodology on visual recognition tasks where CNNs have proven to
perform well, e.g., digit recognition with the MNIST dataset, and the more
challenging CIFAR-10 and STL-10 datasets, where our accuracy is competitive
with the state of the art.
Comment: appears in Advances in Neural Information Processing Systems (NIPS),
Dec 2014, Montreal, Canada, http://nips.c