Universal Graph Random Features
We propose a novel random walk-based algorithm for unbiased estimation of
arbitrary functions of a weighted adjacency matrix, coined universal graph
random features (u-GRFs). This includes many of the most popular examples of
kernels defined on the nodes of a graph. Our algorithm enjoys subquadratic time
complexity with respect to the number of nodes, overcoming the notoriously
prohibitive cubic scaling of exact graph kernel evaluation. It can also be
trivially distributed across machines, permitting learning on much larger
networks. At the heart of the algorithm is a modulation function which
upweights or downweights the contribution from different random walks depending
on their lengths. We show that by parameterising it with a neural network we
can obtain u-GRFs that give higher-quality kernel estimates or perform
efficient, scalable kernel learning. We provide robust theoretical analysis and
support our findings with experiments including pointwise estimation of fixed
graph kernels, solving non-homogeneous graph ordinary differential equations,
node clustering and kernel regression on triangular meshes.
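As a toy illustration of the random-walk mechanism (not the authors' implementation), the sketch below gives an unbiased estimate of one row of the resolvent sum_k alpha^k A^k on a small graph. A geometric modulation f(k) = alpha^k stands in for the neural-network modulation of u-GRFs; all function names are ours.

```python
import numpy as np

def grf_row_estimate(A, i, modulation, q=0.5, n_walks=20000, seed=0):
    """Monte-Carlo estimate of row i of sum_k modulation(k) * A^k.

    Each walk continues with probability q, is importance-weighted by the
    inverse probability of its trajectory, and contributes to the estimate
    at every node it visits, scaled by a modulation function of the walk
    length (a learned network in u-GRFs; plain geometric decay here).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    neighbours = [np.flatnonzero(A[v]) for v in range(n)]
    est = np.zeros(n)
    for _ in range(n_walks):
        v, load, length = i, 1.0, 0
        est[v] += modulation(length) * load
        while rng.random() < q and len(neighbours[v]) > 0:
            w = rng.choice(neighbours[v])
            # importance weight: edge weight x degree / continuation prob
            load *= A[v, w] * len(neighbours[v]) / q
            v, length = w, length + 1
            est[v] += modulation(length) * load
    return est / n_walks

# 4-cycle; geometric modulation alpha^k estimates the resolvent (I - alpha*A)^{-1}
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
alpha = 0.1
row = grf_row_estimate(A, 0, lambda k: alpha**k)
exact = np.linalg.inv(np.eye(4) - alpha * A)[0]
```

Because the estimator only simulates short walks, its cost per sample is independent of the node count, which is the intuition behind the subquadratic scaling claimed above.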
Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior
Choosing a proper set of kernel functions is an important problem in learning
Gaussian Process (GP) models since each kernel structure has different model
complexity and data fitness. Recently, automatic kernel composition methods
provide not only accurate prediction but also attractive interpretability
through search-based methods. However, existing methods suffer from slow kernel
composition learning. To tackle large-scale data, we propose a new sparse
approximate posterior for GPs, MultiSVGP, constructed from groups of inducing
points associated with individual additive kernels in compositional kernels. We
demonstrate that this approximation provides a better fit to learn
compositional kernels given empirical observations. We also provide
theoretical justification, via an error bound, relative to the traditional
sparse GP. In contrast to the search-based approach, we present a novel
probabilistic algorithm to learn a kernel composition by handling the sparsity
in kernel selection with a Horseshoe prior. We demonstrate that our model can
capture characteristics of time series with significant reductions in
computational time while achieving competitive regression performance on
real-world data sets. Comment: AAAI 202
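To make the notion of a compositional kernel concrete, here is a minimal numpy sketch of exact GP regression with an additive kernel. The fixed weights stand in for the inclusion scales that a Horseshoe prior would shrink toward zero; the sketch deliberately omits inducing points and MultiSVGP's grouped variational posterior, and all names are ours.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential base kernel on 1-d inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

def periodic(a, b, period=1.0, ls=1.0):
    """Periodic base kernel on 1-d inputs."""
    d = np.pi * np.abs(a[:, None] - b[None, :]) / period
    return np.exp(-2.0 * np.sin(d)**2 / ls**2)

def gp_mean(X, y, Xstar, kernel, noise=1e-2):
    """Exact GP posterior mean (the quantity sparse methods approximate)."""
    Kxx = kernel(X, X) + noise * np.eye(len(X))
    return kernel(Xstar, X) @ np.linalg.solve(Kxx, y)

# compositional kernel: weighted sum of base kernels; the weights play the
# role of (hypothetical) Horseshoe-shrunk inclusion scales
weights = np.array([0.3, 1.0])
kernel = lambda a, b: weights[0] * rbf(a, b) + weights[1] * periodic(a, b)

X = np.linspace(0.0, 2.0, 40)
y = np.sin(2 * np.pi * X)          # periodic signal; the periodic component dominates
mean = gp_mean(X, y, X, kernel)
```

With a periodic target, a sparsity-inducing prior over the weights would drive the RBF weight toward zero, recovering an interpretable composition.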
A new family of kernels from the beta polynomial kernels with applications in density estimation
One of the fundamental data analytics tools in statistical estimation is the non-parametric kernel method, which produces probability density estimates. The method uses the observations to extract statistical information that aids the practicing statistician in decision making and further investigation. Kernel techniques primarily examine essential characteristics in a data set, and this research aims to introduce new kernel functions that can easily detect inherent properties in any given observations. Accurate application of a kernel estimator, however, requires both a kernel function and a smoothing parameter that regulates the level of smoothness applied to the estimates. A plethora of kernel families and smoothing parameter selectors exist in the literature, but no single method is universally acceptable in all situations; hence new kernel functions with smoothing parameter selectors continue to be proposed for density estimation. This article proposes a distinct kernel family derived from the beta polynomial kernel family using an exponential progression. The newly proposed family was evaluated on simulated and real-life data. The outcomes clearly indicate that it competes favorably with other kernel families in density estimation, and a further comparison of numerical results shows that the new family outperforms the classical beta kernel family on both simulated and real data, with the asymptotic mean integrated squared error (AMISE) as the criterion function. The information obtained from this analysis can support decision making in an organization, especially when human and material resources are to be considered. In addition, kernel functions are vital tools for data analysis and visualization; the newly proposed functions are therefore useful exploratory tools.
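For context, here is a minimal sketch of density estimation with the classical beta polynomial kernel family, i.e. the baseline the new exponential-progression family is compared against (the new family itself is not reproduced here). The normalising constant and the choices h = 0.5, r = 2 are standard textbook quantities; the helper names are ours.

```python
import numpy as np
from math import gamma

def beta_poly_kernel(u, r=2):
    """Beta polynomial kernel c_r * (1 - u^2)^r supported on [-1, 1].

    r = 1 is the Epanechnikov kernel (c_1 = 3/4), r = 2 the biweight
    kernel (c_2 = 15/16); c_r normalises the kernel to integrate to one.
    """
    c = gamma(2 * r + 2) / (2**(2 * r + 1) * gamma(r + 1)**2)
    return np.where(np.abs(u) <= 1.0, c * (1.0 - u**2)**r, 0.0)

def kde(x_grid, data, h=0.5, r=2):
    """Kernel density estimate: average of rescaled kernels at the data."""
    u = (x_grid[:, None] - data[None, :]) / h
    return beta_poly_kernel(u, r).mean(axis=1) / h

rng = np.random.default_rng(0)
data = rng.normal(size=200)
grid = np.linspace(-5.0, 5.0, 1001)
density = kde(grid, data)
```

The smoothing parameter h controls the bias-variance trade-off the abstract refers to; AMISE-based selectors choose h to minimise the asymptotic mean integrated squared error.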
Implicit Kernel Attention
Attention computes the dependency between representations, and it
encourages the model to focus on important selective features.
Attention-based models, such as Transformers and graph attention networks
(GAT), are widely utilized for sequential data and graph-structured data.
This paper
suggests a new interpretation and generalized structure of the attention in
Transformer and GAT. For the attention in Transformer and GAT, we derive that
the attention is a product of two parts: 1) the RBF kernel to measure the
similarity of two instances and 2) the exponential of the l^2 norm to compute
the importance of individual instances. From this decomposition, we generalize
the attention in three ways. First, we propose implicit kernel attention with
an implicit kernel function, instead of manual kernel selection. Second, we
generalize the l^2 norm to the l^p norm. Third, we extend our attention to
structured multi-head attention. Our generalized attention shows better
performance on classification, translation, and regression tasks.
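The claimed decomposition is easy to verify numerically: exp(q.k) = exp(-|q - k|^2 / 2) * exp(|q|^2 / 2) * exp(|k|^2 / 2), and the query factor is constant along a row, so it cancels in the softmax. Attention weights then equal a normalised product of an RBF similarity term and a key-norm importance term. A small numpy check (our own construction, with the usual 1/sqrt(d) scaling omitted for clarity):

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # queries
K = rng.normal(size=(6, 8))   # keys

# standard dot-product attention weights
logits = Q @ K.T
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# decomposition: RBF similarity part times key-norm importance part
sq_dist = ((Q[:, None, :] - K[None, :, :])**2).sum(axis=-1)
rbf = np.exp(-0.5 * sq_dist)                 # similarity of two instances
key_norm = np.exp(0.5 * (K**2).sum(axis=-1)) # importance of each key
scores = rbf * key_norm[None, :]
attn_decomposed = scores / scores.sum(axis=-1, keepdims=True)

assert np.allclose(attn, attn_decomposed)
```

Reading attention this way motivates the paper's generalizations: swap the RBF kernel for an implicitly learned one, and vary the norm in the importance term.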
Kernel-Based Models for Influence Maximization on Graphs based on Gaussian Process Variance Minimization
The inference of novel knowledge, the discovery of hidden patterns, and the
uncovering of insights from large amounts of data from a multitude of sources
make Data Science (DS) an art rather than a mere scientific discipline.
The study and design of mathematical models able to analyze information
represent a central research topic in DS. In this work, we introduce and
investigate a novel model for influence maximization (IM) on graphs using ideas
from kernel-based approximation, Gaussian process regression, and the
minimization of a corresponding variance term. Data-driven approaches can be
applied to determine proper kernels for this IM model and machine learning
methodologies are adopted to tune the model parameters. Compared to stochastic
models in this field that rely on costly Monte-Carlo simulations, our model
allows for a simple and cost-efficient update strategy to compute optimal
influencing nodes on a graph. In several numerical experiments, we show the
properties and benefits of this new model.
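A minimal sketch of the variance-minimisation idea, with our own choices throughout: the regularised-Laplacian kernel K = (I + L)^{-1} as one simple graph kernel, and a naive greedy loop that recomputes the posterior trace from scratch rather than the cost-efficient update strategy described above.

```python
import numpy as np

def greedy_min_variance(K, budget, jitter=1e-9):
    """Greedily add the node whose observation most reduces the summed
    GP posterior variance over all nodes (a proxy for influence)."""
    n = len(K)
    chosen, totals = [], []
    for _ in range(budget):
        best_v, best_total = None, np.inf
        for v in range(n):
            if v in chosen:
                continue
            S = chosen + [v]
            Kss = K[np.ix_(S, S)] + jitter * np.eye(len(S))
            Ks = K[:, S]
            # trace of the GP posterior covariance after observing set S
            total = np.trace(K) - np.trace(Ks @ np.linalg.solve(Kss, Ks.T))
            if total < best_total:
                best_v, best_total = v, total
        chosen.append(best_v)
        totals.append(best_total)
    return chosen, totals

# star graph: node 0 connected to nodes 1..5
A = np.zeros((6, 6))
A[0, 1:] = A[1:, 0] = 1.0
L = np.diag(A.sum(axis=1)) - A
K = np.linalg.inv(np.eye(6) + L)   # regularised-Laplacian graph kernel
chosen, totals = greedy_min_variance(K, 3)
```

Because each candidate evaluation is a deterministic linear-algebra step, no Monte-Carlo simulation of spreading processes is needed, which is the cost advantage the abstract highlights.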
End-to-end Kernel Learning via Generative Random Fourier Features
Random Fourier features (RFFs) provide a promising way for kernel learning in
the spectral domain. Current RFFs-based kernel learning methods usually work in a
two-stage way. In the first-stage process, learning the optimal feature map is
often formulated as a target alignment problem, which aims to align the learned
kernel with the pre-defined target kernel (usually the ideal kernel). In the
second-stage process, a linear learner is trained on the mapped
random features. Nevertheless, the pre-defined kernel in target alignment is
not necessarily optimal for the generalization of the linear learner. Instead,
in this paper, we consider a one-stage process that incorporates the kernel
learning and linear learner into a unifying framework. To be specific, a
generative network via RFFs is devised to implicitly learn the kernel, followed
by a linear classifier parameterized as a fully-connected layer. Then the
generative network and the classifier are jointly trained by solving the
empirical risk minimization (ERM) problem to reach a one-stage solution. This
end-to-end scheme naturally allows deeper features, in correspondence to a
multi-layer structure, and shows superior generalization performance over the
classical two-stage, RFFs-based methods in real-world classification tasks.
Moreover, inspired by the randomized resampling mechanism of the proposed
method, its enhanced adversarial robustness is investigated and experimentally
verified.
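A toy numpy sketch of the one-stage idea: a single learnable parameter (a log-bandwidth rescaling a fixed base spectral sample) stands in for the generative network, and it is trained jointly with the linear classifier by minimising the empirical logistic risk. The numerical gradient for the generator parameter and the two-blob data are our simplifications, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
# two separable 2-d Gaussian blobs as a stand-in classification task
X = np.vstack([rng.normal(-1.0, 0.5, (100, 2)),
               rng.normal(1.0, 0.5, (100, 2))])
y = np.r_[np.zeros(100), np.ones(100)]

D = 100
Omega0 = rng.normal(size=(2, D))          # fixed base spectral sample
b = rng.uniform(0, 2 * np.pi, D)          # random phases

def features(theta):
    # "generative" part: theta rescales the base frequencies (a
    # one-parameter stand-in for the generative network over RFFs)
    Z = X @ (np.exp(theta) * Omega0) + b
    return np.sqrt(2.0 / D) * np.cos(Z)

def risk(theta, w):
    p = 1.0 / (1.0 + np.exp(-features(theta) @ w))
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

theta, w = 0.0, np.zeros(D)
for _ in range(300):
    Phi = features(theta)
    p = 1.0 / (1.0 + np.exp(-Phi @ w))
    w -= 1.0 * Phi.T @ (p - y) / len(y)   # analytic ERM gradient in w
    g = (risk(theta + 1e-4, w) - risk(theta - 1e-4, w)) / 2e-4
    theta -= 0.1 * g                      # numerical gradient for theta

pred = 1.0 / (1.0 + np.exp(-features(theta) @ w)) > 0.5
acc = float(np.mean(pred == (y == 1)))
```

Both the feature map and the classifier are updated against the same empirical risk, which is the one-stage alternative to aligning the kernel with a pre-defined target first.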
Learning to Learn Kernels with Variational Random Features
In this work, we introduce kernels with random Fourier features in the
meta-learning framework to leverage their strong few-shot learning ability. We
propose meta variational random features (MetaVRF) to learn adaptive kernels
for the base-learner, which is developed in a latent variable model by treating
the random feature basis as the latent variable. We formulate the optimization
of MetaVRF as a variational inference problem by deriving an evidence lower
bound under the meta-learning framework. To incorporate shared knowledge from
related tasks, we propose a context inference of the posterior, which is
established by an LSTM architecture. The LSTM-based inference network can
effectively integrate the context information of previous tasks with
task-specific information, generating informative and adaptive features. The
learned MetaVRF can produce kernels of high representational power with a
relatively low spectral sampling rate and also enables fast adaptation to new
tasks. Experimental results on a variety of few-shot regression and
classification tasks demonstrate that MetaVRF delivers much better, or at least
competitive, performance compared to existing meta-learning alternatives.
Comment: ICML'2020; code is available at:
https://github.com/Yingjun-Du/MetaVR
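As a rough sketch of adaptive random-feature kernels for a few-shot base-learner (not MetaVRF itself), the code below draws a task-specific spectral sample with a bandwidth set by the median heuristic on the support set, then fits kernel ridge regression in the random-feature space. MetaVRF instead infers the spectral distribution through an LSTM-based variational posterior over the feature basis; the heuristic and all names here are ours.

```python
import numpy as np

def rff(x, Omega):
    """Random Fourier features for 1-d inputs: [cos(x w), sin(x w)] / sqrt(D)."""
    Z = x[:, None] * Omega[None, :]
    D = len(Omega)
    return np.concatenate([np.cos(Z), np.sin(Z)], axis=1) / np.sqrt(D)

def adapt_and_fit(x_support, y_support, x_query, D=300, lam=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # task-adaptive spectral sample: rescale base frequencies by a bandwidth
    # estimated from the support set (median heuristic)
    med = np.median(np.abs(x_support[:, None] - x_support[None, :]))
    Omega = rng.normal(size=D) / max(med, 1e-6)
    Phi = rff(x_support, Omega)
    # ridge-regression base-learner in the random-feature space
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(2 * D), Phi.T @ y_support)
    return rff(x_query, Omega) @ w

rng = np.random.default_rng(1)
xs = rng.uniform(-3.0, 3.0, 40)   # support set for one regression task
ys = np.sin(xs)
xq = np.linspace(-2.5, 2.5, 50)   # query points
pred = adapt_and_fit(xs, ys, xq)
```

Adapting the spectral sample per task is what lets a relatively low sampling rate still yield a representative kernel, which is the efficiency point made in the abstract.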