Universal Graph Random Features
We propose a novel random walk-based algorithm for unbiased estimation of
arbitrary functions of a weighted adjacency matrix, coined universal graph
random features (u-GRFs). This includes many of the most popular examples of
kernels defined on the nodes of a graph. Our algorithm enjoys subquadratic time
complexity with respect to the number of nodes, overcoming the notoriously
prohibitive cubic scaling of exact graph kernel evaluation. It can also be
trivially distributed across machines, permitting learning on much larger
networks. At the heart of the algorithm is a modulation function which
upweights or downweights the contribution from different random walks depending
on their lengths. We show that by parameterising it with a neural network we
can obtain u-GRFs that give higher-quality kernel estimates or perform
efficient, scalable kernel learning. We provide robust theoretical analysis and
support our findings with experiments including pointwise estimation of fixed
graph kernels, solving non-homogeneous graph ordinary differential equations,
node clustering and kernel regression on triangular meshes.
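As a toy illustration of the random-walk mechanism (not the authors' implementation), the sketch below gives an unbiased estimate of one row of the resolvent sum_k alpha^k A^k on a small graph. A geometric modulation f(k) = alpha^k stands in for the neural-network modulation of u-GRFs; all function names are ours.

```python
import numpy as np

def grf_row_estimate(A, i, modulation, q=0.5, n_walks=20000, seed=0):
    """Monte-Carlo estimate of row i of sum_k modulation(k) * A^k.

    Each walk continues with probability q, is importance-weighted by the
    inverse probability of its trajectory, and contributes to the estimate
    at every node it visits, scaled by a modulation function of the walk
    length (a learned network in u-GRFs; plain geometric decay here).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    neighbours = [np.flatnonzero(A[v]) for v in range(n)]
    est = np.zeros(n)
    for _ in range(n_walks):
        v, load, length = i, 1.0, 0
        est[v] += modulation(length) * load
        while rng.random() < q and len(neighbours[v]) > 0:
            w = rng.choice(neighbours[v])
            # importance weight: edge weight x degree / continuation prob
            load *= A[v, w] * len(neighbours[v]) / q
            v, length = w, length + 1
            est[v] += modulation(length) * load
    return est / n_walks

# 4-cycle; geometric modulation alpha^k estimates the resolvent (I - alpha*A)^{-1}
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
alpha = 0.1
row = grf_row_estimate(A, 0, lambda k: alpha**k)
exact = np.linalg.inv(np.eye(4) - alpha * A)[0]
```

Because the estimator only simulates short walks, its cost per sample is independent of the node count, which is the intuition behind the subquadratic scaling claimed above.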
Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior
Choosing a proper set of kernel functions is an important problem in learning
Gaussian Process (GP) models since each kernel structure has different model
complexity and data fitness. Recently, automatic kernel composition methods
provide not only accurate prediction but also attractive interpretability
through search-based methods. However, existing methods suffer from slow kernel
composition learning. To tackle large-scale data, we propose a new sparse
approximate posterior for GPs, MultiSVGP, constructed from groups of inducing
points associated with individual additive kernels in compositional kernels. We
demonstrate that this approximation provides a better fit to learn
compositional kernels given empirical observations. We also provide
theoretical justification, via an error bound, relative to the traditional
sparse GP. In contrast to the search-based approach, we present a novel
probabilistic algorithm to learn a kernel composition by handling the sparsity
in kernel selection with a Horseshoe prior. We demonstrate that our model can
capture characteristics of time series with significant reductions in
computational time while achieving competitive regression performance on
real-world data sets. Comment: AAAI 202
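To make the notion of a compositional kernel concrete, here is a minimal numpy sketch of exact GP regression with an additive kernel. The fixed weights stand in for the inclusion scales that a Horseshoe prior would shrink toward zero; the sketch deliberately omits inducing points and MultiSVGP's grouped variational posterior, and all names are ours.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential base kernel on 1-d inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

def periodic(a, b, period=1.0, ls=1.0):
    """Periodic base kernel on 1-d inputs."""
    d = np.pi * np.abs(a[:, None] - b[None, :]) / period
    return np.exp(-2.0 * np.sin(d)**2 / ls**2)

def gp_mean(X, y, Xstar, kernel, noise=1e-2):
    """Exact GP posterior mean (the quantity sparse methods approximate)."""
    Kxx = kernel(X, X) + noise * np.eye(len(X))
    return kernel(Xstar, X) @ np.linalg.solve(Kxx, y)

# compositional kernel: weighted sum of base kernels; the weights play the
# role of (hypothetical) Horseshoe-shrunk inclusion scales
weights = np.array([0.3, 1.0])
kernel = lambda a, b: weights[0] * rbf(a, b) + weights[1] * periodic(a, b)

X = np.linspace(0.0, 2.0, 40)
y = np.sin(2 * np.pi * X)          # periodic signal; the periodic component dominates
mean = gp_mean(X, y, X, kernel)
```

With a periodic target, a sparsity-inducing prior over the weights would drive the RBF weight toward zero, recovering an interpretable composition.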
A new family of kernels from the beta polynomial kernels with applications in density estimation
One of the fundamental data analytics tools in statistical estimation is the non-parametric kernel method, which produces probability density estimates. The method uses the observations to extract statistical information that aids the practicing statistician in decision making and further investigation. Kernel techniques primarily examine essential characteristics in a data set, and this research aims to introduce new kernel functions that can easily detect inherent properties in any given observations. Accurate application of a kernel estimator, however, requires both a kernel function and a smoothing parameter that regulates the level of smoothness applied to the estimates. A plethora of kernel families and smoothing parameter selectors exist in the literature, but no single method is universally acceptable in all situations; hence new kernel functions with smoothing parameter selectors continue to be proposed for density estimation. This article proposes a distinct kernel family derived from the beta polynomial kernel family using an exponential progression. The newly proposed family was evaluated on simulated and real-life data. The outcomes clearly indicate that it competes favorably with other kernel families in density estimation, and a further comparison of numerical results shows that the new family outperforms the classical beta kernel family on both simulated and real data, with the asymptotic mean integrated squared error (AMISE) as the criterion function. The information obtained from this analysis can support decision making in an organization, especially when human and material resources are to be considered. In addition, kernel functions are vital tools for data analysis and visualization; the newly proposed functions are therefore useful exploratory tools.
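For context, here is a minimal sketch of density estimation with the classical beta polynomial kernel family, i.e. the baseline the new exponential-progression family is compared against (the new family itself is not reproduced here). The normalising constant and the choices h = 0.5, r = 2 are standard textbook quantities; the helper names are ours.

```python
import numpy as np
from math import gamma

def beta_poly_kernel(u, r=2):
    """Beta polynomial kernel c_r * (1 - u^2)^r supported on [-1, 1].

    r = 1 is the Epanechnikov kernel (c_1 = 3/4), r = 2 the biweight
    kernel (c_2 = 15/16); c_r normalises the kernel to integrate to one.
    """
    c = gamma(2 * r + 2) / (2**(2 * r + 1) * gamma(r + 1)**2)
    return np.where(np.abs(u) <= 1.0, c * (1.0 - u**2)**r, 0.0)

def kde(x_grid, data, h=0.5, r=2):
    """Kernel density estimate: average of rescaled kernels at the data."""
    u = (x_grid[:, None] - data[None, :]) / h
    return beta_poly_kernel(u, r).mean(axis=1) / h

rng = np.random.default_rng(0)
data = rng.normal(size=200)
grid = np.linspace(-5.0, 5.0, 1001)
density = kde(grid, data)
```

The smoothing parameter h controls the bias-variance trade-off the abstract refers to; AMISE-based selectors choose h to minimise the asymptotic mean integrated squared error.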
Implicit Kernel Attention
Attention computes the dependency between representations, and it
encourages the model to focus on important selective features.
Attention-based models, such as Transformers and graph attention networks
(GAT), are widely utilized for sequential data and graph-structured data.
This paper
suggests a new interpretation and generalized structure of the attention in
Transformer and GAT. For the attention in Transformer and GAT, we derive that
the attention is a product of two parts: 1) the RBF kernel to measure the
similarity of two instances and 2) the exponential of the l^2 norm to compute
the importance of individual instances. From this decomposition, we generalize
the attention in three ways. First, we propose implicit kernel attention with
an implicit kernel function, instead of manual kernel selection. Second, we
generalize the l^2 norm to the l^p norm. Third, we extend our attention to
structured multi-head attention. Our generalized attention shows better
performance on classification, translation, and regression tasks.
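The claimed decomposition is easy to verify numerically: exp(q.k) = exp(-|q - k|^2 / 2) * exp(|q|^2 / 2) * exp(|k|^2 / 2), and the query factor is constant along a row, so it cancels in the softmax. Attention weights then equal a normalised product of an RBF similarity term and a key-norm importance term. A small numpy check (our own construction, with the usual 1/sqrt(d) scaling omitted for clarity):

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # queries
K = rng.normal(size=(6, 8))   # keys

# standard dot-product attention weights
logits = Q @ K.T
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# decomposition: RBF similarity part times key-norm importance part
sq_dist = ((Q[:, None, :] - K[None, :, :])**2).sum(axis=-1)
rbf = np.exp(-0.5 * sq_dist)                 # similarity of two instances
key_norm = np.exp(0.5 * (K**2).sum(axis=-1)) # importance of each key
scores = rbf * key_norm[None, :]
attn_decomposed = scores / scores.sum(axis=-1, keepdims=True)

assert np.allclose(attn, attn_decomposed)
```

Reading attention this way motivates the paper's generalizations: swap the RBF kernel for an implicitly learned one, and vary the norm in the importance term.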
Kernel-Based Models for Influence Maximization on Graphs based on Gaussian Process Variance Minimization
The inference of novel knowledge, the discovery of hidden patterns, and the
uncovering of insights from large amounts of data from a multitude of sources
make Data Science (DS) an art rather than a mere scientific discipline.
The study and design of mathematical models able to analyze information
represent a central research topic in DS. In this work, we introduce and
investigate a novel model for influence maximization (IM) on graphs using ideas
from kernel-based approximation, Gaussian process regression, and the
minimization of a corresponding variance term. Data-driven approaches can be
applied to determine proper kernels for this IM model and machine learning
methodologies are adopted to tune the model parameters. Compared to stochastic
models in this field that rely on costly Monte-Carlo simulations, our model
allows for a simple and cost-efficient update strategy to compute optimal
influencing nodes on a graph. In several numerical experiments, we show the
properties and benefits of this new model.
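A minimal sketch of the variance-minimisation idea, with our own choices throughout: the regularised-Laplacian kernel K = (I + L)^{-1} as one simple graph kernel, and a naive greedy loop that recomputes the posterior trace from scratch rather than the cost-efficient update strategy described above.

```python
import numpy as np

def greedy_min_variance(K, budget, jitter=1e-9):
    """Greedily add the node whose observation most reduces the summed
    GP posterior variance over all nodes (a proxy for influence)."""
    n = len(K)
    chosen, totals = [], []
    for _ in range(budget):
        best_v, best_total = None, np.inf
        for v in range(n):
            if v in chosen:
                continue
            S = chosen + [v]
            Kss = K[np.ix_(S, S)] + jitter * np.eye(len(S))
            Ks = K[:, S]
            # trace of the GP posterior covariance after observing set S
            total = np.trace(K) - np.trace(Ks @ np.linalg.solve(Kss, Ks.T))
            if total < best_total:
                best_v, best_total = v, total
        chosen.append(best_v)
        totals.append(best_total)
    return chosen, totals

# star graph: node 0 connected to nodes 1..5
A = np.zeros((6, 6))
A[0, 1:] = A[1:, 0] = 1.0
L = np.diag(A.sum(axis=1)) - A
K = np.linalg.inv(np.eye(6) + L)   # regularised-Laplacian graph kernel
chosen, totals = greedy_min_variance(K, 3)
```

Because each candidate evaluation is a deterministic linear-algebra step, no Monte-Carlo simulation of spreading processes is needed, which is the cost advantage the abstract highlights.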
End-to-end Kernel Learning via Generative Random Fourier Features
Random Fourier features (RFFs) provide a promising way for kernel learning in
the spectral domain. Current RFFs-based kernel learning methods usually work in a
two-stage way. In the first-stage process, learning the optimal feature map is
often formulated as a target alignment problem, which aims to align the learned
kernel with the pre-defined target kernel (usually the ideal kernel). In the
second-stage process, a linear learner is trained on the mapped
random features. Nevertheless, the pre-defined kernel in target alignment is
not necessarily optimal for the generalization of the linear learner. Instead,
in this paper, we consider a one-stage process that incorporates the kernel
learning and linear learner into a unifying framework. To be specific, a
generative network via RFFs is devised to implicitly learn the kernel, followed
by a linear classifier parameterized as a fully-connected layer. Then the
generative network and the classifier are jointly trained by solving the
empirical risk minimization (ERM) problem to reach a one-stage solution. This
end-to-end scheme naturally allows deeper features, in correspondence to a
multi-layer structure, and shows superior generalization performance over the
classical two-stage, RFFs-based methods in real-world classification tasks.
Moreover, inspired by the randomized resampling mechanism of the proposed
method, its enhanced adversarial robustness is investigated and experimentally
verified.
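A toy numpy sketch of the one-stage idea: a single learnable parameter (a log-bandwidth rescaling a fixed base spectral sample) stands in for the generative network, and it is trained jointly with the linear classifier by minimising the empirical logistic risk. The numerical gradient for the generator parameter and the two-blob data are our simplifications, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
# two separable 2-d Gaussian blobs as a stand-in classification task
X = np.vstack([rng.normal(-1.0, 0.5, (100, 2)),
               rng.normal(1.0, 0.5, (100, 2))])
y = np.r_[np.zeros(100), np.ones(100)]

D = 100
Omega0 = rng.normal(size=(2, D))          # fixed base spectral sample
b = rng.uniform(0, 2 * np.pi, D)          # random phases

def features(theta):
    # "generative" part: theta rescales the base frequencies (a
    # one-parameter stand-in for the generative network over RFFs)
    Z = X @ (np.exp(theta) * Omega0) + b
    return np.sqrt(2.0 / D) * np.cos(Z)

def risk(theta, w):
    p = 1.0 / (1.0 + np.exp(-features(theta) @ w))
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

theta, w = 0.0, np.zeros(D)
for _ in range(300):
    Phi = features(theta)
    p = 1.0 / (1.0 + np.exp(-Phi @ w))
    w -= 1.0 * Phi.T @ (p - y) / len(y)   # analytic ERM gradient in w
    g = (risk(theta + 1e-4, w) - risk(theta - 1e-4, w)) / 2e-4
    theta -= 0.1 * g                      # numerical gradient for theta

pred = 1.0 / (1.0 + np.exp(-features(theta) @ w)) > 0.5
acc = float(np.mean(pred == (y == 1)))
```

Both the feature map and the classifier are updated against the same empirical risk, which is the one-stage alternative to aligning the kernel with a pre-defined target first.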
Learning to Learn Kernels with Variational Random Features
In this work, we introduce kernels with random Fourier features in the
meta-learning framework to leverage their strong few-shot learning ability. We
propose meta variational random features (MetaVRF) to learn adaptive kernels
for the base-learner, which is developed in a latent variable model by treating
the random feature basis as the latent variable. We formulate the optimization
of MetaVRF as a variational inference problem by deriving an evidence lower
bound under the meta-learning framework. To incorporate shared knowledge from
related tasks, we propose a context inference of the posterior, which is
established by an LSTM architecture. The LSTM-based inference network can
effectively integrate the context information of previous tasks with
task-specific information, generating informative and adaptive features. The
learned MetaVRF can produce kernels of high representational power with a
relatively low spectral sampling rate and also enables fast adaptation to new
tasks. Experimental results on a variety of few-shot regression and
classification tasks demonstrate that MetaVRF delivers much better, or at least
competitive, performance compared to existing meta-learning alternatives.
Comment: ICML'2020; code is available at:
https://github.com/Yingjun-Du/MetaVR
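As a rough sketch of adaptive random-feature kernels for a few-shot base-learner (not MetaVRF itself), the code below draws a task-specific spectral sample with a bandwidth set by the median heuristic on the support set, then fits kernel ridge regression in the random-feature space. MetaVRF instead infers the spectral distribution through an LSTM-based variational posterior over the feature basis; the heuristic and all names here are ours.

```python
import numpy as np

def rff(x, Omega):
    """Random Fourier features for 1-d inputs: [cos(x w), sin(x w)] / sqrt(D)."""
    Z = x[:, None] * Omega[None, :]
    D = len(Omega)
    return np.concatenate([np.cos(Z), np.sin(Z)], axis=1) / np.sqrt(D)

def adapt_and_fit(x_support, y_support, x_query, D=300, lam=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # task-adaptive spectral sample: rescale base frequencies by a bandwidth
    # estimated from the support set (median heuristic)
    med = np.median(np.abs(x_support[:, None] - x_support[None, :]))
    Omega = rng.normal(size=D) / max(med, 1e-6)
    Phi = rff(x_support, Omega)
    # ridge-regression base-learner in the random-feature space
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(2 * D), Phi.T @ y_support)
    return rff(x_query, Omega) @ w

rng = np.random.default_rng(1)
xs = rng.uniform(-3.0, 3.0, 40)   # support set for one regression task
ys = np.sin(xs)
xq = np.linspace(-2.5, 2.5, 50)   # query points
pred = adapt_and_fit(xs, ys, xq)
```

Adapting the spectral sample per task is what lets a relatively low sampling rate still yield a representative kernel, which is the efficiency point made in the abstract.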