Kernel functions based on triplet comparisons
Given only information in the form of similarity triplets "Object A is more
similar to object B than to object C" about a data set, we propose two ways of
defining a kernel function on the data set. While previous approaches construct
a low-dimensional Euclidean embedding of the data set that reflects the given
similarity triplets, we aim at defining kernel functions that correspond to
high-dimensional embeddings. These kernel functions can subsequently be used to
apply any kernel method to the data set.
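As a rough illustration of this idea, the sketch below builds one such feature map from triplet answers alone: each object is represented by its comparisons against sampled landmark pairs, and the kernel is the inner product of these vectors. The `oracle` function and all names are illustrative stand-ins, not the authors' code, and the paper's actual constructions differ in detail.

```python
import numpy as np

def triplet_feature_map(n_objects, oracle, n_pairs=200, rng=None):
    """Embed each object as a +/-1 vector over sampled landmark pairs.

    oracle(a, b, c) should return True if object a is judged more
    similar to object b than to object c (the only information assumed).
    """
    rng = np.random.default_rng(rng)
    pairs = []
    while len(pairs) < n_pairs:            # sample landmark pairs (b, c), b != c
        b, c = rng.integers(n_objects, size=2)
        if b != c:
            pairs.append((b, c))
    features = np.empty((n_objects, n_pairs))
    for a in range(n_objects):
        for j, (b, c) in enumerate(pairs):
            features[a, j] = 1.0 if oracle(a, b, c) else -1.0
    return features / np.sqrt(n_pairs)     # normalise so k(a, a) = 1

def triplet_kernel(features):
    # The kernel matrix is simply the Gram matrix of the feature map.
    return features @ features.T

# Toy usage: triplets answered from distances of hidden 2-D points.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))               # hidden ground truth, unseen by the kernel
dist = lambda i, j: np.linalg.norm(X[i] - X[j])
oracle = lambda a, b, c: dist(a, b) < dist(a, c)
K = triplet_kernel(triplet_feature_map(50, oracle, rng=1))
print(K.shape, K[0, 0])
```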
Learning from Distributions via Support Measure Machines
This paper presents a kernel-based discriminative learning framework on
probability measures. Rather than relying on large collections of vectorial
training examples, our framework learns using a collection of probability
distributions that have been constructed to meaningfully represent training
data. By representing these probability distributions as mean embeddings in the
reproducing kernel Hilbert space (RKHS), we are able to apply many standard
kernel-based learning techniques in a straightforward fashion. To accomplish
this, we construct a generalization of the support vector machine (SVM) called
a support measure machine (SMM). Our analysis of SMMs provides several insights
into their relationship to traditional SVMs. Based on such insights, we propose
a flexible SVM (Flex-SVM) that places different kernel functions on each
training example. Experimental results on both synthetic and real-world data
demonstrate the effectiveness of our proposed framework.
Comment: Advances in Neural Information Processing Systems 2
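A minimal sketch of the underlying embedding kernel, assuming each distribution is given only as a bag of samples: K(P, Q) is estimated as the average pairwise RBF kernel value between the two bags, i.e. the inner product of the empirical mean embeddings in the RKHS, and a precomputed-kernel SVC stands in for the full SMM. This is an illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def mean_embedding_kernel(bags_a, bags_b, gamma=1.0):
    """Kernel between distributions, each represented by a bag of samples.

    K(P, Q) is the mean of k(x, y) over x in P and y in Q, which equals the
    inner product of the empirical kernel mean embeddings of P and Q.
    """
    K = np.empty((len(bags_a), len(bags_b)))
    for i, A in enumerate(bags_a):
        for j, B in enumerate(bags_b):
            K[i, j] = rbf_kernel(A, B, gamma=gamma).mean()
    return K

# Toy usage: classify Gaussian bags by their (hidden) mean.
rng = np.random.default_rng(0)
bags = [rng.normal(loc=m, size=(30, 2)) for m in ([0, 0], [2, 2]) for _ in range(20)]
labels = [0] * 20 + [1] * 20
K = mean_embedding_kernel(bags, bags)
clf = SVC(kernel="precomputed").fit(K, labels)
print(clf.score(K, labels))
```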
Hyperparameter Learning via Distributional Transfer
Bayesian optimisation is a popular technique for hyperparameter learning but
typically requires initial exploration even in cases where similar prior tasks
have been solved. We propose to transfer information across tasks using learnt
representations of training datasets used in those tasks. This results in a
joint Gaussian process model on hyperparameters and data representations.
The representations make use of the framework of distribution embeddings into
reproducing kernel Hilbert spaces. The developed method converges faster than
existing baselines, in some cases requiring only a few evaluations of the
target objective.
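The sketch below illustrates, under simplifying assumptions, how a joint kernel on (dataset representation, hyperparameter) pairs can drive Gaussian process predictions across tasks. The dataset representation here is the mean of random Fourier features (a finite-dimensional stand-in for a kernel mean embedding), and the product-kernel form, the synthetic objective, and all names are illustrative rather than the paper's model, which learns the representation jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_mean_embedding(X, W, b):
    """Mean of random Fourier features over a dataset X: a cheap stand-in
    for the RKHS mean embedding of the data distribution."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b).mean(axis=0)

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def joint_kernel(R1, T1, R2, T2):
    # Product of a kernel on dataset representations and one on hyperparameters.
    return rbf(R1, R2) * rbf(T1, T2)

# Toy setup: past "tasks" are 1-D datasets; the observed objective value for
# each task at hyperparameter theta is synthetic here.
W = rng.normal(size=(1, 50)); b = rng.uniform(0, 2 * np.pi, size=50)
tasks = [rng.normal(loc=m, size=(100, 1)) for m in np.linspace(-2, 2, 8)]
reps = np.stack([rff_mean_embedding(X, W, b) for X in tasks])
thetas = rng.uniform(0, 1, size=(8, 1))
y = np.array([(t - 0.2 * X.mean()) ** 2 for t, X in zip(thetas[:, 0], tasks)])

# GP posterior mean for a new task, evaluated on a grid of hyperparameters.
K = joint_kernel(reps, thetas, reps, thetas) + 1e-6 * np.eye(8)
alpha = np.linalg.solve(K, y)
new_rep = rff_mean_embedding(rng.normal(loc=1.0, size=(100, 1)), W, b)[None, :]
theta_grid = np.linspace(0, 1, 50)[:, None]
k_star = joint_kernel(np.repeat(new_rep, 50, axis=0), theta_grid, reps, thetas)
print("predicted best theta:", theta_grid[np.argmin(k_star @ alpha), 0])
```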
Improving Siamese Networks for One Shot Learning using Kernel Based Activation functions
The lack of a large amount of training data has always been a constraining
factor in solving many machine learning problems, which makes One Shot Learning
one of the most intriguing ideas in the field. It aims to learn information
about object categories from one, or only a few, training examples. In deep
learning, this is usually accomplished through a suitable objective function
(the loss) and a suitable way of extracting embeddings (the architecture). In
this paper, we discuss metric-based deep learning architectures for one shot
learning, such as Siamese neural networks, and present a method to improve
their accuracy using Kafnets (kernel-based non-parametric activation functions
for neural networks) by learning proper embeddings in relatively fewer epochs.
Using kernel activation functions, we achieve strong results that exceed those
of ReLU-based deep learning models in terms of embedding structure, loss
convergence, and accuracy.
Comment: 15 pages, 8 figures
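For reference, here is a minimal PyTorch sketch of a kernel activation function in the spirit of Kafnets: each unit applies a learnable mixture of Gaussian kernels over a fixed 1-D dictionary, and the module can replace ReLU inside a Siamese branch. The dictionary size, bandwidth rule, and initialization below are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class KAF(nn.Module):
    """Kernel activation function: per-unit learnable mixture of Gaussian
    kernels evaluated on a fixed 1-D dictionary of points."""
    def __init__(self, num_units, dict_size=20, bound=3.0):
        super().__init__()
        d = torch.linspace(-bound, bound, dict_size)        # fixed dictionary
        self.register_buffer("d", d.view(1, 1, dict_size))
        self.gamma = (1.0 / (2 * (d[1] - d[0]) ** 2)).item()  # rule-of-thumb bandwidth
        self.alpha = nn.Parameter(0.3 * torch.randn(num_units, dict_size))

    def forward(self, x):                                    # x: (batch, num_units)
        k = torch.exp(-self.gamma * (x.unsqueeze(-1) - self.d) ** 2)
        return (k * self.alpha).sum(-1)                      # mix kernels per unit

# Drop-in usage inside a small embedding network (e.g. one Siamese branch).
branch = nn.Sequential(nn.Linear(784, 128), KAF(128), nn.Linear(128, 64))
print(branch(torch.randn(4, 784)).shape)   # torch.Size([4, 64])
```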
End-to-End Neural Ad-hoc Ranking with Kernel Pooling
This paper proposes K-NRM, a kernel based neural model for document ranking.
Given a query and a set of documents, K-NRM uses a translation matrix that
models word-level similarities via word embeddings, a new kernel-pooling
technique that uses kernels to extract multi-level soft match features, and a
learning-to-rank layer that combines those features into the final ranking
score. The whole model is trained end-to-end. The ranking layer learns desired
feature patterns from the pairwise ranking loss. The kernels transfer the
feature patterns into soft-match targets at each similarity level and enforce
them on the translation matrix. The word embeddings are tuned accordingly so
that they can produce the desired soft matches. Experiments on a commercial
search engine's query log demonstrate the improvements of K-NRM over prior
feature-based and neural state-of-the-art baselines, and explain the source of
K-NRM's advantage: its kernel-guided embedding encodes a similarity metric
tailored for matching query words to document words, and provides effective
multi-level soft matches.
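A small sketch of the kernel-pooling step described above, assuming query and document word embeddings are already given: cosine similarities form the translation matrix, and RBF kernels centred at several similarity levels are pooled into soft-match features. The specific kernel centres and the log1p pooling are illustrative simplifications of the full end-to-end K-NRM model.

```python
import torch

def kernel_pooling(q_emb, d_emb, mus, sigma=0.1):
    """K-NRM-style kernel pooling (the pooling step only).

    q_emb: (n_q, dim) query word embeddings; d_emb: (n_d, dim) document
    word embeddings. Returns one soft-match feature per kernel mean mu.
    """
    q = torch.nn.functional.normalize(q_emb, dim=-1)
    d = torch.nn.functional.normalize(d_emb, dim=-1)
    M = q @ d.t()                                      # translation matrix of cosines
    feats = []
    for mu in mus:
        K = torch.exp(-((M - mu) ** 2) / (2 * sigma ** 2))
        # Pool over document words, then query words; log1p used for stability.
        feats.append(torch.log1p(K.sum(dim=1)).sum())
    return torch.stack(feats)                          # input to a learning-to-rank layer

# Toy usage with random embeddings and kernels centred on similarity levels.
phi = kernel_pooling(torch.randn(3, 50), torch.randn(20, 50),
                     mus=[-0.5, 0.0, 0.5, 1.0])
print(phi)
```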
Thin and Deep Gaussian Processes
Gaussian processes (GPs) can provide a principled approach to uncertainty
quantification with easy-to-interpret kernel hyperparameters, such as the
lengthscale, which controls the correlation distance of function values.
However, selecting an appropriate kernel can be challenging. Deep GPs avoid
manual kernel engineering by successively parameterizing kernels with GP
layers, allowing them to learn low-dimensional embeddings of the inputs that
explain the output data. Following the architecture of deep neural networks,
the most common deep GPs warp the input space layer-by-layer but lose all the
interpretability of shallow GPs. An alternative construction is to successively
parameterize the lengthscale of a kernel, improving the interpretability but
ultimately giving up the notion of learning lower-dimensional embeddings.
Unfortunately, both methods are susceptible to particular pathologies which may
hinder fitting and limit their interpretability. This work proposes a novel
synthesis of both previous approaches: Thin and Deep GP (TDGP). Each TDGP layer
defines locally linear transformations of the original input data, maintaining
the concept of latent embeddings while also retaining the interpretation of
lengthscales of a kernel. Moreover, unlike the prior solutions, TDGP induces
non-pathological manifolds that admit learning lower-dimensional
representations. We show with theoretical and experimental results that i) TDGP
is, unlike previous models, tailored to specifically discover lower-dimensional
manifolds in the input data, ii) TDGP behaves well when increasing the number
of layers, and iii) TDGP performs well on standard benchmark datasets.
Comment: Accepted at the Conference on Neural Information Processing Systems (NeurIPS) 202
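To make the "parameterize the lengthscale" alternative concrete, here is a generic non-stationary (Gibbs) kernel with an input-dependent lengthscale l(x). This illustrates the construction the abstract contrasts TDGP with, not TDGP's locally linear layers themselves; the lengthscale function used here is an arbitrary illustrative choice.

```python
import numpy as np

def gibbs_kernel(X1, X2, lengthscale_fn):
    """Non-stationary RBF ("Gibbs") kernel on 1-D inputs with an
    input-dependent lengthscale l(x), i.e. the kernel itself is
    parameterized rather than the inputs being warped."""
    l1 = lengthscale_fn(X1)[:, None]          # (n1, 1)
    l2 = lengthscale_fn(X2)[None, :]          # (1, n2)
    d2 = (X1[:, None] - X2[None, :]) ** 2
    denom = l1 ** 2 + l2 ** 2
    return np.sqrt(2 * l1 * l2 / denom) * np.exp(-d2 / denom)

# Toy usage: short lengthscales near 0, longer ones far away.
x = np.linspace(-3, 3, 100)
K = gibbs_kernel(x, x, lambda z: 0.2 + np.abs(z))
print(K.shape, np.all(np.linalg.eigvalsh(K) > -1e-8))  # PSD up to numerical error
```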
Spectral Analysis of Kernel and Neural Embeddings: Optimization and Generalization
We extend the recent results of Arora et al. (2019) by a spectral analysis of
the representations corresponding to the kernel and neural embeddings. They
showed that in a simple single-layer network, the alignment of the labels to
the eigenvectors of the corresponding Gram matrix determines both the
convergence of the optimization during training as well as the generalization
properties. We generalize their result to kernel and neural representations
and show that these extensions improve both the optimization and generalization
of the basic setup studied in Arora et al. (2019). In particular, we first extend the
setup with the Gaussian kernel and the approximations by random Fourier
features as well as with the embeddings produced by two-layer networks trained
on different tasks. We then study the use of more sophisticated kernels and
embeddings, those designed optimally for deep neural networks and those
developed for the classification task of interest given the data and the
training labels, independent of any specific classification model.
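A short sketch of the kind of quantity such an analysis examines: eigendecompose a Gram matrix and measure how the label vector projects onto its eigenvectors, the alignment that Arora et al. (2019) relate to optimization speed and generalization. The Gaussian-kernel Gram matrix and the toy two-cluster data below are illustrative; the paper studies this alignment for richer kernels and learned embeddings.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def label_alignment(K, y):
    """Return eigenvalues of the Gram matrix K (descending) and the
    fraction of the label vector's energy on each eigenvector."""
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1], V[:, ::-1]            # sort eigenpairs in descending order
    proj = (V.T @ y) ** 2                     # energy of y along each eigenvector
    return lam, proj / proj.sum()

# Toy usage: labels of a two-cluster problem align with the top eigenvectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (50, 5)), rng.normal(1, 0.3, (50, 5))])
y = np.array([-1.0] * 50 + [1.0] * 50)
lam, align = label_alignment(rbf_kernel(X, gamma=0.5), y)
print("label energy on top 5 eigenvectors:", align[:5].round(3))
```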