Kernel functions based on triplet comparisons
Given only information in the form of similarity triplets "Object A is more
similar to object B than to object C" about a data set, we propose two ways of
defining a kernel function on the data set. While previous approaches construct
a low-dimensional Euclidean embedding of the data set that reflects the given
similarity triplets, we aim at defining kernel functions that correspond to
high-dimensional embeddings. These kernel functions can subsequently be used to
apply any kernel method to the data set.
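As a rough illustration of this idea, the sketch below builds one such feature map from triplet answers alone: each object is represented by its comparisons against sampled landmark pairs, and the kernel is the inner product of these vectors. The `oracle` function and all names are illustrative stand-ins, not the authors' code, and the paper's actual constructions differ in detail.

```python
import numpy as np

def triplet_feature_map(n_objects, oracle, n_pairs=200, rng=None):
    """Embed each object as a +/-1 vector over sampled landmark pairs.

    oracle(a, b, c) should return True if object a is judged more
    similar to object b than to object c (the only information assumed).
    """
    rng = np.random.default_rng(rng)
    pairs = []
    while len(pairs) < n_pairs:            # sample landmark pairs (b, c), b != c
        b, c = rng.integers(n_objects, size=2)
        if b != c:
            pairs.append((b, c))
    features = np.empty((n_objects, n_pairs))
    for a in range(n_objects):
        for j, (b, c) in enumerate(pairs):
            features[a, j] = 1.0 if oracle(a, b, c) else -1.0
    return features / np.sqrt(n_pairs)     # normalise so k(a, a) = 1

def triplet_kernel(features):
    # The kernel matrix is simply the Gram matrix of the feature map.
    return features @ features.T

# Toy usage: triplets answered from distances of hidden 2-D points.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))               # hidden ground truth, unseen by the kernel
dist = lambda i, j: np.linalg.norm(X[i] - X[j])
oracle = lambda a, b, c: dist(a, b) < dist(a, c)
K = triplet_kernel(triplet_feature_map(50, oracle, rng=1))
print(K.shape, K[0, 0])
```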
Learning from Distributions via Support Measure Machines
This paper presents a kernel-based discriminative learning framework on
probability measures. Rather than relying on large collections of vectorial
training examples, our framework learns using a collection of probability
distributions that have been constructed to meaningfully represent training
data. By representing these probability distributions as mean embeddings in the
reproducing kernel Hilbert space (RKHS), we are able to apply many standard
kernel-based learning techniques in a straightforward fashion. To accomplish
this, we construct a generalization of the support vector machine (SVM) called
a support measure machine (SMM). Our analysis of SMMs provides several insights
into their relationship to traditional SVMs. Based on such insights, we propose
a flexible SVM (Flex-SVM) that places different kernel functions on each
training example. Experimental results on both synthetic and real-world data
demonstrate the effectiveness of our proposed framework.
Comment: Advances in Neural Information Processing Systems 2
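A minimal sketch of the underlying embedding kernel, assuming each distribution is given only as a bag of samples: K(P, Q) is estimated as the average pairwise RBF kernel value between the two bags, i.e. the inner product of the empirical mean embeddings in the RKHS, and a precomputed-kernel SVC stands in for the full SMM. This is an illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def mean_embedding_kernel(bags_a, bags_b, gamma=1.0):
    """Kernel between distributions, each represented by a bag of samples.

    K(P, Q) is the mean of k(x, y) over x in P and y in Q, which equals the
    inner product of the empirical kernel mean embeddings of P and Q.
    """
    K = np.empty((len(bags_a), len(bags_b)))
    for i, A in enumerate(bags_a):
        for j, B in enumerate(bags_b):
            K[i, j] = rbf_kernel(A, B, gamma=gamma).mean()
    return K

# Toy usage: classify Gaussian bags by their (hidden) mean.
rng = np.random.default_rng(0)
bags = [rng.normal(loc=m, size=(30, 2)) for m in ([0, 0], [2, 2]) for _ in range(20)]
labels = [0] * 20 + [1] * 20
K = mean_embedding_kernel(bags, bags)
clf = SVC(kernel="precomputed").fit(K, labels)
print(clf.score(K, labels))
```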
Hyperparameter Learning via Distributional Transfer
Bayesian optimisation is a popular technique for hyperparameter learning but
typically requires initial exploration even in cases where similar prior tasks
have been solved. We propose to transfer information across tasks using learnt
representations of training datasets used in those tasks. This results in a
joint Gaussian process model on hyperparameters and data representations.
The representations make use of the framework of distribution embeddings into
reproducing kernel Hilbert spaces. The developed method converges faster than
existing baselines, in some cases requiring only a few evaluations of the
target objective.
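The sketch below illustrates, under simplifying assumptions, how a joint kernel on (dataset representation, hyperparameter) pairs can drive Gaussian process predictions across tasks. The dataset representation here is the mean of random Fourier features (a finite-dimensional stand-in for a kernel mean embedding), and the product-kernel form, the synthetic objective, and all names are illustrative rather than the paper's model, which learns the representation jointly.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_mean_embedding(X, W, b):
    """Mean of random Fourier features over a dataset X: a cheap stand-in
    for the RKHS mean embedding of the data distribution."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b).mean(axis=0)

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def joint_kernel(R1, T1, R2, T2):
    # Product of a kernel on dataset representations and one on hyperparameters.
    return rbf(R1, R2) * rbf(T1, T2)

# Toy setup: past "tasks" are 1-D datasets; the observed objective value for
# each task at hyperparameter theta is synthetic here.
W = rng.normal(size=(1, 50)); b = rng.uniform(0, 2 * np.pi, size=50)
tasks = [rng.normal(loc=m, size=(100, 1)) for m in np.linspace(-2, 2, 8)]
reps = np.stack([rff_mean_embedding(X, W, b) for X in tasks])
thetas = rng.uniform(0, 1, size=(8, 1))
y = np.array([(t - 0.2 * X.mean()) ** 2 for t, X in zip(thetas[:, 0], tasks)])

# GP posterior mean for a new task, evaluated on a grid of hyperparameters.
K = joint_kernel(reps, thetas, reps, thetas) + 1e-6 * np.eye(8)
alpha = np.linalg.solve(K, y)
new_rep = rff_mean_embedding(rng.normal(loc=1.0, size=(100, 1)), W, b)[None, :]
theta_grid = np.linspace(0, 1, 50)[:, None]
k_star = joint_kernel(np.repeat(new_rep, 50, axis=0), theta_grid, reps, thetas)
print("predicted best theta:", theta_grid[np.argmin(k_star @ alpha), 0])
```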
Improving Siamese Networks for One Shot Learning using Kernel Based Activation functions
The lack of a large amount of training data has always been a constraining
factor in solving many machine learning problems, which makes One Shot Learning
one of the most intriguing ideas in the field. It aims to learn information
about object categories from one, or only a few, training examples. In deep
learning, this is usually accomplished through a suitable objective function
(the loss) and a suitable way of extracting embeddings (the architecture). In
this paper, we discuss metric-based deep learning architectures for one shot
learning, such as Siamese neural networks, and present a method to improve
their accuracy using Kafnets (kernel-based non-parametric activation functions
for neural networks) by learning proper embeddings in relatively fewer epochs.
Using kernel activation functions, we achieve strong results that exceed those
of ReLU-based deep learning models in terms of embedding structure, loss
convergence, and accuracy.
Comment: 15 pages, 8 figures
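For reference, here is a minimal PyTorch sketch of a kernel activation function in the spirit of Kafnets: each unit applies a learnable mixture of Gaussian kernels over a fixed 1-D dictionary, and the module can replace ReLU inside a Siamese branch. The dictionary size, bandwidth rule, and initialization below are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class KAF(nn.Module):
    """Kernel activation function: per-unit learnable mixture of Gaussian
    kernels evaluated on a fixed 1-D dictionary of points."""
    def __init__(self, num_units, dict_size=20, bound=3.0):
        super().__init__()
        d = torch.linspace(-bound, bound, dict_size)        # fixed dictionary
        self.register_buffer("d", d.view(1, 1, dict_size))
        self.gamma = (1.0 / (2 * (d[1] - d[0]) ** 2)).item()  # rule-of-thumb bandwidth
        self.alpha = nn.Parameter(0.3 * torch.randn(num_units, dict_size))

    def forward(self, x):                                    # x: (batch, num_units)
        k = torch.exp(-self.gamma * (x.unsqueeze(-1) - self.d) ** 2)
        return (k * self.alpha).sum(-1)                      # mix kernels per unit

# Drop-in usage inside a small embedding network (e.g. one Siamese branch).
branch = nn.Sequential(nn.Linear(784, 128), KAF(128), nn.Linear(128, 64))
print(branch(torch.randn(4, 784)).shape)   # torch.Size([4, 64])
```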
End-to-End Neural Ad-hoc Ranking with Kernel Pooling
This paper proposes K-NRM, a kernel based neural model for document ranking.
Given a query and a set of documents, K-NRM uses a translation matrix that
models word-level similarities via word embeddings, a new kernel-pooling
technique that uses kernels to extract multi-level soft match features, and a
learning-to-rank layer that combines those features into the final ranking
score. The whole model is trained end-to-end. The ranking layer learns desired
feature patterns from the pairwise ranking loss. The kernels transfer the
feature patterns into soft-match targets at each similarity level and enforce
them on the translation matrix. The word embeddings are tuned accordingly so
that they can produce the desired soft matches. Experiments on a commercial
search engine's query log demonstrate the improvements of K-NRM over prior
feature-based and neural state-of-the-art baselines, and explain the source of
K-NRM's advantage: its kernel-guided embedding encodes a similarity metric
tailored for matching query words to document words, and provides effective
multi-level soft matches.
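A small sketch of the kernel-pooling step described above, assuming query and document word embeddings are already given: cosine similarities form the translation matrix, and RBF kernels centred at several similarity levels are pooled into soft-match features. The specific kernel centres and the log1p pooling are illustrative simplifications of the full end-to-end K-NRM model.

```python
import torch

def kernel_pooling(q_emb, d_emb, mus, sigma=0.1):
    """K-NRM-style kernel pooling (the pooling step only).

    q_emb: (n_q, dim) query word embeddings; d_emb: (n_d, dim) document
    word embeddings. Returns one soft-match feature per kernel mean mu.
    """
    q = torch.nn.functional.normalize(q_emb, dim=-1)
    d = torch.nn.functional.normalize(d_emb, dim=-1)
    M = q @ d.t()                                      # translation matrix of cosines
    feats = []
    for mu in mus:
        K = torch.exp(-((M - mu) ** 2) / (2 * sigma ** 2))
        # Pool over document words, then query words; log1p used for stability.
        feats.append(torch.log1p(K.sum(dim=1)).sum())
    return torch.stack(feats)                          # input to a learning-to-rank layer

# Toy usage with random embeddings and kernels centred on similarity levels.
phi = kernel_pooling(torch.randn(3, 50), torch.randn(20, 50),
                     mus=[-0.5, 0.0, 0.5, 1.0])
print(phi)
```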
Thin and Deep Gaussian Processes
Gaussian processes (GPs) can provide a principled approach to uncertainty
quantification with easy-to-interpret kernel hyperparameters, such as the
lengthscale, which controls the correlation distance of function values.
However, selecting an appropriate kernel can be challenging. Deep GPs avoid
manual kernel engineering by successively parameterizing kernels with GP
layers, allowing them to learn low-dimensional embeddings of the inputs that
explain the output data. Following the architecture of deep neural networks,
the most common deep GPs warp the input space layer-by-layer but lose all the
interpretability of shallow GPs. An alternative construction is to successively
parameterize the lengthscale of a kernel, improving the interpretability but
ultimately giving up the notion of learning lower-dimensional embeddings.
Unfortunately, both methods are susceptible to particular pathologies which may
hinder fitting and limit their interpretability. This work proposes a novel
synthesis of both previous approaches: Thin and Deep GP (TDGP). Each TDGP layer
defines locally linear transformations of the original input data, maintaining
the concept of latent embeddings while also retaining the interpretation of
lengthscales of a kernel. Moreover, unlike the prior solutions, TDGP induces
non-pathological manifolds that admit learning lower-dimensional
representations. We show with theoretical and experimental results that i) TDGP
is, unlike previous models, tailored to specifically discover lower-dimensional
manifolds in the input data, ii) TDGP behaves well when increasing the number
of layers, and iii) TDGP performs well on standard benchmark datasets.
Comment: Accepted at the Conference on Neural Information Processing Systems (NeurIPS) 202
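To make the "parameterize the lengthscale" alternative concrete, here is a generic non-stationary (Gibbs) kernel with an input-dependent lengthscale l(x). This illustrates the construction the abstract contrasts TDGP with, not TDGP's locally linear layers themselves; the lengthscale function used here is an arbitrary illustrative choice.

```python
import numpy as np

def gibbs_kernel(X1, X2, lengthscale_fn):
    """Non-stationary RBF ("Gibbs") kernel on 1-D inputs with an
    input-dependent lengthscale l(x), i.e. the kernel itself is
    parameterized rather than the inputs being warped."""
    l1 = lengthscale_fn(X1)[:, None]          # (n1, 1)
    l2 = lengthscale_fn(X2)[None, :]          # (1, n2)
    d2 = (X1[:, None] - X2[None, :]) ** 2
    denom = l1 ** 2 + l2 ** 2
    return np.sqrt(2 * l1 * l2 / denom) * np.exp(-d2 / denom)

# Toy usage: short lengthscales near 0, longer ones far away.
x = np.linspace(-3, 3, 100)
K = gibbs_kernel(x, x, lambda z: 0.2 + np.abs(z))
print(K.shape, np.all(np.linalg.eigvalsh(K) > -1e-8))  # PSD up to numerical error
```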
Spectral Analysis of Kernel and Neural Embeddings: Optimization and Generalization
We extend the recent results of Arora et al. (2019) by a spectral analysis of
the representations corresponding to the kernel and neural embeddings. They
showed that in a simple single-layer network, the alignment of the labels to
the eigenvectors of the corresponding Gram matrix determines both the
convergence of the optimization during training as well as the generalization
properties. We generalize their result to kernel and neural representations
and show that these extensions improve both the optimization and generalization
of the basic setup studied in Arora et al. (2019). In particular, we first extend the
setup with the Gaussian kernel and the approximations by random Fourier
features as well as with the embeddings produced by two-layer networks trained
on different tasks. We then study the use of more sophisticated kernels and
embeddings, those designed optimally for deep neural networks and those
developed for the classification task of interest given the data and the
training labels, independent of any specific classification model.
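A short sketch of the kind of quantity such an analysis examines: eigendecompose a Gram matrix and measure how the label vector projects onto its eigenvectors, the alignment that Arora et al. (2019) relate to optimization speed and generalization. The Gaussian-kernel Gram matrix and the toy two-cluster data below are illustrative; the paper studies this alignment for richer kernels and learned embeddings.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def label_alignment(K, y):
    """Return eigenvalues of the Gram matrix K (descending) and the
    fraction of the label vector's energy on each eigenvector."""
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1], V[:, ::-1]            # sort eigenpairs in descending order
    proj = (V.T @ y) ** 2                     # energy of y along each eigenvector
    return lam, proj / proj.sum()

# Toy usage: labels of a two-cluster problem align with the top eigenvectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, (50, 5)), rng.normal(1, 0.3, (50, 5))])
y = np.array([-1.0] * 50 + [1.0] * 50)
lam, align = label_alignment(rbf_kernel(X, gamma=0.5), y)
print("label energy on top 5 eigenvectors:", align[:5].round(3))
```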