32,516 research outputs found

    Kernel functions based on triplet comparisons

    Full text link
    Given only information in the form of similarity triplets "Object A is more similar to object B than to object C" about a data set, we propose two ways of defining a kernel function on the data set. While previous approaches construct a low-dimensional Euclidean embedding of the data set that reflects the given similarity triplets, we aim at defining kernel functions that correspond to high-dimensional embeddings. These kernel functions can subsequently be used to apply any kernel method to the data set

    Learning from Distributions via Support Measure Machines

    Full text link
    This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provides several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (Flex-SVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework.Comment: Advances in Neural Information Processing Systems 2

    Hyperparameter Learning via Distributional Transfer

    Full text link
    Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial exploration even in cases where similar prior tasks have been solved. We propose to transfer information across tasks using learnt representations of training datasets used in those tasks. This results in a joint Gaussian process model on hyperparameters and data representations. Representations make use of the framework of distribution embeddings into reproducing kernel Hilbert spaces. The developed method has a faster convergence compared to existing baselines, in some cases requiring only a few evaluations of the target objective

    Improving Siamese Networks for One Shot Learning using Kernel Based Activation functions

    Full text link
    The lack of a large amount of training data has always been the constraining factor in solving a lot of problems in machine learning, making One Shot Learning one of the most intriguing ideas in machine learning. It aims to learn information about object categories from one, or only a few training examples. This process of learning in deep learning is usually accomplished by proper objective function, i.e; loss function and embeddings extraction i.e; architecture. In this paper, we discussed about metrics based deep learning architectures for one shot learning such as Siamese neural networks and present a method to improve on their accuracy using Kafnets (kernel-based non-parametric activation functions for neural networks) by learning proper embeddings with relatively less number of epochs. Using kernel activation functions, we are able to achieve strong results which exceed those of ReLU based deep learning models in terms of embeddings structure, loss convergence, and accuracy.Comment: 15 pages, 8 figure

    End-to-End Neural Ad-hoc Ranking with Kernel Pooling

    Full text link
    This paper proposes K-NRM, a kernel based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score. The whole model is trained end-to-end. The ranking layer learns desired feature patterns from the pairwise ranking loss. The kernels transfer the feature patterns into soft-match targets at each similarity level and enforce them on the translation matrix. The word embeddings are tuned accordingly so that they can produce the desired soft matches. Experiments on a commercial search engine's query log demonstrate the improvements of K-NRM over prior feature-based and neural-based states-of-the-art, and explain the source of K-NRM's advantage: Its kernel-guided embedding encodes a similarity metric tailored for matching query words to document words, and provides effective multi-level soft matches

    Thin and Deep Gaussian Processes

    Full text link
    Gaussian processes (GPs) can provide a principled approach to uncertainty quantification with easy-to-interpret kernel hyperparameters, such as the lengthscale, which controls the correlation distance of function values. However, selecting an appropriate kernel can be challenging. Deep GPs avoid manual kernel engineering by successively parameterizing kernels with GP layers, allowing them to learn low-dimensional embeddings of the inputs that explain the output data. Following the architecture of deep neural networks, the most common deep GPs warp the input space layer-by-layer but lose all the interpretability of shallow GPs. An alternative construction is to successively parameterize the lengthscale of a kernel, improving the interpretability but ultimately giving away the notion of learning lower-dimensional embeddings. Unfortunately, both methods are susceptible to particular pathologies which may hinder fitting and limit their interpretability. This work proposes a novel synthesis of both previous approaches: Thin and Deep GP (TDGP). Each TDGP layer defines locally linear transformations of the original input data maintaining the concept of latent embeddings while also retaining the interpretation of lengthscales of a kernel. Moreover, unlike the prior solutions, TDGP induces non-pathological manifolds that admit learning lower-dimensional representations. We show with theoretical and experimental results that i) TDGP is, unlike previous models, tailored to specifically discover lower-dimensional manifolds in the input data, ii) TDGP behaves well when increasing the number of layers, and iii) TDGP performs well in standard benchmark datasets.Comment: Accepted at the Conference on Neural Information Processing Systems (NeurIPS) 202

    Spectral Analysis of Kernel and Neural Embeddings: Optimization and Generalization

    Get PDF
    We extend the recent results of (Arora et al. 2019). by spectral analysis of the representations corresponding to the kernel and neural embeddings. They showed that in a simple single-layer network, the alignment of the labels to the eigenvectors of the corresponding Gram matrix determines both the convergence of the optimization during training as well as the generalization properties. We generalize their result to the kernel and neural representations and show these extensions improve both optimization and generalization of the basic setup studied in (Arora et al. 2019). In particular, we first extend the setup with the Gaussian kernel and the approximations by random Fourier features as well as with the embeddings produced by two-layer networks trained on different tasks. We then study the use of more sophisticated kernels and embeddings, those designed optimally for deep neural networks and those developed for the classification task of interest given the data and the training labels, independent of any specific classification model
    corecore