Stochastic Low-Rank Kernel Learning for Regression

Author: Anthoine Sandrine
Glotin Hervé
Machart Pierre
Peel Thomas
Ralaivola Liva
Publication date: 28/06/2011

We present a novel approach to learn a kernel-based regression function. It is based on the use of conical combinations of data-based parameterized kernels and on a new stochastic convex optimization procedure for which we establish convergence guarantees. The overall learning procedure has the nice properties that a) the learned conical combination is automatically designed to perform the regression task at hand and b) the updates involved in the optimization procedure are inexpensive. To shed light on the appropriateness of our learning strategy, we present empirical results from experiments conducted on various benchmark datasets.
Comment: International Conference on Machine Learning (ICML'11), Bellevue (Washington): United States (2011)

arXiv.org e-Print Archive

CiteSeerX

HAL AMU
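As a rough illustration of the ingredients named in the abstract, the sketch below fits a regression function that is a conical (nonnegative) combination of Gaussian functions centred on the training points, using projected stochastic gradient steps on a convex, regularized squared loss. It is not the authors' procedure: the Gaussian parameterization, the widths, the step size `lr`, the penalty `lam` and the helper names (`gaussian_features`, `fit_conical`, `predict_conical`) are assumptions made for illustration only.

```python
import numpy as np


def gaussian_features(X, centers, widths):
    """Phi[i, j] = exp(-||x_i - c_j||^2 / (2 * widths[j]^2)) for each query x_i and centre c_j."""
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * widths[None, :] ** 2))


def fit_conical(X, y, widths, n_epochs=50, lr=0.05, lam=1e-3, seed=0):
    """Projected SGD for f(x) = sum_j mu_j * phi_j(x) with mu_j >= 0 (a conical combination).

    Hypothetical illustration: one Gaussian phi_j per (training point, width) pair, so the
    basis is data-based. The regularized squared loss is convex in mu.
    """
    rng = np.random.default_rng(seed)
    centers = np.repeat(X, len(widths), axis=0)   # each training point once per width
    ws = np.tile(widths, len(X))                  # matching width for each centre
    mu = np.zeros(len(centers))
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):
            phi = gaussian_features(X[i:i + 1], centers, ws)[0]
            residual = phi @ mu - y[i]
            grad = residual * phi + lam * mu
            # Gradient step followed by projection onto the cone {mu >= 0}.
            mu = np.maximum(mu - lr * grad, 0.0)
    return centers, ws, mu


def predict_conical(Xq, centers, ws, mu):
    """Evaluate the learned conical combination at the query points Xq."""
    return gaussian_features(Xq, centers, ws) @ mu
```

Each update touches a single row of features, so its cost is linear in the number of basis functions; the projection `np.maximum(..., 0.0)` is what keeps the combination conical. Because every basis function is nonnegative, this particular sketch can only represent nonnegative targets.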

Regression on fixed-rank positive semidefinite matrices: a Riemannian approach

Author: Bonnabel Silvere
Meyer Gilles
Sepulchre Rodolphe
Publication date: 31/01/2011

The paper addresses the problem of learning a regression model parameterized by a fixed-rank positive semidefinite matrix. The focus is on the nonlinear nature of the search space and on scalability to high-dimensional problems. The mathematical developments rely on the theory of gradient descent algorithms adapted to the Riemannian geometry that underlies the set of fixed-rank positive semidefinite matrices. In contrast with previous contributions in the literature, no restrictions are imposed on the range space of the learned matrix. The resulting algorithms maintain a linear complexity in the problem size and enjoy important invariance properties. We apply the proposed algorithms to the problem of learning a distance function parameterized by a positive semidefinite matrix. Good performance is observed on classical benchmarks.

arXiv.org e-Print Archive

Open Repository and Bibliography - Liège
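A minimal sketch of the distance-learning setting, assuming the regression targets are squared distances $(x - x')^\top W (x - x')$. The matrix is kept in factored form $W = G G^\top$ with $G$ of size $d \times r$, so the iterate stays positive semidefinite with rank at most $r$ and each update costs $O(dr)$, i.e. linear in the dimension. The sketch takes plain stochastic gradient steps on the factor $G$; this is a simplification and not the Riemannian algorithms developed in the paper. The function name, loss, step size and initialization are assumptions.

```python
import numpy as np


def learn_low_rank_metric(A, B, targets, rank, n_epochs=30, lr=0.01, seed=0):
    """SGD on the factor G of W = G G^T (d x d, rank <= r) to fit squared distances.

    Hypothetical illustration. Prediction for a pair (a, b) with z = a - b:
        d_W(a, b) = z^T W z = ||G^T z||^2
    Per-pair loss: 0.5 * (d_W - target)^2, whose gradient w.r.t. G is
        2 * (d_W - target) * z (G^T z)^T.
    Computing G^T z first keeps each step at O(d * r) operations.
    """
    rng = np.random.default_rng(seed)
    dim = A.shape[1]
    G = rng.standard_normal((dim, rank)) / np.sqrt(dim)   # random initial factor
    for _ in range(n_epochs):
        for k in rng.permutation(len(A)):
            z = A[k] - B[k]
            Gz = G.T @ z                                    # shape (r,)
            pred = Gz @ Gz                                  # ||G^T z||^2
            G -= lr * 2.0 * (pred - targets[k]) * np.outer(z, Gz)
    return G
```

Working on $G$ rather than on $W$ avoids ever forming a $d \times d$ matrix, which is the property that makes fixed-rank parameterizations attractive in high dimension.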

Scalable Kernel Methods via Doubly Stochastic Gradients

Author: Balcan Maria-Florina
Dai Bo
He Niao
Liang Yingyu
Raj Anant
Song Le
Xie Bo
Publication date: 10/09/2015

The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems. Or have we simply not tried hard enough for kernel methods? Here we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Our approach relies on the fact that many kernel methods can be expressed as convex optimization problems, and we solve the problems by making two unbiased stochastic approximations to the functional gradient, one using random training points and another using random functions associated with the kernel, and then descending using this noisy functional gradient. We show that a function produced by this procedure after $t$ iterations converges to the optimal function in the reproducing kernel Hilbert space at rate $O(1/t)$, and achieves a generalization performance of $O(1/\sqrt{t})$. This double stochasticity also allows us to avoid keeping the support vectors and to implement the algorithm with a small memory footprint, which is linear in the number of iterations and independent of the data dimension. Our approach can readily scale kernel methods up to the regimes dominated by neural nets. We show that our method can achieve performance competitive with neural nets on datasets such as 8 million handwritten digits from MNIST, 2.3 million energy materials from MolecularSpace, and 1 million photos from ImageNet.
Comment: 32 pages, 22 figures

arXiv.org e-Print Archive

CiteSeerX
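The two sources of randomness can be sketched as follows, assuming a Gaussian kernel approximated by random Fourier features $\phi_\omega(x) = \sqrt{2}\cos(\omega^\top x + b)$ with $\omega \sim N(0, \sigma^{-2} I)$ and $b \sim U[0, 2\pi]$, a squared loss, and a step size $\gamma_t = \theta/(t + t_0)$ chosen for this example (the paper's exact schedules and losses may differ). Each iteration samples one training point and one random feature; the feature is regenerated from its integer seed whenever the function is evaluated, so only the coefficients and seeds are stored. Function names and default parameters are hypothetical.

```python
import numpy as np


def rff(X, seed, sigma):
    """Random Fourier feature sqrt(2) * cos(omega^T x + b) of a Gaussian kernel with width sigma.

    omega and b are regenerated from the integer seed, so a feature never has to be stored.
    """
    r = np.random.default_rng(seed)
    omega = r.standard_normal(X.shape[-1]) / sigma
    b = r.uniform(0.0, 2.0 * np.pi)
    return np.sqrt(2.0) * np.cos(X @ omega + b)


def predict_dsg(X, alphas, sigma):
    """f(x) = sum_i alpha_i * phi_{omega_i}(x), regenerating feature i from seed i."""
    out = np.zeros(X.shape[0])
    for i, a in enumerate(alphas):
        out += a * rff(X, i, sigma)
    return out


def doubly_sgd(X, y, n_iter=400, sigma=1.0, nu=1e-4, theta=4.0, t0=8, seed=0):
    # Data-sampling stream, seeded differently from the per-feature seeds 0, 1, 2, ...
    rng = np.random.default_rng([seed, 1])
    alphas = []
    for t in range(n_iter):
        i = rng.integers(len(X))                           # randomness 1: a training point
        x_t = X[i:i + 1]
        residual = predict_dsg(x_t, alphas, sigma)[0] - y[i]  # derivative of 0.5 * (f - y)^2
        gamma = theta / (t + t0)                           # assumed decaying step size
        alphas = [(1.0 - gamma * nu) * a for a in alphas]  # shrinkage from the nu * ||f||^2 term
        # Randomness 2: a fresh random feature with seed t defines the new coefficient.
        alphas.append(-gamma * residual * rff(x_t, t, sigma)[0])
    return alphas
```

The state after $T$ iterations is the list of $T$ coefficients (the seeds are implicit iteration indices), independent of the data dimension. In this naive form, evaluating $f$ regenerates every stored feature, so training costs $O(T^2)$ feature evaluations.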

Preconditioning Kernel Matrices

Author: Cunningham John P.
Cutajar Kurt
Filippone Maurizio
Osborne Michael A.
Publication date: 01/01/2016

The computational and storage complexity of kernel machines presents the primary barrier to their scaling to large, modern datasets. A common way to tackle the scalability issue is to use the conjugate gradient algorithm, which relieves the constraints on both storage (the kernel matrix need not be stored) and computation (both stochastic gradients and parallelization can be used). Even so, conjugate gradient is not without its own issues: kernel matrices are often so poorly conditioned that conjugate gradients converge slowly in practice. Preconditioning is a common approach to alleviating this issue. Here we propose preconditioned conjugate gradients for kernel machines, and develop a broad range of preconditioners particularly useful for kernel matrices. We describe a scalable approach to both solving kernel machines and learning their hyperparameters. We show this approach is exact in the limit of iterations and outperforms state-of-the-art approximations for a given computational budget.

arXiv.org e-Print Archive

Oxford University Research Archive
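One possible preconditioner for the kernel system $(K + \lambda I)\alpha = y$ is a Nyström approximation $P = K_{nm} K_{mm}^{-1} K_{mn} + \lambda I$, whose inverse is cheap to apply through the Woodbury identity. The sketch pairs it with a textbook preconditioned conjugate gradient loop. The RBF kernel, the random landmark choice, the jitter `1e-6` and the function names are assumptions for illustration, and the sketch stores $K$ explicitly for brevity, whereas a scalable implementation would only supply matrix-vector products.

```python
import numpy as np


def rbf(A, B, ell=1.0):
    """Gaussian (RBF) kernel matrix with length-scale ell."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * ell ** 2))


def nystrom_preconditioner(X, lam, m, ell=1.0, seed=0):
    """Return v -> P^{-1} v for P = U U^T + lam * I, where U U^T = K_nm K_mm^{-1} K_mn."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)      # random landmark points
    K_nm = rbf(X, X[idx], ell)
    L = np.linalg.cholesky(rbf(X[idx], X[idx], ell) + 1e-6 * np.eye(m))  # jitter for stability
    U = np.linalg.solve(L, K_nm.T).T                      # U = K_nm L^{-T}
    C = np.linalg.cholesky(lam * np.eye(m) + U.T @ U)     # small m x m factor

    def apply(v):
        # Woodbury: (lam I + U U^T)^{-1} v = (v - U (lam I + U^T U)^{-1} U^T v) / lam
        w = np.linalg.solve(C.T, np.linalg.solve(C, U.T @ v))
        return (v - U @ w) / lam

    return apply


def pcg(matvec, b, precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients; returns (solution, iterations used)."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter
```

Passing the identity as `precond` turns `pcg` into plain conjugate gradients, which makes it easy to compare iteration counts on the same system.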