Search CORE

8 research outputs found

Statistical and Computational Trade-Offs in Kernel K-Means

Author: Calandriello D
Rosasco L
Publication venue: 10010 North Torrey Pines Rd, La Jolla, California 92037 USA
Publication date: 01/01/2018

We investigate the efficiency of k-means in terms of both statistical and computational requirements. More precisely, we study a Nyström approach to kernel k-means. We analyze the statistical properties of the proposed method and show that it achieves the same accuracy as exact kernel k-means with only a fraction of the computations. Indeed, we prove under basic assumptions that sampling $\sqrt{n}$ Nyström landmarks greatly reduces computational costs without incurring any loss of accuracy. To the best of our knowledge, this is the first result of this kind for unsupervised learning.
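The Nyström approach to kernel k-means can be illustrated with a minimal NumPy sketch. It assumes a Gaussian kernel, uniform landmark sampling, and plain Lloyd iterations with a farthest-first initialization. The function names (`nystrom_features`, `lloyd`, `nystrom_kernel_kmeans`), the bandwidth `sigma`, and the landmark count `m` are hypothetical choices for illustration, not the authors' implementation. Points are embedded so that inner products reproduce the Nyström approximation $K_{nm} K_{mm}^{+} K_{mn}$ of the kernel matrix, and ordinary k-means then runs in that low-dimensional space.

```python
import numpy as np


def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix with entries exp(-||x - y||^2 / (2 sigma^2))."""
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def nystrom_features(X, m, sigma=1.0, seed=0, eps=1e-8):
    """Embed X so that Phi @ Phi.T = K_nm K_mm^+ K_mn (Nystrom approximation of K)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)  # uniform landmark sampling (assumption)
    landmarks = X[idx]
    s, U = np.linalg.eigh(rbf_kernel(landmarks, landmarks, sigma))
    keep = s > eps * s.max()  # pseudo-inverse: drop numerically zero eigenvalues
    return rbf_kernel(X, landmarks, sigma) @ (U[:, keep] / np.sqrt(s[keep]))


def lloyd(Phi, k, iters=100):
    """Plain Lloyd iterations with farthest-first initialization (a simplification)."""
    centers = [Phi[0]]
    for _ in range(1, k):
        d2 = np.min([((Phi - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Phi[np.argmax(d2)])  # next center: point farthest from chosen ones
    C = np.array(centers)
    for _ in range(iters):
        labels = ((Phi[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        new_C = np.array([Phi[labels == j].mean(0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


def nystrom_kernel_kmeans(X, k, m, sigma=1.0, seed=0):
    """Hypothetical helper: k-means on the m-landmark Nystrom embedding of X."""
    return lloyd(nystrom_features(X, m, sigma, seed), k)


# Usage: two well-separated Gaussian blobs of 100 points each, m = 20 landmarks.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)), rng.normal(4.0, 0.3, (100, 2))])
labels = nystrom_kernel_kmeans(X, k=2, m=20, sigma=1.0, seed=1)
```

Once the embedding is built, each Lloyd iteration costs time roughly linear in $n$, rather than working with the full $n \times n$ kernel matrix as exact kernel k-means does; the number of landmarks `m` controls the accuracy/computation trade-off.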

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Genova

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret

Author: Calandriello Daniele
Carratino Luigi
Lazaric Alessandro
Rosasco Lorenzo
Valko Michal
Publication date: 01/01/2019

Gaussian processes (GPs) are a well-studied Bayesian approach for the optimization of black-box functions. Despite their effectiveness in simple problems, GP-based algorithms scale poorly to high-dimensional functions, as their per-iteration time and space cost is at least quadratic in the number of dimensions $d$ and iterations $t$. Given a set of $A$ alternatives to choose from, the overall runtime $O(t^3A)$ is prohibitive. In this paper we introduce BKB (budgeted kernelized bandit), a new approximate GP algorithm for optimization under bandit feedback that achieves near-optimal regret (and hence near-optimal convergence rate) with near-constant per-iteration complexity and, remarkably, no assumption on the input space or the covariance of the GP. We combine a kernelized linear bandit algorithm (GP-UCB) with randomized matrix sketching based on leverage score sampling, and we prove that randomly sampling inducing points based on their posterior variance gives an accurate low-rank approximation of the GP, preserving variance estimates and confidence intervals. As a consequence, BKB does not suffer from variance starvation, an important problem faced by many previous sparse GP approximations. Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$. This greatly reduces the dimensionality of the problem, thus leading to an $O(TAd_{eff}^2)$ runtime and $O(Ad_{eff})$ space complexity.

Comment: Accepted at COLT 2019. Corrected typos and improved comparison with existing methods.
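The two ingredients named in the abstract, a GP-UCB selection rule and a dictionary of inducing points resampled according to posterior variance, can be combined in a short sketch. This is a simplified illustration under stated assumptions, not the BKB algorithm as specified in the paper: the RBF kernel with lengthscale `ell`, the constants `lam`, `qbar`, and `beta` (the paper derives its own values), the finite 1-D arm set, and all helper names are hypothetical. The sketched posterior uses the Nyström embedding $z(x) = K_{ZZ}^{+1/2} k_Z(x)$ and approximates the variance as $k(x,x) - z(x)^\top z(x) + \lambda\, z(x)^\top V^{-1} z(x)$ with $V = \Phi^\top \Phi + \lambda I$, where $\Phi$ stacks the embedded evaluated points.

```python
import numpy as np


def rbf(X, Y, ell=0.2):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * ell**2))


def embed(X, Z, ell):
    """Nystrom embedding z(x) = K_ZZ^{+1/2} k_Z(x) with respect to the dictionary Z."""
    s, U = np.linalg.eigh(rbf(Z, Z, ell))
    keep = s > 1e-8 * s.max()
    return rbf(X, Z, ell) @ (U[:, keep] / np.sqrt(s[keep]))


def approx_posterior(Xq, X_hist, y_hist, Z, lam, ell):
    """Sketched GP posterior mean/variance at Xq (noise level lam, prior k(x, x) = 1)."""
    if len(Z) == 0:  # empty dictionary: fall back to the prior
        return np.zeros(len(Xq)), np.ones(len(Xq))
    Phi, Zq = embed(X_hist, Z, ell), embed(Xq, Z, ell)
    V = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    mean = Zq @ np.linalg.solve(V, Phi.T @ y_hist)
    quad = np.einsum("ij,ji->i", Zq, np.linalg.solve(V, Zq.T))  # z^T V^{-1} z per row
    var = 1.0 - (Zq**2).sum(1) + lam * quad
    return mean, np.maximum(var, 1e-12)


def bkb_sketch(f, arms, T, lam=0.1, qbar=5.0, beta=2.0, ell=0.2, noise=0.01, seed=0):
    """Hypothetical BKB-style loop: GP-UCB on the sketched posterior plus resampling."""
    rng = np.random.default_rng(seed)
    pulled, ys, dict_idx = [], [], []
    for _ in range(T):
        X_hist = arms[np.array(pulled, dtype=int)]
        Z = arms[np.array(dict_idx, dtype=int)]
        mean, var = approx_posterior(arms, X_hist, np.array(ys), Z, lam, ell)
        i = int(np.argmax(mean + beta * np.sqrt(var)))  # GP-UCB selection rule
        pulled.append(i)
        ys.append(f(arms[i]) + noise * rng.standard_normal())
        # Resample the dictionary: keep each evaluated point w.p. min(qbar * variance, 1).
        X_hist = arms[np.array(pulled, dtype=int)]
        _, var_hist = approx_posterior(X_hist, X_hist, np.array(ys), Z, lam, ell)
        keep = rng.random(len(pulled)) < np.minimum(qbar * var_hist, 1.0)
        dict_idx = sorted(set(np.array(pulled)[keep].tolist()))
    return pulled, dict_idx


# Usage: 50 arms on [0, 1], hypothetical objective with its maximum near x = 0.26.
def f(x):
    return float(np.sin(6.0 * x[0]))


arms = np.linspace(0.0, 1.0, 50)[:, None]
pulled, dict_idx = bkb_sketch(f, arms, T=40)
```

Points whose posterior variance is already small are likely to be dropped from the dictionary, which is how the dictionary can stay much smaller than the number of evaluations.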

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL-Rennes 1

On Generalization Bounds for Projective Clustering

Author: Bucarelli Maria Sofia
Larsen Matilde Fjeldsø
Schwiegelshohn Chris
Toftrup Mads Bech
Publication date: 13/10/2023

Given a set of points, clustering consists of finding a partition of the point set into $k$ clusters such that each point is as close as possible to the center to which it is assigned. Most commonly, centers are points themselves, which leads to the famous $k$-median and $k$-means objectives. One may also choose centers to be $j$-dimensional subspaces, which gives rise to subspace clustering. In this paper, we consider learning bounds for these problems. That is, given a set of $n$ samples $P$ drawn independently from some unknown but fixed distribution $\mathcal{D}$, how quickly does a solution computed on $P$ converge to the optimal clustering of $\mathcal{D}$? We give several near-optimal results. In particular, for center-based objectives, we show a convergence rate of $\tilde{O}\left(\sqrt{k/n}\right)$. This matches the known optimal bounds of [Fefferman, Mitter, and Narayanan, Journal of the American Mathematical Society 2016] and [Bartlett, Linder, and Lugosi, IEEE Trans. Inf. Theory 1998] for $k$-means and extends them to other important objectives such as $k$-median. For subspace clustering with $j$-dimensional subspaces, we show a convergence rate of $\tilde{O}\left(\sqrt{kj^2/n}\right)$. These are the first provable bounds for most of these problems. For the specific case of projective clustering, which generalizes $k$-means, we show that a convergence rate of $\Omega\left(\sqrt{kj/n}\right)$ is necessary, thereby proving that the bounds from [Fefferman, Mitter, and Narayanan, Journal of the American Mathematical Society 2016] are essentially optimal.
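To make the objectives concrete, the following sketch evaluates the costs whose sample-to-distribution convergence the abstract bounds: the sum over points of the distance to the nearest center raised to a power $z$ ($z = 2$ gives $k$-means, $z = 1$ gives $k$-median), and the analogous cost with $j$-dimensional subspaces as centers. Restricting to linear subspaces through the origin, each given by a $d \times j$ orthonormal basis, is an assumption of this sketch; projective clustering more generally allows affine subspaces. The function names `center_cost` and `subspace_cost` are hypothetical.

```python
import numpy as np


def center_cost(P, C, z=2):
    """Sum over points of (distance to nearest center) ** z; z=2: k-means, z=1: k-median."""
    d = np.linalg.norm(P[:, None, :] - C[None, :, :], axis=-1)
    return float((d.min(axis=1) ** z).sum())


def subspace_cost(P, bases, z=2):
    """Sum over points of (distance to nearest subspace) ** z.

    Each basis is a d x j matrix with orthonormal columns spanning a j-dimensional
    linear subspace through the origin (an assumption of this sketch).
    """
    dists = []
    for U in bases:
        resid = P - (P @ U) @ U.T  # component of each point orthogonal to span(U)
        dists.append(np.linalg.norm(resid, axis=1))
    return float((np.min(dists, axis=0) ** z).sum())


# Usage on a tiny hand-made point set.
P = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
centers = np.array([[1.5, 0.0], [0.0, 3.0]])
axes = [np.array([[1.0], [0.0]]), np.array([[0.0], [1.0]])]  # x-axis and y-axis (j = 1)
kmeans_cost = center_cost(P, centers, z=2)         # 0.25 + 0.25 + 0
kmedian_cost = center_cost(P, centers, z=1)        # 0.5 + 0.5 + 0
subspace_clustering_cost = subspace_cost(P, axes)  # every point lies on an axis
```

The learning question in the abstract is then how far the cost of a solution computed on the sample $P$ can be from its expected cost under $\mathcal{D}$ as $n$ grows.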

arXiv.org e-Print Archive

Anomaly detection on trigger-less muon data streams

Author: Lai Nicolò
Publication date: 13/09/2023


Padua Thesis and Dissertation Archive

Resource Efficient Large-Scale Machine Learning

Author: Carratino Luigi
Publication venue: Università degli studi di Genova
Publication date: 20/03/2020

Non-parametric models provide a principled way to learn non-linear functions. In particular, kernel methods are accurate prediction tools that rely on solid theoretical foundations. Although they enjoy optimal statistical properties, they have limited applicability in real-world large-scale scenarios because of their stringent computational requirements in terms of time and memory. Indeed, their computational costs scale at least quadratically with the number of points in the dataset, and many modern machine learning challenges require training on datasets of millions, if not billions, of points. In this thesis, we focus on scaling kernel methods, developing novel algorithmic solutions that incorporate budgeted computations. To derive these algorithms, we mix ideas from statistics, optimization, and randomized linear algebra. We study the statistical and computational trade-offs for various non-parametric models, since these trade-offs are the key to deriving numerical solutions whose resources are tailored to the statistical accuracy allowed by the data. In particular, we study the estimator defined by stochastic gradients and random features, showing how all the free parameters provably govern both the statistical properties and the computational complexity of the algorithm. We then see how to blend the Nyström approximation and preconditioned conjugate gradient to derive a provably statistically optimal solver that can easily scale to datasets of millions of points on a single machine. We also derive a provably accurate leverage score sampling algorithm that can further improve the latter solver. Finally, we see how the Nyström approximation with leverage scores can be used to scale Gaussian processes in a bandit optimization setting, deriving a provably accurate algorithm. The theoretical analysis and the new algorithms presented in this work represent a step towards building a new generation of efficient non-parametric algorithms with minimal time and memory footprints.
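The combination of the Nyström approximation with preconditioned conjugate gradient can be sketched for kernel ridge regression. The sketch assumes a Gaussian kernel, uniformly sampled centers, and a simplified preconditioner that replaces $K_{nm}^\top K_{nm}$ with its estimate $(n/m) K_{mm}^2$, similar in spirit to the preconditioner of the FALKON solver. The function names, the small diagonal `jitter`, and all parameter values are hypothetical, and this is not the thesis's implementation. It solves the normal equations $(K_{nm}^\top K_{nm} + \lambda n K_{mm})\alpha = K_{nm}^\top y$ with a hand-written PCG loop.

```python
import numpy as np


def rbf(X, Y, sigma=0.2):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma**2))


def nystrom_system(X, y, C, lam, sigma, jitter=1e-6):
    """Normal equations H alpha = b of Nystrom KRR, plus a preconditioner matrix P."""
    n, m = len(X), len(C)
    K_nm, K_mm = rbf(X, C, sigma), rbf(C, C, sigma)
    H = K_nm.T @ K_nm + lam * n * K_mm + jitter * np.eye(m)
    b = K_nm.T @ y
    # Preconditioner: approximate K_nm^T K_nm by (n/m) K_mm^2 (needs only m x m kernels).
    P = (n / m) * K_mm @ K_mm + lam * n * K_mm + jitter * np.eye(m)
    return H, b, P, K_nm


def pcg(H, b, P, tol=1e-10, maxiter=500):
    """Preconditioned conjugate gradient for the SPD system H x = b with SPD preconditioner P."""
    x = np.zeros_like(b)
    r = b.copy()
    z = np.linalg.solve(P, r)  # apply the preconditioner P^{-1}
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Hp = H @ p
        step = rz / (p @ Hp)
        x += step * p
        r -= step * Hp
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = np.linalg.solve(P, r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x


def nystrom_krr(X, y, m, lam=1e-3, sigma=0.2, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=m, replace=False)]  # uniformly sampled Nystrom centers
    H, b, P, K_nm = nystrom_system(X, y, C, lam, sigma)
    alpha = pcg(H, b, P)
    return C, alpha, K_nm @ alpha  # centers, coefficients, in-sample predictions


# Usage: fit one period of a sine wave from 200 points with m = 20 centers.
X = np.linspace(0.0, 1.0, 200)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
C, alpha, pred = nystrom_krr(X, y, m=20)
```

Here $H$ is formed explicitly for brevity; a large-scale solver would instead apply $K_{nm}$ and $K_{nm}^\top$ inside each iteration, so that memory stays proportional to $nm$ (or less, with blocking) and a good preconditioner keeps the number of iterations small.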

Archivio istituzionale della ricerca - Università di Genova