Learnability of Gaussians with flexible variances
Copyright © 2007 Yiming Ying and Ding-Xuan Zhou
Gaussian kernels with flexible variances provide a rich family of Mercer kernels for learning algorithms. We show that the union of the unit balls of reproducing kernel Hilbert spaces generated by Gaussian kernels with flexible variances is a uniform Glivenko-Cantelli (uGC) class. This result confirms a conjecture concerning the learnability of Gaussian kernels and verifies the uniform convergence of many learning algorithms involving Gaussians with changing variances. Rademacher averages and empirical covering numbers are used to estimate sample errors of multi-kernel regularization schemes associated with general loss functions. It is then shown that the regularization error associated with the least square loss and the Gaussian kernels can be greatly improved when flexible variances are allowed. Finally, for regularization schemes generated by Gaussian kernels with flexible variances we present explicit learning rates for regression with the least square loss and classification with the hinge loss.
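To make the multi-kernel setting concrete, here is a minimal sketch (not the paper's construction) of a least-square regularization scheme in which the Gaussian variance is treated as a free parameter: kernel ridge regression is fit for each candidate variance on a grid, and the variance is chosen by held-out error. The function names, the variance grid, and the selection rule are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 * sigma**2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_ridge(K, y, lam):
    """Kernel ridge regression coefficients for the least-square loss."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def select_variance(X_tr, y_tr, X_val, y_val, sigmas, lam=1e-2):
    """Try each candidate variance and keep the one with the lowest validation error."""
    best_sigma, best_err = None, np.inf
    for sigma in sigmas:
        alpha = fit_ridge(gaussian_kernel(X_tr, X_tr, sigma), y_tr, lam)
        pred = gaussian_kernel(X_val, X_tr, sigma) @ alpha
        err = np.mean((pred - y_val) ** 2)
        if err < best_err:
            best_sigma, best_err = sigma, err
    return best_sigma, best_err

# toy usage on a 1-D regression problem
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(80, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=80)
sigma, err = select_variance(X[:60], y[:60], X[60:], y[60:],
                             sigmas=[0.05, 0.1, 0.2, 0.5, 1.0])
print(f"selected sigma = {sigma}, validation MSE = {err:.4f}")
```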
Nonlinear Approximation Using Gaussian Kernels
It is well-known that non-linear approximation has an advantage over linear
schemes in the sense that it provides comparable approximation rates to those
of the linear schemes, but to a larger class of approximands. This was
established for spline approximations and for wavelet approximations, and more
recently by DeVore and Ron for homogeneous radial basis function (surface
spline) approximations. However, no such results are known for the Gaussian
function, the preferred kernel in machine learning and several engineering
problems. We introduce and analyze in this paper a new algorithm for
approximating functions using translates of Gaussian functions with varying
tension parameters. At heart it employs the strategy for nonlinear
approximation of DeVore and Ron, but it selects kernels by a method that is not
straightforward. The crux of the difficulty lies in the necessity to vary the
tension parameter in the Gaussian function spatially according to local
information about the approximand: error analysis of Gaussian approximation
schemes with varying tension is, by and large, an elusive target for
approximators. We show that our algorithm is suitably optimal in the sense that
it provides approximation rates similar to other established nonlinear
methodologies like spline and wavelet approximations. As expected and desired,
the approximation rates can be as high as needed and are essentially saturated
only by the smoothness of the approximand.
Comment: 15 pages; corrected typos; to appear in J. Funct. Anal.
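As an illustration of approximation by Gaussian translates whose width ("tension") varies from term to term, the following sketch uses a generic greedy, matching-pursuit-style selection over a dictionary of centers and widths. This is only a toy stand-in, not the paper's algorithm (which builds on the DeVore-Ron strategy); the dictionary, the greedy rule, and all names are assumptions made here for illustration.

```python
import numpy as np

def gaussian(x, center, width):
    """A single Gaussian 'atom' with its own tension (width) parameter."""
    return np.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def greedy_gaussian_fit(x, f, centers, widths, n_terms=10):
    """Greedy approximation of samples f(x) by translates of Gaussians
    whose widths may differ from term to term (matching-pursuit style)."""
    residual = f.astype(float).copy()
    terms = []
    for _ in range(n_terms):
        best = None
        for c in centers:
            for w in widths:
                g = gaussian(x, c, w)
                coef = np.dot(residual, g) / np.dot(g, g)
                gain = coef * np.dot(residual, g)   # decrease in squared error
                if best is None or gain > best[0]:
                    best = (gain, c, w, coef, g)
        _, c, w, coef, g = best
        residual -= coef * g
        terms.append((c, w, coef))
    return terms, residual

# toy usage: a target with both a smooth part and a sharp local feature,
# so the chosen widths should vary spatially
x = np.linspace(-1, 1, 400)
f = np.sin(2 * np.pi * x) + np.exp(-200 * (x - 0.3) ** 2)
terms, residual = greedy_gaussian_fit(x, f, centers=x[::20],
                                      widths=[0.02, 0.05, 0.1, 0.3])
print(f"{len(terms)} terms, residual RMS = {np.sqrt(np.mean(residual**2)):.4f}")
```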
Sketching for Large-Scale Learning of Mixture Models
Learning parameters from voluminous data can be prohibitive in terms of
memory and computational requirements. We propose a "compressive learning"
framework where we estimate model parameters from a sketch of the training
data. This sketch is a collection of generalized moments of the underlying
probability distribution of the data. It can be computed in a single pass on
the training set, and is easily computable on streams or distributed datasets.
The proposed framework shares similarities with compressive sensing, which aims
at drastically reducing the dimension of high-dimensional signals while
preserving the ability to reconstruct them. To perform the estimation task, we
derive an iterative algorithm analogous to sparse reconstruction algorithms in
the context of linear inverse problems. We exemplify our framework with the
compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics
on the choice of the sketching procedure and theoretical guarantees of
reconstruction. We experimentally show on synthetic data that the proposed
algorithm yields results comparable to the classical Expectation-Maximization
(EM) technique while requiring significantly less memory and fewer computations
when the number of database elements is large. We further demonstrate the
potential of the approach on real large-scale data (over 10^8 training samples)
for the task of model-based speaker verification. Finally, we draw some
connections between the proposed framework and approximate Hilbert space
embedding of probability distributions using random features. We show that the
proposed sketching operator can be seen as an innovative method to design
translation-invariant kernels adapted to the analysis of GMMs. We also use this
theoretical framework to derive information preservation guarantees, in the
spirit of infinite-dimensional compressive sensing.
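The sketching step itself is easy to picture: with random Fourier features, the sketch is an empirical average of complex exponentials (generalized moments) accumulated in one pass over the data; parameter estimation then matches a mixture model to that vector. The snippet below is a minimal sketch of the single-pass averaging only, not the paper's full pipeline (the frequency-sampling heuristics and the GMM-recovery algorithm are omitted); the names, the batch generator, and the Gaussian frequency distribution are assumptions for illustration.

```python
import numpy as np

def draw_frequencies(m, d, scale=1.0, seed=0):
    """Random frequencies; a Gaussian draw corresponds to a translation-invariant
    (Gaussian-type) kernel via random Fourier features."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=scale, size=(m, d))

def sketch(batches, Omega):
    """Single-pass sketch z = (1/n) * sum_i exp(1j * Omega @ x_i):
    a vector of generalized moments of the empirical distribution."""
    z = np.zeros(Omega.shape[0], dtype=complex)
    n = 0
    for X in batches:                       # works on streams or distributed chunks
        z += np.exp(1j * (X @ Omega.T)).sum(axis=0)
        n += X.shape[0]
    return z / n

# toy usage: 2-D data arriving as a stream of batches
rng = np.random.default_rng(1)
batches = (rng.normal(size=(1000, 2)) + np.array([3.0, 0.0]) * (k % 2)
           for k in range(10))
Omega = draw_frequencies(m=200, d=2)
z = sketch(batches, Omega)
print(z.shape)  # (200,): the sketch size is fixed, independent of the 10,000 samples
```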
A super-polynomial lower bound for learning nonparametric mixtures
We study the problem of learning nonparametric distributions in a finite
mixture, and establish a super-polynomial lower bound on the sample complexity
of learning the component distributions in such models. Namely, we are given
i.i.d. samples from a finite mixture of unknown component distributions, and we
are interested in learning each component. Without any assumptions on the
components, this problem is ill-posed. In order to identify the components, we
assume that each one can be written as the convolution of a Gaussian and a
compactly supported density. Our main result shows that super-polynomially many
samples are required for estimating each component. The proof relies on a fast rate
for approximation with Gaussians, which may be of independent interest. This
result has important implications for the hardness of learning more general
nonparametric latent variable models that arise in machine learning
applications.
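For concreteness, a plausible formalization of the model described above is sketched below in LaTeX; the symbols (k, w_i, f_i, \nu_i, \varphi_\sigma) are chosen here for illustration and need not match the paper's notation or constants.

```latex
\[
  f \;=\; \sum_{i=1}^{k} w_i\, f_i ,
  \qquad w_i > 0,\quad \sum_{i=1}^{k} w_i = 1,
  \qquad f_i \;=\; \varphi_\sigma * \nu_i ,
\]
where $\varphi_\sigma$ is a Gaussian density and each $\nu_i$ is a compactly
supported probability density; the data are i.i.d.\ draws from $f$, and the goal
is to recover each component $f_i$.
```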