Learnability of Gaussians with flexible variances
Copyright © 2007 Yiming Ying and Ding-Xuan Zhou
Gaussian kernels with flexible variances provide a rich family of Mercer kernels for learning algorithms. We show that the union of the unit balls of reproducing kernel Hilbert spaces generated by Gaussian kernels with flexible variances is a uniform Glivenko-Cantelli (uGC) class. This result confirms a conjecture concerning the learnability of Gaussian kernels and verifies the uniform convergence of many learning algorithms involving Gaussians with changing variances. Rademacher averages and empirical covering numbers are used to estimate sample errors of multi-kernel regularization schemes associated with general loss functions. It is then shown that the regularization error associated with the least square loss and the Gaussian kernels can be greatly improved when flexible variances are allowed. Finally, for regularization schemes generated by Gaussian kernels with flexible variances we present explicit learning rates for regression with the least square loss and classification with the hinge loss.
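To make the multi-kernel setting concrete, here is a minimal sketch (not the paper's construction) of a least-square regularization scheme in which the Gaussian variance is treated as a free parameter: kernel ridge regression is fit for each candidate variance on a grid, and the variance is chosen by held-out error. The function names, the variance grid, and the selection rule are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 * sigma**2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_ridge(K, y, lam):
    """Kernel ridge regression coefficients for the least-square loss."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def select_variance(X_tr, y_tr, X_val, y_val, sigmas, lam=1e-2):
    """Try each candidate variance and keep the one with the lowest validation error."""
    best_sigma, best_err = None, np.inf
    for sigma in sigmas:
        alpha = fit_ridge(gaussian_kernel(X_tr, X_tr, sigma), y_tr, lam)
        pred = gaussian_kernel(X_val, X_tr, sigma) @ alpha
        err = np.mean((pred - y_val) ** 2)
        if err < best_err:
            best_sigma, best_err = sigma, err
    return best_sigma, best_err

# toy usage on a 1-D regression problem
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(80, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=80)
sigma, err = select_variance(X[:60], y[:60], X[60:], y[60:],
                             sigmas=[0.05, 0.1, 0.2, 0.5, 1.0])
print(f"selected sigma = {sigma}, validation MSE = {err:.4f}")
```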
Nonlinear Approximation Using Gaussian Kernels
It is well-known that non-linear approximation has an advantage over linear
schemes in the sense that it provides comparable approximation rates to those
of the linear schemes, but to a larger class of approximands. This was
established for spline approximations and for wavelet approximations, and more
recently by DeVore and Ron for homogeneous radial basis function (surface
spline) approximations. However, no such results are known for the Gaussian
function, the preferred kernel in machine learning and several engineering
problems. We introduce and analyze in this paper a new algorithm for
approximating functions using translates of Gaussian functions with varying
tension parameters. At heart it employs the strategy for nonlinear
approximation of DeVore and Ron, but it selects kernels by a method that is not
straightforward. The crux of the difficulty lies in the necessity to vary the
tension parameter in the Gaussian function spatially according to local
information about the approximand: error analysis of Gaussian approximation
schemes with varying tension is, by and large, an elusive target for
approximators. We show that our algorithm is suitably optimal in the sense that
it provides approximation rates similar to other established nonlinear
methodologies like spline and wavelet approximations. As expected and desired,
the approximation rates can be as high as needed and are essentially saturated
only by the smoothness of the approximand.
Comment: 15 pages; corrected typos; to appear in J. Funct. Anal.
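As an illustration of approximation by Gaussian translates whose width ("tension") varies from term to term, the following sketch uses a generic greedy, matching-pursuit-style selection over a dictionary of centers and widths. This is only a toy stand-in, not the paper's algorithm (which builds on the DeVore-Ron strategy); the dictionary, the greedy rule, and all names are assumptions made here for illustration.

```python
import numpy as np

def gaussian(x, center, width):
    """A single Gaussian 'atom' with its own tension (width) parameter."""
    return np.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def greedy_gaussian_fit(x, f, centers, widths, n_terms=10):
    """Greedy approximation of samples f(x) by translates of Gaussians
    whose widths may differ from term to term (matching-pursuit style)."""
    residual = f.astype(float).copy()
    terms = []
    for _ in range(n_terms):
        best = None
        for c in centers:
            for w in widths:
                g = gaussian(x, c, w)
                coef = np.dot(residual, g) / np.dot(g, g)
                gain = coef * np.dot(residual, g)   # decrease in squared error
                if best is None or gain > best[0]:
                    best = (gain, c, w, coef, g)
        _, c, w, coef, g = best
        residual -= coef * g
        terms.append((c, w, coef))
    return terms, residual

# toy usage: a target with both a smooth part and a sharp local feature,
# so the chosen widths should vary spatially
x = np.linspace(-1, 1, 400)
f = np.sin(2 * np.pi * x) + np.exp(-200 * (x - 0.3) ** 2)
terms, residual = greedy_gaussian_fit(x, f, centers=x[::20],
                                      widths=[0.02, 0.05, 0.1, 0.3])
print(f"{len(terms)} terms, residual RMS = {np.sqrt(np.mean(residual**2)):.4f}")
```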
Sketching for Large-Scale Learning of Mixture Models
Learning parameters from voluminous data can be prohibitive in terms of
memory and computational requirements. We propose a "compressive learning"
framework where we estimate model parameters from a sketch of the training
data. This sketch is a collection of generalized moments of the underlying
probability distribution of the data. It can be computed in a single pass on
the training set, and is easily computable on streams or distributed datasets.
The proposed framework shares similarities with compressive sensing, which aims
at drastically reducing the dimension of high-dimensional signals while
preserving the ability to reconstruct them. To perform the estimation task, we
derive an iterative algorithm analogous to sparse reconstruction algorithms in
the context of linear inverse problems. We exemplify our framework with the
compressive estimation of a Gaussian Mixture Model (GMM), providing heuristics
on the choice of the sketching procedure and theoretical guarantees of
reconstruction. We experimentally show on synthetic data that the proposed
algorithm yields results comparable to the classical Expectation-Maximization
(EM) technique while requiring significantly less memory and fewer computations
when the number of database elements is large. We further demonstrate the
potential of the approach on real large-scale data (over 10^8 training samples)
for the task of model-based speaker verification. Finally, we draw some
connections between the proposed framework and approximate Hilbert space
embedding of probability distributions using random features. We show that the
proposed sketching operator can be seen as an innovative method to design
translation-invariant kernels adapted to the analysis of GMMs. We also use this
theoretical framework to derive information preservation guarantees, in the
spirit of infinite-dimensional compressive sensing.
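The sketching step itself is easy to picture: with random Fourier features, the sketch is an empirical average of complex exponentials (generalized moments) accumulated in one pass over the data; parameter estimation then matches a mixture model to that vector. The snippet below is a minimal sketch of the single-pass averaging only, not the paper's full pipeline (the frequency-sampling heuristics and the GMM-recovery algorithm are omitted); the names, the batch generator, and the Gaussian frequency distribution are assumptions for illustration.

```python
import numpy as np

def draw_frequencies(m, d, scale=1.0, seed=0):
    """Random frequencies; a Gaussian draw corresponds to a translation-invariant
    (Gaussian-type) kernel via random Fourier features."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=scale, size=(m, d))

def sketch(batches, Omega):
    """Single-pass sketch z = (1/n) * sum_i exp(1j * Omega @ x_i):
    a vector of generalized moments of the empirical distribution."""
    z = np.zeros(Omega.shape[0], dtype=complex)
    n = 0
    for X in batches:                       # works on streams or distributed chunks
        z += np.exp(1j * (X @ Omega.T)).sum(axis=0)
        n += X.shape[0]
    return z / n

# toy usage: 2-D data arriving as a stream of batches
rng = np.random.default_rng(1)
batches = (rng.normal(size=(1000, 2)) + np.array([3.0, 0.0]) * (k % 2)
           for k in range(10))
Omega = draw_frequencies(m=200, d=2)
z = sketch(batches, Omega)
print(z.shape)  # (200,): the sketch size is fixed, independent of the 10,000 samples
```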
A super-polynomial lower bound for learning nonparametric mixtures
We study the problem of learning nonparametric distributions in a finite
mixture, and establish a super-polynomial lower bound on the sample complexity
of learning the component distributions in such models. Namely, we are given
i.i.d. samples from a finite mixture of unknown component distributions, and we
are interested in learning each component. Without any assumptions on the
components, this problem is ill-posed. In order to identify the components, we
assume that each one can be written as the convolution of a Gaussian and a
compactly supported density. Our main result shows that super-polynomially many
samples are required for estimating each component. The proof relies on a fast rate
for approximation with Gaussians, which may be of independent interest. This
result has important implications for the hardness of learning more general
nonparametric latent variable models that arise in machine learning
applications.
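For concreteness, a plausible formalization of the model described above is sketched below in LaTeX; the symbols (k, w_i, f_i, \nu_i, \varphi_\sigma) are chosen here for illustration and need not match the paper's notation or constants.

```latex
\[
  f \;=\; \sum_{i=1}^{k} w_i\, f_i ,
  \qquad w_i > 0,\quad \sum_{i=1}^{k} w_i = 1,
  \qquad f_i \;=\; \varphi_\sigma * \nu_i ,
\]
where $\varphi_\sigma$ is a Gaussian density and each $\nu_i$ is a compactly
supported probability density; the data are i.i.d.\ draws from $f$, and the goal
is to recover each component $f_i$.
```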