32 research outputs found
Noisy Independent Factor Analysis Model for Density Estimation and Classification
We consider the problem of multivariate density estimation when the unknown density is assumed to follow a particular form of dimensionality reduction, a noisy independent factor analysis (IFA) model. In this model the data are generated by a number of latent independent components having unknown distributions and are observed in Gaussian noise. We do not assume that either the number of components or the matrix mixing the components is known. We show that densities of this form can be estimated at a fast rate. Using the mirror averaging aggregation algorithm, we construct a density estimator which achieves a nearly parametric rate (log^{1/4} n)/√n, independent of the dimensionality of the data, as the sample size n tends to infinity. This estimator is adaptive to the number of components, their distributions and the mixing matrix. We then apply this density estimator to construct nonparametric plug-in classifiers and show that they achieve the best obtainable rate of the excess Bayes risk, to within a logarithmic factor independent of the dimension of the data. Applications of this classifier to simulated data sets and to real data from a remote sensing experiment show promising results.
Financial support from the IAP research network of the Belgian government (Belgian Federal Science Policy) is gratefully acknowledged. Research of A. Samarov was partially supported by NSF grant DMS-0505561 and by a grant from Singapore-MIT Alliance (CSB). Research of A.B. Tsybakov was partially supported by the grant ANR-06-BLAN-0194 and by the PASCAL Network of Excellence.
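As a rough illustration of the data-generating mechanism described in this abstract, the following sketch simulates observations of the form X = A S + noise, where the columns of S are independent latent components and the noise is Gaussian. The function name `sample_noisy_ifa`, the Laplace and uniform component distributions, the noise level and the dimensions are all assumptions chosen for the example; they are not taken from the paper, and the sketch covers only the model, not the mirror averaging estimator.

```python
import numpy as np

def sample_noisy_ifa(n, A, sigma, rng):
    """Draw n observations X = A S + eps (hypothetical helper, not from the paper).

    A     : (d, m) mixing matrix, unknown to the statistician in the paper's setting
    sigma : standard deviation of the additive Gaussian noise
    """
    d, m = A.shape
    S = np.empty((n, m))
    for j in range(m):
        # Illustrative non-Gaussian component laws: alternate Laplace and uniform.
        if j % 2 == 0:
            S[:, j] = rng.laplace(0.0, 1.0, size=n)
        else:
            S[:, j] = rng.uniform(-1.0, 1.0, size=n)
    eps = sigma * rng.standard_normal((n, d))  # Gaussian observation noise
    return S @ A.T + eps, S

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))   # d = 5 observed dimensions, m = 2 latent components
X, S = sample_noisy_ifa(1000, A, 0.1, rng)
```

Here the observed dimension (5) exceeds the number of latent components (2), which is the dimensionality-reduction structure the abstract refers to.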
A Smirnov-Bickel-Rosenblatt theorem for compactly-supported wavelets
In nonparametric statistical problems, we wish to find an estimator of an
unknown function f. We can split its error into bias and variance terms;
Smirnov, Bickel and Rosenblatt have shown that, for a histogram or kernel
estimate, the supremum norm of the variance term is asymptotically distributed
as a Gumbel random variable. In the following, we prove a version of this
result for estimators using compactly-supported wavelets, a popular tool in
nonparametric statistics. Our result relies on an assumption on the nature of
the wavelet, which must be verified by provably-good numerical approximations.
We verify our assumption for Daubechies wavelets and symlets, with N = 6, ...,
20 vanishing moments; larger values of N, and other wavelet bases, are easily
checked, and we conjecture that our assumption holds also in those cases.
Uniform in bandwidth exact rates for a class of kernel estimators
Given an i.i.d. sample $(Y_i, Z_i)$, taking values in $\mathbb{R}^{d'} \times \mathbb{R}^d$,
we consider a collection of Nadaraya-Watson kernel estimators of the conditional
expectations $\mathbb{E}(\langle c_g(z), g(Y) \rangle + d_g(z) \mid Z = z)$, where $z$ belongs to a
compact set $H \subset \mathbb{R}^d$, $g$ is a Borel function on $\mathbb{R}^{d'}$ and
$c_g(\cdot), d_g(\cdot)$ are continuous functions on $\mathbb{R}^d$. Given two
bandwidth sequences $h_n < \widetilde{h}_n$ fulfilling mild conditions, we obtain exact
and explicit almost sure limit bounds for the deviations of these estimators
around their expectations, uniformly in $g \in \mathcal{G}$, $z \in H$ and $h_n \le h \le
\widetilde{h}_n$, under mild conditions on the density $f_Z$, the class $\mathcal{G}$, the kernel
$K$ and the functions $c_g(\cdot), d_g(\cdot)$. We apply this result to prove
that smoothed empirical likelihood can be used to build confidence intervals
for conditional probabilities $\mathbb{P}(Y \in C \mid Z = z)$ that hold uniformly in
$z \in H$, $C \in \mathcal{C}$, $h \in [h_n, \widetilde{h}_n]$. Here $\mathcal{C}$ is a Vapnik-Chervonenkis
class of sets.
Comment: Published in the Annals of the Institute of Statistical Mathematics,
Volume 63, p. 1077-1102 (2011)
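For readers unfamiliar with the estimator named in this abstract, a minimal sketch of a Nadaraya-Watson estimate of a conditional expectation, evaluated over a small range of bandwidths, might look as follows. The Gaussian kernel, the function name `nadaraya_watson`, the simulated regression function and the bandwidth grid are assumptions made for illustration only; the paper's conditions on the kernel and on the class of functions are not reproduced here.

```python
import numpy as np

def nadaraya_watson(z0, Z, Y, h):
    """Kernel-weighted average estimating E[Y | Z = z0] (illustrative sketch).

    Z : (n, d) covariates, Y : (n,) responses, h : bandwidth > 0.
    Uses a Gaussian kernel, which is an assumption of this example.
    """
    Y = np.asarray(Y, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(Y), -1)
    u = (Z - np.atleast_1d(z0)) / h
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1))  # kernel weights
    return float(np.sum(w * Y) / np.sum(w))

rng = np.random.default_rng(0)
Z = rng.uniform(0.0, 1.0, size=(2000, 1))
Y = np.sin(2 * np.pi * Z[:, 0]) + 0.1 * rng.standard_normal(2000)

# Evaluate the same estimator over a range of bandwidths, in the spirit of
# results that hold uniformly over an interval of bandwidths.
estimates = {h: nadaraya_watson(np.array([0.25]), Z, Y, h) for h in (0.02, 0.05, 0.1)}
```

Larger bandwidths smooth more heavily, so the estimate at z = 0.25 (where the simulated regression function peaks at 1) is pulled further below 1 as h grows.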
Adaptive Density Estimation on the Circle by Nearly-Tight Frames
This work is concerned with the study of asymptotic properties of
nonparametric density estimates in the framework of circular data. The
estimation procedure here applied is based on wavelet thresholding methods: the
wavelets used are the so-called Mexican needlets, which describe a nearly-tight
frame on the circle. We study the asymptotic behaviour of the -risk
function for these estimates, in particular its adaptivity, proving that its
rate of convergence is nearly optimal.
Comment: 30 pages, 3 figures
Tight Lower Bound for Linear Sketches of Moments
The problem of estimating frequency moments of a data stream has attracted a
lot of attention since the onset of streaming algorithms [AMS99]. While the
space complexity for approximately computing the $p$-th moment, for
$p \in (0,2]$, has been settled [KNW10], for $p > 2$ the exact complexity remains
open. For $p > 2$ the current best algorithm uses $O(n^{1-2/p} \log n)$ words of
space [AKO11,BO10], whereas the lower bound is of $\Omega(n^{1-2/p})$ [BJKS04].
In this paper, we show a tight lower bound of $\Omega(n^{1-2/p} \log n)$ words
for the class of algorithms based on linear sketches, which store only a sketch
$Ax$ of input vector $x$ and some (possibly randomized) matrix $A$. We note
that all known algorithms for this problem are linear sketches.
Comment: In Proceedings of the 40th International Colloquium on Automata,
Languages and Programming (ICALP), Riga, Latvia, July 201
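To make the notion of a linear sketch concrete, the sketch below maintains the product of a random sign matrix with a frequency vector under streaming updates and uses it to estimate the second frequency moment. This is the classical random-sign construction for the second moment, swapped in purely as an illustration; it is not the algorithm or the lower-bound argument of the paper, which concerns moments of order greater than 2. The names `make_sketch_matrix` and `estimate_f2` and all sizes are hypothetical.

```python
import numpy as np

def make_sketch_matrix(k, n, rng):
    """Random k x n matrix with independent +1/-1 entries (illustrative choice)."""
    return rng.choice([-1.0, 1.0], size=(k, n))

def estimate_f2(sketch):
    """Average of squared sketch coordinates, an estimate of sum_i x_i^2."""
    return float(np.mean(sketch ** 2))

rng = np.random.default_rng(0)
n, k = 1000, 400                 # universe size and sketch length (assumed values)
A = make_sketch_matrix(k, n, rng)

x = np.zeros(n)                  # exact frequency vector, kept here only for comparison
sketch = np.zeros(k)             # the only state a sketching algorithm would store
for item in rng.integers(0, n, size=5000):
    x[item] += 1.0
    sketch += A[:, item]         # linear update: the sketch always equals A @ x

true_f2 = float(np.sum(x ** 2))
approx_f2 = estimate_f2(sketch)
```

Because each update adds a column of the matrix, the stored state depends on the stream only through the product of the matrix with the frequency vector, which is what makes the sketch linear.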
Introduction à l'estimation non-paramétrique
Series: Mathématiques & Applications, no. 41