26 research outputs found

    Noisy Independent Factor Analysis Model for Density Estimation and Classification

    We consider the problem of multivariate density estimation when the unknown density is assumed to follow a particular form of dimensionality reduction, a noisy independent factor analysis (IFA) model. In this model the data are generated by a number of latent independent components having unknown distributions and are observed in Gaussian noise. We do not assume that either the number of components or the matrix mixing the components is known. We show that densities of this form can be estimated at a fast rate. Using the mirror averaging aggregation algorithm, we construct a density estimator which achieves a nearly parametric rate $(\log^{1/4} n)/\sqrt{n}$, independent of the dimensionality of the data, as the sample size $n$ tends to infinity. This estimator is adaptive to the number of components, their distributions and the mixing matrix. We then apply this density estimator to construct nonparametric plug-in classifiers and show that they achieve the best obtainable rate of the excess Bayes risk, to within a logarithmic factor independent of the dimension of the data. Applications of this classifier to simulated data sets and to real data from a remote sensing experiment show promising results. Financial support from the IAP research network of the Belgian government (Belgian Federal Science Policy) is gratefully acknowledged. Research of A. Samarov was partially supported by NSF grant DMS-0505561 and by a grant from Singapore-MIT Alliance (CSB). Research of A.B. Tsybakov was partially supported by the grant ANR-06-BLAN-0194 and by the PASCAL Network of Excellence.

    A Smirnov-Bickel-Rosenblatt theorem for compactly-supported wavelets

    In nonparametric statistical problems, we wish to find an estimator of an unknown function $f$. We can split its error into bias and variance terms; Smirnov, Bickel and Rosenblatt have shown that, for a histogram or kernel estimate, the supremum norm of the variance term is asymptotically distributed as a Gumbel random variable. In the following, we prove a version of this result for estimators using compactly-supported wavelets, a popular tool in nonparametric statistics. Our result relies on an assumption on the nature of the wavelet, which must be verified by provably-good numerical approximations. We verify our assumption for Daubechies wavelets and symlets with $N = 6, \ldots, 20$ vanishing moments; larger values of $N$, and other wavelet bases, are easily checked, and we conjecture that our assumption also holds in those cases.
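
    As a sketch of the decomposition this abstract refers to, write $\hat f_n$ for the estimator (notation assumed here, not taken from the listing). The error then splits into a deterministic bias term and a centred stochastic term:

```latex
\hat f_n - f
  = \underbrace{\left(\mathbb{E}\hat f_n - f\right)}_{\text{bias}}
  + \underbrace{\left(\hat f_n - \mathbb{E}\hat f_n\right)}_{\text{variance term}}
```

    A Smirnov-Bickel-Rosenblatt-type result concerns the limit law of a suitably normalised $\sup_x |\hat f_n(x) - \mathbb{E}\hat f_n(x)|$; the normalising sequences depend on the estimator and are not stated in the abstract.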

    Uniform in bandwidth exact rates for a class of kernel estimators

    Given an i.i.d. sample $(Y_i,Z_i)$, taking values in $\mathbb{R}^{d'}\times \mathbb{R}^d$, we consider a collection of Nadaraya-Watson kernel estimators of the conditional expectations $\mathbb{E}(\langle c_g(z),g(Y)\rangle+d_g(z)\mid Z=z)$, where $z$ belongs to a compact set $H\subset \mathbb{R}^d$, $g$ is a Borel function on $\mathbb{R}^{d'}$ and $c_g(\cdot),d_g(\cdot)$ are continuous functions on $\mathbb{R}^d$. Given two bandwidth sequences $h_n<\widetilde{h}_n$ fulfilling mild conditions, we obtain exact and explicit almost sure limit bounds for the deviations of these estimators around their expectations, uniformly in $g\in\mathcal{G},\;z\in H$ and $h_n\le h\le \widetilde{h}_n$, under mild conditions on the density $f_Z$, the class $\mathcal{G}$, the kernel $K$ and the functions $c_g(\cdot),d_g(\cdot)$. We apply this result to prove that smoothed empirical likelihood can be used to build confidence intervals for conditional probabilities $\mathbb{P}(Y\in C\mid Z=z)$ that hold uniformly in $z\in H,\; C\in \mathcal{C},\; h\in [h_n,\widetilde{h}_n]$. Here $\mathcal{C}$ is a Vapnik-Chervonenkis class of sets. Comment: Published in the Annals of the Institute of Statistical Mathematics, Volume 63, p. 1077-1102 (2011).
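
    To make the estimator in this abstract concrete, here is a minimal sketch of a Nadaraya-Watson estimate of a conditional expectation, evaluated over a grid of bandwidths. It is an illustration under simplifying assumptions, not the paper's construction: scalar covariates, a Gaussian kernel, and the plain regression case $\mathbb{E}(Y \mid Z=z)$. The function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice; the abstract leaves K general)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(z, zs, ys, h):
    """Kernel-weighted average of ys, with weights K((z - Z_i) / h)."""
    weights = [gaussian_kernel((z - zi) / h) for zi in zs]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed to zero: the bandwidth is too small for this z.
        raise ValueError("all kernel weights vanished; increase the bandwidth h")
    return sum(w * y for w, y in zip(weights, ys)) / total


def estimates_over_bandwidths(z, zs, ys, bandwidths):
    """Evaluate the estimator at z for each bandwidth in the given collection."""
    return {h: nadaraya_watson(z, zs, ys, h) for h in bandwidths}
```

    Evaluating the same point over a whole range of bandwidths mirrors the "uniform in bandwidth" setting: the quantity of interest is how far these estimates can deviate from their expectations simultaneously over all $h$ in the range.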

    Adaptive Density Estimation on the Circle by Nearly-Tight Frames

    This work is concerned with the study of asymptotic properties of nonparametric density estimates in the framework of circular data. The estimation procedure applied here is based on wavelet thresholding methods: the wavelets used are the so-called Mexican needlets, which describe a nearly-tight frame on the circle. We study the asymptotic behaviour of the $L^{2}$-risk function for these estimates, in particular its adaptivity, proving that its rate of convergence is nearly optimal. Comment: 30 pages, 3 figures

    Tight Lower Bound for Linear Sketches of Moments

    The problem of estimating frequency moments of a data stream has attracted a lot of attention since the onset of streaming algorithms [AMS99]. While the space complexity for approximately computing the $p^{\rm th}$ moment, for $p\in(0,2]$, has been settled [KNW10], for $p>2$ the exact complexity remains open. For $p>2$ the current best algorithm uses $O(n^{1-2/p}\log n)$ words of space [AKO11,BO10], whereas the lower bound is $\Omega(n^{1-2/p})$ [BJKS04]. In this paper, we show a tight lower bound of $\Omega(n^{1-2/p}\log n)$ words for the class of algorithms based on linear sketches, which store only a sketch $Ax$ of the input vector $x$ and some (possibly randomized) matrix $A$. We note that all known algorithms for this problem are linear sketches. Comment: In Proceedings of the 40th International Colloquium on Automata, Languages and Programming (ICALP), Riga, Latvia, July 2013
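
    The notion of a linear sketch can be illustrated with a small example. The sketch below is not an algorithm from the paper: it uses a random sign matrix in the style of the classical second-moment estimator, swapped in because it is the simplest linear sketch to write down, whereas the abstract concerns $p>2$. All function names are hypothetical.

```python
import random


def make_sign_matrix(rows, n, seed=0):
    """Random matrix A with independent +1/-1 entries (assumed construction)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]


def sketch(A, x):
    """Linear sketch Ax: only these len(A) numbers are kept, not x itself."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def estimate_f2(sk):
    """Average of squared sketch entries, an unbiased estimate of sum(x_i ** 2)."""
    return sum(s * s for s in sk) / len(sk)


# Linearity is what makes the sketch usable on a stream: the sketch of a sum of
# update vectors equals the sum of their sketches.
A = make_sign_matrix(rows=4, n=5, seed=1)
x = [1, 0, 2, 0, 3]
y = [0, 4, 0, 1, 0]
combined = sketch(A, [a + b for a, b in zip(x, y)])
summed = [a + b for a, b in zip(sketch(A, x), sketch(A, y))]
```

    Here `combined` and `summed` are equal, so each stream update can be folded into the stored sketch without access to the full vector. The space cost is the number of rows of `A`, which is the quantity the lower bound in the abstract constrains.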

    Introduction à l'estimation non-paramétrique

    Series: Mathématiques & Applications, no. 41

    Improved Matrix Uncertainty selector


    Estimation of support of a probability density and estimation of support functionals

    The problem of estimating the unknown support $G \subset \mathbb{R}^N$ of a uniform density is considered under the assumption that the support $G$ belongs to the class of "boundary fragments" with smooth upper surface. Minimax lower bounds for the accuracy of arbitrary estimators of $G$ are obtained when the distance between sets is the Hausdorff metric or the measure of the symmetric difference. Estimators of the support are proposed which are optimal in the sense that they attain the convergence rate of the minimax lower bound. Similar results are proved for the problem of estimation of functionals of the density support.