    Classification via local multi-resolution projections

    We focus on the supervised binary classification problem, which consists in predicting the label $Y$ associated with a covariate $X \in \mathbb{R}^d$, given a set of $n$ independent and identically distributed covariate-label pairs $(X_i, Y_i)$. We assume that the law of the random vector $(X,Y)$ is unknown and that the marginal law of $X$ admits a density supported on a set $\mathcal{A}$. In the particular case of plug-in classifiers, solving the classification problem boils down to estimating the regression function $\eta(X) = \mathbb{E}[Y \mid X]$. Assuming first that $\mathcal{A}$ is known, we show how to construct an estimator of $\eta$ by localized projections onto a multi-resolution analysis (MRA). In a second step, we show how this estimation procedure generalizes to the case where $\mathcal{A}$ is unknown. Interestingly, this novel estimation procedure achieves theoretical performance similar to that of the celebrated local polynomial estimator (LPE). In addition, it benefits from the lattice structure of the underlying MRA and thus outperforms the LPE from a computational standpoint, which turns out to be a crucial feature in many practical applications. Finally, we prove that the associated plug-in classifier can reach super-fast rates under a margin assumption.
    Comment: 38 pages, 6 figures
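
    To make the plug-in construction concrete, below is a minimal sketch of the simplest instance of the idea, using the Haar basis (the most elementary MRA): the regression function $\eta$ is estimated by its projection onto dyadic cells at a fixed resolution level, i.e., by local averages of the labels, and the classifier thresholds the estimate at 1/2. The function names, the resolution parameter j, and the restriction of covariates to $[0,1]^d$ are illustrative assumptions, not the paper's implementation.

    import numpy as np

    def haar_regression_estimate(X, Y, j):
        # Projection onto the Haar MRA at resolution j on [0, 1]^d: the
        # estimate of eta(x) = E[Y | X = x] is the average label over the
        # dyadic cell containing x (2**j cells per axis).
        d = X.shape[1]
        cells = np.clip((X * 2**j).astype(int), 0, 2**j - 1)  # cell per sample
        keys = np.ravel_multi_index(cells.T, (2**j,) * d)     # flat cell index
        n_cells = (2**j) ** d
        sums = np.bincount(keys, weights=Y, minlength=n_cells)
        counts = np.bincount(keys, minlength=n_cells)
        # Empty cells fall back to 1/2, i.e., an arbitrary coin flip.
        means = np.where(counts > 0, sums / np.maximum(counts, 1), 0.5)
        return means, j, d

    def plug_in_classify(x, means, j, d):
        # Plug-in rule: predict label 1 where the estimated regression >= 1/2.
        cells = np.clip((x * 2**j).astype(int), 0, 2**j - 1)
        keys = np.ravel_multi_index(cells.T, (2**j,) * d)
        return (means[keys] >= 0.5).astype(int)

    # Toy usage: the label probability jumps across x_1 = 1/2.
    rng = np.random.default_rng(0)
    X = rng.random((2000, 2))
    Y = (rng.random(2000) < 0.2 + 0.6 * (X[:, 0] > 0.5)).astype(float)
    means, j, d = haar_regression_estimate(X, Y, j=3)
    print(plug_in_classify(np.array([[0.9, 0.5], [0.1, 0.5]]), means, j, d))

    The computational advantage the abstract attributes to the lattice structure is visible here: locating the dyadic cell of a query point is a rounding operation, whereas an LPE must solve a weighted least-squares fit at every query point.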

    General maximum likelihood empirical Bayes estimation of normal means

    We propose a general maximum likelihood empirical Bayes (GMLEB) method for the estimation of a mean vector based on observations with i.i.d. normal errors. We prove that, under mild moment conditions on the unknown means, the average mean squared error (MSE) of the GMLEB is within an infinitesimal fraction of the minimum average MSE among all separable estimators which use a single deterministic estimating function on individual observations, provided that the risk is of greater order than $(\log n)^5/n$. We also prove that the GMLEB is uniformly approximately minimax in regular and weak $\ell_p$ balls when the order of the length-normalized norm of the unknown means is between $(\log n)^{\kappa_1}/n^{1/(p\wedge 2)}$ and $n/(\log n)^{\kappa_2}$. Simulation experiments demonstrate that the GMLEB outperforms the James-Stein and several state-of-the-art threshold estimators in a wide range of settings without much downside.
    Comment: Published at http://dx.doi.org/10.1214/08-AOS638 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
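
    Below is a minimal sketch of the GMLEB recipe under common assumptions: the unknown prior of the means is approximated by a discrete distribution on a fixed grid, its weights are fitted by EM iterations toward the nonparametric maximum likelihood estimate, and each mean is then estimated by the posterior mean under the fitted prior. The grid range, grid size, and iteration count are illustrative choices, not prescribed by the paper.

    import numpy as np

    def gmleb(x, grid_size=200, n_iter=500):
        # Empirical Bayes estimates of theta from x_i ~ N(theta_i, 1).
        grid = np.linspace(x.min() - 1, x.max() + 1, grid_size)  # support of G-hat
        w = np.full(grid_size, 1.0 / grid_size)                  # uniform start
        # Normal likelihood matrix: f[i, k] = phi(x_i - grid_k), unit variance.
        f = np.exp(-0.5 * (x[:, None] - grid[None, :]) ** 2) / np.sqrt(2 * np.pi)
        for _ in range(n_iter):
            # E-step: posterior probability of each grid point per observation.
            post = f * w
            post /= post.sum(axis=1, keepdims=True)
            # M-step: new weights are the average posterior masses.
            w = post.mean(axis=0)
        # Bayes rule under the fitted prior: posterior mean for each x_i.
        post = f * w
        post /= post.sum(axis=1, keepdims=True)
        return post @ grid

    # Toy usage: sparse means, most of them zero.
    rng = np.random.default_rng(1)
    theta = np.where(rng.random(500) < 0.1, 5.0, 0.0)
    x = theta + rng.standard_normal(500)
    theta_hat = gmleb(x)
    print(np.mean((theta_hat - theta) ** 2), np.mean((x - theta) ** 2))

    Note that the resulting rule is a smooth, data-driven shrinkage of each observation rather than a hard cutoff, which is the qualitative contrast with the threshold estimators the simulations compare against.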