79 research outputs found

    Simultaneous adaptation to the margin and to complexity in classification

    Get PDF
    We consider the problem of adaptation to the margin and to complexity in binary classification. We suggest an exponential weighting aggregation scheme. We use this aggregation procedure to construct classifiers which adapt automatically to margin and complexity. Two main examples are worked out in which adaptivity is achieved in frameworks proposed by Steinwart and Scovel [Learning Theory. Lecture Notes in Comput. Sci. 3559 (2005) 279--294. Springer, Berlin; Ann. Statist. 35 (2007) 575--607] and Tsybakov [Ann. Statist. 32 (2004) 135--166]. Adaptive schemes, like ERM or penalized ERM, usually involve a minimization step. This is not the case for our procedure.Comment: Published in at http://dx.doi.org/10.1214/009053607000000055 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Lower bounds and aggregation in density estimation

    Full text link
    In this paper we prove the optimality of an aggregation procedure. We prove lower bounds for aggregation of model selection type over $M$ density estimators for the Kullback-Leibler divergence (KL), the Hellinger distance and the $L_1$-distance. The lower bound, with respect to the KL distance, can be achieved by the on-line type estimate suggested, among others, by Yang (2000). Combining these results, we state that $\log M/n$ is an optimal rate of aggregation in the sense of Tsybakov (2003), where $n$ is the sample size.
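
    The "on-line type" estimate attributed to Yang (2000) is a progressive mixture: posterior-type weights over the dictionary are updated multiplicatively by the likelihood of each new observation, and the intermediate mixtures are averaged. The sketch below is a simplified illustration under a uniform prior; the names and the prior choice are assumptions, not the paper's exact construction.

    ```python
    import numpy as np

    def progressive_mixture(densities, sample):
        """Progressive-mixture aggregate of candidate densities f_1, ..., f_M."""
        M = len(densities)
        weights = np.full(M, 1.0 / M)           # uniform prior over the dictionary
        running_weights = []
        for x in sample:
            running_weights.append(weights.copy())       # mixture before seeing x
            likelihoods = np.array([f(x) for f in densities])
            weights = weights * likelihoods              # multiplicative update
            weights /= weights.sum()
        avg_weights = np.mean(running_weights, axis=0)   # average the mixtures

        def aggregate_density(t):
            return sum(w * f(t) for w, f in zip(avg_weights, densities))

        return aggregate_density
    ```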

    Classification with Minimax Fast Rates for Classes of Bayes Rules with Sparse Representation

    Get PDF
    We construct a classifier which attains the rate of convergence $\log n/n$ under sparsity and margin assumptions. An approach close to the one met in approximation theory for the estimation of functions is used to obtain this result. The idea is to develop the Bayes rule in a fundamental system of $L^2([0,1]^d)$ made of indicators of dyadic sets and to assume that the coefficients, equal to $-1$, $0$ or $1$, belong to a kind of $L^1$-ball. This assumption can be seen as a sparsity assumption, in the sense that the proportion of coefficients not equal to zero decreases as the "frequency" grows. Finally, rates of convergence are obtained by using the usual trade-off between a bias term and a variance term.
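
    As a rough illustration of the expansion idea (not the paper's exact estimator): partition $[0,1]$ into dyadic cells, estimate each coefficient in $\{-1, 0, 1\}$ by the sign of the average label in the cell, and set it to zero when the cell carries too little evidence, which mimics the sparsity of the coefficient sequence. The resolution level `j` and threshold `tau` are hypothetical parameters.

    ```python
    import numpy as np

    def dyadic_sign_classifier(X, y, j=4, tau=0.1):
        """X in [0,1], y in {-1,+1}; piecewise-constant classifier on dyadic cells."""
        n_cells = 2 ** j
        cells = np.minimum((X * n_cells).astype(int), n_cells - 1)
        coef = np.zeros(n_cells)                 # coefficients in {-1, 0, +1}
        for c in range(n_cells):
            labels = y[cells == c]
            if labels.size and abs(labels.mean()) > tau:
                coef[c] = np.sign(labels.mean())

        def predict(x_new):
            idx = np.minimum((x_new * n_cells).astype(int), n_cells - 1)
            return coef[idx]

        return predict
    ```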

    Optimal learning with Q-aggregation

    Full text link
    We consider a general supervised learning problem with strongly convex and Lipschitz loss and study the problem of model selection aggregation. In particular, given a finite dictionary of functions (learners) together with a prior, we generalize the results obtained by Dai, Rigollet and Zhang [Ann. Statist. 40 (2012) 1878-1905] for Gaussian regression with squared loss and fixed design to this learning setup. Specifically, we prove that the $Q$-aggregation procedure outputs an estimator that satisfies optimal oracle inequalities both in expectation and with high probability. Our proof techniques somewhat depart from traditional proofs by making most of the standard arguments rely on the Laplace transform of the empirical process to be controlled. Comment: Published at http://dx.doi.org/10.1214/13-AOS1190 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
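
    A minimal sketch of a Q-aggregation style criterion for squared loss with fixed design is shown below. It is simplified relative to Dai, Rigollet and Zhang: the weights are obtained by minimizing, over the simplex, a convex combination of the risk of the mixture and the mixture of the risks, plus a prior-dependent penalty. The tuning parameters `nu` and `beta`, and the exact form of the penalty, are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def q_aggregate(Y, F, pi, nu=0.5, beta=1.0):
        """Y: responses (n,); F: fitted dictionary vectors (n, M); pi: prior (M,)."""
        n, M = F.shape
        risks = np.mean((Y[:, None] - F) ** 2, axis=0)    # ||Y - f_j||^2 / n
        prior_pen = (beta / n) * np.log(1.0 / pi)         # linear prior penalty

        def Q(theta):
            mix = F @ theta
            return ((1 - nu) * np.mean((Y - mix) ** 2)    # risk of the mixture
                    + nu * theta @ risks                  # mixture of the risks
                    + theta @ prior_pen)

        theta0 = np.full(M, 1.0 / M)
        cons = ({"type": "eq", "fun": lambda t: t.sum() - 1.0},)
        bounds = [(0.0, 1.0)] * M
        res = minimize(Q, theta0, bounds=bounds, constraints=cons)
        theta = res.x
        return theta, F @ theta     # aggregation weights and aggregated prediction
    ```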
