
Sparse classification boundaries

Abstract

Given a training sample of size $m$ from a $d$-dimensional population, we wish to allocate a new observation $Z \in \mathbb{R}^d$ either to this population or to the noise. We suppose that the distribution of the population differs from that of the noise only by a shift, which is a sparse vector. For Gaussian noise, fixed sample size $m$, and dimension $d$ tending to infinity, we obtain the sharp classification boundary and propose classifiers attaining this boundary. We also extend this result to the case where the sample size $m$ depends on $d$ and satisfies $(\log m)/\log d \to \gamma$, $0 \le \gamma < 1$, and to the case of non-Gaussian noise satisfying the Cramér condition.
