This paper studies ℓ1 regularization with high-dimensional features for
support vector machines with a built-in reject option (meaning that the
decision to classify an observation can be withheld, at a cost lower than
that of misclassification). The procedure can be conveniently implemented as a
linear program and computed using standard software. We prove that the
minimizer of the penalized population risk favors sparse solutions and show
that the behavior of the empirical risk minimizer mimics that of the population
risk minimizer. We also introduce a notion of classification complexity and
prove that our minimizers adapt to the unknown complexity. Using a novel oracle
inequality for the excess risk, we identify situations where fast rates of
convergence occur.

Comment: Published at http://dx.doi.org/10.3150/10-BEJ320 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).