An adaptive multiclass nearest neighbor classifier
We consider a problem of multiclass classification, where the training sample $S_n = \{(X_i, Y_i)\}_{i=1}^n$ is generated from the model $\mathbb{P}(Y = m \mid X = x) = \eta_m(x)$, $m \in \{1, \dots, M\}$, and $\eta_1, \dots, \eta_M$ are unknown $\alpha$-Hölder continuous functions. Given a test point $X$, our goal is to predict its label. A widely used $k$-nearest-neighbors classifier constructs estimates of $\eta_1(X), \dots, \eta_M(X)$ and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter $k$, which may become tricky in some situations. In our solution, we fix several integers $k_1, \dots, k_K$, compute corresponding $k_j$-nearest-neighbor estimates for each $m$ and each $k_j$, and apply an aggregation procedure. We study an algorithm which constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter and to the margin, and establish rates of convergence under mild assumptions.
Comment: Accepted in ESAIM: Probability & Statistics. The original publication is available at www.esaim-ps.org
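To make the aggregation idea concrete, here is a minimal sketch in Python (NumPy only) of combining several k-NN class-probability estimates by a convex combination. The function names, the toy data, and in particular the uniform weights are illustrative stand-ins: the paper's procedure chooses the weights by a data-driven aggregation step, which is not reproduced here.

```python
import numpy as np

def knn_estimates(X_train, y_train, x, k_list, n_classes):
    """For each k in k_list, estimate P(Y = m | X = x) by the
    fraction of the k nearest neighbours of x carrying label m."""
    dists = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(dists)
    est = np.zeros((len(k_list), n_classes))
    for i, k in enumerate(k_list):
        labels = y_train[order[:k]]
        est[i] = np.bincount(labels, minlength=n_classes) / k
    return est

def aggregate(estimates, weights):
    """Convex combination of the k-NN estimates; predict the
    class with the largest aggregated probability estimate."""
    eta_hat = weights @ estimates  # shape (n_classes,)
    return int(np.argmax(eta_hat))

# Toy usage: two Gaussian blobs, uniform weights over k in {3, 5, 9}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.repeat([0, 1], 50)
est = knn_estimates(X, y, x=np.array([2.5, 2.5]), k_list=[3, 5, 9], n_classes=2)
w = np.full(3, 1 / 3)  # stand-in for the data-driven aggregation weights
print(aggregate(est, w))
```

In the paper the weights are tuned so the mixture competes with an oracle choice of $k$; the sketch only shows where the convex combination enters.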
Simultaneous adaptation to the margin and to complexity in classification
We consider the problem of adaptation to the margin and to complexity in
binary classification. We suggest an exponential weighting aggregation scheme.
We use this aggregation procedure to construct classifiers which adapt
automatically to the margin and to complexity. Two main examples are worked out in
which adaptivity is achieved in frameworks proposed by Steinwart and Scovel
[Learning Theory. Lecture Notes in Comput. Sci. 3559 (2005) 279--294. Springer,
Berlin; Ann. Statist. 35 (2007) 575--607] and Tsybakov [Ann. Statist. 32 (2004)
135--166]. Adaptive schemes, like ERM or penalized ERM, usually involve a
minimization step. This is not the case for our procedure.
Comment: Published at http://dx.doi.org/10.1214/009053607000000055 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
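As a rough illustration of an exponential weighting aggregation scheme, the sketch below assigns weight proportional to $\exp(-n R_j / T)$ to the $j$-th candidate classifier given its empirical risk $R_j$. The temperature $T$, the risks, and the sample size are hypothetical, and this is not claimed to match the paper's exact weighting.

```python
import numpy as np

def exponential_weights(empirical_risks, n, temperature=1.0):
    """Exponential weighting: w_j proportional to exp(-n * R_j / T),
    so candidates with lower empirical risk dominate the mixture."""
    a = -n * np.asarray(empirical_risks) / temperature
    a -= a.max()                     # stabilise the exponentials
    w = np.exp(a)
    return w / w.sum()

# Toy usage: three candidate classifiers with validation risks.
risks = [0.30, 0.22, 0.25]
w = exponential_weights(risks, n=200)
print(w)  # mass concentrates on the second (lowest-risk) classifier
```

Note that computing these weights requires no minimization step, only evaluating the empirical risks, which is the point emphasised in the abstract.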
On Rate-Optimal Partitioning Classification from Observable and from Privatised Data
In this paper we revisit the classical method of partitioning classification and study its convergence rate under relaxed conditions, both for observable (non-privatised) and for privatised data. Let the feature vector $X$ take values in $\mathbb{R}^d$ and denote its label by $Y$. Previous results on the partitioning classifier worked with the strong density assumption, which is restrictive, as we demonstrate through simple examples. We assume that the distribution of $X$ is a mixture of an absolutely continuous and a discrete distribution, such that the absolutely continuous component is concentrated on a $d^*$-dimensional subspace. Here, we study the problem under much milder assumptions: in addition to the standard Lipschitz and margin conditions, a novel characteristic of the absolutely continuous component is introduced, by which the exact convergence rate of the classification error probability is calculated, both for the binary and for the multi-label cases. Interestingly, this rate of convergence depends only on the intrinsic dimension $d^*$. The privacy constraints mean that the data cannot be directly observed, and the classifiers are functions of the randomised outcome of a suitable local differential privacy mechanism. The statistician is free to choose the form of this privacy mechanism, and here we add Laplace distributed noises to the discretizations of the location of the feature vector $X$ and to its label $Y$. Again, tight upper bounds on the rate of convergence of the classification error probability are derived, without the strong density assumption, such that this rate depends on …
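A minimal sketch of a privatised partitioning (histogram) classifier may help fix ideas. For brevity the Laplace noise below is added once to the aggregated cell-label counts rather than to each individual's locally released record, so it is only a caricature of a local differential privacy mechanism; the partition width h, the budget eps, and the toy data are all assumptions, not the paper's construction.

```python
import numpy as np

def cell_index(x, h, grid_shape):
    """Index of the cubic cell of side h containing x, features in [0,1)^d."""
    idx = np.minimum((x / h).astype(int), np.array(grid_shape) - 1)
    return np.ravel_multi_index(idx, grid_shape)

def private_partition_classifier(X, y, h, eps, n_classes, rng):
    """Per-class histogram over cubic cells; each count receives
    Laplace(2/eps) noise. Simplified: noise is added to the aggregated
    counts here, whereas a local mechanism privatises every record."""
    d = X.shape[1]
    grid_shape = (int(np.ceil(1 / h)),) * d
    counts = np.zeros((int(np.prod(grid_shape)), n_classes))
    for xi, yi in zip(X, y):
        counts[cell_index(xi, h, grid_shape), yi] += 1
    counts += rng.laplace(scale=2 / eps, size=counts.shape)  # privatisation
    def predict(x):
        return int(np.argmax(counts[cell_index(x, h, grid_shape)]))
    return predict

# Toy usage: linearly separable labels on the unit square.
rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(int)
clf = private_partition_classifier(X, y, h=0.2, eps=1.0, n_classes=2, rng=rng)
print(clf(np.array([0.9, 0.9])))
```

The plug-in rule is the usual one: predict the majority label among the (noisy) counts of the cell containing the test point.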
Two Phases of Scaling Laws for Nearest Neighbor Classifiers
A scaling law refers to the observation that the test performance of a model
improves as the number of training data increases. A fast scaling law implies
that one can solve machine learning problems by simply boosting the data and
the model sizes. Yet, in many cases, the benefit of adding more data can be
negligible. In this work, we study the rate of scaling laws of nearest neighbor
classifiers. We show that a scaling law can have two phases: in the first
phase, the generalization error depends polynomially on the data dimension and
decreases fast; whereas in the second phase, the error depends exponentially on
the data dimension and decreases slowly. Our analysis highlights the complexity
of the data distribution in determining the generalization error. When the data
distributes benignly, our result suggests that nearest neighbor classifier can
achieve a generalization error that depends polynomially, instead of
exponentially, on the data dimension
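The two-phase behaviour can be probed empirically. The toy experiment below (a hypothetical problem and parameters, not the paper's setup) tracks the 1-NN test error as the training set grows; plotting error against n on log-log axes exposes a polynomially decreasing phase as a straight line.

```python
import numpy as np

def one_nn_error(n_train, n_test, d, rng):
    """Test error of the 1-NN rule on a toy problem: the label is the
    indicator of the first coordinate exceeding 1/2, features uniform."""
    Xtr = rng.uniform(size=(n_train, d)); ytr = Xtr[:, 0] > 0.5
    Xte = rng.uniform(size=(n_test, d));  yte = Xte[:, 0] > 0.5
    # squared distances from every test point to every training point
    d2 = ((Xte ** 2).sum(1)[:, None] + (Xtr ** 2).sum(1)[None, :]
          - 2 * Xte @ Xtr.T)
    pred = ytr[d2.argmin(axis=1)]
    return (pred != yte).mean()

rng = np.random.default_rng(0)
for n in [100, 400, 1600, 6400]:
    print(n, one_nn_error(n, n_test=500, d=5, rng=rng))  # error shrinks with n
```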
Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy
We consider challenges that arise in the estimation of the mean outcome under
an optimal individualized treatment strategy defined as the treatment rule that
maximizes the population mean outcome, where the candidate treatment rules are
restricted to depend on baseline covariates. We prove a necessary and
sufficient condition for the pathwise differentiability of the optimal value, a
key condition needed to develop a regular and asymptotically linear (RAL)
estimator of the optimal value. The stated condition is slightly more general
than the previous condition implied in the literature. We then describe an
approach to obtain root-$n$ rate confidence intervals for the optimal value
even when the parameter is not pathwise differentiable. We provide conditions
under which our estimator is RAL and asymptotically efficient when the mean
outcome is pathwise differentiable. We also outline an extension of our
approach to a multiple time point problem. All of our results are supported by
simulations.
Comment: Published at http://dx.doi.org/10.1214/15-AOS1384 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
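A naive plug-in sketch shows where the difficulty arises. With a single binary covariate, estimate the outcome regression within each stratum, take the better treatment arm, and average. When the two arms are equally good in some stratum, so the optimal rule is non-unique, the max over noisy arm means is biased upward, which is exactly the regime where pathwise differentiability fails. The variable names and the toy data-generating process are assumptions; this is not the paper's RAL estimator.

```python
import numpy as np

def plugin_value(W, A, Y):
    """Plug-in estimate of the optimal value E[max_a E(Y | A = a, W)]
    for a single binary covariate W and binary treatment A."""
    value = 0.0
    for w in (0, 1):
        mask = W == w
        # outcome regression within the stratum: one sample mean per arm
        q = [Y[mask & (A == a)].mean() for a in (0, 1)]
        value += mask.mean() * max(q)
    return value

# Toy usage: treatment helps only when W = 1; true optimal value is 0.1.
rng = np.random.default_rng(0)
n = 5000
W = rng.integers(0, 2, n); A = rng.integers(0, 2, n)
Y = rng.normal(0.2 * A * W, 1.0)
print(plugin_value(W, A, Y))
```

In the W = 0 stratum both arms have the same mean outcome, so the max of the two noisy estimates is slightly too large on average; honest confidence intervals must account for this, which is the problem the abstract addresses.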
Choice of neighbor order in nearest-neighbor classification
The $k$th-nearest neighbor rule is arguably the simplest and most intuitively appealing nonparametric classification procedure. However, application of this method is inhibited by lack of knowledge about its properties, in particular, about the manner in which it is influenced by the value of $k$; and by the absence of techniques for empirical choice of $k$. In the present paper we detail the way in which the value of $k$ determines the misclassification
error. We consider two models, Poisson and Binomial, for the training samples.
Under the first model, data are recorded in a Poisson stream and are "assigned"
to one or other of the two populations in accordance with the prior
probabilities. In particular, the total number of data in both training samples
is a Poisson-distributed random variable. Under the Binomial model, however,
the total number of data in the training samples is fixed, although again each
data value is assigned in a random way. Although the values of risk and regret
associated with the Poisson and Binomial models are different, they are
asymptotically equivalent to first order, and also to the risks associated with
kernel-based classifiers that are tailored to the case of two derivatives.
These properties motivate new methods for choosing the value of $k$.
Comment: Published at http://dx.doi.org/10.1214/07-AOS537 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
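One standard empirical route to choosing $k$, in the spirit of what this abstract motivates though not the paper's specific proposal, is leave-one-out cross-validation: score each candidate $k$ by the error made when every training point is classified by its neighbours among the remaining points. The toy data and the candidate grid below are assumptions.

```python
import numpy as np

def loo_knn_error(X, y, k):
    """Leave-one-out misclassification error of the k-NN rule."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # a point never votes for itself
    nbrs = np.argsort(d2, axis=1)[:, :k]
    votes = y[nbrs].mean(axis=1) > 0.5    # majority vote among k neighbours
    return (votes != y).mean()

# Toy usage: scan odd k and keep the minimiser.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.repeat([False, True], 100)
errs = {k: loo_knn_error(X, y, k) for k in range(1, 30, 2)}
print(min(errs, key=errs.get))
```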