    An adaptive multiclass nearest neighbor classifier

    We consider the problem of multiclass classification, where the training sample $S_n = \{(X_i, Y_i)\}_{i=1}^n$ is generated from the model $\mathbb P(Y = m \mid X = x) = \eta_m(x)$, $1 \leq m \leq M$, and $\eta_1(x), \dots, \eta_M(x)$ are unknown $\alpha$-Hölder continuous functions. Given a test point $X$, our goal is to predict its label. A widely used $\mathsf k$-nearest-neighbor classifier constructs estimates of $\eta_1(X), \dots, \eta_M(X)$ and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter $\mathsf k$, which may become tricky in some situations. In our solution, we fix several integers $n_1, \dots, n_K$, compute the corresponding $n_k$-nearest-neighbor estimates for each $m$ and each $n_k$, and apply an aggregation procedure. We study an algorithm that constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter $\alpha$ and to the margin, and establish rates of convergence under mild assumptions. Comment: Accepted in ESAIM: Probability & Statistics. The original publication is available at www.esaim-ps.or
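
    A minimal sketch of the idea of combining several $n_k$-nearest-neighbor estimates of the class probabilities into a convex combination is shown below. The weighting rule (exponential weights computed on a held-out split), the grid of neighbor counts, and the temperature are illustrative assumptions, not the aggregation procedure analysed in the paper.

```python
# Sketch: convex aggregation of several k-NN class-probability estimates.
# Exponential weights on a held-out split are an illustrative stand-in only.
import numpy as np
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def aggregated_knn_proba(X_train, y_train, X_test, ks=(5, 10, 20, 40), temperature=1.0):
    """Convex combination of n_k-nearest-neighbor class-probability estimates."""
    X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.3, random_state=0)
    probas, losses = [], []
    for k in ks:
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_fit, y_fit)
        losses.append(log_loss(y_val, clf.predict_proba(X_val), labels=clf.classes_))
        probas.append(clf.predict_proba(X_test))
    losses = np.asarray(losses)
    weights = np.exp(-(losses - losses.min()) / temperature)   # smaller held-out loss -> larger weight
    weights /= weights.sum()                                   # convex weights
    return np.tensordot(weights, np.stack(probas), axes=1)     # aggregated estimates of eta_m at X_test
```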

    Simultaneous adaptation to the margin and to complexity in classification

    We consider the problem of adaptation to the margin and to complexity in binary classification. We suggest an exponential weighting aggregation scheme. We use this aggregation procedure to construct classifiers that adapt automatically to margin and complexity. Two main examples are worked out in which adaptivity is achieved in frameworks proposed by Steinwart and Scovel [Learning Theory. Lecture Notes in Comput. Sci. 3559 (2005) 279--294. Springer, Berlin; Ann. Statist. 35 (2007) 575--607] and Tsybakov [Ann. Statist. 32 (2004) 135--166]. Adaptive schemes, like ERM or penalized ERM, usually involve a minimization step. This is not the case for our procedure. Comment: Published at http://dx.doi.org/10.1214/009053607000000055 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
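
    A minimal sketch of an exponential-weighting aggregate of candidate binary classifiers appears below. The temperature and the plain weighted vote are illustrative assumptions; the precise scheme and its theoretical tuning are specified in the paper itself.

```python
# Sketch: exponential-weighting aggregation of candidate classifiers f_1, ..., f_K
# with labels in {-1, +1}. Weights come from empirical risks only (no minimization).
import numpy as np

def exponential_weighting_aggregate(classifiers, X, y, temperature=1.0):
    """Return a predictor formed as a weighted vote of the candidate classifiers."""
    n = len(y)
    risks = np.array([np.mean(f(X) != y) for f in classifiers])   # empirical 0-1 risks
    weights = np.exp(-n * risks / temperature)                    # exponential weights ...
    weights /= weights.sum()                                      # ... normalized to be convex

    def predict(X_new):
        votes = np.array([f(X_new) for f in classifiers], dtype=float)
        return np.sign(weights @ votes)                           # weighted majority vote
    return predict
```

    Note that, consistent with the abstract's remark, forming the weights only requires evaluating empirical risks, not solving an optimization problem.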

    On Rate-Optimal Partitioning Classification from Observable and from Privatised Data

    In this paper we revisit the classical method of partitioning classification and study its convergence rate under relaxed conditions, both for observable (non-privatised) and for privatised data. Let the feature vector $X$ take values in $\mathbb{R}^d$ and denote its label by $Y$. Previous results on the partitioning classifier worked with the strong density assumption, which is restrictive, as we demonstrate through simple examples. We assume that the distribution of $X$ is a mixture of an absolutely continuous and a discrete distribution, such that the absolutely continuous component is concentrated on a $d_a$-dimensional subspace. Here, we study the problem under much milder assumptions: in addition to the standard Lipschitz and margin conditions, a novel characteristic of the absolutely continuous component is introduced, by which the exact convergence rate of the classification error probability is calculated, both for the binary and for the multi-label cases. Interestingly, this rate of convergence depends only on the intrinsic dimension $d_a$. The privacy constraints mean that the data $(X_1, Y_1), \dots, (X_n, Y_n)$ cannot be directly observed, and the classifiers are functions of the randomised outcome of a suitable local differential privacy mechanism. The statistician is free to choose the form of this privacy mechanism, and here we add Laplace distributed noise to the discretisations of all possible locations of the feature vector $X_i$ and to its label $Y_i$. Again, tight upper bounds on the rate of convergence of the classification error probability are derived, without the strong density assumption, such that this rate depends on $2\,d_a$.
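
    A minimal sketch of a partitioning (histogram) classifier on $[0,1)^d$, together with a locally privatised variant in which each sample releases its label-signed one-hot cell indicator perturbed by Laplace noise before aggregation, is given below. The noise scale and the signed-indicator encoding are illustrative assumptions, not necessarily the exact mechanism analysed in the paper.

```python
# Sketch: partitioning classifier with an optional local-DP Laplace mechanism.
import numpy as np

def cell_indices(X, h):
    """Flat indices of the cubic cells of side h containing the rows of X in [0, 1)^d."""
    m = int(np.ceil(1.0 / h))                                 # cells per axis
    coords = np.minimum((X / h).astype(int), m - 1)
    return np.ravel_multi_index(coords.T, dims=(m,) * X.shape[1]), m ** X.shape[1]

def partitioning_classifier(X, y, h, epsilon=None, rng=None):
    """y in {-1, +1}; if epsilon is given, aggregate Laplace-perturbed signed indicators."""
    rng = np.random.default_rng() if rng is None else rng
    idx, n_cells = cell_indices(X, h)
    votes = np.zeros(n_cells)
    for i, c in enumerate(idx):
        report = np.zeros(n_cells)
        report[c] = y[i]                                      # label-signed cell indicator
        if epsilon is not None:
            report += rng.laplace(scale=4.0 / epsilon, size=n_cells)   # local randomisation
        votes += report                                       # aggregator sees only noisy reports

    def predict(X_new):
        new_idx, _ = cell_indices(X_new, h)
        return np.where(votes[new_idx] >= 0, 1, -1)           # majority vote within each cell
    return predict
```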

    Two Phases of Scaling Laws for Nearest Neighbor Classifiers

    A scaling law refers to the observation that the test performance of a model improves as the amount of training data increases. A fast scaling law implies that one can solve machine learning problems simply by increasing the data and model sizes. Yet, in many cases, the benefit of adding more data can be negligible. In this work, we study the rate of scaling laws of nearest neighbor classifiers. We show that a scaling law can have two phases: in the first phase, the generalization error depends polynomially on the data dimension and decreases quickly, whereas in the second phase, the error depends exponentially on the data dimension and decreases slowly. Our analysis highlights the role of the complexity of the data distribution in determining the generalization error. When the data distribution is benign, our results suggest that the nearest neighbor classifier can achieve a generalization error that depends polynomially, instead of exponentially, on the data dimension.
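
    A minimal sketch of how one might trace such a scaling curve empirically for a 1-nearest-neighbor classifier (test error versus training-set size) follows; the synthetic two-Gaussian data is an illustrative choice only.

```python
# Sketch: empirical scaling curve of a 1-NN classifier on synthetic data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
d = 10                                                    # data dimension

def sample(n):
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d)) + y[:, None]              # two shifted Gaussian classes
    return X, y

X_test, y_test = sample(5000)
for n in (100, 400, 1600, 6400, 25600):
    X_tr, y_tr = sample(n)
    pred = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr).predict(X_test)
    print(f"n = {n:6d}   test error = {np.mean(pred != y_test):.3f}")
    # the slope of this curve on log-log axes indicates which scaling phase one is in
```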

    Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy

    We consider challenges that arise in the estimation of the mean outcome under an optimal individualized treatment strategy, defined as the treatment rule that maximizes the population mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. We prove a necessary and sufficient condition for the pathwise differentiability of the optimal value, a key condition needed to develop a regular and asymptotically linear (RAL) estimator of the optimal value. The stated condition is slightly more general than the previous condition implied in the literature. We then describe an approach to obtain root-$n$ rate confidence intervals for the optimal value even when the parameter is not pathwise differentiable. We provide conditions under which our estimator is RAL and asymptotically efficient when the mean outcome is pathwise differentiable. We also outline an extension of our approach to a multiple time point problem. All of our results are supported by simulations. Comment: Published at http://dx.doi.org/10.1214/15-AOS1384 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
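
    A minimal sketch of a simple plug-in estimate of the optimal value $E[\max_a E(Y \mid A = a, W)]$ is shown below: fit one outcome regression per treatment arm and average the fitted per-subject maxima. This illustrative plug-in is not the paper's RAL estimator and carries no confidence-interval guarantees; the regression model is an arbitrary choice.

```python
# Sketch: naive plug-in estimate of the mean outcome under the optimal rule.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def plug_in_optimal_value(W, A, Y, treatments=(0, 1)):
    """W: baseline covariates (n, p); A: observed treatments (n,); Y: outcomes (n,)."""
    fitted = []
    for a in treatments:
        mask = (A == a)
        model = RandomForestRegressor(n_estimators=200, random_state=0).fit(W[mask], Y[mask])
        fitted.append(model.predict(W))               # predicted outcome had everyone received arm a
    best = np.max(np.column_stack(fitted), axis=1)    # each subject's best predicted outcome
    return float(np.mean(best))                       # estimated mean outcome under the optimal rule
```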

    Choice of neighbor order in nearest-neighbor classification

    The $k$th-nearest-neighbor rule is arguably the simplest and most intuitively appealing nonparametric classification procedure. However, application of this method is inhibited by lack of knowledge about its properties, in particular, about the manner in which it is influenced by the value of $k$, and by the absence of techniques for empirical choice of $k$. In the present paper we detail the way in which the value of $k$ determines the misclassification error. We consider two models, Poisson and Binomial, for the training samples. Under the first model, data are recorded in a Poisson stream and are "assigned" to one or the other of the two populations in accordance with the prior probabilities. In particular, the total number of data in both training samples is a Poisson-distributed random variable. Under the Binomial model, however, the total number of data in the training samples is fixed, although again each data value is assigned in a random way. Although the values of risk and regret associated with the Poisson and Binomial models are different, they are asymptotically equivalent to first order, and also to the risks associated with kernel-based classifiers that are tailored to the case of two derivatives. These properties motivate new methods for choosing the value of $k$. Comment: Published at http://dx.doi.org/10.1214/07-AOS537 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
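
    A minimal sketch of an empirical choice of the neighbor order $k$ is given below. The paper's proposed methods are derived from its risk expansions; the grid and the cross-validation criterion used here are a simple illustrative baseline.

```python
# Sketch: choose k for a k-NN classifier by cross-validated accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def choose_k_by_cv(X, y, k_grid=range(1, 51, 2), folds=5):
    """Return the k from k_grid with the highest cross-validated accuracy."""
    scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=folds).mean()
              for k in k_grid]
    return list(k_grid)[int(np.argmax(scores))]
```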