Bandwidth choice for nonparametric classification
It is shown that, for kernel-based classification with univariate
distributions and two populations, optimal bandwidth choice has a dichotomous
character. If the two densities cross at just one point, where their curvatures
have the same signs, then minimum Bayes risk is achieved using bandwidths which
are an order of magnitude larger than those which minimize pointwise estimation
error. On the other hand, if the curvature signs are different, or if there are
multiple crossing points, then bandwidths of conventional size are generally
appropriate. The range of different modes of behavior is narrower in
multivariate settings. There, the optimal size of bandwidth is generally the
same as that which is appropriate for pointwise density estimation. These
properties motivate empirical rules for bandwidth choice.
Comment: Published at http://dx.doi.org/10.1214/009053604000000959 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
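Below is a minimal, hypothetical sketch of the kind of comparison the abstract describes: a univariate plug-in rule that classifies by comparing two kernel density estimates, with its test error evaluated over a grid of bandwidths. The Gaussian populations, sample sizes, and bandwidth grid are illustrative choices, not the paper's setup.

```python
# Sketch only: plug-in kernel classifier for two univariate populations,
# error rate compared across bandwidths. Data and bandwidths are illustrative.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 1.0, size=500)    # class 0 training sample
x1 = rng.normal(1.5, 1.0, size=500)    # class 1 training sample
xt0 = rng.normal(0.0, 1.0, size=2000)  # class 0 test sample
xt1 = rng.normal(1.5, 1.0, size=2000)  # class 1 test sample

def error_rate(h):
    """Misclassification rate of the plug-in rule with bandwidth h (equal priors)."""
    kde0 = KernelDensity(bandwidth=h).fit(x0[:, None])
    kde1 = KernelDensity(bandwidth=h).fit(x1[:, None])
    def predict(x):
        # classify as 1 where the estimated class-1 density exceeds class-0
        return (kde1.score_samples(x[:, None]) >
                kde0.score_samples(x[:, None])).astype(int)
    err0 = np.mean(predict(xt0) == 1)
    err1 = np.mean(predict(xt1) == 0)
    return 0.5 * (err0 + err1)

for h in [0.05, 0.2, 0.5, 1.0, 2.0]:
    print(f"h = {h:4.2f}  error = {error_rate(h):.3f}")
```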
Altitude Training: Strong Bounds for Single-Layer Dropout
Dropout training, originally designed for deep neural networks, has been
successful on high-dimensional single-layer natural language tasks. This paper
proposes a theoretical explanation for this phenomenon: we show that, under a
generative Poisson topic model with long documents, dropout training improves
the exponent in the generalization bound for empirical risk minimization.
Dropout achieves this gain much like a marathon runner who practices at
altitude: once a classifier learns to perform reasonably well on training
examples that have been artificially corrupted by dropout, it will do very well
on the uncorrupted test set. We also show that, under similar conditions,
dropout preserves the Bayes decision boundary and should therefore induce
minimal bias in high dimensions.
Comment: Advances in Neural Information Processing Systems (NIPS), 201
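As a rough illustration of the setting, the following hypothetical sketch trains a single-layer (logistic) classifier on dropout-corrupted copies of count features and predicts on uncorrupted data, in the spirit of the "altitude training" analogy. The Poisson-style features, dropout rate, and learning schedule are assumptions for illustration only, not the paper's experimental setup.

```python
# Sketch only: single-layer dropout training. Each SGD step sees an example
# whose coordinates are independently zeroed with probability delta (inverted
# dropout scaling); prediction uses the uncorrupted features.
import numpy as np

rng = np.random.default_rng(0)
n, d, delta = 2000, 200, 0.5
X = rng.poisson(0.3, size=(n, d)).astype(float)   # long-document-like counts
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

w = np.zeros(d)
lr = 0.1
for epoch in range(20):
    for i in rng.permutation(n):
        x = X[i] * (rng.random(d) > delta) / (1.0 - delta)  # dropout corruption
        p = 1.0 / (1.0 + np.exp(-x @ w))
        w += lr * (y[i] - p) * x                             # logistic SGD step

pred = (X @ w > 0).astype(float)   # uncorrupted features at prediction time
print("training-set accuracy:", np.mean(pred == y))
```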
On false discovery rate thresholding for classification under sparsity
We study the properties of false discovery rate (FDR) thresholding, viewed as
a classification procedure. The "0"-class (null) is assumed to have a known
density while the "1"-class (alternative) is obtained from the "0"-class either
by translation or by scaling. Furthermore, the "1"-class is assumed to have a
small number of elements w.r.t. the "0"-class (sparsity). We focus on densities
of the Subbotin family, including Gaussian and Laplace models. Nonasymptotic
oracle inequalities are derived for the excess risk of FDR thresholding. These
inequalities lead to explicit rates of convergence of the excess risk to zero,
as the number m of items to be classified tends to infinity and in a regime
where the power of the Bayes rule is away from 0 and 1. Moreover, these
theoretical investigations suggest an explicit choice for the target level
of FDR thresholding, as a function of m. Our oracle inequalities
show theoretically that the resulting FDR thresholding adapts to the unknown
sparsity regime contained in the data. This property is illustrated with
numerical experiments.
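A minimal sketch of FDR thresholding used as a classifier, assuming a Gaussian null and a sparse Gaussian location alternative: one-sided p-values are computed under the known null density, and the Benjamini-Hochberg step-up rule at a target level alpha_m decides which items are assigned to the "1"-class. The sparsity level, shift, and alpha_m below are illustrative choices, not the paper's recommended calibration.

```python
# Sketch only: BH thresholding as a classification rule under sparsity.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, frac_alt, mu = 10_000, 0.01, 3.0
is_alt = rng.random(m) < frac_alt
x = rng.normal(loc=mu * is_alt, scale=1.0)       # null N(0,1), alternative N(mu,1)

pvals = norm.sf(x)                               # one-sided p-values under the null
alpha_m = 0.1
order = np.argsort(pvals)
thresholds = alpha_m * np.arange(1, m + 1) / m   # BH step-up boundaries
below = pvals[order] <= thresholds
k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0

reject = np.zeros(m, dtype=bool)
reject[order[:k]] = True                         # items classified as "1"
fdp = np.mean(~is_alt[reject]) if k else 0.0
print(f"rejections: {k}, false discovery proportion: {fdp:.3f}")
```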
Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification
We propose a high dimensional classification method that involves
nonparametric feature augmentation. Knowing that marginal density ratios are
the most powerful univariate classifiers, we use the ratio estimates to
transform the original feature measurements. Subsequently, penalized logistic
regression is invoked, taking as input the newly transformed or augmented
features. This procedure trains models equipped with local complexity and
global simplicity, thereby avoiding the curse of dimensionality while creating
a flexible nonlinear decision boundary. The resulting method is called Feature
Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by
generalizing the Naive Bayes model, writing the log ratio of joint densities as
a linear combination of those of marginal densities. It is related to
generalized additive models, but has better interpretability and computability.
Risk bounds are developed for FANS. In numerical analysis, FANS is compared
with competing methods, so as to provide a guideline on its best application
domain. Real data analysis demonstrates that FANS performs very competitively
on benchmark email spam and gene expression data sets. Moreover, FANS is
implemented by an extremely fast algorithm through parallel computing.
Comment: 30 pages, 2 figures
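The core FANS recipe (estimate marginal density ratios, replace each feature by its estimated log ratio, then fit a penalized logistic regression) can be sketched as below under simplifying assumptions; the paper's actual procedure also involves sample splitting and other refinements omitted here. The bandwidth, penalty strength, and simulated data are illustrative only.

```python
# Sketch only: feature augmentation by estimated marginal log density ratios,
# followed by L1-penalized logistic regression on the augmented features.
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.linear_model import LogisticRegression

def fans_transform(X_train, y_train, X, bandwidth=0.5):
    """Map each feature to its estimated marginal log density ratio (class 1 vs. 0)."""
    Z = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        kde1 = KernelDensity(bandwidth=bandwidth).fit(X_train[y_train == 1, j:j + 1])
        kde0 = KernelDensity(bandwidth=bandwidth).fit(X_train[y_train == 0, j:j + 1])
        Z[:, j] = kde1.score_samples(X[:, j:j + 1]) - kde0.score_samples(X[:, j:j + 1])
    return Z

rng = np.random.default_rng(0)
n, d = 400, 50
y = rng.integers(0, 2, size=n)
# 5 informative features whose means differ between the two classes
X = rng.normal(size=(n, d)) + 1.0 * y[:, None] * (np.arange(d) < 5)

Z = fans_transform(X, y, X)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(Z, y)
print("training accuracy:", clf.score(Z, y))
```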
Fast DD-classification of functional data
A fast nonparametric procedure for classifying functional data is introduced.
It consists of a two-step transformation of the original data plus a classifier
operating on a low-dimensional hypercube. The functional data are first mapped
into a finite-dimensional location-slope space and then transformed by a
multivariate depth function into the DD-plot, which is a subset of the unit
hypercube. This transformation yields a new notion of depth for functional
data. Three alternative depth functions are employed for this, as well as two
rules for the final classification on the DD-plot. The resulting classifier has
to be cross-validated over a small range of parameters only, which is
restricted by a Vapnik-Cervonenkis bound. The entire methodology does not
involve smoothing techniques, is completely nonparametric, and achieves
Bayes optimality under standard distributional settings. It is robust,
efficiently computable, and has been implemented in an R environment.
Applicability of the new approach is demonstrated by simulations as well as a
benchmark study.
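Under strong simplifications, the pipeline can be sketched as follows: each discretized curve is reduced to a location-slope pair, a depth is computed with respect to each class, and the simplest DD-plot rule (assign to the class of maximal depth) is applied. The paper uses richer depth notions and trained separating rules on the DD-plot, so the Mahalanobis depth, the simulated curves, and the max-depth rule here are illustrative assumptions.

```python
# Sketch only: location-slope reduction, Mahalanobis depth per class,
# and the max-depth rule on the resulting DD-plot coordinates.
import numpy as np

def location_slope(curves, grid):
    """Map each discretized curve to (mean level, mean slope)."""
    slopes = np.gradient(curves, grid, axis=1)
    return np.column_stack([curves.mean(axis=1), slopes.mean(axis=1)])

def mahalanobis_depth(points, ref):
    """Depth of points w.r.t. a reference sample: 1 / (1 + squared Mahalanobis distance)."""
    mu, cov = ref.mean(axis=0), np.cov(ref, rowvar=False)
    diff = points - mu
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return 1.0 / (1.0 + d2)

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 50)
f0 = np.sin(2 * np.pi * grid) + rng.normal(0, 0.3, (100, 50))          # class 0 curves
f1 = np.sin(2 * np.pi * grid) + grid + rng.normal(0, 0.3, (100, 50))   # class 1 curves

z0, z1 = location_slope(f0, grid), location_slope(f1, grid)
test = location_slope(np.sin(2 * np.pi * grid) + grid + rng.normal(0, 0.3, (20, 50)), grid)

# DD-plot coordinates: (depth w.r.t. class 0, depth w.r.t. class 1); max-depth rule.
dd = np.column_stack([mahalanobis_depth(test, z0), mahalanobis_depth(test, z1)])
print("predicted classes:", dd.argmax(axis=1))
```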
Learning with Symmetric Label Noise: The Importance of Being Unhinged
Convex potential minimisation is the de facto approach to binary
classification. However, Long and Servedio [2010] proved that under symmetric
label noise (SLN), minimisation of any convex potential over a linear function
class can result in classification performance equivalent to random guessing.
This ostensibly shows that convex losses are not SLN-robust. In this paper, we
propose a convex, classification-calibrated loss and prove that it is
SLN-robust. The loss avoids the Long and Servedio [2010] result by virtue of
being negatively unbounded. The loss is a modification of the hinge loss, where
one does not clamp at zero; hence, we call it the unhinged loss. We show that
the optimal unhinged solution is equivalent to that of a strongly regularised
SVM, and is the limiting solution for any convex potential; this implies that
strong l2 regularisation makes most standard learners SLN-robust. Experiments
confirm the SLN-robustness of the unhinged loss.
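A minimal sketch of the unhinged loss, l(y, v) = 1 - yv, with l2 regularisation for a linear scorer: the regularised minimiser has the closed form w* = mean(y_i x_i) / lambda, so under symmetric label flipping the expected solution only shrinks in scale while its direction is preserved, which is one way to see the SLN-robustness. The simulated data and noise rate below are illustrative.

```python
# Sketch only: closed-form minimiser of mean unhinged loss plus (lam/2)*||w||^2,
# compared on clean and symmetrically flipped labels.
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, flip = 1000, 5, 1.0, 0.3
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=n))       # clean labels in {-1, +1}
y_noisy = y * np.where(rng.random(n) < flip, -1, 1)   # symmetric label noise

def unhinged_fit(X, y, lam):
    """argmin_w  mean(1 - y * (X @ w)) + lam/2 * ||w||^2  =  mean(y_i * x_i) / lam."""
    return (y[:, None] * X).mean(axis=0) / lam

w_clean = unhinged_fit(X, y, lam)
w_noisy = unhinged_fit(X, y_noisy, lam)
acc = lambda w: np.mean(np.sign(X @ w) == y)
print("accuracy (clean labels):", acc(w_clean))
print("accuracy (noisy labels):", acc(w_noisy))
```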