Efficient Classification for Metric Data
Recent advances in large-margin classification of data residing in general
metric spaces (rather than Hilbert spaces) enable classification under various
natural metrics, such as string edit and earthmover distance. A general
framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004]
left open the questions of computational efficiency and of providing direct
bounds on generalization error.
We design a new algorithm for classification in general metric spaces, whose
runtime and accuracy depend on the doubling dimension of the data points, and
can thus achieve superior classification performance in many common scenarios.
The algorithmic core of our approach is an approximate (rather than exact)
solution to the classical problems of Lipschitz extension and of Nearest
Neighbor Search. The algorithm's generalization performance is guaranteed via
the fat-shattering dimension of Lipschitz classifiers, and we present
experimental evidence of its superiority to some common kernel methods. As a
by-product, we offer a new perspective on the nearest neighbor classifier,
which yields significantly sharper risk asymptotics than the classic analysis
of Cover and Hart [IEEE Trans. Info. Theory, 1967].

Comment: This is the full version of an extended abstract that appeared in Proceedings of the 23rd COLT, 2010.
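To make the Lipschitz-extension core concrete, here is a minimal sketch, not the paper's algorithm (which substitutes approximate nearest neighbor search and approximate extension for the exact brute-force evaluation below): the classifier is the sign of the midpoint of the McShane (upper) and Whitney (lower) $L$-Lipschitz extensions of the $\pm 1$ label function. The function names, the toy data, and the choice of $L$ are illustrative assumptions.

```python
import numpy as np

def lipschitz_classifier(points, labels, dist, L):
    """Binary classifier over a general metric space: the midpoint of the
    McShane (upper) and Whitney (lower) L-Lipschitz extensions of the
    +1/-1 label function, thresholded at zero."""
    def classify(x):
        upper = min(y + L * dist(x, p) for p, y in zip(points, labels))
        lower = max(y - L * dist(x, p) for p, y in zip(points, labels))
        return 1 if (upper + lower) / 2.0 >= 0 else -1
    return classify

# Example under the Euclidean metric; any metric (string edit distance,
# earthmover, ...) can be plugged in as `dist`.
pts = [np.array([0.0]), np.array([1.0]), np.array([3.0]), np.array([4.0])]
labels = [-1, -1, +1, +1]
clf = lipschitz_classifier(pts, labels,
                           dist=lambda a, b: float(np.linalg.norm(a - b)), L=2.0)
print(clf(np.array([0.5])), clf(np.array([3.5])))   # -> -1 1
```

Each query costs $O(n)$ metric evaluations here; the paper's contribution is precisely to cut this cost using the doubling dimension of the data.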
Discrimination on the Grassmann Manifold: Fundamental Limits of Subspace Classifiers
We present fundamental limits on the reliable classification of linear and
affine subspaces from noisy, linear features. Drawing an analogy between
discrimination among subspaces and communication over vector wireless channels,
we propose two Shannon-inspired measures to characterize asymptotic classifier
performance. First, we define the classification capacity, which characterizes
necessary and sufficient conditions for the misclassification probability to
vanish as the signal dimension, the number of features, and the number of
subspaces to be discerned all approach infinity. Second, we define the
diversity-discrimination tradeoff which, by analogy with the
diversity-multiplexing tradeoff of fading vector channels, characterizes
relationships between the number of discernible subspaces and the
misclassification probability as the noise power approaches zero. We derive
upper and lower bounds on these measures which are tight in many regimes.
Numerical results, including a face recognition application, validate the
results in practice.

Comment: 19 pages, 4 figures. Revised submission to IEEE Transactions on Information Theory.
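The abstract studies fundamental limits rather than a concrete decoder, but the setting can be illustrated with a simple nearest-subspace rule (an assumed baseline, not taken from the paper): given noisy linear features $y = Ax + n$ of a signal $x$ lying on one of $K$ subspaces, classify by the subspace whose image under $A$ yields the smallest least-squares residual. All dimensions, the Gaussian feature matrix, and the noise level below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, m, K = 20, 3, 10, 5          # ambient dim, subspace dim, #features, #classes
noise_std = 0.05

# K random r-dimensional subspaces (orthonormal bases) and a feature matrix A.
subspaces = [np.linalg.qr(rng.standard_normal((d, r)))[0] for _ in range(K)]
A = rng.standard_normal((m, d)) / np.sqrt(m)

def nearest_subspace(y_obs):
    """Pick the class whose projected subspace A @ U_k best explains the
    noisy feature vector (smallest least-squares residual)."""
    residuals = []
    for U in subspaces:
        B = A @ U                       # image of the subspace in feature space
        coef, *_ = np.linalg.lstsq(B, y_obs, rcond=None)
        residuals.append(np.linalg.norm(y_obs - B @ coef))
    return int(np.argmin(residuals))

# Draw a signal from class k_true and classify its noisy linear features.
k_true = 2
x = subspaces[k_true] @ rng.standard_normal(r)
y_obs = A @ x + noise_std * rng.standard_normal(m)
print(k_true, nearest_subspace(y_obs))
```

The paper's measures then ask how the error of such rules behaves as $d$, $m$, and $K$ grow (classification capacity) or as the noise power vanishes (diversity-discrimination tradeoff).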
Efficient Learning of Linear Separators under Bounded Noise
We study the learnability of linear separators in $\mathbb{R}^d$ in the presence of
bounded (a.k.a. Massart) noise. This is a realistic generalization of the random
classification noise model, where the adversary can flip each example $x$ with
probability $\eta(x) \le \eta$. We provide the first polynomial-time algorithm
that can learn linear separators to arbitrarily small excess error in this
noise model under the uniform distribution over the unit ball in $\mathbb{R}^d$, for
some constant value of $\eta$. While this noise model has been widely studied in the
statistical learning theory community in the context of obtaining faster convergence
rates, computationally efficient algorithms for it had remained elusive. Our
work provides the first evidence that one can indeed design algorithms
achieving arbitrarily small excess error in polynomial time under this
realistic noise model and thus opens up a new and exciting line of research.
We additionally provide lower bounds showing that popular algorithms such as
hinge loss minimization and averaging cannot lead to arbitrarily small excess
error under Massart noise, even under the uniform distribution. Our work
instead makes use of a margin-based technique developed in the context of
active learning. As a result, our algorithm is also an active learning
algorithm with label complexity that is only logarithmic in $1/\epsilon$, where
$\epsilon$ is the desired excess error.
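As a rough sketch of the margin-based localization idea, assuming NumPy, here is a toy learner that alternates hinge-loss fitting with a geometrically shrinking band around the current halfspace; the initializer, the schedules, and the simple Massart-style noise generator are ad hoc stand-ins, not the constants or guarantees from the paper.

```python
import numpy as np

def margin_based_learner(X, y, rounds=6, steps=300, lr=0.1):
    """Localize to a shrinking band around the current halfspace and refit
    by hinge-loss subgradient descent on the in-band examples only."""
    w = (X * y[:, None]).mean(axis=0)          # crude "averaging" initializer
    w /= np.linalg.norm(w)
    band = 1.0
    for _ in range(rounds):
        mask = np.abs(X @ w) <= band           # examples inside the margin band
        Xb, yb = X[mask], y[mask]
        if len(yb) == 0:
            break
        v = w.copy()
        for _ in range(steps):
            active = yb * (Xb @ v) < 1.0       # points violating the margin
            grad = -(Xb[active] * yb[active, None]).sum(axis=0) / len(yb)
            v = v - lr * grad
            v /= max(np.linalg.norm(v), 1e-12)
        w, band = v, band / 2.0                # shrink the band geometrically
    return w

# Demo: uniform data on the unit sphere, each label flipped with probability
# eta(x) <= eta (a simple instance of Massart noise).
rng = np.random.default_rng(1)
d, n, eta = 5, 4000, 0.2
w_star = np.eye(d)[0]
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sign(X @ w_star)
y[rng.random(n) < eta * rng.random(n)] *= -1.0
w_hat = margin_based_learner(X, y)
print(float(w_hat @ w_star))                   # near 1.0 on success
```

Restricting each refit to the band is what distinguishes this approach from plain hinge-loss minimization, which the lower bounds above show cannot reach arbitrarily small excess error under Massart noise.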