An Impossibility Result for High Dimensional Supervised Learning
We study high-dimensional asymptotic performance limits of binary supervised
classification problems where the class conditional densities are Gaussian with
unknown means and covariances and the number of signal dimensions scales faster
than the number of labeled training samples. We show that the Bayes error,
namely the minimum attainable error probability with complete distributional
knowledge and equally likely classes, can be arbitrarily close to zero and yet
the limiting minimax error probability of every supervised learning algorithm
is no better than a random coin toss. In contrast to related studies where the
classification difficulty (Bayes error) is made to vanish, we hold it constant
when taking high-dimensional limits. In contrast to VC-dimension based minimax
lower bounds that consider the worst case error probability over all
distributions that have a fixed Bayes error, our worst case is over the family
of Gaussian distributions with constant Bayes error. We also show that a
nontrivial asymptotic minimax error probability can only be attained for
parametric subsets of zero measure (in a suitable measure space). These results
expose the fundamental importance of prior knowledge and suggest that unless we
impose strong structural constraints, such as sparsity, on the parametric
space, supervised learning may be ineffective in high dimensional small sample
settings.
Comment: This paper was submitted to the IEEE Information Theory Workshop (ITW) 2013 on April 23, 2013.
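As a rough illustration of the regime described above (not part of the paper), the sketch below holds the Bayes error fixed while the dimension d grows with the training-sample size n held constant, and a naive plug-in classifier drifts toward chance-level error. The spread-out mean vector, the nearest-mean classifier, and all sizes are illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def bayes_error(delta):
    # For equally likely classes N(-mu, I) and N(+mu, I) with ||2*mu|| = delta,
    # the Bayes error is Phi(-delta/2) = 0.5 * erfc(delta / (2*sqrt(2))).
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))

def plug_in_test_error(d, n, delta=3.0, n_test=1000):
    # Spread the mean separation evenly over all d coordinates so the
    # Bayes error stays fixed as d grows.
    mu = np.full(d, delta / (2.0 * math.sqrt(d)))
    X0 = rng.normal(-mu, 1.0, size=(n, d))
    X1 = rng.normal(+mu, 1.0, size=(n, d))
    # Plug-in nearest-mean discriminant estimated from 2n training samples.
    w = X1.mean(axis=0) - X0.mean(axis=0)
    b = -0.5 * w @ (X0.mean(axis=0) + X1.mean(axis=0))
    T0 = rng.normal(-mu, 1.0, size=(n_test, d))
    T1 = rng.normal(+mu, 1.0, size=(n_test, d))
    return 0.5 * ((T0 @ w + b > 0).mean() + (T1 @ w + b <= 0).mean())

print("Bayes error:", bayes_error(3.0))     # roughly 0.067, independent of d
for d in (50, 500, 5000):                   # d grows while n stays fixed
    print(d, plug_in_test_error(d, n=50))   # drifts toward 0.5 (chance level)
```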
Discrimination on the Grassmann Manifold: Fundamental Limits of Subspace Classifiers
We present fundamental limits on the reliable classification of linear and
affine subspaces from noisy, linear features. Drawing an analogy between
discrimination among subspaces and communication over vector wireless channels,
we propose two Shannon-inspired measures to characterize asymptotic classifier
performance. First, we define the classification capacity, which characterizes
necessary and sufficient conditions for the misclassification probability to
vanish as the signal dimension, the number of features, and the number of
subspaces to be discerned all approach infinity. Second, we define the
diversity-discrimination tradeoff which, by analogy with the
diversity-multiplexing tradeoff of fading vector channels, characterizes
relationships between the number of discernible subspaces and the
misclassification probability as the noise power approaches zero. We derive
upper and lower bounds on these measures which are tight in many regimes.
Numerical results, including a face recognition application, validate the
results in practice.
Comment: 19 pages, 4 figures. Revised submission to IEEE Transactions on Information Theory.
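The following is a minimal sketch of the kind of problem the abstract studies: discriminating among randomly drawn subspaces from noisy linear features, here with a nearest-subspace (minimum-residual) rule. The dimensions, number of classes, noise level, and decision rule are illustrative assumptions, not the paper's construction; the paper characterizes the asymptotic limits of such discrimination rather than any particular estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_subspaces(ambient_dim, sub_dim, num):
    # Draw `num` random sub_dim-dimensional subspaces as orthonormal bases.
    bases = []
    for _ in range(num):
        Q, _ = np.linalg.qr(rng.standard_normal((ambient_dim, sub_dim)))
        bases.append(Q)
    return bases

def nearest_subspace(y, A, bases):
    # Pick the subspace whose image under the feature map A best explains y
    # (smallest least-squares residual).
    residuals = []
    for U in bases:
        M = A @ U
        coef, *_ = np.linalg.lstsq(M, y, rcond=None)
        residuals.append(np.linalg.norm(y - M @ coef))
    return int(np.argmin(residuals))

n, k, K, m, sigma = 50, 3, 8, 20, 0.05          # ambient dim, subspace dim, #classes, #features, noise
bases = random_subspaces(n, k, K)
A = rng.standard_normal((m, n)) / np.sqrt(m)    # linear feature map
true_class = rng.integers(K)
x = bases[true_class] @ rng.standard_normal(k)  # signal lying in the true subspace
y = A @ x + sigma * rng.standard_normal(m)      # noisy linear features
print(true_class, nearest_subspace(y, A, bases))
```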
Similarity Learning for Provably Accurate Sparse Linear Classification
In recent years, the crucial importance of metrics in machine learning
algorithms has led to a growing interest in optimizing distance and
similarity functions. Most state-of-the-art methods focus on learning
Mahalanobis distances (which must satisfy a positive semi-definiteness
constraint) for use in a local k-NN algorithm. However, no theoretical
link is established between the learned metrics and their performance in
classification. In this paper, we make use of the formal framework of good
similarities introduced by Balcan et al. to design an algorithm for learning a
non-PSD linear similarity optimized in a nonlinear feature space, which is then
used to build a global linear classifier. We show that our approach has uniform
stability and derive a generalization bound on the classification error.
Experiments performed on various datasets confirm the effectiveness of our
approach compared to state-of-the-art methods and provide evidence that (i) it
is fast, (ii) it is robust to overfitting, and (iii) it produces very sparse classifiers.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
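A minimal sketch of the two-stage structure described above, with assumptions: a Gaussian RBF similarity over randomly chosen landmarks and scikit-learn's L1-regularized linear SVM stand in for the learned (possibly non-PSD) similarity and the paper's own optimization. The point is only the pipeline: similarity-based features, then a sparse global linear classifier.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def similarity_features(X, landmarks, gamma=1.0):
    # Map each point to its vector of similarities to a fixed set of landmarks.
    # An RBF similarity is used purely for illustration; the framework only
    # needs a "good" (not necessarily PSD) similarity function.
    sq_dists = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# Toy data with a nonlinear concept (label = sign of the product of coordinates).
X = rng.standard_normal((200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
landmarks = X[rng.choice(len(X), size=30, replace=False)]

Phi = similarity_features(X, landmarks)
# L1-regularized global linear classifier over the similarity coordinates,
# so the learned classifier is sparse in the landmarks it actually uses.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=1.0, max_iter=5000)
clf.fit(Phi, y)
print("train accuracy:", clf.score(Phi, y))
print("nonzero weights:", np.count_nonzero(clf.coef_))
```

The sparsity reported in the abstract corresponds here to the number of nonzero weights the L1 penalty leaves over the similarity coordinates.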
Random projections as regularizers: learning a linear discriminant from fewer observations than dimensions
We prove theoretical guarantees for an averaging ensemble of randomly projected Fisher linear discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. The specific form and simplicity of this ensemble permit a direct and much more detailed analysis than the generic tools used in previous works. In particular, we are able to derive the exact form of the generalization error of our ensemble, conditional on the training set, and from this we give theoretical guarantees that directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge, these are the first theoretical results to prove such an explicit link for any classifier and classifier-ensemble pair. Furthermore, we show that the randomly projected ensemble is equivalent to applying a sophisticated regularization scheme to the linear discriminant learned in the original data space, and that this prevents overfitting in small-sample conditions where the pseudo-inverse FLD learned in the data space is provably poor. Our ensemble is learned from a set of randomly projected representations of the original high-dimensional data, so data can be collected, stored, and processed in this compressed form. We confirm our theoretical findings with experiments, and demonstrate the utility of our approach on several datasets from the bioinformatics domain and one very high-dimensional dataset from the drug discovery domain, both settings in which fewer observations than dimensions are the norm.
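A minimal sketch, under illustrative assumptions about the data, projection dimension, and ensemble size, of the kind of ensemble the abstract analyzes: each member is an FLD learned on a Gaussian random projection of the data, and the members' decision scores are averaged.

```python
import numpy as np

rng = np.random.default_rng(0)

def fld(X0, X1):
    # Fisher linear discriminant with a pooled within-class covariance;
    # pinv is used so the sketch also runs when the covariance is singular.
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = np.cov(np.vstack([X0 - m0, X1 - m1]).T)
    w = np.linalg.pinv(S) @ (m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b

def rp_fld_ensemble(X0, X1, k, n_members=50):
    # Each member: learn an FLD on a k-dimensional Gaussian random projection.
    d = X0.shape[1]
    members = []
    for _ in range(n_members):
        R = rng.standard_normal((k, d)) / np.sqrt(k)
        w, b = fld(X0 @ R.T, X1 @ R.T)
        members.append((R, w, b))
    def predict(X):
        # Average the members' decision scores, then threshold.
        score = sum((X @ R.T) @ w + b for R, w, b in members) / len(members)
        return (score > 0).astype(int)
    return predict

# Fewer observations (30 per class) than dimensions (200).
d, n, k = 200, 30, 10
mu = np.zeros(d); mu[:10] = 0.5
X0 = rng.normal(-mu, 1.0, size=(n, d))
X1 = rng.normal(+mu, 1.0, size=(n, d))
predict = rp_fld_ensemble(X0, X1, k)
T0 = rng.normal(-mu, 1.0, size=(1000, d))
T1 = rng.normal(+mu, 1.0, size=(1000, d))
print("test error:", 0.5 * (predict(T0).mean() + (1 - predict(T1)).mean()))
```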
Sharp generalization error bounds for randomly-projected classifiers
We derive sharp bounds on the generalization error of a generic linear classifier trained by empirical risk minimization on randomly projected data. We make no restrictive assumptions (such as sparsity or separability) on the data: instead, we use the fact that, in a classification setting, the question of interest is really "what is the effect of random projection on the predicted class labels?" We therefore derive the exact probability of "label flipping" under Gaussian random projection in order to quantify this effect precisely in our bounds.
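A Monte Carlo sketch of the quantity the bounds are built on: the probability that a fixed linear decision sign(w^T x) flips under a Gaussian random projection R, i.e. that sign((Rw)^T (Rx)) differs from sign(w^T x). The dimensions and projection sizes below are illustrative assumptions; the paper derives this probability exactly rather than estimating it.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_probability(w, x, k, trials=2000):
    # Monte Carlo estimate of P[ sign((Rw)^T (Rx)) != sign(w^T x) ]
    # for a Gaussian random projection R of shape (k, d).
    d = len(w)
    sign_before = np.sign(w @ x)
    flips = 0
    for _ in range(trials):
        R = rng.standard_normal((k, d)) / np.sqrt(k)
        flips += np.sign((R @ w) @ (R @ x)) != sign_before
    return flips / trials

d = 100
w = rng.standard_normal(d)   # a fixed linear classifier direction
x = rng.standard_normal(d)   # a fixed test point
for k in (1, 5, 25, 100):
    print("k =", k, "estimated flip probability:", flip_probability(w, x, k))
```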
Incremental Training of a Detector Using Online Sparse Eigen-decomposition
The ability to efficiently and accurately detect objects plays a crucial
role in many computer vision tasks. Recently, offline object detectors have
shown tremendous success. However, one major drawback of offline techniques
is that a complete set of training data has to be collected beforehand. In
addition, once learned, an offline detector cannot make use of newly arriving
data. To alleviate these drawbacks, online learning has been adopted with the
following objectives: (1) the technique should be computationally and storage
efficient; (2) the updated classifier must maintain its high classification
accuracy. In this paper, we propose an effective and efficient framework for
learning an adaptive online greedy sparse linear discriminant analysis (GSLDA)
model. Unlike many existing online boosting detectors, which usually apply an
exponential or logistic loss, our online algorithm makes use of LDA's learning
criterion, which not only maximizes class separation but also accounts for
the asymmetry of the training data distributions. We
provide a better alternative for online boosting algorithms in the context of
training a visual object detector. We demonstrate the robustness and efficiency
of our methods on handwritten digit and face data sets. Our results confirm
that object detection tasks benefit significantly when trained in an online
manner.
Comment: 14 pages.
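A minimal sketch of the incremental bookkeeping that makes online LDA-style updates cheap: per-class counts, sums, and sums of outer products are folded in batch by batch, and the discriminant is recomputed on demand. This is an assumption-laden simplification; it does not implement the paper's greedy sparse eigen-decomposition or its asymmetric criterion.

```python
import numpy as np

class OnlineLDA:
    """Incremental binary LDA via per-class sufficient statistics
    (counts, sums, sums of outer products); the discriminant is
    recomputed on demand from these statistics alone."""

    def __init__(self, dim):
        self.counts = np.zeros(2)
        self.sums = np.zeros((2, dim))
        self.outer_sums = np.zeros((2, dim, dim))

    def partial_fit(self, X, y):
        # Fold a new batch into the running statistics; old data is never revisited.
        for c in (0, 1):
            Xc = X[y == c]
            self.counts[c] += len(Xc)
            self.sums[c] += Xc.sum(axis=0)
            self.outer_sums[c] += Xc.T @ Xc

    def discriminant(self):
        means = self.sums / self.counts[:, None]
        # Within-class scatter: sum_c ( sum_i x x^T  -  n_c * m_c m_c^T ).
        Sw = sum(self.outer_sums[c] - self.counts[c] * np.outer(means[c], means[c])
                 for c in (0, 1))
        w = np.linalg.pinv(Sw) @ (means[1] - means[0])
        b = -0.5 * w @ (means[0] + means[1])
        return w, b

    def predict(self, X):
        w, b = self.discriminant()
        return (X @ w + b > 0).astype(int)
```

New batches are absorbed with partial_fit without revisiting previously seen data, which reflects the storage-efficiency objective listed in the abstract.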