Pattern Recognition for Conditionally Independent Data
In this work we consider the task of relaxing the i.i.d. assumption in pattern
recognition (or classification), aiming to make existing learning algorithms
applicable to a wider range of tasks. Pattern recognition is guessing a
discrete label of some object based on a set of given examples (pairs of
objects and labels). We consider the case of deterministically defined labels.
Traditionally, this task is studied under the assumption that examples are
independent and identically distributed. However, it turns out that many
results of pattern recognition theory carry over to a weaker assumption:
the objects are conditionally independent and identically distributed given
the labels, while the only assumption on the distribution of labels is that
the rate of occurrence of each label is bounded below by some positive
threshold. We find a broad class of learning algorithms for which estimates
of the probability of a classification error obtained under the classical
i.i.d. assumption can be generalised to similar estimates for the case of
conditionally i.i.d. examples.
Comment: parts of results published at ALT'04 and ICML'0
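To make the relaxed assumption concrete, here is a minimal sketch of a conditionally i.i.d. data source: the labels follow a dependent (Markov) process whose long-run label rates stay above a positive threshold, while objects are drawn i.i.d. given their labels. The two-state Gaussian model and all names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_labels(n, p_stay=0.8):
    """Two-state Markov chain: labels are dependent (not i.i.d.), but the
    long-run rate of each label stays above a positive threshold."""
    y = np.empty(n, dtype=int)
    y[0] = rng.integers(2)
    for t in range(1, n):
        y[t] = y[t - 1] if rng.random() < p_stay else 1 - y[t - 1]
    return y

def sample_objects(y, means=(-1.0, 1.0), sigma=1.0):
    """Given the label sequence, each object is drawn independently from
    its class-conditional distribution: conditionally i.i.d."""
    return rng.normal(loc=np.take(means, y), scale=sigma)

y = sample_labels(10_000)
x = sample_objects(y)
# The (x, y) pairs are not i.i.d. (the labels are Markov), yet the
# conditional i.i.d. assumption of the paper holds.
```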
Exact Distribution-Free Hypothesis Tests for the Regression Function of Binary Classification via Conditional Kernel Mean Embeddings
In this paper we suggest two statistical hypothesis tests for the regression
function of binary classification based on conditional kernel mean embeddings.
The regression function is a fundamental object in classification as it
determines both the Bayes optimal classifier and the misclassification
probabilities. A resampling based framework is presented and combined with
consistent point estimators of the conditional kernel mean map, in order to
construct distribution-free hypothesis tests. These tests are introduced in a
flexible manner allowing us to control the exact probability of type I error
for any sample size. We also prove that both proposed techniques are consistent
under weak statistical assumptions, i.e., the type II error probabilities
converge pointwise to zero.
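For context, the regression function referred to above and the Bayes classifier it determines admit the standard textbook definitions (written here in generic notation for labels in {0, 1}; the paper's own notation may differ):

```latex
r(x) \,=\, \mathbb{P}(Y = 1 \mid X = x),
\qquad
g^{*}(x) \,=\, \mathbb{I}\{\, r(x) \geq \tfrac{1}{2} \,\}.
```

The misclassification probability of any classifier is likewise determined by r, since the Bayes error equals \(\mathbb{E}[\min(r(X),\, 1 - r(X))]\), which is why tests about r are informative for the whole classification problem.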
Optimal Clustering under Uncertainty
Classical clustering algorithms typically either lack an underlying
probability framework to make them predictive or focus on parameter estimation
rather than defining and minimizing a notion of error. Recent work addresses
these issues by developing a probabilistic framework based on the theory of
random labeled point processes and characterizing a Bayes clusterer that
minimizes the number of misclustered points. The Bayes clusterer is analogous
to the Bayes classifier. Whereas determining a Bayes classifier requires full
knowledge of the feature-label distribution, deriving a Bayes clusterer
requires full knowledge of the point process. When uncertain of the point
process, one would like to find a robust clusterer that is optimal over the
uncertainty, just as one may find optimal robust classifiers with uncertain
feature-label distributions. Herein, we derive an optimal robust clusterer by
first finding an effective random point process that incorporates all
randomness within its own probabilistic structure and from which a Bayes
clusterer can be derived that provides an optimal robust clusterer relative to
the uncertainty. This is analogous to the use of effective class-conditional
distributions in robust classification. After evaluating the performance of
robust clusterers in synthetic mixtures of Gaussians models, we apply the
framework to granular imaging, where we make use of the asymptotic
granulometric moment theory for granular images to relate robust clustering
theory to the application.
Comment: 19 pages, 5 eps figures, 1 table
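As a toy illustration of the "effective" construction (an assumed two-component Gaussian model with a prior over unknown means; none of this code comes from the paper), one can average each class-conditional density over the uncertainty and then cluster points against the resulting effective densities:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Uncertainty model (assumed for illustration): the two cluster means are
# unknown, so we place a prior over them and draw Monte Carlo samples.
prior_means = rng.normal(loc=[-2.0, 2.0], scale=0.5, size=(500, 2))

def effective_density(x, component):
    """Effective class-conditional density: the component density averaged
    over the prior on its mean, echoing the effective random point process
    described above."""
    mus = prior_means[:, component]
    return norm.pdf(x[:, None], loc=mus[None, :], scale=1.0).mean(axis=1)

x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
labels = (effective_density(x, 1) > effective_density(x, 0)).astype(int)
# Each point is assigned to the component with the larger effective
# density: a point-wise surrogate for robust clustering under uncertainty.
```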
Discrimination on the Grassmann Manifold: Fundamental Limits of Subspace Classifiers
We present fundamental limits on the reliable classification of linear and
affine subspaces from noisy, linear features. Drawing an analogy between
discrimination among subspaces and communication over vector wireless channels,
we propose two Shannon-inspired measures to characterize asymptotic classifier
performance. First, we define the classification capacity, which characterizes
necessary and sufficient conditions for the misclassification probability to
vanish as the signal dimension, the number of features, and the number of
subspaces to be discerned all approach infinity. Second, we define the
diversity-discrimination tradeoff which, by analogy with the
diversity-multiplexing tradeoff of fading vector channels, characterizes
relationships between the number of discernible subspaces and the
misclassification probability as the noise power approaches zero. We derive
upper and lower bounds on these measures which are tight in many regimes.
Numerical results, including a face recognition application, validate the
results in practice.
Comment: 19 pages, 4 figures. Revised submission to IEEE Transactions on Information Theory
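One way to write the underlying observation model and the flavor of the capacity definition (a schematic in our own notation, not the paper's exact formulation) is:

```latex
y = \Phi x + n, \qquad x \in \mathcal{S}_{\ell}
\ \ \text{for some unknown } \ell \in \{1, \dots, L\},
```

where the \(\mathcal{S}_{\ell}\) are the candidate subspaces, \(\Phi\) is the linear feature map, and \(n\) is additive noise. The classification capacity then measures the largest exponential growth rate of \(L\) in the number of features for which the misclassification probability \(\Pr[\hat{\ell} \neq \ell]\) can still be driven to zero, in analogy with the capacity of a vector channel.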