Random projections as regularizers: learning a linear discriminant from fewer observations than dimensions
We prove theoretical guarantees for an averaging-ensemble of randomly projected Fisher linear discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. The specific form and simplicity of this ensemble permit a direct and much more detailed analysis than the generic tools used in previous works. In particular, we are able to derive the exact form of the generalization error of our ensemble, conditional on the training set, and based on this we give theoretical guarantees which directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge, these are the first theoretical results to prove such an explicit link for any classifier and classifier ensemble pair. Furthermore, we show that the randomly projected ensemble is equivalent to applying a sophisticated regularization scheme to the linear discriminant learned in the original data space, and that this prevents overfitting in conditions of small sample size where pseudo-inverse FLD learned in the data space is provably poor. Our ensemble is learned from a set of randomly projected representations of the original high dimensional data, and therefore for this approach data can be collected, stored and processed in such a compressed form. We confirm our theoretical findings with experiments, and demonstrate the utility of our approach on several datasets from the bioinformatics domain and one very high dimensional dataset from the drug discovery domain, both settings in which fewer observations than dimensions are the norm.
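As a rough illustration of the idea in this abstract, the following is a minimal sketch of an averaging ensemble of randomly projected FLD classifiers, assuming two classes labeled 0 and 1, Gaussian random projection matrices, and a pooled within-class covariance estimate. The function names, the synthetic data, and the values of `k` and `n_members` are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def fit_rp_fld_ensemble(X, y, k, n_members, rng):
    """Average n_members FLD directions, each learned in a random k-dim projection.

    Assumes binary labels in {0, 1} and k <= n - 2, so that each projected
    covariance estimate is invertible (illustrative assumption).
    """
    n, d = X.shape
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Center every observation on its own class mean (pooled within-class scatter).
    Xc = np.where((y == 1)[:, None], X - mu1, X - mu0)
    w = np.zeros(d)
    for _ in range(n_members):
        R = rng.standard_normal((k, d))      # Gaussian random projection
        Z = Xc @ R.T                         # projected, centered data (n x k)
        S_k = Z.T @ Z / (n - 2)              # k x k projected covariance estimate
        # FLD direction in the projected space, mapped back to the data space.
        w += R.T @ np.linalg.solve(S_k, R @ (mu1 - mu0))
    w /= n_members
    b = -w @ (mu0 + mu1) / 2                 # threshold at the midpoint of the class means
    return w, b

def predict(w, b, X):
    """Assign class 1 where the averaged discriminant is positive."""
    return (X @ w + b > 0).astype(int)

# Synthetic example with fewer observations (30) than dimensions (200).
rng = np.random.default_rng(0)
n, d = 30, 200
y_train = np.repeat([0, 1], n // 2)
X_train = rng.standard_normal((n, d)) + y_train[:, None] * 1.0
w, b = fit_rp_fld_ensemble(X_train, y_train, k=10, n_members=50, rng=rng)

y_test = np.repeat([0, 1], 100)
X_test = rng.standard_normal((200, d)) + y_test[:, None] * 1.0
accuracy = (predict(w, b, X_test) == y_test).mean()
print(f"test accuracy: {accuracy:.2f}")
```

Each member only needs the projected data `Z`, which matches the abstract's remark that data could be kept in compressed form. The full `d`-dimensional covariance matrix is never formed or inverted.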
Asymptotic Generalization Bound of Fisher's Linear Discriminant Analysis
Fisher's linear discriminant analysis (FLDA) is an important dimension
reduction method in statistical pattern recognition. It has been shown that
FLDA is asymptotically Bayes optimal under the homoscedastic Gaussian
assumption. However, this classical result has the following two major
limitations: 1) it holds only for a fixed dimensionality, and thus does not
apply when the dimensionality and the training sample size are proportionally
large; 2) it does not provide a quantitative description of how the
generalization ability of FLDA is affected by the dimensionality and the
training sample size. In this paper, we present an asymptotic generalization
analysis of FLDA based on random matrix theory, in a setting where both the
dimensionality and the training sample size increase proportionally. The
obtained lower bound of the generalization discrimination power overcomes both
limitations of the classical result, i.e., it is applicable when the
dimensionality and the training sample size are proportionally large, and it
provides a quantitative description of the generalization ability of FLDA in
terms of the ratio of dimensionality to sample size and the population
discrimination power. Besides, the discrimination power bound also leads to an
upper bound on the generalization error of binary classification with FLDA.
Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification
Objective. The main goal of this work is to develop a model for multi-sensor
signals such as MEG or EEG signals, that accounts for the inter-trial
variability, suitable for corresponding binary classification problems. An
important constraint is that the model be simple enough to handle small size
and unbalanced datasets, as often encountered in BCI type experiments.
Approach. The method involves a linear mixed-effects statistical model, a wavelet
transform and spatial filtering, and aims at the characterization of localized
discriminant features in multi-sensor signals. After discrete wavelet transform
and spatial filtering, a projection onto the relevant wavelet and spatial
channels subspaces is used for dimension reduction. The projected signals are
then decomposed as the sum of a signal of interest (i.e. discriminant) and
background noise, using a very simple Gaussian linear mixed model. Main
results. Thanks to the simplicity of the model, the corresponding parameter
estimation problem is simplified. Robust estimates of class-covariance matrices
are obtained from small sample sizes and an effective Bayes plug-in classifier
is derived. The approach is applied to the detection of error potentials in
multichannel EEG data, in a very unbalanced situation (detection of rare
events). Classification results prove the relevance of the proposed approach in
such a context. Significance. The combination of a linear mixed model, wavelet
transform and spatial filtering for EEG classification is, to the best of our
knowledge, an original approach, which is proven to be effective. This paper
improves on earlier results on similar problems, and the three main ingredients
all play an important role.
Learning in high dimensions with projected linear discriminants
The enormous power of modern computers has made possible the statistical modelling of data with dimensionality that would have made this task inconceivable only decades ago. However, experience in such modelling has made researchers aware of many issues associated with working in high-dimensional domains, collectively known as 'the curse of dimensionality', which can confound practitioners' desires to build good models of the world from these data. When the dimensionality is very large, low-dimensional methods and geometric intuition both break down. To mitigate the dimensionality curse we can use low-dimensional representations of the original data that capture most of the information it contains. However, little is currently known about the effect of such dimensionality reduction on classifier performance. In this thesis we develop theory quantifying the effect of random projection (a recent, very promising, non-adaptive dimensionality reduction technique) on the classification performance of Fisher's Linear Discriminant (FLD), a successful and widely-used linear classifier. We tackle the issues associated with small sample size and high dimensionality by using randomly projected FLD ensembles, and we develop theory explaining why our new approach performs well. Finally, we quantify the generalization error of Kernel FLD, a related non-linear projected classifier.
Automatic Face Recognition System Based on Local Fourier-Bessel Features
We present an automatic face verification system inspired by known properties
of biological systems. In the proposed algorithm the whole image is converted
from the spatial to polar frequency domain by a Fourier-Bessel Transform (FBT).
Using the whole image is compared with using only face image regions (local
analysis). The resulting representations are embedded in a dissimilarity
space, where each image is represented by its distance to all the other
images, and a Pseudo-Fisher discriminator is built. Verification test results
on the FERET database showed that the local-based algorithm outperforms the
global-FBT version. The local-FBT algorithm performed on par with
state-of-the-art methods under different testing conditions, indicating that
the proposed system is highly robust to expression, age, and illumination
variations. We also evaluated the performance of the proposed system under
strong occlusion conditions and found that it is highly robust for up to 50%
of face occlusion. Finally, we completely automated the verification system by
implementing face and eye detection algorithms. Under this condition, the
local approach was only slightly superior to the global approach.
Comment: 2005, Brazilian Symposium on Computer Graphics and Image Processing,
18 (SIBGRAPI)
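The dissimilarity-space step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: random vectors stand in for the Fourier-Bessel features, Euclidean distance is assumed as the dissimilarity measure, and the pseudo-Fisher discriminant is taken in one common form (pseudo-inverse of the pooled within-class covariance). All function names and data here are hypothetical.

```python
import numpy as np

def dissimilarity_matrix(A, B):
    """Euclidean distance from every row of A to every row of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives from round-off

def pseudo_fisher_fit(D, y):
    """Fisher discriminant using a pseudo-inverse of the pooled covariance.

    The covariance of a dissimilarity representation built on the training
    images themselves is singular, hence the pseudo-inverse.
    """
    m0, m1 = D[y == 0].mean(axis=0), D[y == 1].mean(axis=0)
    Dc = np.where((y == 1)[:, None], D - m1, D - m0)
    S = Dc.T @ Dc / (len(y) - 2)
    w = np.linalg.pinv(S, rcond=1e-10) @ (m1 - m0)
    b = -w @ (m0 + m1) / 2
    return w, b

# Stand-in feature vectors (assumption: 40 training images, 64 features each).
rng = np.random.default_rng(0)
y_train = np.repeat([0, 1], 20)
F_train = rng.standard_normal((40, 64)) + y_train[:, None] * 1.0

# Each training image is represented by its distances to all training images.
D_train = dissimilarity_matrix(F_train, F_train)
w, b = pseudo_fisher_fit(D_train, y_train)

# New images are embedded by their distances to the same training images.
F_test = rng.standard_normal((10, 64))
D_test = dissimilarity_matrix(F_test, F_train)
scores = D_test @ w + b  # positive scores favor class 1
```

The explicit `rcond` cutoff is a design choice of this sketch: it keeps near-zero singular values of the rank-deficient covariance from dominating the solution.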
Implicitly Constrained Semi-Supervised Least Squares Classification
We introduce a novel semi-supervised version of the least squares classifier.
This implicitly constrained least squares (ICLS) classifier minimizes the
squared loss on the labeled data among the set of parameters implied by all
possible labelings of the unlabeled data. Unlike other discriminative
semi-supervised methods, our approach does not introduce explicit additional
assumptions into the objective function, but leverages implicit assumptions
already present in the choice of the supervised least squares classifier. We
show this approach can be formulated as a quadratic programming problem and its
solution can be found using a simple gradient descent procedure. We prove that,
in a certain way, our method never leads to performance worse than the
supervised classifier. Experimental results corroborate this theoretical result
in the multidimensional case on benchmark datasets, also in terms of the error
rate.
Comment: 12 pages, 2 figures, 1 table. The Fourteenth International Symposium
on Intelligent Data Analysis (2015), Saint-Etienne, France
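One way to write down the idea in this abstract is sketched below. The notation is assumed for illustration and is not quoted from the paper: X is the labeled design matrix with label vector y, X_u is the unlabeled design matrix with N_u rows, X_e stacks X on top of X_u, and q is a vector of hypothetical (soft) labels for the unlabeled objects.

```latex
% Notation assumed for illustration; see the lead-in for the meaning of each symbol.
\begin{align}
\hat{\beta}(q) &= \left(X_e^\top X_e\right)^{-1} X_e^\top
                  \begin{bmatrix} y \\ q \end{bmatrix}, \\
\hat{q} &= \operatorname*{arg\,min}_{q \in [0,1]^{N_u}}
           \left\lVert X \hat{\beta}(q) - y \right\rVert_2^2, \\
\hat{\beta}_{\mathrm{ICLS}} &= \hat{\beta}(\hat{q}).
\end{align}
```

Because the least squares solution is linear in q, the objective is quadratic in q with simple box constraints, which is consistent with the abstract's statement that the problem can be posed as a quadratic program and solved with a gradient-based procedure.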