    Random projections as regularizers: learning a linear discriminant from fewer observations than dimensions

    We prove theoretical guarantees for an averaging ensemble of randomly projected Fisher linear discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. The specific form and simplicity of this ensemble permit a direct and much more detailed analysis than the generic tools used in previous works. In particular, we derive the exact form of the generalization error of our ensemble, conditional on the training set, and from this we give theoretical guarantees that directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge, these are the first theoretical results to prove such an explicit link for any classifier and classifier-ensemble pair. Furthermore, we show that the randomly projected ensemble is equivalent to applying a sophisticated regularization scheme to the linear discriminant learned in the original data space, and that this prevents overfitting in small-sample conditions where pseudo-inverse FLD learned in the data space is provably poor. Our ensemble is learned from a set of randomly projected representations of the original high-dimensional data, so data can be collected, stored, and processed in this compressed form. We confirm our theoretical findings with experiments, and demonstrate the utility of our approach on several datasets from the bioinformatics domain and one very high-dimensional dataset from the drug discovery domain, both settings in which fewer observations than dimensions are the norm.
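    The ensemble this abstract describes is simple enough to sketch directly. Below is a minimal, illustrative implementation of a randomly projected FLD averaging ensemble, assuming two classes, Gaussian projection matrices, and averaging of discriminant scores; the function names and parameters (fit_fld, rp_fld_ensemble, n_projections, proj_dim) are ours, not the paper's.

```python
import numpy as np

def fit_fld(X, y):
    """Fit a two-class Fisher linear discriminant; returns (w, b)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance; pinv handles rank deficiency when n < dim.
    S = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    S /= (len(X) - 2)
    w = np.linalg.pinv(S) @ (mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2.0
    return w, b

def rp_fld_ensemble(X, y, n_projections=100, proj_dim=20, rng=None):
    """Learn an FLD on each of n_projections random Gaussian projections."""
    rng = np.random.default_rng(rng)
    D = X.shape[1]
    members = []
    for _ in range(n_projections):
        R = rng.standard_normal((proj_dim, D)) / np.sqrt(proj_dim)
        w, b = fit_fld(X @ R.T, y)
        members.append((R, w, b))
    return members

def predict(members, X):
    """Average (sum) the members' discriminant scores, then threshold."""
    scores = sum(w @ (R @ X.T) + b for R, w, b in members)
    return (scores > 0).astype(int)
```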

    Asymptotic Generalization Bound of Fisher's Linear Discriminant Analysis

    Fisher's linear discriminant analysis (FLDA) is an important dimension reduction method in statistical pattern recognition. It has been shown that FLDA is asymptotically Bayes optimal under the homoscedastic Gaussian assumption. However, this classical result has two major limitations: 1) it holds only for a fixed dimensionality $D$, and thus does not apply when $D$ and the training sample size $N$ are proportionally large; 2) it does not provide a quantitative description of how the generalization ability of FLDA is affected by $D$ and $N$. In this paper, we present an asymptotic generalization analysis of FLDA based on random matrix theory, in a setting where both $D$ and $N$ increase and $D/N \longrightarrow \gamma \in [0,1)$. The obtained lower bound on the generalization discrimination power overcomes both limitations of the classical result, i.e., it is applicable when $D$ and $N$ are proportionally large and it provides a quantitative description of the generalization ability of FLDA in terms of the ratio $\gamma = D/N$ and the population discrimination power. Besides, the discrimination power bound also leads to an upper bound on the generalization error of binary classification with FLDA.
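    The paper's bound concerns the regime where $D$ and $N$ grow together. As a rough empirical companion to that analysis, the following Monte Carlo sketch (our own, not the paper's) estimates FLDA test error under the homoscedastic Gaussian model as the ratio $\gamma = D/N$ grows toward 1; the degradation with $\gamma$ is what the bound quantifies.

```python
import numpy as np

def flda_error(D, N, delta=3.0, n_test=20000, rng=None):
    """Estimate FLDA test error: two Gaussian classes in R^D, N/2 samples each."""
    rng = np.random.default_rng(rng)
    mu = np.zeros(D); mu[0] = delta / 2           # class means at -mu and +mu, identity covariance
    X0 = rng.standard_normal((N // 2, D)) - mu
    X1 = rng.standard_normal((N // 2, D)) + mu
    S = (np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)) / 2   # pooled covariance
    w = np.linalg.solve(S, X1.mean(0) - X0.mean(0))                 # FLDA direction
    b = -w @ (X0.mean(0) + X1.mean(0)) / 2
    Xt = rng.standard_normal((n_test, D)) + mu    # fresh test points from class 1
    return np.mean(w @ Xt.T + b <= 0)             # misclassification rate

N = 400
for gamma in (0.1, 0.3, 0.5, 0.7, 0.9):
    D = int(gamma * N)
    print(f"gamma={gamma:.1f}  error~{flda_error(D, N, rng=0):.3f}")
```

    With the Mahalanobis separation delta fixed, the Bayes error is constant across rows, so any increase in the printed error as gamma grows is attributable to estimation in proportionally high dimensions.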

    Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification

    Objective. The main goal of this work is to develop a model for multi-sensor signals, such as MEG or EEG signals, that accounts for inter-trial variability and is suitable for the corresponding binary classification problems. An important constraint is that the model be simple enough to handle small and unbalanced datasets, as often encountered in BCI-type experiments. Approach. The method involves a linear mixed-effects statistical model, the wavelet transform, and spatial filtering, and aims at the characterization of localized discriminant features in multi-sensor signals. After a discrete wavelet transform and spatial filtering, a projection onto the relevant wavelet and spatial-channel subspaces is used for dimension reduction. The projected signals are then decomposed as the sum of a signal of interest (i.e. a discriminant signal) and background noise, using a very simple Gaussian linear mixed model. Main results. Thanks to the simplicity of the model, the corresponding parameter estimation problem is simplified. Robust estimates of class covariance matrices are obtained from small sample sizes, and an effective Bayes plug-in classifier is derived. The approach is applied to the detection of error potentials in multichannel EEG data, in a very unbalanced situation (detection of rare events). Classification results prove the relevance of the proposed approach in such a context. Significance. The combination of a linear mixed model, the wavelet transform, and spatial filtering for EEG classification is, to the best of our knowledge, an original approach, which is shown to be effective. This paper improves on earlier results on similar problems, and all three main ingredients play an important role.
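    A compact sketch of the pipeline shape this abstract describes (wavelet transform, coefficient selection, then a Gaussian plug-in classifier) follows. It assumes the PyWavelets package (pywt) for the DWT; the variance-based coefficient selection and the shrinkage covariance used here are simple stand-ins for the paper's spatial filtering and mixed-model estimates, not the authors' method.

```python
import numpy as np
import pywt

def features(trials, wavelet="db4", level=4, keep=64):
    """trials: (n_trials, n_channels, n_samples) -> (n_trials, keep) features."""
    feats = []
    for trial in trials:
        # DWT each channel, concatenate all coefficient arrays per trial
        coeffs = [np.concatenate(pywt.wavedec(ch, wavelet, level=level)) for ch in trial]
        feats.append(np.concatenate(coeffs))
    F = np.asarray(feats)
    # Keep the coefficients with the largest variance across trials
    # (in practice, fix this selection on training data and reuse it at test time)
    idx = np.argsort(F.var(axis=0))[-keep:]
    return F[:, idx]

def bayes_plugin(F, y, shrink=0.1):
    """Linear Gaussian plug-in rule with a shrunken pooled within-class covariance."""
    mu0, mu1 = F[y == 0].mean(0), F[y == 1].mean(0)
    S = (np.cov(F[y == 0], rowvar=False) + np.cov(F[y == 1], rowvar=False)) / 2
    S = (1 - shrink) * S + shrink * (np.trace(S) / len(S)) * np.eye(len(S))
    w = np.linalg.solve(S, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2
    return lambda Fnew: (Fnew @ w + b > 0).astype(int)
```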

    Learning in high dimensions with projected linear discriminants

    The enormous power of modern computers has made possible the statistical modelling of data with dimensionality that would have made this task inconceivable only decades ago. However, experience with such modelling has made researchers aware of many issues associated with working in high-dimensional domains, collectively known as 'the curse of dimensionality', which can confound practitioners' desires to build good models of the world from these data. When the dimensionality is very large, low-dimensional methods and geometric intuition both break down in these high-dimensional spaces. To mitigate the dimensionality curse we can use low-dimensional representations of the original data that capture most of the information it contained. However, little is currently known about the effect of such dimensionality reduction on classifier performance. In this thesis we develop theory quantifying the effect of random projection - a recent, very promising, non-adaptive dimensionality reduction technique - on the classification performance of Fisher's Linear Discriminant (FLD), a successful and widely used linear classifier. We tackle the issues associated with small sample size and high dimensionality by using randomly projected FLD ensembles, and we develop theory explaining why our new approach performs well. Finally, we quantify the generalization error of Kernel FLD, a related non-linear projected classifier.
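    The primitive underlying this thesis, random projection, is easy to demonstrate numerically. The short sketch below (illustrative, not from the thesis) projects data from D = 10,000 down to k = 100 dimensions with a Gaussian matrix and checks that pairwise distances are approximately preserved, which is what makes classification after projection plausible.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
D, k, n = 10_000, 100, 50
X = rng.standard_normal((n, D))               # n points in D dimensions
R = rng.standard_normal((k, D)) / np.sqrt(k)  # scaling gives E||Rx||^2 = ||x||^2
Z = X @ R.T                                   # projected points in k dimensions

ratio = pdist(Z) / pdist(X)                   # per-pair distance distortion
print(f"distance ratios after projection: mean={ratio.mean():.3f}, std={ratio.std():.3f}")
```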

    Automatic Face Recognition System Based on Local Fourier-Bessel Features

    We present an automatic face verification system inspired by known properties of biological systems. In the proposed algorithm the whole image is converted from the spatial to the polar frequency domain by a Fourier-Bessel Transform (FBT). The use of the whole image is compared to the case where only face image regions (local analysis) are considered. The resulting representations are embedded in a dissimilarity space, where each image is represented by its distance to all the other images, and a Pseudo-Fisher discriminator is built. Verification test results on the FERET database showed that the local-based algorithm outperforms the global-FBT version. The local-FBT algorithm performed on par with state-of-the-art methods under different testing conditions, indicating that the proposed system is highly robust to expression, age, and illumination variations. We also evaluated the performance of the proposed system under strong occlusion conditions and found that it remains highly robust for up to 50% face occlusion. Finally, we fully automated the verification system by implementing face and eye detection algorithms. Under this condition, the local approach was only slightly superior to the global approach. Comment: 2005, Brazilian Symposium on Computer Graphics and Image Processing, 18 (SIBGRAPI).
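    The dissimilarity-space step in this abstract is a general-purpose construction worth making concrete. Below is a minimal sketch: each item is re-represented by its vector of distances to a set of prototypes (here, the training set itself), after which a linear discriminant such as the paper's Pseudo-Fisher can be trained on those vectors. The Fourier-Bessel feature extraction is abstracted away, and all names and dimensions are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def to_dissimilarity_space(features, prototypes):
    """Map (n, d) feature vectors to (n, m) distances to m prototype vectors."""
    return cdist(features, prototypes)

# Hypothetical usage with stand-in feature vectors in place of FBT outputs:
rng = np.random.default_rng(1)
train = rng.standard_normal((40, 128))   # 40 training images, 128-dim features
test = rng.standard_normal((10, 128))
D_train = to_dissimilarity_space(train, train)  # each row: distances to all training images
D_test = to_dissimilarity_space(test, train)
# A (pseudo-inverse) Fisher discriminant can now be fit on D_train and applied to D_test.
```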

    Implicitly Constrained Semi-Supervised Least Squares Classification

    We introduce a novel semi-supervised version of the least squares classifier. This implicitly constrained least squares (ICLS) classifier minimizes the squared loss on the labeled data over the set of parameters implied by all possible labelings of the unlabeled data. Unlike other discriminative semi-supervised methods, our approach does not introduce explicit additional assumptions into the objective function, but leverages implicit assumptions already present in the choice of the supervised least squares classifier. We show that this approach can be formulated as a quadratic programming problem and that its solution can be found using a simple gradient descent procedure. We prove that, in a certain sense, our method never leads to performance worse than that of the supervised classifier. Experimental results on benchmark datasets corroborate this theoretical result in the multidimensional case, also in terms of error rate. Comment: 12 pages, 2 figures, 1 table. The Fourteenth International Symposium on Intelligent Data Analysis (2015), Saint-Etienne, France.
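    The ICLS construction lends itself to a short sketch. In the version below (ours, with labels encoded in {0,1} and projected gradient descent standing in for the paper's quadratic programming solver), each soft labeling q of the unlabeled data implies a supervised least squares fit on all the data, which is scored by its squared loss on the labeled data alone.

```python
import numpy as np

def icls(Xl, yl, Xu, steps=500, lr=0.01):
    """Implicitly constrained least squares sketch: labeled (Xl, yl), unlabeled Xu."""
    X = np.vstack([Xl, Xu])
    A = np.linalg.pinv(X.T @ X) @ X.T         # beta(q) = A @ concat(yl, q)
    q = np.full(len(Xu), 0.5)                 # start from maximally uncertain soft labels
    for _ in range(steps):
        beta = A @ np.concatenate([yl, q])
        resid = Xl @ beta - yl                # residuals on the labeled data only
        # Gradient of ||Xl @ beta(q) - yl||^2 with respect to q
        grad = 2 * (A[:, len(yl):].T @ (Xl.T @ resid))
        q = np.clip(q - lr * grad, 0.0, 1.0)  # project back onto [0, 1]^u
    return A @ np.concatenate([yl, q])        # final least squares parameters
```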