Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification
Objective. The main goal of this work is to develop a model for multi-sensor signals, such as MEG or EEG signals, that accounts for inter-trial variability and is suitable for the corresponding binary classification problems. An important constraint is that the model be simple enough to handle small and unbalanced datasets, as often encountered in BCI-type experiments.
Approach. The method combines a linear mixed-effects statistical model, the wavelet transform and spatial filtering, and aims at characterizing localized discriminant features in multi-sensor signals. After a discrete wavelet transform and spatial filtering, the signals are projected onto the subspaces spanned by the relevant wavelet and spatial channels to reduce dimension. The projected signals are then decomposed as the sum of a signal of interest (i.e. discriminant) and background noise, using a very simple Gaussian linear mixed model.
Main results. Thanks to the simplicity of the model, the corresponding parameter estimation problem is simplified. Robust estimates of class covariance matrices are obtained from small sample sizes, and an effective Bayes plug-in classifier is derived. The approach is applied to the detection of error potentials in multichannel EEG data, in a very unbalanced situation (detection of rare events). Classification results demonstrate the relevance of the proposed approach in such a context.
Significance. The combination of a linear mixed model, the wavelet transform and spatial filtering for EEG classification is, to the best of our knowledge, an original approach, which is shown to be effective. This paper improves on earlier results on similar problems, and the three main ingredients all play an important role.
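The last two stages of such a pipeline (wavelet features, then a Bayes plug-in rule) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' model: it uses a single-level Haar DWT written in NumPy, a hypothetical fixed set of retained coefficients (`KEEP`) standing in for the "relevant wavelet channels", no spatial filtering, and class covariances shrunk toward a scaled identity with a hypothetical weight `alpha`. That shrinkage is a swapped-in regularizer, not the paper's linear mixed model estimate. Class priors enter the log-posterior explicitly, which matters when one class is rare.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar DWT along the last axis (even length)."""
    even, odd = x[..., 0::2], x[..., 1::2]
    approx = (even + odd) / np.sqrt(2.0)   # low-pass (scaling) coefficients
    detail = (even - odd) / np.sqrt(2.0)   # high-pass (wavelet) coefficients
    return np.concatenate([approx, detail], axis=-1)


def shrunk_cov(X, alpha):
    """Sample covariance shrunk toward a scaled identity (hypothetical fixed alpha)."""
    S = np.atleast_2d(np.cov(X, rowvar=False))
    target = np.trace(S) / S.shape[0] * np.eye(S.shape[0])
    return (1.0 - alpha) * S + alpha * target


class GaussianPlugIn:
    """Bayes plug-in rule: estimated class means, covariances and priors."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = []
        for k in self.classes_:
            Xk = X[y == k]
            cov = shrunk_cov(Xk, self.alpha)
            self.params_.append((Xk.mean(axis=0), np.linalg.inv(cov),
                                 np.linalg.slogdet(cov)[1],
                                 np.log(len(Xk) / len(X))))
        return self

    def decision(self, X):
        # Gaussian log-likelihood (up to a shared constant) plus log prior
        scores = []
        for mean, prec, logdet, logprior in self.params_:
            D = X - mean
            maha = np.einsum("ij,jk,ik->i", D, prec, D)
            scores.append(-0.5 * maha - 0.5 * logdet + logprior)
        return np.column_stack(scores)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision(X), axis=1)]


rng = np.random.default_rng(0)
KEEP = np.arange(4)  # hypothetical indices of the "relevant" wavelet coefficients


def simulate(n0, n1, length=16):
    """Hypothetical single-channel trials; rare class 1 carries an early deflection."""
    X = rng.normal(size=(n0 + n1, length))
    X[n0:, :8] += 3.0
    y = np.r_[np.zeros(n0, dtype=int), np.ones(n1, dtype=int)]
    return haar_dwt(X)[:, KEEP], y


X_train, y_train = simulate(80, 12)   # small, unbalanced training set
X_test, y_test = simulate(200, 30)
clf = GaussianPlugIn(alpha=0.2).fit(X_train, y_train)
accuracy = np.mean(clf.predict(X_test) == y_test)
print(f"test accuracy: {accuracy:.3f}")
```

With only 12 rare-class trials, the unshrunk sample covariance would be noisy; any regularization that stabilizes it (shrinkage here, a structured mixed-model covariance in the paper) serves the same purpose of making the plug-in rule usable at small sample sizes.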
Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data
In recent years, the use of motion tracking systems to acquire functional biomechanical gait data has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, so the dimensionality of the collected data far exceeds the number of available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small-sample-size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for selecting raw kinematic variables. Finally, we propose a novel integrated method that fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons use robust resampling techniques such as the bootstrap and k-fold cross-validation. Results from experiments with subjects suffering from pathological plantar hyperkeratosis show that the proposed method can reach correct classification rates using less than 10% of the original features.
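The abstract does not detail the integrated tuning-and-selection method, so the sketch below shows only the generic wrapper idea that variable selection of this kind builds on: greedy sequential forward selection of raw variables, scored by the k-fold cross-validated error of a diagonal linear discriminant (a simple Bayesian classifier with a pooled, diagonal covariance). The classifier, the fold count, the variance floor and the synthetic data are all assumptions for illustration, not the paper's setup.

```python
import numpy as np


def dlda_fit(X, y):
    """Diagonal LDA: class means, pooled per-feature variance, class priors."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    resid = X - means[np.searchsorted(classes, y)]
    var = resid.var(axis=0) + 1e-9          # small floor avoids division by zero
    priors = np.array([np.mean(y == k) for k in classes])
    return classes, means, var, priors


def dlda_predict(model, X):
    classes, means, var, priors = model
    # Gaussian log-posterior up to a shared constant, diagonal pooled covariance
    d2 = (((X[:, None, :] - means[None, :, :]) ** 2) / var).sum(axis=2)
    return classes[np.argmax(-0.5 * d2 + np.log(priors), axis=1)]


def cv_error(X, y, folds=5, seed=0):
    """k-fold cross-validated misclassification rate of diagonal LDA."""
    idx = np.random.default_rng(seed).permutation(len(y))
    wrong = 0
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        model = dlda_fit(X[train], y[train])
        wrong += np.sum(dlda_predict(model, X[test]) != y[test])
    return wrong / len(y)


def forward_select(X, y, max_features, folds=5):
    """Greedy sequential forward selection driven by cross-validated error."""
    selected, remaining, history = [], list(range(X.shape[1])), []
    while remaining and len(selected) < max_features:
        errs = [cv_error(X[:, selected + [j]], y, folds) for j in remaining]
        best = int(np.argmin(errs))
        selected.append(remaining.pop(best))
        history.append(errs[best])
    return selected, history


# Hypothetical data: 60 subjects, 30 kinematic variables, only 3 and 7 informative
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 30))
y = np.repeat([0, 1], 30)
X[y == 1, 3] += 3.0
X[y == 1, 7] += 3.0
selected, history = forward_select(X, y, max_features=3)
print("selected variables:", selected, "CV errors:", history)
```

The error recorded in `history` is optimistic for the chosen subset, because the same folds were used to pick the variables; an honest estimate needs an outer resampling loop (bootstrap or a second level of cross-validation) around the whole selection procedure.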
Optimal classifier selection and negative bias in error rate estimation: An empirical study on high-dimensional prediction
In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or pressure from the consulting customer, present only the most favorable results. This strategy may induce a substantial optimistic bias in prediction error estimation, which is quantitatively assessed in the present manuscript. The focus of our work is on class prediction based on high-dimensional data (e.g. microarray data), since such analyses are particularly exposed to this kind of bias.
In our study we consider a total of 124 variants of classifiers (possibly including variable selection or tuning steps) within a cross-validation evaluation scheme. The classifiers are applied to original and modified real microarray data sets, some of which are obtained by randomly permuting the class labels to mimic non-informative predictors while preserving their correlation structure. We then assess the minimal misclassification rate over the different variants of classifiers in order to quantify the bias arising when the optimal classifier is selected a posteriori in a data-driven manner. The bias resulting from the parameter tuning (including gene selection parameters as a special case) and the bias resulting from the choice of the classification method are examined both separately and jointly.
We conclude that the strategy of presenting only the optimal result is not acceptable, and suggest alternative approaches for properly reporting classification accuracy.
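The mechanism behind this bias can be reproduced in miniature. The sketch below is an illustration, not the study's 124 variants: it assumes a hypothetical 40 × 1000 Gaussian "expression" matrix with randomly assigned labels, so every classifier's true error is 50%. It runs 24 variants (nearest centroid and k-nearest neighbours, each combined with a different number of genes selected inside every training fold) through the same 5-fold cross-validation, then compares the average error with the minimum. Reporting only the minimum is the a posteriori selection the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 1000
X = rng.normal(size=(n, p))                      # hypothetical expression matrix
y = rng.permutation(np.repeat([0, 1], n // 2))   # labels carry no information


def top_genes(Xtr, ytr, m):
    """Rank genes by absolute class-mean difference, using training data only."""
    diff = np.abs(Xtr[ytr == 0].mean(axis=0) - Xtr[ytr == 1].mean(axis=0))
    return np.argsort(diff)[::-1][:m]


def predict(method, Xtr, ytr, Xte):
    if method == "centroid":
        c = np.array([Xtr[ytr == label].mean(axis=0) for label in (0, 1)])
        d = ((Xte[:, None, :] - c[None]) ** 2).sum(axis=2)
        return np.argmin(d, axis=1)
    k = int(method.split("-")[0])                # "k-NN" with k neighbours
    d = ((Xte[:, None, :] - Xtr[None]) ** 2).sum(axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return (ytr[nn].mean(axis=1) > 0.5).astype(int)


def cv_error(method, m, folds=5, seed=0):
    """Cross-validated error with gene selection repeated inside each fold."""
    idx = np.random.default_rng(seed).permutation(n)
    wrong = 0
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        genes = top_genes(X[train], y[train], m)
        pred = predict(method, X[train][:, genes], y[train], X[test][:, genes])
        wrong += np.sum(pred != y[test])
    return wrong / n


variants = [(meth, m) for meth in ("centroid", "1-NN", "3-NN", "5-NN")
            for m in (5, 10, 20, 50, 100, 200)]
errs = np.array([cv_error(meth, m) for meth, m in variants])
print(f"mean CV error {errs.mean():.2f}, minimum {errs.min():.2f} "
      f"for {variants[errs.argmin()]}")
```

Each individual estimate is honest (selection happens inside the folds), yet the minimum over variants is still optimistic; the bias comes purely from choosing the best of many noisy estimates after seeing them.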
NBLDA: Negative Binomial Linear Discriminant Analysis for RNA-Seq Data
RNA-sequencing (RNA-Seq) has become a powerful technology for characterizing gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods developed for microarray data can be applied to RNA-Seq data, they are not ideal because of the discrete nature of RNA-Seq data. The Poisson and negative binomial distributions are commonly used to model count data. Recently, Witten (2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may be less appropriate than the negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than the mean). However, negative binomial variables are more complicated to model because they involve a dispersion parameter that needs to be estimated. In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. Using Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze four real RNA-Seq data sets to demonstrate the advantage of our method in real-world applications.
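The link between the two classifiers can be checked numerically. The sketch below assumes the parameterization X_g ~ NB(mean mu_kg = s * g_g * d_kg, dispersion phi_g), with variance mu + phi * mu^2, and that the size factor s, gene baselines g_g, class effects d_kg, dispersions phi_g and class priors pi_k have already been estimated; the paper's plug-in estimation rules are not reproduced here, and all numbers are hypothetical. Dropping terms of the log-likelihood that do not depend on the class k leaves the score sum_g [x_g log d_kg - (x_g + 1/phi_g) log(1 + mu_kg phi_g)] + log pi_k, which tends to the Poisson score sum_g [x_g log d_kg - mu_kg] + log pi_k as phi_g -> 0.

```python
import numpy as np


def nb_scores(x, s, g, d, phi, priors):
    """Negative binomial discriminant score per class (class-free terms dropped).

    x: (G,) counts of one test sample; s: its size factor; g: (G,) gene baselines;
    d: (K, G) class effects; phi: (G,) dispersions; priors: (K,) class priors.
    """
    mu = s * g * d                                    # (K, G) class-specific means
    per_gene = x * np.log(d) - (x + 1.0 / phi) * np.log1p(mu * phi)
    return per_gene.sum(axis=1) + np.log(priors)


def poisson_scores(x, s, g, d, priors):
    """Poisson score: the phi -> 0 limit of nb_scores."""
    return (x * np.log(d) - s * g * d).sum(axis=1) + np.log(priors)


# Hypothetical plug-in values for 3 genes and 2 classes
x = np.array([12.0, 3.0, 40.0])
g = np.array([5.0, 5.0, 20.0])
d = np.array([[2.0, 0.5, 2.0],
              [0.8, 1.2, 0.9]])
priors = np.array([0.5, 0.5])

for phi0 in (1.0, 0.1, 1e-4):
    print(f"phi = {phi0}:", nb_scores(x, 1.0, g, d, np.full(3, phi0), priors))
print("Poisson:   ", poisson_scores(x, 1.0, g, d, priors))
```

For fixed phi_g > 0 the penalty (x_g + 1/phi_g) log(1 + mu_kg phi_g) grows only logarithmically in the mean, whereas the Poisson penalty mu_kg is linear in it, so the two rules can weigh highly expressed genes quite differently when dispersion is large.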