654 research outputs found

    Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification

    Full text link
    Objective. The main goal of this work is to develop a model for multi-sensor signals such as MEG or EEG signals, that accounts for the inter-trial variability, suitable for corresponding binary classification problems. An important constraint is that the model be simple enough to handle small size and unbalanced datasets, as often encountered in BCI type experiments. Approach. The method involves linear mixed effects statistical model, wavelet transform and spatial filtering, and aims at the characterization of localized discriminant features in multi-sensor signals. After discrete wavelet transform and spatial filtering, a projection onto the relevant wavelet and spatial channels subspaces is used for dimension reduction. The projected signals are then decomposed as the sum of a signal of interest (i.e. discriminant) and background noise, using a very simple Gaussian linear mixed model. Main results. Thanks to the simplicity of the model, the corresponding parameter estimation problem is simplified. Robust estimates of class-covariance matrices are obtained from small sample sizes and an effective Bayes plug-in classifier is derived. The approach is applied to the detection of error potentials in multichannel EEG data, in a very unbalanced situation (detection of rare events). Classification results prove the relevance of the proposed approach in such a context. Significance. The combination of linear mixed model, wavelet transform and spatial filtering for EEG classification is, to the best of our knowledge, an original approach, which is proven to be effective. This paper improves on earlier results on similar problems, and the three main ingredients all play an important role

    Correcting the optimally selected resampling-based error rate: A smooth analytical alternative to nested cross-validation

    Get PDF
    High-dimensional binary classification tasks, e.g. the classification of microarray samples into normal and cancer tissues, usually involve a tuning parameter adjusting the complexity of the applied method to the examined data set. By reporting the performance of the best tuning parameter value only, over-optimistic prediction errors are published. The contribution of this paper is two-fold. Firstly, we develop a new method for tuning bias correction which can be motivated by decision theoretic considerations. The method is based on the decomposition of the unconditional error rate involving the tuning procedure. Our corrected error estimator can be written as a weighted mean of the errors obtained using the different tuning parameter values. It can be interpreted as a smooth version of nested cross-validation (NCV) which is the standard approach for avoiding tuning bias. In contrast to NCV, the weighting scheme of our method guarantees intuitive bounds for the corrected error. Secondly, we suggest to use bias correction methods also to address the bias resulting from the optimal choice of the classification method among several competitors. This method selection bias is particularly relevant to prediction problems in high-dimensional data. In the absence of standards, it is common practice to try several methods successively, which can lead to an optimistic bias similar to the tuning bias. We demonstrate the performance of our method to address both types of bias based on microarray data sets and compare it to existing methods. This study confirms that our approach yields estimates competitive to NCV at a much lower computational price

    Feature selection guided by structural information

    Get PDF
    In generalized linear regression problems with an abundant number of features, lasso-type regularization which imposes an 1\ell^1-constraint on the regression coefficients has become a widely established technique. Deficiencies of the lasso in certain scenarios, notably strongly correlated design, were unmasked when Zou and Hastie [J. Roy. Statist. Soc. Ser. B 67 (2005) 301--320] introduced the elastic net. In this paper we propose to extend the elastic net by admitting general nonnegative quadratic constraints as a second form of regularization. The generalized ridge-type constraint will typically make use of the known association structure of features, for example, by using temporal- or spatial closeness. We study properties of the resulting "structured elastic net" regression estimation procedure, including basic asymptotics and the issue of model selection consistency. In this vein, we provide an analog to the so-called "irrepresentable condition" which holds for the lasso. Moreover, we outline algorithmic solutions for the structured elastic net within the generalized linear model family. The rationale and the performance of our approach is illustrated by means of simulated and real world data, with a focus on signal regression.Comment: Published in at http://dx.doi.org/10.1214/09-AOAS302 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    The Illusion of Distribution-Free Small-Sample Classification in Genomics

    Get PDF
    Classification has emerged as a major area of investigation in bioinformatics owing to the desire to discriminate phenotypes, in particular, disease conditions, using high-throughput genomic data. While many classification rules have been posed, there is a paucity of error estimation rules and an even greater paucity of theory concerning error estimation accuracy. This is problematic because the worth of a classifier depends mainly on its error rate. It is common place in bio-informatics papers to have a classification rule applied to a small labeled data set and the error of the resulting classifier be estimated on the same data set, most often via cross-validation, without any assumptions being made on the underlying feature-label distribution. Concomitant with a lack of distributional assumptions is the absence of any statement regarding the accuracy of the error estimate. Without such a measure of accuracy, the most common one being the root-mean-square (RMS), the error estimate is essentially meaningless and the worth of the entire paper is questionable. The concomitance of an absence of distributional assumptions and of a measure of error estimation accuracy is assured in small-sample settings because even when distribution-free bounds exist (and that is rare), the sample sizes required under the bounds are so large as to make them useless for small samples. Thus, distributional bounds are necessary and the distributional assumptions need to be stated. Owing to the epistemological dependence of classifiers on the accuracy of their estimated errors, scientifically meaningful distribution-free classification in high-throughput, small-sample biology is an illusion

    Signal identification in ERP data by decorrelated Higher Criticism Thresholding

    Get PDF
    Event-related potentials (ERPs) are intensive recordings of electrical activity along the scalp time-locked to motor, sensory, or cognitive events. A main objective in ERP studies is to select (rare) time points at which (weak) ERP amplitudes (features) are significantly associated with experimental variable of interest. The Higher Criticism Thresholding (HCT), as an optimal signal detection procedure in the " rare-and-weak " paradigm, appears to be ideally suited for identifying ERP features. However, ERPs exhibit complex temporal dependence patterns violating the assumption under which signal identification can be achieved efficiently for HCT. This article first highlights this impact of dependence in terms of instability of signal estimation by HCT. A factor modeling for the covariance in HCT is then introduced to decorrelate test statistics and to restore stability in estimation. The detection boundary under factor-analytic dependence is derived and the phase diagram is correspondingly extended. Using simulations and a real data analysis example, the proposed method is shown to estimate more efficiently the support of signals compared with standard HCT and other HCT approaches based on a shrinkage estimation of the covariance matrix

    Innovative Techniques for the Retrieval of Earth’s Surface and Atmosphere Geophysical Parameters: Spaceborne Infrared/Microwave Combined Analyses

    Get PDF
    With the advent of the first satellites for Earth Observation: Landsat-1 in July 1972 and ERS-1 in May 1991, the discipline of environmental remote sensing has become, over time, increasingly fundamental for the study of phenomena characterizing the planet Earth. The goal of environmental remote sensing is to perform detailed analyses and to monitor the temporal evolution of different physical phenomena, exploiting the mechanisms of interaction between the objects that are present in an observed scene and the electromagnetic radiation detected by sensors, placed at a distance from the scene, operating at different frequencies. The analyzed physical phenomena are those related to climate change, weather forecasts, global ocean circulation, greenhouse gas profiling, earthquakes, volcanic eruptions, soil subsidence, and the effects of rapid urbanization processes. Generally, remote sensing sensors are of two primary types: active and passive. Active sensors use their own source of electromagnetic radiation to illuminate and analyze an area of interest. An active sensor emits radiation in the direction of the area to be investigated and then detects and measures the radiation that is backscattered from the objects contained in that area. Passive sensors, on the other hand, detect natural electromagnetic radiation (e.g., from the Sun in the visible band and the Earth in the infrared and microwave bands) emitted or reflected by the object contained in the observed scene. The scientific community has dedicated many resources to developing techniques to estimate, study and analyze Earth’s geophysical parameters. These techniques differ for active and passive sensors because they depend strictly on the type of the measured physical quantity. In my P.h.D. work, inversion techniques for estimating Earth’s surface and atmosphere geophysical parameters will be addressed, emphasizing methods based on machine learning (ML). In particular, the study of cloud microphysics and the characterization of Earth’s surface changes phenomenon are the critical points of this work

    Characterization of the Effectiveness of Reporting Lists of Small Feature Sets Relative to the Accuracy of the Prior Biological Knowledge

    Get PDF
    When confronted with a small sample, feature-selection algorithms often fail to find good feature sets, a problem exacerbated for high-dimensional data and large feature sets. The problem is compounded by the fact that, if one obtains a feature set with a low error estimate, the estimate is unreliable because training-data-based error estimators typically perform poorly on small samples, exhibiting optimistic bias or high variance. One way around the problem is limit the number of features being considered, restrict features sets to sizes such that all feature sets can be examined by exhaustive search, and report a list of the best performing feature sets. If the list is short, then it greatly restricts the possible feature sets to be considered as candidates; however, one can expect the lowest error estimates obtained to be optimistically biased so that there may not be a close-to-optimal feature set on the list. This paper provides a power analysis of this methodology; in particular, it examines the kind of results one should expect to obtain relative to the length of the list and the number of discriminating features among those considered. Two measures are employed. The first is the probability that there is at least one feature set on the list whose true classification error is within some given tolerance of the best feature set and the second is the expected number of feature sets on the list whose true errors are within the given tolerance of the best feature set. These values are plotted as functions of the list length to generate power curves. The results show that, if the number of discriminating features is not too small—that is, the prior biological knowledge is not too poor—then one should expect, with high probability, to find good feature sets

    Anomaly Detection in Presence of Irrelevant Features

    Full text link
    Experiments at particle colliders are the primary source of insight into physics at microscopic scales. Searches at these facilities often rely on optimization of analyses targeting specific models of new physics. Increasingly, however, data-driven model-agnostic approaches based on machine learning are also being explored. A major challenge is that such methods can be highly sensitive to the presence of many irrelevant features in the data. This paper presents Boosted Decision Tree (BDT)-based techniques to improve anomaly detection in the presence of many irrelevant features. First, a BDT classifier is shown to be more robust than neural networks for the Classification Without Labels approach to finding resonant excesses assuming independence of resonant and non-resonant observables. Next, a tree-based probability density estimator using copula transformations demonstrates significant stability and improved performance over normalizing flows as irrelevant features are added. The results make a compelling case for further development of tree-based algorithms for more robust resonant anomaly detection in high energy physics.Comment: 24 pages, 7 figures. v2: Figure 6 update
    corecore