Detecting single-trial EEG evoked potential using a wavelet domain linear mixed model: application to error potentials classification
Objective. The main goal of this work is to develop a model for multi-sensor signals, such as MEG or EEG, that accounts for inter-trial variability and is suitable for the corresponding binary classification problems. An important constraint is that the model be simple enough to handle small and unbalanced datasets, as often encountered in BCI-type experiments. Approach. The method combines a linear mixed-effects statistical model, the wavelet transform, and spatial filtering, and aims at characterizing localized discriminant features in multi-sensor signals. After discrete wavelet transform and spatial filtering, a projection onto the relevant wavelet and spatial-channel subspaces is used for dimension reduction. The projected signals are then decomposed as the sum of a signal of interest (i.e. discriminant) and background noise, using a very simple Gaussian linear mixed model. Main results. Thanks to the simplicity of the model, parameter estimation is straightforward. Robust estimates of class-covariance matrices
are obtained from small sample sizes and an effective Bayes plug-in classifier
is derived. The approach is applied to the detection of error potentials in
multichannel EEG data, in a very unbalanced situation (detection of rare
events). Classification results prove the relevance of the proposed approach in
such a context. Significance. The combination of a linear mixed model, the wavelet transform, and spatial filtering for EEG classification is, to the best of our knowledge, original, and it is shown here to be effective. This paper improves on earlier results for similar problems, and all three main ingredients play an important role.
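To make the pipeline concrete, here is a minimal sketch assuming epoched data of shape (trials, channels, samples): a per-channel discrete wavelet transform, a crude variance-based coefficient selection standing in for the paper's wavelet/spatial subspace projection, and a Gaussian Bayes plug-in classifier whose regularized sample covariances stand in for the mixed-model estimates. All names and parameters (pywt's `db4`, `n_keep`) are illustrative, not taken from the paper.

```python
# Minimal sketch (illustrative, not the paper's estimator): DWT per channel,
# crude coefficient selection, and a regularized Gaussian Bayes plug-in classifier.
# Assumes `epochs` has shape (n_trials, n_channels, n_samples) and binary `labels`.
import numpy as np
import pywt

def wavelet_features(epochs, wavelet="db4", level=4, n_keep=32):
    """DWT each channel, then keep the n_keep highest-variance coefficients
    (a crude stand-in for the paper's wavelet/spatial subspace projection)."""
    feats = []
    for trial in epochs:
        per_channel = [np.concatenate(pywt.wavedec(ch, wavelet, level=level)) for ch in trial]
        feats.append(np.concatenate(per_channel))
    X = np.asarray(feats)
    keep = np.argsort(X.var(axis=0))[-n_keep:]
    return X[:, keep]

def fit_plugin_classifier(X, y, reg=1e-3):
    """Gaussian Bayes plug-in: per-class mean/covariance plus log-priors, with a
    ridge term standing in for the mixed-model covariance estimates."""
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        cov = np.cov(Xc, rowvar=False) + reg * np.eye(X.shape[1])
        params[c] = (Xc.mean(axis=0), np.linalg.inv(cov),
                     np.linalg.slogdet(cov)[1], np.log(np.mean(y == c)))

    def predict(Xnew):
        scores = []
        for c in (0, 1):
            mu, prec, logdet, logprior = params[c]
            diff = Xnew - mu
            scores.append(-0.5 * np.einsum("ij,jk,ik->i", diff, prec, diff)
                          - 0.5 * logdet + logprior)
        return np.argmax(np.stack(scores, axis=1), axis=1)
    return predict
```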
Correcting the optimally selected resampling-based error rate: A smooth analytical alternative to nested cross-validation
High-dimensional binary classification tasks, e.g. the classification of microarray samples into normal and cancer tissues, usually involve a tuning parameter that adjusts the complexity of the applied method to the examined data set. When only the performance of the best tuning parameter value is reported, over-optimistic prediction errors are published. The contribution of this paper is two-fold. Firstly, we develop a new method for tuning bias
correction which can be motivated by decision theoretic considerations. The method is based on the decomposition of the unconditional error rate involving the tuning procedure. Our corrected error estimator can be written as
a weighted mean of the errors obtained using the different tuning parameter values. It can be interpreted as a smooth version of nested cross-validation (NCV) which is the standard approach for avoiding tuning bias. In contrast
to NCV, the weighting scheme of our method guarantees intuitive bounds for the corrected error. Secondly, we suggest using bias correction methods also to address the bias resulting from the optimal choice of the classification method among several competitors. This method selection bias is particularly relevant to prediction problems in high-dimensional data. In the absence of standards, it is common practice to try several methods successively, which can lead to an optimistic bias similar to the tuning bias. We demonstrate the performance of our method in addressing both types of bias on microarray data sets and compare it to existing methods. This study confirms that our approach yields estimates competitive with NCV at a much lower computational price.
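A sketch of the weighted-mean idea follows, assuming a matrix of resampled error rates over tuning values. The weighting used here (how often each tuning value wins across resampling folds) is an illustrative stand-in, not necessarily the paper's decision-theoretic weights, but it shares the key property that the corrected error is a convex combination of the per-value mean errors and therefore stays within intuitive bounds.

```python
# Sketch: corrected error as a weighted mean over tuning values (illustrative
# weighting by selection frequency, not necessarily the paper's weights).
import numpy as np

def corrected_error(fold_errors):
    """fold_errors: array (n_folds, n_tuning_values) of resampled error rates."""
    fold_errors = np.asarray(fold_errors, dtype=float)
    mean_errors = fold_errors.mean(axis=0)                  # error curve over tuning values
    winners = fold_errors.argmin(axis=1)                    # tuning value selected in each fold
    weights = np.bincount(winners, minlength=fold_errors.shape[1]) / fold_errors.shape[0]
    corrected = float(weights @ mean_errors)                # convex combination: bounded by the mean errors
    naive = float(mean_errors.min())                        # optimistically biased "best value" error
    return corrected, naive
```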
Feature selection guided by structural information
In generalized linear regression problems with an abundant number of
features, lasso-type regularization which imposes an ℓ1-constraint on the
regression coefficients has become a widely established technique. Deficiencies
of the lasso in certain scenarios, notably strongly correlated design, were
unmasked when Zou and Hastie [J. Roy. Statist. Soc. Ser. B 67 (2005) 301--320]
introduced the elastic net. In this paper we propose to extend the elastic net
by admitting general nonnegative quadratic constraints as a second form of
regularization. The generalized ridge-type constraint will typically make use
of the known association structure of features, for example temporal or spatial closeness. We study properties of the resulting "structured elastic
net" regression estimation procedure, including basic asymptotics and the issue
of model selection consistency. In this vein, we provide an analog to the
so-called "irrepresentable condition" which holds for the lasso. Moreover, we
outline algorithmic solutions for the structured elastic net within the
generalized linear model family. The rationale and the performance of our
approach is illustrated by means of simulated and real world data, with a focus
on signal regression. Comment: Published at http://dx.doi.org/10.1214/09-AOAS302 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
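A sketch of how such a structured elastic net can be fit by data augmentation, assuming the quadratic penalty is b'Lb with L a graph Laplacian built from known feature adjacency (for example, temporally neighboring features). The rescaling of the lasso penalty below reflects scikit-learn's loss convention; this is an illustration of the penalized objective, not the paper's algorithm.

```python
# Sketch: structured elastic net via data augmentation (illustrative; scaling
# conventions differ between implementations and this is not the paper's algorithm).
import numpy as np
from scipy.linalg import sqrtm
from sklearn.linear_model import Lasso

def chain_laplacian(p):
    """Structure matrix for p ordered (e.g., temporally adjacent) features."""
    A = np.diag(np.ones(p - 1), 1) + np.diag(np.ones(p - 1), -1)
    return np.diag(A.sum(axis=1)) - A

def structured_elastic_net(X, y, L, lam1=0.1, lam2=1.0):
    """Minimize ||y - X b||^2 + lam1*||b||_1 + lam2 * b'Lb by augmenting the
    design with sqrt(lam2) * L^{1/2} and running an ordinary lasso."""
    n, p = X.shape
    L_half = np.real(sqrtm(L))
    X_aug = np.vstack([X, np.sqrt(lam2) * L_half])
    y_aug = np.concatenate([y, np.zeros(p)])
    # scikit-learn's Lasso minimizes (1/(2*n_samples))*RSS + alpha*||b||_1,
    # hence the rescaled alpha below.
    model = Lasso(alpha=lam1 / (2 * (n + p)), fit_intercept=False, max_iter=50000)
    model.fit(X_aug, y_aug)
    return model.coef_
```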
The Illusion of Distribution-Free Small-Sample Classification in Genomics
Classification has emerged as a major area of investigation in bioinformatics owing to the desire to discriminate phenotypes, in particular disease conditions, using high-throughput genomic data. While many classification rules have been posed, there is a paucity of error estimation rules and an even greater paucity of theory concerning error estimation accuracy. This is problematic because the worth of a classifier depends mainly on its error rate. It is commonplace in bioinformatics papers for a classification rule to be applied to a small labeled data set and for the error of the resulting classifier to be estimated on the same data set, most often via cross-validation, without any assumptions being made on the underlying feature-label distribution. Concomitant with a lack of distributional assumptions is the absence of any statement regarding the accuracy of the error estimate. Without such a measure of accuracy, the most common one being the root-mean-square (RMS), the error estimate is essentially meaningless and the worth of the entire paper is questionable. The concomitance of an absence of distributional assumptions and of a measure of error estimation accuracy is assured in small-sample settings because, even when distribution-free bounds exist (and that is rare), the sample sizes required under the bounds are so large as to make them useless for small samples. Thus, distributional bounds are necessary and the distributional assumptions need to be stated. Owing to the epistemological dependence of classifiers on the accuracy of their estimated errors, scientifically meaningful distribution-free classification in high-throughput, small-sample biology is an illusion.
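The point about error-estimate accuracy can be illustrated with a small simulation, assuming a simple two-class Gaussian model in which the true error is approximated on a large independent sample. All parameters below are illustrative and not taken from the paper; the printed RMS is the deviation of the small-sample cross-validated estimate from the (approximate) true error.

```python
# Simulation sketch (illustrative parameters): deviation of small-sample
# cross-validated error estimates from the true error under a known
# two-class Gaussian model; the true error is approximated on a large test set.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def sample(n, d=20, shift=0.5):
    X = rng.normal(size=(n, d))
    y = rng.integers(0, 2, size=n)
    X[y == 1, 0] += shift                                  # single weakly informative feature
    return X, y

def one_run(n_train=30):
    Xtr, ytr = sample(n_train)
    cv_err = 1 - cross_val_score(LinearDiscriminantAnalysis(), Xtr, ytr, cv=5).mean()
    clf = LinearDiscriminantAnalysis().fit(Xtr, ytr)
    Xte, yte = sample(20000)
    true_err = 1 - clf.score(Xte, yte)
    return cv_err - true_err

devs = np.array([one_run() for _ in range(200)])
print("RMS deviation of the 5-fold CV estimate:", np.sqrt(np.mean(devs ** 2)))
```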
Signal identification in ERP data by decorrelated Higher Criticism Thresholding
Event-related potentials (ERPs) are intensive recordings of electrical activity along the scalp time-locked to motor, sensory, or cognitive events. A main objective in ERP studies is to select (rare) time points at which (weak) ERP amplitudes (features) are significantly associated with an experimental variable of interest. The Higher Criticism Thresholding (HCT), as an optimal signal detection procedure in the "rare-and-weak" paradigm, appears to be ideally suited for identifying ERP features. However, ERPs exhibit complex temporal dependence patterns violating the assumption under which signal identification can be achieved efficiently for HCT. This article first highlights the impact of dependence in terms of the instability of signal estimation by HCT. A factor model for the covariance in HCT is then introduced to decorrelate the test statistics and restore stability in estimation. The detection boundary under factor-analytic dependence is derived and the phase diagram is correspondingly extended. Using simulations and a real data analysis example, the proposed method is shown to estimate the support of signals more efficiently than standard HCT and other HCT approaches based on shrinkage estimation of the covariance matrix.
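For reference, a sketch of the standard HCT step on z-scores, assuming (approximate) independence after decorrelation; the factor-model decorrelation proposed in the article is not reproduced here, and `alpha0` is the usual fraction restricting the HC search to the smallest p-values.

```python
# Sketch of Higher Criticism Thresholding on z-scores, assuming (approximate)
# independence; the article's factor-model decorrelation is not reproduced here.
import numpy as np
from scipy.stats import norm

def hct_support(z, alpha0=0.1):
    """Return the indices whose |z| exceeds the HC-selected threshold."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    p = 2 * norm.sf(np.abs(z))                             # two-sided p-values
    order = np.argsort(p)
    p_sorted = p[order]
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p_sorted) / np.sqrt(p_sorted * (1 - p_sorted) + 1e-12)
    k = int(np.argmax(hc[: max(1, int(alpha0 * n))]))      # maximize HC over the smallest p-values
    threshold = np.abs(z[order[k]])
    return np.flatnonzero(np.abs(z) >= threshold)
```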
Innovative Techniques for the Retrieval of Earth’s Surface and Atmosphere Geophysical Parameters: Spaceborne Infrared/Microwave Combined Analyses
With the advent of the first satellites for Earth observation, Landsat-1 in July 1972 and ERS-1 in May 1991, the discipline of environmental remote sensing has become, over time, increasingly fundamental for the study of phenomena characterizing the planet Earth. The goal of environmental remote sensing is to perform detailed analyses and to monitor the temporal evolution of different physical phenomena, exploiting the mechanisms of interaction between the objects present in an observed scene and the electromagnetic radiation detected by sensors, placed at a distance from the scene, operating at different frequencies. The analyzed physical phenomena are those related to climate change, weather forecasts, global ocean circulation, greenhouse gas profiling, earthquakes, volcanic eruptions, soil subsidence, and the effects of rapid urbanization processes. Generally, remote sensing sensors are of two primary types: active and passive. Active sensors use their own source of electromagnetic radiation to illuminate and analyze an area of interest. An active sensor emits radiation in the direction of the area to be investigated and then detects and measures the radiation that is backscattered from the objects contained in that area. Passive sensors, on the other hand, detect natural electromagnetic radiation (e.g., from the Sun in the visible band and the Earth in the infrared and microwave bands) emitted or reflected by the objects contained in the observed scene. The scientific community has dedicated many resources to developing techniques to estimate, study and analyze Earth's geophysical parameters. These techniques differ for active and passive sensors because they depend strictly on the type of the measured physical quantity. In my Ph.D. work, inversion techniques for estimating Earth's surface and atmosphere geophysical parameters are addressed, emphasizing methods based on machine learning (ML). In particular, the study of cloud microphysics and the characterization of Earth-surface change phenomena are the central points of this work.
Characterization of the Effectiveness of Reporting Lists of Small Feature Sets Relative to the Accuracy of the Prior Biological Knowledge
When confronted with a small sample, feature-selection algorithms often fail to find good feature sets, a problem exacerbated for high-dimensional data and large feature sets. The problem is compounded by the fact that, if one obtains a feature set with a low error estimate, the estimate is unreliable because training-data-based error estimators typically perform poorly on small samples, exhibiting optimistic bias or high variance. One way around the problem is to limit the number of features being considered, restrict feature sets to sizes such that all of them can be examined by exhaustive search, and report a list of the best-performing feature sets. If the list is short, then it greatly restricts the possible feature sets to be considered as candidates; however, one can expect the lowest error estimates obtained to be optimistically biased, so that there may not be a close-to-optimal feature set on the list. This paper provides a power analysis of this methodology; in particular, it examines the kind of results one should expect to obtain relative to the length of the list and the number of discriminating features among those considered. Two measures are employed. The first is the probability that there is at least one feature set on the list whose true classification error is within some given tolerance of that of the best feature set, and the second is the expected number of feature sets on the list whose true errors are within the given tolerance of the best feature set. These values are plotted as functions of the list length to generate power curves. The results show that, if the number of discriminating features is not too small (that is, the prior biological knowledge is not too poor), then one should expect, with high probability, to find good feature sets.
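The two measures can be approximated by Monte Carlo once true and estimated errors are available for every candidate feature set. The toy error model below (uniform true errors, Gaussian estimation noise) is purely illustrative and not the paper's distributional setup; sweeping `list_len` produces the power curves described above.

```python
# Monte Carlo sketch of the two measures (toy error model, not the paper's setup):
# P(at least one listed feature set within tol of the best) and the expected
# number of such sets, for a list built from the best estimated errors.
import numpy as np

rng = np.random.default_rng(1)

def list_power(n_sets=1000, list_len=20, tol=0.02, est_sd=0.05, n_rep=2000):
    hits, counts = 0, 0
    for _ in range(n_rep):
        true_err = rng.uniform(0.1, 0.5, size=n_sets)              # true errors of candidate sets
        est_err = true_err + rng.normal(0.0, est_sd, size=n_sets)  # noisy small-sample estimates
        best = true_err.min()
        listed = np.argsort(est_err)[:list_len]                    # report the best estimated errors
        good = np.sum(true_err[listed] <= best + tol)
        hits += int(good > 0)
        counts += int(good)
    return hits / n_rep, counts / n_rep
```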
Anomaly Detection in Presence of Irrelevant Features
Experiments at particle colliders are the primary source of insight into
physics at microscopic scales. Searches at these facilities often rely on
optimization of analyses targeting specific models of new physics.
Increasingly, however, data-driven model-agnostic approaches based on machine
learning are also being explored. A major challenge is that such methods can be
highly sensitive to the presence of many irrelevant features in the data. This
paper presents Boosted Decision Tree (BDT)-based techniques to improve anomaly
detection in the presence of many irrelevant features. First, a BDT classifier
is shown to be more robust than neural networks for the Classification Without
Labels approach to finding resonant excesses assuming independence of resonant
and non-resonant observables. Next, a tree-based probability density estimator
using copula transformations demonstrates significant stability and improved
performance over normalizing flows as irrelevant features are added. The
results make a compelling case for further development of tree-based algorithms
for more robust resonant anomaly detection in high energy physics. Comment: 24 pages, 7 figures. v2: Figure 6 updated.
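A minimal sketch of the Classification Without Labels step with a gradient-boosted tree classifier, assuming pre-selected signal-region and sideband samples; the choice of scikit-learn's HistGradientBoostingClassifier and its settings is illustrative, not the paper's configuration.

```python
# Sketch of Classification Without Labels (CWoLa) with a boosted-tree classifier:
# separate a signal-region sample from a sideband sample and use the score as an
# anomaly discriminant. Estimator choice and settings are illustrative.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

def cwola_bdt(X_signal_region, X_sideband):
    X = np.vstack([X_signal_region, X_sideband])
    y = np.concatenate([np.ones(len(X_signal_region)), np.zeros(len(X_sideband))])
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    bdt = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.1)
    bdt.fit(X_tr, y_tr)
    print("held-out accuracy:", bdt.score(X_val, y_val))
    return bdt   # rank events by bdt.predict_proba(X)[:, 1]
```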