A simple forward selection procedure based on false discovery rate control
We propose the use of a new false discovery rate (FDR) controlling procedure
as a penalized model selection method, and compare its performance to that of
other penalized methods over a wide range of realistic settings: nonorthogonal
design matrices, moderate and large pools of explanatory variables, and both
sparse and nonsparse models, in the sense that they may include a small or a
large fraction of the potential variables (or even all of them). The comparison is
done by a comprehensive simulation study, using a quantitative framework for
performance comparisons in the form of empirical minimaxity relative to a
"random oracle": the oracle model selection performance on a data-dependent,
forward-selected family of potential models. We show that FDR-based procedures
perform well, and that the newly proposed method in particular emerges as
having empirical minimax performance. Interestingly, using an FDR level of 0.05
is a global best.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/08-AOAS194 by the
Institute of Mathematical Statistics (http://www.imstat.org)
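As background for the FDR level mentioned in the abstract above, the following is a minimal sketch of the standard Benjamini-Hochberg step-up procedure. It is not the forward selection procedure proposed in the paper; the function name `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return sorted indices of hypotheses rejected by the standard
    Benjamini-Hochberg step-up procedure at FDR level q.

    Illustrative sketch only: this is the classical BH procedure, not the
    model selection method described in the abstract.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= q * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values, chosen only to illustrate the call.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example, q=0.05)
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold.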
Revisiting Multi-Subject Random Effects in fMRI: Advocating Prevalence Estimation
Random Effects analysis has been introduced into fMRI research in order to
generalize findings from the study group to the whole population. Generalizing
findings is obviously harder than detecting activation in the study group since
in order to be significant, an activation has to be larger than the
inter-subject variability. Indeed, detected regions are smaller when using
random effects analysis than when using fixed effects. The statistical assumptions behind
the classic random effects model are that the effect in each location is
normally distributed over subjects, and "activation" refers to a non-null mean
effect. We argue this model is unrealistic compared to the true population
variability, where, due to functional plasticity and registration anomalies, at
each brain location some of the subjects are active and some are not. We
propose a finite Gaussian mixture random effect model that amortizes
between-subject spatial disagreement and quantifies it using the "prevalence"
of activation at each location. This measure has several desirable properties:
(a) It is more informative than the typical active/inactive paradigm. (b) In
contrast to the hypothesis testing approach (and thus t-maps), whose null
hypotheses are trivially rejected for large sample sizes, the prevalence
statistic becomes more informative as the sample size grows.
In this work we present a formal definition of this prevalence and a procedure
for estimating it. The end result of the proposed analysis is a map of the
prevalence at locations with significant activation, highlighting activation
regions that are common over many brains.
High-throughput data analysis in behavior genetics
In recent years, a growing need has arisen in different fields for the
development of computational systems for automated analysis of large amounts of
data (high-throughput). Handling of nonstandard noise structure and outliers,
which could have been detected and corrected in manual analysis, must now be
built into the system with the aid of robust methods. We discuss such problems
and present insights and solutions in the context of behavior genetics, where
data consists of a time series of locations of a mouse in a circular arena. In
order to estimate the location, velocity and acceleration of the mouse, and
identify stops, we use a nonstandard mix of robust and resistant methods:
LOWESS and repeated running median. In addition, we argue that protection
against small deviations from experimental protocols can be handled
automatically using statistical methods. In our case, it is of biological
interest to measure a rodent's distance from the arena's wall, but this measure
is corrupted if the arena is not a perfect circle, as required in the protocol.
The problem is addressed by robustly estimating the actual boundary of the
arena and its center using a nonparametric regression quantile of the
behavioral data, with the aid of a fast algorithm developed for that purpose.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/09-AOAS304 by the
Institute of Mathematical Statistics (http://www.imstat.org)
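The repeated running median mentioned in the abstract above can be illustrated with a minimal sketch: a running median with an odd window is applied again and again until the series stops changing. This is a generic version under stated assumptions (window of 3, endpoints left unchanged, hypothetical function names), not the authors' pipeline, which also combines it with LOWESS.

```python
from statistics import median


def running_median(values, window=3):
    """One pass of a running median with an odd window.

    Assumption of this sketch: points closer than window // 2 to either
    end are left unchanged.
    """
    half = window // 2
    out = list(values)
    for i in range(half, len(values) - half):
        out[i] = median(values[i - half:i + half + 1])
    return out


def repeated_running_median(values, window=3, max_iter=100):
    """Apply the running median repeatedly until the series stops changing."""
    current = list(values)
    for _ in range(max_iter):
        smoothed = running_median(current, window)
        if smoothed == current:
            break
        current = smoothed
    return current


# Hypothetical coordinate series with a single outlying spike.
spiky = [1, 1, 9, 1, 1]
smoothed_spiky = repeated_running_median(spiky)
```

Medians are resistant to isolated outliers, so a spike in a tracked coordinate is removed rather than smeared into its neighbors as a moving average would do.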