245 research outputs found

    A simple forward selection procedure based on false discovery rate control

    We propose the use of a new false discovery rate (FDR) controlling procedure as a penalized model selection method, and compare its performance to that of other penalized methods over a wide range of realistic settings: nonorthogonal design matrices, moderate and large pools of explanatory variables, and both sparse and nonsparse models, that is, models that may include a small or a large fraction of the potential variables (or even all of them). The comparison is done by a comprehensive simulation study, using a quantitative framework for performance comparisons in the form of empirical minimaxity relative to a "random oracle": the oracle model selection performance on a data-dependent, forward-selected family of potential models. We show that FDR-based procedures perform well, and that the newly proposed method in particular emerges as having empirical minimax performance. Interestingly, using an FDR level of 0.05 is a global best. Comment: Published at http://dx.doi.org/10.1214/08-AOAS194 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
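
For orientation, the idea of controlling the FDR at a level such as 0.05 can be sketched with the classical Benjamini-Hochberg step-up rule. This is only an illustration of FDR control in general, not the new forward selection procedure proposed in the paper; the function name `benjamini_hochberg` and its parameters are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Classical Benjamini-Hochberg step-up rule (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    This is NOT the forward selection procedure of the paper above.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a large p-value that passes its own threshold also carries every smaller p-value with it, even those that failed their individual thresholds.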

    Revisiting Multi-Subject Random Effects in fMRI: Advocating Prevalence Estimation

    Random effects analysis was introduced into fMRI research in order to generalize findings from the study group to the whole population. Generalizing findings is harder than detecting activation in the study group, since in order to be significant an activation has to be larger than the inter-subject variability. Indeed, detected regions are smaller when using random effects analysis than when using fixed effects. The statistical assumptions behind the classic random effects model are that the effect in each location is normally distributed over subjects, and that "activation" refers to a non-null mean effect. We argue that this model is unrealistic compared to the true population variability, where, due to functional plasticity and registration anomalies, at each brain location some of the subjects are active and some are not. We propose a finite Gaussian mixture random effects model that accommodates between-subject spatial disagreement and quantifies it using the "prevalence" of activation at each location. This measure has several desirable properties: (a) It is more informative than the typical active/inactive paradigm. (b) In contrast to the hypothesis-testing approach (and thus t-maps), in which null hypotheses are trivially rejected for large sample sizes, the prevalence statistic becomes more informative as the sample size grows. In this work we present a formal definition of this prevalence and a procedure for estimating it. The end result of the proposed analysis is a map of the prevalence at locations with significant activation, highlighting activation regions that are common over many brains.

    High-throughput data analysis in behavior genetics

    In recent years, a growing need has arisen in different fields for the development of computational systems for automated (high-throughput) analysis of large amounts of data. The handling of nonstandard noise structure and outliers, which could have been detected and corrected in manual analysis, must now be built into the system with the aid of robust methods. We discuss such problems and present insights and solutions in the context of behavior genetics, where the data consist of a time series of locations of a mouse in a circular arena. In order to estimate the location, velocity and acceleration of the mouse, and to identify stops, we use a nonstandard mix of robust and resistant methods: LOWESS and repeated running median. In addition, we argue that protection against small deviations from experimental protocols can be handled automatically using statistical methods. In our case, it is of biological interest to measure a rodent's distance from the arena's wall, but this measure is corrupted if the arena is not a perfect circle, as required in the protocol. The problem is addressed by robustly estimating the actual boundary of the arena and its center using a nonparametric regression quantile of the behavioral data, with the aid of a fast algorithm developed for that purpose. Comment: Published at http://dx.doi.org/10.1214/09-AOAS304 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
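
As a rough illustration of the resistant smoothing mentioned above, a repeated running median can be sketched as follows. This is a generic version that assumes a fixed window and truncated windows at the edges; the window sizes, edge handling, and stopping rule used in the paper are not given in the abstract, and the names `running_median` and `repeated_running_median` are hypothetical.

```python
from statistics import median


def running_median(values, half_window=1):
    """One pass of a running median with window 2 * half_window + 1.

    Windows are truncated at the two ends of the series (an assumption).
    """
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(median(values[lo:hi]))
    return out


def repeated_running_median(values, half_window=1, max_passes=50):
    """Apply the running median repeatedly until the series stops changing."""
    current = list(values)
    for _ in range(max_passes):
        smoothed = running_median(current, half_window)
        if smoothed == current:
            break  # converged: another pass would change nothing
        current = smoothed
    return current
```

A single isolated spike, such as a tracking glitch in a location series, is removed entirely by the median, whereas a moving average would only spread it over neighboring points.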