
    Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM

    Permutation testing is a non-parametric method for obtaining the max null distribution used to compute corrected p-values that provide strong control of false positives. In neuroimaging, however, the computational burden of running such an algorithm can be significant. We find that by viewing the permutation testing procedure as the construction of a very large permutation testing matrix, T, one can exploit structural properties derived from the data and the test statistics to reduce the runtime under certain conditions. In particular, we see that T is low-rank plus a low-variance residual. This makes T a good candidate for low-rank matrix completion, where only a very small number of entries of T (~0.35% of all entries in our experiments) have to be computed to obtain a good estimate. Based on this observation, we present RapidPT, an algorithm that efficiently recovers the max null distribution commonly obtained through regular permutation testing in voxel-wise analysis. We present an extensive validation on a synthetic dataset and four varying-sized datasets against two baselines: Statistical NonParametric Mapping (SnPM13) and a standard permutation testing implementation (referred to as NaivePT). We find that RapidPT achieves its best runtime performance on medium-sized datasets (50 ≤ n ≤ 200), with speedups of 1.5x-38x (vs. SnPM13) and 20x-1000x (vs. NaivePT). For larger datasets (n ≥ 200), RapidPT outperforms NaivePT (6x-200x) on all datasets, and provides large speedups over SnPM13 (2x-15x) when more than 10000 permutations are needed. The implementation is a standalone toolbox and is also integrated within SnPM13, able to leverage multi-core architectures when available. Comment: 36 pages, 16 figures
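
    The naive baseline the paper compares against is straightforward to sketch. Below is a minimal, illustrative Python version of max-statistic permutation testing for voxel-wise two-sample t-tests; it is not the RapidPT algorithm itself, which avoids computing most of these permutations via low-rank matrix completion of T, and the function names and synthetic data are ours.

```python
# Minimal sketch of standard (naive) max-statistic permutation testing for
# voxel-wise two-sample t-tests. Illustrative only; not RapidPT, which
# recovers the max null by completing a low-rank approximation of T.
import numpy as np

def two_sample_t(data, labels):
    """Voxel-wise two-sample t-statistics. data: (n_subjects, n_voxels)."""
    g1, g2 = data[labels == 1], data[labels == 0]
    m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
    v1, v2 = g1.var(axis=0, ddof=1), g2.var(axis=0, ddof=1)
    se = np.sqrt(v1 / len(g1) + v2 / len(g2))
    return (m1 - m2) / se

def max_null_distribution(data, labels, n_perm=1000, seed=0):
    """Max null distribution used for FWER-corrected p-values."""
    rng = np.random.default_rng(seed)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(labels)            # relabel subjects
        max_null[i] = two_sample_t(data, perm).max()
    return max_null

# Example: corrected threshold at alpha = 0.05 on synthetic data
data = np.random.default_rng(1).normal(size=(60, 5000))
labels = np.array([1] * 30 + [0] * 30)
null = max_null_distribution(data, labels, n_perm=500)
threshold = np.quantile(null, 0.95)
```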

    A statistical method (cross-validation) for bone loss region detection after spaceflight.

    Astronauts experience bone loss after long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g., the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Methods for detecting such regions, however, remain an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to obtain t-maps of changes in images, and propose a new cross-validation method to select an optimum suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to derive significant changes.
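
    As an illustration of the clustering step described above, the hedged sketch below forms suprathreshold clusters from a 2D t-map at a few candidate thresholds. It is not the paper's implementation; the threshold grid and synthetic t-map are made up.

```python
# Illustrative sketch (not the paper's code) of forming suprathreshold
# clusters from a 2D t-map and measuring cluster sizes, the building block
# behind cross-validated threshold selection and subsequent permutation
# testing of the resulting clusters.
import numpy as np
from scipy import ndimage

def suprathreshold_clusters(t_map, threshold):
    """Label connected pixel clusters where the t-statistic exceeds threshold."""
    mask = t_map > threshold
    labeled, n_clusters = ndimage.label(mask)
    sizes = ndimage.sum(mask, labeled, index=range(1, n_clusters + 1))
    return labeled, np.asarray(sizes)

# Example: scan a grid of candidate thresholds on a synthetic t-map
t_map = np.random.default_rng(0).normal(size=(64, 64))
for thr in (2.0, 2.5, 3.0):
    _, sizes = suprathreshold_clusters(t_map, thr)
    largest = sizes.max() if len(sizes) else 0
    print(f"threshold={thr}: {len(sizes)} clusters, largest={largest:.0f}")
```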

    A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology

    The widespread availability of high-dimensional biological data has made the simultaneous screening of numerous biological characteristics a central statistical problem in computational biology. While the dimensionality of such datasets continues to increase, the problem of teasing out the effects of biomarkers in studies measuring baseline confounders, while avoiding model misspecification, remains only partially addressed. Efficient estimators constructed from data adaptive estimates of the data-generating distribution provide an avenue for avoiding model misspecification; however, in the context of high-dimensional problems requiring simultaneous estimation of numerous parameters, standard variance estimators have proven unstable, resulting in unreliable Type-I error control under standard multiple testing corrections. We present a general approach for applying empirical Bayes shrinkage to asymptotically linear estimators of parameters defined in the nonparametric model. The proposal applies existing shrinkage estimators to the estimated variance of the influence function, allowing for increased inferential stability in high-dimensional settings. A methodology for nonparametric variable importance analysis with high-dimensional biological datasets and modest sample sizes is introduced, and the proposed technique is demonstrated to be robust in small samples even when relying on data adaptive estimators that eschew parametric forms. Use of the proposed variance moderation strategy in constructing stabilized variable importance measures of biomarkers is demonstrated by application to an observational study of occupational exposure. The result is a data adaptive approach for robustly uncovering stable associations in high-dimensional data with limited sample sizes.
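
    The variance-moderation step can be sketched in a few lines. The following is a minimal illustration of limma-style empirical Bayes shrinkage applied to per-parameter variance estimates (standing in for estimated variances of influence functions); the prior degrees of freedom and prior variance are fixed by hand rather than estimated from the data, so this is only a toy version of the proposal.

```python
# Minimal sketch of empirical Bayes variance moderation. In practice the
# prior degrees of freedom d0 and prior variance s0_sq are estimated from
# the full collection of variances (as limma does); they are fixed here
# only to keep the example self-contained.
import numpy as np

def moderate_variances(s_sq, df, d0, s0_sq):
    """Shrink raw variance estimates toward a common prior value.

    s_sq  : raw variance estimates, one per parameter/biomarker
    df    : degrees of freedom of each raw estimate
    d0    : prior degrees of freedom (strength of shrinkage)
    s0_sq : prior variance (shrinkage target)
    """
    return (d0 * s0_sq + df * s_sq) / (d0 + df)

def moderated_test_statistics(estimates, s_sq, n, d0=4.0, s0_sq=1.0):
    """t-like statistics using moderated standard errors."""
    s_tilde_sq = moderate_variances(s_sq, df=n - 1, d0=d0, s0_sq=s0_sq)
    return estimates / np.sqrt(s_tilde_sq / n)

# Example with synthetic per-biomarker estimates and variances
rng = np.random.default_rng(0)
est = rng.normal(size=1000)
raw_var = rng.chisquare(df=9, size=1000) / 9
t_mod = moderated_test_statistics(est, raw_var, n=10)
```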

    Parametric Alignment of Drosophila Genomes

    The classic algorithms of Needleman--Wunsch and Smith--Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). In order to process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces which are suitable for Needleman--Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters. The alignment polytopes, software, and supplementary material can be downloaded at http://bio.math.berkeley.edu/parametric/. Comment: 19 pages, 3 figures
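
    For reference, the fixed-parameter building block mentioned above, Needleman--Wunsch global alignment, can be written compactly. The sketch below uses an arbitrary match/mismatch/gap scoring of our choosing and is not the parametric-alignment machinery itself.

```python
# Compact Needleman--Wunsch global alignment for one fixed choice of scoring
# parameters. Parametric alignment instead enumerates the optimal alignments
# across all parameter choices via alignment polytopes; this example only
# illustrates the fixed-parameter case.
import numpy as np

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    dp = np.zeros((n + 1, m + 1))
    dp[:, 0] = gap * np.arange(n + 1)      # leading gaps in b
    dp[0, :] = gap * np.arange(m + 1)      # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i, j] = max(dp[i - 1, j - 1] + s,   # (mis)match
                           dp[i - 1, j] + gap,     # gap in b
                           dp[i, j - 1] + gap)     # gap in a
    return dp[n, m]

print(needleman_wunsch("GATTACA", "GCATGCU"))
```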

    Altered Neurocircuitry in the Dopamine Transporter Knockout Mouse Brain

    The plasma membrane transporters for the monoamine neurotransmitters dopamine, serotonin, and norepinephrine modulate the dynamics of these neurotransmitters. Thus, activity of these transporters has significant consequences for monoamine activity throughout the brain and for a number of neurological and psychiatric disorders. Gene knockout (KO) mice that reduce or eliminate expression of each of these monoamine transporters have provided a wealth of new information about the function of these proteins at molecular, physiological and behavioral levels. In the present work, we use the unique properties of magnetic resonance imaging (MRI) to probe the effects of altered dopaminergic dynamics on meso-scale neuronal circuitry and overall brain morphology, since changes at these levels of organization might help to account for some of the extensive pharmacological and behavioral differences observed in dopamine transporter (DAT) KO mice. Despite the smaller size of these animals, voxel-wise statistical comparison of high resolution structural MR images indicated little morphological change as a consequence of DAT KO. Likewise, proton magnetic resonance spectra recorded in the striatum indicated no significant changes in detectable metabolite concentrations between DAT KO and wild-type (WT) mice. In contrast, alterations in the circuitry from the prefrontal cortex to the mesocortical limbic system, an important brain component intimately tied to function of mesolimbic/mesocortical dopamine reward pathways, were revealed by manganese-enhanced MRI (MEMRI). Analysis of co-registered MEMRI images taken over the 26 hours after introduction of Mn^(2+) into the prefrontal cortex indicated that DAT KO mice have a truncated Mn^(2+) distribution within this circuitry, with little accumulation beyond the thalamus or contralateral to the injection site. By contrast, WT littermates exhibit Mn^(2+) transport into more posterior midbrain nuclei and contralateral mesolimbic structures at 26 hours post-injection. Thus, DAT KO mice appear, at this level of anatomic resolution, to have preserved cortico-striatal-thalamic connectivity but diminished robustness of reward-modulating circuitry distal to the thalamus. This is in contradistinction to the state of this circuitry in serotonin transporter KO mice, where we observed more robust connectivity in more posterior brain regions using methods identical to those employed here.

    GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

    Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines. Department of Agriculture, Food and the Marine; European Commission - Seventh Framework Programme (FP7); Science Foundation Ireland; University College Dublin
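
    GOexpress itself is an R/Bioconductor package; the Python/scikit-learn sketch below only illustrates the underlying idea of ranking gene ontology terms by aggregating random forest importances of their annotated genes. All gene names, group labels, and GO annotations in it are toy values.

```python
# Not the GOexpress API; a toy illustration of the approach: fit a random
# forest to classify samples from expression data, then summarise per-gene
# importances over the genes annotated to each GO term to rank the terms.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(50)]
X = rng.normal(size=(40, len(genes)))          # samples x genes (normalised)
y = np.repeat(["infected", "control"], 20)     # experimental groups

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
gene_score = dict(zip(genes, rf.feature_importances_))

# Toy GO annotation: term -> annotated genes
go_terms = {"GO:0006955": genes[:10], "GO:0008152": genes[10:30]}
term_score = {term: np.mean([gene_score[g] for g in members])
              for term, members in go_terms.items()}
print(sorted(term_score.items(), key=lambda kv: -kv[1]))
```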

    Statistical Analysis of fMRI Time-Series: A Critical Review of the GLM Approach.

    Functional magnetic resonance imaging (fMRI) is one of the most widely used tools to study the neural underpinnings of human cognition. Standard analysis of fMRI data relies on a general linear model (GLM) approach to separate stimulus induced signals from noise. Crucially, this approach relies on a number of assumptions about the data which, for inferences to be valid, must be met. The current paper reviews the GLM approach to analysis of fMRI time-series, focusing in particular on the degree to which such data abide by the assumptions of the GLM framework, and on the methods that have been developed to correct for any violation of those assumptions. Rather than biasing estimates of effect size, the major consequence of non-conformity to the assumptions is to introduce bias into estimates of the variance, thus affecting test statistics, power, and false positive rates. Furthermore, this bias can have pervasive effects on both individual subject and group-level statistics, potentially yielding qualitatively different results across replications, especially after the thresholding procedures commonly used for inference-making.
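
    A bare-bones version of the voxel-wise GLM under review looks like the sketch below. It uses ordinary least squares and therefore assumes i.i.d. noise, precisely the kind of assumption (temporal autocorrelation, drift modelling) whose violations the paper discusses; the design, data, and contrast are synthetic.

```python
# Bare-bones GLM fit for a single voxel's time-series. OLS as below assumes
# i.i.d. noise; fMRI noise is temporally autocorrelated, and the
# prewhitening/precoloring corrections discussed in the review are omitted.
import numpy as np

rng = np.random.default_rng(0)
n_scans = 200
task = (np.arange(n_scans) % 40 < 20).astype(float)       # toy block design
drift = np.linspace(-1, 1, n_scans)                        # slow scanner drift
X = np.column_stack([task, drift, np.ones(n_scans)])       # design matrix
y = 0.8 * task + 0.3 * drift + rng.normal(size=n_scans)    # synthetic voxel

beta, res_ss, *_ = np.linalg.lstsq(X, y, rcond=None)
dof = n_scans - np.linalg.matrix_rank(X)
sigma2 = res_ss[0] / dof                                   # residual variance
c = np.array([1.0, 0.0, 0.0])                              # contrast: task effect
t_stat = (c @ beta) / np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
```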