Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM
Permutation testing is a non-parametric method for obtaining the max null
distribution used to compute corrected p-values that provide strong control
of false positives. In neuroimaging, however, the computational burden of
running such an algorithm can be significant. We find that by viewing the
permutation testing procedure as the construction of a very large permutation
testing matrix, one can exploit structural properties derived from the
data and the test statistics to reduce the runtime under certain conditions. In
particular, we see that this matrix is low-rank plus a low-variance residual.
This makes it a good candidate for low-rank matrix completion, where only a very
small number of its entries (a small fraction of all entries in our experiments)
have to be computed to obtain a good estimate. Based on this observation, we
present RapidPT, an algorithm that efficiently recovers the max null
distribution commonly obtained through regular permutation testing in
voxel-wise analysis. We present an extensive validation on a synthetic dataset
and four datasets of varying size against two baselines: Statistical
NonParametric Mapping (SnPM13) and a standard permutation testing
implementation (referred to as NaivePT). We find that RapidPT achieves its best
runtime performance on medium sized datasets, with
speedups of 1.5x - 38x (vs. SnPM13) and 20x - 1000x (vs. NaivePT). For larger
datasets, RapidPT outperforms NaivePT (6x - 200x) on all
datasets, and provides large speedups over SnPM13 (2x - 15x) when more than
10000 permutations are needed. The implementation is available as a standalone
toolbox and is also integrated within SnPM13; it can leverage multi-core
architectures when available.
Comment: 36 pages, 16 figures
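To make the "max null distribution" in this abstract concrete, the following is a minimal sketch of plain permutation testing with max-statistic correction. It is not RapidPT or SnPM13 code and does no matrix completion. The function names, the absolute mean-difference statistic, and the toy data are all assumptions made for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def voxel_stats(data, labels):
    """Absolute difference in group means at every voxel.

    data: list of subjects, each a list of voxel values.
    labels: list of 0/1 group labels, one per subject.
    """
    stats = []
    for v in range(len(data[0])):
        g0 = [row[v] for row, lab in zip(data, labels) if lab == 0]
        g1 = [row[v] for row, lab in zip(data, labels) if lab == 1]
        stats.append(abs(mean(g1) - mean(g0)))
    return stats


def max_null_distribution(data, labels, n_perm, seed=0):
    """For each random relabelling, keep only the largest voxel statistic."""
    rng = random.Random(seed)
    shuffled = list(labels)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_null.append(max(voxel_stats(data, shuffled)))
    return max_null


def corrected_p_values(observed, max_null):
    """Share of permutations whose maximum reaches each observed statistic."""
    n = len(max_null)
    return [(1 + sum(m >= t for m in max_null)) / (n + 1) for t in observed]


# Toy data (hypothetical): 12 subjects, 5 voxels, a true effect at voxel 0 only.
rng = random.Random(1)
labels = [0] * 6 + [1] * 6
data = []
for lab in labels:
    row = [rng.gauss(0, 1) for _ in range(5)]
    if lab == 1:
        row[0] += 5.0
    data.append(row)

observed = voxel_stats(data, labels)
max_null = max_null_distribution(data, labels, n_perm=500)
p_corrected = corrected_p_values(observed, max_null)
print(p_corrected)
```

Every permutation recomputes the statistic at every voxel, so the cost grows with the number of permutations times the number of voxels. That cost is the burden the abstract refers to.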
A statistical method (cross-validation) for bone loss region detection after spaceflight.
Astronauts experience bone loss after long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Detecting such regions, however, remains an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to get t-maps of changes in images, and propose a new cross-validation method to select an optimum suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to derive significant changes.
A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology
The widespread availability of high-dimensional biological data has made the
simultaneous screening of numerous biological characteristics a central
statistical problem in computational biology. While the dimensionality of such
datasets continues to increase, the problem of teasing out the effects of
biomarkers in studies measuring baseline confounders while avoiding model
misspecification remains only partially addressed. Efficient estimators
constructed from data adaptive estimates of the data-generating distribution
provide an avenue for avoiding model misspecification; however, in the context
of high-dimensional problems requiring simultaneous estimation of numerous
parameters, standard variance estimators have proven unstable, resulting in
unreliable Type-I error control under standard multiple testing corrections. We
present the formulation of a general approach for applying empirical Bayes
shrinkage approaches to asymptotically linear estimators of parameters defined
in the nonparametric model. The proposal applies existing shrinkage estimators
to the estimated variance of the influence function, allowing for increased
inferential stability in high-dimensional settings. A methodology for
nonparametric variable importance analysis for use with high-dimensional
biological datasets with modest sample sizes is introduced and the proposed
technique is demonstrated to be robust in small samples even when relying on
data adaptive estimators that eschew parametric forms. Use of the proposed
variance moderation strategy in constructing stabilized variable importance
measures of biomarkers is demonstrated by application to an observational study
of occupational exposure. The result is a data adaptive approach for robustly
uncovering stable associations in high-dimensional data with limited sample
sizes.
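The variance moderation idea in this abstract can be illustrated with a minimal sketch. It assumes the common shrinkage form in which each raw variance is averaged with a prior variance, weighted by degrees of freedom. The function name, the use of the mean raw variance as the prior, the hand-picked `prior_df`, and the toy influence-function values are all assumptions; the paper's estimator may differ, and prior parameters are normally estimated from the data.

```python
from statistics import mean, variance


def moderated_variances(influence_values, prior_df):
    """Shrink each per-biomarker sample variance toward the average variance.

    influence_values: one list of estimated influence-function values per biomarker.
    prior_df: weight (in degrees of freedom) given to the prior variance (assumed).
    """
    raw = [variance(vals) for vals in influence_values]
    prior_var = mean(raw)  # assumption: pooled mean stands in for the prior variance
    moderated = []
    for vals, s2 in zip(influence_values, raw):
        df = len(vals) - 1
        moderated.append((prior_df * prior_var + df * s2) / (prior_df + df))
    return raw, moderated


# Hypothetical values for three biomarkers, five observations each.
values = [
    [-0.1, 0.1, 0.0, 0.05, -0.05],
    [-2.0, 2.0, -1.0, 1.0, 0.0],
    [-1.0, 1.0, 0.5, -0.5, 0.0],
]
raw, moderated = moderated_variances(values, prior_df=4)
print(raw)
print(moderated)
```

The very small first variance is pulled upward, which keeps the corresponding test statistic from being inflated by an unstable variance estimate.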
Parametric Alignment of Drosophila Genomes
The classic algorithms of Needleman--Wunsch and Smith--Waterman find a
maximum a posteriori probability alignment for a pair hidden Markov model
(PHMM). In order to process large genomes that have undergone complex genome
rearrangements, almost all existing whole genome alignment methods apply fast
heuristics to divide genomes into small pieces which are suitable for
Needleman--Wunsch alignment. In these alignment methods, it is standard
practice to fix the parameters and to produce a single alignment for subsequent
analysis by biologists.
Our main result is the construction of a whole genome parametric alignment of
Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment
resolves the issue of robustness to changes in parameters by finding all
optimal alignments for all possible parameters in a PHMM. Our alignment draws
on existing heuristics for dividing whole genomes into small pieces for
alignment, and it relies on advances we have made in computing convex polytopes
that allow us to parametrically align non-coding regions using biologically
realistic models. We demonstrate the utility of our parametric alignment for
biological inference by showing that cis-regulatory elements are more conserved
between Drosophila melanogaster and Drosophila pseudoobscura than previously
thought. We also show how whole genome parametric alignment can be used to
quantitatively assess the dependence of branch length estimates on alignment
parameters.
The alignment polytopes, software, and supplementary material can be
downloaded at http://bio.math.berkeley.edu/parametric/.
Comment: 19 pages, 3 figures
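The sensitivity to parameters that motivates parametric alignment can be seen in a minimal sketch of Needleman-Wunsch global alignment with a linear gap penalty. This is the textbook dynamic program, not the authors' polytope software. The function name, the scoring values, and the two short sequences are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match, mismatch, gap):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Same sequences, two parameter settings (match, mismatch, gap).
cheap_mismatch = needleman_wunsch("ACGT", "AGCT", 1, -1, -2)
cheap_gap = needleman_wunsch("ACGT", "AGCT", 1, -3, -1)
print(cheap_mismatch)
print(cheap_gap)
```

With cheap mismatches the optimal alignment is ungapped; with cheap gaps it opens gaps instead. A single fixed parameter choice hides this dependence, which is the robustness issue the abstract describes.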
Altered Neurocircuitry in the Dopamine Transporter Knockout Mouse Brain
The plasma membrane transporters for the monoamine neurotransmitters dopamine, serotonin, and norepinephrine modulate the dynamics of these monoamine neurotransmitters. Thus, activity of these transporters has significant consequences for monoamine activity throughout the brain and for a number of neurological and psychiatric disorders. Gene knockout (KO) mice that reduce or eliminate expression of each of these monoamine transporters have provided a wealth of new information about the function of these proteins at molecular, physiological and behavioral levels. In the present work we use the unique properties of magnetic resonance imaging (MRI) to probe the effects of altered dopaminergic dynamics on meso-scale neuronal circuitry and overall brain morphology, since changes at these levels of organization might help to account for some of the extensive pharmacological and behavioral differences observed in dopamine transporter (DAT) KO mice. Despite the smaller size of these animals, voxel-wise statistical comparison of high resolution structural MR images indicated little morphological change as a consequence of DAT KO. Likewise, proton magnetic resonance spectra recorded in the striatum indicated no significant changes in detectable metabolite concentrations between DAT KO and wild-type (WT) mice. In contrast, alterations in the circuitry from the prefrontal cortex to the mesocortical limbic system, an important brain component intimately tied to function of mesolimbic/mesocortical dopamine reward pathways, were revealed by manganese-enhanced MRI (MEMRI). Analysis of co-registered MEMRI images taken over the 26 hours after introduction of Mn^(2+) into the prefrontal cortex indicated that DAT KO mice have a truncated Mn^(2+) distribution within this circuitry with little accumulation beyond the thalamus or contralateral to the injection site. 
By contrast, WT littermates exhibit Mn^(2+) transport into more posterior midbrain nuclei and contralateral mesolimbic structures at 26 hr post-injection. Thus, DAT KO mice appear, at this level of anatomic resolution, to have preserved cortico-striatal-thalamic connectivity but diminished robustness of reward-modulating circuitry distal to the thalamus. This is in contradistinction to the state of this circuitry in serotonin transporter KO mice where we observed more robust connectivity in more posterior brain regions using methods identical to those employed here.
GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data
Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. 
The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
Department of Agriculture, Food and the Marine; European Commission - Seventh Framework Programme (FP7); Science Foundation Ireland; University College Dublin
Statistical Analysis of fMRI Time-Series: A Critical Review of the GLM Approach.
Functional magnetic resonance imaging (fMRI) is one of the most widely used tools to study the neural underpinnings of human cognition. Standard analysis of fMRI data relies on a general linear model (GLM) approach to separate stimulus induced signals from noise. Crucially, this approach relies on a number of assumptions about the data which, for inferences to be valid, must be met. The current paper reviews the GLM approach to analysis of fMRI time-series, focusing in particular on the degree to which such data abides by the assumptions of the GLM framework, and on the methods that have been developed to correct for any violation of those assumptions. Rather than biasing estimates of effect size, the major consequence of non-conformity to the assumptions is to introduce bias into estimates of the variance, thus affecting test statistics, power, and false positive rates. Furthermore, this bias can have pervasive effects on both individual subject and group-level statistics, potentially yielding qualitatively different results across replications, especially after the thresholding procedures commonly used for inference-making.
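As a minimal sketch of the GLM step this abstract reviews, the code below fits one voxel's time series to an intercept plus a single boxcar regressor by ordinary least squares and forms the t-statistic. The function name, the boxcar design, and the hand-made signal and noise are assumptions for illustration. The standard error formula assumes independent, equal-variance errors, which is the assumption the review examines.

```python
import math


def fit_glm(y, x):
    """OLS fit of y = b0 + b1 * x + error; returns (b0, b1, t-statistic of b1)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    # Standard error of b1, valid only under independent, equal-variance errors.
    se_b1 = math.sqrt(sigma2 / sxx)
    return b0, b1, b1 / se_b1


# Hypothetical voxel: baseline 100, activation of 2 units during "on" blocks.
boxcar = [0, 0, 0, 0, 1, 1, 1, 1] * 3
noise = [0.5 if t % 2 == 0 else -0.5 for t in range(len(boxcar))]
signal = [100 + 2 * x + e for x, e in zip(boxcar, noise)]

b0, b1, t_stat = fit_glm(signal, boxcar)
print(b0, b1, t_stat)
```

The effect estimate `b1` depends only on the block means, while the t-statistic also depends on the variance estimate. This matches the abstract's point that violated assumptions mainly distort the variance and the test statistics rather than the effect size.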