Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM
Permutation testing is a non-parametric method for obtaining the max null
distribution used to compute corrected p-values that provide strong control
of false positives. In neuroimaging, however, the computational burden of
running such an algorithm can be significant. We find that by viewing the
permutation testing procedure as the construction of a very large
permutation testing matrix, T, one can exploit structural properties derived
from the data and the test statistics to reduce the runtime under certain
conditions. In particular, we see that T is low-rank plus a low-variance
residual. This makes T a good candidate for low-rank matrix completion,
where only a very small number of entries of T (a small fraction of all
entries in our experiments) have to be computed to obtain a good estimate.
Based on this observation, we present RapidPT, an algorithm that efficiently
recovers the max null distribution commonly obtained through regular
permutation testing in voxel-wise analysis. We present an extensive
validation on a synthetic dataset and four datasets of varying sizes against
two baselines: Statistical NonParametric Mapping (SnPM13) and a standard
permutation testing implementation (referred to as NaivePT). We find that
RapidPT achieves its best runtime performance on medium-sized datasets, with
speedups of 1.5x-38x (vs. SnPM13) and 20x-1000x (vs. NaivePT). For larger
datasets, RapidPT outperforms NaivePT (6x-200x) on all datasets, and
provides large speedups (2x-15x) over SnPM13 when more than 10000
permutations are needed. The implementation is a standalone toolbox and is
also integrated within SnPM13, able to leverage multi-core architectures
when available.
Comment: 36 pages, 16 figures
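To make the completion idea concrete, here is a minimal numerical sketch in
Python of recovering a max null distribution from a partially observed
permutation matrix. It is not the RapidPT implementation; all sizes and
names (n_perm, n_train, n_sample, the rank) are illustrative assumptions.

    # Minimal sketch of the idea above, not the RapidPT algorithm: the
    # permutation testing matrix T (permutations x voxels) is close to
    # low-rank, so most of its entries can be recovered from a small sample.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for T: low rank plus a low-variance residual.
    n_perm, n_vox, rank = 500, 2000, 20
    T = rng.normal(size=(n_perm, rank)) @ rng.normal(size=(rank, n_vox))
    T += 0.01 * rng.normal(size=T.shape)

    # Fully compute a few training permutations to learn a column basis.
    n_train = 50
    basis, _, _ = np.linalg.svd(T[:n_train].T, full_matrices=False)
    basis = basis[:, :rank]                         # n_vox x rank

    # For each remaining permutation, compute only a small random sample of
    # entries and solve least squares for its low-rank coefficients.
    n_sample = 5 * rank
    T_hat = np.empty_like(T)
    T_hat[:n_train] = T[:n_train]
    for i in range(n_train, n_perm):
        idx = rng.choice(n_vox, size=n_sample, replace=False)
        coef, *_ = np.linalg.lstsq(basis[idx], T[i, idx], rcond=None)
        T_hat[i] = basis @ coef

    # The max null distribution is the row-wise maximum of the recovery.
    max_null = T_hat.max(axis=1)
    print("corrected threshold (alpha = 0.05):", np.quantile(max_null, 0.95))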
Bayesian, and Non-Bayesian, Cause-Specific Competing-Risk Analysis for Parametric and Nonparametric Survival Functions: The R Package CFC
The R package CFC performs cause-specific, competing-risk survival analysis by computing cumulative incidence functions from unadjusted, cause-specific survival functions. A high-level API in CFC enables end-to-end survival and competing-risk analysis with a single-line function call, based on the parametric survival regression models in the survival package. A low-level API gives users more flexibility by letting them supply their own survival functions, perhaps in a Bayesian setting. Utility methods for summarizing and plotting the output allow population-average cumulative incidence functions to be calculated, visualized, and compared to unadjusted survival curves. Numerical and computational optimization strategies are employed for efficient and reliable computation of the coupled integrals involved. To address potential integrable singularities caused by infinite cause-specific hazards, particularly near a time-from-index of zero, the integrals are transformed to remove their dependency on hazard functions, making them solely functions of cause-specific, unadjusted survival functions. This implicit variable transformation also makes CFC easier to extend to custom survival models, since it requires users to implement at most one function per cause. The transformed integrals are numerically calculated using a generalization of Simpson's rule to handle the implicit change of variable from time to survival, while a generalized trapezoidal rule is used as a reference for error calculation. An OpenMP-parallelized, efficient C++ implementation - using the packages Rcpp and RcppArmadillo - makes the application of CFC practical in Bayesian settings, where a potentially large number of samples represent the posterior distribution of cause-specific survival functions.
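For cause k with unadjusted survival functions S_j, the transformation
described above rewrites the cumulative incidence function as
F_k(t) = -\int_0^t \prod_{j != k} S_j(u) \, dS_k(u), which involves no
hazards. Below is a minimal Python sketch of that transformed integral (not
the CFC R API; the uniform quadrature grid is an illustrative choice).

    # Minimal sketch, not the CFC R API: cumulative incidence for cause k,
    #   F_k(t) = -int_0^t prod_{j != k} S_j(u) dS_k(u),
    # i.e., integrating against increments of S_k removes the hazard from
    # the integrand (the change of variable from time to survival).
    import numpy as np

    def cumulative_incidence(surv_fns, k, t, n_grid=1000):
        """surv_fns: vectorized callables S_j(u); k: index of the cause."""
        u = np.linspace(0.0, t, n_grid + 1)
        S = np.array([f(u) for f in surv_fns])      # causes x grid points
        others = np.prod(np.delete(S, k, axis=0), axis=0)
        dSk = -np.diff(S[k])                        # increments of 1 - S_k
        mid = 0.5 * (others[:-1] + others[1:])      # trapezoid-style rule
        return float(np.sum(mid * dSk))

    # Example: two exponential causes with hazards 0.5 and 1.0.
    S1 = lambda u: np.exp(-0.5 * u)
    S2 = lambda u: np.exp(-1.0 * u)
    print(cumulative_incidence([S1, S2], k=0, t=2.0))
    # Closed form for comparison: (0.5 / 1.5) * (1 - exp(-1.5 * 2)) ~= 0.3168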
Massively parallel approximate Gaussian process regression
We explore how the big-three computing paradigms -- symmetric multi-processor
(SMP), graphical processing units (GPUs), and cluster computing -- can
together be brought to bear on large-data Gaussian process (GP) regression
problems via a careful implementation of a newly developed local
approximation scheme. Our methodological contribution focuses primarily on
GPU computation, as this requires the most care and also provides the
largest performance boost. However, in our empirical work we study the
relative merits of all three paradigms to determine how best to combine
them. The paper concludes with two case studies. One is a real-data
fluid-dynamics computer experiment which benefits from the local nature of
our approximation; the second is a synthetic-data example designed to find
the largest design for which (accurate) GP emulation can be performed on a
commensurate predictive set in under an hour.
Comment: 24 pages, 6 figures, 1 table
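The local approximation scheme can be pictured as fitting, for each
predictive location, a small GP on only its nearest design points. Here is a
minimal serial Python sketch (not the authors' implementation; the
neighborhood size and kernel hyperparameters are illustrative assumptions).
Because each prediction is independent, the loop below is exactly the kind
of work that can be divided across SMP threads, GPUs, and cluster nodes.

    # Minimal serial sketch of local approximate GP prediction: fit a small
    # GP on the n_local nearest design points for each predictive location.
    import numpy as np

    def local_gp_predict(X, y, Xstar, n_local=50, lengthscale=0.3,
                         nugget=1e-6):
        def kern(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * lengthscale ** 2))

        preds = np.empty(len(Xstar))
        for i, x in enumerate(Xstar):
            # Nearest-neighbor sub-design: the "local" part of the scheme.
            idx = np.argsort(((X - x) ** 2).sum(1))[:n_local]
            Xl, yl = X[idx], y[idx]
            K = kern(Xl, Xl) + nugget * np.eye(len(Xl))
            kstar = kern(Xl, x[None, :])[:, 0]
            preds[i] = kstar @ np.linalg.solve(K, yl)
        return preds

    # Example on a toy 2-d surface with a large design.
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(5000, 2))
    y = np.sin(5 * X[:, 0]) * np.cos(3 * X[:, 1])
    Xstar = rng.uniform(size=(10, 2))
    print(local_gp_predict(X, y, Xstar))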
Ancestral Causal Inference
Constraint-based causal discovery from limited data is a notoriously
difficult challenge due to the many borderline independence test decisions.
Several approaches to improve the reliability of the predictions by exploiting
redundancy in the independence information have been proposed recently. Though
promising, existing approaches can still be greatly improved in terms of
accuracy and scalability. We present a novel method that reduces the
combinatorial explosion of the search space by using a more coarse-grained
representation of causal information, drastically reducing computation time.
Additionally, we propose a method to score causal predictions based on their
confidence. Crucially, our implementation also allows one to easily combine
observational and interventional data and to incorporate various types of
available background knowledge. We prove soundness and asymptotic consistency
of our method and demonstrate that it can outperform the state-of-the-art on
synthetic data, achieving a speedup of several orders of magnitude. We
illustrate its practical feasibility by applying it to a challenging protein
data set.
Comment: In Proceedings of Advances in Neural Information Processing Systems
29 (NIPS 2016)
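The paper scores causal predictions with a loss over weighted independence
constraints; as a simplified stand-in, the Python sketch below scores a
single constraint-based prediction (a collider X -> Y <- Z) by the fraction
of bootstrap replicates whose independence test decisions support it. The
test choice and thresholds are illustrative assumptions, not the paper's.

    # Minimal sketch, not the paper's scoring method: confidence for one
    # causal claim via bootstrapped independence test decisions.
    import numpy as np
    from scipy import stats

    def indep(a, b, alpha=0.01):
        """Marginal independence decision via a correlation test."""
        return stats.pearsonr(a, b)[1] > alpha

    def dep_given(a, b, c, alpha=0.01):
        """Dependence of a and b given c, via linear residuals."""
        ra = a - np.polyval(np.polyfit(c, a, 1), c)
        rb = b - np.polyval(np.polyfit(c, b, 1), c)
        return stats.pearsonr(ra, rb)[1] < alpha

    # Ground truth for the synthetic data: X -> Y <- Z (collider at Y).
    rng = np.random.default_rng(2)
    n = 500
    X = rng.normal(size=n)
    Z = rng.normal(size=n)
    Y = X + Z + 0.5 * rng.normal(size=n)

    hits, B = 0, 200
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        x, y, z = X[idx], Y[idx], Z[idx]
        # Collider pattern: X indep Z marginally, dependent given Y.
        if indep(x, z) and dep_given(x, z, y):
            hits += 1
    print(f"confidence in the collider X -> Y <- Z: {hits / B:.2f}")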