Accelerating Permutation Testing in Voxel-wise Analysis through Subspace Tracking: A new plugin for SnPM
Permutation testing is a non-parametric method for obtaining the max null
distribution used to compute corrected p-values that provide strong control
of false positives. In neuroimaging, however, the computational burden of
running such an algorithm can be significant. We find that by viewing the
permutation testing procedure as the construction of a very large
permutation testing matrix, T, one can exploit structural properties derived
from the data and the test statistics to reduce the runtime under certain
conditions. In particular, we see that T is low-rank plus a low-variance
residual. This makes T a good candidate for low-rank matrix completion,
where only a very small number of entries of T (a small fraction of all
entries in our experiments) have to be computed to obtain a good estimate.
Based on this observation, we present RapidPT, an algorithm that efficiently
recovers the max null distribution commonly obtained through regular
permutation testing in voxel-wise analysis. We present an extensive
validation on a synthetic dataset and four datasets of varying sizes against
two baselines: Statistical NonParametric Mapping (SnPM13) and a standard
permutation testing implementation (referred to as NaivePT). We find that
RapidPT achieves its best runtime performance on medium-sized datasets, with
speedups of 1.5x-38x (vs. SnPM13) and 20x-1000x (vs. NaivePT). For larger
datasets, RapidPT outperforms NaivePT (6x-200x) on all datasets, and
provides large speedups (2x-15x) over SnPM13 when more than 10000
permutations are needed. The implementation is a standalone toolbox and is
also integrated within SnPM13, able to leverage multi-core architectures
when available.
Comment: 36 pages, 16 figures
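To make the completion idea concrete, here is a minimal numerical sketch in
Python of recovering a max null distribution from a partially observed
permutation matrix. It is not the RapidPT implementation; all sizes and
names (n_perm, n_train, n_sample, the rank) are illustrative assumptions.

    # Minimal sketch of the idea above, not the RapidPT algorithm: the
    # permutation testing matrix T (permutations x voxels) is close to
    # low-rank, so most of its entries can be recovered from a small sample.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-in for T: low rank plus a low-variance residual.
    n_perm, n_vox, rank = 500, 2000, 20
    T = rng.normal(size=(n_perm, rank)) @ rng.normal(size=(rank, n_vox))
    T += 0.01 * rng.normal(size=T.shape)

    # Fully compute a few training permutations to learn a column basis.
    n_train = 50
    basis, _, _ = np.linalg.svd(T[:n_train].T, full_matrices=False)
    basis = basis[:, :rank]                         # n_vox x rank

    # For each remaining permutation, compute only a small random sample of
    # entries and solve least squares for its low-rank coefficients.
    n_sample = 5 * rank
    T_hat = np.empty_like(T)
    T_hat[:n_train] = T[:n_train]
    for i in range(n_train, n_perm):
        idx = rng.choice(n_vox, size=n_sample, replace=False)
        coef, *_ = np.linalg.lstsq(basis[idx], T[i, idx], rcond=None)
        T_hat[i] = basis @ coef

    # The max null distribution is the row-wise maximum of the recovery.
    max_null = T_hat.max(axis=1)
    print("corrected threshold (alpha = 0.05):", np.quantile(max_null, 0.95))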
Bayesian, and Non-Bayesian, Cause-Specific Competing-Risk Analysis for Parametric and Nonparametric Survival Functions: The R Package CFC
The R package CFC performs cause-specific, competing-risk survival analysis by computing cumulative incidence functions from unadjusted, cause-specific survival functions. A high-level API in CFC enables end-to-end survival and competing-risk analysis with a single-line function call, based on the parametric survival regression models in the survival package. A low-level API gives users more flexibility by letting them supply their own survival functions, perhaps in a Bayesian setting. Utility methods for summarizing and plotting the output allow population-average cumulative incidence functions to be calculated, visualized, and compared to unadjusted survival curves. Numerical and computational optimization strategies are employed for efficient and reliable computation of the coupled integrals involved. To address potential integrable singularities caused by infinite cause-specific hazards, particularly near a time-from-index of zero, the integrals are transformed to remove their dependency on hazard functions, making them solely functions of cause-specific, unadjusted survival functions. This implicit variable transformation also makes CFC easier to extend to custom survival models, since it requires users to implement at most one function per cause. The transformed integrals are numerically calculated using a generalization of Simpson's rule to handle the implicit change of variable from time to survival, while a generalized trapezoidal rule is used as a reference for error calculation. An OpenMP-parallelized, efficient C++ implementation - using the packages Rcpp and RcppArmadillo - makes the application of CFC practical in Bayesian settings, where a potentially large number of samples represent the posterior distribution of cause-specific survival functions.
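For cause k with unadjusted survival functions S_j, the transformation
described above rewrites the cumulative incidence function as
F_k(t) = -\int_0^t \prod_{j != k} S_j(u) \, dS_k(u), which involves no
hazards. Below is a minimal Python sketch of that transformed integral (not
the CFC R API; the uniform quadrature grid is an illustrative choice).

    # Minimal sketch, not the CFC R API: cumulative incidence for cause k,
    #   F_k(t) = -int_0^t prod_{j != k} S_j(u) dS_k(u),
    # i.e., integrating against increments of S_k removes the hazard from
    # the integrand (the change of variable from time to survival).
    import numpy as np

    def cumulative_incidence(surv_fns, k, t, n_grid=1000):
        """surv_fns: vectorized callables S_j(u); k: index of the cause."""
        u = np.linspace(0.0, t, n_grid + 1)
        S = np.array([f(u) for f in surv_fns])      # causes x grid points
        others = np.prod(np.delete(S, k, axis=0), axis=0)
        dSk = -np.diff(S[k])                        # increments of 1 - S_k
        mid = 0.5 * (others[:-1] + others[1:])      # trapezoid-style rule
        return float(np.sum(mid * dSk))

    # Example: two exponential causes with hazards 0.5 and 1.0.
    S1 = lambda u: np.exp(-0.5 * u)
    S2 = lambda u: np.exp(-1.0 * u)
    print(cumulative_incidence([S1, S2], k=0, t=2.0))
    # Closed form for comparison: (0.5 / 1.5) * (1 - exp(-1.5 * 2)) ~= 0.3168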
Massively parallel approximate Gaussian process regression
We explore how the big-three computing paradigms -- symmetric multi-processor
(SMP), graphical processing units (GPUs), and cluster computing -- can
together be brought to bear on large-data Gaussian process (GP) regression
problems via a careful implementation of a newly developed local
approximation scheme. Our methodological contribution focuses primarily on
GPU computation, as this requires the most care and also provides the
largest performance boost. However, in our empirical work we study the
relative merits of all three paradigms to determine how best to combine
them. The paper concludes with two case studies. One is a real-data
fluid-dynamics computer experiment which benefits from the local nature of
our approximation; the second is a synthetic-data example designed to find
the largest design for which (accurate) GP emulation can be performed on a
commensurate predictive set in under an hour.
Comment: 24 pages, 6 figures, 1 table
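The local approximation scheme can be pictured as fitting, for each
predictive location, a small GP on only its nearest design points. Here is a
minimal serial Python sketch (not the authors' implementation; the
neighborhood size and kernel hyperparameters are illustrative assumptions).
Because each prediction is independent, the loop below is exactly the kind
of work that can be divided across SMP threads, GPUs, and cluster nodes.

    # Minimal serial sketch of local approximate GP prediction: fit a small
    # GP on the n_local nearest design points for each predictive location.
    import numpy as np

    def local_gp_predict(X, y, Xstar, n_local=50, lengthscale=0.3,
                         nugget=1e-6):
        def kern(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * lengthscale ** 2))

        preds = np.empty(len(Xstar))
        for i, x in enumerate(Xstar):
            # Nearest-neighbor sub-design: the "local" part of the scheme.
            idx = np.argsort(((X - x) ** 2).sum(1))[:n_local]
            Xl, yl = X[idx], y[idx]
            K = kern(Xl, Xl) + nugget * np.eye(len(Xl))
            kstar = kern(Xl, x[None, :])[:, 0]
            preds[i] = kstar @ np.linalg.solve(K, yl)
        return preds

    # Example on a toy 2-d surface with a large design.
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(5000, 2))
    y = np.sin(5 * X[:, 0]) * np.cos(3 * X[:, 1])
    Xstar = rng.uniform(size=(10, 2))
    print(local_gp_predict(X, y, Xstar))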
Ancestral Causal Inference
Constraint-based causal discovery from limited data is a notoriously
difficult challenge due to the many borderline independence test decisions.
Several approaches to improve the reliability of the predictions by exploiting
redundancy in the independence information have been proposed recently. Though
promising, existing approaches can still be greatly improved in terms of
accuracy and scalability. We present a novel method that reduces the
combinatorial explosion of the search space by using a more coarse-grained
representation of causal information, drastically reducing computation time.
Additionally, we propose a method to score causal predictions based on their
confidence. Crucially, our implementation also allows one to easily combine
observational and interventional data and to incorporate various types of
available background knowledge. We prove soundness and asymptotic consistency
of our method and demonstrate that it can outperform the state-of-the-art on
synthetic data, achieving a speedup of several orders of magnitude. We
illustrate its practical feasibility by applying it to a challenging protein
data set.
Comment: In Proceedings of Advances in Neural Information Processing Systems
29 (NIPS 2016)
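The paper scores causal predictions with a loss over weighted independence
constraints; as a simplified stand-in, the Python sketch below scores a
single constraint-based prediction (a collider X -> Y <- Z) by the fraction
of bootstrap replicates whose independence test decisions support it. The
test choice and thresholds are illustrative assumptions, not the paper's.

    # Minimal sketch, not the paper's scoring method: confidence for one
    # causal claim via bootstrapped independence test decisions.
    import numpy as np
    from scipy import stats

    def indep(a, b, alpha=0.01):
        """Marginal independence decision via a correlation test."""
        return stats.pearsonr(a, b)[1] > alpha

    def dep_given(a, b, c, alpha=0.01):
        """Dependence of a and b given c, via linear residuals."""
        ra = a - np.polyval(np.polyfit(c, a, 1), c)
        rb = b - np.polyval(np.polyfit(c, b, 1), c)
        return stats.pearsonr(ra, rb)[1] < alpha

    # Ground truth for the synthetic data: X -> Y <- Z (collider at Y).
    rng = np.random.default_rng(2)
    n = 500
    X = rng.normal(size=n)
    Z = rng.normal(size=n)
    Y = X + Z + 0.5 * rng.normal(size=n)

    hits, B = 0, 200
    for _ in range(B):
        idx = rng.integers(0, n, size=n)
        x, y, z = X[idx], Y[idx], Z[idx]
        # Collider pattern: X indep Z marginally, dependent given Y.
        if indep(x, z) and dep_given(x, z, y):
            hits += 1
    print(f"confidence in the collider X -> Y <- Z: {hits / B:.2f}")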