A Multiscale Approach for Statistical Characterization of Functional Images
Increasingly, scientific studies yield functional image data, in which the observed data consist of sets of curves recorded on the pixels of the image. Examples include temporal brain response intensities measured by fMRI and NMR frequency spectra measured at each pixel. This article presents a new methodology for improving the characterization of pixels in functional imaging, formulated as a spatial curve clustering problem. Our method operates on curves as a unit. It is nonparametric and involves multiple stages: (i) wavelet thresholding, aggregation, and Neyman truncation to effectively reduce dimensionality; (ii) clustering based on an extended EM algorithm; and (iii) multiscale penalized dyadic partitioning to create a spatial segmentation. We motivate the different stages with theoretical considerations and arguments, and illustrate the overall procedure on simulated and real datasets. Our method appears to offer substantial improvements over monoscale pixel-wise methods. An appendix giving some theoretical justification of the methodology, together with computer code, documentation, and the dataset, is available in the online supplements.
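The wavelet thresholding step in stage (i) can be illustrated with a minimal sketch. It assumes an orthonormal Haar wavelet, a hard threshold, and a curve whose length is a power of two. The function names and the threshold value are illustrative assumptions, not the authors' code, and the aggregation and Neyman truncation steps are not shown.

```python
import math


def haar_forward(signal):
    """Full orthonormal Haar transform; len(signal) must be a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        # Pairwise scaled sums (approximation) and differences (detail).
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = approx + detail
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the curve from coarse to fine."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged.append((a + d) / math.sqrt(2))
            merged.append((a - d) / math.sqrt(2))
        out[:2 * n] = merged
        n *= 2
    return out


def hard_threshold(coeffs, threshold):
    """Zero every detail coefficient below the threshold in absolute value.

    The first entry (the coarsest approximation) is always kept.
    """
    return [coeffs[0]] + [c if abs(c) >= threshold else 0.0 for c in coeffs[1:]]


# Hypothetical pixel curve: a step function needs only two Haar coefficients.
curve = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
kept = hard_threshold(haar_forward(curve), threshold=0.5)
reconstructed = haar_inverse(kept)
```

In a sketch like this, the dimension reduction comes from storing only the nonzero entries of `kept` for each pixel rather than the full curve.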
Multiple Testing for Neuroimaging via Hidden Markov Random Field
Traditional voxel-level multiple testing procedures in neuroimaging, mostly
p-value based, often ignore the spatial correlations among neighboring voxels
and thus suffer from substantial loss of power. We extend the
local-significance-index based procedure originally developed for the hidden
Markov chain models, which aims to minimize the false nondiscovery rate subject
to a constraint on the false discovery rate, to three-dimensional neuroimaging
data using a hidden Markov random field model. A generalized
expectation-maximization algorithm for maximizing the penalized likelihood is
proposed for estimating the model parameters. Extensive simulations show that
the proposed approach is more powerful than conventional false discovery rate
procedures. We apply the method to the comparison between mild cognitive
impairment, a disease status with increased risk of developing Alzheimer's or
another dementia, and normal controls in the FDG-PET imaging study of the
Alzheimer's Disease Neuroimaging Initiative.
Comment: A MATLAB package implementing the proposed FDR procedure is available
with this paper at the Biometrics website on Wiley Online Library.
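For context, the kind of conventional p-value based false discovery rate procedure that the abstract compares against can be sketched as follows. This assumes the Benjamini-Hochberg step-up rule as the baseline, which is an assumption on our part; it is not the local-significance-index procedure proposed in the paper, and it treats voxels as exchangeable, ignoring spatial correlation.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Implements the Benjamini-Hochberg step-up rule at nominal FDR level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values, even those above their own cutoff.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical voxel-level p-values.
voxel_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(voxel_p, alpha=0.05)
```

Because each voxel is judged only by its own p-value and rank, a weak signal surrounded by strongly active neighbors gets no extra support, which is the loss of power the abstract refers to.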
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much of recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity
however has received relatively less attention, especially in the setting when
the sample size is fixed, and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche but only the last regime applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
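The sample-starved regime described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the variables are independent standard Gaussians, so every true correlation is zero, and the sample size, dimensions, and seed are arbitrary illustrative choices. It is not the framework developed in the paper.

```python
import math
import random


def sample_correlation(x, y):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(variables, p):
    """Largest absolute pairwise correlation among the first p variables."""
    best = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            best = max(best, abs(sample_correlation(variables[i], variables[j])))
    return best


def simulate(n=5, dims=(10, 40, 160), seed=0):
    """Fix n samples, grow the dimension p, and track the largest spurious correlation."""
    rng = random.Random(seed)
    p_max = max(dims)
    # Independent variables: any nonzero sample correlation is pure noise.
    variables = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p_max)]
    return {p: max_spurious_correlation(variables, p) for p in dims}


result = simulate()
```

Since the smaller variable sets are nested inside the larger ones, the reported maximum can only grow with the dimension. With the sample size held fixed, a naive threshold on the sample correlation therefore flags more and more false discoveries as variables are added.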