Tensor Regression with Applications in Neuroimaging Data Analysis
Classical regression methods treat covariates as a vector and estimate a
corresponding vector of regression coefficients. Modern applications in medical
imaging generate covariates of more complex form such as multidimensional
arrays (tensors). Traditional statistical and computational methods are proving
insufficient for analysis of these high-throughput data due to their ultrahigh
dimensionality as well as complex structure. In this article, we propose a new
family of tensor regression models that efficiently exploit the special
structure of tensor covariates. Under this framework, ultrahigh dimensionality
is reduced to a manageable level, resulting in efficient estimation and
prediction. A fast and highly scalable estimation algorithm is proposed for
maximum likelihood estimation and its associated asymptotic properties are
studied. Effectiveness of the new methods is demonstrated on both synthetic and
real MRI imaging data.
Comment: 27 pages, 4 figures
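To make the dimension reduction concrete, here is a minimal sketch that assumes the coefficient tensor has a low-rank CP (outer-product) structure. The abstract does not name the decomposition, so that choice, the image size, and the function names are illustrative assumptions only. It compares coefficient counts and shows that a rank-1 inner product can be evaluated without forming the full coefficient array.

```python
from math import prod

def full_param_count(dims):
    """Coefficients needed if every tensor entry gets its own weight."""
    return prod(dims)

def cp_param_count(dims, rank):
    """Factor entries in a rank-`rank` CP coefficient tensor
    (ignoring the scaling redundancy between factors)."""
    return rank * sum(dims)

def rank1_inner(u, X, v):
    """<u v^T, X> computed as u^T X v, without building the matrix u v^T."""
    return sum(u[i] * X[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

dims = (256, 256, 256)           # hypothetical 3D image size
print(full_param_count(dims))    # 16777216
print(cp_param_count(dims, 3))   # 2304
```

Under this assumed structure the number of unknowns grows with the sum of the mode sizes rather than their product.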
Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On one hand, Big Data hold great promises for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottleneck, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require new computational and statistical paradigms. This
article gives an overview of the salient features of Big Data and how these
features drive paradigm change in statistical and computational methods as
well as computing architectures. We also provide various new perspectives on
Big Data analysis and computation. In particular, we emphasize the
viability of the sparsest solution in a high-confidence set and point out that
the exogeneity assumptions in most statistical methods for Big Data cannot be
validated due to incidental endogeneity. Violating them can lead to wrong
statistical inferences and consequently wrong scientific conclusions.
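Spurious correlation, one of the challenges listed above, can be illustrated with a small simulation. This is a sketch under assumed settings (50 samples, up to 1000 pure-noise predictors, a fixed seed), not an experiment from the article: the response is independent of every predictor, yet the largest absolute sample correlation can only grow as more predictors are scanned.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def max_abs_corr(y, columns):
    """Largest absolute correlation between y and any candidate column."""
    return max(abs(pearson(y, c)) for c in columns)

rng = random.Random(0)            # fixed seed for reproducibility
n, p = 50, 1000                   # hypothetical sample size and dimension
y = [rng.gauss(0, 1) for _ in range(n)]
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]

# Every column is independent of y; the maximum over nested sets is monotone.
for k in (10, 100, 1000):
    print(k, round(max_abs_corr(y, columns[:k]), 3))
```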
Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression
We present a new method for the detection of gene pathways associated with a
multivariate quantitative trait, and use it to identify causal pathways
associated with an imaging endophenotype characteristic of longitudinal
structural change in the brains of patients with Alzheimer's disease (AD). Our
method, known as pathways sparse reduced-rank regression (PsRRR), uses group
lasso penalised regression to jointly model the effects of genome-wide single
nucleotide polymorphisms (SNPs), grouped into functional pathways using prior
knowledge of gene-gene interactions. Pathways are ranked in order of importance
using a resampling strategy that exploits finite sample variability. Our
application study uses whole genome scans and MR images from 464 subjects in
the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. 66,182 SNPs
are mapped to 185 gene pathways from the KEGG pathways database. Voxel-wise
imaging signatures characteristic of AD are obtained by analysing 3D patterns
of structural change at 6, 12 and 24 months relative to baseline. High-ranking,
AD endophenotype-associated pathways in our study include those describing
chemokine, JAK-STAT and insulin signalling pathways, and tight junction
interactions. All of these have been previously implicated in AD biology. In a
secondary analysis, we investigate SNPs and genes that may be driving pathway
selection, and identify a number of previously validated AD genes including
CR1, APOE and TOMM40.
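The way a group lasso penalty selects whole groups of variables can be sketched through its proximal operator, block soft-thresholding. This is the standard textbook operator, not the PsRRR estimation procedure itself, which the abstract does not describe. Non-overlapping groups are assumed for simplicity, and the coefficient values and the threshold are made up for illustration.

```python
from math import sqrt

def group_soft_threshold(beta, groups, threshold):
    """Proximal operator of threshold * sum_g ||beta_g||_2.

    `groups` is a list of index lists, assumed non-overlapping.
    A group whose Euclidean norm is at most `threshold` is set to zero
    as a whole; larger groups are shrunk towards zero.
    """
    out = list(beta)
    for idx in groups:
        norm = sqrt(sum(beta[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out

beta = [3.0, 4.0, 0.3, 0.4]       # hypothetical coefficients
groups = [[0, 1], [2, 3]]         # two hypothetical "pathways"
shrunk = group_soft_threshold(beta, groups, 1.0)
print(shrunk)
```

Here the first group (norm 5) is kept but shrunk, while the second (norm 0.5) is removed entirely, which is the group-level sparsity that makes ranking pathways possible.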
Local-Aggregate Modeling for Big-Data via Distributed Optimization: Applications to Neuroimaging
Technological advances have led to a proliferation of structured big data
that have matrix-valued covariates. We are specifically motivated to build
predictive models for multi-subject neuroimaging data based on each subject's
brain imaging scans. This is an ultra-high-dimensional problem that consists of
a matrix of covariates (brain locations by time points) for each subject; few
methods currently exist to fit supervised models directly to this tensor data.
We propose a novel modeling and algorithmic strategy to apply generalized
linear models (GLMs) to this massive tensor data in which one set of variables
is associated with locations. Our method begins by fitting GLMs to each
location separately, and then builds an ensemble by blending information across
locations through regularization with what we term an aggregating penalty. Our
so-called Local-Aggregate Model can be fit in a completely distributed manner
over the locations using an Alternating Direction Method of Multipliers (ADMM)
strategy, and thus greatly reduces the computational burden. Furthermore, we
propose to select the appropriate model through a novel sequence of faster
algorithmic solutions that is similar to regularization paths. We
demonstrate both the computational and predictive modeling advantages of our
methods via simulations and an EEG classification problem.
Comment: 41 pages, 5 figures and 3 tables
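The distributed flavour of ADMM can be sketched with a generic consensus problem. This is not the Local-Aggregate Model or its aggregating penalty, whose form the abstract does not give. It assumes each "location" holds a simple quadratic loss 0.5 * (x_i - a_i)^2 and that all local variables must agree with a shared variable z. The function name and the targets are hypothetical.

```python
def consensus_admm(local_targets, rho=1.0, iterations=100):
    """Scaled-form consensus ADMM for min sum_i 0.5*(x_i - a_i)^2 s.t. x_i = z."""
    n = len(local_targets)
    x = [0.0] * n      # local variables, one per location
    u = [0.0] * n      # scaled dual variables
    z = 0.0            # shared consensus variable
    for _ in range(iterations):
        # Local step: each x_i depends only on its own a_i and u_i,
        # so these updates could run in parallel.
        x = [(a + rho * (z - ui)) / (1.0 + rho)
             for a, ui in zip(local_targets, u)]
        # Aggregation step: average the local information.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each location's disagreement with z.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return x, z

x, z = consensus_admm([1.0, 2.0, 6.0])
print(z)  # approaches the mean of the targets
```

Only the aggregation step needs communication between locations, which is the property that allows fitting to be spread across many workers.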
Continuation of Nesterov's Smoothing for Regression with Structured Sparsity in High-Dimensional Neuroimaging
Predictive models can be used on high-dimensional brain images for diagnosis
of a clinical condition. Spatial regularization through structured sparsity
offers new perspectives in this context and reduces the risk of overfitting the
model while providing interpretable neuroimaging signatures by forcing the
solution to adhere to domain-specific constraints. Total Variation (TV)
enforces spatial smoothness of the solution while segmenting predictive regions
from the background. We consider the problem of minimizing the sum of a smooth
convex loss, a non-smooth convex penalty (whose proximal operator is known) and
a wide range of possible complex, non-smooth convex structured penalties such
as TV or overlapping group Lasso. Existing solvers are either limited in the
functions they can minimize or in their practical capacity to scale to
high-dimensional imaging data. Nesterov's smoothing technique can be used to
minimize a large number of non-smooth convex structured penalties but
reasonable precision requires a small smoothing parameter, which slows down the
convergence speed. To benefit from the versatility of Nesterov's smoothing
technique, we propose a first order continuation algorithm, CONESTA, which
automatically generates a sequence of decreasing smoothing parameters. The
generated sequence maintains the optimal convergence speed towards any globally
desired precision. Our main contributions are as follows. We propose an
expression of the duality gap to probe the current distance to the global
optimum in order to adapt the smoothing parameter and the convergence speed.
We provide a convergence rate, which is an improvement over classical proximal
gradient smoothing methods. We demonstrate on both simulated and
high-dimensional structural neuroimaging data that CONESTA significantly
outperforms many state-of-the-art solvers in regard to convergence speed and
precision.
Comment: 11 pages, 6 figures, accepted in IEEE TMI, IEEE Transactions on
Medical Imaging 201
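The trade-off between precision and the smoothing parameter can be seen on a one-dimensional total variation penalty. The sketch below applies Nesterov's smoothing to the absolute value, which yields a Huber-type function, and halves the smoothing parameter on a fixed schedule. The halving rule, the signal, and the function names are illustrative assumptions. CONESTA instead chooses the sequence from a duality gap, which is not reproduced here.

```python
def smoothed_abs(x, mu):
    """Nesterov-smoothed |x|: max over |a| <= 1 of a*x - (mu/2)*a^2."""
    if abs(x) <= mu:
        return x * x / (2.0 * mu)
    return abs(x) - mu / 2.0

def tv(signal):
    """Exact 1D total variation: sum of absolute successive differences."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))

def smoothed_tv(signal, mu):
    """Differentiable surrogate of the 1D total variation."""
    return sum(smoothed_abs(b - a, mu) for a, b in zip(signal, signal[1:]))

def smoothing_gap(signal, mu):
    """Return (gap, bound); the gap always lies in [0, bound]."""
    gap = tv(signal) - smoothed_tv(signal, mu)
    bound = (len(signal) - 1) * mu / 2.0
    return gap, bound

signal = [0.0, 0.0, 1.0, 1.0, 1.2, 0.2]   # hypothetical piecewise signal
mu = 1.0
for _ in range(5):
    print(mu, smoothing_gap(signal, mu))
    mu /= 2.0   # fixed halving schedule, assumed for illustration
```

A smaller mu tightens the approximation, but it also makes the smoothed gradient steeper, which is why a small fixed mu slows gradient-based solvers and a decreasing sequence is attractive.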