Supervised Principal Component Regression for Functional Responses with High Dimensional Predictors
We propose a supervised principal component regression method for relating
functional responses with high dimensional predictors. Unlike the conventional
principal component analysis, the proposed method builds on a newly defined
expected integrated residual sum of squares, which directly makes use of the
association between the functional response and the predictors. Minimizing the
integrated residual sum of squares yields the supervised principal components
and is equivalent to solving a sequence of nonconvex generalized Rayleigh
quotient optimization problems. We reformulate the nonconvex optimization
problems as a simultaneous linear regression with a sparse penalty to deal
with high dimensional predictors. Theoretically, we show that the reformulated
regression problem can recover the same supervised principal subspace under
suitable conditions. Statistically, we establish non-asymptotic error bounds
for the proposed estimators. We demonstrate the advantages of the proposed
method through both numerical experiments and an application to the Human
Connectome Project fMRI data.
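In the low-dimensional, unpenalized case, a sequence of generalized Rayleigh quotient problems, max over v of (v^T A v) / (v^T B v) with each new direction B-orthogonal to the earlier ones, is solved by a generalized eigendecomposition. The sketch below shows only that textbook step, assuming a symmetric A and a symmetric positive-definite B. The matrices are random stand-ins, and the function names are hypothetical. It does not reproduce the paper's construction of A and B or its sparse regression reformulation for high dimensional predictors.

```python
import numpy as np

def generalized_rayleigh_directions(A, B, k):
    """Top-k maximizers of (v^T A v) / (v^T B v), each B-orthogonal to the earlier ones.

    Assumes A is symmetric and B is symmetric positive definite.
    """
    # Factor B = L L^T and reduce A v = lam B v to the standard problem C u = lam u,
    # where C = L^{-1} A L^{-T} and v = L^{-T} u.
    L = np.linalg.cholesky(B)
    C = np.linalg.solve(L, np.linalg.solve(L, A).T)
    C = (C + C.T) / 2                      # symmetrize away rounding error
    eigvals, U = np.linalg.eigh(C)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]  # keep the k largest quotients
    V = np.linalg.solve(L.T, U[:, order])  # map back v = L^{-T} u, so V^T B V = I
    return eigvals[order], V

def rayleigh_quotient(A, B, v):
    return (v @ A @ v) / (v @ B @ v)

# Hypothetical stand-in matrices; in the paper, A and B come from its own model.
rng = np.random.default_rng(0)
p = 6
M = rng.standard_normal((p, p))
A = M @ M.T
N = rng.standard_normal((p, p))
B = N @ N.T + p * np.eye(p)

vals, V = generalized_rayleigh_directions(A, B, k=2)
print(vals)
print(rayleigh_quotient(A, B, V[:, 0]))  # equals vals[0]
```

Although each quotient problem is nonconvex in v, this closed-form route exists only when B can be factored and the eigenproblem is small. That limitation is one reason a penalized regression reformulation is attractive when the predictors are high dimensional.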
The synthetic instrument: From sparse association to sparse causation
In many observational studies, researchers are often interested in studying
the effects of multiple exposures on a single outcome. Standard approaches for
high-dimensional data such as the lasso assume the associations between the
exposures and the outcome are sparse. These methods, however, do not estimate
the causal effects in the presence of unmeasured confounding. In this paper, we
consider an alternative approach that assumes the causal effects of interest are
sparse. We show that with sparse causation, the causal effects are identifiable
even with unmeasured confounding. At the core of our proposal is a novel
device, called the synthetic instrument, that in contrast to standard
instrumental variables, can be constructed using the observed exposures
directly. We show that under linear structural equation models, the problem of
causal effect estimation can be formulated as an $\ell_0$-penalization problem,
and hence can be solved efficiently using off-the-shelf software. Simulations
show that our approach outperforms state-of-the-art methods in both low-dimensional
and high-dimensional settings. We further illustrate our method using a mouse
obesity dataset.
Causal Inference on Distribution Functions
Understanding causal relationships is one of the most important goals of
modern science. So far, the causal inference literature has focused almost
exclusively on outcomes from the Euclidean space $\mathbb{R}^p$.
However, it is increasingly common that complex datasets collected through
electronic sources, such as wearable devices, cannot be represented as data
points from $\mathbb{R}^p$. In this paper, we present a novel framework of
causal effects for outcomes from the Wasserstein space of cumulative
distribution functions, which, in contrast to the Euclidean space, is
non-linear. We develop doubly robust estimators and associated asymptotic
theory for these causal effects. As an illustration, we use our framework to
quantify the causal effect of marriage on physical activity patterns using
wearable device data collected through the National Health and Nutrition
Examination Survey.
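One concrete consequence of the non-linearity of the Wasserstein space is well established for one-dimensional distributions. Distances and means are computed on quantile functions, not on CDFs. The minimal sketch below illustrates this, assuming scalar outcomes and using simulated normal samples in place of real data. The samples and function names are hypothetical. This is not the authors' doubly robust estimator. It only contrasts the Wasserstein (Frechet) mean, which averages quantile functions, with the pointwise average of CDFs, which is a mixture.

```python
import numpy as np

def quantile_fn(samples):
    # Empirical quantile function F^{-1}(t) of a sample.
    return lambda t: np.quantile(samples, t)

def wasserstein2(qx, qy, grid):
    # In one dimension, W2 is the L2 distance between quantile functions on (0, 1);
    # the integral is approximated by an average over an evenly spaced grid.
    return np.sqrt(np.mean((qx(grid) - qy(grid)) ** 2))

rng = np.random.default_rng(1)
a = rng.normal(-2.0, 1.0, size=5000)   # hypothetical outcome distribution 1
b = rng.normal(2.0, 1.0, size=5000)    # hypothetical outcome distribution 2
qa, qb = quantile_fn(a), quantile_fn(b)
grid = (np.arange(200) + 0.5) / 200    # midpoints of 200 equal bins on (0, 1)

def q_bar(t):
    # Wasserstein (Frechet) mean: average the two quantile functions.
    return 0.5 * (qa(t) + qb(t))

# Euclidean-style mean: average the CDFs, i.e. the 50/50 mixture (pooled sample).
q_mix = quantile_fn(np.concatenate([a, b]))

iqr_bar = q_bar(0.75) - q_bar(0.25)
iqr_mix = q_mix(0.75) - q_mix(0.25)
print(wasserstein2(qa, qb, grid))   # close to 4, the location shift
print(iqr_bar, iqr_mix)             # about 1.35 (a unit normal) vs about 4 (bimodal mixture)
```

The Wasserstein mean of the two shifted normals is again a unimodal normal centered at 0. The CDF average is bimodal. Causal contrasts defined on this space therefore behave differently from contrasts defined by Euclidean averaging.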
The Promises of Parallel Outcomes
Unobserved confounding presents a major threat to the validity of causal
inference from observational studies. In this paper, we introduce a novel
framework that leverages the information in multiple parallel outcomes for
identification and estimation of causal effects. Under a conditional
independence structure among multiple parallel outcomes, we achieve
nonparametric identification with at least three parallel outcomes. We further
show that under a set of linear structural equation models, causal inference is
possible with two parallel outcomes. We develop accompanying estimating
procedures and evaluate their finite sample performance through simulation
studies and a data application studying the causal effect of the tau protein
level on various types of behavioral deficits.
Beyond Scalar Treatment: A Causal Analysis of Hippocampal Atrophy on Behavioral Deficits in Alzheimer's Studies
Alzheimer's disease is a progressive form of dementia that results in
problems with memory, thinking and behavior. It often starts with abnormal
aggregation and deposition of beta-amyloid and tau, followed by neuronal damage
such as atrophy of the hippocampi, and finally leads to behavioral deficits.
Despite significant progress in finding biomarkers associated with behavioral
deficits, the underlying causal mechanism remains largely unknown. Here we
investigate whether and how hippocampal atrophy contributes to behavioral
deficits based on a large-scale observational study conducted by the
Alzheimer's Disease Neuroimaging Initiative (ADNI). As a key novelty, we use 2D
representations of the hippocampi, which allow us to better understand atrophy
associated with different subregions. This, however, introduces methodological
challenges as existing causal inference methods are not well suited for
exploiting structural information embedded in the 2D exposures. Moreover, our
data contain more than 6 million clinical and genetic covariates, necessitating
appropriate confounder selection methods. We hence develop a novel two-step
causal inference approach tailored for our ADNI data application. Analysis
results suggest that atrophy of CA1 and subiculum subregions may cause more
severe behavioral deficits compared to CA2 and CA3 subregions. We further
evaluate our method using simulations and provide theoretical guarantees.