43 research outputs found

Supervised Principal Component Regression for Functional Responses with High Dimensional Predictors

Authors: Kong Dehan, Sun Qiang, Zhang Xinyi
Publication date: 25/02/2022

We propose a supervised principal component regression method for relating functional responses with high dimensional predictors. Unlike the conventional principal component analysis, the proposed method builds on a newly defined expected integrated residual sum of squares, which directly makes use of the association between the functional response and the predictors. Minimizing the integrated residual sum of squares gives the supervised principal components, which is equivalent to solving a sequence of nonconvex generalized Rayleigh quotient optimization problems. We reformulate the nonconvex optimization problems into a simultaneous linear regression with a sparse penalty to deal with high dimensional predictors. Theoretically, we show that the reformulated regression problem can recover the same supervised principal subspace under suitable conditions. Statistically, we establish non-asymptotic error bounds for the proposed estimators. We demonstrate the advantages of the proposed method through both numerical experiments and an application to the Human Connectome Project fMRI data

arXiv.org e-Print Archive

The synthetic instrument: From sparse association to sparse causation

Authors: Kong Dehan, Tang Dingke, Wang Linbo
Publication date: 03/04/2023

In many observational studies, researchers are often interested in studying the effects of multiple exposures on a single outcome. Standard approaches for high-dimensional data such as the lasso assume the associations between the exposures and the outcome are sparse. These methods, however, do not estimate the causal effects in the presence of unmeasured confounding. In this paper, we consider an alternative approach that assumes the causal effects in view are sparse. We show that with sparse causation, the causal effects are identifiable even with unmeasured confounding. At the core of our proposal is a novel device, called the synthetic instrument, that in contrast to standard instrumental variables, can be constructed using the observed exposures directly. We show that under linear structural equation models, the problem of causal effect estimation can be formulated as an $\ell_0$-penalization problem, and hence can be solved efficiently using off-the-shelf software. Simulations show that our approach outperforms state-of-the-art methods in both low-dimensional and high-dimensional settings. We further illustrate our method using a mouse obesity dataset

arXiv.org e-Print Archive

Causal Inference on Distribution Functions

Authors: Kong Dehan, Lin Zhenhua, Wang Linbo
Publication date: 14/10/2021

Understanding causal relationships is one of the most important goals of modern science. So far, the causal inference literature has focused almost exclusively on outcomes coming from the Euclidean space $\mathbb{R}^p$. However, it is increasingly common that complex datasets collected through electronic sources, such as wearable devices, cannot be represented as data points from $\mathbb{R}^p$. In this paper, we present a novel framework of causal effects for outcomes from the Wasserstein space of cumulative distribution functions, which in contrast to the Euclidean space, is non-linear. We develop doubly robust estimators and associated asymptotic theory for these causal effects. As an illustration, we use our framework to quantify the causal effect of marriage on physical activity patterns using wearable device data collected through the National Health and Nutrition Examination Survey

arXiv.org e-Print Archive

The Promises of Parallel Outcomes

Authors: Kong Dehan, Wang Linbo, Zhou Ying
Publication date: 10/12/2020

Unobserved confounding presents a major threat to the validity of causal inference from observational studies. In this paper, we introduce a novel framework that leverages the information in multiple parallel outcomes for identification and estimation of causal effects. Under a conditional independence structure among multiple parallel outcomes, we achieve nonparametric identification with at least three parallel outcomes. We further show that under a set of linear structural equation models, causal inference is possible with two parallel outcomes. We develop accompanying estimating procedures and evaluate their finite sample performance through simulation studies and a data application studying the causal effect of the tau protein level on various types of behavioral deficits

arXiv.org e-Print Archive

Beyond Scalar Treatment: A Causal Analysis of Hippocampal Atrophy on Behavioral Deficits in Alzheimer's Studies

Authors: Kong Dehan, Wang Linbo, Yu Dengdeng, Zhu Hongtu
Publication date: 09/07/2020

Alzheimer's disease is a progressive form of dementia that results in problems with memory, thinking and behavior. It often starts with abnormal aggregation and deposition of beta-amyloid and tau, followed by neuronal damage such as atrophy of the hippocampi, and finally leads to behavioral deficits. Despite significant progress in finding biomarkers associated with behavioral deficits, the underlying causal mechanism remains largely unknown. Here we investigate whether and how hippocampal atrophy contributes to behavioral deficits based on a large-scale observational study conducted by the Alzheimer's Disease Neuroimaging Initiative (ADNI). As a key novelty, we use 2D representations of the hippocampi, which allows us to better understand atrophy associated with different subregions. It, however, introduces methodological challenges as existing causal inference methods are not well suited for exploiting structural information embedded in the 2D exposures. Moreover, our data contain more than 6 million clinical and genetic covariates, necessitating appropriate confounder selection methods. We hence develop a novel two-step causal inference approach tailored for our ADNI data application. Analysis results suggest that atrophy of CA1 and subiculum subregions may cause more severe behavioral deficits compared to CA2 and CA3 subregions. We further evaluate our method using simulations and provide theoretical guarantees

arXiv.org e-Print Archive