Special Topics in Latent Variable Models with Spatially and Temporally Correlated Latent Variables
The term latent variable model (LVM) refers to any statistical procedure that utilizes information contained in a set of observed variables to construct a set of underlying latent variables that drive the observed values and associations. Independent component analysis (ICA) is an LVM that separates recorded mixtures of signals into independent source signals, called independent components (ICs). ICA is a popular tool for separating brain signals of interest from artifacts and noise in electroencephalogram (EEG) data. Due to challenges in the estimation of uncertainties in ICA, standard errors are not generally estimated alongside ICA estimates and thus ICs representing brain signals of interest cannot be distinguished through a statistical hypothesis testing framework. In Chapter 2 of this dissertation, we propose a bootstrapping algorithm for ICA that produces bootstrap samples that retain critical correlation structures in the data. These are used to compute uncertainties for ICA parameter estimates and to construct a hypothesis test to identify ICs representing brain activity, which we demonstrate in the context of EEG functional connectivity. In Chapter 3, we extend this bootstrapping approach to accommodate pre-ICA dimension reduction procedures, and we use the resulting method to compare popular strategies for pre-ICA dimension reduction in EEG research. In the final chapter, we turn our attention to another LVM, factor analysis, which utilizes the covariance structure of a set of correlated observed variables to model a smaller number of unmeasured underlying variables. A spatial factor analysis (SFA) model can be used to quantify the social vulnerability of communities based on a set of observed social variables. Current SFA methodology is ill-equipped to handle spatial misalignment in the observed variables.
We propose a joint spatial factor analysis model that identifies a common set of latent variables underlying spatially misaligned observed variables and produces results at the level of the smallest spatial units, thereby minimizing loss of information. We apply this model to spatially misaligned data to construct an index of community social vulnerability for Louisiana, which we integrate with Louisiana flood data to identify communities at high risk during natural disasters, based on both social and geographic features. Doctor of Philosophy
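The abstract above does not spell out its bootstrapping algorithm. As a generic illustration of how a resampling scheme can retain short-range temporal correlation, the sketch below implements a standard moving block bootstrap. It is not the dissertation's method, and the function name, toy series, and block length are all assumptions made for illustration.

```python
import random


def moving_block_bootstrap(series, block_length, rng=None):
    """Resample a series by concatenating randomly chosen contiguous blocks.

    Keeping blocks intact preserves the dependence between neighbouring
    observations within each block, which an i.i.d. bootstrap would destroy.
    """
    rng = rng or random.Random()
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    n_starts = n - block_length + 1  # number of admissible block start positions
    resampled = []
    while len(resampled) < n:
        start = rng.randrange(n_starts)
        resampled.extend(series[start:start + block_length])
    return resampled[:n]  # trim the final block so the length matches the input


# Hypothetical toy series standing in for one recorded channel.
signal = [0.1 * t for t in range(20)]
sample = moving_block_bootstrap(signal, block_length=5, rng=random.Random(0))
```

Repeating the resampling many times and re-estimating a quantity on each replicate gives an empirical distribution from which a standard error can be read off; the choice of block length trades off preserved dependence against the variety of replicates.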
Causal inference and machine learning approaches for evaluation of the health impacts of large-scale air quality regulations
Causal exposure-response curve estimation with surrogate confounders: a study of air pollution and children's health in Medicaid claims data
In this paper, we undertake a case study in which interest lies in estimating
a causal exposure-response function (ERF) for long-term exposure to fine
particulate matter (PM2.5) and respiratory hospitalizations in
socioeconomically disadvantaged children using nationwide Medicaid claims data.
New methods are needed to address the specific challenges the Medicaid data
present. First, Medicaid eligibility criteria, which are largely based on
family income for children, differ by state, creating socioeconomically
distinct populations and leading to clustered data, where zip codes (our units
of analysis) are nested within states. Second, Medicaid enrollees'
individual-level socioeconomic status, which is known to be a confounder and an
effect modifier of the exposure-response relationships under study, is not
available. However, two useful surrogates are available: median household
income of each enrollee's zip code of residence and state-level Medicaid family
income eligibility thresholds for children. In this paper, we introduce a
customized approach, called MedMatch, that builds on generalized
propensity score matching methods for estimating causal ERFs, adapting these
approaches to leverage our two surrogate variables to account for potential
confounding and/or effect modification by socioeconomic status. We conduct
extensive simulation studies, consistently demonstrating the strong performance
of MedMatch relative to conventional approaches to handling the
surrogate variables. We apply MedMatch to estimate the causal ERF
between long-term PM2.5 exposure and first respiratory hospitalization
among children in Medicaid from 2000 to 2012. We find a positive association,
with a steeper curve at lower PM2.5 concentrations that levels off at higher
concentrations. Comment: 38 pages, 5 figures
Estimating a Causal Exposure Response Function with a Continuous Error-Prone Exposure: A Study of Fine Particulate Matter and All-Cause Mortality
Numerous studies have examined the associations between long-term exposure to
fine particulate matter (PM2.5) and adverse health outcomes. Recently, many of
these studies have begun to employ high-resolution predicted PM2.5
concentrations, which are subject to measurement error. Previous approaches for
exposure measurement error correction have either been applied in non-causal
settings or have only considered a categorical exposure. Moreover, most
procedures have failed to account for uncertainty induced by error correction
when fitting an exposure-response function (ERF). To remedy these deficiencies,
we develop a multiple imputation framework that combines regression calibration
and Bayesian techniques to estimate a causal ERF. We demonstrate how the output
of the measurement error correction steps can be seamlessly integrated into a
Bayesian additive regression trees (BART) estimator of the causal ERF. We also
demonstrate how locally-weighted smoothing of the posterior samples from BART
can be used to create a more accurate ERF estimate. Our proposed approach also
properly propagates the exposure measurement error uncertainty to yield
accurate standard error estimates. We assess the robustness of our proposed
approach in an extensive simulation study. We then apply our methodology to
estimate the effects of PM2.5 on all-cause mortality among Medicare enrollees
in New England from 2000 to 2012.
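The abstract above pools uncertainty from the error-correction step through a Bayesian multiple imputation framework whose details are not given here. As a simpler, classical point of comparison, the sketch below applies Rubin's rules, which combine per-imputation estimates and variances so that between-imputation variability inflates the final variance. This is a swapped-in textbook technique, not the paper's estimator; the function name and input values are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their variances via Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total


# Hypothetical estimates from three imputed exposure data sets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The `(1 + 1 / m)` factor accounts for using a finite number of imputations; ignoring the between-imputation term would understate the standard error, which is the deficiency the abstract attributes to most earlier procedures.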
Severe flooding and cause-specific hospitalization in the United States
Flooding is one of the most disruptive and costliest climate-related
disasters and presents an escalating threat to population health due to climate
change and urbanization patterns. Previous studies have investigated the
consequences of flood exposures on only a handful of health outcomes and have
focused on a single flood event or affected region. To address this gap, we conducted a
nationwide, multi-decade analysis of the impacts of severe floods on a wide
range of health outcomes in the United States by linking a novel
satellite-based high-resolution flood exposure database with Medicare
cause-specific hospitalization records over the period 2000-2016. Using a
self-matched study design with a distributed lag model, we examined how
cause-specific hospitalization rates deviate from expected rates during and up
to four weeks after severe flood exposure. Our results revealed that risk of
hospitalization was consistently elevated during and for at least four weeks
following severe flood exposure for nervous system diseases (3.5%; 95%
confidence interval [CI]: 0.6%, 6.4%), skin and subcutaneous tissue diseases
(3.4%; 95% CI: 0.3%, 6.7%), and injury and poisoning (1.5%; 95% CI: -0.07%,
3.2%). Increases in hospitalization rate for these causes, musculoskeletal
system diseases, and mental health-related impacts varied based on proportion
of Black residents in each ZIP Code. Our findings demonstrate the need for
targeted preparedness strategies for hospital personnel before, during, and
after severe flooding.
Impacts of Census Differential Privacy for Small-Area Disease Mapping to Monitor Health Inequities
The US Census Bureau will implement a new privacy-preserving disclosure
avoidance system (DAS), which includes application of differential privacy, on
publicly-released 2020 census data. There are concerns that the DAS may bias
small-area and demographically-stratified population counts, which play a
critical role in public health research, serving as denominators in estimation
of disease/mortality rates. Employing three DAS demonstration products, we
quantify errors attributable to reliance on DAS-protected denominators in
standard small-area disease mapping models for characterizing health
inequities. We conduct simulation studies and real data analyses of inequities
in premature mortality at the census tract level in Massachusetts and Georgia.
Results show that overall patterns of inequity by racialized group and economic
deprivation level are not compromised by the DAS. While early versions of DAS
induce errors in mortality rate estimation that are larger for Black than
non-Hispanic white populations in Massachusetts, this issue is ameliorated in
newer DAS versions.
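To make the role of denominators concrete, the sketch below shows how a perturbed population count propagates into a crude mortality rate. It is plain arithmetic rather than the disease mapping models used in the study, and every number and name in it is hypothetical.

```python
def rate_per_100k(deaths, population):
    """Crude rate per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * 100_000


def relative_rate_error(true_population, protected_population):
    """Relative error in the rate when the protected count replaces the true count.

    With the death count held fixed, the rate scales as 1 / population, so the
    relative error depends only on the ratio of the two denominators.
    """
    return true_population / protected_population - 1


# Hypothetical tract: 12 deaths, true population 4,000, protected count 3,900.
true_rate = rate_per_100k(12, 4000)
protected_rate = rate_per_100k(12, 3900)
error = relative_rate_error(4000, 3900)
```

Because the error depends on the ratio of counts, the same absolute perturbation matters more in small tracts or small demographic strata, which is why small-area and stratified denominators are the focus of concern.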
A common spatial factor analysis model for measured neighborhood-level characteristics: The Multi-Ethnic Study of Atherosclerosis
The purpose of this study was to reduce the dimensionality of a set of neighborhood-level variables collected on participants in the Multi-Ethnic Study of Atherosclerosis (MESA) while appropriately accounting for the spatial structure of the data. A common spatial factor analysis model in the Bayesian setting was used to properly characterize dependencies in the data. Results suggest that use of the spatial factor model can result in more precise estimation of factor scores, improved insight into the spatial patterns in the data, and the ability to more accurately assess associations between the neighborhood environment and health outcomes.