Topics in Modeling of Multivariate Mixed Data Types and Highly Multivariate Spatial Data
In public health, surveillance constitutes systematic data collection to analyze, interpret and implement public policies. Notable examples of surveillance include periodic large health surveys (e.g. the National Health and Nutrition Examination Survey) and environmental surveillance through measuring pollutants and meteorological data at multiple monitoring sites. With technological advancements, we can record multiple varieties of data at each time point or spatial location. Unfortunately, the existing statistical literature is limited in its ability to model such complex multivariate data, owing to a lack of generalizability, scalability, or computational efficiency. This dissertation focuses on building global, scalable, and efficient methods to bridge those gaps in the literature. This work focuses explicitly on three contexts: (1) using semi-parametric Gaussian copulas to build joint models of multivariate mixed-type data (binary/ordinal/truncated/continuous) that can define mutually consistent regression models for any type of outcome, (2) developing a consistent and robust estimator of a ubiquitous measure of classification accuracy, the Area Under the Curve (AUC), under complex survey designs and connecting it to a latent R-square analogous to that of linear models, and (3) proposing a class of "Graphical Gaussian Processes" that can efficiently model highly multivariate spatial data where tens or hundreds of variables are observed at each spatial location.
Graphical Gaussian Process Models for Highly Multivariate Spatial Data
For multivariate spatial Gaussian process (GP) models, customary
specifications of cross-covariance functions do not exploit relational
inter-variable graphs to ensure process-level conditional independence among
the variables. This is undesirable, especially for highly multivariate
settings, where popular cross-covariance functions such as the multivariate
Matérn suffer from a "curse of dimensionality" as the number of parameters
and floating point operations scale up in quadratic and cubic order,
respectively, in the number of variables. We propose a class of multivariate
"Graphical Gaussian Processes" using a general construction called "stitching"
that crafts cross-covariance functions from graphs and ensures process-level
conditional independence among variables. For the Matérn family of functions,
stitching yields a multivariate GP whose univariate components are Matérn
GPs, and conforms to process-level conditional independence as specified by the
graphical model. For highly multivariate settings and decomposable graphical
models, stitching offers massive computational gains and parameter dimension
reduction. We demonstrate the utility of the graphical Matérn GP to jointly
model highly multivariate spatial data using simulation examples and an
application to air-pollution modelling.
Covariance Estimation and Principal Component Analysis for Mixed-Type Functional Data with application to mHealth in Mood Disorders
Mobile digital health (mHealth) studies often collect multiple within-day
self-reported assessments of participants' behaviour and health. Indexed by
time of day, these assessments can be treated as functional observations of
continuous, truncated, ordinal, and binary type. We develop covariance
estimation and principal component analysis for such mixed-type functional
data. We propose a semiparametric Gaussian copula model that assumes a
generalized latent non-paranormal process generating observed mixed-type
functional data and defining temporal dependence via a latent covariance. The
smooth estimate of latent covariance is constructed via Kendall's Tau bridging
method that incorporates smoothness within the bridging step. The approach is
then extended with methods for handling both dense and sparse sampling designs,
calculating subject-specific latent representations of observed data, latent
principal components and principal component scores. Importantly, the proposed
framework handles all four mixed types in a unified way. Simulation studies
show a competitive performance of the proposed method under both dense and
sparse sampling designs. The method is applied to data from 497 participants of
the National Institute of Mental Health Family Study of the Mood Disorder
Spectrum to characterize the differences in within-day temporal patterns of
mood in individuals with the major mood disorder subtypes, including Major
Depressive Disorder and Type 1 and 2 Bipolar Disorder.
Graph-constrained Analysis for Multivariate Functional Data
Functional Gaussian graphical models (GGM) used for analyzing multivariate
functional data customarily estimate an unknown graphical model representing
the conditional relationships between the functional variables. However, in
many applications of multivariate functional data, the graph is known and
existing functional GGM methods cannot preserve a given graphical constraint.
In this manuscript, we demonstrate how to conduct multivariate functional
analysis that exactly conforms to a given inter-variable graph. We first show
the equivalence between partially separable functional GGM and graphical
Gaussian processes (GP), proposed originally for constructing optimal
covariance functions for multivariate spatial data that retain the conditional
independence relations in a given graphical model. The theoretical connection
helps design a new algorithm that leverages Dempster's covariance selection to
calculate the maximum likelihood estimate of the covariance function for
multivariate functional data under graphical constraints. We also show that the
finite term truncation of functional GGM basis expansion used in practice is
equivalent to a low-rank graphical GP, which is known to oversmooth marginal
distributions. To remedy this, we extend our algorithm to better preserve
marginal distributions while still respecting the graph and retaining
computational scalability. The insights obtained from the new results presented
in this manuscript will help practitioners better understand the relationship
between these graphical models and decide on the appropriate method for
their specific multivariate data analysis task. The benefits of the proposed
algorithms are illustrated using empirical experiments and an application to
functional modeling of neuroimaging data using the connectivity graph among
regions of the brain. Comment: 23 pages, 6 figures
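As a toy illustration of the covariance selection idea mentioned in the abstract above (not the authors' algorithm for functional data), the sketch below assumes three scalar Gaussian variables on the chain graph 0 - 1 - 2. For a decomposable graph such as this one, the constrained maximum likelihood estimate keeps the sample covariance on the edges and fills in the missing (0, 2) entry so that the corresponding precision entry is zero. The function names and the example matrix are hypothetical.

```python
def chain_covariance_selection(S):
    """Covariance selection for the chain graph 0 - 1 - 2 (no edge 0 - 2).

    S is a 3x3 sample covariance given as a list of lists. Entries on the
    diagonal and on the edges (0,1) and (1,2) are kept; the non-edge entry
    (0,2) is replaced by S01 * S12 / S11, the value implied by conditional
    independence of variables 0 and 2 given variable 1.
    """
    sigma = [row[:] for row in S]          # copy so the input is untouched
    filled = S[0][1] * S[1][2] / S[1][1]
    sigma[0][2] = filled
    sigma[2][0] = filled
    return sigma


def precision_02_cofactor(sigma):
    """Cofactor proportional to entry (0, 2) of the inverse of sigma.

    The precision entry (0, 2) equals this cofactor divided by det(sigma),
    so a zero cofactor means a zero precision entry.
    """
    return sigma[0][1] * sigma[1][2] - sigma[0][2] * sigma[1][1]


# Hypothetical sample covariance, used only for illustration.
S = [[2.0, 0.8, 0.5],
     [0.8, 1.0, 0.3],
     [0.5, 0.3, 1.5]]
sigma_hat = chain_covariance_selection(S)
```

In this example the unconstrained sample covariance has a nonzero (0, 2) cofactor, while the constrained estimate drives it to zero and leaves the edge entries unchanged. General (non-decomposable) graphs need an iterative scheme rather than this closed form.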
Semiparametric Gaussian Copula Regression modeling for Mixed Data Types (SGCRM)
Many clinical and epidemiological studies encode collected participant-level
information via a collection of continuous, truncated, ordinal, and binary
variables. To gain novel insights in understanding complex interactions between
collected variables, there is a critical need for the development of flexible
frameworks for joint modeling of mixed-data-type variables. We propose
Semiparametric Gaussian Copula Regression modeling (SGCRM), which makes it possible to model
a joint dependence structure between observed continuous, truncated, ordinal,
and binary variables and to construct conditional models with these four data
types as outcomes with a guarantee that derived conditional models are mutually
consistent. The Semiparametric Gaussian Copula (SGC) mechanism assumes that
observed SGC variables are generated by (i) monotonically transforming the
marginals of a latent multivariate normal random variable and (ii)
dichotomizing/truncating these transformed marginals. SGCRM estimates the
correlation matrix of the latent normal variables through an inversion of
"bridges" between Kendall's Tau rank correlations of observed mixed data type
variables and latent Gaussian correlations. We derive a novel bridging result
to deal with a general ordinal variable. In addition to the previously
established asymptotic consistency, we establish asymptotic normality of the
latent correlation estimators. We also establish the asymptotic normality of
SGCRM regression estimators and provide a computationally efficient way to
calculate asymptotic covariances. We propose computationally efficient methods
to predict SGC latent variables and to do imputation of missing data. Using
the National Health and Nutrition Examination Survey (NHANES), we illustrate
SGCRM and compare it with the traditional conditional regression models including
truncated Gaussian regression, ordinal probit, and probit models. Comment: 35 pages, 6 figures, 6 tables
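A minimal sketch of the bridging idea for the simplest case, a pair of continuous variables, where the classical Gaussian copula relation r = sin(pi * tau / 2) links Kendall's Tau to the latent correlation. The bridges that SGCRM uses for binary, ordinal and truncated variables are different functions and are not reproduced here; the function names and the data below are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's Tau (tau-a) by direct pair counting; assumes no ties."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def latent_correlation_continuous(tau):
    """Invert the continuous-continuous bridge: r = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)


# Hypothetical observations of two continuous variables.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]
tau = kendall_tau(x, y)
r = latent_correlation_continuous(tau)
```

Because Kendall's Tau depends only on ranks, the estimate is unchanged by any monotone transformation of the marginals, which is what makes the rank-based bridge suitable for a semiparametric copula model.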
#MeToo and Google Inquiries Into Sexual Violence: A Hashtag Campaign Can Sustain Information Seeking
Specificity of affective dynamics of bipolar and major depressive disorder
OBJECTIVE: Here, we examine whether the dynamics of the four dimensions of the circumplex model of affect assessed by ecological momentary assessment (EMA) differ among those with bipolar disorder (BD) and major depressive disorder (MDD). METHODS: Participants aged 11-85 years (n = 362) reported momentary sad, anxious, active, and energetic dimensional states four times per day for 2 weeks. Individuals with lifetime mood disorder subtypes of bipolar-I, bipolar-II, and MDD derived from a semistructured clinical interview were compared to each other and to controls without a lifetime history of psychiatric disorders. Random effects from individual means, inertias, innovation (residual) variances, and cross-lags across the four affective dimensions simultaneously were derived from multivariate dynamic structural equation models. RESULTS: All mood disorder subtypes were associated with higher levels of sad and anxious mood and lower energy than controls. Those with bipolar-I had lower average activation, and lower energy that was independent of activation, compared to MDD or controls. However, increases in activation were more likely to perpetuate in those with bipolar-I. Bipolar-II was characterized by higher lability of sad and anxious mood compared to bipolar-I and controls but not MDD. Compared to BD and controls, those with MDD exhibited cross-augmentation of sadness and anxiety, and sadness blunted energy. CONCLUSION: Bipolar-I is more strongly characterized by activation and energy than sad and anxious mood. This distinction has potential implications for both specificity of intervention targets and differential pathways underlying these dynamic affective systems. Confirmation of the longer term stability and generalizability of these findings in future studies is necessary.
Objectively assessed sleep and physical activity in depression subtypes and its mediating role in their association with cardiovascular risk factors
The aims of this study were to investigate the associations of major depressive disorder (MDD) and its subtypes (atypical, melancholic, combined, unspecified) with actigraphy-derived measures of sleep, physical activity and circadian rhythms; and test the potentially mediating role of sleep, physical activity and circadian rhythms in the well-established associations of the atypical MDD subtype with Body Mass Index (BMI) and the metabolic syndrome (MeS). The sample consisted of 2317 participants recruited from an urban area, who underwent comprehensive somatic and psychiatric evaluations. MDD and its subtypes were assessed via semi-structured diagnostic interviews. Sleep, physical activity and circadian rhythms were measured using actigraphy. MDD and its subtypes were associated with several actigraphy-derived variables, including later sleep midpoint, low physical activity, low inter-daily stability and larger intra-individual variability of sleep duration and relative amplitude. Sleep midpoint and physical activity fulfilled criteria for partial mediation of the association between atypical MDD and BMI, and physical activity also for partial mediation of the association between atypical MDD and MeS. Our findings confirm associations of MDD and its atypical subtype with sleep and physical activity, which are likely to partially mediate the associations of atypical MDD with BMI and MeS, although most of these associations are not explained by sleep and activity variables. This highlights the need to consider atypical MDD, sleep and sedentary behavior as cardiovascular risk factors.