    Topics in Modeling of Multivariate Mixed Data Types and Highly Multivariate Spatial Data

    In public health, surveillance is the systematic collection of data for analyzing, interpreting, and implementing public policies. Notable examples include periodic large health surveys (e.g., the National Health and Nutrition Examination Survey) and environmental surveillance through measurements of pollutants and meteorological variables at multiple monitoring sites. With technological advances, many kinds of data can be recorded at each time point or spatial location. Unfortunately, existing statistical methods for modeling such complex multivariate data are limited by a lack of generalizability, scalability, or computational efficiency. This dissertation focuses on building general, scalable, and efficient methods to close those gaps in the literature. It addresses three contexts explicitly: (1) using semiparametric Gaussian copulas to build joint models of multivariate mixed-type data (binary/ordinal/truncated/continuous) that define mutually consistent regression models for any type of outcome; (2) developing a consistent and robust estimator of the ubiquitous measure of classification accuracy, the Area Under the Curve (AUC), under complex survey designs, and connecting it to a latent R-squared analogous to that of linear models; and (3) proposing a class of "Graphical Gaussian Processes" that can efficiently model highly multivariate spatial data in which tens or hundreds of variables are observed at each spatial location.
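
    One common way to make the AUC reflect a complex survey design is to weight every (case, control) pair by the product of the two units' sampling weights and count correctly ranked pairs, with ties counted as one half. The sketch below assumes that weighted Mann-Whitney form; it is an illustration, not necessarily the estimator developed in the dissertation, and the function name `weighted_auc` is hypothetical.

```python
def weighted_auc(scores, labels, weights):
    """Design-weighted AUC: weighted share of (case, control) pairs ranked correctly.

    labels: 1 for cases, 0 for controls; weights: sampling weights (e.g., inverse
    inclusion probabilities). Ties in score contribute one half of the pair weight.
    """
    if not (len(scores) == len(labels) == len(weights)):
        raise ValueError("scores, labels and weights must have equal length")
    cases = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    controls = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for s_case, w_case in cases:
        for s_ctrl, w_ctrl in controls:
            if s_case > s_ctrl:
                concordant += w_case * w_ctrl
            elif s_case == s_ctrl:
                concordant += 0.5 * w_case * w_ctrl
    total = sum(w for _, w in cases) * sum(w for _, w in controls)
    return concordant / total


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(weighted_auc(scores, labels, [1, 1, 1, 1]))  # 0.75, the ordinary unweighted AUC
print(weighted_auc(scores, labels, [2, 1, 1, 1]))  # 5/6: the up-weighted case outranks both controls
```

    With all weights equal, the estimate reduces to the usual Mann-Whitney AUC. The double loop is quadratic in sample size; a sort-based version would scale better for large surveys.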

    Graphical Gaussian Process Models for Highly Multivariate Spatial Data

    For multivariate spatial Gaussian process (GP) models, customary specifications of cross-covariance functions do not exploit relational inter-variable graphs to ensure process-level conditional independence among the variables. This is undesirable, especially in highly multivariate settings, where popular cross-covariance functions such as the multivariate Matérn suffer from a "curse of dimensionality": the number of parameters and the number of floating point operations scale quadratically and cubically, respectively, in the number of variables. We propose a class of multivariate "Graphical Gaussian Processes" using a general construction called "stitching" that crafts cross-covariance functions from graphs and ensures process-level conditional independence among variables. For the Matérn family of functions, stitching yields a multivariate GP whose univariate components are Matérn GPs and which conforms to process-level conditional independence as specified by the graphical model. For highly multivariate settings and decomposable graphical models, stitching offers massive computational gains and parameter dimension reduction. We demonstrate the utility of the graphical Matérn GP for jointly modelling highly multivariate spatial data using simulation examples and an application to air-pollution modelling.
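
    To see why a sparse graph reduces the parameter dimension, the sketch below counts the covariance blocks that need their own parameters. It assumes, as a simplification, one parameter set per variable plus one per pair of variables allowed a direct cross-covariance: every pair for a full multivariate Matérn, only the graph's edges for a graphical GP. The helper names are hypothetical, and the code does not implement stitching itself.

```python
from itertools import combinations


def complete_graph(p):
    """All p*(p-1)/2 variable pairs: the full multivariate Matérn case."""
    return list(combinations(range(p), 2))


def chain_graph(p):
    """A path graph 0-1-...-(p-1), a simple decomposable example."""
    return [(i, i + 1) for i in range(p - 1)]


def covariance_blocks(p, edges):
    """Count marginal blocks (one per variable) plus cross-covariance blocks (one per edge).

    Edges are treated as undirected, so (i, j) and (j, i) count once.
    """
    undirected = set()
    for i, j in edges:
        if i == j or not (0 <= i < p and 0 <= j < p):
            raise ValueError(f"invalid edge ({i}, {j}) for p={p}")
        undirected.add((min(i, j), max(i, j)))
    return p + len(undirected)


for p in (10, 100):
    full = covariance_blocks(p, complete_graph(p))
    sparse = covariance_blocks(p, chain_graph(p))
    print(f"p={p}: full={full}, chain graph={sparse}")
# p=10: full=55, chain graph=19
# p=100: full=5050, chain graph=199
```

    The full count grows like p(p+1)/2 while the chain count grows like 2p - 1. The computational gains mentioned above depend on the graph's structure and are not captured by this simple count.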

    Covariance Estimation and Principal Component Analysis for Mixed-Type Functional Data with application to mHealth in Mood Disorders

    Mobile digital health (mHealth) studies often collect multiple within-day self-reported assessments of participants' behaviour and health. Indexed by time of day, these assessments can be treated as functional observations of continuous, truncated, ordinal, and binary type. We develop covariance estimation and principal component analysis for such mixed-type functional data. We propose a semiparametric Gaussian copula model that assumes a generalized latent non-paranormal process generates the observed mixed-type functional data and defines temporal dependence via a latent covariance. A smooth estimate of the latent covariance is constructed via a Kendall's tau bridging method that incorporates smoothness within the bridging step. The approach is then extended with methods for handling both dense and sparse sampling designs and for calculating subject-specific latent representations of the observed data, latent principal components, and principal component scores. Importantly, the proposed framework handles all four mixed types in a unified way. Simulation studies show competitive performance of the proposed method under both dense and sparse sampling designs. The method is applied to data from 497 participants in the National Institute of Mental Health Family Study of the Mood Disorder Spectrum to characterize differences in within-day temporal patterns of mood among individuals with major mood disorder subtypes, including Major Depressive Disorder and Type 1 and Type 2 Bipolar Disorder.
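
    For two continuous observations (for example, mood ratings at two times of day), the Kendall's tau bridge under a Gaussian copula has a simple closed form, tau = (2/pi) arcsin(r), so the latent correlation is r = sin(pi * tau / 2). The sketch below computes Kendall's tau-a and inverts that bridge pointwise. It omits the smoothing step and the bridges for truncated, ordinal, and binary types, which involve bivariate normal probabilities; the function names are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / total pairs; ties count as neither."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    score = 0
    for i, j in combinations(range(n), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            score += 1
        elif product < 0:
            score -= 1
    return score / (n * (n - 1) / 2)


def latent_corr_continuous(x, y):
    """Invert the continuous/continuous bridge tau = (2/pi) * arcsin(r)."""
    return math.sin(math.pi / 2 * kendall_tau_a(x, y))


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
print(latent_corr_continuous(x, y))                         # approximately 0.5 (tau = 1/3)
print(latent_corr_continuous(x, [math.exp(v) for v in y]))  # identical: tau depends only on ranks
```

    Because tau depends only on ranks, any monotone transform of the marginals leaves the estimate unchanged, which is what makes the bridge suitable for a non-paranormal model.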

    Graph-constrained Analysis for Multivariate Functional Data

    Functional Gaussian graphical models (GGM) used for analyzing multivariate functional data customarily estimate an unknown graphical model representing the conditional relationships between the functional variables. However, in many applications of multivariate functional data, the graph is known, and existing functional GGM methods cannot preserve a given graphical constraint. In this manuscript, we demonstrate how to conduct multivariate functional analysis that exactly conforms to a given inter-variable graph. We first show the equivalence between partially separable functional GGM and graphical Gaussian processes (GP), proposed originally for constructing optimal covariance functions for multivariate spatial data that retain the conditional independence relations in a given graphical model. This theoretical connection helps us design a new algorithm that leverages Dempster's covariance selection to calculate the maximum likelihood estimate of the covariance function for multivariate functional data under graphical constraints. We also show that the finite-term truncation of the functional GGM basis expansion used in practice is equivalent to a low-rank graphical GP, which is known to oversmooth marginal distributions. To remedy this, we extend our algorithm to better preserve marginal distributions while still respecting the graph and retaining computational scalability. The insights obtained from the new results presented in this manuscript will help practitioners better understand the relationship between these graphical models and decide on the appropriate method for their specific multivariate data analysis task. The benefits of the proposed algorithms are illustrated using empirical experiments and an application to functional modeling of neuroimaging data using the connectivity graph among regions of the brain. Comment: 23 pages, 6 figures
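
    For a decomposable graph, the maximum likelihood estimate under Dempster's covariance selection has a well-known closed form: sum the inverted sample-covariance blocks of the cliques, subtract those of the separators, and leave zeros everywhere else. The pure-Python sketch below applies that textbook formula to a finite-dimensional covariance matrix (for instance, of basis coefficients) for the path graph 0 - 1 - 2. It is not the manuscript's algorithm for covariance functions; the clique and separator lists are assumed to be supplied by the user, and the function names are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def decomposable_precision_mle(s, cliques, separators):
    """K_hat = sum over cliques of [S_C^-1] padded with zeros, minus the same for separators."""
    p = len(s)
    k = [[0.0] * p for _ in range(p)]
    blocks = [(c, 1.0) for c in cliques] + [(sep, -1.0) for sep in separators]
    for block, sign in blocks:
        block_inv = invert([[s[i][j] for j in block] for i in block])
        for a, i in enumerate(block):
            for b, j in enumerate(block):
                k[i][j] += sign * block_inv[a][b]
    return k


s = [[2.0, 0.8, 0.5],
     [0.8, 1.5, 0.3],
     [0.5, 0.3, 1.0]]
k_hat = decomposable_precision_mle(s, cliques=[[0, 1], [1, 2]], separators=[[1]])
sigma_hat = invert(k_hat)
print(k_hat[0][2])      # 0.0: variables 0 and 2 are conditionally independent given 1
print(sigma_hat[0][2])  # approximately 0.8 * 0.3 / 1.5 = 0.16, not the sample value 0.5
```

    The fitted covariance reproduces the sample covariance on each clique, while the entry for the missing edge is filled in so that the precision matrix has a zero there.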

    Semiparametric Gaussian Copula Regression modeling for Mixed Data Types (SGCRM)

    Many clinical and epidemiological studies encode participant-level information as a collection of continuous, truncated, ordinal, and binary variables. To gain novel insights into complex interactions between these variables, there is a critical need for flexible frameworks for jointly modeling mixed-type variables. We propose Semiparametric Gaussian Copula Regression modeling (SGCRM), which models the joint dependence structure of observed continuous, truncated, ordinal, and binary variables and constructs conditional models with any of these four data types as the outcome, with a guarantee that the derived conditional models are mutually consistent. The Semiparametric Gaussian Copula (SGC) mechanism assumes that observed SGC variables are generated by (i) monotonically transforming the marginals of a latent multivariate normal random vector and (ii) dichotomizing/truncating these transformed marginals. SGCRM estimates the correlation matrix of the latent normal variables by inverting "bridges" between the Kendall's tau rank correlations of the observed mixed-type variables and the latent Gaussian correlations. We derive a novel bridging result for a general ordinal variable. In addition to the previously established asymptotic consistency, we establish asymptotic normality of the latent correlation estimators. We also establish the asymptotic normality of the SGCRM regression estimators and provide a computationally efficient way to calculate their asymptotic covariances. We propose computationally efficient methods to predict SGC latent variables and to impute missing data. Using data from the National Health and Nutrition Examination Survey (NHANES), we illustrate SGCRM and compare it with traditional conditional regression models, including truncated Gaussian regression, ordinal probit, and probit models. Comment: 35 pages, 6 figures, 6 tables
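
    The two-step generating mechanism can be simulated directly. The sketch below assumes an exchangeable latent correlation `rho`, an exponential transform for the continuous variable, truncation at zero, a binary cut-point at zero, and ordinal cut-points at -0.5 and 0.5. All of these choices and the function name `simulate_sgc` are hypothetical, chosen only to show how four observed types arise from one latent normal vector.

```python
import math
import random


def simulate_sgc(n, rho, seed=0):
    """Simulate n rows of (continuous, truncated, ordinal, binary) from a latent normal vector.

    Latent: Z_k = sqrt(rho) * U + sqrt(1 - rho) * E_k, so corr(Z_j, Z_k) = rho for j != k.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        z = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(4)]
        continuous = math.exp(z[0])                   # step (i) only: monotone transform
        truncated = max(z[1], 0.0)                    # values below 0 are recorded as 0
        ordinal = sum(z[2] > c for c in (-0.5, 0.5))  # 0, 1 or 2
        binary = int(z[3] > 0.0)                      # dichotomized at 0
        rows.append((continuous, truncated, ordinal, binary))
    return rows


for row in simulate_sgc(5, rho=0.6, seed=42):
    print(row)
```

    Because the monotone transforms are unknown in practice, rank-based statistics such as Kendall's tau are a natural input for recovering `rho` from data like these.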

    Specificity of affective dynamics of bipolar and major depressive disorder

    OBJECTIVE: Here, we examine whether the dynamics of the four dimensions of the circumplex model of affect assessed by ecological momentary assessment (EMA) differ among those with bipolar disorder (BD) and major depressive disorder (MDD). METHODS: Participants aged 11-85 years (n = 362) reported momentary sad, anxious, active, and energetic dimensional states four times per day for 2 weeks. Individuals with lifetime mood disorder subtypes of bipolar-I, bipolar-II, and MDD derived from a semistructured clinical interview were compared to each other and to controls without a lifetime history of psychiatric disorders. Random effects for individual means, inertias, innovation (residual) variances, and cross-lags across the four affective dimensions were derived simultaneously from multivariate dynamic structural equation models. RESULTS: All mood disorder subtypes were associated with higher levels of sad and anxious mood and lower energy than controls. Those with bipolar-I had lower average activation, and lower energy that was independent of activation, compared to MDD or controls. However, increases in activation were more likely to persist in those with bipolar-I. Bipolar-II was characterized by higher lability of sad and anxious mood compared to bipolar-I and controls but not MDD. Compared to BD and controls, those with MDD exhibited cross-augmentation of sadness and anxiety, and sadness blunted energy. CONCLUSION: Bipolar-I is more strongly characterized by activation and energy than by sad and anxious mood. This distinction has potential implications for both the specificity of intervention targets and differential pathways underlying these dynamic affective systems. Confirmation of the longer-term stability and generalizability of these findings in future studies is necessary.
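
    Inertia here is the autoregressive carry-over of a person's affect from one prompt to the next, and the innovation variance is the variance of what is left unexplained. The study estimated both within multivariate dynamic structural equation models; as a much cruder, hypothetical illustration, the sketch below fits a single person's lag-1 autoregression by ordinary least squares. It ignores between-person random effects, cross-lags, and overnight gaps between prompts, and the function name is an assumption.

```python
def ar1_inertia(series):
    """OLS fit of y_t = a + b * y_{t-1} + e_t for one person's series.

    Returns (b, innovation variance): b is the inertia, and the innovation
    variance is the residual variance with m - 2 degrees of freedom.
    """
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    prev, curr = series[:-1], series[1:]
    m = len(prev)
    mean_prev = sum(prev) / m
    mean_curr = sum(curr) / m
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        raise ValueError("lagged series is constant")
    sxy = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    b = sxy / sxx
    a = mean_curr - b * mean_prev
    residuals = [c - (a + b * p) for p, c in zip(prev, curr)]
    innovation_var = sum(r * r for r in residuals) / (m - 2)
    return b, innovation_var


# Hypothetical ratings from one participant, four prompts per day
ratings = [3.0, 3.5, 3.2, 2.8, 2.9, 3.6, 3.4, 3.1]
print(ar1_inertia(ratings))
```

    A larger b means a deviation in mood tends to carry over to the next prompt; this per-person OLS estimate is noisy for short series, which is one reason multilevel models that pool information across participants are preferred.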

    Objectively assessed sleep and physical activity in depression subtypes and its mediating role in their association with cardiovascular risk factors

    The aims of this study were to investigate the associations of major depressive disorder (MDD) and its subtypes (atypical, melancholic, combined, unspecified) with actigraphy-derived measures of sleep, physical activity, and circadian rhythms, and to test the potentially mediating role of sleep, physical activity, and circadian rhythms in the well-established associations of the atypical MDD subtype with Body Mass Index (BMI) and the metabolic syndrome (MeS). The sample consisted of 2317 participants recruited from an urban area who underwent comprehensive somatic and psychiatric evaluations. MDD and its subtypes were assessed via semi-structured diagnostic interviews. Sleep, physical activity, and circadian rhythms were measured using actigraphy. MDD and its subtypes were associated with several actigraphy-derived variables, including later sleep midpoint, low physical activity, low inter-daily stability, and larger intra-individual variability of sleep duration and relative amplitude. Sleep midpoint and physical activity fulfilled the criteria for partial mediation of the association between atypical MDD and BMI, and physical activity also fulfilled them for the association between atypical MDD and MeS. Our findings confirm associations of MDD and its atypical subtype with sleep and physical activity, which are likely to partially mediate the associations of atypical MDD with BMI and MeS, although most of these associations are not explained by sleep and activity variables. This highlights the need to consider atypical MDD, sleep, and sedentary behavior as cardiovascular risk factors.
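
    Partial mediation can be illustrated with the product-of-coefficients decomposition for a single mediator in linear models: if a is the exposure's effect on the mediator (e.g., atypical MDD on physical activity) and b is the mediator's effect on the outcome (e.g., BMI) adjusting for exposure, the indirect effect is a*b, and in OLS the total effect equals the indirect plus the direct effect. The abstract does not state which mediation procedure the study used; the sketch below is a hypothetical OLS illustration without covariates or inference, using the Frisch-Waugh-Lovell theorem to get partial slopes.

```python
def _mean(v):
    return sum(v) / len(v)


def _slope(x, y):
    """Simple-regression slope of y on x."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((u - mx) ** 2 for u in x)
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx


def _residualize(x, y):
    """Residuals of y after a simple regression on x."""
    b = _slope(x, y)
    mx, my = _mean(x), _mean(y)
    return [v - my - b * (u - mx) for u, v in zip(x, y)]


def mediation_decomposition(x, m, y):
    """Exposure x, mediator m, outcome y; all coefficients from OLS."""
    a = _slope(x, m)                                           # x -> m
    b = _slope(_residualize(x, m), _residualize(x, y))         # m -> y, adjusting for x
    direct = _slope(_residualize(m, x), _residualize(m, y))    # x -> y, adjusting for m
    total = _slope(x, y)                                       # x -> y, unadjusted
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}


# Hypothetical toy data: binary exposure, outcome built as y = 3*m + x
x = [0, 0, 1, 1, 0, 1]
m = [0.1, -0.1, 2.2, 1.8, 0.3, 2.0]
y = [3 * mi + xi for mi, xi in zip(m, x)]
print(mediation_decomposition(x, m, y))
```

    For this toy data a = 1.9, b = 3, the direct effect is 1, and the total effect of 6.7 splits into 5.7 indirect plus 1 direct, up to floating-point rounding.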