Topics in Modeling of Multivariate Mixed Data Types and Highly Multivariate Spatial Data

Author: Dey Debangan
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 25/07/2022

In public health, surveillance is systematic data collection used to analyze and interpret health data and to implement public policies. Notable examples of surveillance include periodic large health surveys (e.g., the National Health and Nutrition Examination Survey) and environmental surveillance that measures pollutants and meteorological data at multiple monitoring sites. With technological advances, many kinds of data can be recorded at each time point or spatial location. Unfortunately, the existing statistical literature is limited in its ability to model such complex multivariate data because existing methods lack generalizability, scalability, or computational efficiency. This dissertation focuses on building general, scalable, and efficient methods to bridge those gaps. It addresses three contexts explicitly: (1) using semiparametric Gaussian copulas to build joint models of multivariate data of mixed types (binary/ordinal/truncated/continuous) that define mutually consistent regression models for any type of outcome; (2) developing a consistent and robust estimator of the ubiquitous measure of classification accuracy, the Area Under the Curve (AUC), under complex survey designs, and connecting it to a latent R-squared analogous to that of linear models; and (3) proposing a class of "Graphical Gaussian Processes" that can efficiently model highly multivariate spatial data where tens or hundreds of variables are observed at each spatial location.
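The abstract does not give the form of the survey AUC estimator. As a hedged sketch of the general idea only, assuming the common approach of weighting each (case, control) pair by the product of the two units' sampling weights, a design-weighted AUC can be computed as below. `weighted_auc` is a hypothetical helper, not the dissertation's estimator, and it uses only the weights (strata and clusters would matter for variance estimation, which is not attempted here).

```python
from itertools import product


def weighted_auc(scores, labels, weights):
    """Survey-weighted AUC: the weighted share of (case, control) pairs in which
    the case has the higher score. Each pair is weighted by the product of the
    two units' sampling weights, and ties count as one half."""
    cases = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    controls = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case (label 1) and one control (label 0)")
    concordant = 0.0
    total = 0.0
    for (s1, w1), (s0, w0) in product(cases, controls):
        pair_weight = w1 * w0
        total += pair_weight
        if s1 > s0:
            concordant += pair_weight
        elif s1 == s0:
            concordant += 0.5 * pair_weight
    return concordant / total


# Equal weights reproduce the ordinary (Mann-Whitney) AUC; unequal weights change it.
print(weighted_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0], [1, 1, 1, 1]))  # 0.75
print(weighted_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0], [1, 3, 1, 1]))  # 0.625
```

In the second call the case scored 0.4, which is ranked below the control scored 0.6, represents three times as many population units, so the weighted AUC drops from 0.75 to 0.625.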

Johns Hopkins University

JScholarship

Graphical Gaussian Process Models for Highly Multivariate Spatial Data

Author: Banerjee Sudipto
Datta Abhirup
Dey Debangan
Publication date: 18/11/2021

For multivariate spatial Gaussian process (GP) models, customary specifications of cross-covariance functions do not exploit relational inter-variable graphs to ensure process-level conditional independence among the variables. This is undesirable, especially in highly multivariate settings, where popular cross-covariance functions such as the multivariate Matérn suffer from a "curse of dimensionality": the number of parameters and the number of floating point operations scale quadratically and cubically, respectively, in the number of variables. We propose a class of multivariate "Graphical Gaussian Processes" using a general construction called "stitching" that crafts cross-covariance functions from graphs and ensures process-level conditional independence among variables. For the Matérn family of functions, stitching yields a multivariate GP whose univariate components are Matérn GPs and that conforms to process-level conditional independence as specified by the graphical model. For highly multivariate settings and decomposable graphical models, stitching offers massive computational gains and parameter dimension reduction. We demonstrate the utility of the graphical Matérn GP for jointly modeling highly multivariate spatial data using simulation examples and an application to air-pollution modelling.
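The stitching construction itself is not reproduced here. As a hedged illustration of the building block it preserves (each univariate component is a Matérn GP), the sketch below evaluates the univariate Matérn covariance in the sqrt(2 nu)-scaled parameterization for the three half-integer smoothness values that have closed forms. The names `matern`, `sigma2` (marginal variance), `phi` (inverse range) and `nu` (smoothness) are illustrative, and other papers scale the range parameter differently.

```python
import math


def matern(d, sigma2=1.0, phi=1.0, nu=0.5):
    """Matérn covariance at distance d for nu in {0.5, 1.5, 2.5}, where the
    modified Bessel function reduces to an exponential times a polynomial.
    phi is an inverse range (decay) parameter, so r = phi * d."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    r = phi * d
    if nu == 0.5:  # exponential covariance
        return sigma2 * math.exp(-r)
    if nu == 1.5:
        return sigma2 * (1 + math.sqrt(3) * r) * math.exp(-math.sqrt(3) * r)
    if nu == 2.5:
        return sigma2 * (1 + math.sqrt(5) * r + 5 * r**2 / 3) * math.exp(-math.sqrt(5) * r)
    raise NotImplementedError("general nu requires the modified Bessel function K_nu")


# Covariance decays with distance; larger nu gives smoother sample paths.
print([round(matern(d, nu=1.5), 4) for d in (0.0, 0.5, 1.0, 2.0)])
```

For q variables, a full multivariate Matérn needs such a function for every one of the q(q+1)/2 variable pairs, which is the quadratic parameter growth the abstract refers to.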

arXiv.org e-Print Archive

eScholarship - University of California

Covariance Estimation and Principal Component Analysis for Mixed-Type Functional Data with application to mHealth in Mood Disorders

Author: Dey Debangan
Ghosal Rahul
Merikangas Kathleen
Zipunnikov Vadim
Publication date: 28/06/2023

Mobile digital health (mHealth) studies often collect multiple within-day self-reported assessments of participants' behaviour and health. Indexed by time of day, these assessments can be treated as functional observations of continuous, truncated, ordinal, and binary type. We develop covariance estimation and principal component analysis for such mixed-type functional data. We propose a semiparametric Gaussian copula model that assumes a generalized latent non-paranormal process generates the observed mixed-type functional data and defines temporal dependence via a latent covariance. The smooth estimate of the latent covariance is constructed via a Kendall's tau bridging method that incorporates smoothness within the bridging step. The approach is then extended with methods for handling both dense and sparse sampling designs, for calculating subject-specific latent representations of the observed data, and for estimating latent principal components and principal component scores. Importantly, the proposed framework handles all four mixed types in a unified way. Simulation studies show competitive performance of the proposed method under both dense and sparse sampling designs. The method is applied to data from 497 participants of the National Institute of Mental Health Family Study of the Mood Disorder Spectrum to characterize differences in within-day temporal patterns of mood among individuals with the major mood disorder subtypes, including Major Depressive Disorder and Type 1 and Type 2 Bipolar Disorder.
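For the simplest case, two continuous variables whose latent normals have correlation r, the bridge between Kendall's tau and r is the classical identity tau = (2/pi) arcsin(r), so r = sin(pi * tau / 2). The sketch below implements only that case with an O(n^2) tau-a; the paper's smoothed functional bridging and the bridges for truncated, ordinal and binary types are not shown, and the function names are illustrative.

```python
import math
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1  # concordant pair
        elif prod < 0:
            score -= 1  # discordant pair; ties contribute 0
    return score / (n * (n - 1) / 2)


def latent_corr_continuous(x, y):
    """Bridge for two continuous (non-paranormal) variables: r = sin(pi * tau / 2)."""
    return math.sin(math.pi * kendall_tau_a(x, y) / 2)


x = [0.1, 0.5, -0.3, 1.2, 0.8]
y = [0.2, 0.1, -0.5, 1.0, 0.9]
print(kendall_tau_a(x, y), latent_corr_continuous(x, y))
```

Because tau depends only on ranks, it is unchanged by monotone transformations of either variable, which is why it can be used when the marginal transformations are unknown. Here 9 of the 10 pairs are concordant, so tau = 0.8 and the estimated latent correlation is sin(0.4 pi) ≈ 0.951.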

arXiv.org e-Print Archive

Graph-constrained Analysis for Multivariate Functional Data

Author: Banerjee Sudipto
Datta Abhirup
Dey Debangan
Lindquist Martin
Publication date: 14/08/2023

Functional Gaussian graphical models (GGMs) used to analyze multivariate functional data customarily estimate an unknown graphical model representing the conditional relationships between the functional variables. However, in many applications of multivariate functional data the graph is known, and existing functional GGM methods cannot preserve a given graphical constraint. In this manuscript, we demonstrate how to conduct multivariate functional analysis that exactly conforms to a given inter-variable graph. We first show the equivalence between partially separable functional GGMs and graphical Gaussian processes (GPs), which were originally proposed for constructing optimal covariance functions for multivariate spatial data that retain the conditional independence relations in a given graphical model. This theoretical connection helps design a new algorithm that leverages Dempster's covariance selection to calculate the maximum likelihood estimate of the covariance function for multivariate functional data under graphical constraints. We also show that the finite-term truncation of the functional GGM basis expansion used in practice is equivalent to a low-rank graphical GP, which is known to oversmooth marginal distributions. To remedy this, we extend our algorithm to better preserve marginal distributions while still respecting the graph and retaining computational scalability. The insights from the new results presented in this manuscript will help practitioners better understand the relationship between these graphical models and decide on the appropriate method for their specific multivariate data analysis task. The benefits of the proposed algorithms are illustrated using empirical experiments and an application to functional modeling of neuroimaging data using the connectivity graph among regions of the brain.
Comment: 23 pages, 6 figures
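Dempster's covariance selection has a closed-form maximum likelihood solution when the graph is decomposable: the estimated precision matrix is the sum of the zero-padded inverses of the sample-covariance blocks for the cliques, minus the same for the separators. The sketch below applies that textbook formula to an ordinary p x p sample covariance (it could, for instance, be a covariance of basis coefficients for p functional variables); it is not the authors' functional algorithm, and `pad_inverse` and `decomposable_mle_precision` are hypothetical helper names.

```python
import numpy as np


def pad_inverse(S, idx, p):
    """Invert the sub-block S[idx, idx] and embed it in a p x p matrix of zeros."""
    out = np.zeros((p, p))
    out[np.ix_(idx, idx)] = np.linalg.inv(S[np.ix_(idx, idx)])
    return out


def decomposable_mle_precision(S, cliques, separators):
    """Closed-form Gaussian MLE of the precision matrix for a decomposable graph:
    sum of padded clique inverses minus sum of padded separator inverses
    (separators listed with multiplicity)."""
    p = S.shape[0]
    K = np.zeros((p, p))
    for c in cliques:
        K += pad_inverse(S, c, p)
    for s in separators:
        K -= pad_inverse(S, s, p)
    return K


# Chain graph 0 - 1 - 2: cliques {0, 1} and {1, 2}, separator {1}.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
X[:, 1] += X[:, 0]
X[:, 2] += X[:, 1]
S = np.cov(X, rowvar=False, bias=True)  # divide-by-n (MLE) sample covariance
K = decomposable_mle_precision(S, cliques=[[0, 1], [1, 2]], separators=[[1]])
Sigma = np.linalg.inv(K)                # fitted covariance under the graph
```

The zero in `K[0, 2]` encodes the conditional independence of variables 0 and 2 given variable 1, while the fitted covariance `Sigma` reproduces the sample covariance on each clique block.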

arXiv.org e-Print Archive

Graphical Gaussian Process Models for Highly Multivariate Spatial Data

Author: Dey Debangan
Publication date: 07/02/2023

Ezid

Semiparametric Gaussian Copula Regression modeling for Mixed Data Types (SGCRM)

Author: Dey Debangan
Zipunnikov Vadim
Publication date: 13/05/2022

Many clinical and epidemiological studies encode participant-level information as a collection of continuous, truncated, ordinal, and binary variables. To gain novel insights into the complex interactions among these variables, there is a critical need for flexible frameworks for jointly modeling variables of mixed data types. We propose Semiparametric Gaussian Copula Regression modeling (SGCRM), which models the joint dependence structure among observed continuous, truncated, ordinal, and binary variables and constructs conditional models with any of these four data types as the outcome, with a guarantee that the derived conditional models are mutually consistent. The Semiparametric Gaussian Copula (SGC) mechanism assumes that observed SGC variables are generated by (i) monotonically transforming the marginals of a latent multivariate normal random variable and (ii) dichotomizing or truncating these transformed marginals. SGCRM estimates the correlation matrix of the latent normal variables by inverting "bridges" between the Kendall's tau rank correlations of the observed mixed-type variables and the latent Gaussian correlations. We derive a novel bridging result for a general ordinal variable. In addition to the previously established asymptotic consistency, we establish the asymptotic normality of the latent correlation estimators. We also establish the asymptotic normality of the SGCRM regression estimators and provide a computationally efficient way to calculate their asymptotic covariances. We propose computationally efficient methods to predict SGC latent variables and to impute missing data. Using data from the National Health and Nutrition Examination Survey (NHANES), we illustrate SGCRM and compare it with traditional conditional regression models, including truncated Gaussian regression, ordinal probit, and probit models.
Comment: 35 pages, 6 figures, 6 tables
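The two-step SGC generating mechanism (monotone transformation of latent normal marginals, then dichotomizing or truncating) can be simulated directly. The sketch below produces one variable of each of the four types from correlated latent normals under illustrative assumptions that do not come from the paper: an exchangeable latent correlation `rho`, the transform exp for the continuous variable, truncation at 0, a binary threshold of 0.3, and ordinal cut points -0.5 and 0.5. `simulate_sgc` is a hypothetical name.

```python
import math
import random


def simulate_sgc(n, rho=0.5, seed=0):
    """Draw n (continuous, truncated, binary, ordinal) tuples from four latent
    standard normals with exchangeable correlation rho (0 <= rho < 1)."""
    if not 0 <= rho < 1:
        raise ValueError("this sketch assumes 0 <= rho < 1")
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        # Unit-variance latent normals; any two have correlation rho via the shared term.
        z = [math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(4)]
        continuous = math.exp(z[0])                   # monotone transform only
        truncated = z[1] if z[1] > 0.0 else 0.0       # values at or below 0 recorded as 0
        binary = 1 if z[2] > 0.3 else 0               # dichotomized at threshold 0.3
        ordinal = sum(z[3] > t for t in (-0.5, 0.5))  # three ordered levels: 0, 1, 2
        rows.append((continuous, truncated, binary, ordinal))
    return rows


for row in simulate_sgc(3, rho=0.6, seed=42):
    print(row)
```

Estimation runs this mechanism in reverse: only the four observed columns are available, and the rank-based bridges recover the latent correlation without knowing the transforms or thresholds.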

arXiv.org e-Print Archive

#MeToo and Google Inquiries Into Sexual Violence: A Hashtag Campaign Can Sustain Information Seeking

Author: Ayers J. W.
Ciprian Crainiceanu
Debangan Dey
Mark Dredze
Michelle R. Kaufman
Publication venue: 'SAGE Publications'

Crossref

Specificity of affective dynamics of bipolar and major depressive disorder

Author: Cui Lihong
Dey Debangan
Husky Mathilde M
Leroux Andrew
Merikangas Kathleen R
Stapp Emma K
Zipunnikov Vadim
Publication venue: Health Sciences Research Commons
Publication date: 13/08/2023

OBJECTIVE: Here, we examine whether the dynamics of the four dimensions of the circumplex model of affect, assessed by ecological momentary assessment (EMA), differ between individuals with bipolar disorder (BD) and those with major depressive disorder (MDD).

METHODS: Participants aged 11-85 years (n = 362) reported momentary sad, anxious, active, and energetic dimensional states four times per day for 2 weeks. Individuals with lifetime mood disorder subtypes of bipolar-I, bipolar-II, and MDD, derived from a semistructured clinical interview, were compared with each other and with controls without a lifetime history of psychiatric disorders. Random effects for individual means, inertias, innovation (residual) variances, and cross-lags across the four affective dimensions were derived simultaneously from multivariate dynamic structural equation models.

RESULTS: All mood disorder subtypes were associated with higher levels of sad and anxious mood and lower energy than controls. Those with bipolar-I had lower average activation, and lower energy that was independent of activation, compared with MDD or controls. However, increases in activation were more likely to persist in those with bipolar-I. Bipolar-II was characterized by higher lability of sad and anxious mood compared with bipolar-I and controls, but not MDD. Compared with BD and controls, those with MDD exhibited cross-augmentation of sadness and anxiety, and sadness blunted energy.

CONCLUSION: Bipolar-I is more strongly characterized by activation and energy than by sad and anxious mood. This distinction has potential implications both for the specificity of intervention targets and for differential pathways underlying these dynamic affective systems. Confirmation of the longer-term stability and generalizability of these findings in future studies is necessary.
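The study fit multivariate dynamic structural equation models with person-specific random effects. The sketch below is much simpler and only illustrates what the named parameters mean, for one hypothetical person and two dimensions (say, sad and anxious) instead of four. It simulates a bivariate first-order vector autoregression, y_t - mu = Phi (y_{t-1} - mu) + e_t, where the diagonal of Phi holds the inertias, the off-diagonal entries are the cross-lags, and `innov_sd` gives the innovation standard deviations (square roots of the innovation variances). `simulate_var1` and all parameter values are illustrative assumptions.

```python
import random


def simulate_var1(T, mu, phi, innov_sd, seed=0):
    """Simulate T steps of a bivariate VAR(1), y_t - mu = Phi (y_{t-1} - mu) + e_t,
    starting at the mean; e_t has independent normal components with SDs innov_sd."""
    rng = random.Random(seed)
    y = list(mu)  # start the process at its person-specific means
    path = []
    for _ in range(T):
        dev = [y[0] - mu[0], y[1] - mu[1]]  # deviations from the means at t-1
        y = [
            mu[i]
            + phi[i][i] * dev[i]           # inertia: carry-over of the same dimension
            + phi[i][1 - i] * dev[1 - i]   # cross-lag: influence of the other dimension
            + rng.gauss(0.0, innov_sd[i])  # innovation: new shock at time t
            for i in range(2)
        ]
        path.append(tuple(y))
    return path


# Hypothetical person, 4 prompts/day for 14 days: sad (index 0) and anxious (index 1)
# with mild mutual augmentation through positive cross-lags.
path = simulate_var1(56, mu=[2.0, 2.5], phi=[[0.4, 0.15], [0.2, 0.3]], innov_sd=[0.8, 0.6])
print(path[:3])
```

Higher inertia makes a deviation linger over several prompts; a positive cross-lag means a spike in one dimension raises the other at the next prompt, which is the kind of "cross-augmentation" the results describe.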

George Washington University: Health Sciences Research Commons (HSRC)

Objectively assessed sleep and physical activity in depression subtypes and its mediating role in their association with cardiovascular risk factors

Author: Dey Debangan
Glaus Jennifer
Guo Wei
Kang Sun Jung
Lamers Femke
Leroux Andrew
Merikangas Kathleen R.
Plessen Kerstin J.
Preisig Martin
Strippoli Marie-Pierre F.
Vaucher Julien
Vollenweider Peter
Zipunnikov Vadim
Publication venue: 'Elsevier BV'
Publication date: 01/07/2023

The aims of this study were to investigate the associations of major depressive disorder (MDD) and its subtypes (atypical, melancholic, combined, unspecified) with actigraphy-derived measures of sleep, physical activity and circadian rhythms, and to test the potentially mediating role of sleep, physical activity and circadian rhythms in the well-established associations of the atypical MDD subtype with Body Mass Index (BMI) and the metabolic syndrome (MeS). The sample consisted of 2317 participants recruited from an urban area, who underwent comprehensive somatic and psychiatric evaluations. MDD and its subtypes were assessed via semi-structured diagnostic interviews. Sleep, physical activity and circadian rhythms were measured using actigraphy. MDD and its subtypes were associated with several actigraphy-derived variables, including later sleep midpoint, low physical activity, low inter-daily stability and larger intra-individual variability of sleep duration and relative amplitude. Sleep midpoint and physical activity fulfilled criteria for partial mediation of the association between atypical MDD and BMI, and physical activity also fulfilled them for the association between atypical MDD and MeS. Our findings confirm associations of MDD and its atypical subtype with sleep and physical activity, which are likely to partially mediate the associations of atypical MDD with BMI and MeS, although most of these associations are not explained by sleep and activity variables. This highlights the need to consider atypical MDD, sleep and sedentary behavior as cardiovascular risk factors.
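The abstract does not state which mediation method or criteria were used, so the following is only the textbook product-of-coefficients decomposition for linear models, run on simulated data: a binary exposure `x` (standing in for, e.g., atypical MDD), a mediator `m` (e.g., physical activity) and an outcome `y` (e.g., BMI). The indirect effect is a*b, where a is the exposure coefficient in the mediator model and b is the mediator coefficient in the outcome model; with ordinary least squares the total effect splits exactly into direct plus indirect. Function names and effect sizes are hypothetical.

```python
import numpy as np


def ols(y, *predictors):
    """Ordinary least squares; returns [intercept, coef_1, coef_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def mediation_effects(x, m, y):
    """Product-of-coefficients decomposition of the x -> y effect through m."""
    a = ols(m, x)[1]            # exposure -> mediator
    fit = ols(y, x, m)
    direct, b = fit[1], fit[2]  # exposure -> outcome adjusting for m; mediator -> outcome
    total = ols(y, x)[1]        # exposure -> outcome without the mediator
    return {"total": total, "direct": direct, "indirect": a * b}


# Simulated data only (hypothetical effect sizes): a = 0.5, b = 0.4, direct = 0.2.
rng = np.random.default_rng(1)
n = 2000
x = rng.binomial(1, 0.3, size=n).astype(float)
m = 0.5 * x + rng.standard_normal(n)
y = 0.2 * x + 0.4 * m + rng.standard_normal(n)
print(mediation_effects(x, m, y))
```

Partial mediation in this sense means that both the indirect and the direct effects are non-zero; a real analysis would add covariates and uncertainty intervals (for example, by bootstrap), which this sketch omits.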

Serveur académique lausannois