Multiple Imputation Using Gaussian Copulas

Author: Bojinov Iavor
Hollenbach Florian M.
Metternich Nils W.
Minhas Shahryar
Volfovsky Alexander
Ward Michael D.
Publication date: 01/01/2018

Missing observations are pervasive throughout empirical research, especially in the social sciences. Despite multiple approaches to dealing adequately with missing data, many scholars still fail to address this vital issue. In this paper, we present a simple-to-use method for generating multiple imputations using a Gaussian copula. The Gaussian copula for multiple imputation (Hoff, 2007) allows scholars to attain estimation results that have good coverage and small bias. The use of copulas to model the dependence among variables will enable researchers to construct valid joint distributions of the data, even without knowledge of the actual underlying marginal distributions. Multiple imputations are then generated by drawing observations from the resulting posterior joint distribution and replacing the missing values. Using simulated and observational data from published social science research, we compare imputation via Gaussian copulas with two other widely used imputation methods: MICE and Amelia II. Our results suggest that the Gaussian copula approach has a slightly smaller bias, higher coverage rates, and narrower confidence intervals compared to the other methods. This is especially true when the variables with missing data are not normally distributed. These results, combined with theoretical guarantees and ease of use, suggest that the approach examined provides an attractive alternative for applied researchers undertaking multiple imputations.

arXiv.org e-Print Archive

UCL Discovery

Prognostic Predictive Model to Estimate the Risk of Multiple Chronic Diseases: Constructing Copulas Using Electronic Medical Record Data

Author: Black Jason E
Publication venue: Scholarship@Western
Publication date: 23/11/2018

Introduction: Multimorbidity, the presence of two or more chronic diseases in an individual, is a pressing medical condition. Novel prevention methods are required to reduce the incidence of multimorbidity. Prognostic predictive models estimate a patient’s risk of developing chronic disease. This thesis developed a single predictive model for three diseases associated with multimorbidity: diabetes, hypertension, and osteoarthritis. Methods: Univariate logistic regression models were constructed, followed by an analysis of the dependence that existed using copulas. All analyses were based on data from the Canadian Primary Care Sentinel Surveillance Network. Results: All univariate models were highly predictive, as demonstrated by their discrimination and calibration. Copula models revealed the dependence between each disease pair. Discussion: By estimating the risk of multiple chronic diseases, prognostic predictive models may enable the prevention of chronic disease through identification of high-risk individuals or delivery of individualized risk assessments to inform patient and health care provider decision-making.

Scholarship@Western

A Copula-Based Method for Analyzing Bivariate Binary Longitudinal Data

Author: Baek Seunghee
Publication venue: ScholarlyCommons
Publication date: 01/01/2010

The work presented as part of this dissertation is primarily motivated by a randomized trial for HIV serodiscordant couples. Specifically, the Multisite HIV/STD Prevention Trial for African American Couples is a behavioral modification trial for African American, heterosexual, HIV discordant couples. In this trial, investigators developed and evaluated a couple-based behavioral intervention for reducing risky shared sexual behaviors and collected retrospective outcomes from both partners at baseline and at 3 follow-ups to evaluate the intervention efficacy. As the outcomes refer to the couples' shared sexual behavior, couples' responses are expected to be correlated, and modeling approaches should account for multiple sources of correlation: within-individual over time as well as within-couple both at the same measurement time and at different times. This dissertation details the novel application of copulas to modeling dyadic, longitudinal binary data to estimate reliability and efficacy. Copulas have long been analytic tools for modeling multivariate outcomes in other settings. In particular, we selected a mixture of max-infinitely divisible (max-id) copula because it has a number of attractive analytic features. The dissertation is arranged as follows: Chapter 2 presents a copula-based approach to estimating the reliability of couple self-reported (baseline) outcomes, adjusting for key couple-level baseline covariates; Chapter 3 presents an extension of the max-id copula to model longitudinal (two measurement occasions), binary couples data; Chapter 4 further extends the copula-based model to accommodate more than two repeated measures in a different application examining two clinical depression measures. In this application, we are interested in estimating whether there are differential treatment effects on two different measures of depression, longitudinally.
The copula-based modeling approach presented in this dissertation provides a useful tool for investigating complex dependence structures among multivariate outcomes as well as examining covariate effects on the marginal distribution for each outcome. The application of existing statistical methodology to longitudinal, dyad-based trials is an important translational advancement. The methods presented here are easily applied to other studies that involve multivariate outcomes measured repeatedly.

ScholarlyCommons@Penn

Appropriate, accessible and appealing probabilistic graphical models

Author: Inouye David Iseri
Publication date: 13/12/2017

Appropriate - Many multivariate probabilistic models either use independent distributions or dependent Gaussian distributions. Yet, many real-world datasets contain count-valued or non-negative skewed data, e.g. bag-of-words text data and biological sequencing data. Thus, we develop novel probabilistic graphical models for use on count-valued and non-negative data including Poisson graphical models and multinomial graphical models. We develop one generalization that allows for triple-wise or k-wise graphical models going beyond the normal pairwise formulation. Furthermore, we also explore Gaussian-copula graphical models and derive closed-form solutions for the conditional distributions and marginal distributions (both before and after conditioning). Finally, we derive mixture and admixture, or topic model, generalizations of these graphical models to introduce more power and interpretability. Accessible - Previous multivariate models, especially related to text data, often have complex dependencies without a closed form and require complex inference algorithms that have limited theoretical justification. For example, hierarchical Bayesian models often require marginalizing over many latent variables. We show that our novel graphical models (even the k-wise interaction models) have simple and intuitive estimation procedures based on node-wise regressions that likely have similar theoretical guarantees as previous work in graphical models. For the copula-based graphical models, we show that simple approximations could still provide useful models; these copula models also come with closed-form conditional and marginal distributions, which make them amenable to exploratory inspection and manipulation. The parameters of these models are easy to interpret and thus may be accessible to a wide audience. 
Appealing - High-level visualization and interpretation of graphical models with even 100 variables has often been difficult even for a graphical model expert, despite visualization being one of the original motivators for graphical models. This difficulty is likely due to the lack of collaboration between graphical model experts and visualization experts. To begin bridging this gap, we develop a novel "what if?" interaction that manipulates and leverages the probabilistic power of graphical models. Our approach defines: the probabilistic mechanism via conditional probability; the query language to map text input to a conditional probability query; and the formal underlying probabilistic model. We then propose to visualize these query-specific probabilistic graphical models by combining the intuitiveness of force-directed layouts with the beauty and readability of word clouds, which pack many words into valuable screen space while ensuring words do not overlap via pixel-level collision detection. Although both the force-directed layout and the pixel-level packing problems are challenging in their own right, we approximate both simultaneously via adaptive simulated annealing starting from careful initialization. For visualizing mixture distributions, we also design a meaningful mapping from the properties of the mixture distribution to a color in the perceptually uniform CIELUV color space. Finally, we demonstrate our approach via illustrative visualizations of several real-world datasets.
Computer Science

Texas ScholarWorks

ISBIS 2016: Meeting on Statistics in Business and Industry

Author: Banks David
Mahmoudvand Rahim
Oliveira Amilcar
Oliveira Teresa
Ravishankar Nalini
Publication venue: Universidade Aberta
Publication date: 10/11/2016

This book includes the abstracts of the talks presented at the 2016 International Symposium on Business and Industrial Statistics, held in Barcelona, June 8-10, 2016, hosted at the Universitat Politècnica de Catalunya - Barcelona TECH, by the Department of Statistics and Operations Research. The meeting took place at the ETSEIB Building (Escola Tecnica Superior d'Enginyeria Industrial) at Avda Diagonal 647. The meeting organizers celebrated the continued success of the ISBIS and ENBIS societies, and the meeting drew together the international community of statisticians, both academics and industry professionals, who share the goal of making statistics the foundation for decision making in business and related applications. The Scientific Program Committee consisted of: David Banks, Duke University; Amílcar Oliveira, DCeT - Universidade Aberta and CEAUL; Teresa A. Oliveira, DCeT - Universidade Aberta and CEAUL; Nalini Ravishankar, University of Connecticut; Xavier Tort Martorell, Universitat Politècnica de Catalunya, Barcelona TECH; Martina Vandebroek, KU Leuven; Vincenzo Esposito Vinzi, ESSEC Business School.

Repositório Aberto da Universidade Aberta

CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS

Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting was organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research.

Directory of Open Access Books (DOAB)

Contributions to behavioural freight transport modelling

Author: Irannezhad Elnaz
Publication venue: 'University of Queensland Library'
Publication date: 16/11/2018

University of Queensland eSpace

A review of probabilistic forecasting and prediction with machine learning

Author: Papacharalampous Georgia
Tyralis Hristos
Publication date: 17/09/2022

Predictions and forecasts of machine learning models should take the form of probability distributions, aiming to increase the quantity of information communicated to end users. Although applications of probabilistic prediction and forecasting with machine learning models in academia and industry are becoming more frequent, related concepts and methods have not been formalized and structured under a holistic view of the entire field. Here, we review the topic of predictive uncertainty estimation with machine learning algorithms, as well as the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review covers a time period spanning from the introduction of early statistical (linear regression and time series models, based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms) that are more flexible by nature. The review of the progress in the field expedites our understanding of how to develop new algorithms tailored to users' needs, since the latest advancements are based on some fundamental concepts applied to more complex algorithms. We conclude by classifying the material and discussing challenges that are becoming a hot topic of research.
Comment: 83 pages, 5 figures

arXiv.org e-Print Archive