2,072 research outputs found

An Empirical Comparison of Multiple Imputation Methods for Categorical Data

Author: Akande Olanrewaju
Li Fan
Reiter Jerome
Publication venue: 'Informa UK Limited'
Publication date: 22/12/2016

Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data, including chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet Process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. Supplementary material for this article is available online.

arXiv.org e-Print Archive

FigShare
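
The abstract above describes the core mechanic of multiple imputation: missing values are replaced by draws from a predictive model fitted to the observed data, and the process is repeated to give several completed datasets. A minimal sketch of that idea for a single categorical variable follows. It is not one of the three methods compared in the paper (chained equations with GLMs, chained equations with CART, or the Dirichlet Process mixture model); it assumes a much simpler marginal model with a Dirichlet posterior over category probabilities, and the function name `impute_categorical` and the sample data are hypothetical.

```python
import random
from collections import Counter


def impute_categorical(values, m=5, seed=0):
    """Return m completed copies of `values`; None marks a missing entry.

    Assumption: a marginal multinomial model with a flat Dirichlet prior,
    used purely for illustration (no covariates, no chained equations).
    """
    rng = random.Random(seed)
    counts = Counter(v for v in values if v is not None)
    categories = sorted(counts)
    completed = []
    for _ in range(m):
        # Draw category probabilities from the Dirichlet(counts + 1) posterior
        # so each completed dataset also reflects parameter uncertainty.
        gammas = [rng.gammavariate(counts[c] + 1, 1.0) for c in categories]
        total = sum(gammas)
        probs = [g / total for g in gammas]
        # Keep observed values; fill each missing entry with a fresh draw.
        completed.append([
            v if v is not None else rng.choices(categories, weights=probs)[0]
            for v in values
        ])
    return completed


data = ["a", "b", None, "a", None, "c", "a"]
datasets = impute_categorical(data, m=3)
```

Each element of `datasets` is a full copy of `data` with the observed entries untouched and the missing entries filled in, so any complete-data analysis can be run on each copy separately.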

Multiple Imputation for Multilevel Data with Continuous and Binary Variables

Author: Audigier Vincent
Carpenter James
Debray Thomas PA
Jolani Shahab
Quartagno Matteo
Resche-Rigon Matthieu
van Buuren Stef
White Ian R
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 27/11/2017

We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show that these multiple imputation methods are the most appropriate to handle missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given to choose the most suitable multiple imputation method according to the structure of the data

arXiv.org e-Print Archive

Maastricht University Research Portal

Crossref

LSHTM Research Online

HAL-Inserm

UCL Discovery

HAL Descartes

Utrecht University Repository

Hal-Diderot

Multiple imputation methods for bivariate outcomes in cluster randomised trials.

Author: DiazOrdaz K
Gomes M
Grieve R
Kenward MG
Publication venue: 'Wiley'
Publication date: 14/03/2016

Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd

Crossref

LSHTM Research Online

PubMed Central

Multiple Imputation Methods for Treatment Noncompliance and Nonresponse in Randomized Clinical Trials

Author: Taylor Leslie
Zhou Xiao-Hua (Andrew)
Publication venue: Collection of Biostatistics Research Archive
Publication date: 19/02/2009

Summary: Randomized clinical trials are a powerful tool for investigating causal treatment effects, but in human trials there are oftentimes problems of noncompliance which standard analyses, such as the intention-to-treat or as-treated analysis, either ignore or incorporate in such a way that the resulting estimand is no longer a causal effect. One alternative to these analyses is the complier average causal effect (CACE) which estimates the average causal treatment effect among a subpopulation that would comply under any treatment assigned. We focus on the setting of a randomized clinical trial with crossover treatment noncompliance (e.g., control subjects could receive the intervention and intervention subjects could receive the control) and outcome nonresponse. In this article, we develop estimators for the CACE using multiple imputation methods, which have been successfully applied to a wide variety of missing data problems, but have not yet been applied to the potential outcomes setting of causal inference. Using simulated data we investigate the finite sample properties of these estimators as well as of competing procedures in a simple setting. Finally we illustrate our methods using a real randomized encouragement design study on the effectiveness of the influenza vaccine

Collection Of Biostatistics Research Archive

Multiple imputation methods for longitudinal blood pressure measurements from the Framingham Heart Study

Author: Gauderman W James
Kang Terri
Kraft Peter
Thomas Duncan
Publication venue: BioMed Central
Publication date: 01/01/2003

Missing data are a great concern in longitudinal studies, because few subjects will have complete data and missingness could be an indicator of an adverse outcome. Analyses that exclude potentially informative observations due to missing data can be inefficient or biased. To assess the extent of these problems in the context of genetic analyses, we compared case-wise deletion to two multiple imputation methods available in the popular SAS package, the propensity score and regression methods. For both the real and simulated data sets, the propensity score and regression methods produced results similar to case-wise deletion. However, for the simulated data, the estimates of heritability for case-wise deletion and the two multiple imputation methods were much lower than for the complete data. This suggests that if missingness patterns are correlated within families, then imputation methods that do not allow this correlation can yield biased results

Crossref

Springer - Publisher Connector

PubMed Central

Comparison of Multiple Imputation Methods for Categorical Survey Items with High Missing Rates: Application to the Family Life, Activity, Sun, Health and Eating (FLASHE) Study

Author: Dwyer Laura A.
Hennessy Erin
Liu Benmei
Nebeling Linda
Oh April
Publication venue: DigitalCommons@WayneState
Publication date: 05/09/2018

Two multiple imputation methods, the Sequential Regression Multivariate Imputation Algorithm and the Cox-Iannacchione Weighted Sequential Hot Deck, were examined and compared to impute highly missing categorical variables from the Family Life, Activity, Sun, Health and Eating (FLASHE) study. This paper describes the imputation approaches and results from the study.

Digital Commons@Wayne State University

Laparoscopic versus open colorectal resection for cancer and polyps: A cost-effectiveness study

Author: Dowson H
Gage H
Jackson D
Jordan J
Rockall T
Publication venue: 'Dove Medical Press Ltd.'
Publication date: 01/01/2014

Methods: Participants were recruited in 2006-2007 in a district general hospital in the south of England; those with a diagnosis of cancer or polyps were included in the analysis. Quality of life data were collected using EQ-5D on alternate days after surgery for 4 weeks. Costs per patient, from a National Health Service perspective (in British pounds, 2006), comprised the sum of operative, hospital, and community costs. Missing data were filled using multiple imputation methods. The differences in mean quality-adjusted life years and costs between surgery groups were estimated simultaneously using a multivariate regression model applied to 20 imputed datasets. The probability that laparoscopic surgery is cost-effective compared to open surgery for a given societal willingness-to-pay threshold is illustrated using a cost-effectiveness acceptability curve.

Crossref

PubMed Central

Surrey Research Insight

Brunel University Research Archive
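
The abstract above fits a regression model to 20 imputed datasets but does not say how the 20 sets of results were combined. The usual approach is Rubin's rules, sketched below on the assumption that they apply here: the pooled estimate is the mean of the per-dataset estimates, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The function name `pool_rubin` and the numbers in the example are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


# Illustrative numbers only, not taken from the study.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The `(1 + 1 / m)` factor accounts for using a finite number of imputations; with 20 datasets it inflates the between-imputation variance by 5%.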

Missing covariates in logistic regression, estimation and distribution selection.

Author: Claeskens Gerda
Consentino Fabrizio

We derive explicit formulae for estimation in logistic regression models where some of the covariates are missing. Our approach allows for modeling the distribution of the missing covariates either as a multivariate normal or multivariate t-distribution. A main advantage of this method is that it is fast and does not require the use of iterative procedures. A model selection method is derived which allows one to choose among these distributions. In addition we consider versions of AIC that are based on the EM algorithm and on multiple imputation methods that have a wide applicability to model selection in likelihood models in general.

Keywords: Akaike information criterion; Likelihood model; Logistic regression; Missing covariates; Model selection; Multiple imputation; t-distribution

Research Papers in Economics