2,354 research outputs found

Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study

Author: A Burton
A Marshall
AH Herring
AM Wood
Andrea Marshall
DB Rubin
Douglas G Altman
F Barzi
FE Harrell
FH Kong
HY Chen
I White
J Schafer
J Scheffer
JL Schafer
KH Li
LM Collins
LQ Tang
M Hu
N Schenker
NJ Horton
P Royston
Patrick Royston
PD Faris
R Bender
R Development Core Team
R Oostenbrink
RJA Little
Roger L Holder
S Demissie
S Greenland
S van Buuren
SR Lipsitz
TG Clark
W Sauerbrei
W Vach
XL Meng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Background: There is no consensus on the most appropriate approach to handling missing covariate data within prognostic modelling studies. Therefore, a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model. Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five amounts of incomplete cases from 5% to 75% were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) a data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained. Results: CC analysis produced unbiased regression estimates but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, MICE-PMM produced, in general, the least biased estimates, better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with an MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches. Conclusion: The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR.
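
The three missingness mechanisms named in this abstract can be illustrated with a small sketch. This is not the authors' simulation code (the study used R and skewed covariates). The function name `impose_missing`, the two-covariate layout (`x1` incomplete, `x2` fully observed), the standard normal covariates and the logistic missingness model are all assumptions made purely for illustration.

```python
import math
import random


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit: map a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def impose_missing(rows, mechanism, rate, rng, slope=1.0):
    """Return copies of rows with x1 set to None under one mechanism.

    rows      : list of dicts with numeric keys "x1" and "x2" (hypothetical layout)
    mechanism : "MCAR", "MAR" or "MNAR"
    rate      : baseline probability of missingness, strictly between 0 and 1
    """
    out = []
    for row in rows:
        if mechanism == "MCAR":
            p = rate  # missingness ignores all data
        elif mechanism == "MAR":
            p = inv_logit(logit(rate) + slope * row["x2"])  # depends on observed x2
        elif mechanism == "MNAR":
            p = inv_logit(logit(rate) + slope * row["x1"])  # depends on x1 itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        new_row = dict(row)  # copy so the complete data stay untouched
        if rng.random() < p:
            new_row["x1"] = None
        out.append(new_row)
    return out


def observed_mean(rows):
    """Mean of x1 over the rows where x1 is still observed."""
    values = [r["x1"] for r in rows if r["x1"] is not None]
    return sum(values) / len(values)


rng = random.Random(2010)
complete = [{"x1": rng.gauss(0, 1), "x2": rng.gauss(0, 1)} for _ in range(5000)]
for mechanism in ("MCAR", "MAR", "MNAR"):
    incomplete = impose_missing(complete, mechanism, rate=0.25, rng=rng)
    fraction = sum(r["x1"] is None for r in incomplete) / len(incomplete)
    print(mechanism, round(fraction, 3), round(observed_mean(incomplete), 3))
```

In this toy setup `x1` and `x2` are independent, so only the MNAR mechanism shifts the mean of the observed `x1` values. Larger values of `x1` are more likely to be deleted, which pulls the observed mean downwards.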

Crossref

Springer - Publisher Connector

University of Birmingham Research Portal

PubMed Central

UCL Discovery

Warwick Research Archives Portal Repository

Oxford University Research Archive

Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer

Author: A Marshall
AR Donders
DB Rubin
G Ambler
G Van der Heijden
IR White
JA Sterne
JL Schafer
JM Engels
JO Kim
KG Moons
NJ Horton
RA Little
S Greenland
S van Buuren
SJ Dawson
TE Bodner
W Vach
Publication venue: Nature Publishing Group
Publication date: 01/01/2011

Background: Tissue micro-arrays (TMAs) are increasingly used to generate data on the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available. However, the validity of the various approaches is dependent on the structure of the missing data and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data from a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer. Patients and Methods: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multi-variate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI−) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared. Results: Over half the cases had missing data on at least one of the seven variables and 11 percent had missing data on 4 or more. The multi-variate hazard ratio estimates based on multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but the estimates were less precise as the standard errors were large. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI were least biased and most accurate, whereas estimates for CCA were most biased and least accurate. Conclusion: In this study, empirical results from analyses using CCA, MS, MI− and MI+ were similar, although results from CCA were less precise. The results from simulations suggest that in general MI is likely to be the best. Given the ease of implementing MI in standard statistical software, the results of MI and CCA should be compared in any multi-variate analysis where missing data are a problem. © 2011 Cancer Research UK. All rights reserved
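
Two of the simpler approaches compared in this abstract, complete case analysis and mean substitution, can be sketched in a few lines. The marker values and the helper names `complete_cases` and `mean_substitute` are hypothetical and are not taken from the study. The sketch only shows the mechanics of each approach on a single variable, not the Cox regression itself.

```python
import statistics


def complete_cases(values):
    """Complete case analysis: keep only the observed values."""
    return [v for v in values if v is not None]


def mean_substitute(values):
    """Mean substitution: replace each missing value with the observed mean."""
    fill = statistics.fmean(complete_cases(values))
    return [fill if v is None else v for v in values]


# Hypothetical marker scores; None marks a missing measurement.
marker = [2.0, None, 3.5, 4.0, None, 1.5, 3.0, None]

cc = complete_cases(marker)
ms = mean_substitute(marker)

print(len(cc), len(ms))                              # cases retained by each approach
print(statistics.fmean(cc), statistics.fmean(ms))    # means
print(statistics.stdev(cc), statistics.stdev(ms))    # spread
```

Complete case analysis discards cases, which is one reason its estimates tend to be less precise. Mean substitution keeps every case and preserves the observed mean, but it shrinks the spread of the variable because every filled-in value sits exactly at the mean.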

Crossref

PubMed Central

Archivio della Ricerca - Università di Pisa

University of Melbourne Institutional Repository

Missing covariate data within cancer prognostic studies: a review of current reporting and proposed guidelines

Author: A Burton
A Ciampi
D G Altman
DB Rubin
DG Altman
FE Harrell
JL Schafer
M Staquet
M Vach
P Peduzzi
R Simon
RD Riley
RJA Little
S Greenland
W Vach
Publication venue: Nature Publishing Group
Publication date: 01/01/2004

Prognostic models play a crucial role in the clinical decision-making process. Unfortunately, missing covariate data impede the construction of valid and reliable models, potentially introducing bias, if handled inappropriately. The extent of missing covariate data within reported cancer prognostic studies, the current handling and the quality of reporting this missing covariate data are unknown. Therefore, a review was conducted of 100 articles reporting multivariate survival analyses to assess potential prognostic factors, published within seven cancer journals in 2002. Missing covariate data is a common occurrence in studies performing multivariate survival analyses, being apparent in 81 of the 100 articles reviewed. The percentage of eligible cases with complete data was obtainable in 39 articles, and was <90% in 17 of these articles. The methods used to handle incomplete covariates were obtainable in 32 of the 81 articles with known missing data and the most commonly reported approaches were complete case and available case analysis. This review has highlighted deficiencies in the reporting of missing covariate data. Guidelines for presenting prognostic studies with missing covariate data are proposed, which if followed should clarify and standardise the reporting in future articles

Crossref

University of Birmingham Research Portal

PubMed Central

Oxford University Research Archive

Stereotyping and the treatment of missing data for drug and alcohol clinical trials

Author: AJ Figueredo
DR Rubin
JL Schafer
R Little
RJA Little
S Fielding
S Hedden
Stephan Arndt
Publication venue: BioMed Central
Publication date: 01/02/2009

Stigma and stereotyping of marginalized groups often is insidious and shows up in unlikely places, for instance in how clinical trials consider dropouts in treatment research. A surprising number of studies presume that people who do not complete the study protocol relapse and code their data as if they had been observed. There is no good statistical rationale for this treatment of missing data and numerous and more defensible alternative methods are available. We need to be mindful about our attitudes and preconceptions about the people we are intending to help. There is no good reason to continue to support science built on this scientifically indefensible stereotyping, however unintentional

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

On multivariate imputation and forecasting of decadal wind speed missing data

Author: ASS Dorvlo
B Walsh
D Heckerman
DB Rubin
ES Gardner Jr
F Collopy
G Li
IYF Lun
JL Schafer
R Calif
R Core Team
R Wesonga
S Buuren Van
TE Raghunathan
Z Qin
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Audio Cartography: Visual Encoding of Acoustic Parameters

Author: A Arteaga
AH Robinson
AL Kornfeld
AM MacEachren
C Nicolai
C Ware
CA Brewer
F Michel
H Scharlach
J Kang
J Mackinlay
J Wood
JK Wright
JL Caivano
JL Morrison
M Bugajska
M Dodge
M Southworth
M Wijffelaars
M Woolman
MA Harrower
P Lercher
R Kosara
R Scaife
RM Schafer
S Baron-Cohen
SK Card
Working Group on the Assessment of Exposure to Noise
WS Cleveland
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Our sonic environment is the subject matter of multiple domains, each of which has developed its own means of describing it. As a result, it lacks an established visual language through which knowledge can be connected and insights shared. We provide a visual communication framework for the systematic and coherent documentation of sound in large-scale environments. This consists of visual encodings and mappings of acoustic parameters into distinct graphic variables that present plausible solutions for the visualization of sound. These candidate encodings are assembled into an application-independent, multifunctional, and extensible design guide. We apply the guidelines and show example maps that act as a basis for the exploration of audio cartography.

City Research Online

Crossref

Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study

Author: A Burton
A Marshall
AH Herring
Andrea Marshall
B Efron
DB Rubin
Douglas G Altman
F Barzi
FE Harrell
G Ambler
GB Durrant
I White
J Concato
J Schafer
JJ Deeks
JL Schafer
LM Yu
MG Kenward
N Schenker
P Royston
R Gray
RG Gray
RJ Little
RJA Little
Roger L Holder
S Demissie
S van Buuren
SP Murphy
T Ezzati-Rice
TG Clark
W Vach
XL Meng
Z Xia
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model. Methods: Observed data for 1000 cases were sampled with replacement from a large complete dataset of 7507 patients to obtain 500 replications. Five levels of missingness (ranging from 5% to 75%) were imposed on three covariates using a missing at random (MAR) mechanism. Five missing data methods were applied: a) complete case analysis (CC), b) single imputation using regression switching with predictive mean matching (SI), c) multiple imputation using regression switching imputation, d) multiple imputation using regression switching with predictive mean matching (MICE-PMM) and e) multiple imputation using flexible additive imputation models. A Cox proportional hazards model was fitted to each dataset and estimates for the regression coefficients and model performance measures obtained. Results: CC produced biased regression coefficient estimates and inflated standard errors (SEs) with 25% or more missingness. The underestimated SE after SI resulted in poor coverage with 25% or more missingness. Of the MI approaches investigated, MI using MICE-PMM produced the least biased estimates and better model performance measures. However, this MI approach still produced biased regression coefficient estimates with 75% missingness. Conclusions: Very few differences were seen between the results from all missing data approaches with 5% missingness. However, performing MI using MICE-PMM may be the preferred missing data approach for handling between 10% and 50% MAR missingness.
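
The contrast between the underestimated SE after single imputation and the behaviour of multiple imputation can be made concrete with Rubin's rules, the standard way of combining estimates across imputed datasets. The abstract does not state how the estimates were pooled, so the use of Rubin's rules here is an assumption. The function name `pool_rubin` and the numbers below are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates : coefficient estimate from each imputed dataset
    variances : squared standard error from each imputed dataset
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average sampling variance
    between = statistics.variance(estimates)   # spread across imputations (m - 1 divisor)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log hazard ratios and standard errors from m = 5 imputed datasets.
log_hr = [0.42, 0.47, 0.39, 0.45, 0.44]
se = [0.10, 0.11, 0.10, 0.10, 0.11]

estimate, pooled_se = pool_rubin(log_hr, [s ** 2 for s in se])
print(round(estimate, 3), round(pooled_se, 3))
```

The pooled standard error is larger than the square root of the average within-imputation variance, because the between-imputation term carries the extra uncertainty due to the missing values. A single imputed dataset has no such term, which is consistent with the underestimated SE reported for SI.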

Crossref

Springer - Publisher Connector

University of Birmingham Research Portal

Directory of Open Access Journals

PubMed Central

Oxford University Research Archive

Warwick Research Archives Portal Repository

Recovery of information from multiple imputation: a simulation study

Author: A Marshall
D Rubin
DB Rubin
F Barzi
H Demirtas
I White
JAC Sterne
JL Schafer
John B Carlin
JR Carpenter
Katherine J Lee
KJ Lee
LM Collins
MA Klebanoff
P Royston
S VanBuuren
StataCorp
TE Raghunathan
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Pathophysiology of acute experimental pancreatitis: Lessons from genetically engineered animal models and new molecular approaches

Author: Anne Barbara Tietz
Arnush M
Bhagat L
Bhatia M
Bockman DE
Burkhard Göke
Claus Schäfer
Demols A
Ebert MP
Frossard JL
Gardner JD
Gerard C
Grady T
Grisham M
Gu D
Gukovskaya AS
Gukovsky I
Hahm KB
Halangk W
Korc M
Lee MS
Leser HG
Massague J
Meda P
Norman J
Ohnishi H
Roberts AB
Saluja A
Schafer C
Schmid RM
Song AM
Steinle A
Van Acker GJD
Vogelmann R
Publication venue: 'S. Karger AG'
Publication date: 01/01/2005

The incidence of acute pancreatitis is growing, and worldwide population-based studies report a doubling or tripling since the 1970s. Of acute pancreatitis cases, 25% are severe and associated with histological changes of necrotizing pancreatitis. There is still no specific medical treatment for acute pancreatitis. Average mortality is around 10%. In order to develop new specific medical treatment strategies for acute pancreatitis, a better understanding of the pathophysiology during the onset of acute pancreatitis is necessary. Since it is difficult to study the early acinar events in human pancreatitis, several animal models of acute pancreatitis have been developed. It is hoped that these models will provide clues to human pathophysiology. In the last decade, major progress has been made using molecular biology techniques. The genome of the mouse was recently sequenced. Various strategies are possible to prove a causal effect of a single gene or protein, using either gain-of-function studies (i.e., overexpression of the protein of interest) or loss-of-function studies (i.e., genetic deletion of the gene of interest). The availability of transgenic mouse models and gene deletion studies has clearly increased our knowledge about the pathophysiology of acute pancreatitis and enables us to study and confirm in vitro findings in animal models. In addition, transgenic models with specific genetic deletion or overexpression of genes help in understanding the role of one specific protein in a cascade of inflammatory processes such as pancreatitis, where different proteins interact and co-react. This review summarizes the recent progress in this field. Copyright (c) 2005 S. Karger AG, Basel

Crossref

Open Access LMU ( Ludwig-Maximilians-Univ. München)

Multiple Imputation Ensembles (MIE) for dealing with missing data

Author: A Farhangfar
AM Sefidian
B Schölkopf
C Cortes
CT Tran
DA Newman
DB Rubin
DH Wolpert
EL Silva-Ramírez
GE Batista
GJ van der Heijden
H Gao
IH Witten
J Demšar
J Honaker
J Scheffer
JA Sterne
JL Schafer
JR Quinlan
K Abayomi
KM Ting
L Breiman
L Rokach
M Fichman
M Khalilia
M Spratt
MA Klebanoff
MJ Azur
NJ Horton
PJ García-Laencina
PJ Kelly
PN Tan
RJ Little
S García
S Van Buuren
SS Chae
SS Choi
U Garciarena
V Vapnik
X Chen
Y Dong
Y Freund
Y He
Z Che
Z Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/05/2020

Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. First, we use a number of single and multiple imputation methods to recover the missing values, and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperforms others, particularly as the amount of missing data increases.

Crossref

University of East Anglia digital repository