
Imputation of Continuous Variables Missing at Random using the Method of Simulated Scores

Author: C Gourieroux
D McFadden
DB Rubin
G Calzolari
JL Schafer
NJ Horton
RA Thisted
RJA Little
TE Raghunathan
V Hajivassiliou
WH Greene
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2002

Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

Author: A Burton
A Rouxel
AC Mertens
Andrea Marshall
BL Thomsen
C Serrat
D Collett
DB Rubin
DG Altman
Douglas G Altman
DW Hosmer
FE Harrell
FR Hampel
G Ambler
G Vaughn
HC van Houwelingen
J O'Quigley
JA Hoeting
JC Wyatt
JL Schafer
JW Graham
KH Li
M Schemper
MG Kenward
MW Heymans
N Orsini
O Harel
P Peduzzi
P Royston
Patrick Royston
RA Fisher
Roger L Holder
S Gill
S Sinharay
S van Buuren
T Bärnighausen
TG Clark
WM Stadler
XL Meng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Background: Multiple imputation (MI) provides an effective approach to handle missing covariate data within prognostic modelling studies, as it can properly account for the missing data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within and between imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates may be more accurate if the posterior distribution of the population parameter of interest is better approximated by the normal distribution. However, the normality assumption may not be appropriate for all the parameters of interest when analysing prognostic modelling studies, such as predicted survival probabilities and model performance measures. Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies. Results: Methods for combining all reported estimates after MI were not well reported in the current literature. Rubin's rules without applying any transformations were the standard approach used, when any method was stated. Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies
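
As an illustration of the combining step this abstract describes, the following is a minimal sketch of Rubin's rules for a single scalar estimate. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the paper. The sketch assumes the estimate is approximately normal on the scale being pooled, which is why a hazard ratio is pooled on the log scale here and back-transformed afterwards.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m imputation-specific estimates with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")
    q_bar = mean(estimates)       # pooled point estimate
    within = mean(variances)      # average within-imputation variance
    between = variance(estimates) # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log hazard ratios and squared standard errors from m = 5 imputations.
log_hr = [0.41, 0.38, 0.45, 0.40, 0.43]
se_sq = [0.010, 0.011, 0.009, 0.010, 0.012]

pooled_log_hr, total_var = pool_rubin(log_hr, se_sq)
pooled_hr = math.exp(pooled_log_hr)  # back-transform after pooling on the log scale
```

Pooling on a transformed scale and back-transforming is one way to respect the normality assumption the abstract raises; which transformation suits a given quantity (for example a predicted survival probability) is a choice this sketch does not settle.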

University of Birmingham Research Portal

Directory of Open Access Journals

Warwick Research Archives Portal Repository

UCL Discovery

Oxford University Research Archive

Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer

Author: A Marshall
AR Donders
DB Rubin
G Ambler
G Van der Heijden
IR White
JA Sterne
JL Schafer
JM Engels
JO Kim
KG Moons
NJ Horton
RA Little
S Greenland
S van Buuren
SJ Dawson
TE Bodner
W Vach
Publication venue: Nature Publishing Group
Publication date: 01/01/2011

Background: Tissue micro-arrays (TMAs) are increasingly used to generate data on the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available. However, the validity of the various approaches depends on the structure of the missing data, and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data from a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer. Patients and Methods: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multi-variate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI−) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared. Results: Over half the cases had missing data on at least one of the seven variables and 11 percent had missing data on 4 or more. The multi-variate hazard ratio estimates based on multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but the estimates were less precise as the standard errors were large. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI+ were least biased and most accurate, whereas estimates for CCA were most biased and least accurate. Conclusion: In this study, empirical results from analyses using CCA, MS, MI− and MI+ were similar, although results from CCA were less precise. The results from simulations suggest that in general MI+ is likely to be the best. Given the ease of implementing MI in standard statistical software, the results of MI+ and CCA should be compared in any multi-variate analysis where missing data are a problem. © 2011 Cancer Research UK. All rights reserved.

Archivio della Ricerca - Università di Pisa

University of Melbourne Institutional Repository

Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study

Author: A Burton
A Marshall
AH Herring
AM Wood
Andrea Marshall
DB Rubin
Douglas G Altman
F Barzi
FE Harrell
FH Kong
HY Chen
I White
J Schafer
J Scheffer
JL Schafer
KH Li
LM Collins
LQ Tang
M Hu
N Schenker
NJ Horton
P Royston
Patrick Royston
PD Faris
R Bender
R Development Core Team
R Oostenbrink
RJA Little
Roger L Holder
S Demissie
S Greenland
S van Buuren
SR Lipsitz
TG Clark
W Sauerbrei
W Vach
XL Meng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Background: There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model. Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five amounts of incomplete cases from 5% to 75% were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained. Results: Performing a CC analysis produced unbiased regression estimates but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, applying MICE-PMM produced, in general, the least biased estimates and better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with a MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches. Conclusion: The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR.
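
To make the three missingness mechanisms named in this abstract concrete, here is a minimal sketch that imposes missing values on one covariate `x` given a fully observed covariate `z`. It is a deliberately simplified illustration, not the study's design (which imposed multivariate missingness on four covariates). The function name `impose_missing`, the median-split rule and the `rate` parameter are all assumptions made for this example.

```python
import random
from statistics import median


def impose_missing(rows, mechanism, rate=0.25, seed=0):
    """Return a copy of rows (x, z) with some x values replaced by None.

    MCAR: every x is deleted with the same probability.
    MAR:  deletion depends only on the fully observed z.
    MNAR: deletion depends on the value of x itself.
    The median-split rule is an assumption of this sketch; it gives an
    overall missing fraction of roughly `rate` for each mechanism.
    """
    if not 0 <= rate <= 0.5:
        raise ValueError("rate must lie between 0 and 0.5 in this sketch")
    rng = random.Random(seed)
    x_med = median(x for x, _ in rows)
    z_med = median(z for _, z in rows)
    result = []
    for x, z in rows:
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            p = 2 * rate if z > z_med else 0.0
        elif mechanism == "MNAR":
            p = 2 * rate if x > x_med else 0.0
        else:
            raise ValueError("mechanism must be 'MCAR', 'MAR' or 'MNAR'")
        result.append((None if rng.random() < p else x, z))
    return result


# Hypothetical data: x runs from 0 to 99, z alternates between 0 and 1.
data = [(float(i), float(i % 2)) for i in range(100)]
mar = impose_missing(data, "MAR", rate=0.25)
mnar = impose_missing(data, "MNAR", rate=0.25)
```

Under MNAR in this sketch only above-median values of `x` can go missing, so the observed values of `x` are systematically too small, which is the kind of situation in which the abstract reports bias for all MI approaches.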

University of Birmingham Research Portal

Warwick Research Archives Portal Repository

UCL Discovery

Oxford University Research Archive

Recovery of information from multiple imputation: a simulation study

Author: A Marshall
D Rubin
DB Rubin
F Barzi
H Demirtas
I White
JAC Sterne
JL Schafer
John B Carlin
JR Carpenter
Katherine J Lee
KJ Lee
LM Collins
MA Klebanoff
P Royston
S van Buuren
StataCorp
TE Raghunathan
Publication venue: 'Springer Science and Business Media LLC'

Aberdeen University Research

Resource use data by patient report or hospital records: Do they agree?

Author: Adrian Grant
Andrew DM Kennedy
Anne P Leigh-Brown
AP Leigh Brown
DA Dillman
David J Torgerson
DB Rubin
J Brown-Betz
J Jobe
James Campbell
JB Fowles
JM Bland
ME McKinnon
NP Gordon
PE Shrout
R Roberts
SA Reijneveld
WJ Ungar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2002

Background: Economic evaluations alongside clinical trials are becoming increasingly common. Cost data are often collected through the use of postal questionnaires; however, the accuracy of this method is uncertain. We compared postal questionnaires with hospital records for collecting data on physiotherapy service use. Methods: As part of a randomised trial of orthopaedic medicine compared with orthopaedic surgery we collected physiotherapy use data on a group of patients from retrospective postal questionnaires and from hospital records. Results: 315 patients were referred for physiotherapy. Hospital data on attendances were available for 30% (n = 96), compared with 48% (n = 150) of patients completing questionnaire data (95% CI for difference = 10% to 24%); 19% (n = 59) had data available from both sources. The two methods produced an intraclass correlation coefficient of 0.54 (95% CI 0.31 to 0.70). However, the two methods produced significantly different estimates of resource use, with patient self-report recalling a mean of 1.3 extra visits (95% CI 0.4 to 2.2) compared with hospital records. Conclusions: Using questionnaires in this study produced data on a greater number of patients compared with examination of hospital records. However, the two data sources did differ in the quantity of physiotherapy used and this should be taken into account in any analysis.

Directory of Open Access Journals

White Rose Research Online

Brunel University Research Archive

Small Oscillatory Accelerations, Independent of Matrix Deformations, Increase Osteoblast Activity and Enhance Bone Morphology

Author: A Malone
AM Parfitt
BL Schaer
C Rubin
CH Turner
Clinton Rubin
CT Rubin
DB Burr
DM Nunamaker
E Morey-Holton
G Galileo
HM Frost
J Rubin
J You
JP Spalazzi
KJ McLeod
L Xie
MA Lafortune
MJ Kaab
PR Stephens
R Garman
RG Bacabac
Russell Garman
S Judex
Shuguang Zhang
SP Fritton
Stefan Judex
TS Gross
V Gilsanz
Y Han
YX Qin
Publication venue: Public Library of Science
Publication date: 25/07/2007

A range of tissues have the capacity to adapt to mechanical challenges, an attribute presumed to be regulated through deformation of the cell and/or surrounding matrix. In contrast, it is shown here that extremely small oscillatory accelerations, applied as unconstrained motion and inducing negligible deformation, serve as an anabolic stimulus to osteoblasts in vivo. Habitual background loading was removed from the tibiae of 18 female adult mice by hindlimb-unloading. For 20 min/d, 5 d/wk, the left tibia of each mouse was subjected to oscillatory 0.6 g accelerations at 45 Hz while the right tibia served as control. Sham-loaded (n = 9) and normal age-matched control (n = 18) mice provided additional comparisons. Oscillatory accelerations, applied in the absence of weight bearing, resulted in 70% greater bone formation rates in the trabeculae of the metaphysis, but similar levels of bone resorption, when compared to contralateral controls. Quantity and quality of trabecular bone also improved as a result of the acceleration stimulus, as evidenced by a significantly greater bone volume fraction (17%) and connectivity density (33%), and significantly smaller trabecular spacing (−6%) and structural model index (−11%). These in vivo data indicate that mechanosensory elements of resident bone cell populations can perceive and respond to acceleratory signals, and point to an efficient means of introducing intense physical signals into a biologic system without putting the matrix at risk of overloading. In retrospect, acceleration, as opposed to direct mechanical distortion, represents a more generic and safe, and perhaps more fundamental means of transducing physical challenges to the cells and tissues of an organism

Public Library of Science (PLOS)

Multiple imputation for estimating hazard ratios and predictive abilities in case-cohort surveys

Author: A Alperovitch
AP Dempster
B Langholz
DB Rubin
F Harrell
FE Harrell
H Marti
Helena Marti
I Tzoulaki
K Chen
L Carcaillon
Laure Carcaillon
M Kulich
M Pencina
Michel Chavance
N Breslow
O Borgan
R Little
R Prentice
TH Scheike
TJ Wang
TM Therneau
WK Kremers
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: The weighted estimators generally used for analyzing case-cohort studies are not fully efficient and naive estimates of the predictive ability of a model from case-cohort data depend on the subcohort size. However, case-cohort studies represent a special type of incomplete data, and methods for analyzing incomplete data should be appropriate, in particular multiple imputation (MI). Methods: We performed simulations to validate the MI approach for estimating hazard ratios and the predictive ability of a model or of an additional variable in case-cohort surveys. As an illustration, we analyzed a case-cohort survey from the Three-City study to estimate the predictive ability of D-dimer plasma concentration on coronary heart disease (CHD) and on vascular dementia (VaD) risks. Results: When the imputation model of the phase-2 variable was correctly specified, MI estimates of hazard ratios and predictive abilities were similar to those obtained with full data. When the imputation model was misspecified, MI could provide biased estimates of hazard ratios and predictive abilities. In the Three-City case-cohort study, elevated D-dimer levels increased the risk of VaD (hazard ratio for two consecutive tertiles = 1.69, 95% CI: 1.63-1.74). However, D-dimer levels did not improve the predictive ability of the model. Conclusions: MI is a simple approach for analyzing case-cohort data and provides an easy evaluation of the predictive ability of a model or of an additional variable.

Directory of Open Access Journals

HAL-Inserm