
Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

By A. (Andrea) Marshall, Douglas G. Altman, Roger L. Holder and Patrick Royston


Background: Multiple imputation (MI) provides an effective approach to handle missing covariate data within prognostic modelling studies, as it can properly account for the missing data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within and between imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates may be more accurate if the posterior distribution of the population parameter of interest is better approximated by the normal distribution. However, the normality assumption may not be appropriate for all the parameters of interest when analysing prognostic modelling studies, such as predicted survival probabilities and model performance measures.

Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies.

Results: Methods for combining all reported estimates after MI were not well reported in the current literature. Rubin's rules without applying any transformations were the standard approach used, when any method was stated.

Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies.
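The combining step described in the Background can be illustrated with a minimal sketch of Rubin's rules for a single scalar estimate. This is not the authors' code: the function name `pool_rubin` and the example numbers are hypothetical, and the sketch assumes the estimate is approximately normally distributed on the scale on which it is pooled.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed datasets (Rubin's rules).

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Hypothetical helper for illustration only.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)      # combined point estimate
    within = mean(variances)     # within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between  # total variance of q_bar
    return q_bar, total, within, between


# Made-up numbers, e.g. a log hazard ratio from three imputed datasets.
q_bar, total, within, between = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

For quantities whose sampling distribution is far from normal, one option is to apply a suitable transformation before calling such a function and back-transform the pooled value afterwards; which transformation suits which quantity is a matter for the paper's guidelines, not this sketch.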

Topics: R1, QA
Publisher: BioMed Central Ltd.
Year: 2009

