Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study
Background: There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model.
Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five proportions of incomplete cases, ranging from 5% to 75%, were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) a data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted, and appropriate estimates of the regression coefficients and model performance measures were obtained.
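As a rough illustration of how the three mechanisms differ, the sketch below deletes a fixed proportion of values from one covariate. The helper `impose_missingness`, the exponential weighting scheme, the lognormal covariate and the use of NumPy are all assumptions made for illustration; this is not the authors' simulation code, and the combined mechanism is not shown.

```python
import numpy as np


def impose_missingness(x, z, prop, mechanism, rng):
    """Return a float copy of x with round(prop * len(x)) entries set to NaN.

    x: covariate to be made incomplete; z: a fully observed covariate.
    Hypothetical helper for illustration; the weighting scheme is an assumption.
    """
    n = len(x)
    k = int(round(prop * n))  # number of values to delete
    if mechanism == "MCAR":
        # Missingness unrelated to any data: delete a simple random sample.
        p = None
    elif mechanism == "MAR":
        # Depends only on the observed covariate z: larger z, more likely missing.
        w = np.exp(z - z.max())
        p = w / w.sum()
    elif mechanism == "MNAR":
        # Depends on the value of x itself, which is unobserved once deleted.
        w = np.exp(x - x.max())
        p = w / w.sum()
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    idx = rng.choice(n, size=k, replace=False, p=p)
    out = np.asarray(x, dtype=float).copy()
    out[idx] = np.nan
    return out


rng = np.random.default_rng(0)
z = rng.normal(size=1000)               # fully observed covariate
x = rng.lognormal(size=1000)            # skewed covariate (lognormal is an assumption)
x_mar = impose_missingness(x, z, 0.25, "MAR", rng)
print(np.isnan(x_mar).mean())           # 0.25
```

Under MAR the deletion probability depends only on `z`, which stays observed, whereas under MNAR it depends on the deleted values themselves, which is why no imputation model fitted to the observed data can fully correct for it.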
Results: Performing a CC analysis produced unbiased regression estimates but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, MICE-PMM in general produced the least biased estimates, better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more of the cases had missing data imposed with an MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates themselves, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches.
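Why SI understates uncertainty can be seen from the standard way of combining MI results, Rubin's rules, in which the total variance is T = W + (1 + 1/m)B, where W is the average within-imputation variance and B the between-imputation variance of the m estimates. SI analyses one completed dataset as if it were fully observed, so it reports only the within-imputation part. The abstract does not state the pooling method used; Rubin's rules and the function name `pool_rubin` are assumptions in this minimal sketch.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their variances (squared SEs)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                 # pooled point estimate
    w = u.mean()                     # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    t = w + (1 + 1 / m) * b          # total variance
    return q_bar, np.sqrt(t)


# Hypothetical log hazard ratios from m = 5 imputed datasets, each with SE 0.1.
est, se = pool_rubin([0.50, 0.55, 0.45, 0.52, 0.48], [0.01] * 5)
print(round(est, 3), round(se, 3))   # 0.5 0.108
```

Any single completed dataset here would report an SE of 0.1; the pooled SE is larger because it also reflects how much the estimate changes from one imputation to the next.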
Conclusion: The results from this simulation study suggest that MICE-PMM may be the preferred MI approach, provided that less than 50% of the cases have missing data and the missing data are not MNAR.
Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer
Background: Tissue micro-arrays (TMAs) are increasingly used to generate data on the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods for dealing with missing data are available, but the validity of each approach depends on the structure of the missing data, and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data from a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer.
Patients and Methods: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multivariate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared.
Results: Over half the cases had missing data on at least one of the seven variables, and 11 percent had missing data on four or more. The multivariate hazard ratio estimates based on multiple imputation models were very similar to those derived using MS, with similar standard errors. Hazard ratio estimates based on CCA were only slightly different, but they were less precise because the standard errors were large. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI were the least biased and most accurate, whereas estimates for CCA were the most biased and least accurate.
Conclusion: In this study, empirical results from analyses using CCA, MS, MI and MI+ were similar, although results from CCA were less precise. The simulation results suggest that, in general, MI is likely to be the best approach. Given the ease of implementing MI in standard statistical software, the results of MI and CCA should be compared in any multivariate analysis where missing data are a problem. © 2011 Cancer Research UK. All rights reserved.
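The mechanical difference between CCA and MS can be seen on a single simulated covariate: CCA discards incomplete cases, losing precision, while MS keeps every case but shrinks the covariate's spread because every imputed value sits exactly at the observed mean. The normal covariate and the roughly 40% MCAR rate below are assumptions for illustration, not data from the study.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=1000)   # hypothetical marker score
observed = rng.random(1000) > 0.4               # roughly 40% of values MCAR-missing
x_obs = np.where(observed, x, np.nan)

# Complete case analysis: keep only the cases with an observed value.
cc = x_obs[~np.isnan(x_obs)]

# Mean substitution: replace each missing value with the observed mean.
ms = np.where(np.isnan(x_obs), np.nanmean(x_obs), x_obs)

print(len(cc), cc.std(ddof=1))   # fewer cases, spread roughly preserved
print(len(ms), ms.std(ddof=1))   # all cases kept, spread shrunk
```

Because MS adds cases without adding any variation, downstream standard errors can look deceptively small, which is one reason MI is generally preferred over it.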
Missing covariate data within cancer prognostic studies: a review of current reporting and proposed guidelines
Prognostic models play a crucial role in the clinical decision-making process. Unfortunately, missing covariate data impede the construction of valid and reliable models and, if handled inappropriately, can introduce bias. The extent of missing covariate data within reported cancer prognostic studies, how such data are currently handled and the quality of their reporting are unknown. Therefore, a review was conducted of 100 articles, published within seven cancer journals in 2002, that reported multivariate survival analyses to assess potential prognostic factors. Missing covariate data are a common occurrence in studies performing multivariate survival analyses, being apparent in 81 of the 100 articles reviewed. The percentage of eligible cases with complete data was obtainable in 39 articles and was <90% in 17 of these. The methods used to handle incomplete covariates were obtainable in 32 of the 81 articles with known missing data, and the most commonly reported approaches were complete case and available case analysis. This review has highlighted deficiencies in the reporting of missing covariate data. Guidelines for presenting prognostic studies with missing covariate data are proposed which, if followed, should clarify and standardise reporting in future articles.
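Two of the quantities the review looked for, the percentage missing per covariate and the percentage of eligible cases with complete data, are simple to compute and report. A minimal sketch; the function `missingness_summary` and the toy covariates are hypothetical.

```python
import numpy as np


def missingness_summary(X, names):
    """Return % missing per variable and % of cases with complete data."""
    miss = np.isnan(X)
    per_var = {name: float(100 * miss[:, j].mean()) for j, name in enumerate(names)}
    complete_pct = float(100 * (~miss.any(axis=1)).mean())
    return per_var, complete_pct


# Four hypothetical cases on two hypothetical covariates.
X = np.array([[61, 2], [np.nan, 3], [48, np.nan], [55, 1]], dtype=float)
per_var, complete_pct = missingness_summary(X, ["age", "grade"])
print(per_var, complete_pct)   # {'age': 25.0, 'grade': 25.0} 50.0
```

The example also shows why both figures matter: each covariate is only 25% missing, yet half of the cases would be lost in a complete case analysis.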
Stereotyping and the treatment of missing data for drug and alcohol clinical trials
Stigma and stereotyping of marginalized groups are often insidious and show up in unlikely places, for instance in how clinical trials treat dropouts in treatment research. A surprising number of studies presume that people who do not complete the study protocol have relapsed and code their missing data as if relapse had been observed. There is no good statistical rationale for this treatment of missing data, and numerous more defensible alternative methods are available. We need to be mindful of our attitudes and preconceptions about the people we are intending to help. There is no good reason to continue to support science built on this scientifically indefensible stereotyping, however unintentional.
Audio Cartography: Visual Encoding of Acoustic Parameters
Our sonic environment is the subject matter of multiple domains, each of which has developed its own means of describing it. As a result, it lacks an established visual language through which knowledge can be connected and insights shared. We provide a visual communication framework for the systematic and coherent documentation of sound in large-scale environments. It consists of visual encodings and mappings of acoustic parameters onto distinct graphic variables that present plausible solutions for the visualization of sound. These candidate encodings are assembled into an application-independent, multifunctional and extensible design guide. We apply the guidelines and show example maps that act as a basis for the exploration of audio cartography.
Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study
Background: The appropriate handling of missing covariate data in prognostic modelling studies is yet to be conclusively determined. A resampling study was performed to investigate the effects of different missing data methods on the performance of a prognostic model.
Methods: Observed data for 1000 cases were sampled with replacement from a large complete dataset of 7507 patients to obtain 500 replications. Five levels of missingness (ranging from 5% to 75%) were imposed on three covariates using a missing at random (MAR) mechanism. Five missing data methods were applied: a) complete case analysis (CC), b) single imputation using regression switching with predictive mean matching (SI), c) multiple imputation using regression switching imputation, d) multiple imputation using regression switching with predictive mean matching (MICE-PMM) and e) multiple imputation using flexible additive imputation models. A Cox proportional hazards model was fitted to each dataset, and estimates of the regression coefficients and model performance measures were obtained.
Results: CC produced biased regression coefficient estimates and inflated standard errors (SEs) with 25% or more missingness. The underestimated SEs after SI resulted in poor coverage with 25% or more missingness. Of the MI approaches investigated, MI using MICE-PMM produced the least biased estimates and better model performance measures. However, this MI approach still produced biased regression coefficient estimates with 75% missingness.
Conclusions: Very few differences were seen between the results of all missing data approaches with 5% missingness. However, MI using MICE-PMM may be the preferred missing data approach for handling between 10% and 50% MAR missingness.
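The core step of predictive mean matching can be sketched for a single incomplete variable: fit a regression on the observed cases, compute predicted means for every case, and fill each missing value with the actual observed value of a randomly chosen donor whose predicted mean is among the k closest. Because imputations are always real observed values, skewed distributions are not forced into a normal shape. This is a simplified sketch under stated assumptions (`pmm_impute` is a hypothetical name; k = 5 is an arbitrary choice); it omits the posterior draw of regression coefficients that proper multiple imputation requires, and it is not the implementation used in the study.

```python
import numpy as np


def pmm_impute(y, X, k=5, rng=None):
    """Impute NaNs in y by predictive mean matching on fully observed predictors X.

    Simplified sketch: uses the least-squares fit directly, without the
    posterior draw of regression coefficients that proper MI requires.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    A = np.column_stack([np.ones(len(y)), X])          # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A[~miss], y[~miss], rcond=None)
    pred = A @ beta                                    # predicted means for all cases
    donors_y = y[~miss]
    donors_pred = pred[~miss]
    out = y.copy()
    for i in np.flatnonzero(miss):
        # The k observed cases whose predicted means are closest to case i.
        nearest = np.argsort(np.abs(donors_pred - pred[i]))[:k]
        # Borrow the actual observed value of one randomly chosen donor.
        out[i] = donors_y[rng.choice(nearest)]
    return out


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))                              # fully observed covariates
y = np.exp(X[:, 0] + rng.normal(scale=0.5, size=300))      # skewed covariate to impute
y[rng.choice(300, size=60, replace=False)] = np.nan
y_imp = pmm_impute(y, X, k=5, rng=rng)
```

In a full MICE run this step would be repeated for each incomplete covariate in turn, over several cycles, and the whole process repeated m times; in R this corresponds to the `mice` package with `method = "pmm"`.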
Pathophysiology of acute experimental pancreatitis: Lessons from genetically engineered animal models and new molecular approaches
The incidence of acute pancreatitis is growing, and worldwide population-based studies report a doubling or tripling since the 1970s. Twenty-five percent of cases of acute pancreatitis are severe and associated with histological changes of necrotizing pancreatitis. There is still no specific medical treatment for acute pancreatitis, and the average mortality is around 10%. To develop new specific medical treatment strategies for acute pancreatitis, a better understanding of the pathophysiology during the onset of the disease is necessary. Because it is difficult to study the early acinar events in human pancreatitis, several animal models of acute pancreatitis have been developed, in the hope that they will provide clues to human pathophysiology. In the last decade, major progress has been made using molecular biology techniques. The genome of the mouse was recently sequenced. Various strategies can be used to prove a causal effect of a single gene or protein, using either gain-of-function studies (i.e., overexpression of the protein of interest) or loss-of-function studies (i.e., genetic deletion of the gene of interest). The availability of transgenic mouse models and gene deletion studies has clearly increased our knowledge about the pathophysiology of acute pancreatitis and enables us to study and confirm in vitro findings in animal models. In addition, transgenic models with specific genetic deletion or overexpression of genes help in understanding the role of one specific protein in a cascade of inflammatory processes, such as pancreatitis, in which different proteins interact and co-react. This review summarizes the recent progress in this field. Copyright (c) 2005 S. Karger AG, Basel
Multiple Imputation Ensembles (MIE) for dealing with missing data
Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data missing completely at random (MCAR). First, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation using dissimilarity measures. We also evaluate MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach, combining multiple imputation with ensemble techniques, outperforms the others, particularly as the amount of missing data increases.
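A toy version of the general multiple-imputation-plus-ensemble idea: impute the training data m times, fit one classifier per completed dataset and combine their predictions by majority vote. Random hot-deck imputation, the nearest-centroid classifier and all function names are assumptions chosen to keep the sketch dependency-free; they are not the imputation methods, classifiers or bagging/stacking schemes evaluated in the paper. The sketch also assumes complete test data and at least one observed value per training column.

```python
import numpy as np


def hot_deck_impute(X, rng):
    """Fill each NaN with a randomly drawn observed value from the same column.

    Assumes every column has at least one observed value.
    """
    X = X.copy()
    for j in range(X.shape[1]):
        miss = np.isnan(X[:, j])
        if miss.any():
            X[miss, j] = rng.choice(X[~miss, j], size=miss.sum())
    return X


def fit_centroids(X, y):
    """Nearest-centroid classifier: store one mean vector per class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def predict_centroids(model, X):
    """Assign each row of X to the class with the closest centroid."""
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]


def mie_predict(X_train, y_train, X_test, m=10, seed=0):
    """Fit one classifier per imputed copy of X_train; majority-vote on X_test."""
    rng = np.random.default_rng(seed)
    votes = np.array([
        predict_centroids(fit_centroids(hot_deck_impute(X_train, rng), y_train), X_test)
        for _ in range(m)
    ])                                                     # shape (m, n_test)
    classes = np.unique(y_train)
    counts = (votes[None, :, :] == classes[:, None, None]).sum(axis=1)
    return classes[counts.argmax(axis=0)]                  # most frequent label per test row


rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
y = np.repeat([0, 1], 50)
X_missing = np.where(rng.random(X.shape) < 0.2, np.nan, X)   # about 20% MCAR
print(mie_predict(X_missing, y, np.array([[0.0, 0.0], [5.0, 5.0]]), m=7))   # [0 1]
```

Each imputed copy yields a slightly different classifier, so the vote averages over imputation uncertainty instead of committing to a single filled-in dataset.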