    Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study

    Background: There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model. Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five amounts of incomplete cases from 5% to 75% were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained. Results: Performing a CC analysis produced unbiased regression estimates but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, applying MICE-PMM produced, in general, the least biased estimates and better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with an MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches. Conclusion: The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR.
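
The three missingness mechanisms named in this abstract can be illustrated with a small sketch. It is not the study's simulation design: the function `impose_missing`, the lognormal covariate, the auxiliary covariate `z` and the "above the median is dropped three times as often" rule are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def impose_missing(x, z, mechanism, rate, rng):
    """Return a copy of x with some values replaced by None.

    MCAR: every value is dropped with the same probability.
    MAR:  the drop probability depends only on the fully observed z.
    MNAR: the drop probability depends on the value of x itself.
    The overall drop rate is roughly `rate` (which must be at most 2/3).
    """
    if mechanism == "MCAR":
        probs = [rate] * len(x)
    elif mechanism in ("MAR", "MNAR"):
        driver = z if mechanism == "MAR" else x
        cut = statistics.median(driver)
        # Values above the median of the driver are dropped three times as often.
        probs = [1.5 * rate if d > cut else 0.5 * rate for d in driver]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [None if rng.random() < p else v for v, p in zip(x, probs)]

def missing_fraction(values):
    return sum(v is None for v in values) / len(values)

def observed_mean(values):
    return statistics.fmean(v for v in values if v is not None)

rng = random.Random(0)
n = 20000
x = [rng.lognormvariate(0.0, 0.5) for _ in range(n)]   # skewed covariate
z = [math.log(v) + rng.gauss(0.0, 0.5) for v in x]     # observed, correlated with x

full_mean = statistics.fmean(x)
results = {}
for mech in ("MCAR", "MAR", "MNAR"):
    x_missing = impose_missing(x, z, mech, 0.25, rng)
    results[mech] = (missing_fraction(x_missing), observed_mean(x_missing))
    print(mech, results[mech])
```

In this toy setting the complete-case mean stays close to the full-data mean under MCAR and is pulled downward under MNAR, because larger values are the ones more likely to be lost.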

    Imputation of continuous variables missing at random using the method of simulated scores

    For multivariate datasets with missing values, we present a procedure of statistical inference and state its "optimal" properties. Two main assumptions are needed: (1) data are missing at random (MAR); (2) the data generating process is a multivariate normal linear regression. Disentangling the problem of convergence of the iterative estimation/imputation procedure, we show that the estimator is a "method of simulated scores" (a particular case of McFadden's "method of simulated moments"); thus the estimator is equivalent to maximum likelihood if the number of replications is conveniently large, and the whole procedure can be considered an optimal parametric technique for imputation of missing data.

    Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer

    Background: Tissue micro-arrays (TMAs) are increasingly used to generate data of the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available. However, the validity of the various approaches is dependent on the structure of the missing data and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data from a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer. Patients and Methods: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multi-variate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI−) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared. Results: Over half the cases had missing data on at least one of the seven variables and 11 percent had missing data on 4 or more. The multi-variate hazard ratio estimates based on multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but the estimates were less precise as the standard errors were large. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI were least biased and most accurate, whereas estimates for CCA were most biased and least accurate. Conclusion: In this study, empirical results from analyses using CCA, MS, MI− and MI+ were similar, although results from CCA were less precise. The results from simulations suggest that in general MI is likely to be the best. Given the ease of implementing MI in standard statistical software, the results of MI and CCA should be compared in any multi-variate analysis where missing data are a problem. © 2011 Cancer Research UK. All rights reserved.
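
As a rough illustration of why mean substitution can appear more precise than complete case analysis, the sketch below compares the standard error of a simple mean under both approaches. It is a toy example on assumed Gaussian data with 40% of values removed at random, not a reproduction of the Cox regression analysis in the study; all variable names are hypothetical.

```python
import random
import statistics

rng = random.Random(1)
complete = [rng.gauss(50.0, 10.0) for _ in range(1000)]
# Remove about 40% of the values completely at random (assumed rate).
with_gaps = [None if rng.random() < 0.4 else v for v in complete]

# Complete case analysis: keep only the observed values.
observed = [v for v in with_gaps if v is not None]

# Mean substitution: replace every gap with the mean of the observed values.
fill = statistics.fmean(observed)
mean_substituted = [fill if v is None else v for v in with_gaps]

def standard_error(values):
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return statistics.stdev(values) / len(values) ** 0.5

se_cca = standard_error(observed)
se_ms = standard_error(mean_substituted)
print(f"CCA: n={len(observed)}, SE={se_cca:.3f}")
print(f"MS:  n={len(mean_substituted)}, SE={se_ms:.3f}")
```

The filled-in values add sample size but no spread, so the mean-substituted standard error is smaller even though no new information was gained.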

    Pancreatic adenocarcinoma in a patient with Situs Inversus: a case report of this rare coincidence

    Background: Situs inversus (SI) is a relatively rare occurrence in patients with pancreatic adenocarcinoma. Pancreatic resection in these patients has rarely been described. CT scan imaging is a principal modality for detecting pancreatic cancer and its use in SI patients is seldom reported. Case Presentation: We report a 48-year-old woman with SI who, despite a normal CT scan 8 months earlier, presented with obstructive jaundice and a pancreatic head mass requiring a pancreaticoduodenectomy. The surgical pathology report demonstrated pancreatic adenocarcinoma. Conclusion: SI is a rare condition with concurrent pancreatic cancer being even rarer. Despite the rarity, pancreaticoduodenectomy in these patients for resectable lesions is safe as long as special consideration to the anatomy is taken. Additionally, radiographic imaging has significantly improved detection of early pancreatic cancer; however, there continues to be a need for improved detection of small neoplasms.

    Multiple Imputation Ensembles (MIE) for dealing with missing data

    Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. Firstly, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperforms others, particularly as missing data increases.
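
The general idea of combining multiple imputation with an ensemble can be sketched as follows. This is not the authors' MIE implementation: the random-draw imputation, the nearest-centroid classifier, the majority vote and every function name here (`impute_once`, `mie_fit`, `mie_predict`) are simplifying assumptions, and the data are synthetic rather than UCI benchmarks.

```python
import random
from collections import Counter

def impute_once(rows, rng):
    """Fill each None with a random draw from the observed values of its column."""
    n_cols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    return [[v if v is not None else rng.choice(observed[j])
             for j, v in enumerate(r)] for r in rows]

def fit_centroids(rows, labels):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    counts = Counter(labels)
    sums = {}
    for r, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(r))
        for j, v in enumerate(r):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(row, centroids[y])))

def mie_fit(rows, labels, m, rng):
    """Train one classifier per imputed copy of the incomplete training data."""
    return [fit_centroids(impute_once(rows, rng), labels) for _ in range(m)]

def mie_predict(models, row):
    """Majority vote over the m classifiers."""
    votes = Counter(predict(c, row) for c in models)
    return votes.most_common(1)[0][0]

def make_data(n_per_class, rng):
    """Two Gaussian classes centred at (0, 0) and (3, 3)."""
    rows, labels = [], []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            rows.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
            labels.append(label)
    return rows, labels

rng = random.Random(42)
train_rows, train_labels = make_data(200, rng)
# Remove 30% of the training values completely at random.
train_rows = [[None if rng.random() < 0.3 else v for v in r] for r in train_rows]
test_rows, test_labels = make_data(200, rng)

models = mie_fit(train_rows, train_labels, m=5, rng=rng)
hits = sum(mie_predict(models, r) == y for r, y in zip(test_rows, test_labels))
accuracy = hits / len(test_rows)
print(f"ensemble accuracy on complete test data: {accuracy:.3f}")
```

Each imputed copy differs in its filled-in values, so the classifiers differ slightly and the vote averages over the imputation uncertainty.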

    Coordinated optimization of visual cortical maps (I) Symmetry-based analysis

    In the primary visual cortex of primates and carnivores, functional architecture can be characterized by maps of various stimulus features such as orientation preference (OP), ocular dominance (OD), and spatial frequency. It is a long-standing question in theoretical neuroscience whether the observed maps should be interpreted as optima of a specific energy functional that summarizes the design principles of cortical functional architecture. A rigorous evaluation of this optimization hypothesis is particularly demanded by recent evidence that the functional architecture of OP columns precisely follows species-invariant quantitative laws. Because it would be desirable to infer the form of such an optimization principle from the biological data, the optimization approach to explain cortical functional architecture raises the following questions: i) What are the genuine ground states of candidate energy functionals and how can they be calculated with precision and rigor? ii) How do differences in candidate optimization principles impact on the predicted map structure and, conversely, what can be learned about a hypothetical underlying optimization principle from observations on map structure? iii) Is there a way to analyze the coordinated organization of cortical maps predicted by optimization principles in general? To answer these questions we developed a general dynamical systems approach to the combined optimization of visual cortical maps of OP and another scalar feature such as OD or spatial frequency preference. Comment: 90 pages, 16 figures