Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study
Background: There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model.
Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five amounts of incomplete cases from 5% to 75% were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained.
Results: Performing a CC analysis produced unbiased regression estimates but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, applying MICE-PMM produced, in general, the least biased estimates and better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with a MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches.
Conclusion: The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR.
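The three missingness mechanisms named in the abstract above can be illustrated with a minimal sketch. This is not the study's simulation design: the function name `impose_missingness`, the median-split rule, and the 1.5x/0.5x probability weights are all assumptions chosen only to show how MCAR, MAR and MNAR differ in what drives the probability of a value being missing.

```python
import random
import statistics

def impose_missingness(x, z, mechanism, rate=0.25, seed=0):
    """Return a copy of x with some entries replaced by None.

    x: covariate to be made incomplete.
    z: a fully observed covariate.
    mechanism: "MCAR" (missingness ignores the data),
               "MAR"  (missingness depends on the observed z),
               "MNAR" (missingness depends on x itself).
    The median split and the 1.5/0.5 weights are illustrative assumptions.
    """
    rng = random.Random(seed)
    if mechanism == "MCAR":
        driver = None
    elif mechanism == "MAR":
        driver = z
    elif mechanism == "MNAR":
        driver = x
    else:
        raise ValueError("mechanism must be MCAR, MAR or MNAR")

    cut = statistics.median(driver) if driver is not None else None
    out = []
    for i, xi in enumerate(x):
        if driver is None:
            p = rate  # same probability for every case
        else:
            # cases above the driver's median are three times as likely to be
            # missing as cases below it; the average probability stays at `rate`
            p = 1.5 * rate if driver[i] > cut else 0.5 * rate
        out.append(None if rng.random() < p else xi)
    return out

# Hypothetical data: a skewed covariate x and a complete covariate z.
data_rng = random.Random(1)
z = [data_rng.gauss(0, 1) for _ in range(5000)]
x = [data_rng.lognormvariate(0, 1) for _ in range(5000)]
x_mnar = impose_missingness(x, z, "MNAR")
```

Under the MNAR setting of this sketch, large values of `x` are removed more often, so the mean of the remaining observed values tends to fall below the mean of the full covariate. This is one way to see why methods that assume MAR can be biased under MNAR.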
Imputation of continuous variables missing at random using the method of simulated scores
For multivariate datasets with missing values, we present a procedure of statistical inference and state its "optimal" properties. Two main assumptions are needed: (1) data are missing at random (MAR); (2) the data generating process is a multivariate normal linear regression. Disentangling the problem of convergence of the iterative estimation/imputation procedure, we show that the estimator is a "method of simulated scores" (a particular case of McFadden's "method of simulated moments"); thus the estimator is equivalent to maximum likelihood if the number of replications is conveniently large, and the whole procedure can be considered an optimal parametric technique for imputation of missing data.
Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer
Background: Tissue micro-arrays (TMAs) are increasingly used to generate data of the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available. However, the validity of the various approaches is dependent on the structure of the missing data and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data from a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer.
Patients and Methods: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multi-variate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI) and multiple imputation with inclusion of the outcome (MI). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared.
Results: Over half the cases had missing data on at least one of the seven variables and 11 percent had missing data on 4 or more. The multi-variate hazard ratio estimates based on multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but the estimates were less precise as the standard errors were large. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI were least biased and most accurate, whereas estimates for CCA were most biased and least accurate.
Conclusion: In this study, empirical results from analyses using CCA, MS, MI and MI were similar, although results from CCA were less precise. The results from simulations suggest that in general MI is likely to be the best. Given the ease of implementing MI in standard statistical software, the results of MI and CCA should be compared in any multi-variate analysis where missing data are a problem. © 2011 Cancer Research UK. All rights reserved.
Pancreatic adenocarcinoma in a patient with Situs Inversus: a case report of this rare coincidence
Background: Situs inversus (SI) is a relatively rare occurrence in patients with pancreatic adenocarcinoma. Pancreatic resection in these patients has rarely been described. CT scan imaging is a principal modality for detecting pancreatic cancer and its use in SI patients is seldom reported.
Case Presentation: We report a 48 year old woman with SI who, despite a normal CT scan 8 months earlier, presented with obstructive jaundice and a pancreatic head mass requiring a pancreaticoduodenectomy. The surgical pathology report demonstrated pancreatic adenocarcinoma.
Conclusion: SI is a rare condition with concurrent pancreatic cancer being even rarer. Despite the rarity, pancreaticoduodenectomy in these patients for resectable lesions is safe as long as special consideration to the anatomy is taken. Additionally, radiographic imaging has significantly improved detection of early pancreatic cancer; however, there continues to be a need for improved detection of small neoplasms.
Multiple Imputation Ensembles (MIE) for dealing with missing data
Missing data is a significant issue in many real-world datasets, yet there are no robust methods for dealing with it appropriately. In this paper, we propose a robust approach to dealing with missing data in classification problems: Multiple Imputation Ensembles (MIE). Our method integrates two approaches, multiple imputation and ensemble methods, and compares two types of ensembles: bagging and stacking. We also propose a robust experimental set-up using 20 benchmark datasets from the UCI machine learning repository. For each dataset, we introduce increasing amounts of data Missing Completely at Random. Firstly, we use a number of single/multiple imputation methods to recover the missing values and then ensemble a number of different classifiers built on the imputed data. We assess the quality of the imputation by using dissimilarity measures. We also evaluate the MIE performance by comparing classification accuracy on the complete and imputed data. Furthermore, we use the accuracy of simple imputation as a benchmark for comparison. We find that our proposed approach combining multiple imputation with ensemble techniques outperforms others, particularly as missing data increases.
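The general idea of combining multiple imputation with an ensemble can be sketched as follows. This is a simplified illustration and not the MIE method of the paper: it uses random hot-deck imputation, a nearest-centroid classifier and a plain majority vote, where the abstract describes bagging and stacking over several classifiers. All function names here (`impute_once`, `fit_centroids`, `predict`, `mie_predict`) are hypothetical.

```python
import random
from collections import Counter

def impute_once(rows, rng):
    """Fill each None with a random draw from that column's observed values."""
    cols = list(zip(*rows))
    observed = [[v for v in col if v is not None] for col in cols]
    return [[v if v is not None else rng.choice(observed[j])
             for j, v in enumerate(row)]
            for row in rows]

def fit_centroids(rows, labels):
    """Nearest-centroid classifier: store the per-class mean of each feature."""
    counts = Counter(labels)
    sums = {}
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(row, centroids[y])))

def mie_predict(rows, labels, new_row, m=5, seed=0):
    """Impute m times, fit one classifier per completed dataset, majority-vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(m):
        completed = impute_once(rows, rng)
        model = fit_centroids(completed, labels)
        votes[predict(model, new_row)] += 1
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated classes with some missing values.
rows = [[0.0, 0.1], [0.2, None], [None, 0.0],
        [5.0, 5.1], [5.2, None], [None, 4.9]]
labels = ["a", "a", "a", "b", "b", "b"]
label_for_new_case = mie_predict(rows, labels, [0.1, 0.1])
```

Each imputed dataset gives a slightly different classifier, and the vote averages over that imputation uncertainty instead of committing to a single filled-in dataset.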
Food choice and phytoestrogen consumption in women previously treated for postmenopausal breast cancer
Coordinated optimization of visual cortical maps (I) Symmetry-based analysis
In the primary visual cortex of primates and carnivores, functional
architecture can be characterized by maps of various stimulus features such as
orientation preference (OP), ocular dominance (OD), and spatial frequency. It
is a long-standing question in theoretical neuroscience whether the observed
maps should be interpreted as optima of a specific energy functional that
summarizes the design principles of cortical functional architecture. A
rigorous evaluation of this optimization hypothesis is particularly demanded by
recent evidence that the functional architecture of OP columns precisely
follows species invariant quantitative laws. Because it would be desirable to
infer the form of such an optimization principle from the biological data, the
optimization approach to explain cortical functional architecture raises the
following questions: i) What are the genuine ground states of candidate energy
functionals and how can they be calculated with precision and rigor? ii) How do
differences in candidate optimization principles impact on the predicted map
structure and conversely what can be learned about an hypothetical underlying
optimization principle from observations on map structure? iii) Is there a way
to analyze the coordinated organization of cortical maps predicted by
optimization principles in general? To answer these questions we developed a
general dynamical systems approach to the combined optimization of visual
cortical maps of OP and another scalar feature such as OD or spatial frequency
preference.
Comment: 90 pages, 16 figures
Five Questions about Viral Trafficking in Neurons
One of the most exciting areas in biology is the nervous system and how it works. Viral infections of the nervous system have provided exceptional insight at many levels, from pathogenesis to basic biology. The nervous system has evolved rather complicated barriers that facilitate access to nutrients and contact with the outside world, but block entry of pathogens and toxins [1]. However, when these barriers are reduced for any number of reasons, nervous system infections are possible. When they occur, they can be devastating and, even with good antiviral drugs, difficult to manage. Viral infections can enter the brain via the blood (e.g., HIV, various encephalitis viruses) or by spread inside neurons from the body surface (e.g., rabies and alpha herpes viruses) [2,3]. In vertebrates, the nervous system comprises a peripheral collection of neurons (the peripheral nervous system, PNS) and a central set found in the brain and spinal cord (the central nervous system, CNS). While neurons are central players in neurobiology, it is important to realize that the majority of cells that comprise the nervous system are highly specialized, nonneuronal cells (e.g., different types of glial cells) [4]. Cells of the immune system also engage with and signal to the PNS to effect changes in the CNS [5]. We will focus on neurons, despite the other cellular complexity, because neurons provide direct avenues for viral infection. Recognition that viral infection follows nerve pathways enabled the development of viruses for neuronal circuit tracing [6–8].