3,342 research outputs found

    Imputation of continuous variables missing at random using the method of simulated scores

    For multivariate datasets with missing values, we present a procedure of statistical inference and state its "optimal" properties. Two main assumptions are needed: (1) data are missing at random (MAR); (2) the data generating process is a multivariate normal linear regression. Disentangling the problem of convergence of the iterative estimation/imputation procedure, we show that the estimator is a "method of simulated scores" (a particular case of McFadden's "method of simulated moments"); thus the estimator is equivalent to maximum likelihood if the number of replications is conveniently large, and the whole procedure can be considered an optimal parametric technique for imputation of missing data

    Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

    Background: Multiple imputation (MI) provides an effective approach to handle missing covariate data within prognostic modelling studies, as it can properly account for the missing data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within and between imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates may be more accurate if the posterior distribution of the population parameter of interest is better approximated by the normal distribution. However, the normality assumption may not be appropriate for all the parameters of interest when analysing prognostic modelling studies, such as predicted survival probabilities and model performance measures. Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies. Results: Methods for combining all reported estimates after MI were not well reported in the current literature. Rubin's rules without applying any transformations were the standard approach used, when any method was stated. Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies
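
    The combining step described in this abstract can be sketched as follows. This is a minimal illustration of Rubin's rules for a single scalar estimate, not code from the paper: the function name `pool_rubin` and all numbers are hypothetical. The example pools log hazard ratios because the normal approximation is usually more plausible on the log scale, which is the kind of transformation the abstract alludes to.

```python
from math import exp, sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with one variance each")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical log hazard ratios and squared standard errors from m = 5 imputed datasets.
log_hr = [0.42, 0.38, 0.45, 0.40, 0.35]
se2 = [0.010, 0.012, 0.011, 0.009, 0.010]

est, var = pool_rubin(log_hr, se2)
# Pool on the log scale, then back-transform for reporting.
print(f"pooled HR = {exp(est):.2f}, pooled SE(log HR) = {sqrt(var):.3f}")
```

    The total variance adds the between-imputation variance, inflated by (1 + 1/m), to the average within-imputation variance, so uncertainty due to the missing data is carried into the pooled standard error.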

    Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer

    Background: Tissue micro-arrays (TMAs) are increasingly used to generate data of the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available. However, the validity of the various approaches is dependent on the structure of the missing data and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data from a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer. Patients and Methods: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multi-variate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI−) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared. Results: Over half the cases had missing data on at least one of the seven variables and 11 percent had missing data on 4 or more. The multi-variate hazard ratio estimates based on multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but the estimates were less precise as the standard errors were large. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI+ were least biased and most accurate, whereas estimates for CCA were most biased and least accurate. Conclusion: In this study, empirical results from analyses using CCA, MS, MI− and MI+ were similar, although results from CCA were less precise. The results from simulations suggest that in general MI+ is likely to be the best. Given the ease of implementing MI in standard statistical software, the results of MI+ and CCA should be compared in any multi-variate analysis where missing data are a problem. © 2011 Cancer Research UK. All rights reserved

    Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study

    Background: There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model. Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms: missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five amounts of incomplete cases from 5% to 75% were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained. Results: Performing a CC analysis produced unbiased regression estimates, but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, applying MICE-PMM produced, in general, the least biased estimates and better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with an MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches. Conclusion: The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR
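
    The three missingness mechanisms named in this abstract can be illustrated with a short sketch. It is not the simulation design used in the study, which was run in R with four skewed covariates. The function `impose_missing`, its parameters `rate` and `strength`, the logistic model and the two standard-normal variables are all assumptions made for illustration. Here `x1` stays fully observed and `x2` is the covariate that receives missing values.

```python
import math
import random


def impose_missing(rows, mechanism, rate=0.3, strength=1.0, rng=None):
    """Return a copy of rows with x2 set to None under the chosen mechanism.

    rows is a list of (x1, x2) pairs; x1 is always kept fully observed.
    """
    if not 0 < rate < 1:
        raise ValueError("rate must lie strictly between 0 and 1")
    if rng is None:
        rng = random.Random(0)
    base = math.log(rate / (1 - rate))  # logit of the baseline missingness probability
    result = []
    for x1, x2 in rows:
        if mechanism == "MCAR":
            p = rate  # ignores the data entirely
        elif mechanism == "MAR":
            p = 1 / (1 + math.exp(-(base + strength * x1)))  # depends on observed x1 only
        elif mechanism == "MNAR":
            p = 1 / (1 + math.exp(-(base + strength * x2)))  # depends on the value being deleted
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        result.append((x1, None if rng.random() < p else x2))
    return result


# Hypothetical data: two independent standard-normal covariates.
gen = random.Random(1)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(1000)]
for mech in ("MCAR", "MAR", "MNAR"):
    incomplete = impose_missing(data, mech)
    n_missing = sum(x2 is None for _, x2 in incomplete)
    print(mech, n_missing)
```

    Under MNAR in this sketch, larger values of `x2` are more likely to be deleted, so the observed values are a biased sample of `x2`. Note that `rate` is only a baseline: the realised proportion of missing values under MAR and MNAR depends on `strength` and on the distribution of the covariates.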

    Resource use data by patient report or hospital records: Do they agree?

    Background: Economic evaluations alongside clinical trials are becoming increasingly common. Cost data are often collected through the use of postal questionnaires; however, the accuracy of this method is uncertain. We compared postal questionnaires with hospital records for collecting data on physiotherapy service use. Methods: As part of a randomised trial of orthopaedic medicine compared with orthopaedic surgery we collected physiotherapy use data on a group of patients from retrospective postal questionnaires and from hospital records. Results: 315 patients were referred for physiotherapy. Hospital data on attendances was available for 30% (n = 96), compared with 48% (n = 150) of patients completing questionnaire data (95% CI for difference = 10% to 24%); 19% (n = 59) had data available from both sources. The two methods produced an intraclass correlation coefficient of 0.54 (95% CI 0.31 to 0.70). However, the two methods produced significantly different estimates of resource use with patient self report recalling a mean of 1.3 extra visits (95% CI 0.4 to 2.2) compared with hospital records. Conclusions: Using questionnaires in this study produced data on a greater number of patients compared with examination of hospital records. However, the two data sources did differ in the quantity of physiotherapy used and this should be taken into account in any analysis

    Small Oscillatory Accelerations, Independent of Matrix Deformations, Increase Osteoblast Activity and Enhance Bone Morphology

    A range of tissues have the capacity to adapt to mechanical challenges, an attribute presumed to be regulated through deformation of the cell and/or surrounding matrix. In contrast, it is shown here that extremely small oscillatory accelerations, applied as unconstrained motion and inducing negligible deformation, serve as an anabolic stimulus to osteoblasts in vivo. Habitual background loading was removed from the tibiae of 18 female adult mice by hindlimb-unloading. For 20 min/d, 5 d/wk, the left tibia of each mouse was subjected to oscillatory 0.6 g accelerations at 45 Hz while the right tibia served as control. Sham-loaded (n = 9) and normal age-matched control (n = 18) mice provided additional comparisons. Oscillatory accelerations, applied in the absence of weight bearing, resulted in 70% greater bone formation rates in the trabeculae of the metaphysis, but similar levels of bone resorption, when compared to contralateral controls. Quantity and quality of trabecular bone also improved as a result of the acceleration stimulus, as evidenced by a significantly greater bone volume fraction (17%) and connectivity density (33%), and significantly smaller trabecular spacing (−6%) and structural model index (−11%). These in vivo data indicate that mechanosensory elements of resident bone cell populations can perceive and respond to acceleratory signals, and point to an efficient means of introducing intense physical signals into a biologic system without putting the matrix at risk of overloading. In retrospect, acceleration, as opposed to direct mechanical distortion, represents a more generic and safe, and perhaps more fundamental means of transducing physical challenges to the cells and tissues of an organism

    Multiple imputation for estimating hazard ratios and predictive abilities in case-cohort surveys

    Background: The weighted estimators generally used for analyzing case-cohort studies are not fully efficient and naive estimates of the predictive ability of a model from case-cohort data depend on the subcohort size. However, case-cohort studies represent a special type of incomplete data, and methods for analyzing incomplete data should be appropriate, in particular multiple imputation (MI). Methods: We performed simulations to validate the MI approach for estimating hazard ratios and the predictive ability of a model or of an additional variable in case-cohort surveys. As an illustration, we analyzed a case-cohort survey from the Three-City study to estimate the predictive ability of D-dimer plasma concentration on coronary heart disease (CHD) and on vascular dementia (VaD) risks. Results: When the imputation model of the phase-2 variable was correctly specified, MI estimates of hazard ratios and predictive abilities were similar to those obtained with full data. When the imputation model was misspecified, MI could provide biased estimates of hazard ratios and predictive abilities. In the Three-City case-cohort study, elevated D-dimer levels increased the risk of VaD (hazard ratio for two consecutive tertiles = 1.69, 95% CI: 1.63-1.74). However, D-dimer levels did not improve the predictive ability of the model. Conclusions: MI is a simple approach for analyzing case-cohort data and provides an easy evaluation of the predictive ability of a model or of an additional variable