Imputation of continuous variables missing at random using the method of simulated scores
For multivariate datasets with missing values, we present a procedure of statistical inference and state its "optimal" properties. Two main assumptions are needed: (1) data are missing at random (MAR); (2) the data generating process is a multivariate normal linear regression. Disentangling the problem of convergence of the iterative estimation/imputation procedure, we show that the estimator is a "method of simulated scores" (a particular case of McFadden's "method of simulated moments"); thus the estimator is equivalent to maximum likelihood if the number of replications is conveniently large, and the whole procedure can be considered an optimal parametric technique for imputation of missing data
Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines
Background: Multiple imputation (MI) provides an effective approach to handle missing covariate
data within prognostic modelling studies, as it can properly account for the missing data
uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling
techniques to obtain the estimates of interest. The estimates from each imputed dataset are then
combined into one overall estimate and variance, incorporating both the within and between
imputation variability. Rubin's rules for combining these multiply imputed estimates are based on
asymptotic theory. The resulting combined estimates may be more accurate if the posterior
distribution of the population parameter of interest is better approximated by the normal
distribution. However, the normality assumption may not be appropriate for all the parameters of
interest when analysing prognostic modelling studies, such as predicted survival probabilities and
model performance measures.
Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling
studies are provided. A literature review is performed to identify current practice for combining
such estimates in prognostic modelling studies.
Results: Methods for combining all reported estimates after MI were not well reported in the
current literature. Rubin's rules without applying any transformations were the standard approach
used, when any method was stated.
Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider
and more appropriate use of MI in future prognostic modelling studies
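As an illustration of the combining step described in this abstract, the following is a minimal sketch of Rubin's rules for a scalar estimate. The function name `pool_rubin` and its return layout are hypothetical choices made for this example, not taken from the paper. It assumes each imputed dataset yields a point estimate and its variance, and that the estimate is roughly normally distributed; for quantities where that is doubtful (for example survival probabilities), the estimates would first be moved to a more suitable scale before pooling.

```python
def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules (scalar case).

    estimates: point estimates, one per imputed dataset
    variances: their squared standard errors, in the same order
    Returns (pooled estimate, total variance, within variance, between variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    # Pooled point estimate: the simple average across imputations.
    q_bar = sum(estimates) / m

    # Within-imputation variance: the average of the per-dataset variances.
    within = sum(variances) / m

    # Between-imputation variance: sample variance of the point estimates.
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)

    # Total variance, inflating the between part by (1 + 1/m).
    total = within + (1 + 1 / m) * between
    return q_bar, total, within, between
```

The square root of the total variance gives the pooled standard error; the between-imputation term is what carries the extra uncertainty due to the missing data.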
Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer
Background: Tissue micro-arrays (TMAs) are increasingly used to generate data of the molecular phenotype of tumours in clinical epidemiology studies, such as studies of disease prognosis. However, TMA data are particularly prone to missingness. A variety of methods to deal with missing data are available. However, the validity of the various approaches is dependent on the structure of the missing data and there are few empirical studies dealing with missing data from molecular pathology. The purpose of this study was to investigate the results of four commonly used approaches to handling missing data from a large, multi-centre study of the molecular pathological determinants of prognosis in breast cancer.
Patients and Methods: We pooled data from over 11 000 cases of invasive breast cancer from five studies that collected information on seven prognostic indicators together with survival time data. We compared the results of a multi-variate Cox regression using four approaches to handling missing data: complete case analysis (CCA), mean substitution (MS), multiple imputation without inclusion of the outcome (MI−) and multiple imputation with inclusion of the outcome (MI+). We also performed an analysis in which missing data were simulated under different assumptions and the results of the four methods were compared.
Results: Over half the cases had missing data on at least one of the seven variables and 11 percent had missing data on 4 or more. The multi-variate hazard ratio estimates based on multiple imputation models were very similar to those derived after using MS, with similar standard errors. Hazard ratio estimates based on the CCA were only slightly different, but the estimates were less precise as the standard errors were large. However, in data simulated to be missing completely at random (MCAR) or missing at random (MAR), estimates for MI were least biased and most accurate, whereas estimates for CCA were most biased and least accurate.
Conclusion: In this study, empirical results from analyses using CCA, MS, MI− and MI+ were similar, although results from CCA were less precise. The results from simulations suggest that in general MI is likely to be the best. Given the ease of implementing MI in standard statistical software, the results of MI and CCA should be compared in any multi-variate analysis where missing data are a problem. © 2011 Cancer Research UK. All rights reserved
Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study
Background: There is no consensus on the most appropriate approach to handle missing covariate data within prognostic modelling studies. Therefore a simulation study was performed to assess the effects of different missing data techniques on the performance of a prognostic model.
Methods: Datasets were generated to resemble the skewed distributions seen in a motivating breast cancer example. Multivariate missing data were imposed on four covariates using four different mechanisms; missing completely at random (MCAR), missing at random (MAR), missing not at random (MNAR) and a combination of all three mechanisms. Five amounts of incomplete cases from 5% to 75% were considered. Complete case analysis (CC), single imputation (SI) and five multiple imputation (MI) techniques available within the R statistical software were investigated: a) data augmentation (DA) approach assuming a multivariate normal distribution, b) DA assuming a general location model, c) regression switching imputation, d) regression switching with predictive mean matching (MICE-PMM) and e) flexible additive imputation models. A Cox proportional hazards model was fitted and appropriate estimates for the regression coefficients and model performance measures were obtained.
Results: Performing a CC analysis produced unbiased regression estimates, but inflated standard errors, which affected the significance of the covariates in the model with 25% or more missingness. Using SI underestimated the variability, resulting in poor coverage even with 10% missingness. Of the MI approaches, applying MICE-PMM produced, in general, the least biased estimates and better coverage for the incomplete covariates and better model performance for all mechanisms. However, this MI approach still produced biased regression coefficient estimates for the incomplete skewed continuous covariates when 50% or more cases had missing data imposed with an MCAR, MAR or combined mechanism. When the missingness depended on the incomplete covariates, i.e. MNAR, estimates were biased with more than 10% incomplete cases for all MI approaches.
Conclusion: The results from this simulation study suggest that performing MICE-PMM may be the preferred MI approach provided that less than 50% of the cases have missing data and the missing data are not MNAR
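To give a feel for the predictive mean matching step named in this abstract, here is a simplified sketch with a single predictor, assuming NumPy is available. The function `pmm_impute`, its parameters and the donor count `k` are hypothetical choices for this example. It is not the MICE-PMM implementation evaluated in the study: a full implementation would also draw the regression coefficients from their posterior and cycle over several incomplete covariates, whereas this sketch uses one least-squares fit.

```python
import numpy as np

def pmm_impute(x, y, k=5, rng=None):
    """Fill missing values of y (NaN) by predictive mean matching on x.

    Simplified illustration: a single linear fit on the observed cases,
    then each missing value is replaced by the observed y of a randomly
    chosen donor among the k cases with the closest predicted mean.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    missing = np.isnan(y)
    observed = ~missing

    # Linear regression of y on x, fitted to the observed cases only.
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    predicted = design @ beta

    y_obs = y[observed]
    pred_obs = predicted[observed]

    imputed = y.copy()
    for i in np.flatnonzero(missing):
        # Donors: observed cases whose predicted mean is closest to case i.
        distance = np.abs(pred_obs - predicted[i])
        donors = np.argsort(distance)[:k]
        # Impute an actually observed value, so skewed shapes are preserved.
        imputed[i] = rng.choice(y_obs[donors])
    return imputed
```

Because every imputed value is borrowed from an observed case, the imputations stay within the observed range, which is one reason this kind of matching is attractive for skewed covariates such as those simulated in the study.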
Resource use data by patient report or hospital records: Do they agree?
Background: Economic evaluations alongside clinical trials are becoming increasingly common.
Cost data are often collected through the use of postal questionnaires; however, the accuracy of
this method is uncertain. We compared postal questionnaires with hospital records for collecting
data on physiotherapy service use.
Methods: As part of a randomised trial of orthopaedic medicine compared with orthopaedic
surgery we collected physiotherapy use data on a group of patients from retrospective postal
questionnaires and from hospital records.
Results: 315 patients were referred for physiotherapy. Hospital data on attendances were available
for 30% (n = 96), compared with 48% (n = 150) of patients completing questionnaire data (95% CI
for difference = 10% to 24%); 19% (n = 59) had data available from both sources. The two methods
produced an intraclass correlation coefficient of 0.54 (95% CI 0.31 to 0.70). However, the two
methods produced significantly different estimates of resource use with patient self report recalling
a mean of 1.3 extra visits (95% CI 0.4 to 2.2) compared with hospital records.
Conclusions: Using questionnaires in this study produced data on a greater number of patients
compared with examination of hospital records. However, the two data sources did differ in the
quantity of physiotherapy used and this should be taken into account in any analysis.
Small Oscillatory Accelerations, Independent of Matrix Deformations, Increase Osteoblast Activity and Enhance Bone Morphology
A range of tissues have the capacity to adapt to mechanical challenges, an attribute presumed to be regulated through deformation of the cell and/or surrounding matrix. In contrast, it is shown here that extremely small oscillatory accelerations, applied as unconstrained motion and inducing negligible deformation, serve as an anabolic stimulus to osteoblasts in vivo. Habitual background loading was removed from the tibiae of 18 female adult mice by hindlimb-unloading. For 20 min/d, 5 d/wk, the left tibia of each mouse was subjected to oscillatory 0.6 g accelerations at 45 Hz while the right tibia served as control. Sham-loaded (n = 9) and normal age-matched control (n = 18) mice provided additional comparisons. Oscillatory accelerations, applied in the absence of weight bearing, resulted in 70% greater bone formation rates in the trabeculae of the metaphysis, but similar levels of bone resorption, when compared to contralateral controls. Quantity and quality of trabecular bone also improved as a result of the acceleration stimulus, as evidenced by a significantly greater bone volume fraction (17%) and connectivity density (33%), and significantly smaller trabecular spacing (−6%) and structural model index (−11%). These in vivo data indicate that mechanosensory elements of resident bone cell populations can perceive and respond to acceleratory signals, and point to an efficient means of introducing intense physical signals into a biologic system without putting the matrix at risk of overloading. In retrospect, acceleration, as opposed to direct mechanical distortion, represents a more generic and safe, and perhaps more fundamental means of transducing physical challenges to the cells and tissues of an organism
Multiple imputation for estimating hazard ratios and predictive abilities in case-cohort surveys
Background: The weighted estimators generally used for analyzing case-cohort studies are not fully efficient and naive estimates of the predictive ability of a model from case-cohort data depend on the subcohort size. However, case-cohort studies represent a special type of incomplete data, and methods for analyzing incomplete data should be appropriate, in particular multiple imputation (MI).
Methods: We performed simulations to validate the MI approach for estimating hazard ratios and the predictive ability of a model or of an additional variable in case-cohort surveys. As an illustration, we analyzed a case-cohort survey from the Three-City study to estimate the predictive ability of D-dimer plasma concentration on coronary heart disease (CHD) and on vascular dementia (VaD) risks.
Results: When the imputation model of the phase-2 variable was correctly specified, MI estimates of hazard ratios and predictive abilities were similar to those obtained with full data. When the imputation model was misspecified, MI could provide biased estimates of hazard ratios and predictive abilities. In the Three-City case-cohort study, elevated D-dimer levels increased the risk of VaD (hazard ratio for two consecutive tertiles = 1.69, 95% CI: 1.63-1.74). However, D-dimer levels did not improve the predictive ability of the model.
Conclusions: MI is a simple approach for analyzing case-cohort data and provides an easy evaluation of the predictive ability of a model or of an additional variable.