15 research outputs found

    Dealing with missing predictor values when applying clinical prediction models.

    BACKGROUND: Prediction models combine patient characteristics and test results to predict the presence of a disease or the occurrence of a future event. When a test result (a predictor) is unavailable, users applying a prediction model need a strategy for handling the missing value. We evaluated 6 such strategies. METHODS: We developed and validated (in 1295 and 532 primary care patients, respectively) a model to predict the risk of deep venous thrombosis. In an application set (259 patients), we mimicked 3 situations in which (1) an important predictor (D-dimer test), (2) a weaker predictor (difference in calf circumference), and (3) both predictors simultaneously were missing. The 6 strategies were (1) ignoring the predictor, (2) overall mean imputation, (3) subgroup mean imputation, (4) multiple imputation, (5) applying a submodel, derived from the development set, that includes only the observed predictors, and (6) the "one-step-sweep" method. We compared the model's discriminative ability (expressed by the ROC area) with the true ROC area (no missing values), and the model's estimated calibration slope and intercept with the ideal values of 1 and 0, respectively. RESULTS: Ignoring the predictor led to the worst discrimination and multiple imputation to the best. Multiple imputation led to calibration intercepts closest to the true value. The effect of the strategies on the slope differed between the 3 scenarios. CONCLUSIONS: Multiple imputation is preferred if a predictor value is missing
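The two simplest strategies listed in this abstract, ignoring the predictor and overall mean imputation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the coefficients, the development-set means, and the names `predict_risk`, `COEFS`, and `DEV_MEANS` are all hypothetical, and "ignoring" is interpreted here as dropping the missing predictor's term from the linear predictor.

```python
import math

# Hypothetical logistic model for illustration only; these numbers are
# NOT taken from the study.
INTERCEPT = -3.0
COEFS = {"d_dimer_abnormal": 2.0, "calf_diff_ge_3cm": 1.0}

# Hypothetical predictor means from a development set, used for
# overall mean imputation.
DEV_MEANS = {"d_dimer_abnormal": 0.6, "calf_diff_ge_3cm": 0.4}


def predict_risk(patient, strategy="mean"):
    """Return the predicted risk for one patient.

    `patient` maps predictor names to values; a missing predictor is
    either absent from the dict or set to None.
    strategy="ignore" drops the term of a missing predictor;
    strategy="mean" substitutes the development-set mean.
    """
    if strategy not in ("ignore", "mean"):
        raise ValueError(f"unknown strategy: {strategy}")
    linear_predictor = INTERCEPT
    for name, beta in COEFS.items():
        value = patient.get(name)
        if value is None:
            if strategy == "ignore":
                continue  # the term contributes nothing
            value = DEV_MEANS[name]
        linear_predictor += beta * value
    # Inverse logit turns the linear predictor into a probability.
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# A patient whose D-dimer result is unavailable.
patient = {"d_dimer_abnormal": None, "calf_diff_ge_3cm": 1}
risk_ignore = predict_risk(patient, strategy="ignore")
risk_mean = predict_risk(patient, strategy="mean")
```

With a positive coefficient and a positive mean, the imputed risk is higher than the risk obtained by dropping the term. Multiple imputation, the strategy the abstract prefers, would instead draw several plausible values for the missing predictor and combine the resulting predictions; it is not shown here.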

    Missing covariate data in medical research: to impute is better than to ignore.

    OBJECTIVE: We compared popular methods to handle missing data with multiple imputation (a more sophisticated method that preserves data). STUDY DESIGN AND SETTING: We used data from 804 patients with suspected deep venous thrombosis (DVT). We studied three covariates to predict the presence of DVT: D-dimer level, difference in calf circumference, and history of leg trauma. We introduced missing values (missing at random) ranging from 10% to 90%. The risk of DVT was modeled with logistic regression for the three methods: complete case analysis, exclusion of D-dimer level from the model, and multiple imputation. RESULTS: Multiple imputation showed less bias in the regression coefficients of the three variables and more accurate coverage of the corresponding 90% confidence intervals than complete case analysis and dropping D-dimer level from the analysis. Multiple imputation showed unbiased estimates of the area under the receiver operating characteristic curve (0.88) compared with complete case analysis (0.77) and when the variable with missing values was dropped (0.65). CONCLUSION: As this study shows that simple methods to deal with missing data can lead to seriously misleading results, we advise considering multiple imputation. The purpose of multiple imputation is not to create data, but to prevent the exclusion of observed data.

    The Odds Ratio is "portable" across baseline risk but not the Relative Risk: Time to do away with the log link in binomial regression

    Objectives: In a recent paper we suggested that the relative risk (RR) be replaced with the odds ratio (OR) as the effect measure of choice in clinical epidemiology. In response, Chu and colleagues raised several points that argue for the status quo. In this paper, we respond to their response. Study designs and settings: We use the same examples given by Chu and colleagues to recompute estimates of effect and demonstrate the problem with the RR. Results: We reaffirm the following findings: a) the OR and RR measure different things, and their numerical difference is only important if misinterpreted; b) this potential misinterpretation is a trivial issue compared to the lack of portability of the RR; c) the same examples reaffirm non-portability of the RR and demonstrate how misleading the results might be in contrast to the OR, which is independent of the baseline risk; d) non-collapsibility of the OR should be expected in the presence of a non-confounding risk factor, and is not a bias; e) the log link in regression models that generate RRs, as well as the use of RRs in meta-analysis, is shown to be problematic using the same examples. Conclusion: The OR should replace the RR in clinical research and meta-analyses, though the end product should be converted into ratios or differences of risk solely for interpretation. To this end we provide a Stata module (logittorisk) for this purpose. This work was made possible by Program Grant #NPRP10-0129-170274 from the Qatar National Research Fund (a member of Qatar Foundation). The findings herein reflect the work, and are solely the responsibility of the authors. All authors had full access to all the data in the study and the corresponding author had final responsibility for the decision to submit for publication. LFK is supported by an Australian National Health and Medical Research Council Fellowship (APP1158469).
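The conversion from an OR to a risk that this abstract calls for follows from the definition of odds, p / (1 - p). The sketch below uses that standard identity to show how the RR implied by a fixed OR changes with the baseline risk. It is an illustration only and does not reproduce the logittorisk Stata module; the function names are hypothetical.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Risk in the exposed group implied by a baseline risk and an OR."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    exposed_odds = odds_ratio * baseline_odds
    # Convert the exposed-group odds back to a probability.
    return exposed_odds / (1.0 + exposed_odds)


def implied_relative_risk(baseline_risk, odds_ratio):
    """Relative risk implied by the same OR at a given baseline risk."""
    return risk_from_odds_ratio(baseline_risk, odds_ratio) / baseline_risk


# The same OR of 2 at a low and at a high baseline risk.
rr_low = implied_relative_risk(0.1, 2.0)
rr_high = implied_relative_risk(0.5, 2.0)
```

Holding the OR at 2, the implied RR shrinks as the baseline risk rises, which is the dependence on baseline risk that the abstract describes as non-portability of the RR.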