Heterogeneity in Prediction Research: methods and applications
William Osler noted in 1893 that “If it were not for the great variability between individuals, medicine might as well be a science, not an art”.
In contrast, this thesis is based on the scientific paradigm that prediction models have the potential to guide medical decisions by exploiting identifiable heterogeneity across individual patients.
Prediction research focuses on the development of well-performing prediction models and on the assessment of their generalizability and applicability. Several methods to measure prediction model performance across clusters of patients are proposed in PART I of this thesis. PART II contains novel methods for the development and validation of models that incorporate heterogeneity of treatment effect across patients. In PART III, methods for the development and validation of prediction models are applied to several case studies in cardiovascular medicine, oncology, and public health.
Pitfalls of single-study external validation illustrated with a model predicting functional outcome after aneurysmal subarachnoid hemorrhage
Background: Prediction models are often externally validated with data from a single study or cohort. However, the interpretation of performance estimates obtained with single-study external validation is not as straightforward as assumed. We aimed to illustrate this by conducting a large number of external validations of a prediction model for functional outcome in subarachnoid hemorrhage (SAH) patients.
Methods: We used data from the Subarachnoid Hemorrhage International Trialists (SAHIT) data repository (n = 11,931, 14 studies) to refit the SAHIT model for predicting a dichotomous functional outcome (favorable versus unfavorable), with the (extended) Glasgow Outcome Scale or modified Rankin Scale score, at a minimum of three months after discharge. We performed leave-one-cluster-out cross-validation to mimic the process of multiple single-study external validations, with each study representing one cluster. In each of these validations, we assessed discrimination with Harrell's c-statistic and calibration with calibration plots, intercepts, and slopes. We used random effects meta-analysis to obtain the (reference) mean performance estimates and between-study heterogeneity (I2 statistic). The influence of case-mix variation on discriminative performance was assessed with the model-based c-statistic, and we fitted a "membership model" to obtain a gross estimate of transportability.
Results: Across 14 single-study external validations, model performance was highly variable. The mean c-statistic was 0.74 (95% CI 0.70–0.78, range 0.52–0.84, I2 = 0.92), the mean intercept was -0.06 (95% CI -0.37 to 0.24, range -1.40 to 0.75, I2 = 0.97), and the mean slope was 0.96 (95% CI 0.78–1.13, range 0.53–1.31, I2 = 0.90). The decrease in discriminative performance was attributable to case-mix variation, between-study heterogeneity, or a combination of both. Occasionally, we observed poor generalizability or transportability of the model.
Conclusions: We demonstrate two potential pitfalls in the interpretation of model performance with single-study external validation: (1) model performance is highly variable and depends on the choice of validation data, and (2) no insight is provided into the generalizability or transportability of the model that is needed to guide local implementation. As such, a single-study external validation can easily be misinterpreted and lead to a false appreciation of the clinical prediction model. Cross-validation is better equipped to address these pitfalls.
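As an illustration of the core procedure (not the SAHIT analysis code), the sketch below performs leave-one-cluster-out cross-validation for a binary outcome model and collects the per-cluster c-statistic, calibration intercept, and calibration slope. The data layout and the column names "study" and "unfavorable" are assumptions.

```python
# Illustrative sketch only: leave-one-cluster-out cross-validation of a binary
# prediction model, collecting per-cluster discrimination (c-statistic) and
# calibration (intercept and slope). Column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def calibration_intercept_slope(y, p):
    """Slope: logistic regression of y on the linear predictor.
    Intercept: the same regression with the linear predictor as an offset."""
    lp = np.log(p / (1 - p))  # logit of the predicted risk
    slope_fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
    icpt_fit = sm.GLM(y, np.ones((len(lp), 1)), family=sm.families.Binomial(),
                      offset=lp).fit()
    return icpt_fit.params[0], slope_fit.params[1]

def leave_one_cluster_out(df, predictors, outcome="unfavorable", cluster="study"):
    """Refit the model on all clusters but one and validate on the held-out cluster."""
    rows = []
    for held_out in df[cluster].unique():
        train, test = df[df[cluster] != held_out], df[df[cluster] == held_out]
        model = LogisticRegression(max_iter=1000).fit(train[predictors], train[outcome])
        p = model.predict_proba(test[predictors])[:, 1].clip(1e-6, 1 - 1e-6)
        icpt, slope = calibration_intercept_slope(test[outcome].to_numpy(), p)
        rows.append({"cluster": held_out,
                     "c_statistic": roc_auc_score(test[outcome], p),
                     "cal_intercept": icpt, "cal_slope": slope})
    return pd.DataFrame(rows)
```

The per-cluster estimates returned here are what would then be pooled with random effects meta-analysis to obtain the reference mean performance and the between-study heterogeneity.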
Weighted metrics are required when evaluating the performance of prediction models in nested case–control studies
Background: Nested case–control (NCC) designs are efficient for developing and validating prediction models that use expensive or difficult-to-obtain predictors, especially when the outcome is rare. Previous research has focused on how to develop prediction models in this sampling design, but little attention has been given to model validation in this context. We therefore aimed to systematically characterize the key elements for the correct evaluation of the performance of prediction models in NCC data.
Methods: We proposed how to correctly evaluate prediction models in NCC data by adjusting performance metrics with sampling weights to account for the NCC sampling. We included in this study the C-index, threshold-based metrics, the observed-to-expected events ratio (O/E ratio), the calibration slope, and decision curve analysis. We illustrated the proposed metrics with a validation of the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA version 5) in data from the population-based Rotterdam Study. We compared the metrics obtained in the full cohort with those obtained in NCC datasets sampled from the Rotterdam Study, with and without a matched design.
Results: Performance metrics without weight adjustment were biased: the unweighted C-index in the NCC datasets was 0.61 (0.58–0.63) for the unmatched design, while the C-index in the full cohort and the weighted C-index in the NCC datasets were similar: 0.65 (0.62–0.69) and 0.65 (0.61–0.69), respectively. The unweighted O/E ratio was 18.38 (17.67–19.06) in the NCC datasets, while it was 1.69 (1.42–1.93) in the full cohort and its weighted version in the NCC datasets was 1.68 (1.53–1.84). Similarly, weighted adjustments of the threshold-based metrics and of the net benefit for decision curves were unbiased estimates of the corresponding metrics in the full cohort, while the corresponding unweighted metrics were biased. In the matched design, the bias of the unweighted metrics was larger, but it could likewise be corrected by the weight adjustment.
Conclusions: Nested case–control studies are an efficient solution for evaluating the performance of prediction models that use expensive or difficult-to-obtain biomarkers, especially when the outcome is rare, but the performance metrics need to be adjusted for the sampling procedure.
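The weighting idea can be sketched as follows: give every sampled subject an inverse-probability-of-inclusion weight (cases close to 1, controls 1 divided by their sampling fraction) and plug those weights into each metric. The function names below and the assumption that y, p, and w are NumPy arrays are illustrative, not the authors' implementation.

```python
# Sketch of sampling-weighted validation metrics for nested case-control data.
# y: 0/1 outcomes, p: predicted risks, w: inverse-probability-of-inclusion weights.
import numpy as np

def weighted_oe_ratio(y, p, w):
    """Weighted observed-to-expected events ratio."""
    return np.sum(w * y) / np.sum(w * p)

def weighted_net_benefit(y, p, w, threshold):
    """Decision-curve net benefit, reweighted toward the source cohort."""
    n = np.sum(w)
    tp = np.sum(w * ((p >= threshold) & (y == 1)))
    fp = np.sum(w * ((p >= threshold) & (y == 0)))
    return tp / n - (fp / n) * threshold / (1 - threshold)

def weighted_c_index(y, p, w):
    """Weighted concordance: each case-control pair contributes w_case * w_control."""
    pc, pn = p[y == 1], p[y == 0]
    wc, wn = w[y == 1], w[y == 0]
    pair_w = np.outer(wc, wn)                    # pair-level weights
    diff = pc[:, None] - pn[None, :]
    concordant = (diff > 0) + 0.5 * (diff == 0)  # ties count half
    return np.sum(pair_w * concordant) / np.sum(pair_w)
```

With all weights set to 1 these reduce to the usual unweighted metrics, which is exactly the biased analysis the study warns against in NCC data.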
Geographic and temporal validity of prediction models: different approaches were useful to examine model performance
Objective: Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods.
Study Design and Setting: We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random-effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation.
Results: Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics.
Conclusion: This study illustrates how the performance of prediction models can be assessed in settings with multicenter data at different time periods.
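A minimal sketch of the pooling step, assuming hospital-specific c-statistic estimates and standard errors have already been computed: DerSimonian-Laird random-effects meta-analysis, the I2 statistic, and an approximate 95% prediction interval for the performance expected in a new hospital.

```python
# Sketch: DerSimonian-Laird random-effects pooling of hospital-specific estimates
# (e.g., c-statistics), with I^2 and an approximate 95% prediction interval.
import numpy as np
from scipy import stats

def random_effects_pool(estimates, std_errors, alpha=0.05):
    estimates, std_errors = np.asarray(estimates), np.asarray(std_errors)
    k = len(estimates)
    w = 1 / std_errors**2                              # fixed-effect weights
    fixed = np.sum(w * estimates) / np.sum(w)
    q = np.sum(w * (estimates - fixed) ** 2)           # Cochran's Q
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                 # between-hospital variance
    w_star = 1 / (std_errors**2 + tau2)
    pooled = np.sum(w_star * estimates) / np.sum(w_star)
    se_pooled = np.sqrt(1 / np.sum(w_star))
    z = stats.norm.ppf(1 - alpha / 2)
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    # Prediction interval for the performance in a new, similar hospital
    t = stats.t.ppf(1 - alpha / 2, df=k - 2)
    pi_half = t * np.sqrt(tau2 + se_pooled**2)
    return {"pooled": pooled,
            "ci": (pooled - z * se_pooled, pooled + z * se_pooled),
            "tau2": tau2, "I2": i2,
            "prediction_interval": (pooled - pi_half, pooled + pi_half)}
```

Wide prediction intervals or large I2 values signal limited geographic transportability even when the pooled c-statistic looks acceptable.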
Personalized decision-making for aneurysm treatment of aneurysmal subarachnoid hemorrhage: development and validation of a clinical prediction tool
Background: In patients with aneurysmal subarachnoid hemorrhage suitable for both endovascular coiling and neurosurgical clip-reconstruction, the aneurysm treatment decision-making process could be improved by considering heterogeneity of treatment effect and durability of treatment. We aimed to develop and validate a tool to predict individualized treatment benefit of endovascular coiling compared to neurosurgical clip-reconstruction.
Methods: We used randomized data (International Subarachnoid Aneurysm Trial, n = 2,143) to develop models to predict 2-month functional outcome and time to rebleed or retreatment. We modeled heterogeneity of treatment effect by adding interaction terms of treatment with prespecified predictors and with the baseline risk of the outcome. We predicted outcome under both treatments and calculated absolute treatment benefit. We described the characteristics of patients with a ≥ 5 percentage point difference in the predicted probability of favorable functional outcome (modified Rankin Scale score 0–2) and of no rebleed or retreatment within 10 years. Model performance was expressed with the c-statistic and calibration plots. We performed bootstrapping and leave-one-cluster-out cross-validation and pooled cluster-specific c-statistics with random effects meta-analysis.
Results: The pooled c-statistics were 0.72 (95% CI: 0.69–0.75) for the prediction of 2-month favorable functional outcome and 0.67 (95% CI: 0.63–0.71) for the prediction of no rebleed or retreatment within 10 years. We found no significant interaction between predictors and treatment. The average predicted benefit in favorable functional outcome was 6% (95% CI: 3–10%) in favor of coiling, but 11% (95% CI: 9–13%) for no rebleed or retreatment in favor of clip-reconstruction. 134 patients (6%), young and in favorable clinical condition, had negligible functional outcome benefit from coiling but a ≥ 5 percentage point benefit from clip-reconstruction in terms of durability of treatment.
Conclusions: We show that young patients in favorable clinical condition and without extensive vasospasm have a negligible benefit in functional outcome from endovascular coiling compared to neurosurgical clip-reconstruction, while at the same time having a substantially lower probability of retreatment or rebleeding with neurosurgical clip-reconstruction compared to endovascular coiling. The SHARP prediction tool (https://sharpmodels.shinyapps.io/sharpmodels/) could support and incentivize a multidisciplinary discussion about aneurysm treatment decision-making by providing individualized treatment benefit estimates.
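The general recipe can be sketched as follows, under simplifying assumptions (logistic regression, a few hypothetical predictors, and no interaction with baseline risk): fit an outcome model containing treatment-by-predictor interaction terms, then predict each patient's outcome under both treatments and take the difference as the individualized absolute benefit. This is not the SHARP model itself; all column names are assumptions.

```python
# Hypothetical sketch of predicting individualized absolute treatment benefit
# from a model with treatment-by-predictor interactions. Column names
# ("coiling", "age", "wfns_grade", "aneurysm_size", "favorable_outcome") are assumed.
import statsmodels.formula.api as smf

def predicted_absolute_benefit(df, outcome="favorable_outcome", treatment="coiling",
                               predictors=("age", "wfns_grade", "aneurysm_size")):
    main = " + ".join(predictors)
    interactions = " + ".join(f"{treatment}:{p}" for p in predictors)
    model = smf.logit(f"{outcome} ~ {treatment} + {main} + {interactions}",
                      data=df).fit(disp=0)
    # Counterfactual predictions: everyone coiled versus everyone clipped.
    p_coil = model.predict(df.assign(**{treatment: 1}))
    p_clip = model.predict(df.assign(**{treatment: 0}))
    return p_coil - p_clip  # positive values favor coiling for this outcome
```

The same construction applied to a time-to-rebleed-or-retreatment model yields the second benefit dimension that the SHARP tool displays alongside functional outcome.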
Development and External Validation of a Prediction Model for Patients with Varicose Veins Suitable for Isolated Ambulatory Phlebectomy
Objective: Isolated ambulatory phlebectomy is a potential treatment option for patients with an incompetent great saphenous vein (GSV) or anterior accessory saphenous vein and one or more incompetent tributaries. Being able to determine which patients will most likely benefit from isolated phlebectomy is important. This study aimed to identify predictors of avoidance of secondary axial ablation after isolated phlebectomy and to develop and externally validate a multivariable model for predicting this outcome.
Methods: For model development, data from patients included in the SAPTAP trial were used. The investigated outcome was avoidance of ablation of the saphenous trunk one year after isolated ambulatory phlebectomy. Pre-defined candidate predictors were analysed with multivariable logistic regression. Predictors were selected using Akaike information criterion backward selection. Discriminative ability was assessed by the concordance index (C-index). Bootstrapping was used to correct the regression coefficients and the C-index for overfitting. The model was externally validated in a population of 94 patients with an incompetent GSV and one or more incompetent tributaries who underwent isolated phlebectomy.
Results: For model development, 225 patients were used, of whom 167 (74.2%) did not undergo additional ablation of the saphenous trunk one year after isolated phlebectomy. The final model consisted of three predictors of avoidance of axial ablation: tributary length (< 15 cm vs. > 30 cm: odds ratio [OR] 0.09, 95% confidence interval [CI] 0.02–0.40; 15–30 cm vs. > 30 cm: OR 0.18, 95% CI 0.09–0.38); saphenofemoral junction (SFJ) reflux (absent vs. present: OR 2.53, 95% CI 0.81–7.87); and diameter of the saphenous trunk (per millimetre increase: OR 0.63, 95% CI 0.41–0.96). The discriminative ability of the model was moderate (C-index 0.72 at internal validation; 0.73 at external validation).
Conclusion: A model was developed for predicting avoidance of secondary ablation of the saphenous trunk one year after isolated ambulatory phlebectomy, which can be helpful in daily practice to determine a suitable treatment strategy in patients with an incompetent saphenous trunk and one or more incompetent tributaries. Patients with a longer tributary, a smaller diameter saphenous trunk, and absence of terminal valve reflux in the SFJ are more likely to benefit from isolated phlebectomy.
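The optimism-correction step mentioned in the methods can be sketched as follows (Harrell's bootstrap approach). For brevity the predictor set is treated as fixed, whereas ideally the AIC backward selection would be repeated inside each bootstrap sample; all names are hypothetical.

```python
# Rough sketch of bootstrap optimism correction of the c-statistic for a
# logistic prediction model. Column names and predictor set are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def optimism_corrected_c(df, predictors, outcome, n_boot=200, seed=1):
    rng = np.random.default_rng(seed)
    X, y = df[predictors], df[outcome]
    fit = LogisticRegression(max_iter=1000).fit(X, y)
    apparent = roc_auc_score(y, fit.predict_proba(X)[:, 1])
    optimism = []
    for _ in range(n_boot):
        boot = df.sample(n=len(df), replace=True,
                         random_state=int(rng.integers(1_000_000_000)))
        m = LogisticRegression(max_iter=1000).fit(boot[predictors], boot[outcome])
        c_boot = roc_auc_score(boot[outcome], m.predict_proba(boot[predictors])[:, 1])
        c_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
        optimism.append(c_boot - c_orig)  # drop from bootstrap to original data
    return apparent - np.mean(optimism)
```

The average optimism subtracted from the apparent c-statistic gives the internally validated estimate, analogous to the 0.72 reported at internal validation.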
Integrated care in patients with atrial fibrillation: a predictive heterogeneous treatment effect analysis of the ALL-IN trial
Introduction: Integrated care is effective in reducing all-cause mortality in patients with atrial fibrillation (AF) in primary care, though it is time and resource intensive. The aim of the current study was to assess whether integrated care should be directed at all AF patients equally.
Methods: The ALL-IN trial (n = 1,240 patients, median age 77 years) was a cluster-randomized trial in which primary care practices were randomized to provide integrated care or usual care to AF patients aged 65 years and older. Integrated care comprised (i) anticoagulation monitoring, (ii) quarterly check-ups, and (iii) easy-access consultation with cardiologists. For the current analysis, Cox proportional hazards regression with all clinical variables from the CHA2DS2-VASc score was used to predict all-cause mortality in the ALL-IN trial. Subsequently, the hazard ratio and the absolute risk reduction were plotted as a function of this predicted mortality risk to explore treatment heterogeneity.
Results: Under usual care, after a median follow-up of 2 years, the absolute risk of all-cause mortality was 31.0% in the highest-risk quarter compared to 4.6% in the lowest-risk quarter. On the relative scale, there was no evidence of treatment heterogeneity (p for interaction = 0.90). However, there was substantial treatment heterogeneity on the absolute scale: the absolute risk reduction was 3.3% (95% CI -0.4% to 7.0%) in the lowest-risk quarter compared to 12.0% (95% CI 2.7% to 22.0%) in the highest-risk quarter.
Conclusion: While the relative degree of benefit from integrated AF care is similar in all patients, patients with a high all-cause mortality risk have a greater benefit on an absolute scale and should therefore be prioritized when implementing integrated care.
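A rough sketch of this kind of risk-based analysis, under simplifying assumptions (a Cox baseline-risk model fitted in the usual-care arm, crude event proportions within risk quarters rather than time-to-event estimates, and hypothetical column names), might look as follows.

```python
# Sketch of a risk-based heterogeneous treatment effect analysis: predict
# mortality risk from CHA2DS2-VASc components, then compare event rates by
# trial arm within quarters of predicted risk. Column names ("arm", "death",
# "followup_days", risk_factors) are hypothetical.
import pandas as pd
from lifelines import CoxPHFitter

def arr_by_predicted_risk(df, risk_factors, horizon_days=730):
    # Baseline risk model fitted in the usual-care arm (arm == 0).
    cph = CoxPHFitter()
    cph.fit(df.loc[df["arm"] == 0, risk_factors + ["followup_days", "death"]],
            duration_col="followup_days", event_col="death")
    surv = cph.predict_survival_function(df[risk_factors], times=[horizon_days])
    df = df.assign(pred_risk=1 - surv.iloc[0].to_numpy())

    # Absolute risk reduction within quarters of predicted mortality risk.
    df["risk_quarter"] = pd.qcut(df["pred_risk"], 4, labels=False)
    rates = df.groupby(["risk_quarter", "arm"])["death"].mean().unstack("arm")
    rates["arr"] = rates[0] - rates[1]  # usual care minus integrated care
    return rates
```

The pattern reported in the trial, a similar relative effect but a larger absolute risk reduction in the highest-risk quarter, would show up here as an "arr" column that grows across risk quarters.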