
    Estimation of required sample size for external validation of risk models for binary outcomes

    Risk-prediction models for health outcomes are used in practice as part of clinical decision-making, and it is essential that their performance be externally validated. An important aspect of the design of a validation study is choosing an adequate sample size. In this paper, we investigate the sample size requirements for validation studies with binary outcomes to estimate measures of predictive performance (the C-statistic for discrimination; the calibration slope and calibration in the large for calibration). We aim for sufficient precision in the estimated measures. In addition, we investigate the sample size needed to achieve sufficient power to detect a difference from a target value. Under normality assumptions on the distribution of the linear predictor, we obtain simple estimators for sample size calculations based on the measures above. Simulation studies show that the estimators perform well for common values of the C-statistic and outcome prevalence when the linear predictor is marginally normal, and that their performance deteriorates only slightly when the normality assumptions are violated. We also propose estimators which do not require normality assumptions but do require specification of the marginal distribution of the linear predictor and the use of numerical integration; these also performed very well under marginal normality. Our sample size equations require a specified standard error (SE) together with the anticipated C-statistic and outcome prevalence. The sample size requirement varies with the prognostic strength of the model, the outcome prevalence, the choice of performance measure, and the study objective. For example, to achieve an SE < 0.025 for the C-statistic, 60-170 events are required if the true C-statistic and outcome prevalence are between 0.64-0.85 and 0.05-0.3, respectively. For the calibration slope and calibration in the large, achieving SE < 0.15 would require 40-280 and 50-100 events, respectively. Our estimators may also be used for survival outcomes when the proportion of censored observations is high.
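    The paper's estimators are closed-form; as a rough cross-check of the same idea, a minimal simulation sketch like the one below estimates the empirical SE of the C-statistic at candidate sample sizes under a marginally normal linear predictor. The mean, SD, and candidate sample sizes here are illustrative choices, not values from the paper.

```python
# Monte Carlo check of the SE of the C-statistic at candidate sample
# sizes, assuming a marginally normal linear predictor. Illustrative
# sketch only; the paper derives closed-form estimators for this.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def c_statistic_se(n, mu=-3.0, sigma=1.0, n_sims=500):
    """Empirical SE of the estimated C-statistic for samples of size n.

    The linear predictor is N(mu, sigma^2); mu and sigma jointly set
    the outcome prevalence and the true C-statistic.
    """
    cs = []
    for _ in range(n_sims):
        lp = rng.normal(mu, sigma, size=n)
        p = 1.0 / (1.0 + np.exp(-lp))       # true event probabilities
        y = rng.binomial(1, p)
        if 0 < y.sum() < n:                 # C undefined without both classes
            cs.append(roc_auc_score(y, p))
    return np.std(cs, ddof=1)

# Increase n until the empirical SE drops below the 0.025 target.
for n in (500, 1000, 2000, 4000):
    print(n, round(c_statistic_se(n), 4))
```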

    Predictive validity of the CriSTAL tool for short-term mortality in older people presenting at Emergency Departments: a prospective study

    Objective: To determine the validity of the Australian clinical prediction tool Criteria for Screening and Triaging to Appropriate aLternative care (CriSTAL), based on objective clinical criteria, for identifying risk of death within 3 months of admission among older patients. Methods: Prospective study of patients aged ≥ 65 years presenting at emergency departments in five Australian (Aus) and four Danish (DK) hospitals. Logistic regression analysis was used to model factors predicting death; sensitivity, specificity, area under the ROC curve (AUROC) and calibration with bootstrapping techniques were used to describe predictive accuracy. Results: 2493 patients were included, with median age 78–80 years (DK–Aus). The deceased had significantly higher mean CriSTAL scores: Australian mean 8.1 (95% CI 7.7–8.6) vs. 5.8 (95% CI 5.6–5.9), and Danish mean 7.1 (95% CI 6.6–7.5) vs. 5.5 (95% CI 5.4–5.6). The model with the Fried frailty score was optimal for the Australian cohort, but prediction with the Clinical Frailty Scale (CFS) was also good (AUROC 0.825 and 0.81, respectively). Values for the Danish cohort were AUROC 0.764 with Fried and 0.794 with CFS. The most significant independent predictors of short-term death in both cohorts were advanced malignancy, frailty, male gender and advanced age. CriSTAL's accuracy was only modest for in-hospital death prediction in either setting. Conclusions: The modified CriSTAL tool (with the CFS instead of Fried's frailty instrument) has good discriminant power to improve prognostic certainty of short-term mortality for ED physicians in both health systems. This shows promise in enhancing clinicians' confidence in initiating earlier end-of-life discussions.
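    The validation workflow reported here (logistic model, sensitivity/specificity at a cutoff, bootstrapped AUROC) can be sketched in a few lines. The data, predictor names, and cutoff below are placeholders, not the CriSTAL study data.

```python
# Rough sketch of the reported validation steps: fit a logistic model,
# report sensitivity/specificity at a cutoff, bootstrap the AUROC.
# All data and parameter values are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))     # stand-ins for e.g. age, frailty, malignancy
y = rng.binomial(1, 1 / (1 + np.exp(-(-2 + X @ np.array([0.8, 0.6, 0.4])))))

risk = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

cut = 0.2                        # illustrative risk cutoff
sens = np.mean(risk[y == 1] >= cut)
spec = np.mean(risk[y == 0] < cut)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f} at cutoff {cut}")

aucs = []                        # nonparametric bootstrap of the AUROC
for _ in range(1000):
    idx = rng.integers(0, n, n)
    if 0 < y[idx].sum() < n:
        aucs.append(roc_auc_score(y[idx], risk[idx]))
print("AUROC", round(roc_auc_score(y, risk), 3),
      "95% CI", np.percentile(aucs, [2.5, 97.5]).round(3))
```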

    Minimum sample size for external validation of a clinical prediction model with a binary outcome

    In prediction model research, external validation is needed to examine an existing model's performance using data independent of that used for model development. Current external validation studies often suffer from small sample sizes and, consequently, imprecise estimates of predictive performance. To address this, we propose how to determine the minimum sample size needed for a new external validation study of a prediction model for a binary outcome. Our calculations aim to precisely estimate calibration (observed/expected ratio and calibration slope), discrimination (C-statistic), and clinical utility (net benefit). For each measure, we propose closed-form and iterative solutions for calculating the minimum sample size required. These require specifying: (i) target SEs (confidence interval widths) for each estimate of interest, (ii) the anticipated outcome event proportion in the validation population, (iii) the prediction model's anticipated (mis)calibration and the variance of linear predictor values in the validation population, and (iv) potential risk thresholds for clinical decision-making. The calculations can also be used to check whether the sample size of an existing (already collected) dataset is adequate for external validation. We illustrate our proposal for external validation of a prediction model for mechanical heart valve failure with an expected outcome event proportion of 0.018. Calculations suggest at least 9835 participants (177 events) are required to precisely estimate the calibration and discrimination measures, with this number driven by the calibration slope criterion, which we anticipate will often be the case. Also, 6443 participants (116 events) are required to precisely estimate net benefit at a risk threshold of 8%. Software code is provided.
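    Two of these criteria admit short closed-form sketches. The observed/expected (O/E) criterion below uses SE(ln(O/E)) ≈ sqrt((1 − φ)/(nφ)), where φ is the anticipated event proportion; for the C-statistic I substitute the Hanley-McNeil variance approximation, which may differ from the paper's exact expression. The SE targets are illustrative assumptions, so the printed numbers should not be expected to reproduce the paper's 9835.

```python
# Closed-form sketches of two sample-size criteria for external
# validation: (i) O/E, via SE(ln(O/E)) ~= sqrt((1 - phi) / (n * phi));
# (ii) the C-statistic, via the Hanley-McNeil variance approximation
# (an assumption here; the paper's variance expression may differ).
import math

def n_for_oe(phi, target_se_ln_oe):
    """Minimum n so that SE(ln(O/E)) <= target, assuming O/E ~= 1."""
    return math.ceil((1 - phi) / (phi * target_se_ln_oe ** 2))

def se_c_hanley_mcneil(c, n, phi):
    """Hanley-McNeil SE of the C-statistic with n*phi expected events."""
    n1, n2 = n * phi, n * (1 - phi)      # events, non-events
    q1 = c / (2 - c)
    q2 = 2 * c ** 2 / (1 + c)
    var = (c * (1 - c) + (n1 - 1) * (q1 - c ** 2)
           + (n2 - 1) * (q2 - c ** 2)) / (n1 * n2)
    return math.sqrt(var)

def n_for_c(c, phi, target_se):
    n = 100
    while se_c_hanley_mcneil(c, n, phi) > target_se:
        n += 1
    return n

phi = 0.018                                        # anticipated event proportion
print("O/E criterion:", n_for_oe(phi, 0.075))      # illustrative SE target
print("C criterion:  ", n_for_c(0.8, phi, 0.0255)) # illustrative C and target
```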

    Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers

    Background: Decision curve analysis is a novel method for evaluating diagnostic tests, prediction models and molecular markers. It combines the mathematical simplicity of accuracy measures, such as sensitivity and specificity, with the clinical applicability of decision-analytic approaches. Most critically, decision curve analysis can be applied directly to a data set, and does not require the sort of external data on costs, benefits and preferences typically required by traditional decision-analytic techniques. Methods: In this paper we present several extensions to decision curve analysis, including correction for overfit, confidence intervals, application to censored data (including competing risks) and calculation of decision curves directly from predicted probabilities. All of these extensions are based on straightforward methods that have previously been described in the literature for application to analogous statistical techniques. Results: Simulation studies showed that repeated 10-fold cross-validation provided the best method for correcting a decision curve for overfit. The method for applying decision curves to censored data had little bias, and coverage was excellent; for competing risks, decision curves were appropriately affected by the incidence of the competing risk and by the association between the competing risk and the predictor of interest. Calculation of decision curves directly from predicted probabilities led to a smoothing of the decision curve. Conclusions: Decision curve analysis can be easily extended to many of the applications common to performance measures for prediction models. Software to implement decision curve analysis is provided.
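    The quantity plotted by a decision curve is the net benefit at each risk threshold pt, defined as NB = TP/n − (FP/n) · pt/(1 − pt), compared against the treat-all and treat-none strategies. A minimal sketch on simulated data:

```python
# Minimal decision-curve sketch: net benefit of a model across risk
# thresholds, against treat-all and treat-none. Uses the standard
# definition NB = TP/n - FP/n * pt/(1 - pt); data are simulated.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
p_hat = rng.beta(2, 8, size=n)       # simulated predicted probabilities
y = rng.binomial(1, p_hat)           # outcomes consistent with them

def net_benefit(y, p, pt):
    treat = p >= pt
    tp = np.sum(treat & (y == 1)) / len(y)
    fp = np.sum(treat & (y == 0)) / len(y)
    return tp - fp * pt / (1 - pt)

prev = y.mean()
for pt in (0.05, 0.10, 0.20, 0.30):
    nb_model = net_benefit(y, p_hat, pt)
    nb_all = prev - (1 - prev) * pt / (1 - pt)   # treat everyone
    print(f"pt={pt:.2f}  model={nb_model:.4f}  "
          f"treat-all={nb_all:.4f}  treat-none=0")
```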

    Validation and Recalibration of Two Multivariable Prognostic Models for Survival and Independence in Acute Stroke

    Introduction: Various prognostic models have been developed for acute stroke, including one based on age and five binary variables (the 'six simple variables' model; SSVMod) and one based on age plus scores on the National Institutes of Health Stroke Scale (NIHSSMod). The aims of this study were to externally validate and recalibrate these models, and to compare their predictive ability in relation to both survival and independence. Methods: Data from a large clinical trial of oxygen therapy (n = 8003) were used to determine the discrimination and calibration of the models, using C-statistics, calibration plots, and Hosmer-Lemeshow statistics. Recalibration in the large and logistic recalibration were used to update the models. Results: For discrimination, both models functioned better for survival (C-statistics between 0.802 and 0.837) than for independence (C-statistics between 0.725 and 0.735). Both models showed slight shortcomings with regard to calibration, over-predicting survival and under-predicting independence; the NIHSSMod performed slightly better than the SSVMod. For the most part, there were only minor differences between ischaemic and haemorrhagic strokes. Logistic recalibration successfully updated the models for the clinical trial population. Conclusions: Both prognostic models performed well overall in a clinical trial population. The choice between them is probably better based on clinical and practical considerations than on statistical considerations.
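    Both updating methods named here have standard logistic-regression forms: recalibration in the large re-estimates only the intercept (slope fixed at 1, via an offset), while logistic recalibration re-estimates intercept and slope on the existing linear predictor. A minimal sketch on simulated data, not the trial's:

```python
# Sketch of the two model-updating methods, applied to an existing
# model's linear predictor lp (simulated, deliberately miscalibrated).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
lp = rng.normal(-1.0, 1.2, size=n)                 # old model's linear predictor
y = rng.binomial(1, 1 / (1 + np.exp(-(0.4 + 0.8 * lp))))

# Recalibration in the large: intercept update only, slope fixed at 1.
fit_itl = sm.GLM(y, np.ones((n, 1)), family=sm.families.Binomial(),
                 offset=lp).fit()
print("updated intercept:", fit_itl.params.round(3))

# Logistic recalibration: re-estimate intercept and slope.
fit_logrec = sm.GLM(y, sm.add_constant(lp),
                    family=sm.families.Binomial()).fit()
print("intercept, slope: ", fit_logrec.params.round(3))
```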

    Personalized Prediction of Lifetime Benefits with Statin Therapy for Asymptomatic Individuals: A Modeling Study

    Background: Physicians need to inform asymptomatic individuals about personalized outcomes of statin therapy for primary prevention of cardiovascular disease (CVD). However, current prediction models focus on short-term outcomes and ignore the competing risk of death due to other causes. We aimed to predict the potential lifetime benefits of statin therapy, taking competing risks into account. Methods and Findings: A microsimulation model based on 5-y follow-up data from the Rotterdam Study, a population-based cohort of individuals aged 55 y and older living in the Ommoord district of Rotterdam, the Netherlands, was used to estimate lifetime outcomes with and without statin therapy. The model was validated in-sample using 10-y follow-up data. We used baseline variables and model output to construct (1) a web-based calculator for gains in total and CVD-free life expectancy and (2) color charts for comparing these gains to the Systematic Coronary Risk Evaluation (SCORE) charts. In 2,428 participants (mean age 67.7 y, 35.5% men), statin therapy increased total life expectancy by 0.3 y (SD 0.2) and CVD-free life expectancy by 0.7 y (SD 0.4). Age, sex, smoking, blood pressure, hypertension, lipids, diabetes, glucose, body mass index, waist-to-hip ratio, and creatinine were included in the calculator. Gains in total and CVD-free life expectancy increased with blood pressure, unfavorable lipid levels, and body mass index after multivariable adjustment. Gains decreased considerably with advancing age, while SCORE 10-y CVD mortality risk increased with age. Twenty-five percent of participants with a low SCORE risk achieved equal or larger gains in CVD-free life expectancy than the median gain in participants with a high SCORE risk. Conclusions: We developed tools to predict personalized increases in total and CVD-free life expectancy with statin therapy. The predicted gains we found are small. If the underlying model is validated in an independent cohort, the tools may be useful in discussing individual outcomes of statin therapy with patients.
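    To make the idea of a lifetime microsimulation with competing risks concrete, here is a deliberately toy annual-cycle sketch: each simulated person faces a CVD death probability (reduced by statin therapy) and a competing non-CVD death probability each year. Every hazard and the hazard ratio below are invented for illustration; this is not the Rotterdam Study model.

```python
# Toy annual-cycle microsimulation of life-expectancy gain with statin
# therapy under a competing risk of non-CVD death. All hazards and the
# hazard ratio are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)

def life_years(age0, statin, n_people=50000, max_age=105, hr_cvd=0.75):
    total = 0.0
    for _ in range(n_people):
        age = age0
        while age < max_age:
            # Crude age-dependent annual death probabilities (invented).
            p_cvd = min(1.0, 0.002 * np.exp(0.08 * (age - 55)))
            p_oth = min(1.0, 0.004 * np.exp(0.09 * (age - 55)))
            if statin:
                p_cvd *= hr_cvd            # treatment lowers CVD hazard only
            if rng.random() < p_cvd + p_oth:
                break                      # died this year (either cause)
            age += 1
        total += age - age0
    return total / n_people               # mean remaining life years

le_off = life_years(67, statin=False)
le_on = life_years(67, statin=True)
print(f"LE without statin: {le_off:.2f} y, with statin: {le_on:.2f} y, "
      f"gain: {le_on - le_off:.2f} y")
```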

    Prediction of intracranial findings on CT-scans by alternative modelling techniques

    Background: Prediction rules for intracranial traumatic findings in patients with minor head injury are designed to reduce the use of computed tomography (CT) without missing patients at risk for complications. This study investigates whether alternative modelling techniques might improve the applicability and simplicity of such prediction rules. Methods: We included 3181 patients with minor head injury who had received CT scans be

    Predictive Value of Updating Framingham Risk Scores with Novel Risk Markers in the U.S. General Population

    Background: According to population-based cohort studies, CT coronary calcium score (CTCS), carotid intima-media thickness (cIMT), high-sensitivity C-reactive protein (CRP), and ankle-brachial index (ABI) are promising novel risk markers for improving cardiovascular risk assessment. Their impact in the U.S. general population is, however, uncertain. Our aim was to estimate the predictive value of these four novel cardiovascular risk markers for the U.S. general population. Methods and Findings: Risk profiles, CRP and ABI data of 3,736 asymptomatic subjects aged 40 or older from the National Health and Nutrition Examination Survey (NHANES) 2003–2004 exam were used, along with predicted CTCS and cIMT values. For each subject, we calculated 10-year cardiovascular risks with and without each risk marker. Event rates adjusted for competing risks were obtained by microsimulation. We assessed the impact of updated 10-year risk scores by reclassification and C-statistics. In the study population (mean age 56±11 years, 48% male), 70% (80%) were at low (<10%), 19% (14%) at intermediate (≥10–<20%), and 11% (6%) at high (≥20%) 10-year CVD (CHD) risk. Net reclassification improvement was highest after updating 10-year CVD risk with CTCS: 0.10 (95% CI 0.02–0.19). The C-statistic for 10-year CVD risk, 0.82, increased by 0.02 (95% CI 0.01–0.03) with CTCS. Reclassification occurred most often in those at intermediate risk: with CTCS, 36% (38%) moved to low and 22% (30%) to high CVD (CHD) risk. Improvements with the other novel risk markers were limited. Conclusions: Only CTCS appeared to have significant incremental predictive value in the U.S. general population, especially in those at intermediate risk. Cost-effectiveness analyses should be considered in future research evaluating novel cardiovascular risk assessment strategies.
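    The categorical net reclassification improvement (NRI) used here sums the net proportion of events moving up a risk category and the net proportion of non-events moving down: NRI = [P(up|event) − P(down|event)] + [P(down|non-event) − P(up|non-event)]. A minimal sketch with the <10%, 10–20%, ≥20% bands quoted above, on simulated placeholder data:

```python
# Categorical NRI for a baseline risk score vs. one updated with a
# novel marker; risk bands <10%, 10-20%, >=20%. Data are simulated.
import numpy as np

rng = np.random.default_rng(5)
n = 3000
risk_old = rng.beta(2, 12, size=n)                           # baseline risks
risk_new = np.clip(risk_old + rng.normal(0, 0.04, n), 0, 1)  # + novel marker
y = rng.binomial(1, risk_new)                                # outcomes

def category(r):
    return np.digitize(r, [0.10, 0.20])   # 0=low, 1=intermediate, 2=high

up = category(risk_new) > category(risk_old)
down = category(risk_new) < category(risk_old)

ev, ne = y == 1, y == 0
nri_events = up[ev].mean() - down[ev].mean()
nri_nonevents = down[ne].mean() - up[ne].mean()
print(f"NRI = {nri_events:.3f} + {nri_nonevents:.3f} = "
      f"{nri_events + nri_nonevents:.3f}")
```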