Development and validation of colorectal cancer risk prediction tools: A comparison of models

Lansdorp-Vogelaar, Iris; Meester, Reinier G.S.; Mülder, Duco T.; O'Mahony, James F.; van den Puttelaar, Rosita

Publication date: 1 October 2023

Abstract

Background: Identification of individuals at elevated risk can improve cancer screening programmes by permitting risk-adjusted screening intensities. Previous work introduced a prognostic model using sex, age and two preceding faecal haemoglobin concentrations to predict the risk of colorectal cancer (CRC) in the next screening round. Using data from three screening rounds, this model attained an area under the receiver-operating-characteristic curve (AUC) of 0.78 for predicting advanced neoplasia (AN). We validated this existing logistic regression (LR) model and attempted to improve it by applying a more flexible machine-learning approach.

Methods: We trained the existing LR model and a newly developed random forest (RF) model using updated data from 219,257 third-round participants of the Dutch CRC screening programme until 2018. For both models, we performed two separate out-of-sample validations using 1,137,599 third-round participants after 2018 and 192,793 fourth-round participants from 2020 onwards. We evaluated the AUC and the relative risks of the predicted high-risk groups for the outcomes AN and CRC.

Results: For third-round participants after 2018, the AUC for predicting AN was 0.77 (95% CI: 0.76–0.77) using LR and 0.77 (95% CI: 0.77–0.77) using RF. For fourth-round participants, the AUCs were 0.73 (95% CI: 0.72–0.74) and 0.73 (95% CI: 0.72–0.74) for the LR and RF models, respectively. For both models, the 5% with the highest predicted risk had a 7-fold risk of AN compared to average, whereas the lowest 80% had a risk below the population average for third-round participants.

Conclusion: The LR model is a valid risk prediction method in stool-based screening programmes. Although predictive performance declined marginally, the LR model still effectively predicted risk in subsequent screening rounds. The RF model did not improve CRC risk prediction compared with the LR model, probably because of the limited number of available explanatory variables. The LR model remains the preferred prediction tool because of its interpretability.
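
The two evaluation measures named in the abstract, the AUC and the relative risk of a predicted high-risk group, can be illustrated with a minimal sketch. The function names and the toy scores and outcomes below are hypothetical, chosen only for illustration; they are not the authors' code and not data from the Dutch screening programme. The AUC is computed here as the share of case/non-case pairs in which the case receives the higher predicted risk, and the relative risk as the observed outcome rate in the top fraction divided by the overall rate.

```python
def auc(scores, labels):
    """AUC as the probability that a random case outscores a random
    non-case; tied pairs count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_risk_top(scores, labels, fraction):
    """Observed outcome rate among the top `fraction` of participants
    ranked by predicted risk, divided by the overall outcome rate."""
    n_top = max(1, round(fraction * len(scores)))
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:n_top]) / n_top
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate


# Made-up toy data (assumption): predicted risks and observed outcomes
# (1 = outcome found, 0 = not found) for ten hypothetical participants.
toy_scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

toy_auc = auc(toy_scores, toy_labels)
toy_rr = relative_risk_top(toy_scores, toy_labels, 0.2)
print(f"AUC = {toy_auc:.4f}, relative risk of top 20% = {toy_rr:.2f}")
```

With study-sized data, the pairwise loop in `auc` would be too slow, and a rank-based formula or a library routine would be the more practical choice.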
