Undesired behaviours in artificial intelligence (AI) systems for cancer detection

Abstract

Artificial Intelligence (AI) is increasingly used in medical imaging to improve the accuracy and efficiency of cancer detection. However, when multiple AI models are used in sequence, such as region of interest (ROI) segmentation followed by tumour detection, errors from one model can propagate to the next. This thesis investigates such behaviours using prostate and pancreatic cancer datasets (PI-CAI and PANORAMA) in modelled real-world scenarios. Each scenario was divided into a real and an ideal case: the real case used a chain of two segmentation models (model 1 for the ROI and model 2 for the tumour), while the ideal case used only a tumour segmentation model. Models were trained using the nnUNet framework. The ROI segmentation models for prostate and pancreas achieved high Dice scores of at best 0.953 and 0.857 ± 0.081, respectively, but tumour detection and segmentation performance remained poor. For the tumour models, PANORAMA reached its best ROC-AUC of 0.653 in the ideal case, while PI-CAI reached its best ROC-AUC of 0.610 in the real case with parallel development. These results demonstrate that strong performance in one model does not necessarily lead to strong performance in the next. This underscores the need to evaluate chained AI systems holistically, particularly in high-stakes domains like oncology.
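The propagation effect described in the abstract can be illustrated with a toy example. The Python sketch below computes the Dice coefficient, 2|P ∩ T| / (|P| + |T|), and assumes a simplified chain in which the tumour prediction is kept only inside the predicted ROI. The helper names `dice` and `chain`, the masking step, and the 10-voxel data are hypothetical illustrations and are not taken from the thesis or from nnUNet.

```python
from typing import List

Mask = List[int]  # flattened binary mask: 1 = foreground voxel, 0 = background


def dice(pred: Mask, truth: Mask) -> float:
    """Dice similarity coefficient: 2|P & T| / (|P| + |T|)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(pred) + sum(truth)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if size_sum == 0 else 2.0 * overlap / size_sum


def chain(roi_pred: Mask, tumour_pred: Mask) -> Mask:
    """Hypothetical 'real case': keep tumour voxels only inside the predicted ROI."""
    return [t if r else 0 for r, t in zip(roi_pred, tumour_pred)]


# Toy 1-D "scan" of 10 voxels (made-up data for illustration only).
roi_truth = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
roi_pred = [0, 1, 1, 1, 1, 1, 1, 1, 0, 0]      # misses voxel 8 at the ROI border
tumour_truth = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # tumour touches the ROI border
tumour_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]   # tumour model alone is perfect here

print(f"ROI Dice:            {dice(roi_pred, roi_truth):.3f}")                          # 0.933
print(f"Tumour Dice (ideal): {dice(tumour_pred, tumour_truth):.3f}")                    # 1.000
print(f"Tumour Dice (real):  {dice(chain(roi_pred, tumour_pred), tumour_truth):.3f}")   # 0.667
```

Under these assumptions, an ROI model with a Dice of about 0.93 still lowers the chained tumour Dice from 1.0 to about 0.67, because the single border voxel it misses contains half of the tumour.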

This paper was published in UiS Brage.
