Artificial Intelligence (AI) is increasingly used in medical imaging to improve the ac-
curacy and efficiency of cancer detection. However, when multiple AI models are used
in sequence, such as region of interest (ROI) segmentation followed by tumour detec-
tion, errors from one model can propagate to the next. This thesis investigates such
behaviours using prostate and pancreatic cancer datasets (PI-CAI and PANORAMA)
in modelled real-world scenarios. Each scenario was divided into a real and ideal case,
with the real case making use of a chain of two segmentation models (model 1 for
ROI and model 2 for tumour), while the ideal case used only a tumour segmentation
model. Models were trained using the nnUNet framework. The ROI segmentation
models for prostate and pancreas achieved high Dice scores of at best 0.953 and
0.857 ± 0.081, respectively, but the tumour models performed poorly in both the
detection and segmentation of tumours. For the tumour model, the best ROC-AUC score
on the PANORAMA dataset was 0.653, obtained in the ideal case. On the PI-CAI dataset,
the best ROC-AUC score was 0.610, obtained in the real case of parallel
development. These results demonstrate that strong
performance in one model does not necessarily lead to strong performance in the next.
This underscores the need to evaluate chained AI systems holistically, particularly in
high-stakes domains like oncology.
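
As a rough illustration of why a high ROI Dice score need not carry over to the tumour stage, the toy sketch below computes a Dice score on a made-up one-dimensional "scan" and then checks how much of a tumour remains visible once the image is restricted to the predicted ROI. The data, the masking step and the helper names (`dice`, `apply_roi`) are assumptions made for this example only; they are not taken from the thesis pipeline or from the nnUNet framework.

```python
def dice(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two equal-length binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def apply_roi(mask, roi):
    """Keep only the voxels of `mask` that fall inside the predicted ROI."""
    return [m if r else 0 for m, r in zip(mask, roi)]


# Hypothetical 20-voxel scan (illustrative data, not from PI-CAI or PANORAMA).
organ_truth = [0] * 4 + [1] * 12 + [0] * 4   # organ occupies voxels 4..15
roi_pred = [0] * 4 + [1] * 11 + [0] * 5      # model 1 misses voxel 15
tumour_truth = [0] * 14 + [1] * 2 + [0] * 4  # tumour sits at the organ edge (14, 15)

roi_dice = dice(roi_pred, organ_truth)

# What model 2 could still see of the tumour after restricting to the ROI.
visible = apply_roi(tumour_truth, roi_pred)
visible_fraction = sum(visible) / sum(tumour_truth)
best_case_tumour_dice = dice(visible, tumour_truth)

print(f"ROI Dice:               {roi_dice:.3f}")
print(f"Tumour voxels visible:  {visible_fraction:.2f}")
print(f"Best-case tumour Dice:  {best_case_tumour_dice:.3f}")
```

In this constructed case the ROI Dice is about 0.957, yet half of the tumour lies outside the predicted ROI, so even a perfect second model could reach a tumour Dice of at most about 0.667. This shows only one possible propagation mechanism under the stated assumptions and says nothing about which mechanism dominated in the experiments reported here.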