9 research outputs found

    Multi-institutional generalizability of a plan complexity machine learning model for predicting pre-treatment quality assurance results in radiotherapy

    No full text
    Abstract: Background and purpose: Treatment plans in radiotherapy are subject to measurement-based pre-treatment verifications. In this study, plan complexity metrics (PCMs) were calculated per beam and used as input features to develop a predictive model. The aim of this study was to determine the model's robustness against differences in machine type and institution-specific quality assurance (QA). Material and methods: In total, 567 beams were collected, of which 477 passed and 90 failed the pre-treatment QA. Treatment plans of different anatomical regions were included. One type of linear accelerator was represented. For all beams, 16 PCMs were calculated. A random forest classifier was trained to distinguish between acceptable and non-acceptable beams. The model was validated on other datasets to investigate its robustness. Firstly, plans for another machine type from the same institution were evaluated. Secondly, an inter-institutional validation was conducted on three datasets from different centres with their associated QA. Results: Intra-institutionally, the PCMs beam modulation, mean MLC gap, Q1 gap, and Modulation Complexity Score were the most informative for detecting failing beams. Eighty-three percent of the failed beams (15/18) were detected correctly. The model could not detect over-modulated beams of another machine type. Inter-institutionally, the model reached higher accuracy for centres whose treatment and QA equipment was comparable to that of the local institute. Conclusions: The study demonstrates that robustness decreases when major differences exist in the QA platform or in planning strategies, but that it is feasible to extrapolate institution-specific trained models between centres with similar clinical practice. Predictive models should be developed for each machine type.
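    The classification step described in the abstract can be sketched as follows. This is a minimal illustration, not the study's code: the PCM values are random placeholders, and hyperparameters such as `n_estimators` and `class_weight` are assumptions; only the beam counts (567 total, 90 failing) come from the abstract.

    ```python
    # Illustrative sketch: train a random forest on per-beam plan complexity
    # metrics (PCMs) to flag beams likely to fail pre-treatment QA.
    # Feature values here are synthetic placeholders, not real PCM data.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_beams, n_pcms = 567, 16                 # counts as reported in the study
    X = rng.normal(size=(n_beams, n_pcms))    # placeholder PCM values
    y = np.zeros(n_beams, dtype=int)
    y[:90] = 1                                # 90 failing beams, 477 passing
    rng.shuffle(y)

    # Stratified split preserves the pass/fail ratio in both subsets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    # class_weight="balanced" compensates for the strong class imbalance
    # (failing beams are rare); this is an assumed modelling choice.
    clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                 random_state=0)
    clf.fit(X_train, y_train)

    # Feature importances rank the PCMs by how informative they are,
    # analogous to the abstract's ranking (beam modulation, mean MLC gap, ...).
    importances = clf.feature_importances_
    ```

    On real data, the ranking in `importances` would identify which PCMs drive the failure predictions, which is how metrics such as mean MLC gap were singled out in the study.
    
    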

    A geometry and dose-volume based performance monitoring of artificial intelligence models in radiotherapy treatment planning for prostate cancer

    No full text
    Abstract: Background and Purpose: Clinical Artificial Intelligence (AI) implementations lack ground truth when applied to real-world data. This study investigated how combined geometrical and dose-volume metrics can be used as performance monitoring tools to detect clinically relevant candidates for model retraining. Materials and Methods: Fifty patients were analyzed for both AI-segmentation and AI-planning. For AI-segmentation, geometrical (Standard Surface Dice 3 mm and Local Surface Dice 3 mm) and dose-volume based parameters were calculated for two organs (bladder and anorectum) to compare the AI output against the clinically corrected structure. A Local Surface Dice was introduced to detect geometrical changes in the vicinity of the target volumes, while an Absolute Dose Difference (ADD) evaluation increased focus on dose-volume related changes. AI-planning performance was evaluated using clinical goal analysis in combination with volume and target overlap metrics. Results: The Local Surface Dice reported equal or lower values compared to the Standard Surface Dice (anorectum: (0.93 +/- 0.11) vs (0.98 +/- 0.04); bladder: (0.97 +/- 0.06) vs (0.98 +/- 0.04)). The ADD metric showed a difference of (0.9 +/- 0.8) Gy for the anorectum D1cm3. The bladder D5cm3 reported a difference of (0.7 +/- 1.5) Gy. Mandatory clinical goals were fulfilled in 90 % of the DLP plans. Conclusions: Combining dose-volume and geometrical metrics allowed detection of clinically relevant changes in both auto-segmentation and auto-planning output, and the Local Surface Dice was more sensitive to local changes than the Standard Surface Dice. This monitoring can evaluate AI behavior in clinical practice and allows candidate selection for active learning.
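    A Surface Dice at tolerance, as used above, can be sketched on binary voxel masks via distance transforms. This is a minimal illustration of the general metric, not the study's implementation; the `roi` argument is an assumed way to approximate the paper's "Local" variant by restricting the evaluation to a region near the target volume.

    ```python
    # Sketch of a Surface Dice at tolerance tau (in mm) between two binary
    # 3D masks. A surface voxel pair counts as matched if it lies within
    # tau of the other structure's surface. Passing an roi mask restricts
    # the comparison locally (assumed stand-in for the paper's Local variant).
    import numpy as np
    from scipy.ndimage import binary_erosion, distance_transform_edt

    def surface(mask):
        # Surface = voxels in the mask whose erosion removed them (the shell).
        return mask & ~binary_erosion(mask)

    def surface_dice(a, b, tol_mm, spacing=(1.0, 1.0, 1.0), roi=None):
        sa, sb = surface(a), surface(b)
        if roi is not None:
            # Local variant: only surface voxels inside the region of interest.
            sa, sb = sa & roi, sb & roi
        # Distance from every voxel to the nearest surface voxel of the
        # other structure, scaled by the physical voxel spacing.
        dist_to_b = distance_transform_edt(~sb, sampling=spacing)
        dist_to_a = distance_transform_edt(~sa, sampling=spacing)
        matched = (dist_to_b[sa] <= tol_mm).sum() + (dist_to_a[sb] <= tol_mm).sum()
        return matched / (sa.sum() + sb.sum())
    ```

    With identical masks the metric returns 1.0; shifting one structure lowers it once surface points fall outside the tolerance, which is why a 3 mm tolerance makes the metric forgiving of sub-voxel contouring noise while still penalising real deviations.
    
    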