3 research outputs found

    An investigation into the risk of population bias in deep learning autocontouring

    Get PDF
    Background and Purpose: To date, data used in the development of Deep Learning-based automatic contouring (DLC) algorithms have been largely sourced from single geographic populations. This study aimed to evaluate the risk of population-based bias by determining whether the performance of an autocontouring system is impacted by geographic population.Materials and methods: 80 Head Neck CT deidentified scans were collected from four clinics in Europe (n = 2) and Asia (n = 2). A single observer manually delineated 16 organs-at-risk in each. Subsequently, the data was contoured using a DLC solution, and trained using single institution (European) data. Autocontours were compared to manual delineations using quantitative measures. A Kruskal-Wallis test was used to test for any difference between populations. Clinical acceptability of automatic and manual contours to observers from each participating institution was assessed using a blinded subjective evaluation.Results: Seven organs showed a significant difference in volume between groups. Four organs showed statistical differences in quantitative similarity measures. The qualitative test showed greater variation in acceptance of contouring between observers than between data from different origins, with greater acceptance by the South Korean observers.Conclusion: Much of the statistical difference in quantitative performance could be explained by the difference in organ volume impacting the contour similarity measures and the small sample size. However, the qualitative assessment suggests that observer perception bias has a greater impact on the apparent clinical acceptability than quantitatively observed differences. This investigation of potential geographic bias should extend to more patients, populations, and anatomical regions in the future.</p
    corecore