
    Why is the winner the best?

    International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and postprocessing (66%). The "typical" lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work
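    As one concrete illustration of the "reflection of the metrics in the method design" strategy (an example constructed here, not an implementation taken from the paper), a segmentation challenge ranked by the Dice coefficient can be trained with a differentiable Dice surrogate so that the optimization objective mirrors the ranking metric. The PyTorch sketch below assumes binary masks and raw logits; the function name is hypothetical.

```python
import torch

def soft_dice_loss(logits: torch.Tensor, targets: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Differentiable surrogate of the Dice coefficient for binary segmentation.

    logits:  raw network outputs, shape (N, H, W)
    targets: binary ground-truth masks, same shape
    """
    probs = torch.sigmoid(logits)              # per-pixel foreground probabilities
    targets = targets.float()
    dims = tuple(range(1, probs.ndim))         # reduce over spatial dimensions only
    intersection = (probs * targets).sum(dim=dims)
    denominator = probs.sum(dim=dims) + targets.sum(dim=dims)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice.mean()                   # minimise 1 - Dice over the batch
```

    In practice such a term is often combined with cross-entropy; the point of the sketch is only that the training loss and the challenge's validation metric then measure the same thing.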

    Metrics reloaded: Pitfalls and recommendations for image analysis validation

    Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint - a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a point of access to explore weaknesses, strengths and specific recommendations for the most common validation metrics. The broad applicability of our framework across domains is demonstrated by an instantiation for various biological and medical image analysis use cases
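    The abstract describes the problem fingerprint only conceptually. The sketch below is a deliberately simplified, hypothetical encoding of such a fingerprint together with a toy selection rule; the field names, task labels, and suggested metrics are assumptions made for illustration and do not reproduce the framework's actual decision logic, which is available through the Metrics Reloaded online tool.

```python
from dataclasses import dataclass

@dataclass
class ProblemFingerprint:
    """Toy stand-in for a problem fingerprint (fields are illustrative only)."""
    task: str                # e.g. "image_classification" or "semantic_segmentation"
    class_imbalance: bool    # is the class of interest rare?
    small_structures: bool   # are target structures only a few pixels in size?
    boundary_critical: bool  # does the domain interest focus on contour accuracy?

def suggest_metrics(fp: ProblemFingerprint) -> list[str]:
    """Very rough illustration of fingerprint-driven metric selection."""
    if fp.task == "image_classification":
        return ["balanced accuracy"] if fp.class_imbalance else ["accuracy", "AUROC"]
    if fp.task == "semantic_segmentation":
        metrics = ["Dice"]
        if fp.boundary_critical:
            metrics.append("normalized surface distance")
        if fp.small_structures:
            metrics.append("per-structure Dice")  # image-wide averages hide small objects
        return metrics
    # detection-style tasks are assessed per object rather than per pixel
    return ["F1 @ IoU 0.5", "average precision"]

fp = ProblemFingerprint(task="semantic_segmentation", class_imbalance=True,
                        small_structures=False, boundary_critical=True)
print(suggest_metrics(fp))  # -> ['Dice', 'normalized surface distance']
```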

    Understanding metric-related pitfalls in image analysis validation

    Validation metrics are key for tracking scientific progress and bridging the current chasm between artificial intelligence research and its translation into practice. However, increasing evidence shows that, particularly in image analysis, metrics are often chosen inadequately. Although taking into account the individual strengths, weaknesses and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multistage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides a reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Although focused on biomedical image analysis, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. The work serves to enhance global comprehension of a key topic in image analysis validation
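    As a concrete example of the kind of pitfall catalogued in this line of work (the numbers below are constructed for illustration, not taken from the paper), plain accuracy can look excellent on a heavily imbalanced dataset even when a model never detects the class of interest, whereas a prevalence-aware metric such as balanced accuracy exposes the failure:

```python
import numpy as np

# Hypothetical screening dataset: 990 negatives, 10 positives.
y_true = np.array([0] * 990 + [1] * 10)
y_pred = np.zeros_like(y_true)          # a trivial model that always predicts "negative"

accuracy = (y_pred == y_true).mean()

# Balanced accuracy: mean of per-class recalls.
recall_pos = (y_pred[y_true == 1] == 1).mean()   # 0.0 -- every positive case is missed
recall_neg = (y_pred[y_true == 0] == 0).mean()   # 1.0
balanced_accuracy = (recall_pos + recall_neg) / 2

print(f"accuracy          = {accuracy:.2f}")           # 0.99
print(f"balanced accuracy = {balanced_accuracy:.2f}")  # 0.50
```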
