
    Automated curation of large‐scale cancer histopathology image datasets using deep learning

    Background: Artificial intelligence (AI) has numerous applications in pathology, supporting diagnosis and prognostication in cancer. However, most AI models are trained on highly selected data, typically one tissue slide per patient. In reality, especially for large surgical resection specimens, dozens of slides can be available for each patient. Manually sorting and labelling whole-slide images (WSIs) is very time-consuming, hindering the direct application of AI to the tissue samples collected from large cohorts. In this study we addressed this issue by developing a deep-learning (DL)-based method for automatic curation of large pathology datasets with several slides per patient.
    Methods: We collected multiple large multicentric datasets of colorectal cancer histopathological slides from the United Kingdom (FOXTROT, N = 21,384 slides; CR07, N = 7985 slides) and Germany (DACHS, N = 3606 slides). These datasets contained multiple types of tissue slides, including bowel resection specimens, endoscopic biopsies, lymph node resections, immunohistochemistry-stained slides, and tissue microarrays. We developed, trained, and tested a deep convolutional neural network model to predict the type of slide from the slide overview (thumbnail) image. The primary statistical endpoint was the macro-averaged area under the receiver operating characteristic curve (AUROC) for detection of the type of slide.
    Results: In the primary dataset (FOXTROT), the algorithm achieved a high classification performance, with an AUROC of 0.995 (95% confidence interval [CI]: 0.994–0.996), and was able to accurately predict the type of slide from the thumbnail image alone. In the two external test cohorts (CR07, DACHS), AUROCs of 0.982 (95% CI: 0.979–0.985) and 0.875 (95% CI: 0.864–0.887) were observed, indicating that the trained model generalizes to unseen datasets. With a confidence threshold of 0.95, the model reached an accuracy of 94.6% (7331 classified cases) in CR07 and 85.1% (2752 classified cases) in the DACHS cohort.
    Conclusion: Our findings show that the low-resolution thumbnail image is sufficient to accurately classify the type of slide in digital pathology. This can help researchers make the vast resource of existing pathology archives accessible to modern AI models with only minimal manual annotation.
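    As a rough illustration of the confidence-threshold step described above, the sketch below classifies a slide thumbnail and abstains below a cut-off. The model, label set, preprocessing, and function names are assumptions for illustration, not the authors' released code.

```python
# Hypothetical sketch: confidence-thresholded slide-type prediction
# from a thumbnail image. Model, labels and preprocessing are assumed.
import torch
import torchvision.transforms as T
from PIL import Image

SLIDE_TYPES = [  # illustrative label set mirroring the categories above
    "bowel_resection", "endoscopic_biopsy", "lymph_node_resection",
    "ihc_stain", "tissue_microarray",
]

preprocess = T.Compose([
    T.Resize((224, 224)),  # thumbnails are already low resolution
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def classify_thumbnail(model: torch.nn.Module, path: str,
                       threshold: float = 0.95):
    """Return a slide type, or None if the model is not confident enough."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1).squeeze(0)
    conf, idx = probs.max(dim=0)
    # Below the threshold, defer the slide to manual review, mirroring
    # the 0.95 confidence cut-off reported for CR07 and DACHS.
    return SLIDE_TYPES[int(idx)] if conf.item() >= threshold else None
```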

    Artificial intelligence for detection of microsatellite instability in colorectal cancer - a multicentric analysis of a pre-screening tool for clinical application.

    Background: Microsatellite instability (MSI)/mismatch repair deficiency (dMMR) is a key genetic feature which, according to medical guidelines, should be tested in every patient with colorectal cancer (CRC). Artificial intelligence (AI) methods can detect MSI/dMMR directly in routine pathology slides, but the test performance has not been systematically investigated with predefined test thresholds.
    Methods: We trained and validated AI-based MSI/dMMR detectors and evaluated predefined performance metrics using nine patient cohorts totalling 8343 patients across different countries and ethnicities.
    Results: Classifiers achieved clinical-grade performance, yielding an area under the receiver operating characteristic curve (AUROC) of up to 0.96 without using any manual annotations. Subsequently, we show that the AI system can be applied as a rule-out test: using cohort-specific thresholds, on average 52.73% of tumors in each surgical cohort [total MSI/dMMR = 1020, microsatellite stable (MSS)/proficient mismatch repair (pMMR) = 7323 patients] could be identified as MSS/pMMR at a fixed sensitivity of 95%. In an additional cohort of N = 1530 endoscopic biopsy samples (MSI/dMMR = 211, MSS/pMMR = 1319), the system achieved an AUROC of 0.89, and the cohort-specific threshold ruled out 44.12% of tumors at a fixed sensitivity of 95%. As a more robust alternative to cohort-specific thresholds, we showed that a fixed threshold of 0.25 across all cohorts rules out 25.51% of surgical specimens and 6.10% of biopsies.
    Interpretation: When applied in a clinical setting, this means the AI system can rule out MSI/dMMR in a quarter (with global thresholds) to half (with local fine-tuning) of all CRC patients, thereby reducing the cost and turnaround time of molecular profiling.
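    The rule-out logic above amounts to choosing a score threshold on a tuning set so that 95% of MSI/dMMR cases score at or above it; everything below the threshold is ruled out as MSS/pMMR. A minimal sketch of that calibration step, with illustrative names (the study's exact procedure may differ):

```python
# Sketch of deriving a cohort-specific rule-out threshold at a fixed
# sensitivity. Variable names and the exact procedure are assumptions.
import numpy as np

def rule_out_threshold(scores: np.ndarray, labels: np.ndarray,
                       sensitivity: float = 0.95) -> float:
    """Score cut-off keeping `sensitivity` of MSI/dMMR cases above it.

    scores: model MSI probabilities; labels: 1 = MSI/dMMR, 0 = MSS/pMMR.
    """
    msi_scores = scores[labels == 1]
    # The (1 - sensitivity) quantile of the MSI score distribution leaves
    # ~95% of true MSI cases at or above the returned threshold.
    return float(np.quantile(msi_scores, 1.0 - sensitivity))

# Applied to a new cohort, the ruled-out fraction is then
# (scores < threshold).mean(): tumors that can skip confirmatory testing.
```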

    Swarm learning for decentralized artificial intelligence in cancer histopathology

    Artificial intelligence (AI) can predict the presence of molecular alterations directly from routine histopathology slides. However, training robust AI systems requires large datasets, for which data collection faces practical, ethical and legal obstacles. These obstacles could be overcome with swarm learning (SL), in which partners jointly train AI models while avoiding data transfer and monopolistic data governance. Here, we demonstrate the successful use of SL in large, multicentric datasets of gigapixel histopathology images from over 5,000 patients. We show that AI models trained using SL can predict BRAF mutational status and microsatellite instability directly from hematoxylin and eosin (H&E)-stained pathology slides of colorectal cancer. We trained AI models on three patient cohorts from Northern Ireland, Germany and the United States, and validated the prediction performance in two independent datasets from the United Kingdom. Our data show that SL-trained AI models outperform most locally trained models and perform on par with models trained on the merged datasets. In addition, we show that SL-based AI models are data-efficient. In the future, SL can be used to train distributed AI models for any histopathology image analysis task, eliminating the need for data transfer.
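    Swarm learning coordinates peer-to-peer parameter exchange over a blockchain layer with no central server; only model weights move between partners, never patient data. The sketch below shows just the underlying parameter-merging idea (sample-weighted averaging of partner weights), not the actual swarm learning stack, and all names are illustrative.

```python
# Illustrative parameter-merging step: partners train locally and share
# only model weights, which are averaged in proportion to local data size.
# This approximates one sync round; real SL handles peer coordination.
import torch

def merge_swarm_weights(state_dicts: list, n_samples: list) -> dict:
    """Sample-weighted average of partner model state_dicts."""
    total = sum(n_samples)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(
            sd[key].float() * (n / total)
            for sd, n in zip(state_dicts, n_samples)
        )
    return merged

# Each sync round: every partner trains on its own slides, shares its
# state_dict, loads the merged weights back, and continues training.
```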

    Deep learning-based phenotyping reclassifies combined hepatocellular-cholangiocarcinoma.

    Primary liver cancer arises from either hepatocytic or biliary lineage cells, giving rise to hepatocellular carcinoma (HCC) or intrahepatic cholangiocarcinoma (ICCA). Combined hepatocellular-cholangiocarcinomas (cHCC-CCA) exhibit equivocal or mixed features of both, causing diagnostic uncertainty and difficulty in determining proper management. Here, we perform comprehensive deep learning-based phenotyping of multiple patient cohorts. We show that deep learning can reproduce the diagnosis of HCC vs. CCA with high performance. We analyze a series of 405 cHCC-CCA patients and demonstrate that the model can reclassify the tumors as HCC or ICCA, and that the predictions are consistent with clinical outcomes, genetic alterations and in situ spatial gene expression profiling. This type of approach could improve treatment decisions and ultimately clinical outcomes for patients with rare and biphenotypic cancers such as cHCC-CCA.
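    A hedged sketch of the reclassification step as described: per-tile HCC probabilities from a classifier trained on unambiguous HCC vs. ICCA cases are pooled into one patient-level label. The mean-pooling rule and names are assumptions; the study's actual aggregation may differ.

```python
# Assumed aggregation: average tile-level HCC probabilities per patient,
# then relabel the cHCC-CCA case by the lineage the model favours.
import numpy as np

def reclassify_patient(tile_probs_hcc: np.ndarray) -> str:
    """Relabel one cHCC-CCA case from its per-tile HCC probabilities."""
    return "HCC" if tile_probs_hcc.mean() >= 0.5 else "ICCA"

# Example: a tumor whose tiles mostly score cholangiocarcinoma-like
# (mean HCC probability 0.2) would be reclassified as ICCA.
```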

    The global burden of cancer attributable to risk factors, 2010-19: a systematic analysis for the Global Burden of Disease Study 2019

    Background: Understanding the magnitude of cancer burden attributable to potentially modifiable risk factors is crucial for the development of effective prevention and mitigation strategies. We analysed results from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2019 to inform cancer control planning efforts globally.
    Methods: The GBD 2019 comparative risk assessment framework was used to estimate cancer burden attributable to behavioural, environmental and occupational, and metabolic risk factors. A total of 82 risk-outcome pairs were included on the basis of the World Cancer Research Fund criteria. Estimated cancer deaths and disability-adjusted life-years (DALYs) in 2019 and the change in these measures between 2010 and 2019 are presented.
    Findings: Globally, in 2019, the risk factors included in this analysis accounted for 4.45 million (95% uncertainty interval 4.01-4.94) deaths and 105 million (95.0-116) DALYs for both sexes combined, representing 44.4% (41.3-48.4) of all cancer deaths and 42.0% (39.1-45.6) of all DALYs. There were 2.88 million (2.60-3.18) risk-attributable cancer deaths in males (50.6% [47.8-54.1] of all male cancer deaths) and 1.58 million (1.36-1.84) risk-attributable cancer deaths in females (36.3% [32.5-41.3] of all female cancer deaths). The leading risk factors at the most detailed level globally for risk-attributable cancer deaths and DALYs in 2019 for both sexes combined were smoking, followed by alcohol use and high BMI. Risk-attributable cancer burden varied by world region and Socio-demographic Index (SDI): smoking, unsafe sex, and alcohol use were the three leading risk factors for risk-attributable cancer DALYs in low SDI locations in 2019, whereas DALYs in high SDI locations mirrored the top three global risk factor rankings. From 2010 to 2019, global risk-attributable cancer deaths increased by 20.4% (12.6-28.4) and DALYs by 16.8% (8.8-25.0), with the greatest percentage increases in metabolic risks (34.7% [27.9-42.8] and 33.3% [25.8-42.0], respectively).
    Interpretation: The leading risk factors contributing to global cancer burden in 2019 were behavioural, whereas metabolic risk factors saw the largest increases between 2010 and 2019. Reducing exposure to these modifiable risk factors would decrease cancer mortality and DALY rates worldwide, and policies should be tailored to the local cancer risk factor burden.
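    The comparative risk assessment framework rests on the population attributable fraction (PAF). As a simplified illustration only (GBD's machinery models full exposure distributions, mediation, and uncertainty propagation), Levin's one-factor formula and the resulting attributable burden look like this:

```python
# Simplified one-factor illustration of attributable burden via Levin's
# population attributable fraction; not the GBD estimation pipeline.
def attributable_deaths(prevalence: float, relative_risk: float,
                        total_deaths: float) -> float:
    """Deaths attributable to a risk factor: PAF = p(RR-1) / (p(RR-1)+1)."""
    excess = prevalence * (relative_risk - 1.0)
    paf = excess / (excess + 1.0)
    return paf * total_deaths

# Example: 20% exposure prevalence and RR = 3 give
# PAF = 0.4 / 1.4 ≈ 0.286, i.e. ~28.6% of deaths attributable.
```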

    Test Time Transform Prediction for Open Set Histopathological Image Recognition

    Paper presented at the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022), held 18-22 September 2022 in Sentosa, Singapore.
    Tissue typology annotation in whole-slide histological images is a complex and tedious, yet necessary, task for the development of computational pathology models. We propose to address this problem by applying Open Set Recognition techniques to the task of jointly classifying tissue that belongs to a set of annotated classes, e.g. clinically relevant tissue categories, while rejecting at test time Open Set samples, i.e. images that belong to categories not present in the training set. To this end, we introduce a new approach for Open Set histopathological image recognition based on training a model to accurately identify image categories and simultaneously predict which data augmentation transform has been applied. At test time, we measure model confidence in predicting this transform, which we expect to be lower for images in the Open Set. We carry out comprehensive experiments in the context of colorectal cancer assessment from histological images, which provide evidence of the strengths of our approach in automatically identifying samples from unknown categories. Code is released at https://github.com/agaldran/t3po.
    Adrian Galdran, Katherine J. Hewitt, Narmin Ghaffari Laleh, Jakob N. Kather, Gustavo Carneiro, Miguel A. González Ballester. This work was partially supported by a Marie Sklodowska-Curie Global Fellowship (No. 892297) and by Australian Research Council grants (DP180103232 and FT190100525).
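    The core idea lends itself to a compact sketch: a two-head network predicts both the tissue class and which augmentation transform was applied, and at test time low confidence on the transform head flags likely Open Set samples. The architecture and scoring below are illustrative; the authors' implementation is in the linked repository.

```python
# Illustrative two-head network for test-time transform prediction.
# Backbone, head sizes and scoring are assumptions; see the t3po repo.
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int,
                 n_classes: int, n_transforms: int):
        super().__init__()
        self.backbone = backbone              # any feature extractor
        self.cls_head = nn.Linear(feat_dim, n_classes)    # tissue class
        self.tf_head = nn.Linear(feat_dim, n_transforms)  # which augmentation

    def forward(self, x):
        z = self.backbone(x)
        return self.cls_head(z), self.tf_head(z)

def open_set_score(model: TwoHeadNet, x_transformed: torch.Tensor,
                   applied_tf: int) -> float:
    """Confidence that the known transform was applied; low => Open Set."""
    with torch.no_grad():
        _, tf_logits = model(x_transformed)
        probs = torch.softmax(tf_logits, dim=1)
    return probs[0, applied_tf].item()
```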
