    Development and Evaluation of Machine Learning in Whole-Body Magnetic Resonance Imaging for Detecting Metastases in Patients With Lung or Colon Cancer: A Diagnostic Test Accuracy Study

    OBJECTIVES: Whole-body magnetic resonance imaging (WB-MRI) has been demonstrated to be efficient and cost-effective for cancer staging. The study aim was to develop a machine learning (ML) algorithm to improve radiologists' sensitivity and specificity for metastasis detection and reduce reading times. MATERIALS AND METHODS: A retrospective analysis of 438 prospectively collected WB-MRI scans from the multicenter Streamline studies (February 2013-September 2016) was undertaken. Disease sites were manually labeled using the Streamline reference standard. Whole-body MRI scans were randomly allocated to training and testing sets. A model for malignant lesion detection was developed based on convolutional neural networks and a 2-stage training strategy. The final algorithm generated lesion probability heat maps. Using a concurrent reader paradigm, 25 radiologists (18 experienced, 7 inexperienced in WB-MRI) were randomly allocated WB-MRI scans with or without ML support to detect malignant lesions over 2 or 3 reading rounds. Reads were undertaken in the setting of a diagnostic radiology reading room between November 2019 and March 2020. Reading times were recorded by a scribe. Prespecified analysis included sensitivity, specificity, interobserver agreement, and reading time of radiology readers to detect metastases with or without ML support. Reader performance for detection of the primary tumor was also evaluated. RESULTS: Four hundred thirty-three evaluable WB-MRI scans were allocated to algorithm training (n = 245) or radiology testing (n = 188; 50 patients with metastases, from primary colon [n = 117] or lung [n = 71] cancer). Among a total of 562 reads by experienced radiologists over 2 reading rounds, per-patient specificity was 86.2% (ML) and 87.7% (non-ML) (-1.5% difference; 95% confidence interval [CI], -6.4%, 3.5%; P = 0.39). Sensitivity was 66.0% (ML) and 70.0% (non-ML) (-4.0% difference; 95% CI, -13.5%, 5.5%; P = 0.344). Among 161 reads by inexperienced readers, per-patient specificity in both groups was 76.3% (0% difference; 95% CI, -15.0%, 15.0%; P = 0.613), with sensitivity of 73.3% (ML) and 60.0% (non-ML) (13.3% difference; 95% CI, -7.9%, 34.5%; P = 0.313). Per-site specificity was high (>90%) for all metastatic sites and experience levels. There was high sensitivity for the detection of primary tumors (lung cancer detection rate of 98.6% with and without ML [0.0% difference; 95% CI, -2.0%, 2.0%; P = 1.00], colon cancer detection rate of 89.0% with and 90.6% without ML [-1.7% difference; 95% CI, -5.6%, 2.2%; P = 0.65]). When combining all reads from rounds 1 and 2, reading times fell by 6.2% (95% CI, -22.8%, 10.0%) when using ML. Round 2 read-times fell by 32% (95% CI, 20.8%, 42.8%) compared with round 1. Within round 2, there was a significant decrease in read-time when using ML support, estimated as 286 seconds (or 11%) quicker (P = 0.0281), using regression analysis to account for reader experience, read round, and tumor type. Interobserver variance suggests moderate agreement, with Cohen κ = 0.64 (95% CI, 0.47, 0.81) with ML and Cohen κ = 0.66 (95% CI, 0.47, 0.81) without ML. CONCLUSIONS: There was no evidence of a significant difference in per-patient sensitivity and specificity for detecting metastases or the primary tumor using concurrent ML compared with standard WB-MRI. Radiology read-times with or without ML support fell for round 2 reads compared with round 1, suggesting that readers familiarized themselves with the study reading method. During the second reading round, there was a significant reduction in reading time when using ML support.
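
    The reader-study endpoints above (per-patient sensitivity, specificity, and Cohen κ for interobserver agreement) reduce to simple counting. Below is a minimal sketch of those calculations with invented reader decisions, not the study's data or analysis code:

```python
# Minimal sketch (hypothetical reader calls, not study data): per-patient
# sensitivity/specificity against a reference standard, and Cohen's kappa
# between two readers' binary metastasis calls.
import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Per-patient sensitivity and specificity for binary calls."""
    y_true, y_pred = np.asarray(y_true, bool), np.asarray(y_pred, bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    return tp / (tp + fn), tn / (tn + fp)

def cohen_kappa(r1, r2):
    """Cohen's kappa for two readers rating the same patients."""
    r1, r2 = np.asarray(r1, bool), np.asarray(r2, bool)
    po = np.mean(r1 == r2)                      # observed agreement
    pe = np.mean(r1) * np.mean(r2) + np.mean(~r1) * np.mean(~r2)  # chance agreement
    return (po - pe) / (1 - pe)

# Invented example: reference standard and two readers' calls
truth   = [1, 1, 0, 0, 1, 0, 0, 1]
reader1 = [1, 0, 0, 0, 1, 0, 1, 1]
reader2 = [1, 1, 0, 0, 0, 0, 1, 1]
print(sensitivity_specificity(truth, reader1))  # (0.75, 0.75)
print(cohen_kappa(reader1, reader2))            # 0.5
```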

    Genome-wide association study identifies variants in the MHC class I, IL10, and IL23R-IL12RB2 regions associated with Behçet's disease

    Behçet's disease is a genetically complex disease of unknown etiology characterized by recurrent inflammatory attacks affecting the orogenital mucosa, eyes and skin. We performed a genome-wide association study with 311,459 SNPs in 1,215 individuals with Behçet's disease (cases) and 1,278 healthy controls from Turkey. We confirmed the known association of Behçet's disease with HLA-B*51 and identified a second, independent association within the MHC class I region. We also identified an association at IL10 (rs1518111, P = 1.88 × 10^-8). Using a meta-analysis with an additional five cohorts from Turkey, the Middle East, Europe and Asia, comprising a total of 2,430 cases and 2,660 controls, we identified associations at IL10 (rs1518111, P = 3.54 × 10^-18, odds ratio = 1.45, 95% CI 1.34-1.58) and the IL23R-IL12RB2 locus (rs924080, P = 6.69 × 10^-9, OR = 1.28, 95% CI 1.18-1.39). The disease-associated IL10 variant (the rs1518111 A allele) was associated with diminished mRNA expression and low protein production.
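
    The association statistics reported above are allelic odds ratios with confidence intervals. As a minimal sketch of that calculation (the allele counts below are invented for illustration and are not taken from the study), one can work directly from a 2x2 table of allele counts using Woolf's method for the standard error:

```python
# Illustrative sketch only (made-up counts, not study data): allelic odds
# ratio with a 95% Wald confidence interval for one SNP.
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """2x2 table: a,b = risk/other allele counts in cases; c,d = in controls."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of ln(OR), Woolf's method
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical allele counts for a variant such as rs1518111
print(odds_ratio_ci(a=1200, b=1230, c=980, d=1340))
```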

    Common Limitations of Image Processing Metrics: A Picture Story

    While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation. Performance metrics are particularly key for meaningful, objective, and transparent performance assessment and validation of the used automatic algorithms, but relatively little attention has been given to the practical pitfalls when using specific metrics for a given image analysis task. These are typically related to (1) the disregard of inherent metric properties, such as the behaviour in the presence of class imbalance or small target structures, (2) the disregard of inherent data set properties, such as the non-independence of the test cases, and (3) the disregard of the actual biomedical domain interest that the metrics should reflect. This living, dynamically updated document aims to illustrate important limitations of performance metrics commonly applied in the field of image analysis. In this context, it focuses on biomedical image analysis problems that can be phrased as image-level classification, semantic segmentation, instance segmentation, or object detection tasks. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts from more than 60 institutions worldwide. Comment: This is a dynamic paper on limitations of commonly used metrics. The current version discusses metrics for image-level classification, semantic segmentation, object detection and instance segmentation. For missing use cases, comments or questions, please contact [email protected] or [email protected]. Substantial contributions to this document will be acknowledged with a co-authorship.
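
    One of the pitfalls named above, the behaviour of metrics under class imbalance, is easy to reproduce. The following toy sketch (synthetic labels, not an example taken from the paper) shows how plain accuracy rewards a classifier that never predicts the rare class, while balanced accuracy exposes the failure:

```python
# Toy demonstration of the class-imbalance pitfall (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)    # 1% positive class
y_pred = np.zeros_like(y_true)                      # "always negative" classifier

accuracy = np.mean(y_true == y_pred)
sens = y_pred[y_true == 1].mean()                   # recall on positives -> 0.0
spec = (1 - y_pred[y_true == 0]).mean()             # recall on negatives -> 1.0
balanced_accuracy = (sens + spec) / 2

print(f"accuracy          = {accuracy:.3f}")           # ~0.99, looks excellent
print(f"balanced accuracy = {balanced_accuracy:.3f}")  # 0.5, reveals the failure
```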

    Understanding metric-related pitfalls in image analysis validation

    Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice. However, increasing evidence shows that particularly in image analysis, metrics are often chosen inadequately in relation to the underlying research problem. This could be attributed to a lack of accessibility of metric-related knowledge: While taking into account the individual strengths, weaknesses, and limitations of validation metrics is a critical prerequisite to making educated choices, the relevant knowledge is currently scattered and poorly accessible to individual researchers. Based on a multi-stage Delphi process conducted by a multidisciplinary expert consortium as well as extensive community feedback, the present work provides the first reliable and comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy. To facilitate comprehension, illustrations and specific examples accompany each pitfall. As a structured body of information accessible to researchers of all levels of expertise, this work enhances global comprehension of a key topic in image analysis validation. Comment: Shared first authors: Annika Reinke, Minu D. Tizabi; shared senior authors: Paul F. Jäger, Lena Maier-Hein.
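
    A concrete instance of a segmentation pitfall in this spirit (toy masks, not an example drawn from the paper): the Dice score of a small structure is far more sensitive to a one-pixel misalignment than that of a large structure, so identical boundary errors are penalized very differently.

```python
# Toy illustration of the small-structure pitfall: the same one-pixel shift
# costs a tiny structure far more Dice than a large one.
import numpy as np

def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return 2 * np.sum(a & b) / (np.sum(a) + np.sum(b))

def square_mask(size, side):
    m = np.zeros((size, size), dtype=bool)
    m[:side, :side] = True
    return m

for side in (3, 30):                        # small vs large target structure
    gt = square_mask(64, side)
    pred = np.roll(gt, shift=1, axis=1)     # same shape, shifted one pixel right
    print(f"side={side:2d}  Dice={dice(gt, pred):.3f}")   # ~0.67 vs ~0.97
```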

    FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare

    Despite major advances in artificial intelligence (AI) for medicine and healthcare, the deployment and adoption of AI technologies remain limited in real-world clinical practice. In recent years, concerns have been raised about the technical, clinical, ethical and legal risks associated with medical AI. To increase real-world adoption, it is essential that medical AI tools are trusted and accepted by patients, clinicians, health organisations and authorities. This work describes the FUTURE-AI guideline as the first international consensus framework for guiding the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and currently comprises 118 interdisciplinary experts from 51 countries representing all continents, including AI scientists, clinicians, ethicists, and social scientists. Over a two-year period, the consortium defined guiding principles and best practices for trustworthy AI through an iterative process comprising an in-depth literature review, a modified Delphi survey, and online consensus meetings. The FUTURE-AI framework was established based on 6 guiding principles for trustworthy AI in healthcare, i.e., Fairness, Universality, Traceability, Usability, Robustness and Explainability. Through consensus, a set of 28 best practices were defined, addressing technical, clinical, legal and socio-ethical dimensions. The recommendations cover the entire lifecycle of medical AI, from design, development and validation to regulation, deployment, and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline that provides a structured approach for constructing medical AI tools that will be trusted, deployed and adopted in real-world practice. Researchers are encouraged to take the recommendations into account in proof-of-concept stages to facilitate future translation towards clinical practice of medical AI.
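
    As a purely illustrative sketch of how one of these principles might be exercised in practice (this is not part of the FUTURE-AI guideline itself, and the labels below are invented), the Fairness principle can be probed by stratifying a model's performance across patient subgroups and flagging large gaps:

```python
# Hypothetical sketch: per-subgroup accuracy as a simple fairness probe.
import numpy as np

def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy per subgroup; 'groups' holds e.g. sex or site labels."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
            for g in np.unique(groups)}

# Made-up labels purely for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
per_group = subgroup_accuracy(y_true, y_pred, groups)
print(per_group, "max gap:", max(per_group.values()) - min(per_group.values()))
```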

    Machine learning algorithms performed no better than regression models for prognostication in traumatic brain injury

    Objective: We aimed to explore the added value of common machine learning (ML) algorithms for prediction of outcome for moderate and severe traumatic brain injury. Study Design and Setting: We performed logistic regression (LR), lasso regression, and ridge regression with key baseline predictors in the IMPACT-II database (15 studies, n = 11,022). ML algorithms included support vector machines, random forests, gradient boosting machines, and artificial neural networks and were trained using the same predictors. To assess generalizability of predictions, we performed internal, internal-external, and external validation on the recent CENTER-TBI study (patients with Glasgow Coma Scale <13, n = 1,554). Both calibration (calibration slope/intercept) and discrimination (area under the curve) were quantified. Results: In the IMPACT-II database, 3,332/11,022 (30%) died and 5,233 (48%) had unfavorable outcome (Glasgow Outcome Scale less than 4). In the CENTER-TBI study, 348/1,554 (29%) died and 651 (54%) had unfavorable outcome. Discrimination and calibration varied widely between the studies and less so between the studied algorithms. The mean area under the curve was 0.82 for mortality and 0.77 for unfavorable outcomes in the CENTER-TBI study. Conclusion: ML algorithms may not outperform traditional regression approaches in a low-dimensional setting for outcome prediction after moderate or severe traumatic brain injury. Similar to regression-based prediction models, ML algorithms should be rigorously validated to ensure applicability to new populations.
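
    The comparison described above can be sketched in a few lines. The snippet below uses synthetic data (not the IMPACT or CENTER-TBI data) and standard scikit-learn models to contrast a low-dimensional logistic regression with a random forest on discrimination (AUC) and calibration slope; the helper function and all parameter choices are illustrative assumptions, not the study's actual analysis code.

```python
# Minimal sketch: logistic regression vs random forest on synthetic
# low-dimensional data, scored by AUC and Cox calibration slope.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def calibration_slope(y_true, p):
    """Cox calibration slope: regress the outcome on the log-odds of predictions."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    logit = np.log(p / (1 - p)).reshape(-1, 1)
    return LogisticRegression(C=1e6).fit(logit, y_true).coef_[0, 0]

# Synthetic stand-in for a handful of baseline predictors (age, GCS, pupils, ...)
X, y = make_classification(n_samples=4000, n_features=8, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    p = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(f"{name:20s} AUC={roc_auc_score(y_te, p):.3f} "
          f"slope={calibration_slope(y_te, p):.2f}")
```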

    Variation in Structure and Process of Care in Traumatic Brain Injury: Provider Profiles of European Neurotrauma Centers Participating in the CENTER-TBI Study.

    INTRODUCTION: The strength of evidence underpinning care and treatment recommendations in traumatic brain injury (TBI) is low. Comparative effectiveness research (CER) has been proposed as a framework to provide evidence for optimal care for TBI patients. The first step in CER is to map the existing variation. The aim of the current study was to quantify variation in general structural and process characteristics among centers participating in the Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) study. METHODS: We designed a set of 11 provider profiling questionnaires with 321 questions about various aspects of TBI care, chosen based on literature and expert opinion. After pilot testing, questionnaires were disseminated to 71 centers from 20 countries participating in the CENTER-TBI study. Reliability of the questionnaires was estimated by calculating a concordance rate for the 5% of questions that were duplicated. RESULTS: All 71 centers completed the questionnaires. The median concordance rate among duplicate questions was 0.85. The majority of centers were academic hospitals (n = 65, 92%), designated as level I trauma centers (n = 48, 68%), and situated in an urban location (n = 70, 99%). The availability of facilities for neurotrauma care varied across centers; e.g., 40 (57%) had a dedicated neuro-intensive care unit (ICU), 36 (51%) had an in-hospital rehabilitation unit, and 45 (64%) had a closed ICU organizational model. In addition, we found wide variation in processes of care, such as the ICU admission policy and intracranial pressure monitoring policy, among centers. CONCLUSION: Even among high-volume, specialized neurotrauma centers there is substantial variation in structures and processes of TBI care. This variation provides an opportunity to study the effectiveness of specific aspects of TBI care and to identify best practices with CER approaches.
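
    The reliability check mentioned in the METHODS is simply the proportion of duplicated questions that a center answers identically both times. A minimal sketch with made-up responses (the questions and answers below are hypothetical):

```python
# Sketch of the concordance-rate reliability check (hypothetical answers).
def concordance_rate(first_answers, repeat_answers):
    """Fraction of duplicated questions answered identically the second time."""
    pairs = list(zip(first_answers, repeat_answers))
    return sum(a == b for a, b in pairs) / len(pairs)

# Made-up responses from one center to its duplicated questions
first  = ["yes", "closed ICU", "24/7", "no",  "level I"]
repeat = ["yes", "closed ICU", "24/7", "yes", "level I"]
print(concordance_rate(first, repeat))   # 0.8
```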

    The Extended Clinical Phenotype of 26 Patients with Chronic Mucocutaneous Candidiasis due to Gain-of-Function Mutations in STAT1

    PURPOSE: Gain-of-function (GOF) mutations in the signal transducer and activator of transcription 1 (STAT1) result in unbalanced STAT signaling and cause immune dysregulation and immunodeficiency. The latter is often characterized by susceptibility to recurrent Candida infections, resulting in the clinical picture of chronic mucocutaneous candidiasis (CMC). This study aims to assess the frequency of GOF STAT1 mutations in a large international cohort of CMC patients. METHODS: STAT1 was sequenced in genomic DNA from 57 CMC patients and 35 healthy family members. The functional relevance of nine different STAT1 variants was shown by flow cytometric analysis of STAT1 phosphorylation in patients' peripheral blood mononuclear cells (PBMC) after stimulation with interferon (IFN)-α, IFN-γ or interleukin-27, respectively. Extended clinical data sets were collected and summarized for 26 patients. RESULTS: Heterozygous mutations within STAT1 were identified in 35 of 57 CMC patients (61%). Of the 39 familial cases from 11 families, 26 patients (67%) from 9 families carried heterozygous STAT1 mutations, as did 9 of the 18 sporadic cases (50%). Thirteen distinct STAT1 mutations are reported in this paper. Eight of these mutations are known to cause CMC (p.M202V, p.A267V, p.R274W, p.R274Q, p.T385M, p.K388E, p.N397D, and p.F404Y). However, five STAT1 variants (p.F172L, p.Y287D, p.P293S, p.T385K and p.S466R) have not been reported before in CMC patients. CONCLUSION: STAT1 mutations are frequently observed in patients suffering from CMC. Thus, sequence analysis of STAT1 in CMC patients is advised. Measurement of IFN- or IL-induced STAT1 phosphorylation in PBMC provides a fast and reliable diagnostic tool and should be carried out in addition to genetic testing.

    Federated learning enables big data for rare cancer boundary detection.

    Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability is concerning. This is currently addressed by sharing multi-site data, but such centralization is challenging or infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML by sharing only numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further analyses for glioblastoma by releasing our consensus model, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
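
    The core mechanism the abstract relies on, aggregating per-site model updates instead of pooling data, can be sketched in a few lines. The toy example below is a FedAvg-style loop on a synthetic linear model; it is an assumption-laden illustration of the general idea, not the study's actual segmentation or aggregation code.

```python
# Toy FedAvg-style sketch: sites train locally, only weights are aggregated.
import numpy as np

def local_update(global_weights, site_data, lr=0.1):
    """One local gradient step on a site's own data (toy linear regression)."""
    X, y = site_data
    grad = X.T @ (X @ global_weights - y) / len(y)   # gradient of squared error
    return global_weights - lr * grad

def federated_round(global_weights, sites):
    """Average per-site updates weighted by sample counts; data never leaves a site."""
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    updates = np.stack([local_update(global_weights, s) for s in sites])
    return (updates * (sizes / sizes.sum())[:, None]).sum(axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for n in (50, 200, 80):                              # three sites, different sizes
    X = rng.normal(size=(n, 2))
    sites.append((X, X @ true_w + 0.1 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(200):                                 # communication rounds
    w = federated_round(w, sites)
print(w)                                             # approaches [2.0, -1.0]
```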