Toward better public health reporting using existing off the shelf approaches: The value of medical dictionaries in automated cancer detection using plaintext medical data

Author: Dixon Brian E.
Gichoya Judy
Grannis Shaun J.
Kasthurirathne Suranga N.
Mamlin Burke
Xia Yuni
Xu Huiping
Publication venue: 'Elsevier BV'
Publication date: 01/05/2017

Objectives Existing approaches to derive decision models from plaintext clinical data frequently depend on medical dictionaries as the sources of potential features. Prior research suggests that decision models developed using non-dictionary based feature sourcing approaches and “off the shelf” tools could predict cancer with performance metrics between 80% and 90%. We sought to compare non-dictionary based models to models built using features derived from medical dictionaries. Materials and methods We evaluated the detection of cancer cases from free text pathology reports using decision models built with combinations of dictionary or non-dictionary based feature sourcing approaches, 4 feature subset sizes, and 5 classification algorithms. Each decision model was evaluated using the following performance metrics: sensitivity, specificity, accuracy, positive predictive value, and area under the receiver operating characteristics (ROC) curve. Results Decision models parameterized using dictionary and non-dictionary feature sourcing approaches produced performance metrics between 70 and 90%. The source of features and feature subset size had no impact on the performance of a decision model. Conclusion Our study suggests there is little value in leveraging medical dictionaries for extracting features for decision model building. Decision models built using features extracted from the plaintext reports themselves achieve comparable results to those built using medical dictionaries. Overall, this suggests that existing “off the shelf” approaches can be leveraged to perform accurate cancer detection using less complex Named Entity Recognition (NER) based feature extraction, automated feature selection and modeling approaches
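The four threshold-based metrics named in the abstract above can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name `confusion_metrics` and the toy labels are assumptions made for illustration. The area under the ROC curve is left out because it needs continuous scores rather than hard labels.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy and PPV for binary labels.

    y_true and y_pred are equal-length sequences of 0 (no cancer) and
    1 (cancer). The name and interface are hypothetical, for illustration.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a metric is undefined.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),  # positive predictive value
    }


# Toy labels invented for this example; they are not data from the study.
example = confusion_metrics(
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0, 1],
)
print(example)
```

Returning NaN for an undefined ratio (for example, PPV when nothing is predicted positive) is a design choice of this sketch; raising an error would be an equally reasonable alternative.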

IUPUIScholarWorks

Longitudinal assessment of demographic representativeness in the Medical Imaging and Data Resource Center open data commons

Author: Baughan Natalie
Bower Brad
Chen Weijie
Drukker Karen
Gichoya Judy
Giger Maryellen L.
Gruszauskas Nicholas
Kalpathy-Cramer Jayashree
Koyejo Sanmi
Myers Kyle J.
Sahiner Berkman
Sá Rui C.
Whitney Heather M.
Zhang Zi
Publication date: 18/08/2023

Purpose: The Medical Imaging and Data Resource Center (MIDRC) open data commons was launched to accelerate the development of artificial intelligence (AI) algorithms to help address the COVID-19 pandemic. The purpose of this study was to quantify longitudinal representativeness of the demographic characteristics of the primary MIDRC dataset compared to the United States general population (US Census) and COVID-19 positive case counts from the Centers for Disease Control and Prevention (CDC). Approach: The Jensen-Shannon distance (JSD), a measure of similarity of two distributions, was used to longitudinally measure the representativeness of the distribution of (1) all unique patients in the MIDRC data to the 2020 US Census and (2) all unique COVID-19 positive patients in the MIDRC data to the case counts reported by the CDC. The distributions were evaluated in the demographic categories of age at index, sex, race, ethnicity, and the combination of race and ethnicity. Results: Representativeness of the MIDRC data by ethnicity and the combination of race and ethnicity was impacted by the percentage of CDC case counts for which this was not reported. The distributions by sex and race have retained their level of representativeness over time. Conclusion: The representativeness of the open medical imaging datasets in the curated public data commons at MIDRC has evolved over time as the number of contributing institutions and overall number of subjects have grown. The use of metrics such as the JSD to measure representativeness is one step needed for fair and generalizable AI algorithm development.
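For readers unfamiliar with the Jensen-Shannon distance mentioned in the abstract above, the following is a minimal sketch of how it can be computed for two discrete distributions. It is not the study's code: the function name `jensen_shannon_distance`, the choice of base-2 logarithms (which bounds the distance between 0 and 1), and the example proportions are all assumptions made for illustration.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance between two discrete distributions.

    p and q are equal-length sequences of non-negative weights over the
    same categories (for example, counts per demographic group). Base-2
    logarithms are used here, an assumption of this sketch, so the result
    lies between 0 (identical) and 1 (no overlap).
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same categories")

    # Normalise both inputs so that each sums to 1.
    p_total, q_total = sum(p), sum(q)
    p = [x / p_total for x in p]
    q = [x / q_total for x in q]

    # Mixture distribution M = (P + Q) / 2.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl_divergence(a, b):
        # Terms with a == 0 contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # The distance is the square root of the divergence.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical proportions over three categories, invented for this example.
dataset_share = [0.50, 0.30, 0.20]
reference_share = [0.60, 0.25, 0.15]
print(jensen_shannon_distance(dataset_share, reference_share))
```

A value near 0 would indicate that the dataset's demographic distribution closely matches the reference distribution, and larger values indicate a greater mismatch.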

Knowledge UChicago

Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation

Author: A Beimel
A Geissbuhler
AF Karr
AL Potosky
Antonis Michalas
AS Lunde
B Pinkas
BA Malin
BA Stewart
BH Bloom
C Clifton
C Friedman
C Quantin
D Vatsalan
EA Durham
G Cormode
G Hripcsak
GM Weber
IS Kohane
J Gichoya
J Vaidya
JF Ludvigsson
JH Holmes
JL Warren
Johan Gustav Bellika
JT Finnell
K Emam El
Kassaye Yitbarek Yigzaw
L Fan
L Lenert
LH Curtis
M Kantarcioglu
MA Hailemichael
MA Hernández
MK Ross
O Goldreich
P Christen
P Paillier
P Saint-Andre
R Cramer
R Lazarus
R Schnell
RL Richesson
S Tarkoma
SC Pohlig
SM Randall
T Dimitriou
W Du
WB Lober
Y Lindell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Background Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step. Methods We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network. Results The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem. Conclusions The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians

Crossref

PubMed Central

WestminsterResearch

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

AI recognition of patient race in medical imaging: a modelling study

Author: Banerjee Imon
Bhimireddy Ananth Reddy
Burns John L.
Celi Leo Anthony
Chen Li-Ching
Correa Ramon
Dullerud Natalie
Ghassemi Marzyeh
Gichoya Judy Wawira
Huang Shih-Cheng
Kuo Po-Chih
Lungren Matthew P.
Oakden-Rayner Lauren
Okechukwu Chima
Palmer Lyle J.
Price Brandon J.
Purkayastha Saptarshi
Pyrros Ayis T.
Seyyed-Kalantari Laleh
Trivedi Hari
Wang Ryan
Zaiman Zachary
Zhang Haoran
Publication venue: 'Elsevier BV'
Publication date: 01/01/2021

Background Previous studies in medical imaging have shown disparate abilities of artificial intelligence (AI) to detect a person's race, yet there is no known correlation for race on medical imaging that would be obvious to human experts when interpreting the images. We aimed to conduct a comprehensive evaluation of the ability of AI to recognise a patient's racial identity from medical images. Methods Using private (Emory CXR, Emory Chest CT, Emory Cervical Spine, and Emory Mammogram) and public (MIMIC-CXR, CheXpert, National Lung Cancer Screening Trial, RSNA Pulmonary Embolism CT, and Digital Hand Atlas) datasets, we evaluated, first, performance quantification of deep learning models in detecting race from medical images, including the ability of these models to generalise to external environments and across multiple imaging modalities. Second, we assessed possible confounding of anatomic and phenotypic population features by assessing the ability of these hypothesised confounders to detect race in isolation using regression models, and by re-evaluating the deep learning models by testing them on datasets stratified by these hypothesised confounding variables. Last, by exploring the effect of image corruptions on model performance, we investigated the underlying mechanism by which AI models can recognise race. Findings In our study, we show that standard AI deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities, which was sustained under external validation conditions (x-ray imaging [area under the receiver operating characteristics curve (AUC) range 0·91-0·99], CT chest imaging [0·87-0·96], and mammography [0·81]). We also showed that this detection is not due to proxies or imaging-related surrogate covariates for race (eg, performance of possible confounders: body-mass index [AUC 0·55], disease distribution [0·61], and breast density [0·61]). 
Finally, we provide evidence to show that the ability of AI deep learning models persisted over all anatomical regions and frequency spectrums of the images, suggesting the efforts to control this behaviour when it is undesirable will be challenging and demand further study. Interpretation The results from our study emphasise that the ability of AI deep learning models to predict self-reported race is itself not the issue of importance. However, our finding that AI can accurately predict self-reported race, even from corrupted, cropped, and noised medical images, often when clinical experts cannot, creates an enormous risk for all model deployments in medical imaging. Funding National Institute of Biomedical Imaging and Bioengineering, MIDRC grant of National Institutes of Health, US National Science Foundation, National Library of Medicine of the National Institutes of Health, and Taiwan Ministry of Science and Technology

arXiv.org e-Print Archive

IUPUIScholarWorks

PubMed Central

Reading Race: AI Recognises Patient's Racial Identity In Medical Images

Author: Banerjee Imon
Bhimireddy Ananth Reddy
Burns John L.
Celi Leo Anthony
Chen Li-Ching
Correa Ramon
Dullerud Natalie
Ghassemi Marzyeh
Gichoya Judy W.
Huang Shih-Cheng
Kuo Po-Chih
Lungren Matthew P.
Oakden-Rayner Luke
Okechukwu Chima
Palmer Lyle
Price Brandon J.
Purkayastha Saptarshi
Pyrros Ayis
Seyyed-Kalantari Laleh
Trivedi Hari
Wang Ryan
Zaiman Zachary
Zhang Haoran
Publication venue: arXiv
Publication date: 01/01/2021

Background: In medical imaging, prior studies have demonstrated disparate AI performance by race, yet there is no known correlation for race on medical imaging that would be obvious to the human expert interpreting the images. Methods: Using private and public datasets we evaluate: A) performance quantification of deep learning models to detect race from medical images, including the ability of these models to generalize to external environments and across multiple imaging modalities, B) assessment of possible confounding anatomic and phenotype population features, such as disease distribution and body habitus as predictors of race, and C) investigation into the underlying mechanism by which AI models can recognize race. Findings: Standard deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities. Our findings hold under external validation conditions, as well as when models are optimized to perform clinically motivated tasks. We demonstrate this detection is not due to trivial proxies or imaging-related surrogate covariates for race, such as underlying disease distribution. Finally, we show that performance persists over all anatomical regions and frequency spectrum of the images, suggesting that mitigation efforts will be challenging and demand further study. Interpretation: We emphasize that model ability to predict self-reported race is itself not the issue of importance. However, our finding that AI can trivially predict self-reported race -- even from corrupted, cropped, and noised medical images -- in a setting where clinical experts cannot, creates an enormous risk for all model deployments in medical imaging: if an AI model secretly used its knowledge of self-reported race to misclassify all Black patients, radiologists would not be able to tell using the same data the model has access to

IUPUIScholarWorks

Opportunistic Detection of Type 2 Diabetes Using Deep Learning From Frontal Chest Radiographs

Author: Borstelmann Stephen M.
Cohen Joseph Paul
Flanders Adam E.
Galanter William
Garrett John W.
Gichoya Judy Wawira
Greenstein Eugene
Gupta Amit
Hines-Shah John
Horowitz Jeanne M.
Khandwala Nishith
Koyejo Sanmi
Layden Brian T.
Lungren Matthew P.
Mantravadi Ramana
Nikolaidis Paul
Pickhardt Perry J.
Price Brandon
Pyrros Ayis
Rodríguez-Fernández Jorge Mario
Shulhan Ihar
Siddiqui Nasir
Thomas Kaesha
Willis Melinda
Zaiman Zachary
Publication venue: Jefferson Digital Commons
Publication date: 07/07/2023

Deep learning (DL) models can harness electronic health records (EHRs) to predict diseases and extract radiologic findings for diagnosis. With ambulatory chest radiographs (CXRs) frequently ordered, we investigated detecting type 2 diabetes (T2D) by combining radiographic and EHR data using a DL model. Our model, developed from 271,065 CXRs and 160,244 patients, was tested on a prospective dataset of 9,943 CXRs. Here we show the model effectively detected T2D with a ROC AUC of 0.84 and a 16% prevalence. The algorithm flagged 1,381 cases (14%) as suspicious for T2D. External validation at a distinct institution yielded a ROC AUC of 0.77, with 5% of patients subsequently diagnosed with T2D. Explainable AI techniques revealed correlations between specific adiposity measures and high predictivity, suggesting CXRs' potential for enhanced T2D screening

Jefferson Digital Commons

FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare

Author: Abadía Mónica Cano
Abolmaesumi Purang
Aerts Hugo JWL
Albarqouni Shadi
Alberich Leonor Cerdá
Ammar Mohammed
Amugongo Lameck M
Aquino Yves Saint James
Ashrafuzzaman Md
Asselbergs Folkert W
Aussó Susanna
Awate Suyash
Beger Brigit
Bielikova Maria
Bobowicz Maciej
Botwe Benard O
Brown Pieta
Bruijne Marleen De
Buvat Irène
Buyx Alena
Cardoso M Jorge
Carter Stacy
Chan An-Wen
Chouvarda Ioanna
Cintas Celia
Colantonio Sara
Collins Gary
Cook Tessa
Donoso-Bach Lluís
Dou Qi
Duncan James
Dwivedi Girish
Díaz Oliver
Elattar Mustafa
Emelie Anais
Feragen Aasa
Ferrante Enzo
Fofanah Abdul Joseph
Fotiadis Dimitrios I
Frangi Alejandro F
Fritzsche Marie-Christine
Fromont Lauren A
Ghassemi Marzyeh
Gichoya Judy W
Glocker Ben
Goisauf Melanie
González Fabio A
Gordebeke Peter
Guevara Pamela
Jayakody Harsha
Joshi Smriti
Kaissis Georgios
Kalpathy-Cramer Jayashree
Khanal Bishesh
Klein Stefan
Kondylakis Haridimos
Krestin Gabriel P
Kushibar Kaisar
Lambin Philippe
Langlotz Curtis P
Lara Andrea
Lazrak Noussair
Lekadir Karim
Linguraru Marius George
Lu Qinghua
Mahmoud Mukhtar M E
Maier-Hein Lena
Marias Kostas
Marrakchi-Kacem Linda
Martí-Bonmatí Luis
Meijering Erik
Misuraca Gianluca
Mohammed Yunusa G
Mongan John
Mori Kensaku
Mutsvangwa Tinashe E M
Mzurikwao Deogratias
Nakasi Rose
Napel Sandy
Navarro Arcadi
Niessen Wiro J
Osuala Richard
Papanikolaou Nikolaos
Park Jinah
Petersen Steffen E
Phiri Lighton
Porras Antonio R
Prior Fred
Puig-Bosch Xènia
Pujol Oriol
Raviv Tammy Riklin
Rekik Islem
Rieke Nicola
Riklund Katrine
Rittner Leticia
Rogers Wendy A
Rueckert Daniel
Salahuddin Zohaib
Sall Ousmane
Salvado Olivier
Schnabel Julia A
Shabani Mahsa
Starmans Martijn P A
Tegenaw Geletaw S
Tolsgaard Martin G
Tsakou Gianna
Tsiknakis Manolis
Walsh Ian
Weicken Eva
Wenzel Markus
Woodruff Henry C
Wu Carol C
Yaqub Mohammad
Zahir Jihad
Zeng Yi
Zhou S Kevin
Zhussupov Doszhan
Zuluaga Maria A
Publication date: 11/08/2023

Despite major advances in artificial intelligence (AI) for medicine and healthcare, the deployment and adoption of AI technologies remain limited in real-world clinical practice. In recent years, concerns have been raised about the technical, clinical, ethical and legal risks associated with medical AI. To increase real world adoption, it is essential that medical AI tools are trusted and accepted by patients, clinicians, health organisations and authorities. This work describes the FUTURE-AI guideline as the first international consensus framework for guiding the development and deployment of trustworthy AI tools in healthcare. The FUTURE-AI consortium was founded in 2021 and currently comprises 118 inter-disciplinary experts from 51 countries representing all continents, including AI scientists, clinicians, ethicists, and social scientists. Over a two-year period, the consortium defined guiding principles and best practices for trustworthy AI through an iterative process comprising an in-depth literature review, a modified Delphi survey, and online consensus meetings. The FUTURE-AI framework was established based on 6 guiding principles for trustworthy AI in healthcare, i.e. Fairness, Universality, Traceability, Usability, Robustness and Explainability. Through consensus, a set of 28 best practices were defined, addressing technical, clinical, legal and socio-ethical dimensions. The recommendations cover the entire lifecycle of medical AI, from design, development and validation to regulation, deployment, and monitoring. FUTURE-AI is a risk-informed, assumption-free guideline which provides a structured approach for constructing medical AI tools that will be trusted, deployed and adopted in real-world practice. Researchers are encouraged to take the recommendations into account in proof-of-concept stages to facilitate future translation towards clinical practice of medical AI

The University of Manchester - Institutional Repository