Search CORE

15 research outputs found

Non-Imaging Medical Data Synthesis for Trustworthy AI: A Comprehensive Survey

Author: Del Ser Javier
Stenson Iain
Walsh Simon
Wang Lichao
Wu Huanjun
Xing Xiaodan
Yang Guang
Yong May
Publication venue
Publication date: 17/09/2022
Field of study

Data quality is the key factor for the development of trustworthy AI in healthcare. A large volume of curated datasets with controlled confounding factors can help improve the accuracy, robustness and privacy of downstream AI algorithms. However, access to good quality datasets is limited by the technical difficulty of data acquisition and large-scale sharing of healthcare data is hindered by strict ethical restrictions. Data synthesis algorithms, which generate data with a similar distribution as real clinical data, can serve as a potential solution to address the scarcity of good quality data during the development of trustworthy AI. However, state-of-the-art data synthesis algorithms, especially deep learning algorithms, focus more on imaging data while neglecting the synthesis of non-imaging healthcare data, including clinical measurements, medical signals and waveforms, and electronic healthcare records (EHRs). Thus, in this paper, we will review the synthesis algorithms, particularly for non-imaging medical data, with the aim of providing trustworthy AI in this domain. This tutorial-styled review paper will provide comprehensive descriptions of non-imaging medical data synthesis on aspects including algorithms, evaluations, limitations and future research directions.Comment: 35 pages, Submitted to ACM Computing Survey

arXiv.org e-Print Archive

A method for machine learning generation of realistic synthetic datasets for validating healthcare applications

Author: Arvanitis Theodoros N.
Chaplin Rupert
Despotou George
Harrison Stuart
White Sean
Publication venue: 'SAGE Publications'
Publication date: 01/01/2022
Field of study

Digital health applications can improve quality and effectiveness of healthcare, by offering a number of new tools to users, which are often considered a medical device. Assuring their safe operation requires, amongst others, clinical validation, needing large datasets to test them in realistic clinical scenarios. Access to datasets is challenging, due to patient privacy concerns. Development of synthetic datasets is seen as a potential alternative. The objective of the paper is the development of a method for the generation of realistic synthetic datasets, statistically equivalent to real clinical datasets, and demonstrate that the Generative Adversarial Network (GAN) based approach is fit for purpose. A generative adversarial network was implemented and trained, in a series of six experiments, using numerical and categorical variables, including ICD-9 and laboratory codes, from three clinically relevant datasets. A number of contextual steps provided the success criteria for the synthetic dataset. A synthetic dataset that exhibits very similar statistical characteristics with the real dataset was generated. Pairwise association of variables is very similar. A high degree of Jaccard similarity and a successful K-S test further support this. The proof of concept of generating realistic synthetic datasets was successful, with the approach showing promise for further work

Warwick Research Archives Portal Repository

Artificial Intelligence for In Silico Clinical Trials: A Review

Author: Gao Chufan
Glass Lucas M.
Sun Jimeng
Wang Zifeng
Publication venue
Publication date: 16/09/2022
Field of study

A clinical trial is an essential step in drug development, which is often costly and time-consuming. In silico trials are clinical trials conducted digitally through simulation and modeling as an alternative to traditional clinical trials. AI-enabled in silico trials can increase the case group size by creating virtual cohorts as controls. In addition, it also enables automation and optimization of trial design and predicts the trial success rate. This article systematically reviews papers under three main topics: clinical simulation, individualized predictive modeling, and computer-aided trial design. We focus on how machine learning (ML) may be applied in these applications. In particular, we present the machine learning problem formulation and available data sources for each task. We end with discussing the challenges and opportunities of AI for in silico trials in real-world applications

arXiv.org e-Print Archive

Synthetic data generation for tabular health records: A systematic review

Author: Alberdi Ane
Cilla Rodrigo
Epelde Gorka
Hernandez Mikel
Rankin Debbie
Publication venue: 'Elsevier BV'
Publication date: 07/07/2022
Field of study

Ulster University's Research Portal

PPGAN: Privacy-preserving Generative Adversarial Network

Author: Liu Yi
Peng Jialiang
Wu Yi
Yu James J. Q
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 04/10/2019
Field of study

Generative Adversarial Network (GAN) and its variants serve as a perfect representation of the data generation model, providing researchers with a large amount of high-quality generated data. They illustrate a promising direction for research with limited data availability. When GAN learns the semantic-rich data distribution from a dataset, the density of the generated distribution tends to concentrate on the training data. Due to the gradient parameters of the deep neural network contain the data distribution of the training samples, they can easily remember the training samples. When GAN is applied to private or sensitive data, for instance, patient medical records, as private information may be leakage. To address this issue, we propose a Privacy-preserving Generative Adversarial Network (PPGAN) model, in which we achieve differential privacy in GANs by adding well-designed noise to the gradient during the model learning procedure. Besides, we introduced the Moments Accountant strategy in the PPGAN training process to improve the stability and compatibility of the model by controlling privacy loss. We also give a mathematical proof of the differential privacy discriminator. Through extensive case studies of the benchmark datasets, we demonstrate that PPGAN can generate high-quality synthetic data while retaining the required data available under a reasonable privacy budget.Comment: This paper was accepted by IEEE ICPADS 2019 Workshop. This paper contains 10 pages, 3 figure

arXiv.org e-Print Archive

Crossref

CONAN: Complementary Pattern Augmentation for Rare Disease Detection

Author: Biswal Siddharth
Cui Limeng
Glass Lucas M.
Lever Greg
Sun Jimeng
Xiao Cao
Publication venue
Publication date: 26/11/2019
Field of study

Rare diseases affect hundreds of millions of people worldwide but are hard to detect since they have extremely low prevalence rates (varying from 1/1,000 to 1/200,000 patients) and are massively underdiagnosed. How do we reliably detect rare diseases with such low prevalence rates? How to further leverage patients with possibly uncertain diagnosis to improve detection? In this paper, we propose a Complementary pattern Augmentation (CONAN) framework for rare disease detection. CONAN combines ideas from both adversarial training and max-margin classification. It first learns self-attentive and hierarchical embedding for patient pattern characterization. Then, we develop a complementary generative adversarial networks (GAN) model to generate candidate positive and negative samples from the uncertain patients by encouraging a max-margin between classes. In addition, CONAN has a disease detector that serves as the discriminator during the adversarial training for identifying rare diseases. We evaluated CONAN on two disease detection tasks. For low prevalence inflammatory bowel disease (IBD) detection, CONAN achieved .96 precision recall area under the curve (PR-AUC) and 50.1% relative improvement over best baseline. For rare disease idiopathic pulmonary fibrosis (IPF) detection, CONAN achieves .22 PR-AUC with 41.3% relative improvement over the best baseline

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Generative Adversarial Networks for Creating Synthetic Free-Text Medical Data: A Proposal for Collaborative Research and Re-use of Machine Learning Models

Author: Dexter Gregory
Grannis Shaun J.
Kasthurirathne Suranga N.
Publication venue: AMIA Informatics summit 2021 Conference Proceedings.
Publication date: 01/03/2021
Field of study

Restrictions in sharing Patient Health Identifiers (PHI) limit cross-organizational re-use of free-text medical data. We leverage Generative Adversarial Networks (GAN) to produce synthetic unstructured free-text medical data with low re-identification risk, and assess the suitability of these datasets to replicate machine learning models. We trained GAN models using unstructured free-text laboratory messages pertaining to salmonella, and identified the most accurate models for creating synthetic datasets that reflect the informational characteristics of the original dataset. Natural Language Generation metrics comparing the real and synthetic datasets demonstrated high similarity. Decision models generated using these datasets reported high performance metrics. There was no statistically significant difference in performance measures reported by models trained using real and synthetic datasets. Our results inform the use of GAN models to generate synthetic unstructured free-text data with limited re-identification risk, and use of this data to enable collaborative research and re-use of machine learning models

IUPUIScholarWorks