Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?
After being collected for patient care, Observational Health Data (OHD) can
further benefit patient well-being by sustaining the development of health
informatics and medical research. Vast potential is unexploited because of the
fiercely private nature of patient-related data and regulations to protect it.
Generative Adversarial Networks (GANs) have recently emerged as a
groundbreaking way to learn generative models that produce realistic synthetic
data. They have revolutionized practices in multiple domains such as
self-driving cars, fraud detection, digital twin simulations in industrial
sectors, and medical imaging.
The digital twin concept could readily apply to modelling and quantifying
disease progression. In addition, GANs possess many capabilities relevant to
common problems in healthcare: lack of data, class imbalance, rare diseases,
and preserving privacy. Unlocking open access to privacy-preserving OHD could
be transformative for scientific research. In the midst of COVID-19, the
healthcare system is facing unprecedented challenges, many of which are
data-related for the reasons stated above.
Considering these facts, publications concerning GANs applied to OHD seemed to
be severely lacking. To uncover the reasons for this slow adoption, we broadly
reviewed the published literature on the subject. Our findings show that the
properties of OHD were initially challenging for the existing GAN algorithms
(unlike medical imaging, for which state-of-the-art models were directly
transferable) and the evaluation of synthetic data lacked clear metrics.
We find more publications on the subject than expected, starting slowly in
2017, and since then at an increasing rate. The difficulties of OHD remain, and
we discuss issues relating to evaluation, consistency, benchmarking, data
modelling, and reproducibility.
Comment: 31 pages (10 in previous version), not including references and
glossary, 51 in total. Inclusion of a large number of recent publications and
expansion of the discussion accordingly.
Representation Learning With Autoencoders For Electronic Health Records
The increasing volume of Electronic Health Records (EHR) in recent years provides great opportunities for data scientists to collaborate on different aspects of healthcare research by applying advanced analytics to these clinical data. A key requirement, however, is obtaining meaningful insights from high-dimensional, sparse, and complex clinical data. Data science approaches typically address this challenge by performing feature learning to build more reliable and informative feature representations from clinical data, followed by supervised learning. In this research, we propose a predictive modeling approach based on deep feature representations and word embedding techniques. Our method uses different deep architectures (stacked sparse autoencoders, deep belief networks, adversarial autoencoders, and variational autoencoders) to represent features at a higher level of abstraction, obtaining effective and robust features from EHRs, and then builds prediction models on top of them. Our approach is particularly useful when unlabeled data is abundant but labeled data is scarce. We investigate the performance of representation learning through a supervised learning approach. Our focus is to present a comparative study that evaluates the performance of different deep architectures through supervised learning and provides insights for the choice of deep feature representation techniques. Our experiments demonstrate that for small data sets, stacked sparse autoencoders show superior generalization in prediction due to sparsity regularization, whereas variational autoencoders outperform the competing approaches for large data sets due to their capability of learning the representation distribution.
Applications of Machine Learning in Medical Prognosis Using Electronic Medical Records
Approximately 84% of hospitals in the United States are adopting electronic medical records (EMR). EMR is a vital resource to help clinicians diagnose the onset or predict the future condition of a specific disease. With machine learning advances, many research projects attempt to extract medically relevant and actionable data from massive EMR databases using machine learning algorithms. However, collecting patients' prognosis factors from EMR is challenging due to privacy, sensitivity, and confidentiality. In this study, we developed medical generative adversarial networks (GANs) to generate synthetic EMR prognosis factors using minimal information collected during routine care in specialized healthcare facilities. The generated prognosis variables were used in developing predictive models for (1) chronic wound healing in patients diagnosed with Venous Leg Ulcers (VLUs) and (2) antibiotic resistance in patients diagnosed with Skin and Soft Tissue Infections (SSTIs). Our proposed medical GANs, EMR-TCWGAN and DermaGAN, can produce both continuous and categorical features from EMR. We utilized conditional training strategies to enhance training and generate classified data regarding healing vs. non-healing in EMR-TCWGAN and susceptibility vs. resistance in DermGAN. The ability of the proposed GAN models to generate realistic EMR data was evaluated by TSTR (train on synthetic, test on real), discriminative accuracy, and visualization. We analyzed the practicality of the synthetic data augmentation technique in improving the wound healing prognostic model and antibiotic resistance classifier. We achieved an area under the curve (AUC) of 0.875 in the wound healing prognosis model and an average AUC of 0.830 in the antibiotic resistance classifier by using the synthetic samples generated by GANs in the training process. These results suggest that GANs can be considered a data augmentation method to generate realistic EMR data.
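The TSTR evaluation mentioned in this abstract is usually read as "train on synthetic, test on real": a classifier is fitted on generated records and scored on held-out real ones, then compared with the same classifier fitted on real records. The following is only a minimal sketch of that comparison, not the authors' pipeline. The toy records, the nearest-centroid classifier, the accuracy metric (the abstract reports AUC), and every function name are assumptions made for illustration, and the "synthetic" set is drawn from the same toy distribution rather than from a GAN.

```python
import random
from statistics import mean

def fit_centroids(rows, labels):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return centroids

def predict(centroids, row):
    """Return the class whose centroid is closest to `row`."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))

def accuracy(centroids, rows, labels):
    """Fraction of rows assigned to their true class."""
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)

def make_records(rng, n, shift):
    """Hypothetical two-feature records; class 1 is offset by `shift`."""
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        rows.append([rng.gauss(y * shift, 1.0), rng.gauss(y * shift, 1.0)])
        labels.append(y)
    return rows, labels

rng = random.Random(0)
real_train = make_records(rng, 200, shift=3.0)
real_test = make_records(rng, 200, shift=3.0)
synthetic = make_records(rng, 200, shift=3.0)  # stand-in for GAN output (assumption)

# TRTR: train on real, test on real (the reference score).
trtr = accuracy(fit_centroids(*real_train), *real_test)
# TSTR: train on synthetic, test on the same real hold-out.
tstr = accuracy(fit_centroids(*synthetic), *real_test)
print(f"TRTR accuracy: {trtr:.3f}  TSTR accuracy: {tstr:.3f}")
```

A TSTR score close to the TRTR score suggests that the synthetic records carry the label-relevant structure of the real ones. A large gap points to a generator that misses it.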
The Significance of Machine Learning in Clinical Disease Diagnosis: A Review
The global need for effective disease diagnosis remains substantial, given
the complexities of various disease mechanisms and diverse patient symptoms. To
tackle these challenges, researchers, physicians, and patients are turning to
machine learning (ML), an artificial intelligence (AI) discipline, to develop
solutions. By leveraging sophisticated ML and AI methods, healthcare
stakeholders gain enhanced diagnostic and treatment capabilities. However,
there is a scarcity of research focused on ML algorithms for enhancing the
accuracy and computational efficiency. This research investigates the capacity
of machine learning algorithms to improve the transmission of heart rate data
in time series healthcare metrics, concentrating particularly on optimizing
accuracy and efficiency. By exploring various ML algorithms used in healthcare
applications, the review presents the latest trends and approaches in ML-based
disease diagnosis (MLBDD). The factors under consideration include the
algorithm utilized, the types of diseases targeted, the data types employed,
the applications, and the evaluation metrics. This review aims to shed light on
the prospects of ML in healthcare, particularly in disease diagnosis. By
analyzing the current literature, the study provides insights into
state-of-the-art methodologies and their performance metrics.
Comment: 8 pages
Dual autoencoders modeling of electronic health records for adverse drug event preventability prediction
Background: Elderly patients are at increased risk for Adverse Drug Events (ADEs). Proactively screening elderly people visiting the emergency department (ED) for the possibility that their hospital admission is drug-related helps to improve patient care and prevent potentially unnecessary medical costs. Existing routine ADE assessment relies heavily on a rule-based checking process. Recently, machine learning methods have been shown to be effective in automating the detection of ADEs; however, most approaches used either structured data or free text alone for their feature engineering. How to better exploit all available EHR data for predictive modeling remains an important question. Moreover, automated reasoning about the preventability of ADEs is still a nascent line of research. Methods: Clinical information on 714 elderly ED-visit patients with ADE preventability labels was provided as ground truth data by Jeroen Bosch Ziekenhuis hospital, the Netherlands. Methods were developed to address the challenges of applying feature engineering to heterogeneous EHR data. A Dual Autoencoders (2AE) model was proposed to address the class imbalance in the existing training data. Results: Experimental results showed that 2AE can capture the patterns of the minority class without incorporating an extra process for class balancing. 2AE yields adequate performance and outperforms other more mainstream approaches, resulting in an AUPRC score of 0.481. Conclusions: We have demonstrated how machine learning can be employed to analyze both structured and unstructured data from electronic health records for the purpose of preventable ADE prediction. The developed algorithm 2AE can be used to effectively learn minority group phenotypes from imbalanced data.
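The AUPRC figure reported in this abstract is easier to interpret with the metric written out. The sketch below computes average precision, one common estimator of the area under the precision-recall curve. The abstract does not say how its AUPRC was computed, so the function name, the toy labels, and the scores are assumptions for illustration, and tied scores are not handled specially.

```python
def average_precision(y_true, scores):
    """Average precision: mean of the precision at each positive's rank."""
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    hits, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos

# Hypothetical imbalanced example: 2 positives among 10 records.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

ap = average_precision(y_true, scores)
prevalence = sum(y_true) / len(y_true)  # expected AUPRC of a random ranker
print(f"average precision: {ap:.2f}, positive prevalence: {prevalence:.2f}")
```

Unlike ROC AUC, whose chance level is 0.5, a random ranker scores roughly the positive-class prevalence on this metric. An AUPRC value on imbalanced data is therefore best read against that prevalence, which the abstract does not state.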