895 research outputs found

    Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?

    Full text link
    After being collected for patient care, Observational Health Data (OHD) can further benefit patient well-being by sustaining the development of health informatics and medical research. Vast potential is unexploited because of the fiercely private nature of patient-related data and regulations to protect it. Generative Adversarial Networks (GANs) have recently emerged as a groundbreaking way to learn generative models that produce realistic synthetic data. They have revolutionized practices in multiple domains such as self-driving cars, fraud detection, digital twin simulations in industrial sectors, and medical imaging. The digital twin concept could readily apply to modelling and quantifying disease progression. In addition, GANs posses many capabilities relevant to common problems in healthcare: lack of data, class imbalance, rare diseases, and preserving privacy. Unlocking open access to privacy-preserving OHD could be transformative for scientific research. In the midst of COVID-19, the healthcare system is facing unprecedented challenges, many of which of are data related for the reasons stated above. Considering these facts, publications concerning GAN applied to OHD seemed to be severely lacking. To uncover the reasons for this slow adoption, we broadly reviewed the published literature on the subject. Our findings show that the properties of OHD were initially challenging for the existing GAN algorithms (unlike medical imaging, for which state-of-the-art model were directly transferable) and the evaluation synthetic data lacked clear metrics. We find more publications on the subject than expected, starting slowly in 2017, and since then at an increasing rate. The difficulties of OHD remain, and we discuss issues relating to evaluation, consistency, benchmarking, data modelling, and reproducibility.Comment: 31 pages (10 in previous version), not including references and glossary, 51 in total. Inclusion of a large number of recent publications and expansion of the discussion accordingl

    Representation Learning With Autoencoders For Electronic Health Records

    Get PDF
    Increasing volume of Electronic Health Records (EHR) in recent years provides great opportunities for data scientists to collaborate on different aspects of healthcare research by applying advanced analytics to these EHR clinical data. A key requirement however is obtaining meaningful insights from high dimensional, sparse and complex clinical data. Data science approaches typically address this challenge by performing feature learning in order to build more reliable and informative feature representations from clinical data followed by supervised learning. In this research, we propose a predictive modeling approach based on deep feature representations and word embedding techniques. Our method uses different deep architectures (stacked sparse autoencoders, deep belief network, adversarial autoencoders and variational autoencoders) for feature representation in higher-level abstraction to obtain effective and robust features from EHRs, and then build prediction models on top of them. Our approach is particularly useful when the unlabeled data is abundant whereas labeled data is scarce. We investigate the performance of representation learning through a supervised learning approach. Our focus is to present a comparative study to evaluate the performance of different deep architectures through supervised learning and provide insights for the choice of deep feature representation techniques. Our experiments demonstrate that for small data sets, stacked sparse autoencoder demonstrates a superior generality performance in prediction due to sparsity regularization whereas variational autoencoders outperform the competing approaches for large data sets due to its capability of learning the representation distribution

    Applications of Machine Learning in Medical Prognosis Using Electronic Medical Records

    Get PDF
    Approximately 84 % of hospitals are adopting electronic medical records (EMR) In the United States. EMR is a vital resource to help clinicians diagnose the onset or predict the future condition of a specific disease. With machine learning advances, many research projects attempt to extract medically relevant and actionable data from massive EMR databases using machine learning algorithms. However, collecting patients\u27 prognosis factors from Electronic EMR is challenging due to privacy, sensitivity, and confidentiality. In this study, we developed medical generative adversarial networks (GANs) to generate synthetic EMR prognosis factors using minimal information collected during routine care in specialized healthcare facilities. The generated prognosis variables used in developing predictive models for (1) chronic wound healing in patients diagnosed with Venous Leg Ulcers (VLUs) and (2) antibiotic resistance in patients diagnosed with Skin and soft tissue infections (SSTIs). Our proposed medical GANs, EMR-TCWGAN and DermaGAN, can produce both continuous and categorical features from EMR. We utilized conditional training strategies to enhance training and generate classified data regarding healing vs. non-healing in EMR-TCWGAN and susceptibility vs. resistance in DermGAN. The ability of the proposed GAN models to generate realistic EMR data was evaluated by TSTR (test on the synthetic, train on the real), discriminative accuracy, and visualization. We analyzed the synthetic data augmentation technique\u27s practicality in improving the wound healing prognostic model and antibiotic resistance classifier. We achieved the area under the curve (AUC) of 0.875 in the wound healing prognosis model and an average AUC of 0.830 in the antibiotic resistance classifier by using the synthetic samples generated by GANs in the training process. These results suggest that GANs can be considered a data augmentation method to generate realistic EMR data

    The Significance of Machine Learning in Clinical Disease Diagnosis: A Review

    Full text link
    The global need for effective disease diagnosis remains substantial, given the complexities of various disease mechanisms and diverse patient symptoms. To tackle these challenges, researchers, physicians, and patients are turning to machine learning (ML), an artificial intelligence (AI) discipline, to develop solutions. By leveraging sophisticated ML and AI methods, healthcare stakeholders gain enhanced diagnostic and treatment capabilities. However, there is a scarcity of research focused on ML algorithms for enhancing the accuracy and computational efficiency. This research investigates the capacity of machine learning algorithms to improve the transmission of heart rate data in time series healthcare metrics, concentrating particularly on optimizing accuracy and efficiency. By exploring various ML algorithms used in healthcare applications, the review presents the latest trends and approaches in ML-based disease diagnosis (MLBDD). The factors under consideration include the algorithm utilized, the types of diseases targeted, the data types employed, the applications, and the evaluation metrics. This review aims to shed light on the prospects of ML in healthcare, particularly in disease diagnosis. By analyzing the current literature, the study provides insights into state-of-the-art methodologies and their performance metrics.Comment: 8 page

    Dual autoencoders modeling of electronic health records for adverse drug event preventability prediction

    Get PDF
    Background Elderly patients are at increased risk for Adverse Drug Events (ADEs). Proactively screening elderly people visiting the emergency department for the possibility of their hospital admission being drug-related helps to improve patient care as well as prevent potential unnecessary medical costs. Existing routine ADE assessment heavily relies on a rule-based checking process. Recently, machine learning methods have been shown to be effective in automating the detection of ADEs, however, most approaches used only either structured data or free texts for their feature engineering. How to better exploit all available EHRs data for better predictive modeling remains an important question. On the other hand, automated reasoning for the preventability of ADEs is still a nascent line of research. Methods Clinical information of 714 elderly ED-visit patients with ADE preventability labels was provided as ground truth data by Jeroen Bosch Ziekenhuis hospital, the Netherlands. Methods were developed to address the challenges of applying feature engineering to heterogeneous EHRs data. A Dual Autoencoders (2AE) model was proposed to solve the problem of imbalance embedded in the existing training data. Results Experimental results showed that 2AE can capture the patterns of the minority class without incorporating an extra process for class balancing. 2AE yields adequate performance and outperforms other more mainstream approaches, resulting in an AUPRC score of 0.481. Conclusions We have demonstrated how machine learning can be employed to analyze both structured and unstructured data from electronic health records for the purpose of preventable ADE prediction. The developed algorithm 2AE can be used to effectively learn minority group phenotype from imbalanced data
    • …
    corecore