4,230 research outputs found

    The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record

    Realistic synthetic data are increasingly recognized as a solution to data scarcity and privacy concerns in healthcare and other domains, yet little effort has been expended in establishing a generic framework for characterizing, achieving and validating realism in Synthetic Data Generation (SDG). The objectives of this paper are to: (1) present a characterization of the concept of realism as it applies to synthetic data; and (2) present and demonstrate application of the generic ATEN Framework for achieving and validating realism for SDG. The characterization of realism is developed through insights obtained from analysis of the literature on SDG. The generic methods for achieving and validating realism for synthetic data were developed using knowledge discovery in databases (KDD), data mining enhanced with concept analysis, and identification of characteristic and classification rules. Application of this framework is demonstrated by using the synthetic Electronic Healthcare Record (EHR) for the domain of midwifery. The knowledge discovery process improves and expedites the generation process; having a more complex and complete understanding of the knowledge required to create the synthetic data significantly reduces the number of generation iterations. The validation process shows similar efficiencies through using the knowledge discovered as the elements for assessing the generated synthetic data. Successful validation supports claims of success and resolves whether the synthetic data is a sufficient replacement for real data. The ATEN Framework supports the researcher in identifying the knowledge elements that need to be synthesized, as well as supporting claims of sufficient realism through the use of that knowledge in a structured approach to validation. When used for SDG, the ATEN Framework enables a complete analysis of source data for knowledge necessary for correct generation. The ATEN Framework assures the researcher that the synthetic data being created is realistic enough to replace real data for a given use-case.

    Synthetic data generator for electric vehicle charging sessions : modeling and evaluation using real-world data

    Electric vehicle (EV) charging stations have become prominent in electricity grids in the past few years. Their increased penetration introduces both challenges and opportunities; they contribute to increased load, but also offer flexibility potential, e.g., in deferring the load in time. To analyze such scenarios, realistic EV data are required, which are hard to come by. Therefore, in this article we define a synthetic data generator (SDG) for EV charging sessions based on a large real-world dataset. Arrival times of EVs are modeled assuming that the inter-arrival times of EVs follow an exponential distribution. The connection time of an EV depends on its arrival time, and can be described using a conditional probability distribution. This distribution is estimated using Gaussian mixture models, and departure times can be calculated by sampling connection times for EV arrivals from this distribution. Our SDG is based on a novel method for the temporal modeling of EV sessions, and jointly models the arrival and departure times of EVs for a large number of charging stations. Our SDG was trained using real-world EV sessions, and used to generate synthetic samples of session data, which were statistically indistinguishable from the real-world data. We provide both (i) source code to train SDG models from new data, and (ii) trained models that reflect real-world datasets.
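
The sampling scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' released code: the function names, the two arrival-time bins, and every mixture parameter below are hypothetical values chosen only for illustration, whereas the paper estimates its Gaussian mixture models from real sessions.

```python
import random

# Hypothetical mixture parameters per arrival-time bin:
# each component is (weight, mean_hours, std_hours). Not taken from the paper.
MIXTURES = {
    "day": [(0.7, 2.0, 0.5), (0.3, 8.0, 1.0)],
    "night": [(1.0, 10.0, 2.0)],
}


def sample_arrivals(rate_per_hour, horizon_hours, rng):
    """Draw arrival times (in hours) with exponential inter-arrival gaps."""
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(rate_per_hour)
        if t >= horizon_hours:
            return arrivals
        arrivals.append(t)


def sample_connection_time(arrival, rng):
    """Sample a connection time conditioned on the arrival's hour of day."""
    hour = arrival % 24
    key = "day" if 6 <= hour < 18 else "night"
    components = MIXTURES[key]
    weights = [w for w, _, _ in components]
    # Pick one Gaussian component, then draw from it.
    _, mean, std = rng.choices(components, weights=weights, k=1)[0]
    # Clip so that every session has a positive duration.
    return max(0.05, rng.gauss(mean, std))


def generate_sessions(rate_per_hour, horizon_hours, seed=0):
    """Return a list of (arrival, departure) pairs in hours."""
    rng = random.Random(seed)
    arrivals = sample_arrivals(rate_per_hour, horizon_hours, rng)
    return [(a, a + sample_connection_time(a, rng)) for a in arrivals]


sessions = generate_sessions(rate_per_hour=2.0, horizon_hours=48.0, seed=1)
```

Conditioning on a coarse day/night bin is a simplification made here for brevity; a finer binning of arrival times, or a joint mixture over arrival and connection time, would be closer in spirit to a conditional distribution.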

    Methodological Considerations on EEG Electrical Reference: A Functional Brain-Heart Interplay Study

    The growing interest in the study of functional brain-heart interplay (BHI) has motivated the development of novel methodological frameworks for its quantification. While a combination of electroencephalography (EEG) and heartbeat-derived series has been widely used, the role of EEG preprocessing in BHI quantification is still unknown. To this end, we investigate four different EEG electrical referencing techniques associated with BHI quantifications over a 4-minute resting state in 15 healthy subjects. BHI methods include the synthetic data generation model, heartbeat-evoked potentials, heartbeat-evoked oscillations, and maximal information coefficient (MIC). EEG signals were referenced offline using the Cz channel, common average, mastoids average, and Laplacian method, and statistical comparisons were performed to assess similarities between references and between BHI techniques. Results show a topographical agreement between BHI estimation methods depending on the specific EEG reference. Major differences between BHI methods occur with the Laplacian reference, while major differences between EEG references occur with the MIC analysis. We conclude that the choice of EEG electrical reference may significantly affect a functional BHI quantification.

    The Generation of Synthetic Healthcare Data Using Deep Neural Networks

    High-quality tabular data is a crucial requirement for developing data-driven applications, especially healthcare-related ones, because most of the data nowadays collected in this context is in tabular form. However, strict data protection laws introduced in the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR) present many obstacles to accessing and doing scientific research on healthcare datasets to protect patients’ privacy and confidentiality. Thus, synthetic data has become an ideal alternative for data scientists and healthcare professionals to circumvent such hurdles. Although many healthcare data providers still use the classical de-identification and anonymization techniques for generating synthetic data, deep learning-based generative models such as Generative Adversarial Networks (GANs) have shown a remarkable performance in generating tabular datasets with complex structures. Thus, this thesis examines the GANs’ potential and applicability within the healthcare industry, which often faces serious challenges with insufficient training data and patient records sensitivity. We investigate several state-of-the-art GAN-based models proposed for tabular synthetic data generation. Specifically, we assess the performance of TGAN, CTGAN, CTABGAN and WGAN-GP models on healthcare datasets with different sizes, numbers of variables, column data types, feature distributions, and inter-variable correlations. Moreover, a comprehensive evaluation framework is defined to evaluate the quality of the synthetic records and the viability of each model in preserving the patients’ privacy. After training the selected models and generating synthetic datasets, we evaluate the strengths and weaknesses of each model based on the statistical similarity metrics, machine learning-based evaluation scores, and distance-based privacy metrics. The results indicate that the proposed models can generate datasets that maintain the statistical characteristics, model compatibility, and privacy of the original ones. Moreover, synthetic tabular healthcare datasets can be a viable option in many data-driven applications. However, there is still room for further improvements in designing a perfect architecture for generating synthetic tabular data.
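
The abstract above mentions distance-based privacy metrics without naming one. As a hedged illustration only, the sketch below computes the distance to the closest record, a commonly used metric of this kind; whether the thesis uses this exact metric is an assumption, and the function name and toy rows are hypothetical.

```python
import math


def distance_to_closest_record(synthetic, real):
    """For each synthetic row, return the Euclidean distance to its nearest real row.

    Rows are equal-length sequences of numbers. In practice, columns would
    normally be scaled to comparable ranges first.
    """
    return [min(math.dist(s, r) for r in real) for s in synthetic]


# Toy rows, invented for illustration.
real_rows = [(0.0, 0.0), (1.0, 1.0)]
synthetic_rows = [(0.0, 0.0), (4.0, 5.0)]
distances = distance_to_closest_record(synthetic_rows, real_rows)
```

A distance of zero means a synthetic row exactly reproduces a real record, which is the case such a metric is meant to flag; larger distances suggest less risk of direct copying but say nothing on their own about statistical utility.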

    Methods for generating and evaluating synthetic longitudinal patient data: a systematic review

    The proliferation of data in recent years has led to the advancement and utilization of various statistical and deep learning techniques, thus expediting research and development activities. However, not all industries have benefited equally from the surge in data availability, partly due to legal restrictions on data usage and privacy regulations, such as in medicine. To address this issue, various statistical disclosure and privacy-preserving methods have been proposed, including the use of synthetic data generation. Synthetic data are generated based on some existing data, with the aim of replicating them as closely as possible and acting as a proxy for real sensitive data. This paper presents a systematic review of methods for generating and evaluating synthetic longitudinal patient data, a prevalent data type in medicine. The review adheres to the PRISMA guidelines and covers literature from five databases until the end of 2022. The paper describes 17 methods, ranging from traditional simulation techniques to modern deep learning methods. The collected information includes, but is not limited to, method type, source code availability, and approaches used to assess resemblance, utility, and privacy. Furthermore, the paper discusses practical guidelines and key considerations for developing synthetic longitudinal data generation methods.

    Functional assessment of bidirectional cortical and peripheral neural control on heartbeat dynamics: A brain-heart study on thermal stress

    The study of functional Brain-Heart Interplay (BHI) from non-invasive recordings has gained much interest in recent years. Previous endeavors aimed at understanding how the two dynamical systems exchange information, providing novel holistic biomarkers and important insights into essential cognitive aspects and neural system functioning. However, the interplay between cardiac sympathovagal and cortical oscillations still has much room for further investigation. In this study, we introduce a new computational framework for a functional BHI assessment, namely the Sympatho-Vagal Synthetic Data Generation Model, combining cortical (electroencephalography, EEG) and peripheral (cardiac sympathovagal) neural dynamics. The causal, bidirectional neural control on heartbeat dynamics was quantified on data gathered from 26 human volunteers undergoing a cold-pressor test. Results show that thermal stress induces heart-to-brain functional interplay sustained by EEG oscillations in the delta and gamma bands, primarily originating from sympathetic activity, whereas brain-to-heart interplay originates over central brain regions through sympathovagal control. The proposed methodology provides a viable computational tool for the functional assessment of the causal interplay between cortical and cardiac neural control.
