The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record
Realistic synthetic data are increasingly being recognized as solutions to a lack of data or privacy concerns in healthcare and other domains, yet little effort has been expended in establishing a generic framework for characterizing, achieving and validating realism in Synthetic Data Generation (SDG). The objectives of this paper are to: (1) present a characterization of the concept of realism as it applies to synthetic data; and (2) present and demonstrate application of the generic ATEN Framework for achieving and validating realism for SDG. The characterization of realism is developed through insights obtained from analysis of the literature on SDG. The generic methods for achieving and validating realism for synthetic data were developed using knowledge discovery in databases (KDD) and data mining enhanced with concept analysis and identification of characteristic and classification rules. Application of this framework is demonstrated by using the synthetic Electronic Healthcare Record (EHR) for the domain of midwifery. The knowledge discovery process improves and expedites the generation process; having a more complex and complete understanding of the knowledge required to create the synthetic data significantly reduces the number of generation iterations. The validation process shows similar efficiencies through using the knowledge discovered as the elements for assessing the generated synthetic data. Successful validation supports claims of success and resolves whether the synthetic data is a sufficient replacement for real data. The ATEN Framework supports the researcher in identifying the knowledge elements that need to be synthesized, as well as supporting claims of sufficient realism through the use of that knowledge in a structured approach to validation. When used for SDG, the ATEN Framework enables a complete analysis of source data for knowledge necessary for correct generation.
The ATEN Framework assures the researcher that the synthetic data being created is realistic enough to replace real data for a given use-case.
Synthetic data generator for electric vehicle charging sessions : modeling and evaluation using real-world data
Electric vehicle (EV) charging stations have become prominent in electricity grids in the past few years. Their increased penetration introduces both challenges and opportunities; they contribute to increased load, but also offer flexibility potential, e.g., in deferring the load in time. To analyze such scenarios, realistic EV data are required, which are hard to come by. Therefore, in this article we define a synthetic data generator (SDG) for EV charging sessions based on a large real-world dataset. Arrival times of EVs are modeled assuming that the inter-arrival times of EVs follow an exponential distribution. The connection time of an EV depends on its arrival time and can be described using a conditional probability distribution. This distribution is estimated using Gaussian mixture models, and departure times can be calculated by sampling connection times for EV arrivals from this distribution. Our SDG is based on a novel method for the temporal modeling of EV sessions, and jointly models the arrival and departure times of EVs for a large number of charging stations. Our SDG was trained using real-world EV sessions, and used to generate synthetic samples of session data, which were statistically indistinguishable from the real-world data. We provide both (i) source code to train SDG models from new data, and (ii) trained models that reflect real-world datasets.
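The sampling scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' released code: the function names, the constant arrival rate, the day/night split, and all mixture parameters below are hypothetical values chosen for illustration, and a real generator would fit the mixtures to observed sessions rather than hard-code them.

```python
import numpy as np

# HYPOTHETICAL mixture parameters (weights, means, standard deviations, in hours),
# keyed by arrival period. A real generator would fit these to observed sessions.
MIXTURES = {
    "day": (np.array([0.7, 0.3]), np.array([2.0, 8.0]), np.array([0.5, 1.5])),
    "night": (np.array([0.2, 0.8]), np.array([3.0, 11.0]), np.array([1.0, 2.0])),
}


def sample_connection_time(arrival_hour, rng):
    """Draw a connection time (hours) from the mixture for the arrival period."""
    period = "day" if 6 <= arrival_hour % 24 < 18 else "night"
    weights, means, stds = MIXTURES[period]
    k = rng.choice(len(weights), p=weights)   # pick a mixture component
    duration = rng.normal(means[k], stds[k])  # sample from that Gaussian
    return max(duration, 0.1)                 # keep durations positive


def sample_sessions(rate_per_hour, horizon_hours, rng):
    """Return a list of (arrival, departure) pairs, in hours from the start."""
    sessions = []
    # Inter-arrival times are exponential with mean 1 / rate (assumed constant here).
    t = rng.exponential(1.0 / rate_per_hour)
    while t < horizon_hours:
        # Departure = arrival + connection time sampled conditionally on arrival.
        sessions.append((t, t + sample_connection_time(t, rng)))
        t += rng.exponential(1.0 / rate_per_hour)
    return sessions


sessions = sample_sessions(rate_per_hour=5.0, horizon_hours=48.0,
                           rng=np.random.default_rng(0))
```

Conditioning on a coarse day/night period keeps the sketch short; a finer conditioning on arrival time would follow the same pattern with more mixture entries.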
Strategies for successful field deployment in a resource-poor region: Arsenic remediation technology for drinking water
Strong long-term international partnership in science, technology, finance and policy is critical for sustainable field experiments leading to successful commercial deployment of novel technology at community-scale. Although technologies already exist that can remediate arsenic in groundwater, most are too expensive or too complicated to operate on a sustained basis in resource-poor communities with the low technical skill common in rural South Asia. To address this specific problem, researchers at University of California-Berkeley (UCB) and Lawrence Berkeley National Laboratory (LBNL) invented a technology in 2006 called electrochemical arsenic remediation (ECAR). Since 2010, researchers at UCB and LBNL have collaborated with the Global Change Program of Jadavpur University (GCP-JU) in West Bengal, India for its social embedding alongside a local private industry group, and with financial support from the Indo-US Technology Forum (IUSSTF) over 2012–2017. During the first 10 months of pilot plant operation (April 2016 to January 2017) a total of 540 m³ (540,000 L) of arsenic-safe water was produced, consistently and reliably reducing arsenic concentrations from initial 252 ± 29 to final 2.9 ± 1 parts per billion (ppb). This paper presents the critical strategies in taking a technology from a lab in the USA to the field in India for commercialization to address the technical, socio-economic, and political aspects of the arsenic public health crisis while targeting several sustainable development goals (SDGs). The lessons learned highlight the significance of designing a technology contextually, bridging the knowledge divide, supporting local livelihoods, and complying with local regulations within a defined Critical Effort Zone period with financial support from an insightful funding source focused on maturing inventions and turning them into novel technologies for commercial scale-up.
Along the way, building trust with the community through repeated direct interactions and communication by the scientists proved vital for bridging the technology-society gap at a critical stage of technology deployment. The information presented here fills a knowledge gap regarding successful case studies in which the arsenic remediation technology obtains social acceptance and sustains technical performance over time, while operating with financial viability.
Methodological Considerations on EEG Electrical Reference: A Functional Brain-Heart Interplay Study
The growing interest in the study of functional brain-heart interplay (BHI) has motivated the development of novel methodological frameworks for its quantification. While a combination of electroencephalography (EEG) and heartbeat-derived series has been widely used, the role of EEG preprocessing in BHI quantification is still unknown. To this end, here we investigate four different EEG electrical referencing techniques associated with BHI quantifications over a 4-minute resting state in 15 healthy subjects. BHI methods include the synthetic data generation model, heartbeat-evoked potentials, heartbeat-evoked oscillations, and maximal information coefficient (MIC). EEG signals were referenced offline to the Cz channel, the common average, the mastoids average, and with the Laplacian method, and statistical comparisons were performed to assess similarities between references and between BHI techniques. Results show a topographical agreement between BHI estimation methods depending on the specific EEG reference. Major differences between BHI methods occur with the Laplacian reference, while major differences between EEG references occur with the MIC analysis. We conclude that the choice of EEG electrical reference may significantly affect a functional BHI quantification.
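As a rough illustration of what offline re-referencing involves, the sketch below subtracts a reference signal from every channel. It covers the single-channel (e.g. Cz), mastoids-average, and common-average cases only; the Laplacian method needs electrode neighborhood information and is omitted. The function name, array layout, and channel indices are assumptions for illustration and are not taken from the study.

```python
import numpy as np


def rereference(eeg, ref_idx=None):
    """Re-reference EEG of shape (n_channels, n_samples).

    ref_idx=None  -> common average reference (mean over all channels).
    ref_idx=[i]   -> single-channel reference, e.g. the index of Cz.
    ref_idx=[i,j] -> average of several channels, e.g. the two mastoids.
    """
    if ref_idx is None:
        ref = eeg.mean(axis=0, keepdims=True)
    else:
        ref = eeg[ref_idx].mean(axis=0, keepdims=True)
    # Subtract the reference time series from every channel.
    return eeg - ref


# Toy data: 4 channels, 100 samples; channel indices here are hypothetical.
eeg = np.random.default_rng(1).normal(size=(4, 100))
common_avg = rereference(eeg)
cz_ref = rereference(eeg, [2])
mastoid_ref = rereference(eeg, [0, 3])
```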
The Generation of Synthetic Healthcare Data Using Deep Neural Networks
High-quality tabular data is a crucial requirement for developing data-driven applications,
especially healthcare-related ones, because most of the data nowadays
collected in this context is in tabular form. However, strict data protection laws such
as the Health Insurance Portability and Accountability Act (HIPAA) and the General
Data Protection Regulation (GDPR), which protect patients’ privacy and confidentiality,
present many obstacles to accessing and doing scientific research on healthcare datasets.
Thus, synthetic data has become an ideal alternative for data scientists and
healthcare professionals to circumvent such hurdles. Although many healthcare data
providers still use the classical de-identification and anonymization techniques for
generating synthetic data, deep learning-based generative models such as Generative
Adversarial Networks (GANs) have shown a remarkable performance in generating
tabular datasets with complex structures. Thus, this thesis examines the potential
and applicability of GANs within the healthcare industry, which often faces serious
challenges with insufficient training data and the sensitivity of patient records.
We investigate several state-of-the-art GAN-based models proposed for tabular synthetic
data generation. Specifically, we assess the performance of TGAN, CTGAN,
CTABGAN and WGAN-GP models on healthcare datasets with different sizes,
numbers of variables, column data types, feature distributions, and inter-variable
correlations. Moreover, a comprehensive evaluation framework is defined to evaluate
the quality of the synthetic records and the viability of each model in preserving
the patients’ privacy. After training the selected models and generating
synthetic datasets, we evaluate the strengths and weaknesses of each model based
on the statistical similarity metrics, machine learning-based evaluation scores, and
distance-based privacy metrics.
The results indicate that the proposed models can generate datasets that maintain
the statistical characteristics, model compatibility, and privacy of the original ones.
Moreover, synthetic tabular healthcare datasets can be a viable option in many
data-driven applications. However, there is still room for further improvements in
designing a perfect architecture for generating synthetic tabular data.
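The abstract does not name its specific distance-based privacy metrics. One common choice, shown here purely as an assumed example, is the distance from each synthetic record to its closest real record: a distance of zero suggests that a real record was copied verbatim. The function name and toy data are hypothetical, and the sketch assumes numeric features that are already scaled comparably.

```python
import numpy as np


def distance_to_closest_record(real, synth):
    """For each synthetic row, the Euclidean distance to its nearest real row.

    real:  array of shape (n_real, n_features)
    synth: array of shape (n_synth, n_features)
    """
    # Pairwise differences via broadcasting: shape (n_synth, n_real, n_features).
    diff = synth[:, None, :] - real[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    # Keep only the smallest distance per synthetic row.
    return dists.min(axis=1)


real = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
synth = np.array([[1.0, 1.0], [3.0, 4.0]])  # first row duplicates a real record
dcr = distance_to_closest_record(real, synth)
```

The broadcasted difference uses memory proportional to n_synth × n_real × n_features, so larger datasets would need batching or a nearest-neighbor index.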
Methods for generating and evaluating synthetic longitudinal patient data: a systematic review
The proliferation of data in recent years has led to the advancement and
utilization of various statistical and deep learning techniques, thus
expediting research and development activities. However, not all industries
have benefited equally from the surge in data availability, partly due to legal
restrictions on data usage and privacy regulations, such as in medicine. To
address this issue, various statistical disclosure and privacy-preserving
methods have been proposed, including the use of synthetic data generation.
Synthetic data are generated based on some existing data, with the aim of
replicating them as closely as possible and acting as a proxy for real
sensitive data. This paper presents a systematic review of methods for
generating and evaluating synthetic longitudinal patient data, a prevalent data
type in medicine. The review adheres to the PRISMA guidelines and covers
literature from five databases until the end of 2022. The paper describes 17
methods, ranging from traditional simulation techniques to modern deep learning
methods. The collected information includes, but is not limited to, method
type, source code availability, and approaches used to assess resemblance,
utility, and privacy. Furthermore, the paper discusses practical guidelines and
key considerations for developing synthetic longitudinal data generation
methods.
Functional assessment of bidirectional cortical and peripheral neural control on heartbeat dynamics: A brain-heart study on thermal stress
The study of functional Brain-Heart Interplay (BHI) from non-invasive recordings has gained much interest in recent years. Previous endeavors aimed at understanding how the two dynamical systems exchange information, providing novel holistic biomarkers and important insights on essential cognitive aspects and neural system functioning. However, the interplay between cardiac sympathovagal and cortical oscillations still has much room for further investigation. In this study, we introduce a new computational framework for a functional BHI assessment, namely the Sympatho-Vagal Synthetic Data Generation Model, combining cortical (electroencephalography, EEG) and peripheral (cardiac sympathovagal) neural dynamics. The causal, bidirectional neural control on heartbeat dynamics was quantified on data gathered from 26 human volunteers undergoing a cold-pressor test. Results show that thermal stress induces heart-to-brain functional interplay sustained by EEG oscillations in the delta and gamma bands, primarily originating from sympathetic activity, whereas brain-to-heart interplay originates over central brain regions through sympathovagal control. The proposed methodology provides a viable computational tool for the functional assessment of the causal interplay between cortical and cardiac neural control