Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs
Generative Adversarial Networks (GANs) have shown remarkable success as a
framework for training models to produce realistic-looking data. In this work,
we propose a Recurrent GAN (RGAN) and Recurrent Conditional GAN (RCGAN) to
produce realistic real-valued multi-dimensional time series, with an emphasis
on their application to medical data. RGANs make use of recurrent neural
networks in the generator and the discriminator. In the case of RCGANs, both of
these RNNs are conditioned on auxiliary information. We demonstrate our models
in a set of toy datasets, where we show visually and quantitatively (using
sample likelihood and maximum mean discrepancy) that they can successfully
generate realistic time-series. We also describe novel evaluation methods for
GANs, where we generate a synthetic labelled training dataset, and evaluate on
a real test set the performance of a model trained on the synthetic data, and
vice-versa. We illustrate with these metrics that RCGANs can generate
time-series data useful for supervised training, with only minor degradation in
performance on real test data. This is demonstrated on digit classification
from 'serialised' MNIST and by training an early warning system on a medical
dataset of 17,000 patients from an intensive care unit. We further discuss and
analyse the privacy concerns that may arise when using RCGANs to generate
realistic synthetic medical time series data.
Comment: 13 pages, 4 figures, 3 tables (update with differential privacy
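The abstract above names maximum mean discrepancy (MMD) as one of its quantitative checks. As a rough illustration only, and not the authors' implementation, the sketch below computes a biased estimate of squared MMD between two sets of samples. It assumes each time series has already been flattened to a fixed-length vector and uses a Gaussian (RBF) kernel with a hand-picked bandwidth; the function names and the default bandwidth are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Squared Euclidean distance between every row of a and every row of b.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y.

    x and y are 2-D arrays of shape (n_samples, n_features); the bandwidth
    is an assumed hyperparameter, not a value taken from the paper.
    """
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()
```

A value close to zero suggests the two samples are hard to distinguish under the chosen kernel; the result depends strongly on the bandwidth, which would normally be tuned rather than fixed.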
Generative Adversarial Networks for Electronic Health Records: A Framework for Exploring and Evaluating Methods for Predicting Drug-Induced Laboratory Test Trajectories
Generative Adversarial Networks (GANs) represent a promising class of
generative networks that combine neural networks with game theory. From
generating realistic images and videos to assisting musical creation, GANs are
transforming many fields of arts and sciences. However, their application to
healthcare has not been fully realized, more specifically in generating
electronic health records (EHR) data. In this paper, we propose a framework for
exploring the value of GANs in the context of continuous laboratory time series
data. We devise an unsupervised evaluation method that measures the predictive
power of synthetic laboratory test time series. Further, we show that when it
comes to predicting the impact of drug exposure on laboratory test data,
incorporating representation learning of the training cohorts prior to training
GAN models is beneficial.
Comment: NIPS ML4H 201
Generating Synthetic but Plausible Healthcare Record Datasets
Generating datasets that "look like" given real ones is an interesting task
for healthcare applications of ML and many other fields of science and
engineering. In this paper we propose a new method of general application to
binary datasets based on a method for learning the parameters of a latent
variable moment that we have previously used for clustering patient datasets.
We compare our method with a recent proposal (MedGan) based on generative
adversarial methods and find that the synthetic datasets we generate are
globally more realistic in at least two senses: real and synthetic instances
are harder to tell apart both by Random Forests and by the MMD statistic. The most
likely explanation is that our method does not suffer from the "mode collapse"
that is an acknowledged problem of GANs. Additionally, the generative models we
generate are easy to interpret, unlike the rather obscure GANs. Our experiments
are performed on two patient datasets containing ICD-9 diagnostic codes: the
publicly available MIMIC-III dataset and a dataset containing admissions for
congestive heart failure during 7 years at Hospital de Sant Pau in Barcelona.
Comment: MLMH 2018 : 2018 KDD workshop on Machine Learning for Medicine and
Healthcar
tsBNgen: A Python Library to Generate Time Series Data from an Arbitrary Dynamic Bayesian Network Structure
Synthetic data is widely used in various domains. This is because many modern
algorithms require lots of data for efficient training, and data collection and
labeling are usually time-consuming and prone to errors.
Furthermore, some real-world data, due to its nature, is confidential and
cannot be shared. Bayesian networks are a type of probabilistic graphical model
widely used to model the uncertainties in real-world processes. Dynamic
Bayesian networks are a special class of Bayesian networks that model temporal
and time series data. In this paper, we introduce tsBNgen, a Python library
to generate time series and sequential data based on an arbitrary dynamic
Bayesian network. The package, documentation, and examples can be downloaded
from https://github.com/manitadayon/tsBNgen
Interaction-aware Multi-agent Tracking and Probabilistic Behavior Prediction via Adversarial Learning
In order to enable high-quality decision making and motion planning of
intelligent systems such as robotics and autonomous vehicles, accurate
probabilistic predictions for surrounding interactive objects are a crucial
prerequisite. Although many research studies have been devoted to making
predictions on a single entity, it remains an open challenge to forecast future
behaviors for multiple interactive agents simultaneously. In this work, we take
advantage of the Generative Adversarial Network (GAN) due to its capability of
distribution learning and propose a generic multi-agent probabilistic
prediction and tracking framework which takes the interactions among multiple
entities into account, in which all the entities are treated as a whole.
However, since GANs are very hard to train, we conduct an empirical study of
the relationship between training performance and hyperparameter values
through a numerical case study. The results imply that the proposed model can
capture the mean, variance, and multi-modalities of the ground-truth
distribution. Moreover, we apply the proposed approach to a real-world task of
vehicle behavior prediction to demonstrate its effectiveness and accuracy. The
results illustrate that the proposed model trained by adversarial learning can
achieve a better prediction performance than other state-of-the-art models
trained by traditional supervised learning which maximizes the data likelihood.
The well-trained model can also be utilized as an implicit proposal
distribution for particle-filter-based Bayesian state estimation.
Comment: Accepted by 2019 International Conference on Robotics and Automation
(ICRA
Synthetic Event Time Series Health Data Generation
Synthetic medical data which preserves privacy while maintaining utility can
be used as an alternative to real medical data, which has privacy costs and
resource constraints associated with it. At present, most models focus on
generating cross-sectional health data which is not necessarily representative
of real data. In reality, medical data is longitudinal in nature, with a single
patient having multiple health events, non-uniformly distributed throughout
their lifetime. These events are influenced by patient covariates such as
comorbidities, age group, gender etc. as well as external temporal effects
(e.g. flu season). While there exist seminal methods to model time series data,
it becomes increasingly challenging to extend these methods to medical event
time series data. Due to the complexity of the real data, in which each patient
visit is an event, we transform the data by using summary statistics to
characterize the events for a fixed set of time intervals, to facilitate
analysis and interpretability. We then train a generative adversarial network
to generate synthetic data. We demonstrate this approach by generating human
sleep patterns, from a publicly available dataset. We empirically evaluate the
generated data and show close univariate resemblance between synthetic and real
data. However, we also demonstrate how stratification by covariates is required
to gain a deeper understanding of synthetic data quality.
Comment: Machine Learning for Health (ML4H) at NeurIPS 2019 - Extended
Abstrac
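The abstract above describes transforming per-patient events into summary statistics over a fixed set of time intervals before training a GAN. The sketch below shows one possible form of that preprocessing step under stated assumptions: events are `(timestamp, value)` pairs, intervals have equal length, and the statistics are the event count and mean value per interval. The function name, the input layout, and the choice of statistics are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def summarize_events(events, interval_length, n_intervals):
    """Summarize events into fixed-length intervals.

    events: iterable of (timestamp, value) pairs for one patient.
    Returns a list with one (count, mean_value) tuple per interval;
    empty intervals are reported as (0, 0.0).
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        index = int(timestamp // interval_length)
        # Events outside the covered time range are ignored.
        if 0 <= index < n_intervals:
            buckets[index].append(value)

    summary = []
    for index in range(n_intervals):
        values = buckets.get(index, [])
        mean_value = sum(values) / len(values) if values else 0.0
        summary.append((len(values), mean_value))
    return summary
```

Each patient then becomes a fixed-length sequence of summary tuples, which is easier to feed to a generative model than a variable number of irregularly spaced events.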
Measuring the quality of Synthetic data for use in competitions
Machine learning has the potential to assist many communities in using the
large datasets that are becoming more and more available. Unfortunately, much
of that potential is not being realized because it would require sharing data
in a way that compromises privacy. In order to overcome this hurdle, several
methods have been proposed that generate synthetic data while preserving the
privacy of the real data. In this paper we consider a key characteristic that
synthetic data should have in order to be useful for machine learning
researchers - the relative performance of two algorithms (trained and tested)
on the synthetic dataset should be the same as their relative performance (when
trained and tested) on the original dataset.
Comment: 3 pages, 1 figure, 2018 KDD Workshop on Machine Learning for Medicine
and Healthcar
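The property described in the abstract above, that two algorithms should rank the same way on synthetic data as on the original data, can be checked with a simple pairwise comparison. The sketch below is an illustration under stated assumptions rather than the paper's metric: it takes two hypothetical dictionaries mapping algorithm names to scores (higher is better) and reports the fraction of algorithm pairs whose ordering agrees across the two datasets.

```python
from itertools import combinations


def _sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def ranking_agreement(real_scores, synthetic_scores):
    """Fraction of algorithm pairs ordered identically on both datasets.

    real_scores and synthetic_scores map algorithm name -> score, where
    each score comes from training and testing on that dataset.
    """
    names = sorted(set(real_scores) & set(synthetic_scores))
    pairs = list(combinations(names, 2))
    if not pairs:
        raise ValueError("need at least two algorithms scored on both datasets")

    agreeing = sum(
        _sign(real_scores[a] - real_scores[b])
        == _sign(synthetic_scores[a] - synthetic_scores[b])
        for a, b in pairs
    )
    return agreeing / len(pairs)
```

A result of 1.0 means every pair of algorithms is ordered the same way on the synthetic and the original data; lower values indicate that conclusions drawn from the synthetic data may not transfer.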
Synthesizing Tabular Data using Generative Adversarial Networks
Generative adversarial networks (GANs) implicitly learn the probability
distribution of a dataset and can draw samples from the distribution. This
paper presents Tabular GAN (TGAN), a generative adversarial network which can
generate tabular data like medical or educational records. Using the power of
deep neural networks, TGAN generates high-quality and fully synthetic tables
while simultaneously generating discrete and continuous variables. When we
evaluate our model on three datasets, we find that TGAN outperforms
conventional statistical generative models in both capturing the correlation
between columns and scaling up for large datasets.
Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions
Generative Adversarial Networks (GANs) represent an attractive and novel
approach to generate realistic data, such as genes, proteins, or drugs, in
synthetic biology. Here, we apply GANs to generate synthetic DNA sequences
encoding for proteins of variable length. We propose a novel feedback-loop
architecture, called Feedback GAN (FBGAN), to optimize the synthetic gene
sequences for desired properties using an external function analyzer. The
proposed architecture also has the advantage that the analyzer need not be
differentiable. We apply the feedback-loop mechanism to two examples: 1)
generating synthetic genes coding for antimicrobial peptides, and 2) optimizing
synthetic genes for the secondary structure of their resulting peptides. A
suite of metrics demonstrates that the GAN-generated proteins have desirable
biophysical properties. The FBGAN architecture can also be used to optimize
GAN-generated datapoints for useful properties in domains beyond genomics.
Automated Treatment Planning in Radiation Therapy using Generative Adversarial Networks
Knowledge-based planning (KBP) is an automated approach to radiation therapy
treatment planning that involves predicting desirable treatment plans before
they are then corrected to deliverable ones. We propose a generative
adversarial network (GAN) approach for predicting desirable 3D dose
distributions that eschews the previous paradigms of site-specific feature
engineering and predicting low-dimensional representations of the plan.
Experiments on a dataset of oropharyngeal cancer patients show that our
approach significantly outperforms previous methods on several clinical
satisfaction criteria and similarity metrics.
Comment: 15 pages. Accepted for publication in PMLR. Presented at Machine
Learning for Health Car