Generating Multi-Categorical Samples with Generative Adversarial Networks
We propose a method to train generative adversarial networks on multivariate
feature vectors representing multiple categorical values. In contrast to the
continuous domain, where GAN-based methods have delivered considerable results,
GANs struggle to perform equally well on discrete data. We propose and compare
several architectures based on multiple (Gumbel) softmax output layers taking
into account the structure of the data. We evaluate the performance of our
architecture on datasets with different sparsity, number of features, ranges of
categorical values, and dependencies among the features. Our proposed
architecture and method outperform existing models.
Improving Missing Data Imputation with Deep Generative Models
Datasets with missing values are very common in industry applications, and
they can have a negative impact on machine learning models. Recent studies
introduced solutions to the problem of imputing missing values based on deep
generative models. Previous experiments with Generative Adversarial Networks
and Variational Autoencoders showed interesting results in this domain, but it
is not clear which method is preferable for different use cases. The goal of
this work is twofold: we present a comparison between missing data imputation
solutions based on deep generative models, and we propose improvements over
those methodologies. We run our experiments using known real life datasets with
different characteristics, removing values at random and reconstructing them
with several imputation techniques. Our results show that the presence or
absence of categorical variables can alter the selection of the best model, and
that some models are more stable than others after similar runs with different
random number generator seeds.
Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?
After being collected for patient care, Observational Health Data (OHD) can
further benefit patient well-being by sustaining the development of health
informatics and medical research. Vast potential is unexploited because of the
fiercely private nature of patient-related data and regulations to protect it.
Generative Adversarial Networks (GANs) have recently emerged as a
groundbreaking way to learn generative models that produce realistic synthetic
data. They have revolutionized practices in multiple domains such as
self-driving cars, fraud detection, digital twin simulations in industrial
sectors, and medical imaging.
The digital twin concept could readily apply to modelling and quantifying
disease progression. In addition, GANs possess many capabilities relevant to
common problems in healthcare: lack of data, class imbalance, rare diseases,
and preserving privacy. Unlocking open access to privacy-preserving OHD could
be transformative for scientific research. In the midst of COVID-19, the
healthcare system is facing unprecedented challenges, many of which are
data-related for the reasons stated above.
Considering these facts, publications concerning GANs applied to OHD seemed to
be severely lacking. To uncover the reasons for this slow adoption, we broadly
reviewed the published literature on the subject. Our findings show that the
properties of OHD were initially challenging for the existing GAN algorithms
(unlike medical imaging, for which state-of-the-art models were directly
transferable) and the evaluation of synthetic data lacked clear metrics.
We find more publications on the subject than expected, starting slowly in
2017, and since then at an increasing rate. The difficulties of OHD remain, and
we discuss issues relating to evaluation, consistency, benchmarking, data
modelling, and reproducibility.
Comment: 31 pages (10 in previous version), not including references and
glossary, 51 in total. Inclusion of a large number of recent publications and
expansion of the discussion accordingly.