Synthetic Observational Health Data with GANs: from slow adoption to a boom in medical research and ultimately digital twins?
After being collected for patient care, Observational Health Data (OHD) can
further benefit patient well-being by sustaining the development of health
informatics and medical research. Vast potential remains unexploited because of the
fiercely private nature of patient-related data and the regulations that protect it.
Generative Adversarial Networks (GANs) have recently emerged as a
groundbreaking way to learn generative models that produce realistic synthetic
data. They have revolutionized practices in multiple domains such as
self-driving cars, fraud detection, digital twin simulations in industrial
sectors, and medical imaging.
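For reference, a GAN trains a generator $G$ and a discriminator $D$ against each other. The original GAN objective is shown below. It is the generic formulation only, not the specific variant used by any OHD model discussed here:

$$
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

Here $D(x)$ is the discriminator's estimated probability that $x$ is a real record, and $G(z)$ maps a noise sample $z \sim p_z$ to a synthetic record. The discriminator maximizes $V$ by telling real records from synthetic ones. The generator minimizes $V$ by producing records the discriminator accepts as real.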
The digital twin concept could readily apply to modelling and quantifying
disease progression. In addition, GANs possess many capabilities relevant to
common problems in healthcare: lack of data, class imbalance, rare diseases,
and preserving privacy. Unlocking open access to privacy-preserving OHD could
be transformative for scientific research. In the midst of COVID-19, the
healthcare system is facing unprecedented challenges, many of which are
data-related for the reasons stated above.
Considering these facts, publications concerning GANs applied to OHD seemed to
be severely lacking. To uncover the reasons for this slow adoption, we broadly
reviewed the published literature on the subject. Our findings show that the
properties of OHD were initially challenging for the existing GAN algorithms
(unlike medical imaging, for which state-of-the-art models were directly
transferable) and that the evaluation of synthetic data lacked clear metrics.
We find more publications on the subject than expected, starting slowly in
2017 and appearing at an increasing rate since then. The difficulties of OHD remain, and
we discuss issues relating to evaluation, consistency, benchmarking, data
modelling, and reproducibility.
Comment: 31 pages (10 in previous version), not including references and
glossary, 51 in total. Inclusion of a large number of recent publications and
expansion of the discussion accordingly
On Monetizing Personal Wearable Devices Data: A Blockchain-based Marketplace for Data Crowdsourcing and Federated Machine Learning in Healthcare
Machine learning advancements in healthcare have made data collected through smartphones and wearable devices a vital source of public health and medical insights. While wearable device data helps to monitor, detect, and predict diseases and health conditions, some data owners hesitate to share such sensitive data with companies or researchers due to privacy concerns. Moreover, wearable devices have only recently become available as commercial products; thus, large, diverse, and representative datasets are not available to most researchers. In this article, we propose an open marketplace where wearable device users securely monetize their wearable device records by sharing data with consumers (e.g., researchers), making wearable device data more available to healthcare researchers. To secure the data transactions in a privacy-preserving manner, we take a decentralized approach using Blockchain and Non-Fungible Tokens (NFTs). To ensure data originality and integrity with secure validation, our marketplace uses Trusted Execution Environments (TEE) in wearable devices to verify the correctness of health data. The marketplace also allows researchers to train models using Federated Learning with TEE-backed secure aggregation of data that users may not be willing to share. To ensure user participation, we model incentive mechanisms for the Federated Learning-based and anonymized data-sharing approaches using NFTs. We also propose using payment channels and batching to reduce smart contract gas fees and optimize user profits. If widely adopted, we believe that TEE and Blockchain-based incentives will promote the ethical use of machine learning with validated wearable device data in healthcare and improve user participation due to incentives.
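The marketplace relies on TEE-backed secure aggregation. The goal of secure aggregation is that the server learns the aggregate model update but not any individual client's update. The sketch below illustrates that goal with a different, swapped-in technique: pairwise additive masking in the style of cryptographic secure aggregation, combined with sample-weighted federated averaging. It uses no TEE. It is a toy under stated assumptions: all masks come from one seeded PRNG in a single process instead of pairwise key agreement, so it provides no real security. The function names are hypothetical.

```python
import random

def mask_updates(updates, num_samples, seed=0):
    """Each client scales its update by its sample count, then adds pairwise masks.

    Toy simulation: in a real protocol each pair of clients would derive its
    mask from a shared secret; here one seeded PRNG plays that role.
    """
    rng = random.Random(seed)
    n, dim = len(updates), len(updates[0])
    # Pre-scaling on the client side keeps the masks cancellable in a plain sum.
    masked = [[num_samples[c] * x for x in updates[c]] for c in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mask = [rng.uniform(-1e3, 1e3) for _ in range(dim)]
            for k in range(dim):
                masked[i][k] += mask[k]  # client i adds the pair's mask
                masked[j][k] -= mask[k]  # client j subtracts it, so it cancels in the sum
    return masked

def aggregate(masked, num_samples):
    """Server side: sum the masked vectors and divide by the total sample count."""
    dim = len(masked[0])
    total = [sum(vec[k] for vec in masked) for k in range(dim)]
    n_total = sum(num_samples)
    return [t / n_total for t in total]

updates = [[1.0, 2.0], [3.0, 4.0]]
num_samples = [1, 3]
print(aggregate(mask_updates(updates, num_samples), num_samples))
# approximately [2.5, 3.5], up to floating-point rounding
```

Each masked vector on its own looks random to the server. Because every mask is added once and subtracted once, the sum equals the weighted sum of the true updates.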
Provably-secure symmetric private information retrieval with quantum cryptography
Private information retrieval (PIR) is a database query protocol that
provides user privacy, in that the user can learn a particular database entry
of interest while the query remains hidden from the data centre.
Symmetric private information retrieval (SPIR) takes PIR further by
additionally offering database privacy, where the user cannot learn any
additional entries of the database. Unconditionally secure SPIR solutions with
multiple databases are known classically, but are unrealistic because they
require long shared secret keys between the parties for secure communication
and shared randomness in the protocol. Here, we propose using quantum key
distribution (QKD) instead for a practical implementation, which can realise
both the secure communication and shared randomness requirements. We prove that
QKD maintains the security of the SPIR protocol and that it is also secure
against any external eavesdropper. We also show how such a classical-quantum
system could be implemented practically, using the example of a two-database
SPIR protocol with keys generated by measurement device-independent QKD.
Through key rate calculations, we show that such an implementation is feasible
at the metropolitan level with current QKD technology.
Comment: 19 pages
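The role of shared randomness in two-database SPIR can be illustrated with the textbook XOR-based two-server PIR scheme, extended with one random bit shared by the two databases. This is a minimal sketch under stated assumptions, not the paper's protocol. `shared_bit` stands in for the shared randomness that the authors propose to obtain from QKD. The encrypted user-database channels are omitted entirely. The user is assumed honest, sending two queries that differ only at the wanted index. All function names are hypothetical.

```python
import secrets

def user_queries(n: int, i: int) -> tuple[list[int], list[int]]:
    """Build the two queries: a uniformly random subset S, and S with index i flipped."""
    s1 = [secrets.randbelow(2) for _ in range(n)]
    s2 = list(s1)
    s2[i] ^= 1  # the queries differ only at the wanted index
    return s1, s2

def database_answer(database: list[int], query: list[int], shared_bit: int) -> int:
    """XOR of the selected bits, masked with randomness shared by both databases."""
    answer = shared_bit
    for bit, selected in zip(database, query):
        if selected:
            answer ^= bit
    return answer

def reconstruct(answer1: int, answer2: int) -> int:
    """The shared mask and every bit outside index i cancel, leaving entry i."""
    return answer1 ^ answer2

database = [1, 0, 1, 1, 0]
i = 2
shared_bit = secrets.randbelow(2)  # stand-in for a fresh QKD-derived key bit per query
q1, q2 = user_queries(len(database), i)
a1 = database_answer(database, q1, shared_bit)
a2 = database_answer(database, q2, shared_bit)
print(reconstruct(a1, a2))  # 1, i.e. database[2]
```

Each database sees a uniformly random bit vector, so neither learns `i` (user privacy). Because of `shared_bit`, each answer on its own is a uniformly random bit. Combined, the two answers reveal only entry `i` (database privacy for an honest user).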
Mining Privacy-Preserving Association Rules based on Parallel Processing in Cloud Computing
With the onset of the Information Era and the rapid growth of information
technology, ample opportunities for processing and extracting data have opened up.
However, privacy concerns may stifle expansion in this area. This study
addresses the challenge of reliable mining techniques when transactions are
dispersed across sources. This work explores the prospect of creating a new
set of three algorithms that can achieve maximum privacy, data utility, and
time savings. This paper proposes a novel double encryption and Transaction
Splitter approach that alters the database to optimize the data utility and
confidentiality tradeoff in the preparation phase. It also presents a
customized apriori approach for the mining process, which does not examine
the entire database to estimate the support for each attribute.
Existing distributed data solutions have high encryption complexity and
insufficiently specify the properties of many participants. The proposed
solutions provide increased privacy protection against a variety of attack
models. Furthermore, they are much simpler and quicker in terms of
communication cycles and processing complexity. Tests of the proposed work on
a real-world transaction database demonstrate that the aim of the proposed
method is realistic.
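The abstract does not specify the customized Apriori variant, the double encryption, or the Transaction Splitter, so the sketch below does not reproduce them. It is a baseline for comparison only: textbook level-wise Apriori over transactions dispersed across several sites. Each candidate's global support is aggregated with the classic ring-based secure-sum protocol. That protocol is a well-known technique swapped in here and simulated in one process. Unlike the paper's approach, this baseline scans every site's transactions at every level. The function names, the modulus, and the assumption of a public item catalogue are all hypothetical.

```python
import random
from itertools import combinations

# Hypothetical modulus for the secure sum; it must exceed any possible count.
MODULUS = 2**61 - 1

def secure_sum(local_values, rng):
    """Ring-based secure sum, simulated in one process.

    The initiating site adds a random offset; the running total (mod MODULUS)
    would be passed from site to site, so, assuming no collusion, no site sees
    another site's raw value. The initiator finally removes the offset.
    """
    offset = rng.randrange(MODULUS)
    running = offset
    for value in local_values:
        running = (running + value) % MODULUS
    return (running - offset) % MODULUS

def local_counts(transactions, candidates):
    """Support count of every candidate itemset within one site's transactions."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}

def next_candidates(frequent_prev, k):
    """Apriori candidate generation: keep k-itemsets whose (k-1)-subsets are all frequent."""
    items = sorted({item for itemset in frequent_prev for item in itemset})
    return {
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent_prev for sub in combinations(combo, k - 1))
    }

def distributed_apriori(sites, min_support, seed=0):
    """Mine globally frequent itemsets over horizontally partitioned transactions."""
    rng = random.Random(seed)
    total = secure_sum([len(site) for site in sites], rng)
    # Assumption: the item catalogue is public, so single items are enumerated openly.
    candidates = {frozenset([item]) for site in sites for t in site for item in t}
    frequent, k = {}, 1
    while candidates:
        per_site = [local_counts(site, candidates) for site in sites]
        level = {}
        for c in candidates:
            count = secure_sum([counts[c] for counts in per_site], rng)
            if count / total >= min_support:
                level[c] = count
        frequent.update(level)
        k += 1
        candidates = next_candidates(set(level), k)
    return frequent

sites = [
    [{"a", "b", "c"}, {"a", "b"}],
    [{"a", "c"}, {"b", "c"}, {"a", "b", "c"}],
]
print(distributed_apriori(sites, min_support=0.5))
```

With five transactions in total, the single items have support 4/5 and each pair has support 3/5, so all six are reported. The triple `{a, b, c}` (support 2/5) is generated as a candidate but falls below the threshold.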
A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources
Electronic Health Records (EHRs) are a valuable asset to facilitate clinical
research and point-of-care applications; however, many challenges, such as data
privacy concerns, impede their optimal utilization. Deep generative models,
particularly Generative Adversarial Networks (GANs), show great promise in
generating synthetic EHR data by learning underlying data distributions while
achieving excellent performance and addressing these challenges. This work aims
to review the major developments in various applications of GANs for EHRs and
to provide an overview of the proposed methodologies. For this purpose, we
combine perspectives from healthcare applications and machine learning
techniques in terms of source datasets and the fidelity and privacy evaluation
of the generated synthetic datasets. We also compile a list of the metrics and
datasets used by the reviewed works, which can be utilized as benchmarks for
future research in the field. We conclude by discussing challenges in
developing GANs for EHRs and proposing recommended practices. We hope that this
work motivates novel research directions at the intersection of healthcare and
machine learning.
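To make "fidelity and privacy evaluation" concrete, the sketch below computes one metric of each kind on binary patient-by-feature matrices. It is an illustration only, not the review's own list or definitions. The fidelity metric is dimension-wise probability, the per-feature frequency of 1s, compared between real and synthetic data. The privacy check is a distance to the closest real record, here using Hamming distance. The function names and the mean-absolute-gap summary are assumptions chosen for this example.

```python
def dimension_wise_probability(records):
    """Fraction of records in which each binary feature equals 1."""
    n, dims = len(records), len(records[0])
    return [sum(r[d] for r in records) / n for d in range(dims)]

def dwp_gap(real, synthetic):
    """Mean absolute difference of per-feature probabilities (lower = closer fidelity)."""
    p_real = dimension_wise_probability(real)
    p_syn = dimension_wise_probability(synthetic)
    return sum(abs(a - b) for a, b in zip(p_real, p_syn)) / len(p_real)

def distance_to_closest_record(real, synthetic):
    """Hamming distance from each synthetic record to its nearest real record.

    A distance of 0 means the synthetic record copies a real patient exactly,
    which signals a potential privacy leak.
    """
    return [min(sum(x != y for x, y in zip(s, r)) for r in real) for s in synthetic]

real = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]
synthetic = [[1, 0, 1], [0, 1, 1]]
print(dwp_gap(real, synthetic))                     # about 0.1667: feature 3 is 0.5 vs 1.0
print(distance_to_closest_record(real, synthetic))  # [0, 1]: the first record is a copy
```

The two numbers pull in opposite directions. Copying real records drives the fidelity gap down but also drives the closest-record distance to zero. This is why such evaluations report fidelity and privacy together.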