5 research outputs found
Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
Limited data access is a longstanding barrier to data-driven research and
development in the networked systems community. In this work, we explore if and
how generative adversarial networks (GANs) can be used to incentivize data
sharing by enabling a generic framework for sharing synthetic datasets with
minimal expert knowledge. As a specific target, our focus in this paper is on
time series datasets with metadata (e.g., packet loss rate measurements with
corresponding ISPs). We identify key challenges of existing GAN approaches for
such workloads with respect to fidelity (e.g., long-term dependencies, complex
multidimensional relationships, mode collapse) and privacy (i.e., existing
guarantees are poorly understood and can sacrifice fidelity). To improve
fidelity, we design a custom workflow called DoppelGANger (DG) and demonstrate
that across diverse real-world datasets (e.g., bandwidth measurements, cluster
requests, web sessions) and use cases (e.g., structural characterization,
predictive modeling, algorithm comparison), DG achieves up to 43% better
fidelity than baseline models. Although we do not resolve the privacy problem
in this work, we identify fundamental challenges with both classical notions of
privacy and recent advances to improve the privacy properties of GANs, and
suggest a potential roadmap for addressing these challenges. By shedding light
on the promise and challenges, we hope our work can rekindle the conversation
on workflows for data sharing.Comment: Published in IMC 2020. 20 pages, 26 figure
Informatikai algoritmusok 2.
A könyv a Magyar Tudományos Akadémia támogatásával készül