Machine Learning for Synthetic Data Generation: A Review
Data plays a crucial role in machine learning. However, in real-world
applications, there are several problems with data, e.g., data are of low
quality; a limited number of data points lead to under-fitting of the machine
learning model; it is hard to access the data due to privacy, safety and
regulatory concerns. Synthetic data generation offers a promising new avenue,
as it can be shared and used in ways that real-world data cannot. This paper
systematically reviews the existing works that leverage machine learning models
for synthetic data generation. Specifically, we discuss the synthetic data
generation works from several perspectives: (i) applications, including
computer vision, speech, natural language, healthcare, and business; (ii)
machine learning methods, particularly neural network architectures and deep
generative models; (iii) privacy and fairness issues. In addition, we identify
the challenges and opportunities in this emerging field and suggest future
research directions.
Synthetic Data Generation Using Wasserstein Conditional GANs With Gradient Penalty (WCGANS-GP)
With data protection requirements becoming stricter, data privacy has become more crucial than ever. This has led to restrictions on the availability and dissemination of real-world datasets. Synthetic data offers a viable solution to overcome barriers to data access and sharing. Existing data generation methods require a great deal of user-defined rules, manual interaction, and domain-specific knowledge. Moreover, they are not able to balance the trade-off between data usability and privacy. Deep-learning-based methods such as GANs have seen remarkable success in synthesizing images by automatically learning the complicated distributions and patterns of real data, but they often suffer from instability during the training process.