
3 research outputs found

Machine Learning for Synthetic Data Generation: A Review

Authors: Lu Yingzhou, Wang Huazheng, Wei Wenqi
Publication date: 28/03/2023

Data plays a crucial role in machine learning. However, real-world applications face several problems with data, e.g., the data are of low quality; a limited number of data points leads to under-fitting of the machine learning model; and the data are hard to access due to privacy, safety, and regulatory concerns. Synthetic data generation offers a promising new avenue, as synthetic data can be shared and used in ways that real-world data cannot. This paper systematically reviews existing works that leverage machine learning models for synthetic data generation. Specifically, we discuss these works from several perspectives: (i) applications, including computer vision, speech, natural language, healthcare, and business; (ii) machine learning methods, particularly neural network architectures and deep generative models; and (iii) privacy and fairness issues. In addition, we identify the challenges and opportunities in this emerging field and suggest future research directions.

arXiv.org e-Print Archive

Synthetic Data Generation Using Wasserstein Conditional Gans With Gradient Penalty (WCGANS-GP)

Author: Singh Walia Manhar
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2020

With data protection requirements becoming stricter, data privacy has become more important than ever. This has led to restrictions on the availability and dissemination of real-world datasets. Synthetic data offers a viable way to overcome barriers to data access and sharing. Existing data generation methods require a great deal of user-defined rules, manual interaction, and domain-specific knowledge. Moreover, they are not able to balance the trade-off between data usability and privacy. Deep learning based methods like GANs have seen remarkable success in synthesizing images by automatically learning the complicated distributions and patterns of real data, but they often suffer from instability during the training process.

Arrow@TUDublin