Differentially private data generation techniques have become a promising
solution to the data privacy challenge -- it enables sharing of data while
complying with rigorous privacy guarantees, which is essential for scientific
progress in sensitive domains. Unfortunately, restricted by the inherent
complexity of modeling high-dimensional distributions, existing private
generative models are struggling with the utility of synthetic samples.
In contrast to existing works that aim at fitting the complete data
distribution, we directly optimize for a small set of samples that are
representative of the distribution under the supervision of discriminative
information from downstream tasks, which is generally an easier task and more
suitable for private training. Our work provides an alternative view for
differentially private generation of high-dimensional data and introduces a
simple yet effective method that greatly improves the sample utility of
state-of-the-art approaches.Comment: NeurIPS 2022, 19 page