
    TAN without a burn: Scaling Laws of DP-SGD

    Differentially private methods for training Deep Neural Networks (DNNs) have progressed recently, in particular with the use of massive batches and aggregated data augmentations over a large number of steps. These techniques require much more compute than their non-private counterparts, shifting the traditional privacy-accuracy trade-off to a privacy-accuracy-compute trade-off and making hyper-parameter search virtually impossible in realistic scenarios. In this work, we decouple the privacy analysis from the experimental behavior of noisy training to explore the trade-off with minimal computational requirements. We first use the tools of Rényi Differential Privacy (RDP) to show that the privacy budget, when not overcharged, only depends on the total amount of noise (TAN) injected throughout training. We then derive scaling laws for training models with DP-SGD to optimize hyper-parameters with more than a 100× reduction in computational budget. We apply the proposed method on CIFAR-10 and ImageNet and, in particular, strongly improve the state of the art on ImageNet, with a +9 point gain in accuracy for a privacy budget of ε = 8.
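    The compute-saving idea in this abstract can be sketched concretely. Under the common simplification that the per-step RDP cost of subsampled Gaussian DP-SGD is governed by the ratio of the sampling rate q to the noise multiplier σ (composed over the S steps), scaling the batch size and σ down by the same factor k keeps that ratio, and hence the privacy budget, roughly unchanged while cutting per-step compute by k. The helper below is a hypothetical illustration of that bookkeeping, not the paper's exact accountant; all names and the example numbers are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DPSGDConfig:
    batch_size: int           # expected (Poisson-sampled) batch size B
    noise_multiplier: float   # sigma: noise std relative to the clipping norm
    steps: int                # number of DP-SGD steps S
    dataset_size: int         # N, number of private examples

    @property
    def sample_rate(self) -> float:
        return self.batch_size / self.dataset_size


def scaled_proxy(ref: DPSGDConfig, k: int) -> DPSGDConfig:
    """Return a k-times cheaper proxy run that keeps sigma/B constant, so the
    ratio q/sigma (and, approximately, the privacy budget and the per-step
    signal-to-noise of the averaged noisy gradient) is preserved.  The exact
    epsilon should still be verified with an RDP accountant."""
    return DPSGDConfig(
        batch_size=ref.batch_size // k,
        noise_multiplier=ref.noise_multiplier / k,
        steps=ref.steps,
        dataset_size=ref.dataset_size,
    )


if __name__ == "__main__":
    # Reference "massive batch" configuration (illustrative numbers only).
    ref = DPSGDConfig(batch_size=32768, noise_multiplier=2.5,
                      steps=4000, dataset_size=1_281_167)
    cheap = scaled_proxy(ref, k=128)
    for name, cfg in [("reference", ref), ("proxy", cheap)]:
        print(f"{name}: B={cfg.batch_size}, sigma={cfg.noise_multiplier:.4f}, "
              f"q/sigma={cfg.sample_rate / cfg.noise_multiplier:.6f}")
```

    Hyper-parameters can then be searched with the cheap proxy configuration and transferred back to the reference regime.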

    Privacy-preserving data sharing via probabilistic modeling

    Differential privacy allows quantifying the privacy loss resulting from access to sensitive personal data. Repeated accesses to the underlying data incur increasing loss. Releasing data as privacy-preserving synthetic data would avoid this limitation, but would leave open the problem of what kind of synthetic data to design. We propose formulating the problem of private data release through probabilistic modeling. This approach transforms the problem of designing the synthetic data into choosing a model for the data, which also allows the inclusion of prior knowledge and thereby improves the quality of the synthetic data. We demonstrate empirically, in an epidemiological study, that statistical discoveries can be reliably reproduced from the synthetic data. We expect the method to have broad use in creating high-quality anonymized data twins of key datasets for research.
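    As a toy illustration of the release pattern described above (fit a model privately once, then sample synthetic data freely), the sketch below fits independent per-column categorical marginals with Laplace-noised counts and samples a synthetic table from them. This is a deliberately crude stand-in for a full probabilistic model of the data, not the paper's method; the column names, the independence assumption, and the noise mechanism are all my own simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)


def noisy_marginal(values, categories, epsilon):
    """Laplace-noised category counts, clipped and renormalised into a sampling
    distribution (sensitivity 1 per column; the budget must be split across
    columns under composition)."""
    counts = np.array([(values == c).sum() for c in categories], dtype=float)
    counts += rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    counts = np.clip(counts, 1e-9, None)
    return counts / counts.sum()


def synthetic_twin(private_table, schema, n_rows, epsilon_total):
    """Sample a synthetic table column by column from independent noisy
    marginals.  Downstream analyses then touch only the synthetic rows."""
    eps_per_col = epsilon_total / len(schema)
    return {
        col: rng.choice(categories, size=n_rows,
                        p=noisy_marginal(private_table[col], categories, eps_per_col))
        for col, categories in schema.items()
    }


if __name__ == "__main__":
    # Tiny fabricated example table (not real data).
    private_table = {
        "smoker": np.array(["yes", "no", "no", "no", "yes", "no"]),
        "age_band": np.array(["30-39", "40-49", "30-39", "50-59", "40-49", "30-39"]),
    }
    schema = {"smoker": ["yes", "no"], "age_band": ["30-39", "40-49", "50-59"]}
    print(synthetic_twin(private_table, schema, n_rows=10, epsilon_total=1.0))
```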

    Selective Pre-training for Private Fine-tuning

    Suppose we want to train text prediction models in email clients or word processors. The models must preserve the privacy of user data and adhere to a specific fixed size to meet memory and inference-time requirements. We introduce a generic framework to solve this problem. Specifically, we are given a public dataset D_pub and a private dataset D_priv corresponding to a downstream task T. How should we pre-train a fixed-size model M on D_pub and fine-tune it on D_priv such that the performance of M with respect to T is maximized and M satisfies differential privacy with respect to D_priv? We show that pre-training on a subset of D_pub that brings the public distribution closer to the private distribution is a crucial ingredient to maximize the transfer learning abilities of M after pre-training, especially in regimes where model sizes are relatively small. Besides performance improvements, our framework also shows that with careful pre-training and private fine-tuning, smaller models can match the performance of much larger models, highlighting the promise of differentially private training as a tool for model compression and efficiency.
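    The selection step can be pictured as filtering D_pub by similarity to the private distribution. The snippet below scores public examples by cosine similarity to a noised mean of the private embeddings and keeps the closest fraction; the embedding representation, the Gaussian noise on the centroid, and the top-k criterion are assumptions made for illustration, not necessarily the paper's selection rule.

```python
import numpy as np


def select_public_subset(pub_embeddings, priv_embeddings, keep_fraction=0.2,
                         noise_std=0.1, rng=None):
    """Rank public examples by cosine similarity to a noised private centroid
    and keep the closest `keep_fraction`.  `noise_std` crudely stands in for
    making the private statistic differentially private; a real pipeline would
    calibrate it to (epsilon, delta) and a clipping norm."""
    rng = rng or np.random.default_rng(0)

    # Noised private centroid -- the only statistic computed from D_priv here.
    centroid = priv_embeddings.mean(axis=0)
    centroid = centroid + rng.normal(scale=noise_std, size=centroid.shape)
    centroid /= np.linalg.norm(centroid)

    pub_norm = pub_embeddings / np.linalg.norm(pub_embeddings, axis=1, keepdims=True)
    scores = pub_norm @ centroid              # cosine similarity to the centroid
    k = max(1, int(keep_fraction * len(scores)))
    return np.argsort(-scores)[:k]            # indices of selected public examples


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pub = rng.normal(size=(1000, 64))              # stand-in public embeddings
    priv = rng.normal(loc=0.5, size=(200, 64))     # stand-in private embeddings
    idx = select_public_subset(pub, priv, keep_fraction=0.1, rng=rng)
    print(f"selected {len(idx)} of {len(pub)} public examples for pre-training")
```

    The fixed-size model would then be pre-trained on the selected subset before differentially private fine-tuning on D_priv.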