23 research outputs found

    PAC Security: Automatic Privacy Measurement and Control of Data Processing

    We propose and study a new privacy definition, termed Probably Approximately Correct (PAC) Security. PAC security characterizes the information-theoretic hardness of recovering sensitive data given arbitrary information disclosure/leakage during or after any processing. Unlike the classic cryptographic definition and Differential Privacy (DP), which consider the adversarial (input-independent) worst case, PAC security is a simulatable metric that accommodates priors and quantifies the instance-based impossibility of inference. We propose a fully automatic analysis and proof-generation framework, where security parameters can be produced with arbitrarily high confidence via Monte-Carlo simulation for any black-box data-processing oracle. This automation enables analysis of complicated data processing, where a worst-case proof in the classic privacy regime could be loose or even intractable. Furthermore, we show that the magnitude of the (necessary) perturbation required for PAC security does not depend explicitly on dimensionality, in contrast to the worst-case information-theoretic lower bound. We also include practical applications of PAC security with comparisons.
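
    As a rough illustration of the Monte-Carlo flavor of such an analysis (not the paper's actual proof-generation framework), the sketch below estimates an attacker's empirical recovery rate against a black-box mechanism and attaches a Hoeffding-style confidence radius. The function names, the toy Gaussian mechanism, the naive attacker, and the choice of confidence bound are all illustrative assumptions.

```python
import numpy as np

def monte_carlo_recovery_rate(mechanism, prior_sampler, attacker,
                              n_trials=10_000, tol=0.05, seed=0):
    """Hypothetical sketch: estimate how often an attacker recovers the
    secret input from a black-box mechanism's output, with a Hoeffding
    confidence radius on the Monte-Carlo estimate (assumed setup)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        x = prior_sampler(rng)          # draw a secret from the assumed prior
        y = mechanism(x, rng)           # one query to the black-box oracle
        hits += int(np.linalg.norm(attacker(y) - x) <= tol)
    p_hat = hits / n_trials
    # 95% Hoeffding confidence radius for the empirical success rate
    radius = np.sqrt(np.log(2 / 0.05) / (2 * n_trials))
    return p_hat, p_hat + radius

# toy usage: Gaussian-perturbed release of a scalar secret, naive attacker
if __name__ == "__main__":
    mech = lambda x, rng: x + rng.normal(scale=1.0)
    prior = lambda rng: rng.normal(scale=1.0, size=1)
    attacker = lambda y: y              # attacker guesses the release itself
    print(monte_carlo_recovery_rate(mech, prior, attacker))
```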

    Geometry of Sensitivity: Twice Sampling and Hybrid Clipping in Differential Privacy with Optimal Gaussian Noise and Application to Deep Learning

    We study the fundamental problem of constructing optimal randomization in Differential Privacy. Depending on the clipping strategy or additional properties of the processing function, the corresponding sensitivity set theoretically determines the randomization necessary to produce the required security parameters. Towards the optimal utility-privacy tradeoff, finding the minimal perturbation for properly selected sensitivity sets stands as a central problem in DP research. In practice, l_2/l_1-norm clipping with Gaussian/Laplace noise mechanisms is among the most common setups; however, these setups also suffer from the curse of dimensionality. For more generic clipping strategies, the understanding of the optimal noise for a high-dimensional sensitivity set remains limited. In this paper, we revisit the geometry of high-dimensional sensitivity sets and present a series of results characterizing the non-asymptotically optimal Gaussian noise for Rényi DP (RDP). Our results are both negative and positive: on one hand, we show that the curse of dimensionality is tight for a broad class of sensitivity sets satisfying certain symmetry properties; on the other hand, if the representation of the sensitivity set is asymmetric on some group of orthogonal bases, we show that the optimal noise bounds need not depend explicitly on either dimension or rank. We also revisit sampling in the high-dimensional scenario, which is key to both privacy amplification and computational efficiency in large-scale data processing. We propose a novel method, termed twice sampling, which implements both sample-wise and coordinate-wise sampling to enable Gaussian noise to fit the sensitivity geometry more closely. With a closed-form RDP analysis, we prove that twice sampling produces an asymptotic improvement in privacy amplification under an additional infinity-norm restriction, especially for small sampling rates.
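
    To make the twice-sampling idea concrete, here is a hedged sketch that combines sample-wise and coordinate-wise subsampling with hybrid l_2/l_inf clipping before adding Gaussian noise. The parameter names, clipping order, and noise calibration are assumptions for illustration, not the paper's exact mechanism or its RDP accounting.

```python
import numpy as np

def twice_sampled_gaussian_release(per_sample_grads, q_sample=0.1, q_coord=0.5,
                                   l2_clip=1.0, linf_clip=0.05, sigma=1.0, seed=0):
    """Illustrative sketch (assumed setup): subsample records, subsample
    coordinates, apply hybrid l2/l_inf clipping, and release the sum with
    Gaussian noise."""
    rng = np.random.default_rng(seed)
    n, d = per_sample_grads.shape

    # sample-wise Poisson subsampling of records
    keep_rows = rng.random(n) < q_sample
    # coordinate-wise subsampling, shared across the kept records
    keep_cols = rng.random(d) < q_coord

    total = np.zeros(keep_cols.sum())
    for g in per_sample_grads[keep_rows]:
        g = g[keep_cols]
        # hybrid clipping: bound each coordinate (l_inf) and the l2 norm
        g = np.clip(g, -linf_clip, linf_clip)
        g = g * min(1.0, l2_clip / (np.linalg.norm(g) + 1e-12))
        total += g
    # Gaussian noise calibrated to the l2 clipping bound (accounting assumed elsewhere)
    return total + rng.normal(scale=sigma * l2_clip, size=total.shape)
```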

    Differentially Private Deep Learning with ModelMix

    Training large neural networks with meaningful/usable differential privacy guarantees is a demanding challenge. In this paper, we tackle this problem by revisiting the two key operations in Differentially Private Stochastic Gradient Descent (DP-SGD): 1) iterative perturbation and 2) gradient clipping. We propose a generic optimization framework, called ModelMix, which performs random aggregation of intermediate model states. It strengthens the composite privacy analysis by utilizing the entropy of the training trajectory and improves the (ε, δ) DP security parameters by an order of magnitude. We provide rigorous analyses for both the utility guarantees and privacy amplification of ModelMix. In particular, we present a formal study of the effect of gradient clipping in DP-SGD, which provides theoretical guidance on how hyper-parameters should be selected. We also introduce a refined gradient clipping method, which can further sharpen the privacy loss in private learning when combined with ModelMix. Thorough experiments with significant privacy/utility improvements are presented to support our theory. We train a ResNet-20 on CIFAR10 to 70.4% accuracy via ModelMix under an (ε=8, δ=10^{-5}) DP budget, compared to the same performance at (ε=145.8, δ=10^{-5}) using regular DP-SGD; assisted with an additional public low-dimensional gradient embedding, one can further improve the accuracy to 79.1% under an (ε=6.1, δ=10^{-5}) DP budget, compared to the same performance at (ε=111.2, δ=10^{-5}) without ModelMix.
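
    The sketch below shows one plausible DP-SGD step with a ModelMix-style random convex combination of intermediate model states. The uniform mixing rule, the two-iterate mixing window, and the noise calibration are assumptions for illustration, not the paper's exact aggregation scheme or privacy accounting.

```python
import numpy as np

def dpsgd_step_with_model_mix(w, w_prev, per_sample_grads, lr=0.1,
                              clip=1.0, sigma=1.0, rng=None):
    """Illustrative sketch (assumed setup): one DP-SGD update applied to a
    random convex combination of the two most recent model states."""
    rng = rng or np.random.default_rng()

    # ModelMix-style randomness: mix current and previous iterates
    lam = rng.uniform(0.0, 1.0)
    w_mixed = lam * w + (1.0 - lam) * w_prev

    # standard DP-SGD: per-sample l2 clipping, then Gaussian perturbation
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_sample_grads]
    noisy_grad = (np.sum(clipped, axis=0)
                  + rng.normal(scale=sigma * clip, size=w.shape)) / len(clipped)

    return w_mixed - lr * noisy_grad
```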

    Towards Understanding Practical Randomness Beyond Noise: Differential Privacy and Mixup

    Information-theoretic privacy relies on randomness. Representatively, Differential Privacy (DP) has emerged as the gold standard for quantifying the individual privacy preservation provided by given randomness. However, almost all randomness in existing differentially private optimization and learning algorithms is restricted to noise perturbation. In this paper, we set out to provide a privacy analysis framework for understanding the privacy guarantees produced by other randomness commonly used in optimization and learning algorithms (e.g., parameter randomness). We take mixup, a random linear aggregation of inputs, as a concrete example. Our contributions are twofold. First, we develop a rigorous analysis of the privacy amplification provided by mixup, applied either to samples or to updates, where we find that the hybrid structure of mixup and the Laplace Mechanism produces a new type of DP guarantee lying between Pure DP and Approximate DP. Such an average-case privacy amplification can produce tighter composition bounds. Second, both empirically and theoretically, we show that proper mixup comes at almost no cost in utility.
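
    As an illustration of the kind of hybrid randomness studied here, the sketch below mixes l1-clipped updates with random weights and then applies the Laplace Mechanism. The Dirichlet weighting scheme and the noise scale are assumptions for illustration, not the paper's mechanism or its amplification analysis.

```python
import numpy as np

def mixup_then_laplace(updates, clip=1.0, laplace_scale=0.5, seed=0):
    """Illustrative sketch (assumed setup): random linear aggregation
    (mixup) of clipped updates followed by the Laplace Mechanism."""
    rng = np.random.default_rng(seed)
    # l1-clip each update so the Laplace noise calibration is well defined
    clipped = [u * min(1.0, clip / (np.abs(u).sum() + 1e-12)) for u in updates]
    # mixup: random convex combination of the participating updates
    weights = rng.dirichlet(np.ones(len(clipped)))
    mixed = sum(w * u for w, u in zip(weights, clipped))
    # Laplace Mechanism on top of the mixed aggregate
    return mixed + rng.laplace(scale=laplace_scale, size=mixed.shape)
```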