LDP-IDS: Local Differential Privacy for Infinite Data Streams
Streaming data collection is essential to real-time data analytics in various
IoT and mobile-device-based systems but may compromise end users'
privacy. Local differential privacy (LDP) is a promising solution to
privacy-preserving data collection and analysis. However, the few existing LDP
studies on streams are either applicable only to finite streams or suffer
from insufficient protection. This paper investigates this problem by proposing
LDP-IDS, a novel w-event LDP paradigm that provides a practical privacy
guarantee for infinite streams at the users' end, and by adapting the popular
budget division
framework in centralized differential privacy (CDP). By constructing a unified
error analysis for LDP, we first develop two adaptive budget division-based LDP
methods for LDP-IDS that can enhance data utility via leveraging the
non-deterministic sparsity in streams. Beyond that, we further propose a novel
population division framework that can not only avoid the high sensitivity of
LDP noise to budget division but also require significantly less communication.
Based on the framework, we also present two adaptive population division
methods for LDP-IDS with theoretical analysis. We conduct extensive experiments
on synthetic and real-world datasets to evaluate the effectiveness and
efficiency of our proposed frameworks and methods. Experimental results
demonstrate that, despite the effectiveness of the adaptive budget division
methods, the proposed population division framework and methods can further
achieve much higher effectiveness and efficiency. Comment: accepted to SIGMOD'2
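The w-event setting above can be illustrated with the simplest LDP frequency primitive and the naive uniform budget division that the adaptive methods improve on. The sketch below is a hedged illustration, not the paper's algorithm: binary randomized response per user, with each timestamp in a window of w spending eps_total / w.

```python
import numpy as np

def randomized_response(bit, eps, rng):
    """Binary randomized response: answer truthfully with probability
    e^eps / (e^eps + 1), which satisfies eps-LDP for a single bit."""
    p = np.exp(eps) / (np.exp(eps) + 1.0)
    return bit if rng.random() < p else 1 - bit

def debias(reports, eps):
    """Unbiased frequency estimate from randomized-response reports."""
    p = np.exp(eps) / (np.exp(eps) + 1.0)
    return (np.mean(reports) - (1.0 - p)) / (2.0 * p - 1.0)

# Uniform budget division for w-event privacy: each of the w timestamps
# in any sliding window spends eps_total / w (the naive baseline that
# adaptive division methods aim to beat).
eps_total, w = 4.0, 8
eps_t = eps_total / w

rng = np.random.default_rng(0)
true_bits = (rng.random(100_000) < 0.3).astype(int)   # 30% of users hold 1
reports = [randomized_response(b, eps_t, rng) for b in true_bits]
est = debias(reports, eps_t)
```

With 100,000 users even this small per-timestamp budget recovers the true frequency to within a few percentage points, which is why utility hinges on how the window budget is divided.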
Renyi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning
Propose-Test-Release (PTR) is a differential privacy framework that works
with local sensitivity of functions, instead of their global sensitivity. This
framework is typically used for releasing robust statistics such as median or
trimmed mean in a differentially private manner. While PTR is a common
framework introduced over a decade ago, using it in applications such as robust
SGD, where many adaptive robust queries are needed, is challenging. This is mainly
due to the lack of Renyi Differential Privacy (RDP) analysis, an essential
ingredient underlying the moments accountant approach for differentially
private deep learning. In this work, we generalize the standard PTR and derive
the first RDP bound for it when the target function has bounded global
sensitivity. We show that our RDP bound for PTR yields tighter DP guarantees
than the directly analyzed (ε, δ)-DP. We also derive the
algorithm-specific privacy amplification bound of PTR under subsampling. We
show that our bound is much tighter than the general upper bound and close to
the lower bound. Our RDP bounds enable tighter privacy loss calculation for the
composition of many adaptive runs of PTR. As an application of our analysis, we
show that PTR and our theoretical results can be used to design differentially
private variants of Byzantine-robust training algorithms that use robust
statistics for gradient aggregation. We conduct experiments on the settings of
label, feature, and gradient corruption across different datasets and
architectures. We show that the PTR-based private and robust training algorithm
significantly improves utility compared with the baseline. Comment: NeurIPS 202
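A minimal sketch of the classic PTR pattern may help fix ideas (this is the standard template, not the paper's generalized variant or its RDP analysis): propose a sensitivity bound, privately test the distance to a "bad" dataset, and either release with noise scaled to the proposed bound or refuse. The dist_fn interface and the toy median distance proxy below are illustrative assumptions.

```python
import numpy as np

def propose_test_release(x, f, proposed_sens, dist_fn, eps, delta, rng):
    """Generic Propose-Test-Release sketch.

    dist_fn(x) must lower-bound how many records of x would have to
    change before the local sensitivity of f exceeds proposed_sens.
    """
    # Test step: the noisy distance must clear a log(1/delta)/eps threshold
    noisy_dist = dist_fn(x) + rng.laplace(scale=1.0 / eps)
    if noisy_dist <= np.log(1.0 / delta) / eps:
        return None  # "bottom": refuse rather than risk a privacy breach
    # Release step: noise scaled to the *proposed* (local) sensitivity
    return f(x) + rng.laplace(scale=proposed_sens / eps)

# Toy use with the median on tightly clustered data. The distance proxy
# below is an illustrative assumption, not a tight bound.
rng = np.random.default_rng(0)
x = np.sort(rng.normal(5.0, 0.01, size=1001))
m = len(x) // 2
b = 0.05  # proposed sensitivity bound

def dist_fn(data):
    # Count how many records around the median can change before the
    # gap near the median could exceed the proposed bound b.
    k = 0
    while m - k - 1 >= 0 and m + k + 1 < len(data) and \
            data[m + k + 1] - data[m - k - 1] <= b:
        k += 1
    return k

out = propose_test_release(x, np.median, b, dist_fn, eps=1.0, delta=1e-6, rng=rng)
```

Because the data are tightly clustered, the test passes and the released value is the true median plus small Laplace noise; on adversarial data the same call would return None.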
DP-HyPO: An Adaptive Private Hyperparameter Optimization Framework
Hyperparameter optimization, also known as hyperparameter tuning, is a widely
recognized technique for improving model performance. Regrettably, when
training private ML models, many practitioners often overlook the privacy risks
associated with hyperparameter optimization, which could potentially expose
sensitive information about the underlying dataset. Currently, the sole
existing approach to allow privacy-preserving hyperparameter optimization is to
uniformly and randomly select hyperparameters for a number of runs,
subsequently reporting the best-performing hyperparameter. In contrast, in
non-private settings, practitioners commonly utilize "adaptive" hyperparameter
optimization methods such as Gaussian process-based optimization, which select
the next candidate based on information gathered from previous outputs. This
substantial contrast between private and non-private hyperparameter
optimization underscores a critical concern. In our paper, we introduce
DP-HyPO, a pioneering framework for "adaptive" private hyperparameter
optimization, aiming to bridge the gap between private and non-private
hyperparameter optimization. To accomplish this, we provide a comprehensive
differential privacy analysis of our framework. Furthermore, we empirically
demonstrate the effectiveness of DP-HyPO on a diverse set of real-world and
synthetic datasets.
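For contrast with DP-HyPO's adaptive approach, the non-adaptive baseline described above can be sketched as follows. This is a hedged illustration: toy_train_eval is a hypothetical stand-in for a DP training-plus-evaluation routine, and the end-to-end privacy accounting for the tuning procedure is deliberately left to the existing theorems rather than re-derived here.

```python
import numpy as np

def private_random_tuning(train_eval, candidates, n_runs, rng):
    """Non-adaptive baseline: sample hyperparameters uniformly at random
    for a fixed number of runs and report the best one. Each train_eval
    call is assumed to be differentially private on its own (e.g. DP-SGD
    training plus a noisy validation score)."""
    best_score, best_hp = -np.inf, None
    for _ in range(n_runs):
        hp = candidates[rng.integers(len(candidates))]
        score = train_eval(hp, rng)
        if score > best_score:
            best_score, best_hp = score, hp
    return best_hp, best_score

# Hypothetical stand-in for a private train-and-evaluate routine:
# a noisy quadratic in log10(learning rate), peaking at lr = 0.1.
def toy_train_eval(lr, rng):
    return -(np.log10(lr) + 1.0) ** 2 + rng.normal(0.0, 0.01)

rng = np.random.default_rng(0)
candidates = [0.01, 0.1, 1.0]
best_hp, best_score = private_random_tuning(toy_train_eval, candidates, 20, rng)
```

Note that no run's outcome influences which hyperparameter the next run tries, which is exactly the gap an adaptive (e.g. Gaussian-process-based) private framework aims to close.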
Accelerated Federated Learning with Decoupled Adaptive Optimization
The federated learning (FL) framework enables edge clients to collaboratively
learn a shared inference model while keeping privacy of training data on
clients. Recently, many heuristic efforts have been made to generalize
centralized adaptive optimization methods, such as SGDM, Adam, and AdaGrad,
to federated settings for improving convergence and accuracy. However, there is
still a paucity of theoretical principles on where and how to design and
utilize adaptive optimization methods in federated settings. This work aims to
develop novel adaptive optimization methods for FL from the perspective of
dynamics of ordinary differential equations (ODEs). First, an analytic
framework is established to build a connection between federated optimization
methods and decompositions of ODEs of corresponding centralized optimizers.
Second, based on this analytic framework, a momentum decoupling adaptive
optimization method, FedDA, is developed to fully utilize the global momentum
on each local iteration and accelerate the training convergence. Last but not
least, full-batch gradients are utilized to mimic centralized optimization at
the end of the training process to ensure convergence and overcome the
possible inconsistency caused by adaptive optimization methods.
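The momentum-decoupling idea, applying a frozen global momentum at every local iteration and refreshing it from the round's pseudo-gradient, can be sketched on a toy two-client quadratic problem. This is an illustrative reading of the approach, not FedDA's exact update rule.

```python
import numpy as np

def local_update(w, grad_fn, global_momentum, lr, beta, steps):
    """Client-side sketch: mix the frozen global momentum into every
    local step alongside the fresh local gradient."""
    w = w.copy()
    for _ in range(steps):
        g = grad_fn(w)
        w = w - lr * (beta * global_momentum + (1.0 - beta) * g)
    return w

def server_round(w, grad_fns, momentum, lr, beta, steps):
    """One communication round: broadcast (w, momentum), average the
    returned client models, then refresh the global momentum from the
    round's pseudo-gradient."""
    client_ws = [local_update(w, g, momentum, lr, beta, steps) for g in grad_fns]
    w_new = np.mean(client_ws, axis=0)
    pseudo_grad = (w - w_new) / (lr * steps)
    momentum = beta * momentum + (1.0 - beta) * pseudo_grad
    return w_new, momentum

# Two clients with quadratic objectives f_i(w) = 0.5 * ||w - c_i||^2;
# the optimum of the averaged objective is (c1 + c2) / 2 = (0.5, 0.5).
c1, c2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
grad_fns = [lambda w: w - c1, lambda w: w - c2]
w, momentum = np.zeros(2), np.zeros(2)
for _ in range(50):
    w, momentum = server_round(w, grad_fns, momentum, lr=0.1, beta=0.5, steps=5)
```

On this heterogeneous toy problem the iterates converge to the global optimum rather than oscillating between the two client optima, which is the effect the global momentum is meant to provide.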
Leveraging Privacy In Data Analysis
Data analysis is inherently adaptive, where previous results may influence which tests are carried out on a single dataset as part of a series of exploratory analyses. Unfortunately, classical statistical tools break down once the choice of analysis may depend on the dataset, which leads to overfitting and spurious conclusions. In this dissertation we put constraints on what type of analyses can be used adaptively on the same dataset in order to ensure valid conclusions are made. Following a line of work initiated from Dwork et al. [2015], we focus on extending the connection between differential privacy and adaptive data analysis.
Our first contribution follows work presented in Rogers et al. [2016]. We generalize and unify previous works in the area by showing that the generalization properties of (approximately) differentially private algorithms can be used to give valid p-value corrections in adaptive hypothesis testing while recovering results for statistical and low-sensitivity queries. One of the main benefits of differential privacy is that it composes, i.e. the combination of several differentially private algorithms is itself differentially private and the privacy parameters degrade sublinearly. However, we can only apply the composition theorems when the privacy parameters are all fixed up front. Our second contribution then presents a framework for obtaining composition theorems when the privacy parameters, along with the number of procedures to be used, need not be fixed up front and can be adjusted adaptively (Rogers et al. [2016]). These contributions are only useful if there actually exist some differentially private procedures that a data analyst would want to use. Hence, we present differentially private hypothesis tests for categorical data based on the classical chi-square hypothesis tests (Gaboardi et al. [2016], Kifer and Rogers [2017]).
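As a flavor of the differentially private chi-square tests mentioned above, a common template (in the spirit of Gaboardi et al. [2016]; simplified here, and omitting the adjusted critical values those papers derive) perturbs the histogram with Laplace noise before forming the statistic:

```python
import numpy as np

def private_chi2_gof(counts, expected_probs, eps, rng):
    """Laplace-perturbed chi-square goodness-of-fit statistic (simplified).
    One record changing its category alters two cells by 1 each, so the
    histogram's L1 sensitivity is 2 and each cell receives Laplace(2/eps)
    noise. Valid inference requires critical values adjusted for this
    noise; here we only compute the statistic from the noisy counts."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    noisy = counts + rng.laplace(scale=2.0 / eps, size=counts.shape)
    expected = n * np.asarray(expected_probs, dtype=float)
    return float(((noisy - expected) ** 2 / expected).sum())

rng = np.random.default_rng(0)
counts = [260, 240, 255, 245]                     # observed histogram, n = 1000
stat = private_chi2_gof(counts, [0.25] * 4, eps=1.0, rng=rng)
```

Comparing this noisy statistic against the uncorrected chi-square distribution would inflate the false-positive rate, which is precisely why the cited works derive new asymptotic distributions for the private tests.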
Bounded-Leakage Differential Privacy
We introduce and study a relaxation of differential privacy [Dwork et al., 2006] that accounts for mechanisms that leak some additional, bounded information about the database. We apply this notion to reason about two distinct settings where the notion of differential privacy is of limited use. First, we consider cases, such as in the 2020 US Census [Abowd, 2018], in which some information about the database is released exactly or with small noise. Second, we consider the accumulation of privacy harms for an individual across studies that may not even include the data of this individual. The tools that we develop for bounded-leakage differential privacy allow us to reason about privacy loss in these settings and to show that individuals retain some meaningful protections.
Private Learning Implies Online Learning: An Efficient Reduction
We study the relationship between the notions of differentially private
learning and online learning in games. Several recent works have shown that
differentially private learning implies online learning, but an open problem of
Neel, Roth, and Wu [2018] asks whether this implication is
"efficient". Specifically, does an efficient differentially private learner
imply an efficient online learner? In this paper we resolve this open question
in the context of pure differential privacy. We derive an efficient black-box
reduction from differentially private learning to online learning from expert
advice.