LOCKS: User Differentially Private and Federated Optimal Client Sampling
With changes in privacy laws, there is often a hard requirement for client
data to remain on the device rather than being sent to the server. Therefore,
most processing happens on the device, and only a transformed update is sent to
the server. Such mechanisms are developed by leveraging differential privacy
and federated learning. Differential privacy adds noise to the client outputs
and thus deteriorates the quality of each iteration. This distributed setting
adds a layer of complexity and additional communication and performance
overhead. These costs are additive per round, so we need to reduce the number
of iterations. In this work, we provide an analytical framework for studying
the convergence guarantees of gradient-based distributed algorithms. We show
that our private algorithm minimizes the expected gradient variance within a
number of rounds that depends on d, the dimensionality of the model. We
discuss and suggest novel ways to improve the convergence rate to minimize the
overhead using Importance Sampling (IS) and gradient diversity. Finally, we
provide alternative frameworks that might be better suited to exploit client
sampling techniques like IS and gradient diversity.
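The two ingredients the abstract combines — privatizing each client's update before it leaves the device, and Importance Sampling (IS) of clients — can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names, the Gaussian mechanism with norm clipping, and the gradient-norm-proportional sampling heuristic are all assumptions for illustration.

```python
import numpy as np

def private_client_update(grad, clip_norm=1.0, noise_mult=1.0, rng=None):
    """Clip a client gradient to clip_norm and add Gaussian noise.
    (Standard Gaussian mechanism sketch; the paper's exact mechanism
    and noise calibration may differ.)"""
    rng = rng or np.random.default_rng()
    norm = max(np.linalg.norm(grad), 1e-12)
    clipped = grad * min(1.0, clip_norm / norm)
    noise = rng.normal(0.0, noise_mult * clip_norm, size=grad.shape)
    return clipped + noise

def importance_sample_clients(grad_norms, k, rng=None):
    """Select k clients with probability proportional to their gradient
    norm -- one common IS heuristic; illustrative only."""
    rng = rng or np.random.default_rng()
    p = np.asarray(grad_norms, dtype=float)
    p = p / p.sum()
    return rng.choice(len(grad_norms), size=k, replace=False, p=p)
```

The intuition matches the abstract's argument: since the DP noise cost is paid every round, selecting the most informative clients each round (here, those with large gradients) aims to reduce the total number of rounds needed.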
TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data
Distributed training of deep neural networks faces three critical challenges:
privacy preservation, communication efficiency, and robustness to fault and
adversarial behaviors. Although significant research efforts have been devoted
to addressing these challenges independently, their synthesis remains less
explored. In this paper, we propose TernaryVote, which combines a ternary
compressor and the majority vote mechanism to realize differential privacy,
gradient compression, and Byzantine resilience simultaneously. We theoretically
quantify the privacy guarantee through the lens of the emerging f-differential
privacy (DP) and the Byzantine resilience of the proposed algorithm.
Particularly, in terms of privacy guarantees, compared to the existing
sign-based approach StoSign, the proposed method improves the dimension
dependence on the gradient size and enjoys privacy amplification by mini-batch
sampling while ensuring a comparable convergence rate. We also prove that
TernaryVote is robust when less than 50% of workers are blind attackers, which
matches that of SIGNSGD with majority vote. Extensive experimental results
validate the effectiveness of the proposed algorithm.
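The core mechanism the abstract describes — each worker sends a ternary-compressed gradient and the server aggregates by coordinate-wise majority vote — can be sketched as below. This is an illustrative scheme under assumed details (stochastic quantization probabilities, clipping), not necessarily TernaryVote's exact compressor.

```python
import numpy as np

def ternary_compress(grad, clip=1.0, rng=None):
    """Stochastically quantize each coordinate to {-1, 0, +1}: after
    clipping to [-clip, clip], coordinate g_i becomes sign(g_i) with
    probability |g_i|/clip and 0 otherwise. Illustrative only."""
    rng = rng or np.random.default_rng()
    g = np.clip(grad, -clip, clip)
    p = np.abs(g) / clip
    keep = rng.random(g.shape) < p
    return np.sign(g) * keep

def majority_vote(ternary_grads):
    """Coordinate-wise majority vote over workers' ternary messages:
    the sign of the per-coordinate sum."""
    return np.sign(np.sum(ternary_grads, axis=0))
```

The vote gives the Byzantine resilience the abstract refers to: as long as fewer than half the workers send adversarial signs on a coordinate, the honest majority determines the aggregated direction, as in SIGNSGD with majority vote.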
Data Analytics with Differential Privacy
Differential privacy is the state-of-the-art definition for privacy,
guaranteeing that any analysis performed on a sensitive dataset leaks almost no
information about the individuals whose data are contained therein. In this
thesis, we develop differentially private algorithms to analyze distributed and
streaming data. In the distributed model, we consider the particular problem of
learning -- in a distributed fashion -- a global model of the data that can
subsequently be used for arbitrary analyses. We build upon PrivBayes, a
differentially private method that approximates the high-dimensional
distribution of a centralized dataset as a product of low-order distributions,
utilizing a Bayesian Network model. We examine three novel approaches to
learning a global Bayesian Network from distributed data, while offering the
differential privacy guarantee to all local datasets. Our work includes a
detailed theoretical analysis of the distributed, differentially private
entropy estimator which we use in one of our algorithms, as well as a detailed
experimental evaluation, using both synthetic and real-world data. In the
streaming model, we focus on the problem of estimating the density of a stream
of users, which expresses the fraction of all users that actually appear in the
stream. We offer one of the strongest privacy guarantees for the streaming
model, user-level pan-privacy, which ensures that the privacy of any user is
protected, even against an adversary that observes the internal state of the
algorithm. We provide a detailed analysis of an existing, sampling-based
algorithm for the problem and propose two novel modifications that
significantly improve it, both theoretically and experimentally, by optimally
using all the allocated "privacy budget."
Comment: Diploma Thesis, School of Electrical and Computer Engineering,
Technical University of Crete, Chania, Greece, 201
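The streaming problem in this abstract — estimating the fraction of all users that appear in a stream, under user-level pan-privacy — is classically solved by sampling users and keeping one randomized bit per sampled user, re-randomized with a bias whenever that user appears. The sketch below follows that design; the bias constants, function name, and unbiasing step are illustrative assumptions, not the thesis's specific algorithm.

```python
import math
import random

def pan_private_density(stream, universe, eps=1.0, rng=None):
    """Estimate the density of `stream` over `universe` (fraction of
    users that appear at least once), pan-privately: the internal
    state is one near-uniform random bit per user, so an adversary
    observing the state learns little about any individual."""
    rng = rng or random.Random()
    p1 = math.exp(eps) / (math.exp(eps) + 1)  # bit bias after an appearance
    p0 = 0.5                                  # unbiased initial bit
    state = {u: rng.random() < p0 for u in universe}
    for item in stream:
        if item in state:
            state[item] = rng.random() < p1   # re-randomize on appearance
    frac_ones = sum(state.values()) / len(state)
    # Unbias: E[frac_ones] = p0*(1 - d) + p1*d, solve for density d.
    return (frac_ones - p0) / (p1 - p0)
```

Because every stored bit is individually random whether or not its user appeared, the state reveals little even to an intruder who sees it, which is the pan-privacy guarantee the abstract highlights; the two modifications the thesis proposes concern spending the privacy budget more efficiently within this kind of scheme.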