50,366 research outputs found

    LOCKS: User Differentially Private and Federated Optimal Client Sampling

    Full text link
    With changes in privacy laws, there is often a hard requirement for client data to remain on the device rather than being sent to the server. Therefore, most processing happens on the device, and only an altered element is sent to the server. Such mechanisms are developed by leveraging differential privacy and federated learning. Differential privacy adds noise to the client outputs and thus deteriorates the quality of each iteration. This distributed setting adds a layer of complexity and additional communication and performance overhead. These costs are additive per round, so we need to reduce the number of iterations. In this work, we provide an analytical framework for studying the convergence guarantees of gradient-based distributed algorithms. We show that our private algorithm minimizes the expected gradient variance by approximately d2d^2 rounds, where d is the dimensionality of the model. We discuss and suggest novel ways to improve the convergence rate to minimize the overhead using Importance Sampling (IS) and gradient diversity. Finally, we provide alternative frameworks that might be better suited to exploit client sampling techniques like IS and gradient diversity

    TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data

    Full text link
    Distributed training of deep neural networks faces three critical challenges: privacy preservation, communication efficiency, and robustness to fault and adversarial behaviors. Although significant research efforts have been devoted to addressing these challenges independently, their synthesis remains less explored. In this paper, we propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously. We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm. Particularly, in terms of privacy guarantees, compared to the existing sign-based approach StoSign, the proposed method improves the dimension dependence on the gradient size and enjoys privacy amplification by mini-batch sampling while ensuring a comparable convergence rate. We also prove that TernaryVote is robust when less than 50% of workers are blind attackers, which matches that of SIGNSGD with majority vote. Extensive experimental results validate the effectiveness of the proposed algorithm

    Data Analytics with Differential Privacy

    Full text link
    Differential privacy is the state-of-the-art definition for privacy, guaranteeing that any analysis performed on a sensitive dataset leaks no information about the individuals whose data are contained therein. In this thesis, we develop differentially private algorithms to analyze distributed and streaming data. In the distributed model, we consider the particular problem of learning -- in a distributed fashion -- a global model of the data, that can subsequently be used for arbitrary analyses. We build upon PrivBayes, a differentially private method that approximates the high-dimensional distribution of a centralized dataset as a product of low-order distributions, utilizing a Bayesian Network model. We examine three novel approaches to learning a global Bayesian Network from distributed data, while offering the differential privacy guarantee to all local datasets. Our work includes a detailed theoretical analysis of the distributed, differentially private entropy estimator which we use in one of our algorithms, as well as a detailed experimental evaluation, using both synthetic and real-world data. In the streaming model, we focus on the problem of estimating the density of a stream of users, which expresses the fraction of all users that actually appear in the stream. We offer one of the strongest privacy guarantees for the streaming model, user-level pan-privacy, which ensures that the privacy of any user is protected, even against an adversary that observes the internal state of the algorithm. We provide a detailed analysis of an existing, sampling-based algorithm for the problem and propose two novel modifications that significantly improve it, both theoretically and experimentally, by optimally using all the allocated "privacy budget."Comment: Diploma Thesis, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 201
    • …
    corecore