2,785 research outputs found

    How to Incentivize Data-Driven Collaboration Among Competing Parties

    Full text link
    The availability of vast amounts of data is changing how we can make medical discoveries, predict global market trends, save energy, and develop educational strategies. In some settings such as Genome Wide Association Studies or deep learning, sheer size of data seems critical. When data is held distributedly by many parties, they must share it to reap its full benefits. One obstacle to this revolution is the lack of willingness of different parties to share data, due to reasons such as loss of privacy or competitive edge. Cryptographic works address privacy aspects, but shed no light on individual parties' losses/gains when access to data carries tangible rewards. Even if it is clear that better overall conclusions can be drawn from collaboration, are individual collaborators better off by collaborating? Addressing this question is the topic of this paper. * We formalize a model of n-party collaboration for computing functions over private inputs in which participants receive their outputs in sequence, and the order depends on their private inputs. Each output "improves" on preceding outputs according to a score function. * We say a mechanism for collaboration achieves collaborative equilibrium if it ensures higher reward for all participants when collaborating (rather than working alone). We show that in general, computing a collaborative equilibrium is NP-complete, yet we design efficient algorithms to compute it in a range of natural model settings. Our collaboration mechanisms are in the standard model, and thus require a central trusted party; however, we show this assumption is unnecessary under standard cryptographic assumptions. We show how to implement the mechanisms in a decentralized way with new extensions of secure multiparty computation that impose order/timing constraints on output delivery to different players, as well as privacy and correctness

    Advances in privacy-preserving machine learning

    Get PDF
    Building useful predictive models often involves learning from personal data. For instance, companies use customer data to target advertisements, online education platforms collect student data to recommend content and improve user engagement, and medical researchers fit diagnostic models to patient data. A recent line of research aims to design learning algorithms that provide rigorous privacy guarantees for user data, in the sense that their outputs---models or predictions---leak as little information as possible about individuals in the training data. The goal of this dissertation is to design private learning algorithms with performance comparable to the best possible non-private ones. We quantify privacy using \emph{differential privacy}, a well-studied privacy notion that limits how much information is leaked about an individual by the output of an algorithm. Training a model using a differentially private algorithm prevents an adversary from confidently determining whether a specific person's data was used for training the model. We begin by presenting a technique for practical differentially private convex optimization that can leverage any off-the-shelf optimizer as a black box. We also perform an extensive empirical evaluation of the state-of-the-art algorithms on a range of publicly available datasets, as well as in an industry application. Next, we present a learning algorithm that outputs a private classifier when given black-box access to a non-private learner and a limited amount of unlabeled public data. We prove that the accuracy guarantee of our private algorithm in the PAC model of learning is comparable to that of the underlying non-private learner. Such a guarantee is not possible, in general, without public data. Lastly, we consider building recommendation systems, which we model using matrix completion. We present the first algorithm for matrix completion with provable user-level privacy and accuracy guarantees. Our algorithm consistently outperforms the state-of-the-art private algorithms on a suite of datasets. Along the way, we give an optimal algorithm for differentially private singular vector computation which leads to significant savings in terms of space and time when operating on sparse matrices. It can also be used for private low-rank approximation

    Efficient Federated Learning over Multiple Access Channel with Differential Privacy Constraints

    Full text link
    In this paper, the problem of federated learning (FL) through digital communication between clients and a parameter server (PS) over a multiple access channel (MAC), also subject to differential privacy (DP) constraints, is studied. More precisely, we consider the setting in which clients in a centralized network are prompted to train a machine learning model using their local datasets. The information exchange between the clients and the PS takes places over a MAC channel and must also preserve the DP of the local datasets. Accordingly, the objective of the clients is to minimize the training loss subject to (i) rate constraints for reliable communication over the MAC and (ii) DP constraint over the local datasets. For this optimization scenario, we proposed a novel consensus scheme in which digital distributed stochastic gradient descent (D-DSGD) is performed by each client. To preserve DP, a digital artificial noise is also added by the users to the locally quantized gradients. The performance of the scheme is evaluated in terms of the convergence rate and DP level for a given MAC capacity. The performance is optimized over the choice of the quantization levels and the artificial noise parameters. Numerical evaluations are presented to validate the performance of the proposed scheme
    • …
    corecore