
    Secure Computation for Machine Learning With SPDZ

    Secure Multi-Party Computation (MPC) is an area of cryptography that enables computation on sensitive data from multiple sources while maintaining privacy guarantees. However, theoretical MPC protocols often do not scale efficiently to real-world data. This project investigates the efficiency of the SPDZ framework, which provides an implementation of an MPC protocol with malicious security, in the context of popular machine learning (ML) algorithms. In particular, we chose applications such as linear regression and logistic regression, which have been implemented and evaluated using semi-honest MPC techniques. We demonstrate that the SPDZ framework outperforms these previous implementations while providing stronger security. Comment: 32nd Conference on Neural Information Processing Systems (NIPS 2018).
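    For intuition, the sketch below shows the additive secret sharing over a prime field that sits at the core of SPDZ-style protocols: shares sum to the secret modulo a prime, and linear operations are computed share-wise with no communication. This is a minimal illustration, not the SPDZ implementation itself; the field size and party count are arbitrary, and real SPDZ additionally attaches information-theoretic MACs to every share to achieve malicious security.

```python
import secrets

P = 2**61 - 1  # illustrative prime field; SPDZ negotiates its own modulus

def share(secret: int, n_parties: int) -> list[int]:
    """Split a field element into n additive shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine all shares; any strict subset is uniformly random."""
    return sum(shares) % P

# Additions (and any linear function) work share-wise, without interaction:
x_shares = share(42, 3)
y_shares = share(7, 3)
z_shares = [(a + b) % P for a, b in zip(x_shares, y_shares)]
assert reconstruct(z_shares) == 49
```

    Multiplication of two shared values is the expensive step and relies on preprocessed correlated randomness, which is where most of the protocol's cost lies.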

    A generic framework for privacy preserving deep learning

    We detail a new framework for privacy preserving deep learning and discuss its assets. The framework puts a premium on ownership and secure processing of data and introduces a valuable representation based on chains of commands and tensors. This abstraction allows one to implement complex privacy preserving constructs such as Federated Learning, Secure Multiparty Computation, and Differential Privacy while still exposing a familiar deep learning API to the end-user. We report early results on the Boston Housing and Pima Indian Diabetes datasets. While the privacy features apart from Differential Privacy do not impact the prediction accuracy, the current implementation of the framework introduces a significant overhead in performance, which will be addressed at a later stage of the development. We believe this work is an important milestone introducing the first reliable, general framework for privacy preserving deep learning. Comment: PPML 2018, 5 pages.
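    The "chains of commands and tensors" abstraction can be pictured as a tensor-like handle that forwards operations to wherever the data actually lives. The sketch below is hypothetical: the class and method names (Worker, PointerTensor) are invented for illustration, not the framework's API; it only conveys the idea of computing on remote data through a familiar tensor interface.

```python
# Hypothetical illustration of a "chain of commands" abstraction: a
# tensor-like handle that forwards operations to the machine holding the
# data. All names here are invented for this sketch.

class Worker:
    """Stands in for a remote machine that holds raw data for its owner."""
    def __init__(self, name):
        self.name, self.store = name, {}

class PointerTensor:
    """Local handle to data on a worker; operations run where the data lives."""
    def __init__(self, worker, key):
        self.worker, self.key = worker, key

    def __add__(self, other):
        # For simplicity, both operands are assumed to live on the same worker.
        a = self.worker.store[self.key]
        b = other.worker.store[other.key]
        key = f"({self.key}+{other.key})"
        self.worker.store[key] = [x + y for x, y in zip(a, b)]
        return PointerTensor(self.worker, key)  # only a reference comes back

alice = Worker("alice")
alice.store["w"] = [1.0, 2.0]
alice.store["g"] = [0.1, 0.2]
w, g = PointerTensor(alice, "w"), PointerTensor(alice, "g")
w_new = w + g  # executed on alice's side; raw values never leave her worker
```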

    Private Machine Learning in TensorFlow using Secure Computation

    We present a framework for experimenting with secure multi-party computation directly in TensorFlow. By doing so we benefit from several properties valuable to both researchers and practitioners, including tight integration with ordinary machine learning processes, existing optimizations for distributed computation in TensorFlow, high-level abstractions for expressing complex algorithms and protocols, and an expanded set of familiar tooling. We give an open source implementation of a state-of-the-art protocol and report on concrete benchmarks using typical models from private machine learning.
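    As a rough illustration of the kind of computation such a framework expresses (plain NumPy here, not the framework's API): a linear layer can be evaluated on a secret-shared input with public weights, because each compute server can apply the weights to its own share locally. Multiplying two secret-shared tensors would additionally require preprocessed multiplication triples, omitted in this sketch.

```python
import numpy as np

P = 2**31 - 1  # toy modulus; real frameworks use much larger rings or fields

def share(x):
    """Split an integer tensor into two additive shares modulo P."""
    r = np.random.randint(0, P, size=x.shape, dtype=np.int64)
    return r, (x - r) % P

x = np.random.randint(0, 100, size=(1, 4)).astype(np.int64)  # private input
W = np.random.randint(0, 100, size=(4, 2)).astype(np.int64)  # public weights

x0, x1 = share(x)                 # each compute server holds one share of x
y0 = (x0 @ W) % P                 # server 0 works on its share only
y1 = (x1 @ W) % P                 # server 1 works on its share only
assert np.array_equal((y0 + y1) % P, (x @ W) % P)  # output reconstructs
```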

    STAR: Statistical Tests with Auditable Results

    We present STAR: a novel system aimed at solving the complex issue of "p-hacking" and false discoveries in scientific studies. STAR provides a concrete way of ensuring the application of false discovery control procedures in hypothesis testing, with mathematically provable guarantees, with the goal of reducing the risk of data dredging. STAR generates an efficiently auditable certificate that attests to the validity of each statistical test performed on a dataset. STAR achieves this by combining several cryptographic techniques specifically for this purpose. Under the hood, STAR uses a decentralized set of authorities (e.g., research institutions), secure computation techniques, and an append-only ledger, which together enable auditing of scientific claims by third parties and match real-world trust assumptions. We implement and evaluate a construction of STAR using the Microsoft SEAL encryption library and the SPDZ multi-party computation protocol. Our experimental evaluation demonstrates the practicality of STAR in multiple real-world scenarios as a system for certifying scientific discoveries in a tamper-proof way.
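    The append-only ledger component can be approximated by a simple hash chain in which every entry commits to its predecessor, so earlier test records cannot be rewritten without detection. The sketch below is illustrative only; the record fields and format are invented, not STAR's.

```python
import hashlib, json

# Toy append-only ledger in the spirit of STAR's audit log: each entry
# commits to the previous entry's hash. Record fields are invented.

ledger = []

def entry_hash(prev: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev + payload).encode()).hexdigest()

def append_record(record: dict) -> str:
    prev = ledger[-1]["hash"] if ledger else "genesis"
    h = entry_hash(prev, record)
    ledger.append({"prev": prev, "record": record, "hash": h})
    return h

def verify_ledger() -> bool:
    """Recompute the chain; any rewritten past entry breaks verification."""
    prev = "genesis"
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True

append_record({"test": "t-test", "dataset": "trial-A", "p_value": 0.03})
append_record({"test": "chi2", "dataset": "trial-A", "p_value": 0.21})
assert verify_ledger()
```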

    A Practical Scheme for Two-Party Private Linear Least Squares

    Privacy-preserving machine learning concerns learning from sensitive datasets that are typically distributed across multiple data owners. It is a considerable challenge in the many realistic scenarios where no trusted third party can play the role of a mediator, and the strong decentralization of these scenarios requires tools from both the cryptography and distributed systems communities. In this paper, we present a practical scheme that is suitable for a subclass of machine learning algorithms and identify directions for future research. We present a scheme to learn a linear least squares model across two parties using a gradient descent approach and additive homomorphic encryption. The protocol requires two rounds of communication per step of gradient descent. We detail our approach, including a fixed-point encoding scheme and one-time random pads for hiding intermediate results.
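    The fixed-point encoding mentioned above can be sketched as follows: reals are scaled by a power of two and reduced modulo a large integer, so that adding encodings matches adding the underlying reals (up to rounding), which is exactly the property an additively homomorphic ciphertext space preserves. The parameters below are illustrative, not the paper's.

```python
# Minimal fixed-point encoding of the kind an additively homomorphic
# gradient-descent protocol needs. Parameters are illustrative.

PRECISION = 16               # fractional bits
MODULUS = 2**64              # must exceed the magnitude of any intermediate

def encode(x: float) -> int:
    return round(x * 2**PRECISION) % MODULUS

def decode(e: int) -> float:
    if e >= MODULUS // 2:    # interpret the upper half as negative values
        e -= MODULUS
    return e / 2**PRECISION

# Addition of encodings corresponds to addition of the underlying reals:
a, b = 1.25, -0.5
assert abs(decode((encode(a) + encode(b)) % MODULUS) - (a + b)) < 1e-4
```

    Note that multiplying two encodings doubles the scaling factor, which is one reason such protocols must rescale intermediate products between gradient steps.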

    Helen: Maliciously Secure Coopetitive Learning for Linear Models

    Many organizations wish to collaboratively train machine learning models on their combined datasets for a common benefit (e.g., better medical research or fraud detection). However, they often cannot share their plaintext datasets due to privacy concerns and/or business competition. In this paper, we design and build Helen, a system that allows multiple parties to train a linear model without revealing their data, a setting we call coopetitive learning. Compared to prior secure training systems, Helen protects against a much stronger adversary who is malicious and can compromise m-1 out of m parties. Our evaluation shows that Helen can achieve performance improvements of up to five orders of magnitude compared to training with an existing state-of-the-art secure multi-party computation framework.

    Privacy Preserving Vertical Federated Learning for Tree-based Models

    Federated learning (FL) is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. This paper studies vertical federated learning, which tackles scenarios where (i) collaborating organizations own data of the same set of users but with disjoint features, and (ii) only one organization holds the labels. We propose Pivot, a novel solution for privacy preserving vertical decision tree training and prediction, ensuring that no intermediate information is disclosed other than what the clients have agreed to release (i.e., the final tree model and the prediction output). Pivot does not rely on any trusted third party and provides protection against a semi-honest adversary that may compromise m-1 out of m clients. We further identify two privacy leakages when the trained decision tree model is released in plaintext and propose an enhanced protocol to mitigate them. The proposed solution can also be extended to tree ensemble models, e.g., random forest (RF) and gradient boosting decision tree (GBDT), by treating single decision trees as building blocks. Theoretical and experimental analysis suggest that Pivot is efficient for the level of privacy it achieves. Comment: Proc. VLDB Endow. 13(11): 2090-2103 (2020).
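    The vertical setting can be pictured concretely with the toy plaintext layout below (all names invented, not Pivot's protocol): both clients index the same users, hold disjoint features, and only one holds the labels, so even evaluating a single split candidate requires combining one party's feature values with the other's labels, which Pivot performs under cryptographic protection.

```python
# Toy plaintext layout of the vertical setting: two clients, same users,
# disjoint features, labels only at client A. Names are invented.

users = ["u1", "u2", "u3", "u4"]

client_a = {
    "features": {"age": [35, 52, 41, 28], "income": [40, 85, 60, 30]},
    "labels": [1, 0, 1, 0],          # only client A holds the labels
}
client_b = {
    "features": {"balance": [0.2, 0.9, 0.4, 0.7]},  # no labels here
}

# Evaluating the split candidate "balance <= 0.5" needs B's feature values
# and A's labels together; in Pivot this happens under MPC/homomorphic
# encryption so neither side learns the other's data. In plaintext:
threshold = 0.5
pairs = list(zip(client_b["features"]["balance"], client_a["labels"]))
left = [y for x, y in pairs if x <= threshold]   # labels in the left child
right = [y for x, y in pairs if x > threshold]   # labels in the right child
```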

    Secure and Efficient Federated Transfer Learning

    Machine learning models require a vast amount of data for accurate training. In reality, most data is scattered across different organizations and cannot be easily integrated under many legal and practical constraints. Federated Transfer Learning (FTL) was introduced in [1] to improve statistical models under a data federation, allowing knowledge to be shared without compromising user privacy and enabling complementary knowledge to be transferred across the network. As a result, a target-domain party can build more flexible and powerful models by leveraging rich labels from a source-domain party. However, the excessive computational overhead of the security protocol involved in this model rendered it impractical. In this work, we aim to enhance the efficiency and security of existing models for practical collaborative training under a data federation by incorporating Secret Sharing (SS). In the literature, only the semi-honest model for Federated Transfer Learning has been considered. In this paper, we improve upon the previous solution and also allow malicious players who can arbitrarily deviate from the protocol in our FTL model. This is much stronger than the semi-honest model, where parties are assumed to follow the protocol precisely. We do so using SPDZ, a practical MPC protocol, so our model can be efficiently extended to any number of parties even in the case of a dishonest majority. In addition, the models evaluated in our setting significantly outperform the previous work in terms of both runtime and communication cost: a single iteration on 500 samples executes in 0.8 seconds in the semi-honest case and 1.4 seconds in the malicious case, compared to 35 seconds for the previous implementation. Comment: Special Track on Federated Machine Learning in IEEE BigData 2019.
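    The secret-sharing multiplication that SPDZ-style protocols rely on uses preprocessed Beaver triples. The sketch below shows the standard two-party, triple-based multiplication over a prime field; it is a minimal illustration with arbitrary parameters, one fresh triple per multiplication, and it omits the share MACs that give SPDZ its malicious security.

```python
import secrets

P = 2**61 - 1  # illustrative prime field

def share2(v):
    """Additively share a field element between two parties."""
    r = secrets.randbelow(P)
    return [r, (v - r) % P]

def open2(sh):
    return (sh[0] + sh[1]) % P

# Preprocessing: a random triple with a*b = c, secret-shared between
# the parties. Each triple may be used for exactly one multiplication.
a, b = secrets.randbelow(P), secrets.randbelow(P)
a_sh, b_sh, c_sh = share2(a), share2(b), share2((a * b) % P)

def mul(x_sh, y_sh):
    # Parties open the masked values eps = x - a and delta = y - b ...
    eps = open2([(x_sh[i] - a_sh[i]) % P for i in range(2)])
    delta = open2([(y_sh[i] - b_sh[i]) % P for i in range(2)])
    # ... then derive shares of x*y locally; the public term is added once.
    z_sh = [(c_sh[i] + eps * b_sh[i] + delta * a_sh[i]) % P for i in range(2)]
    z_sh[0] = (z_sh[0] + eps * delta) % P
    return z_sh

x_sh, y_sh = share2(6), share2(7)
assert open2(mul(x_sh, y_sh)) == 42
```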

    Data Querying and Access Control for Secure Multiparty Computation

    In the Internet of Things and smart environments, data collected from distributed sensors is typically stored and processed by a central middleware. This allows applications to query the data they need for providing further services. However, centralizing the data causes several privacy threats: the middleware becomes a third party which has to be trusted, linkage and correlation of data from different contexts become possible, and data subjects lose control over their data. Hence, approaches other than centralized processing should be considered. Here, Secure Multiparty Computation (SMC) is a promising candidate for secure and privacy-preserving computation happening close to the sources of the data. In order to make SMC fit for application in these contexts, we extend SMC to act as a service: we provide elements which allow third parties to query computed data from a group of peers performing SMC. Furthermore, we establish fine-granular access control on the level of individual data queries, yielding data protection of the computed results. By adding measures to inform data sources about requests and the usage of their data, we show how a fully privacy-preserving service can be built on the foundation of SMC.
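    Fine-granular access control at the query level can be pictured as a policy check that gates every request before any computation is triggered, plus a log that informs the data sources about who asked for what. The sketch below is illustrative only; all names and the query vocabulary are invented, not the paper's implementation.

```python
# Illustrative per-query access control in front of an SMC group.
# All names here are invented for this sketch.

class SMCGroup:
    """Stand-in for peers that jointly compute an aggregate via SMC."""
    def __init__(self, peer_values):
        self.peer_values = peer_values   # in reality, never centralized
        self.access_log = []             # lets data sources see all requests

    def compute(self, query):
        # Placeholder for the actual SMC protocol run among the peers.
        if query == "avg_temperature":
            return sum(self.peer_values) / len(self.peer_values)
        if query == "max_occupancy":
            return max(self.peer_values)
        raise ValueError(f"unknown query: {query}")

POLICY = {
    "heating-app":   {"avg_temperature"},
    "facility-mgmt": {"avg_temperature", "max_occupancy"},
}

def handle_query(requester, query, group):
    """Authorize, compute, and record a single data query."""
    if query not in POLICY.get(requester, set()):
        raise PermissionError(f"{requester} may not run {query}")
    result = group.compute(query)
    group.access_log.append((requester, query))
    return result

group = SMCGroup([20.5, 21.0, 19.5])
print(handle_query("heating-app", "avg_temperature", group))
```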

    Accelerating 2PC-based ML with Limited Trusted Hardware

    This paper describes the design, implementation, and evaluation of Otak, a system that allows two non-colluding cloud providers to run machine learning (ML) inference without knowing the inputs to inference. Prior work for this problem mostly relies on advanced cryptography such as two-party secure computation (2PC) protocols that provide rigorous guarantees but suffer from high resource overhead. Otak improves efficiency via a new 2PC protocol that (i) tailors recent primitives such as function and homomorphic secret sharing to ML inference, and (ii) uses trusted hardware in a limited capacity to bootstrap the protocol. At the same time, Otak reduces trust assumptions on trusted hardware by running only a small amount of code inside the hardware, restricting its use to a preprocessing step, and distributing trust over heterogeneous trusted hardware platforms from different vendors. An implementation and evaluation of Otak demonstrates that its CPU and network overhead, converted to a dollar amount, is 5.4-385× lower than state-of-the-art 2PC-based works. Besides, Otak's trusted computing base (the code inside trusted hardware) is only 1,300 lines of code, which is 14.6-29.2× lower than the code size in prior trusted hardware-based works. Comment: 19 pages.
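    For intuition about the function secret sharing (FSS) primitive Otak tailors to ML inference: two servers hold shares of a point function whose evaluations sum to the function's output, while neither share alone reveals where the nonzero point is. The sketch below is a naive baseline that additively shares the entire truth table, so it is exponential in the domain size where real FSS keys are succinct; it conveys only the correctness and privacy properties, not the actual construction.

```python
import secrets

# Naive baseline for point-function secret sharing: additively share the
# truth table of f, where f(alpha) = beta and f(x) = 0 elsewhere. Real FSS
# achieves keys logarithmic in the domain size; this sketch does not.

P = 2**61 - 1
DOMAIN = 8          # tiny illustrative domain
alpha, beta = 3, 1  # the single nonzero point of f

table = [beta if x == alpha else 0 for x in range(DOMAIN)]
share0 = [secrets.randbelow(P) for _ in range(DOMAIN)]
share1 = [(table[x] - share0[x]) % P for x in range(DOMAIN)]

# Each server evaluates its share at any x; the sum reconstructs f(x),
# while each share on its own is uniformly random.
for x in range(DOMAIN):
    assert (share0[x] + share1[x]) % P == table[x]
```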