322 research outputs found
Secure Computation for Machine Learning With SPDZ
Secure Multi-Party Computation (MPC) is an area of cryptography that enables
computation on sensitive data from multiple sources while maintaining privacy
guarantees. However, theoretical MPC protocols often do not scale efficiently
to real-world data. This project investigates the efficiency of the SPDZ
framework, which provides an implementation of an MPC protocol with malicious
security, in the context of popular machine learning (ML) algorithms. In
particular, we chose applications such as linear regression and logistic
regression, which have been implemented and evaluated using semi-honest MPC
techniques. We demonstrate that the SPDZ framework outperforms these previous
implementations while providing stronger security.
Comment: 32nd Conference on Neural Information Processing Systems (NIPS 2018)
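To make the setting concrete, here is a minimal sketch of the additive secret sharing over a finite field that SPDZ-style protocols build on; the MACs that give SPDZ its malicious security, and the Beaver triples needed for multiplication, are only noted in comments. The sketch is illustrative, not the framework's implementation.

```python
import secrets

P = 2**61 - 1  # a large prime field; SPDZ-style systems work over similar fields

def share(x, n=2):
    """Split x into n additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Linear operations are local: each party adds (or scales) its own share,
# and the sums of the results reconstruct the plaintext result.
x_shares = share(123)
y_shares = share(456)
z_shares = [(a + b) % P for a, b in zip(x_shares, y_shares)]
assert reconstruct(z_shares) == 123 + 456

# Multiplication (needed for regression) takes one round of interaction
# plus preprocessed Beaver triples; SPDZ further attaches an
# information-theoretic MAC to every share so that tampering by a
# malicious party is detected at output time.
```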
A generic framework for privacy preserving deep learning
We detail a new framework for privacy preserving deep learning and discuss
its assets. The framework puts a premium on ownership and secure processing of
data and introduces a valuable representation based on chains of commands and
tensors. This abstraction allows one to implement complex privacy preserving
constructs such as Federated Learning, Secure Multiparty Computation, and
Differential Privacy while still exposing a familiar deep learning API to the
end-user. We report early results on the Boston Housing and Pima Indian
Diabetes datasets. While the privacy features apart from Differential Privacy
do not impact the prediction accuracy, the current implementation of the
framework introduces a significant overhead in performance, which will be
addressed at a later stage of the development. We believe this work is an
important milestone introducing the first reliable, general framework for
privacy preserving deep learning.
Comment: PPML 2018, 5 pages
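As an illustration of the "chains of commands and tensors" abstraction, the hypothetical sketch below (names are invented, not the framework's actual API) shows how tensor operations can be recorded against a remote pointer and executed only by the data owner:

```python
class PointerTensor:
    """Hypothetical stand-in for a remote tensor: operations are not run
    locally but appended to a chain of commands that the data owner can
    inspect and later execute on the real tensor."""

    def __init__(self, owner, chain=None):
        self.owner = owner        # who actually holds the data
        self.chain = chain or []  # commands recorded so far

    def __add__(self, other):
        return PointerTensor(self.owner, self.chain + [("add", other)])

    def __mul__(self, scalar):
        return PointerTensor(self.owner, self.chain + [("mul", scalar)])

def execute(chain, value):
    """Run a recorded chain of commands on the owner's real data."""
    for op, arg in chain:
        value = value + arg if op == "add" else value * arg
    return value

# The end-user writes ordinary-looking tensor code ...
p = PointerTensor(owner="hospital")
q = (p + 3) * 2

# ... but nothing touches the data until the owner executes the chain.
print(execute(q.chain, value=10))  # (10 + 3) * 2 = 26
```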
Private Machine Learning in TensorFlow using Secure Computation
We present a framework for experimenting with secure multi-party computation
directly in TensorFlow. By doing so we benefit from several properties valuable
to both researchers and practitioners, including tight integration with
ordinary machine learning processes, existing optimizations for distributed
computation in TensorFlow, high-level abstractions for expressing complex
algorithms and protocols, and an expanded set of familiar tooling. We give an
open source implementation of a state-of-the-art protocol and report on
concrete benchmarks using typical models from private machine learning.
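The key observation can be sketched in a few lines: a secret-shared tensor is itself just a pair of ordinary tensors, so a dataflow framework's existing graphs, device placement, and kernels apply unchanged. The sketch below uses NumPy for brevity and is illustrative, not the framework's API:

```python
import numpy as np

MOD = 2**32  # shares live in a ring of 32-bit values (held in uint64)

def share_tensor(x):
    """Split an integer tensor into two additive shares mod 2**32."""
    r = np.random.randint(0, MOD, size=x.shape, dtype=np.uint64)
    return r, (x.astype(np.uint64) - r) % MOD

def reconstruct(s0, s1):
    return (s0 + s1) % MOD

# Each share is a plain tensor, so share-local computation is ordinary
# vectorized tensor arithmetic, exactly what a framework like TensorFlow
# already optimizes and distributes across machines.
a0, a1 = share_tensor(np.array([[1, 2], [3, 4]]))
b0, b1 = share_tensor(np.array([[10, 20], [30, 40]]))

# Addition of shared tensors: each party adds its own shares locally.
c0, c1 = (a0 + b0) % MOD, (a1 + b1) % MOD
print(reconstruct(c0, c1))  # [[11 22] [33 44]]
```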
STAR: Statistical Tests with Auditable Results
We present STAR: a novel system aimed at solving the complex issue of
"p-hacking" and false discoveries in scientific studies. STAR provides a
concrete way for ensuring the application of false discovery control procedures
in hypothesis testing, using mathematically provable guarantees, with the goal
of reducing the risk of data dredging. STAR generates an efficiently auditable
certificate which attests to the validity of each statistical test performed on
a dataset. STAR achieves this by using several cryptographic techniques which
are combined specifically for this purpose. Under the hood, STAR uses a
decentralized set of authorities (e.g., research institutions), secure
computation techniques, and an append-only ledger which together enable
auditing of scientific claims by third parties and match real-world trust
assumptions. We implement and evaluate a construction of STAR using the
Microsoft SEAL encryption library and SPDZ multi-party computation protocol.
Our experimental evaluation demonstrates the practicality of STAR in multiple
real-world scenarios, as a system for certifying scientific discoveries in a
tamper-proof way.
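One ingredient can be sketched compactly: an append-only, hash-chained log in which a test is registered before the data is analyzed, so that retroactive edits are detectable. This toy sketch omits the system's other components (the decentralized authorities, Microsoft SEAL, and SPDZ):

```python
import hashlib
import json

class AppendOnlyLedger:
    """Toy hash-chained log: each entry's hash commits to every entry
    before it, so retroactively editing a recorded test (the essence of
    p-hacking) changes all later hashes and is detectable by an auditor."""

    def __init__(self):
        self.entries = []
        self.head = b"\x00" * 32  # genesis value

    def append(self, record: dict) -> str:
        payload = json.dumps(record, sort_keys=True).encode()
        self.head = hashlib.sha256(self.head + payload).digest()
        self.entries.append((record, self.head.hex()))
        return self.head.hex()

ledger = AppendOnlyLedger()
# The hypothesis and test are registered before the data is analyzed ...
ledger.append({"step": "register", "test": "t-test", "alpha": 0.05})
# ... and the outcome is bound to that registration.
ledger.append({"step": "result", "p_value": 0.03, "fdr": "Benjamini-Hochberg"})
# An auditor replays the records and checks that each stored head matches.
```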
A Practical Scheme for Two-Party Private Linear Least Squares
Privacy-preserving machine learning means learning from sensitive datasets
that are typically distributed across multiple data owners. Private machine
learning is a substantial challenge in the many realistic scenarios where no
trusted third party can play the role of a mediator. The strong
decentralization of these scenarios requires tools from both the cryptography
and the distributed systems communities. In this paper, we present a practical
scheme that is suitable for a subclass of machine learning algorithms, and we
discuss directions for future research. Our scheme learns a linear least
squares model across two parties using a
gradient descent approach and additive homomorphic encryption. The protocol
requires two rounds of communication per step of gradient descent. We detail
our approach, including a fixed-point encoding scheme and one-time random pads
for hiding intermediate results.
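Two of the named building blocks are easy to sketch: fixed-point encoding (homomorphic schemes operate over integers, not reals) and one-time additive pads for blinding intermediate values. The encryption itself is elided; parameters are illustrative:

```python
import secrets

SCALE = 2**16  # fixed-point precision (~4-5 decimal digits)
MOD = 2**64    # ring in which encoded and masked values live

def encode(x: float) -> int:
    """Map a real number to a ring element (fixed-point, wrap-around negatives)."""
    return round(x * SCALE) % MOD

def decode(v: int) -> float:
    """Inverse map; values above MOD//2 represent negatives."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE

# One-time pad: one party blinds an intermediate gradient value so the
# other can aggregate it without learning it; the pad is removed later.
grad = -0.3125
pad = secrets.randbelow(MOD)
masked = (encode(grad) + pad) % MOD  # looks uniformly random to the receiver
unmasked = (masked - pad) % MOD      # pad removed after aggregation
assert decode(unmasked) == grad      # -0.3125 is exactly representable
```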
Helen: Maliciously Secure Coopetitive Learning for Linear Models
Many organizations wish to collaboratively train machine learning models on
their combined datasets for a common benefit (e.g., better medical research, or
fraud detection). However, they often cannot share their plaintext datasets due
to privacy concerns and/or business competition. In this paper, we design and
build Helen, a system that allows multiple parties to train a linear model
without revealing their data, a setting we call coopetitive learning. Compared
to prior secure training systems, Helen protects against a much stronger
adversary who is malicious and can compromise m-1 out of m parties. Our
evaluation shows that Helen can achieve a performance improvement of up to
five orders of magnitude compared to training with an existing
state-of-the-art secure multi-party computation framework.
Privacy Preserving Vertical Federated Learning for Tree-based Models
Federated learning (FL) is an emerging paradigm that enables multiple
organizations to jointly train a model without revealing their private data to
each other. This paper studies {\it vertical} federated learning, which tackles
the scenarios where (i) collaborating organizations own data of the same set of
users but with disjoint features, and (ii) only one organization holds the
labels. We propose Pivot, a novel solution for privacy preserving vertical
decision tree training and prediction, ensuring that no intermediate
information is disclosed other than what the clients have agreed to release
(i.e., the final tree model and the prediction output). Pivot does not rely on
any trusted third party and provides protection against a semi-honest
adversary that may compromise m-1 out of m clients. We further identify two privacy
leakages when the trained decision tree model is released in plaintext and
propose an enhanced protocol to mitigate them. The proposed solution can also
be extended to tree ensemble models, e.g., random forest (RF) and gradient
boosting decision tree (GBDT) by treating single decision trees as building
blocks. Theoretical and experimental analyses suggest that Pivot is efficient
for the privacy achieved.
Comment: Proc. VLDB Endow. 13(11): 2090-2103 (2020)
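For intuition about the vertical setting, the plaintext skeleton below shows split-finding when feature columns live with different parties and only one party holds labels; all values and names are invented, and the cryptographic protection that Pivot adds on top (so that even these intermediate statistics stay hidden) is omitted:

```python
# Plaintext skeleton of vertical split-finding (no crypto): parties hold
# disjoint feature columns for the same users; one party holds the labels.
users = [0, 1, 2, 3, 4, 5]
party_a = {"age": {0: 25, 1: 62, 2: 40, 3: 33, 4: 58, 5: 47}}
party_b = {"income": {0: 30, 1: 90, 2: 55, 3: 42, 4: 88, 5: 60}}
labels = {0: 0, 1: 1, 2: 0, 3: 0, 4: 1, 5: 1}  # label holder only

def gini(ids):
    """Binary Gini impurity of a set of user ids."""
    if not ids:
        return 0.0
    p = sum(labels[i] for i in ids) / len(ids)
    return 2 * p * (1 - p)

def best_split(features):
    """Each party proposes (feature, threshold) splits on its own columns;
    the label holder scores them by weighted impurity over user ids."""
    best = None
    for name, col in features.items():
        for t in sorted(set(col.values())):
            left = [i for i in users if col[i] <= t]
            right = [i for i in users if col[i] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(users)
            if best is None or score < best[0]:
                best = (score, name, t)
    return best

print(best_split(party_a), best_split(party_b))
```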
Secure and Efficient Federated Transfer Learning
Machine Learning models require a vast amount of data for accurate training.
In reality, most data is scattered across different organizations and cannot be
easily integrated under many legal and practical constraints. Federated
Transfer Learning (FTL) was introduced in [1] to improve statistical models
under a data federation that allows knowledge to be shared without
compromising user privacy and enables complementary knowledge to be
transferred across the network. As a result, a target-domain party can build more flexible and
powerful models by leveraging rich labels from a source-domain party. However,
the excessive computational overhead of the security protocol involved in this
model rendered it impractical. In this work, we aim to enhance the
efficiency and security of existing models for practical collaborative training
under a data federation by incorporating Secret Sharing (SS). In the
literature, only the semi-honest model for Federated Transfer Learning has
been considered.
In this paper, we improve upon the previous solution, and also allow malicious
players who can arbitrarily deviate from the protocol in our FTL model. This is
much stronger than the semi-honest model where we assume that parties follow
the protocol precisely. We do so using SPDZ, a practical MPC protocol; thus
our model can be efficiently extended to any number of parties, even in the
case of a dishonest majority. In addition, the models
evaluated in our setting significantly outperform the previous work, in terms
of both runtime and communication cost. A single iteration in our model
executes in 0.8 seconds for the semi-honest case and 1.4 seconds for the
malicious case for 500 samples, as compared to 35 seconds taken by the previous
implementation.
Comment: Special Track on Federated Machine Learning in IEEE BigData 2019
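The core subprotocol such SPDZ-based training relies on, secret-shared multiplication from preprocessed Beaver triples, can be sketched in its semi-honest form; the malicious-secure variant used here additionally authenticates every share with a MAC:

```python
import secrets

P = 2**61 - 1  # prime field

def share(x):
    r = secrets.randbelow(P)
    return [r, (x - r) % P]

def reconstruct(shares):
    return sum(shares) % P

def beaver_mul(x_sh, y_sh, triple):
    """Multiply secret-shared x and y with a preprocessed triple (a, b, c),
    c = a*b. Only the masked values d = x - a and e = y - b are opened,
    and they reveal nothing about x or y."""
    a_sh, b_sh, c_sh = triple
    d = reconstruct([(x - a) % P for x, a in zip(x_sh, a_sh)])
    e = reconstruct([(y - b) % P for y, b in zip(y_sh, b_sh)])
    z_sh = [(c + d * b + e * a) % P for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % P  # public correction term, added once
    return z_sh

# Preprocessing: input-independent, done ahead of the training run.
a, b = secrets.randbelow(P), secrets.randbelow(P)
triple = (share(a), share(b), share(a * b % P))

x_sh, y_sh = share(20), share(21)
assert reconstruct(beaver_mul(x_sh, y_sh, triple)) == 420
```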
Data Querying and Access Control for Secure Multiparty Computation
In the Internet of Things and smart environments, data collected from
distributed sensors is typically stored and processed by a central middleware.
This allows applications to query the data they need for providing further
services. However, centralization of data causes several privacy threats: the
middleware becomes a third party which has to be trusted, linkage and
correlation of data from different contexts become possible, and data subjects
lose control over their data.
Hence, other approaches than centralized processing should be considered.
Here, Secure Multiparty Computation (SMC) is a promising candidate for secure and
privacy-preserving computation happening close to the sources of the data.
In order to make SMC fit for application in these contexts, we extend SMC to
act as a service: We provide elements which allow third parties to query
computed data from a group of peers performing SMC. Furthermore, we establish
fine-granular access control on the level of individual data queries, yielding
data protection of the computed results. By adding measures to inform data
sources about requests and the usage of their data, we show how a fully
privacy-preserving service can be built on the foundation of SMC.
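A hedged sketch of the query-level access control described here (names and structure are hypothetical, not the paper's interface): a result computed by the SMC group is released only if the requester holds a grant for that specific query, and every request is logged so data sources can see how their data is used:

```python
# Hypothetical per-query access control in front of a group of SMC peers.
GRANTS = {
    ("energy_dashboard", "avg_consumption"): True,  # this query is allowed
    ("energy_dashboard", "raw_readings"): False,    # this one is denied
}

AUDIT_LOG = []  # data sources can inspect who asked for what

def handle_query(requester: str, query: str, run_smc):
    """Release an SMC-computed result only if this exact (requester, query)
    pair is granted; record every request for the data sources."""
    allowed = GRANTS.get((requester, query), False)
    AUDIT_LOG.append({"requester": requester, "query": query, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{requester} may not run {query}")
    return run_smc(query)  # peers compute jointly; only the result leaves

# Example: the aggregate is released; raw readings never are.
result = handle_query("energy_dashboard", "avg_consumption",
                      run_smc=lambda q: 42.0)  # stand-in for the SMC protocol
```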
Accelerating 2PC-based ML with Limited Trusted Hardware
This paper describes the design, implementation, and evaluation of Otak, a
system that allows two non-colluding cloud providers to run machine learning
(ML) inference without knowing the inputs to inference. Prior work for this
problem mostly relies on advanced cryptography such as two-party secure
computation (2PC) protocols that provide rigorous guarantees but suffer from
high resource overhead. Otak improves efficiency via a new 2PC protocol that
(i) tailors recent primitives such as function and homomorphic secret sharing
to ML inference, and (ii) uses trusted hardware in a limited capacity to
bootstrap the protocol. At the same time, Otak reduces trust assumptions on
trusted hardware by running only a small amount of code inside the hardware, restricting its
use to a preprocessing step, and distributing trust over heterogeneous trusted
hardware platforms from different vendors. An implementation and evaluation of
Otak demonstrates that its CPU and network overhead, converted to a dollar
amount, is 5.4–385× lower than that of state-of-the-art 2PC-based works.
Besides, Otak's trusted computing base (the code inside trusted hardware) is
only 1,300 lines of code, which is 14.6–29.2× lower than the code size in
prior trusted hardware-based works.
Comment: 19 pages
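The "limited capacity" role of the trusted hardware can be pictured as an input-independent dealer of correlated randomness: the enclave runs only in a preprocessing step and never sees the inference inputs. The toy sketch below is illustrative, not Otak's actual protocol; it deals Beaver-triple shares to the two servers:

```python
import secrets

P = 2**61 - 1  # prime field for the toy example

def enclave_preprocess(n_triples):
    """Models code running inside trusted hardware *before* any inputs
    exist: deal additive shares of Beaver triples (a, b, a*b) to two
    servers. The step is input-independent, so the enclave never
    handles the inference inputs themselves."""
    server0, server1 = [], []
    for _ in range(n_triples):
        a, b = secrets.randbelow(P), secrets.randbelow(P)
        c = a * b % P
        a0, b0, c0 = (secrets.randbelow(P) for _ in range(3))
        server0.append((a0, b0, c0))
        server1.append(((a - a0) % P, (b - b0) % P, (c - c0) % P))
    return server0, server1

# Online phase (outside the enclave): the two servers consume the triples
# to multiply secret-shared inputs as in standard 2PC; the enclave is done.
s0, s1 = enclave_preprocess(4)
a0, b0, c0 = s0[0]
a1, b1, c1 = s1[0]
assert (a0 + a1) * (b0 + b1) % P == (c0 + c1) % P
```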