Search CORE

19 research outputs found

Secure Computation Protocols for Privacy-Preserving Machine Learning

Author: Schoppmann Phillipp
Publication venue: Humboldt-Universität zu Berlin
Publication date: 08/10/2021
Field of study

Machine Learning (ML) profitiert erheblich von der Verfügbarkeit großer Mengen an Trainingsdaten, sowohl im Bezug auf die Anzahl an Datenpunkten, als auch auf die Anzahl an Features pro Datenpunkt. Es ist allerdings oft weder möglich, noch gewollt, mehr Daten unter zentraler Kontrolle zu aggregieren. Multi-Party-Computation (MPC)-Protokolle stellen eine Lösung dieses Dilemmas in Aussicht, indem sie es mehreren Parteien erlauben, ML-Modelle auf der Gesamtheit ihrer Daten zu trainieren, ohne die Eingabedaten preiszugeben. Generische MPC-Ansätze bringen allerdings erheblichen Mehraufwand in der Kommunikations- und Laufzeitkomplexität mit sich, wodurch sie sich nur beschränkt für den Einsatz in der Praxis eignen. Das Ziel dieser Arbeit ist es, Privatsphäreerhaltendes Machine Learning mittels MPC praxistauglich zu machen. Zuerst fokussieren wir uns auf zwei Anwendungen, lineare Regression und Klassifikation von Dokumenten. Hier zeigen wir, dass sich der Kommunikations- und Rechenaufwand erheblich reduzieren lässt, indem die aufwändigsten Teile der Berechnung durch Sub-Protokolle ersetzt werden, welche auf die Zusammensetzung der Parteien, die Verteilung der Daten, und die Zahlendarstellung zugeschnitten sind. Insbesondere das Ausnutzen dünnbesetzter Datenrepräsentationen kann die Effizienz der Protokolle deutlich verbessern. Diese Beobachtung verallgemeinern wir anschließend durch die Entwicklung einer Datenstruktur für solch dünnbesetzte Daten, sowie dazugehöriger Zugriffsprotokolle. Aufbauend auf dieser Datenstruktur implementieren wir verschiedene Operationen der Linearen Algebra, welche in einer Vielzahl von Anwendungen genutzt werden. Insgesamt zeigt die vorliegende Arbeit, dass MPC ein vielversprechendes Werkzeug auf dem Weg zu Privatsphäre-erhaltendem Machine Learning ist, und die von uns entwickelten Protokolle stellen einen wesentlichen Schritt in diese Richtung dar.Machine learning (ML) greatly benefits from the availability of large amounts of training data, both in terms of the number of samples, and the number of features per sample. However, aggregating more data under centralized control is not always possible, nor desirable, due to security and privacy concerns, regulation, or competition. Secure multi-party computation (MPC) protocols promise a solution to this dilemma, allowing multiple parties to train ML models on their joint datasets while provably preserving the confidentiality of the inputs. However, generic approaches to MPC result in large computation and communication overheads, which limits the applicability in practice. The goal of this thesis is to make privacy-preserving machine learning with secure computation practical. First, we focus on two high-level applications, linear regression and document classification. We show that communication and computation overhead can be greatly reduced by identifying the costliest parts of the computation, and replacing them with sub-protocols that are tailored to the number and arrangement of parties, the data distribution, and the number representation used. One of our main findings is that exploiting sparsity in the data representation enables considerable efficiency improvements. We go on to generalize this observation, and implement a low-level data structure for sparse data, with corresponding secure access protocols. On top of this data structure, we develop several linear algebra algorithms that can be used in a wide range of applications. Finally, we turn to improving a cryptographic primitive named vector-OLE, for which we propose a novel protocol that helps speed up a wide range of secure computation tasks, within private machine learning and beyond. Overall, our work shows that MPC indeed offers a promising avenue towards practical privacy-preserving machine learning, and the protocols we developed constitute a substantial step in that direction

Dokumenten-Publikationsserver der Humboldt-Universität zu Berlin

VOLE-PSI: Fast OPRF and Circuit-PSI from Vector-OLE

Author: Peter Rindal
Phillipp Schoppmann
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 10/05/2021
Field of study

In this work we present a new construction for a batched Oblivious Pseudorandom Function (OPRF) based on Vector-OLE and the PaXoS data structure. We then use it in the standard transformation for achieving Private Set Intersection (PSI) from an OPRF. Our overall construction is highly efficient with

O(n)

communication and computation. We demonstrate that our protocol can achieve malicious security at only a very small overhead compared to the semi-honest variant. For input sizes

n = 2^{20}

, our malicious protocol needs 6.2 seconds and less than 59 MB communication. This corresponds to under 450 bits per element, which is the lowest number for any published PSI protocol (semi-honest or malicious) to date. Moreover, in theory our semi-honest (resp. malicious) protocol can achieve as low as 219 (resp. 260) bits per element for

n=2^{20}

at the added cost of interpolating a polynomial over

n

elements. As a second contribution, we present an extension where the output of the PSI is secret-shared between the two parties. This functionality is generally referred to as Circuit-PSI. It allows the parties to perform a subsequent MPC protocol on the secret-shared outputs, e.g., train a machine learning model. Our circuit PSI protocol builds on our OPRF construction along with another application of the PaXoS data structure. It achieves semi-honest security and allows for a highly efficient implementation, up to 3x faster than previous work

Cryptology ePrint Archive

Distributed Vector-OLE: Improved Constructions and Implementation

Author: Adrià Gascón
Leonie Reichert
Mariana Raykova
Phillipp Schoppmann
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 13/12/2019
Field of study

We investigate concretely efficient protocols for distributed oblivious linear evaluation over vectors (Vector-OLE). Boyle et al. (CCS 2018) proposed a protocol for secure distributed pseudorandom Vector-OLE generation using sublinear communication, but they did not provide an implementation. Their construction is based on a variant of the LPN assumption and assumes a distributed key generation protocol for single-point Function Secret Sharing (FSS), as well as an efficient batching scheme to obtain multi-point FSS. We show that this requirement can be relaxed, resulting in a weaker variant of FSS, for which we give an efficient protocol. This allows us to use efficient probabilistic batch codes that were also recently used for batched PIR by Angel et al. (S&P 2018). We construct a full Vector-OLE generator from our protocols, and compare it experimentally with alternative approaches. Our implementation parallelizes very well, and has low communication overhead in practice. For generating a VOLE of size

2^{20}

, our implementation only takes

0.52

s on 32 cores

Cryptology ePrint Archive

Verifiable Distributed Aggregation Functions

Author: Christopher Patton
Hannah Davis
Mike Rosulek
Phillipp Schoppmann
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 13/07/2023
Field of study

The modern Internet is built on systems that incentivize collection of information about users. In order to minimize privacy loss, it is desirable to prevent these systems from collecting more information than is required for the application. The promise of multi-party computation is that data can be aggregated without revealing individual measurements to the data collector. This work offers a provable security treatment for Verifiable Distributed Aggregation Functions (VDAFs) , a class of multi-party computation protocols being considered for standardization by the IETF. We propose a formal framework for the analysis of VDAFs and apply it to two constructions. The first is Prio3, one of the candidates for standardization. This VDAF is based on the Prio system of Corrigan-Gibbs and Boneh (NSDI 2017). We prove that Prio3 achieves our security goals with only minor changes to the draft. The second construction, called Doplar, is introduced by this paper. Doplar is a round-reduced variant of the Poplar system of Boneh et al. (IEEE S&P 2021), itself a candidate for standardization. The cost of this improvement is a modest increase in overall bandwidth and computation

Cryptology ePrint Archive

Make Some ROOM for the Zeros: Data Sparsity in Secure Distributed Machine Learning

Author: Adria Gascon
Benny Pinkas
Mariana Raykova
Phillipp Schoppmann
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 13/12/2019
Field of study

Exploiting data sparsity is crucial for the scalability of many data analysis tasks. However, while there is an increasing interest in efficient secure computation protocols for distributed machine learning, data sparsity has so far not been considered in a principled way in that setting. We propose sparse data structures together with their corresponding secure computation protocols to address common data analysis tasks while utilizing data sparsity. In particular, we define a Read-Only Oblivious Map primitive (ROOM) for accessing elements in sparse structures, and present several instantiations of this primitive with different trade-offs. Then, using ROOM as a building block, we propose protocols for basic linear algebra operations such as Gather, Scatter, and multiple variants of sparse matrix multiplication. Our protocols are easily composable by using secret sharing. We leverage this, at the highest level of abstraction, to build secure end-to-end protocols for non-parametric models (

k

-nearest neighbors and naive Bayes classification) and parametric models (logistic regression) that enable secure analysis on high-dimensional datasets. The experimental evaluation of our protocol implementations demonstrates a manyfold improvement in the efficiency over state-of-the-art techniques across all applications. Our system is designed and built mirroring the modular architecture in scientific computing and machine learning frameworks, and inspired by the Sparse BLAS standard

Cryptology ePrint Archive

Secure and Scalable Document Similarity on Distributed Databases: Differential Privacy to the Rescue

Author: Adrià Gascón
Borja Balle
Lennart Vogelsang
Phillipp Schoppmann
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 18/04/2020
Field of study

Privacy-preserving collaborative data analysis enables richer models than what each party can learn with their own data. Secure Multi-Party Computation (MPC) offers a robust cryptographic approach to this problem, and in fact several protocols have been proposed for various data analysis and machine learning tasks. In this work, we focus on secure similarity computation between text documents, and the application to

k

-nearest neighbors (\knn) classification. Due to its non-parametric nature, \knn presents scalability challenges in the MPC setting. Previous work addresses these by introducing non-standard assumptions about the abilities of an attacker, for example by relying on non-colluding servers. In this work, we tackle the scalability challenge from a different angle, and instead introduce a secure preprocessing phase that reveals differentially private (DP) statistics about the data. This allows us to exploit the inherent sparsity of text data and significantly speed up all subsequent classifications

Cryptology ePrint Archive

Communication-Efficient Secure Logistic Regression

Author: Amit Agarwal
Karn Seth
Mariana Raykova
Phillipp Schoppmann
Stanislav Peceny
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 26/01/2023
Field of study

We present a new construction for secure logistic regression training, which enables two parties to train a model on private secret-shared data. Our goal is to minimize online communication and round complexity, while still allowing for an efficient offline phase. As part of our construction we develop many building blocks of independent interest. These include a new approximation technique for the sigmoid function, which results in a secure protocol with better communication; secure spline evaluation and secure powers computation protocols for fixed-point values; and a new comparison protocol that optimizes online communication. We also present a new two-party protocol for generating keys for distributed point functions (DPFs) over arithmetic sharing, where previous constructions do this only for Boolean outputs. We implement our protocol in an end-to-end system and benchmark its efficiency. We can securely evaluate a sigmoid in

18

ms online time and

0.5

KB of online communication. Our system can train a model over a database with

70,000

samples and

15

features with online communication of

208.09

MB and online time of

2.24

hours at the cost of

6.11

c over WAN. Our benchmarks demonstrate that we reduce online communication over state of the art by

\approx 10 \times

for sigmoid and

\approx38\times

for logistic regression training

Cryptology ePrint Archive

Distributed, Private, Sparse Histograms in the Two-Server Model

Author: Adria Gascon
Badih Ghazi
James Bell
Mariana Raykova
Pasin Manurangsi
Phillipp Schoppmann
Ravi Kumar
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 08/09/2022
Field of study

We consider the computation of sparse,

(\varepsilon, \delta)

-differentially private~(DP) histograms in the two-server model of secure multi-party computation~(MPC), which has recently gained traction in the context of privacy-preserving measurements of aggregate user data. We introduce protocols that enable two semi-honest non-colluding servers to compute histograms over the data held by multiple users, while only learning a private view of the data. Our solution achieves the same asymptotic

\ell_\infty

-error of

O\left(\frac{\log(1/\delta)}{\varepsilon}\right)

as in the central model of DP, but \emph{without} relying on a trusted curator. The server communication and computation costs of our protocol are independent of the number of histogram buckets, and are linear in the number of users, while the client cost is independent of the number of users,

\varepsilon

, and

\delta

. Its linear dependence on the number of users lets our protocol scale well, which we confirm using microbenchmarks: for a billion users,

\varepsilon = 0.5

, and

\delta = 10^{-11}

, the per-user cost of our protocol is only

1.08

ms of server computation and

339

bytes of communication. In contrast, a baseline protocol using garbled circuits only allows up to

10^6

users, where it requires 600 KB communication per user

Cryptology ePrint Archive

Communication--Computation Trade-offs in PIR

Author: Asra Ali
Karn Seth
Kevin Yeo
Mariana Raykova
Phillipp Schoppmann
Sarvar Patel
Tancrède Lepoint
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 03/08/2021
Field of study

We study the computation and communication costs and their possible trade-offs in various constructions for private information retrieval (PIR), including schemes based on homomorphic encryption and the Gentry--Ramzan PIR (ICALP\u2705). We improve over the construction of SealPIR (S&P\u2718) using compression techniques and a new oblivious expansion, which reduce the communication bandwidth by 60% while preserving essentially the same computation cost. We then present MulPIR, a PIR protocol leveraging multiplicative homomorphism to implement the recursion steps in PIR. This eliminates the exponential dependence of PIR communication on the recursion depth due to the ciphertext expansion, at the cost of an increased computational cost for the server. Additionally, MulPIR outputs a regular homomorphic encryption ciphertext, which can be homomorphically post-processed. As a side result, we describe how to do conjunctive and disjunctive PIR queries. On the other end of the communication--computation spectrum, we take a closer look at Gentry--Ramzan PIR, a scheme with asymptotically optimal communication rate. Here, the bottleneck is the server\u27s computation, which we manage to reduce significantly. Our optimizations enable a tunable trade-off between communication and computation, which allows us to reduce server computation by as much as 85%, at the cost of an increased query size. We further show how to efficiently construct PIR for sparse databases. Our constructions support batched queries, as well as symmetric PIR. We implement all of our PIR constructions, and compare their communication and computation overheads with respect to each other for several application scenarios

Cryptology ePrint Archive

QUOTIENT: Two-Party Secure Neural Network Training and Prediction

Author: Abadi Mart'in
Alistarh Dan
Bernstein Jeremy
Chi-Chih Yao Andrew
Cruz-Roa Angel
Demmler Daniel
Gilad-Bachrach Ran
Gupta Suyog
Hou Lu
Kilbertus Niki
Kingma Diederik P
Kolesnikov Vladimir
Quinlan J. Ross
Reddi Sashank J.
Riazi M Sadegh
Sanyal Amartya
Schoppmann Phillipp
Wagh Sameer
Wen Wei
Publication venue
Publication date: 07/07/2019
Field of study

Recently, there has been a wealth of effort devoted to the design of secure protocols for machine learning tasks. Much of this is aimed at enabling secure prediction from highly-accurate Deep Neural Networks (DNNs). However, as DNNs are trained on data, a key question is how such models can be also trained securely. The few prior works on secure DNN training have focused either on designing custom protocols for existing training algorithms, or on developing tailored training algorithms and then applying generic secure protocols. In this work, we investigate the advantages of designing training algorithms alongside a novel secure protocol, incorporating optimizations on both fronts. We present QUOTIENT, a new method for discretized training of DNNs, along with a customized secure two-party protocol for it. QUOTIENT incorporates key components of state-of-the-art DNN training such as layer normalization and adaptive gradient methods, and improves upon the state-of-the-art in DNN training in two-party computation. Compared to prior work, we obtain an improvement of 50X in WAN time and 6% in absolute accuracy

arXiv.org e-Print Archive

Crossref

UCL Discovery

Oxford University Research Archive