127 research outputs found
PEPSI: Practically Efficient Private Set Intersection in the Unbalanced Setting
Two parties with private data sets can find shared elements using a Private
Set Intersection (PSI) protocol without revealing any information beyond the
intersection. Circuit PSI protocols privately compute an arbitrary function of
the intersection, such as its cardinality, and are often employed in an
unbalanced setting where one party has more data than the other. Existing
protocols are either computationally inefficient or require extensive
server-client communication on the order of the larger set. We introduce
Practically Efficient PSI or PEPSI, a non-interactive solution where only the
client sends its encrypted data. PEPSI can process an intersection of 1024
client items with a million server items in under a second, using less than 5
MB of communication. Our work is over 4 orders of magnitude faster than an
existing non-interactive circuit PSI protocol and requires only 10% of the
communication. It is also up to 20 times faster than the work of Ion et al.,
which computes a limited set of functions and has communication costs
proportional to the larger set. Our work is the first to demonstrate that
non-interactive circuit PSI can be practically applied in an unbalanced
setting.
Private set intersection: A systematic literature review
Secure Multi-party Computation (SMPC) is a family of protocols which allow some parties to compute a function on their private inputs, obtaining the output at the end and nothing more. In this work, we focus on a particular SMPC problem named Private Set Intersection (PSI). The challenge in PSI is how two or more parties can compute the intersection of their private input sets, while the elements that are not in the intersection remain private. This problem has attracted the attention of many researchers because of its wide variety of applications, contributing to the proliferation of many different approaches. Despite that, current PSI protocols still require heavy cryptographic assumptions that may be unrealistic in some scenarios. In this paper, we perform a Systematic Literature Review of PSI solutions, with the objective of analyzing the main scenarios where PSI has been studied and giving the reader a general taxonomy of the problem together with a general understanding of the most common tools used to solve it. We also analyze the performance using different metrics, trying to determine if PSI is mature enough to be used in realistic scenarios, identifying the pros and cons of each protocol and the remaining open problems.
This work has been partially supported by the projects: BIGPrivDATA (UMA20-FEDERJA-082) from the FEDER Andalucía 2014–2020 Program and SecTwin 5.0 funded by the Ministry of Science and Innovation, Spain, and the European Union (Next Generation EU) (TED2021-129830B-I00). The first author has been funded by the Spanish Ministry of Education under the National F.P.U. Program (FPU19/01118). Funding for open access charge: Universidad de Málaga/CBU
Secure and Scalable Circuit-based Protocol for Multi-Party Private Set Intersection
We propose a novel protocol for computing a circuit which implements the
multi-party private set intersection functionality (PSI). A circuit-based
approach has advantages over using custom protocols to achieve this task, since
many applications of PSI do not require the computation of the intersection
itself, but rather specific functional computations over the items in the
intersection.
Our protocol is a pioneering circuit-based multi-party PSI protocol
that builds upon and optimizes the two-party SCS protocol of Huang et al.
(2012). By using secure computation between two
parties, our protocol sidesteps the complexities associated with multi-party
interactions and demonstrates good scalability.
To mitigate the high overhead associated with circuit-based
constructions, we have further enhanced our protocol by using a simple
hashing scheme and permutation-based hash functions. These techniques have
enabled us to minimize circuit size through bucketing while
simultaneously attaining noteworthy reductions in both computation and
communication costs.
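The permutation-based hashing mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 32-bit item length, and the use of SHA-256 as the inner hash are assumptions made for illustration only. The idea shown is that an item is split into a left and a right part, the bucket index is the left part XORed with a hash of the right part, and only the right part is stored, so the values compared inside a circuit become shorter.

```python
import hashlib


def _h(value: int, out_bits: int) -> int:
    """Map an integer to out_bits bits. SHA-256 is an illustrative stand-in."""
    digest = hashlib.sha256(value.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << out_bits)


def permutation_hash(x: int, item_bits: int, bucket_bits: int) -> tuple[int, int]:
    """Return (bucket index, shortened value stored in that bucket)."""
    right_bits = item_bits - bucket_bits
    x_left = x >> right_bits                 # top bucket_bits bits of x
    x_right = x & ((1 << right_bits) - 1)    # remaining low bits of x
    bucket = x_left ^ _h(x_right, bucket_bits)
    return bucket, x_right


def recover(bucket: int, stored: int, item_bits: int, bucket_bits: int) -> int:
    """Invert permutation_hash: the pair (bucket, stored) determines x."""
    right_bits = item_bits - bucket_bits
    x_left = bucket ^ _h(stored, bucket_bits)
    return (x_left << right_bits) | stored


def simple_hash(items, item_bits: int, bucket_bits: int):
    """Simple hashing: every bucket may hold several shortened items."""
    table = [[] for _ in range(1 << bucket_bits)]
    for x in items:
        bucket, stored = permutation_hash(x, item_bits, bucket_bits)
        table[bucket].append(stored)
    return table
```

Because the mapping is invertible, two different items never produce the same (bucket, stored) pair, which is why the shortened representation loses no information.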
Secure Computation Protocols for Privacy-Preserving Machine Learning
Machine learning (ML) greatly benefits from the availability of large amounts of training data, both in terms of the number of samples, and the number of features per sample. However, aggregating more data under centralized control is not always possible, nor desirable, due to security and privacy concerns, regulation, or competition. Secure multi-party computation (MPC) protocols promise a solution to this dilemma, allowing multiple parties to train ML models on their joint datasets while provably preserving the confidentiality of the inputs. However, generic approaches to MPC result in large computation and communication overheads, which limits their applicability in practice.
The goal of this thesis is to make privacy-preserving machine learning with secure computation practical. First, we focus on two high-level applications, linear regression and document classification. We show that communication and computation overhead can be greatly reduced by identifying the costliest parts of the computation, and replacing them with sub-protocols that are tailored to the number and arrangement of parties, the data distribution, and the number representation used. One of our main findings is that exploiting sparsity in the data representation enables considerable efficiency improvements. We go on to generalize this observation, and implement a low-level data structure for sparse data, with corresponding secure access protocols. On top of this data structure, we develop several linear algebra algorithms that can be used in a wide range of applications. Finally, we turn to improving a cryptographic primitive named vector-OLE, for which we propose a novel protocol that helps speed up a wide range of secure computation tasks, within private machine learning and beyond.
Overall, our work shows that MPC indeed offers a promising avenue towards practical privacy-preserving machine learning, and the protocols we developed constitute a substantial step in that direction.
Circuit-PSI with Linear Complexity via Relaxed Batch OPPRF
In two-party Circuit-based Private Set Intersection (Circuit-PSI), two parties each hold a private set and wish to securely compute a function over the intersection of their sets (e.g., cardinality, sum over associated attributes, or threshold intersection). Following a long line of work, Pinkas et al. (Eurocrypt 2019) showed how to construct a concretely efficient Circuit-PSI protocol with linear communication complexity. However, their protocol requires super-linear computation.
In this work, we construct concretely efficient Circuit-PSI protocols with linear computational and communication cost. Further, our protocols are more performant than the state of the art, the protocol of Pinkas et al.: we are more communication efficient and faster. We obtain our improvements through a new primitive called Relaxed Batch Oblivious Programmable Pseudorandom Functions (Relaxed Batch OPPRF) that can be seen as a strict generalization of the Batch OPPRFs used by Pinkas et al.
We believe that this primitive could be of independent interest.
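To make the example functions named in the abstract above concrete, the following is a plaintext reference of what a Circuit-PSI functionality outputs. It offers no privacy and is not the protocol from the paper; it only fixes the input/output behaviour that a secure protocol would reproduce. The function name, the payload dictionary, and the reading of "threshold intersection" as "reveal the intersection only if it has at least `threshold` elements" are assumptions for illustration.

```python
def circuit_psi_reference(set_a, payloads_b, threshold):
    """Non-private reference for three functions of the intersection.

    set_a:      items held by the first party
    payloads_b: dict mapping the second party's items to numeric attributes
    threshold:  minimum intersection size before the intersection is revealed
                (an assumed reading of "threshold intersection")
    """
    common = set(set_a) & set(payloads_b)
    cardinality = len(common)
    # Sum of the attributes associated with the common items.
    payload_sum = sum(payloads_b[item] for item in common)
    # Reveal the common items only when enough of them exist.
    revealed = sorted(common) if cardinality >= threshold else None
    return {
        "cardinality": cardinality,
        "payload_sum": payload_sum,
        "threshold_intersection": revealed,
    }
```

In a real Circuit-PSI protocol the intersection itself would stay secret-shared between the parties, and only the chosen function output would be opened.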
Fast Private Set Intersection from Homomorphic Encryption
Private Set Intersection (PSI) is a cryptographic technique that allows two parties to compute the intersection of their sets without revealing anything except the intersection. We use fully homomorphic encryption to construct a fast PSI protocol with a small communication overhead that works particularly well when one of the two sets is much smaller than the other, and is secure against semi-honest adversaries.
The most computationally efficient PSI protocols have been constructed using tools such as hash functions and oblivious transfer, but a potential limitation with these approaches is the communication complexity, which scales linearly with the size of the larger set. This is of particular concern when performing PSI between a constrained device (e.g., a cellphone) holding a small set, and a large service provider (e.g., WhatsApp), such as in the Private Contact Discovery application.
Our protocol has communication complexity linear in the size of the smaller set, and logarithmic in the larger set. More precisely, for a smaller set of size N_Y and a larger set of size N_X, the communication overhead is O(N_Y log N_X). Our running-time-optimized benchmarks measure online computation, non-interactive (receiver-independent) pre-processing, and round-trip communication for intersecting a set of five thousand fixed-length bit strings with a set on the order of millions of such strings. Compared to prior works, this is a substantial reduction in communication with minimal difference in computational overhead.
Private Set Intersection with Linear Communication from General Assumptions
This work presents a hashing-based algorithm for Private Set Intersection (PSI) in
the honest-but-curious setting. The protocol is generic, modular and provides both asymptotic
and concrete efficiency improvements over existing PSI protocols.
If each player has m elements, our scheme requires only O(mλ) communication between the parties,
where λ is a security parameter.
Our protocol builds on the hashing-based PSI protocol of Pinkas et al. (USENIX 2014, USENIX 2015),
but we replace one of the sub-protocols (handling the cuckoo "stash") with a special-purpose PSI protocol
that is optimized for comparing sets of unbalanced size.
This brings the asymptotic communication complexity of the overall protocol down from ω(mλ) to O(mλ),
and provides concrete performance improvements (10-15% reduction in communication costs) over Kolesnikov et al. (CCS 2016)
under real-world parameter choices.
Our protocol is simple, generic and benefits from the permutation-hashing optimizations of Pinkas et al. (USENIX 2015) and the
Batched, Relaxed Oblivious Pseudo Random Functions of Kolesnikov et al. (CCS 2016).