98 research outputs found

    PEPSI: Practically Efficient Private Set Intersection in the Unbalanced Setting

    Full text link
    Two parties with private data sets can find shared elements using a Private Set Intersection (PSI) protocol without revealing any information beyond the intersection. Circuit PSI protocols privately compute an arbitrary function of the intersection - such as its cardinality, and are often employed in an unbalanced setting where one party has more data than the other. Existing protocols are either computationally inefficient or require extensive server-client communication on the order of the larger set. We introduce Practically Efficient PSI or PEPSI, a non-interactive solution where only the client sends its encrypted data. PEPSI can process an intersection of 1024 client items with a million server items in under a second, using less than 5 MB of communication. Our work is over 4 orders of magnitude faster than an existing non-interactive circuit PSI protocol and requires only 10% of the communication. It is also up to 20 times faster than the work of Ion et al., which computes a limited set of functions and has communication costs proportional to the larger set. Our work is the first to demonstrate that non-interactive circuit PSI can be practically applied in an unbalanced setting

    Private set intersection: A systematic literature review

    Get PDF
    Secure Multi-party Computation (SMPC) is a family of protocols which allow some parties to compute a function on their private inputs, obtaining the output at the end and nothing more. In this work, we focus on a particular SMPC problem named Private Set Intersection (PSI). The challenge in PSI is how two or more parties can compute the intersection of their private input sets, while the elements that are not in the intersection remain private. This problem has attracted the attention of many researchers because of its wide variety of applications, contributing to the proliferation of many different approaches. Despite that, current PSI protocols still require heavy cryptographic assumptions that may be unrealistic in some scenarios. In this paper, we perform a Systematic Literature Review of PSI solutions, with the objective of analyzing the main scenarios where PSI has been studied and giving the reader a general taxonomy of the problem together with a general understanding of the most common tools used to solve it. We also analyze the performance using different metrics, trying to determine if PSI is mature enough to be used in realistic scenarios, identifying the pros and cons of each protocol and the remaining open problems.This work has been partially supported by the projects: BIGPrivDATA (UMA20-FEDERJA-082) from the FEDER Andalucía 2014– 2020 Program and SecTwin 5.0 funded by the Ministry of Science and Innovation, Spain, and the European Union (Next Generation EU) (TED2021-129830B-I00). The first author has been funded by the Spanish Ministry of Education under the National F.P.U. Program (FPU19/01118). Funding for open access charge: Universidad de Málaga/CBU

    Secure and Scalable Circuit-based Protocol for Multi-Party Private Set Intersection

    Full text link
    We propose a novel protocol for computing a circuit which implements the multi-party private set intersection functionality (PSI). Circuit-based approach has advantages over using custom protocols to achieve this task, since many applications of PSI do not require the computation of the intersection itself, but rather specific functional computations over the items in the intersection. Our protocol represents the pioneering circuit-based multi-party PSI protocol, which builds upon and optimizes the two-party SCS \cite{huang2012private} protocol. By using secure computation between two parties, our protocol sidesteps the complexities associated with multi-party interactions and demonstrates good scalability. In order to mitigate the high overhead associated with circuit-based constructions, we have further enhanced our protocol by utilizing simple hashing scheme and permutation-based hash functions. These tricks have enabled us to minimize circuit size by employing bucketing techniques while simultaneously attaining noteworthy reductions in both computation and communication expenses

    Secure Computation Protocols for Privacy-Preserving Machine Learning

    Get PDF
    Machine Learning (ML) profitiert erheblich von der Verfügbarkeit großer Mengen an Trainingsdaten, sowohl im Bezug auf die Anzahl an Datenpunkten, als auch auf die Anzahl an Features pro Datenpunkt. Es ist allerdings oft weder möglich, noch gewollt, mehr Daten unter zentraler Kontrolle zu aggregieren. Multi-Party-Computation (MPC)-Protokolle stellen eine Lösung dieses Dilemmas in Aussicht, indem sie es mehreren Parteien erlauben, ML-Modelle auf der Gesamtheit ihrer Daten zu trainieren, ohne die Eingabedaten preiszugeben. Generische MPC-Ansätze bringen allerdings erheblichen Mehraufwand in der Kommunikations- und Laufzeitkomplexität mit sich, wodurch sie sich nur beschränkt für den Einsatz in der Praxis eignen. Das Ziel dieser Arbeit ist es, Privatsphäreerhaltendes Machine Learning mittels MPC praxistauglich zu machen. Zuerst fokussieren wir uns auf zwei Anwendungen, lineare Regression und Klassifikation von Dokumenten. Hier zeigen wir, dass sich der Kommunikations- und Rechenaufwand erheblich reduzieren lässt, indem die aufwändigsten Teile der Berechnung durch Sub-Protokolle ersetzt werden, welche auf die Zusammensetzung der Parteien, die Verteilung der Daten, und die Zahlendarstellung zugeschnitten sind. Insbesondere das Ausnutzen dünnbesetzter Datenrepräsentationen kann die Effizienz der Protokolle deutlich verbessern. Diese Beobachtung verallgemeinern wir anschließend durch die Entwicklung einer Datenstruktur für solch dünnbesetzte Daten, sowie dazugehöriger Zugriffsprotokolle. Aufbauend auf dieser Datenstruktur implementieren wir verschiedene Operationen der Linearen Algebra, welche in einer Vielzahl von Anwendungen genutzt werden. Insgesamt zeigt die vorliegende Arbeit, dass MPC ein vielversprechendes Werkzeug auf dem Weg zu Privatsphäre-erhaltendem Machine Learning ist, und die von uns entwickelten Protokolle stellen einen wesentlichen Schritt in diese Richtung dar.Machine learning (ML) greatly benefits from the availability of large amounts of training data, both in terms of the number of samples, and the number of features per sample. However, aggregating more data under centralized control is not always possible, nor desirable, due to security and privacy concerns, regulation, or competition. Secure multi-party computation (MPC) protocols promise a solution to this dilemma, allowing multiple parties to train ML models on their joint datasets while provably preserving the confidentiality of the inputs. However, generic approaches to MPC result in large computation and communication overheads, which limits the applicability in practice. The goal of this thesis is to make privacy-preserving machine learning with secure computation practical. First, we focus on two high-level applications, linear regression and document classification. We show that communication and computation overhead can be greatly reduced by identifying the costliest parts of the computation, and replacing them with sub-protocols that are tailored to the number and arrangement of parties, the data distribution, and the number representation used. One of our main findings is that exploiting sparsity in the data representation enables considerable efficiency improvements. We go on to generalize this observation, and implement a low-level data structure for sparse data, with corresponding secure access protocols. On top of this data structure, we develop several linear algebra algorithms that can be used in a wide range of applications. Finally, we turn to improving a cryptographic primitive named vector-OLE, for which we propose a novel protocol that helps speed up a wide range of secure computation tasks, within private machine learning and beyond. Overall, our work shows that MPC indeed offers a promising avenue towards practical privacy-preserving machine learning, and the protocols we developed constitute a substantial step in that direction

    Private Set Intersection with Linear Communication from General Assumptions

    Get PDF
    This work presents a hashing-based algorithm for Private Set Intersection (PSI) in the honest-but-curious setting. The protocol is generic, modular and provides both asymptotic and concrete efficiency improvements over existing PSI protocols. If each player has mm elements, our scheme requires only O(m \secpar) communication between the parties, where \secpar is a security parameter. Our protocol builds on the hashing-based PSI protocol of Pinkas et al. (USENIX 2014, USENIX 2015), but we replace one of the sub-protocols (handling the cuckoo ``stash\u27\u27) with a special-purpose PSI protocol that is optimized for comparing sets of unbalanced size. This brings the asymptotic communication complexity of the overall protocol down from \omega(m \secpar) to O(m\secpar), and provides concrete performance improvements (10-15\% reduction in communication costs) over Kolesnikov et al. (CCS 2016) under real-world parameter choices. Our protocol is simple, generic and benefits from the permutation-hashing optimizations of Pinkas et al. (USENIX 2015) and the Batched, Relaxed Oblivious Pseudo Random Functions of Kolesnikov et al. (CCS 2016)

    Unbalanced Circuit-PSI from Oblivious Key-Value Retrieval

    Get PDF
    Circuit-based Private Set Intersection (circuit-PSI) enables two parties, a client and a server, with their input sets XX and YY respectively, to securely compute a function ff on the intersection XYX \cap Y, while keeping XYX \cap Y secret from both parties. Although several computationally efficient circuit-PSI protocols have been proposed recently, they most focus on the balanced scenario where X|X| is similar to Y|Y|. However, in many realistic scenarios, a circuit-PSI protocol may be performed in the unbalanced case where X|X| is remarkably smaller than Y|Y| (e.g., the client is a constrained device holding a small set, while the server is a service provider holding a large set). Directly applying existing protocols to this scenario will lead to significant efficiency issues because the communication complexity of the protocols scales at least linearly with the size of the larger set, i.e., max(X,Y)\max(|X|, |Y|). In this work, we put forth efficient constructions for unbalanced circuit-PSI with sublinear communication complexity in the size of the larger set. The main insight is that we formalize unbalanced circuit-PSI as obliviously retrieving values corresponding to keys from a set of key-value pairs. To this end, we present a new functionality called Oblivious Key-Value Retrieval (OKVR) and design the OKVR protocol from a new notion called sparse Oblivious Key-Value Stores (sparse OKVS). We conduct extensive experiments and the results show that our constructions remarkably outperform the state-of-the-art circuit-PSI schemes (EUROCRYPT\u2719, PETs\u2722, CCS\u2722), i.e., 1.8448.86×1.84 \sim 48.86 \times communication improvement and 1.5039.81×1.50 \sim39.81 \times faster computation. Very recently, Son and Jeong (AsiaCCS\u2723) also present unbalanced circuit-PSI protocols, and our constructions outperform them by 1.1815.99×1.18 \sim 15.99 \times and 1.2210.44×1.22 \sim 10.44 \times in communication and computation overhead, respectively, depending on set sizes and network environments

    PSI from PaXoS: Fast, Malicious Private Set Intersection

    Get PDF
    We present a 2-party private set intersection (PSI) protocol which provides security against malicious participants, yet is almost as fast as the fastest known semi-honest PSI protocol of Kolesnikov et al. (CCS 2016). Our protocol is based on a new approach for two-party PSI, which can be instantiated to provide security against either malicious or semi-honest adversaries. The protocol is unique in that the only difference between the semi-honest and malicious versions is an instantiation with different parameters for a linear error-correction code. It is also the first PSI protocol which is concretely efficient while having linear communication and security against malicious adversaries, while running in the OT-hybrid model (assuming a non-programmable random oracle). State of the art semi-honest PSI protocols take advantage of cuckoo hashing, but it has proven a challenge to use cuckoo hashing for malicious security. Our protocol is the first to use cuckoo hashing for malicious-secure PSI. We do so via a new data structure, called a probe-and-XOR of strings (PaXoS), which may be of independent interest. This abstraction captures important properties of previous data structures, most notably garbled Bloom filters. While an encoding by a garbled Bloom filter is larger by a factor of O(λ)O(\lambda) than the original data, we describe a significantly improved PaXoS based on cuckoo hashing that achieves constant rate while being no worse in other relevant efficiency measures
    corecore