95 research outputs found
Private set intersection: A systematic literature review
Secure Multi-party Computation (SMPC) is a family of protocols which allow some parties to compute a function on their private inputs, obtaining the output at the end and nothing more. In this work, we focus on a particular SMPC problem named Private Set Intersection (PSI). The challenge in PSI is how two or more parties can compute the intersection of their private input sets, while the elements that are not in the intersection remain private. This problem has attracted the attention of many researchers because of its wide variety of applications, contributing to the proliferation of many different approaches. Despite that, current PSI protocols still require heavy cryptographic assumptions that may be unrealistic in some scenarios. In this paper, we perform a Systematic Literature Review of PSI solutions, with the objective of analyzing the main scenarios where PSI has been studied and giving the reader a general taxonomy of the problem together with a general understanding of the most common tools used to solve it. We also analyze the performance using different metrics, trying to determine if PSI is mature enough to be used in realistic scenarios, identifying the pros and cons of each protocol and the remaining open problems.This work has been partially supported by the projects: BIGPrivDATA (UMA20-FEDERJA-082) from the FEDER Andalucía 2014–
2020 Program and SecTwin 5.0 funded by the Ministry of Science and Innovation, Spain, and the European Union (Next Generation EU) (TED2021-129830B-I00). The first author has been funded by the Spanish Ministry of Education under the National F.P.U. Program (FPU19/01118). Funding for open access charge: Universidad de Málaga/CBU
Secure and Efficient Comparisons between Untrusted Parties
A vast number of online services is based on users contributing their personal information. Examples are manifold, including social networks, electronic commerce, sharing websites, lodging platforms, and genealogy. In all cases user privacy depends on a collective trust upon all involved intermediaries, like service providers, operators, administrators or even help desk staff. A single adversarial party in the whole chain of trust voids user privacy. Even more, the number of intermediaries is ever growing. Thus, user privacy must be preserved at every time and stage, independent of the intrinsic goals any involved party. Furthermore, next to these new services, traditional offline analytic systems are replaced by online services run in large data centers. Centralized processing of electronic medical records, genomic data or other health-related information is anticipated due to advances in medical research, better analytic results based on large amounts of medical information and lowered costs. In these scenarios privacy is of utmost concern due to the large amount of personal information contained within the centralized data.
We focus on the challenge of privacy-preserving processing on genomic data, specifically comparing genomic sequences. The problem that arises is how to efficiently compare private sequences of two parties while preserving confidentiality of the compared data. It follows that the privacy of the data owner must be preserved, which means that as little information as possible must be leaked to any party participating in the comparison. Leakage can happen at several points during a comparison. The secured inputs for the comparing party might leak some information about the original input, or the output might leak information about the inputs. In the latter case, results of several comparisons can be combined to infer information about the confidential input of the party under observation. Genomic sequences serve as a use-case, but the proposed solutions are more general and can be applied to the generic field of privacy-preserving comparison of sequences. The solution should be efficient such that performing a comparison yields runtimes linear in the length of the input sequences and thus producing acceptable costs for a typical use-case. To tackle the problem of efficient, privacy-preserving sequence comparisons, we propose a framework consisting of three main parts.
a) The basic protocol presents an efficient sequence comparison algorithm, which transforms a sequence into a set representation, allowing to approximate distance measures over input sequences using distance measures over sets. The sets are then represented by an efficient data structure - the Bloom filter -, which allows evaluation of certain set operations without storing the actual elements of the possibly large set. This representation yields low distortion for comparing similar sequences. Operations upon the set representation are carried out using efficient, partially homomorphic cryptographic systems for data confidentiality of the inputs. The output can be adjusted to either return the actual approximated distance or the result of an in-range check of the approximated distance.
b) Building upon this efficient basic protocol we introduce the first mechanism to reduce the success of inference attacks by detecting and rejecting similar queries in a privacy-preserving way. This is achieved by generating generalized commitments for inputs. This generalization is done by treating inputs as messages received from a noise channel, upon which error-correction from coding theory is applied. This way similar inputs are defined as inputs having a hamming distance of their generalized inputs below a certain predefined threshold. We present a protocol to perform a zero-knowledge proof to assess if the generalized input is indeed a generalization of the actual input. Furthermore, we generalize a very efficient inference attack on privacy-preserving sequence comparison protocols and use it to evaluate our inference-control mechanism.
c) The third part of the framework lightens the computational load of the client taking part in the comparison protocol by presenting a compression mechanism for partially homomorphic cryptographic schemes. It reduces the transmission and storage overhead induced by the semantically secure homomorphic encryption schemes, as well as encryption latency. The compression is achieved by constructing an asymmetric stream cipher such that the generated ciphertext can be converted into a ciphertext of an associated homomorphic encryption scheme without revealing any information about the plaintext. This is the first compression scheme available for partially homomorphic encryption schemes. Compression of ciphertexts of fully homomorphic encryption schemes are several orders of magnitude slower at the conversion from the transmission ciphertext to the homomorphically encrypted ciphertext. Indeed our compression scheme achieves optimal conversion performance. It further allows to generate keystreams offline and thus supports offloading to trusted devices. This way transmission-, storage- and power-efficiency is improved.
We give security proofs for all relevant parts of the proposed protocols and algorithms to evaluate their security. A performance evaluation of the core components demonstrates the practicability of our proposed solutions including a theoretical analysis and practical experiments to show the accuracy as well as efficiency of approximations and probabilistic algorithms. Several variations and configurations to detect similar inputs are studied during an in-depth discussion of the inference-control mechanism. A human mitochondrial genome database is used for the practical evaluation to compare genomic sequences and detect similar inputs as described by the use-case.
In summary we show that it is indeed possible to construct an efficient and privacy-preserving (genomic) sequences comparison, while being able to control the amount of information that leaves the comparison. To the best of our knowledge we also contribute to the field by proposing the first efficient privacy-preserving inference detection and control mechanism, as well as the first ciphertext compression system for partially homomorphic cryptographic systems
Recommended from our members
Keeping your Friends Secret: Improving the Security, Effciency and Usability of Private Set Intersection
Private set intersection (PSI) allows two parties, who each hold a set of items, to compute the intersection of those sets without revealing anything about other items. Recent advances in PSI have significantly improved its performance for the case of semi-honest security, making semi-honest PSI a practical alternative to insecure methods for computing intersections. However, these protocols have two major drawbacks: 1) the amount of data required to be communicated can be orders of magnitude larger than an insecure solution and 2) when in the presence of malicious parties the security of these protocols breaks down.
In this work, four malicious secure PSI protocols are introduced along with three semi-honest protocols which have sublinear communication. These protocols are based on a combination of fast symmetric-key primitives and fully homomorphic encryption. Three of these protocols represent the current state of the art for their respective settings.
The practicality of these protocols are demonstrated with prototype implementations. To securely compute the intersection of two sets of size 2²⁰ in the malicious setting requires only 13 seconds, which is ~ 450x faster than the previous best malicious-secure protocol (De Cristofaro et al, Asiacrypt 2010), and only 3x slower than the best semi-honest protocol (Kolesnikov et al., CCS 2016). Alternatively, when computing the intersection between set sizes of 2¹⁰ and 2²⁸, our fastest protocol require just 6 seconds and 5MB of communication
Privacy-aware Security Applications in the Era of Internet of Things
In this dissertation, we introduce several novel privacy-aware security applications. We split these contributions into three main categories: First, to strengthen the current authentication mechanisms, we designed two novel privacy-aware alternative complementary authentication mechanisms, Continuous Authentication (CA) and Multi-factor Authentication (MFA). Our first system is Wearable-assisted Continuous Authentication (WACA), where we used the sensor data collected from a wrist-worn device to authenticate users continuously. Then, we improved WACA by integrating a noise-tolerant template matching technique called NTT-Sec to make it privacy-aware as the collected data can be sensitive. We also designed a novel, lightweight, Privacy-aware Continuous Authentication (PACA) protocol. PACA is easily applicable to other biometric authentication mechanisms when feature vectors are represented as fixed-length real-valued vectors. In addition to CA, we also introduced a privacy-aware multi-factor authentication method, called PINTA. In PINTA, we used fuzzy hashing and homomorphic encryption mechanisms to protect the users\u27 sensitive profiles while providing privacy-preserving authentication. For the second privacy-aware contribution, we designed a multi-stage privacy attack to smart home users using the wireless network traffic generated during the communication of the devices. The attack works even on the encrypted data as it is only using the metadata of the network traffic. Moreover, we also designed a novel solution based on the generation of spoofed traffic. Finally, we introduced two privacy-aware secure data exchange mechanisms, which allow sharing the data between multiple parties (e.g., companies, hospitals) while preserving the privacy of the individual in the dataset. These mechanisms were realized with the combination of Secure Multiparty Computation (SMC) and Differential Privacy (DP) techniques. In addition, we designed a policy language, called Curie Policy Language (CPL), to handle the conflicting relationships among parties.
The novel methods, attacks, and countermeasures in this dissertation were verified with theoretical analysis and extensive experiments with real devices and users. We believe that the research in this dissertation has far-reaching implications on privacy-aware alternative complementary authentication methods, smart home user privacy research, as well as the privacy-aware and secure data exchange methods
Recommended from our members
Secure Computation in Heterogeneous Environments: How to Bring Multiparty Computation Closer to Practice?
Many services that people use daily require computation that depends on the private data of multiple parties. While the utility of the final result of such interactions outweighs the privacy concerns related to output release, the inputs for such computations are much more sensitive and need to be protected. Secure multiparty computation (MPC) considers the question of constructing computation protocols that reveal nothing more about their inputs than what is inherently leaked by the output. There have been strong theoretical results that demonstrate that every functionality can be computed securely. However, these protocols remain unused in practical solutions since they introduce efficiency overhead prohibitive for most applications. Generic multiparty computation techniques address homogeneous setups with respect to the resources available to the participants and the adversarial model. On the other hand, realistic scenarios present a wide diversity of heterogeneous environments where different participants have different available resources and different incentives to misbehave and collude. In this thesis we introduce techniques for multiparty computation that focus on heterogeneous settings. We present solutions tailored to address different types of asymmetric constraints and improve the efficiency of existing approaches in these scenarios. We tackle the question from three main directions: New Computational Models for MPC - We explore different computational models that enable us to overcome inherent inefficiencies of generic MPC solutions using circuit representation for the evaluated functionality. First, we show how we can use random access machines to construct MPC protocols that add only polylogarithmic overhead to the running time of the insecure version of the underlying functionality. This allows to achieve MPC constructions with computational complexity sublinear in the size for their inputs, which is very important for computations that use large databases. We also consider multivariate polynomials which yield more succinct representations for the functionalities they implement than circuits, and at the same time a large collection of problems are naturally and efficiently expressed as multivariate polynomials. We construct an MPC protocol for multivariate polynomials, which improves the communication complexity of corresponding circuit solutions, and provides currently the most efficient solution for multiparty set intersection in the fully malicious case. Outsourcing Computation - The goal in this setting is to utilize the resources of a single powerful service provider for the work that computationally weak clients need to perform on their data. We present a new paradigm for constructing verifiable computation (VC) schemes, which enables a computationally limited client to verify efficiently the result of a large computation. Our construction is based on attribute-based encryption and avoids expensive primitives such as fully homomorphic encryption andprobabilistically checkable proofs underlying existing VC schemes. Additionally our solution enjoys two new useful properties: public delegation and verification. We further introduce the model of server-aided computation where we utilize the computational power of an outsourcing party to assist the execution and improve the efficiency of MPC protocols. For this purpose we define a new adversarial model of non-collusion, which provides room for more efficient constructions that rely almost completely only on symmetric key operations, and at the same time captures realistic settings for adversarial behavior. In this model we propose protocols for generic secure computation that offload the work of most of the parties to the computation server. We also construct a specialized server-aided two party set intersection protocol that achieves better efficiencies for the two participants than existing solutions. Outsourcing in many cases concerns only data storage and while outsourcing the data of a single party is useful, providing a way for data sharing among different clients of the service is the more interesting and useful setup. However, this scenario brings new challenges for access control since the access control rules and data accesses become private data for the clients with respect to the service provide. We propose an approach that offers trade-offs between the privacy provided for the clients and the communication overhead incurred for each data access. Efficient Private Search in Practice - We consider the question of private search from a different perspective compared to traditional settings for MPC. We start with strict efficiency requirements motivated by speeds of available hardware and what is considered acceptable overhead from practical point of view. Then we adopt relaxed definitions of privacy, which still provide meaningful security guarantees while allowing us to meet the efficiency requirements. In this setting we design a security architecture and implement a system for data sharing based on encrypted search, which achieves only 30% overhead compared to non-secure solutions on realistic workloads
Structure-Aware Private Set Intersection, With Applications to Fuzzy Matching
In two-party private set intersection (PSI), Alice holds a set , Bob holds a set , and they learn (only) the contents of .
We introduce structure-aware PSI protocols, which take advantage of situations where Alice\u27s set is publicly known to have a certain structure.
The goal of structure-aware PSI is to have communication that scales with the description size of Alice\u27s set, rather its cardinality.
We introduce a new generic paradigm for structure-aware PSI based on function secret-sharing (FSS).
In short, if there exists compact FSS for a class of structured sets, then there exists a semi-honest PSI protocol that supports this class of input sets, with communication cost proportional only to the FSS share size.
Several prior protocols for efficient (plain) PSI can be viewed as special cases of our new paradigm, with an implicit FSS for unstructured sets.
Our PSI protocol can be instantiated from a significantly weaker flavor of FSS, which has not been previously studied.
We develop several improved FSS techniques that take advantage of these relaxed requirements, and which are in some cases exponentially better than existing FSS.
Finally, we explore in depth a natural application of structure-aware PSI.
If Alice\u27s set is the union of many radius- balls in some metric space, then an intersection between and corresponds to fuzzy PSI, in which the parties learn which of their points are within distance .
In structure-aware PSI, the communication cost scales with the number of balls in Alice\u27s set, rather than their total volume.
Our techniques lead to efficient fuzzy PSI for and metrics (and approximations of metric) in high dimensions.
We implemented this fuzzy PSI protocol for 2-dimensional metrics.
For reasonable input sizes, our protocol requires 45--60% less time and 85% less communication than competing approaches that simply reduce the problem to plain PSI
The Communication Complexity of Threshold Private Set Intersection
Threshold private set intersection enables Alice and Bob who hold sets and of size to compute the intersection if the sets do not differ by more than some threshold parameter .
In this work, we investigate the communication complexity of this problem and we establish the first upper and lower bounds.
We show that any protocol has to have a communication complexity of .
We show that an almost matching upper bound of can be obtained via fully homomorphic encryption.
We present a computationally more efficient protocol based on weaker assumptions, namely additively homomorphic encryption, with a communication complexity of .
We show how our protocols can be extended to the multiparty setting.
For applications like biometric authentication, where a given fingerprint has to have a large intersection with a fingerprint from a database, our protocols may result in significant communication savings.
We, furthermore, show how to extend all of our protocols to the multiparty setting.
Prior to this work, all previous protocols had a communication complexity of .
Our protocols are the first ones with communication complexities that mainly depend on the threshold parameter and only logarithmically on the set size
Polysaccharide utilization loci and associated genes in marine Bacteroidetes - compositional diversity and ecological relevance
The synthesis of marine organic carbon compounds by photosynthetic macroalgae, microalgae (phytoplankton) and bacteria provide a basis for life in the ocean. In marine surface waters this primary production is largely dominated by microalgae and is especially pronounced during spring phytoplankton blooms. During and after these often diatom-dominated blooms, increased amounts of organic matter are released into the surrounding waters. Here, the organic matter, rich in polysaccharides, can trigger blooms of heterotrophic bacteria. Marine members of the Bacteroidetes are consistently found related to such bloom events. These bacteria are regularly detected as the first responders to thrive after phytoplankton spring blooms in temperate coastal regions and are often equipped with a variety of polysaccharide utilization gene clusters. These gene clusters, termed polysaccharide utilization loci (PULs), encode enzymes for the extracellular hydrolysis of polysaccharides and the subsequent uptake of oligosaccharides into the periplasm, where they are shielded from competing bacteria. This mechanism allows for rapid uptake and substrate hoarding, and thus could be one reason why Bacteroidetes are often seen as the first responders of the bacterioplankton community. The investigation of the so far largely unknown diversity and the ecological relevance of PULs in marine Bacteroidetes was the major goal of the work presented here. We could show that genomes of Bacteroidetes isolates from the North Sea, with free-living to micro- and macro-algae associated lifestyles, harboured a variety of these loci predicted to target in total 18 different substrate classes. Overall PUL repertoires of these isolates showed considerable intra-genus and inter-genus, variations suggesting that Bacteroidetes species harbour distinct glycan niches, independent of their phylogenetic relationships. By investigating the PUL repertoires of uncultured free-living Bacteroidetes during three consecutive years of spring phytoplankton blooms at the North Sea island of Helgoland, I could further reveal that the set of targeted substrates during these bloom events was dominated by only five of the substrate classes targeted by the isolates. These were the diatom storage polysaccharide laminarin, alpha-glucans, alginates, as well as substrates rich in alpha-mannans and sulfated xylans. In addition to this constrained set of substrate classes targeted by the free-living Bacteroidetes community, I could show that the species diversity during these blooms was limited and dominated by only 27 abundant and recurrent species that carried a limited number of abundant PULs. The majority of these PULs were targeting laminarin and alpha-glucan substrates, which were likely targeted during the entire time of the blooms. The less frequent PULs, targeting alpha-mannans and sulfated xylans, were predominantly detected during mid- and late- bloom phases, suggesting a relevance of these two substrate classes in the later phases of phytoplankton blooms. Overall these findings highlight the recurrence of a few specialized Bacteroidetes species and the environmental relevance of specific polysaccharide substrate classes during spring phytoplankton blooms. However, for some of these substrate classes the origin, structural details and their abundance during blooms are as yet largely unknown. To further shed light on the polysaccharide niches of abundant key-players, these findings can serve as a guide for future laboratory studies
- …