95 research outputs found

    Private set intersection: A systematic literature review

    Secure Multi-party Computation (SMPC) is a family of protocols that allow a group of parties to compute a function on their private inputs, obtaining the output at the end and nothing more. In this work, we focus on a particular SMPC problem named Private Set Intersection (PSI). The challenge in PSI is how two or more parties can compute the intersection of their private input sets, while the elements that are not in the intersection remain private. This problem has attracted the attention of many researchers because of its wide variety of applications, contributing to the proliferation of many different approaches. Despite that, current PSI protocols still require heavy cryptographic assumptions that may be unrealistic in some scenarios. In this paper, we perform a Systematic Literature Review of PSI solutions, with the objective of analyzing the main scenarios where PSI has been studied and giving the reader a general taxonomy of the problem together with a general understanding of the most common tools used to solve it. We also analyze the performance using different metrics, trying to determine if PSI is mature enough to be used in realistic scenarios, identifying the pros and cons of each protocol and the remaining open problems. This work has been partially supported by the projects: BIGPrivDATA (UMA20-FEDERJA-082) from the FEDER Andalucía 2014–2020 Program and SecTwin 5.0 funded by the Ministry of Science and Innovation, Spain, and the European Union (Next Generation EU) (TED2021-129830B-I00). The first author has been funded by the Spanish Ministry of Education under the National F.P.U. Program (FPU19/01118). Funding for open access charge: Universidad de Málaga/CBU

    Secure and Efficient Comparisons between Untrusted Parties

    A vast number of online services are based on users contributing their personal information. Examples are manifold, including social networks, electronic commerce, sharing websites, lodging platforms, and genealogy. In all cases user privacy depends on collective trust in all involved intermediaries, like service providers, operators, administrators or even help desk staff. A single adversarial party in the whole chain of trust voids user privacy. Moreover, the number of intermediaries is ever growing. Thus, user privacy must be preserved at all times and stages, independent of the intrinsic goals of any involved party. Furthermore, next to these new services, traditional offline analytic systems are being replaced by online services run in large data centers. Centralized processing of electronic medical records, genomic data or other health-related information is anticipated due to advances in medical research, better analytic results based on large amounts of medical information and lowered costs. In these scenarios privacy is of utmost concern due to the large amount of personal information contained within the centralized data. We focus on the challenge of privacy-preserving processing of genomic data, specifically comparing genomic sequences. The problem that arises is how to efficiently compare private sequences of two parties while preserving confidentiality of the compared data. It follows that the privacy of the data owner must be preserved, which means that as little information as possible must be leaked to any party participating in the comparison. Leakage can happen at several points during a comparison. The secured inputs for the comparing party might leak some information about the original input, or the output might leak information about the inputs. In the latter case, results of several comparisons can be combined to infer information about the confidential input of the party under observation. 
Genomic sequences serve as a use-case, but the proposed solutions are more general and can be applied to the generic field of privacy-preserving comparison of sequences. The solution should be efficient, such that a comparison runs in time linear in the length of the input sequences and thus incurs acceptable costs for a typical use-case. To tackle the problem of efficient, privacy-preserving sequence comparisons, we propose a framework consisting of three main parts. a) The basic protocol presents an efficient sequence comparison algorithm, which transforms a sequence into a set representation, allowing distance measures over input sequences to be approximated by distance measures over sets. The sets are then represented by an efficient data structure, the Bloom filter, which allows evaluation of certain set operations without storing the actual elements of the possibly large set. This representation yields low distortion for comparing similar sequences. Operations upon the set representation are carried out using efficient, partially homomorphic cryptographic systems for data confidentiality of the inputs. The output can be adjusted to either return the actual approximated distance or the result of an in-range check of the approximated distance. b) Building upon this efficient basic protocol we introduce the first mechanism to reduce the success of inference attacks by detecting and rejecting similar queries in a privacy-preserving way. This is achieved by generating generalized commitments for inputs. This generalization is done by treating inputs as messages received from a noisy channel, upon which error correction from coding theory is applied. This way similar inputs are defined as inputs whose generalized inputs have a Hamming distance below a certain predefined threshold. We present a protocol to perform a zero-knowledge proof to assess if the generalized input is indeed a generalization of the actual input. 
Furthermore, we generalize a very efficient inference attack on privacy-preserving sequence comparison protocols and use it to evaluate our inference-control mechanism. c) The third part of the framework lightens the computational load of the client taking part in the comparison protocol by presenting a compression mechanism for partially homomorphic cryptographic schemes. It reduces the transmission and storage overhead induced by the semantically secure homomorphic encryption schemes, as well as encryption latency. The compression is achieved by constructing an asymmetric stream cipher such that the generated ciphertext can be converted into a ciphertext of an associated homomorphic encryption scheme without revealing any information about the plaintext. This is the first compression scheme available for partially homomorphic encryption schemes. Compression schemes for ciphertexts of fully homomorphic encryption schemes are several orders of magnitude slower at converting the transmission ciphertext into the homomorphically encrypted ciphertext. Indeed, our compression scheme achieves optimal conversion performance. It further allows keystreams to be generated offline and thus supports offloading to trusted devices. This way transmission, storage, and power efficiency are improved. We give security proofs for all relevant parts of the proposed protocols and algorithms to evaluate their security. A performance evaluation of the core components demonstrates the practicability of our proposed solutions, including a theoretical analysis and practical experiments to show the accuracy as well as efficiency of approximations and probabilistic algorithms. Several variations and configurations to detect similar inputs are studied during an in-depth discussion of the inference-control mechanism. A human mitochondrial genome database is used for the practical evaluation to compare genomic sequences and detect similar inputs as described by the use-case. 
In summary, we show that it is indeed possible to construct an efficient and privacy-preserving (genomic) sequence comparison, while being able to control the amount of information that leaves the comparison. To the best of our knowledge we also contribute to the field by proposing the first efficient privacy-preserving inference detection and control mechanism, as well as the first ciphertext compression system for partially homomorphic cryptographic systems.
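
The set-based comparison described in part a) of the abstract above can be illustrated with a small plaintext sketch. It is not the thesis's protocol: the homomorphic encryption layer is omitted entirely, and the use of k-mers as the set representation, the SHA-256-based hashing, and the parameters `k`, `m` and `num_hashes` are assumptions chosen only for illustration.

```python
import hashlib


def kmers(seq, k=4):
    """Set of all length-k substrings of seq (assumed set representation)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bloom_filter(items, m=1024, num_hashes=3):
    """Return a list of m bits with up to num_hashes positions set per item."""
    bits = [0] * m
    for item in items:
        for j in range(num_hashes):
            digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def approx_jaccard_distance(bits_a, bits_b):
    """Jaccard distance computed over set bit positions of two filters."""
    both = sum(a & b for a, b in zip(bits_a, bits_b))
    either = sum(a | b for a, b in zip(bits_a, bits_b))
    return 1.0 - both / either if either else 0.0


seq_a = "ACGTACGTTAGCCGATAGCTAGCTAGGCTA"
seq_b = "ACGTACGTTAGCCGATAGCTAGCTAGGCTT"  # differs from seq_a in the last base
seq_c = "TTTTGGGGCCCCAAAATTTTGGGGCCCCAA"  # unrelated sequence

bf_a, bf_b, bf_c = (bloom_filter(kmers(s)) for s in (seq_a, seq_b, seq_c))
d_ab = approx_jaccard_distance(bf_a, bf_b)
d_ac = approx_jaccard_distance(bf_a, bf_c)
print("similar sequences:  ", d_ab)
print("unrelated sequences:", d_ac)
```

In this sketch the filters are compared bit by bit in the clear; in a privacy-preserving setting the same bitwise sums would have to be evaluated over encrypted filters.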

    Privacy-aware Security Applications in the Era of Internet of Things

    In this dissertation, we introduce several novel privacy-aware security applications. We split these contributions into three main categories: First, to strengthen the current authentication mechanisms, we designed two novel privacy-aware alternative complementary authentication mechanisms, Continuous Authentication (CA) and Multi-factor Authentication (MFA). Our first system is Wearable-assisted Continuous Authentication (WACA), where we used the sensor data collected from a wrist-worn device to authenticate users continuously. Then, we improved WACA by integrating a noise-tolerant template matching technique called NTT-Sec to make it privacy-aware, as the collected data can be sensitive. We also designed a novel, lightweight, Privacy-aware Continuous Authentication (PACA) protocol. PACA is easily applicable to other biometric authentication mechanisms when feature vectors are represented as fixed-length real-valued vectors. In addition to CA, we also introduced a privacy-aware multi-factor authentication method, called PINTA. In PINTA, we used fuzzy hashing and homomorphic encryption mechanisms to protect the users' sensitive profiles while providing privacy-preserving authentication. For the second privacy-aware contribution, we designed a multi-stage privacy attack on smart home users using the wireless network traffic generated during the communication of the devices. The attack works even on encrypted data, as it uses only the metadata of the network traffic. Moreover, we also designed a novel solution based on the generation of spoofed traffic. Finally, we introduced two privacy-aware secure data exchange mechanisms, which allow sharing the data between multiple parties (e.g., companies, hospitals) while preserving the privacy of the individuals in the dataset. These mechanisms were realized with the combination of Secure Multiparty Computation (SMC) and Differential Privacy (DP) techniques. 
In addition, we designed a policy language, called Curie Policy Language (CPL), to handle the conflicting relationships among parties. The novel methods, attacks, and countermeasures in this dissertation were verified with theoretical analysis and extensive experiments with real devices and users. We believe that the research in this dissertation has far-reaching implications for privacy-aware alternative complementary authentication methods, smart home user privacy research, as well as privacy-aware and secure data exchange methods.

    Structure-Aware Private Set Intersection, With Applications to Fuzzy Matching

    In two-party private set intersection (PSI), Alice holds a set $X$, Bob holds a set $Y$, and they learn (only) the contents of $X \cap Y$. We introduce structure-aware PSI protocols, which take advantage of situations where Alice's set $X$ is publicly known to have a certain structure. The goal of structure-aware PSI is to have communication that scales with the description size of Alice's set, rather than its cardinality. We introduce a new generic paradigm for structure-aware PSI based on function secret-sharing (FSS). In short, if there exists compact FSS for a class of structured sets, then there exists a semi-honest PSI protocol that supports this class of input sets, with communication cost proportional only to the FSS share size. Several prior protocols for efficient (plain) PSI can be viewed as special cases of our new paradigm, with an implicit FSS for unstructured sets. Our PSI protocol can be instantiated from a significantly weaker flavor of FSS, which has not been previously studied. We develop several improved FSS techniques that take advantage of these relaxed requirements, and which are in some cases exponentially better than existing FSS. Finally, we explore in depth a natural application of structure-aware PSI. If Alice's set $X$ is the union of many radius-$\delta$ balls in some metric space, then an intersection between $X$ and $Y$ corresponds to fuzzy PSI, in which the parties learn which of their points are within distance $\delta$. In structure-aware PSI, the communication cost scales with the number of balls in Alice's set, rather than their total volume. Our techniques lead to efficient fuzzy PSI for $\ell_\infty$ and $\ell_1$ metrics (and approximations of the $\ell_2$ metric) in high dimensions. We implemented this fuzzy PSI protocol for 2-dimensional $\ell_\infty$ metrics. For reasonable input sizes, our protocol requires 45–60% less time and 85% less communication than competing approaches that simply reduce the problem to plain PSI.
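
To make the fuzzy-matching setting concrete, the following sketch computes, in the clear, the output that a fuzzy PSI protocol for the $\ell_\infty$ metric would be expected to reveal. It provides no privacy and involves no FSS; the function names and the integer grid are hypothetical and used only for illustration. The helper `expanded_set_size` shows why reducing to plain PSI is costly: the number of points grows with the volume of the balls, not with their count.

```python
def linf_distance(p, q):
    """Chebyshev (l-infinity) distance between two equal-length points."""
    return max(abs(a - b) for a, b in zip(p, q))


def fuzzy_intersection(centers, delta, bob_points):
    """Bob's points that lie inside at least one radius-delta ball of Alice."""
    return [y for y in bob_points
            if any(linf_distance(c, y) <= delta for c in centers)]


def expanded_set_size(num_balls, delta, dim):
    """Upper bound on the number of integer points covered by the balls
    (exact when the balls do not overlap)."""
    return num_balls * (2 * delta + 1) ** dim


alice_centers = [(0, 0), (100, 100)]   # description size: 2 balls
bob_points = [(3, -4), (6, 0), (97, 105), (50, 50)]
delta = 5

matches = fuzzy_intersection(alice_centers, delta, bob_points)
print(matches)
print(expanded_set_size(len(alice_centers), delta, dim=2))
```

With two balls of radius 5 in two dimensions, Alice's structured set already covers up to 242 grid points, while its description consists of just two centers and one radius.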

    The Communication Complexity of Threshold Private Set Intersection

    Threshold private set intersection enables Alice and Bob, who hold sets $A$ and $B$ of size $n$, to compute the intersection $A \cap B$ if the sets do not differ by more than some threshold parameter $t$. In this work, we investigate the communication complexity of this problem and we establish the first upper and lower bounds. We show that any protocol has to have a communication complexity of $\Omega(t)$. We show that an almost matching upper bound of $\tilde{\mathcal{O}}(t)$ can be obtained via fully homomorphic encryption. We present a computationally more efficient protocol based on weaker assumptions, namely additively homomorphic encryption, with a communication complexity of $\tilde{\mathcal{O}}(t^2)$. We show how all of our protocols can be extended to the multiparty setting. For applications like biometric authentication, where a given fingerprint has to have a large intersection with a fingerprint from a database, our protocols may result in significant communication savings. Prior to this work, all previous protocols had a communication complexity of $\Omega(n)$. Our protocols are the first ones with communication complexities that mainly depend on the threshold parameter $t$ and only logarithmically on the set size $n$.
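
The functionality described in this abstract can be sketched in the clear as follows. This is only the intended input/output behaviour, not a secure protocol, and it says nothing about communication cost. Interpreting "do not differ by more than $t$" as $|A \setminus B| \le t$ for equal-size sets is an assumption made for this illustration, as is the function name.

```python
def threshold_intersection(a, b, t):
    """Return A ∩ B if the equal-size sets differ in at most t elements,
    otherwise return None (nothing is revealed)."""
    a, b = set(a), set(b)
    if len(a) != len(b):
        raise ValueError("both sets are assumed to have the same size n")
    common = a & b
    # Assumed reading of the threshold: at most t elements of A are missing from B.
    if len(a) - len(common) <= t:
        return common
    return None


alice = {1, 2, 3, 4, 5}
bob = {1, 2, 3, 4, 9}

print(threshold_intersection(alice, bob, t=1))  # sets differ in one element
print(threshold_intersection(alice, bob, t=0))  # threshold not met
```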

    Polysaccharide utilization loci and associated genes in marine Bacteroidetes - compositional diversity and ecological relevance

    The synthesis of marine organic carbon compounds by photosynthetic macroalgae, microalgae (phytoplankton) and bacteria provides a basis for life in the ocean. In marine surface waters this primary production is largely dominated by microalgae and is especially pronounced during spring phytoplankton blooms. During and after these often diatom-dominated blooms, increased amounts of organic matter are released into the surrounding waters. Here, the organic matter, rich in polysaccharides, can trigger blooms of heterotrophic bacteria. Marine members of the Bacteroidetes are consistently found in association with such bloom events. These bacteria are regularly detected as the first responders to thrive after phytoplankton spring blooms in temperate coastal regions and are often equipped with a variety of polysaccharide utilization gene clusters. These gene clusters, termed polysaccharide utilization loci (PULs), encode enzymes for the extracellular hydrolysis of polysaccharides and the subsequent uptake of oligosaccharides into the periplasm, where they are shielded from competing bacteria. This mechanism allows for rapid uptake and substrate hoarding, and thus could be one reason why Bacteroidetes are often seen as the first responders of the bacterioplankton community. The investigation of the so far largely unknown diversity and the ecological relevance of PULs in marine Bacteroidetes was the major goal of the work presented here. We could show that genomes of Bacteroidetes isolates from the North Sea, with free-living to micro- and macro-algae associated lifestyles, harboured a variety of these loci predicted to target in total 18 different substrate classes. Overall, PUL repertoires of these isolates showed considerable intra-genus and inter-genus variations, suggesting that Bacteroidetes species harbour distinct glycan niches, independent of their phylogenetic relationships. 
By investigating the PUL repertoires of uncultured free-living Bacteroidetes during three consecutive years of spring phytoplankton blooms at the North Sea island of Helgoland, I could further reveal that the set of targeted substrates during these bloom events was dominated by only five of the substrate classes targeted by the isolates. These were the diatom storage polysaccharide laminarin, alpha-glucans, alginates, as well as substrates rich in alpha-mannans and sulfated xylans. In addition to this constrained set of substrate classes targeted by the free-living Bacteroidetes community, I could show that the species diversity during these blooms was limited and dominated by only 27 abundant and recurrent species that carried a limited number of abundant PULs. The majority of these PULs targeted laminarin and alpha-glucan substrates, which were likely targeted during the entire time of the blooms. The less frequent PULs, targeting alpha-mannans and sulfated xylans, were predominantly detected during mid- and late-bloom phases, suggesting a relevance of these two substrate classes in the later phases of phytoplankton blooms. Overall, these findings highlight the recurrence of a few specialized Bacteroidetes species and the environmental relevance of specific polysaccharide substrate classes during spring phytoplankton blooms. However, for some of these substrate classes the origin, structural details and abundance during blooms are as yet largely unknown. To further shed light on the polysaccharide niches of abundant key players, these findings can serve as a guide for future laboratory studies.