9 research outputs found

    Oblivious PRF on Committed Vector Inputs and Application to Deduplication of Encrypted Data

    Get PDF
    Ensuring secure deduplication of encrypted data is a very active topic of research because deduplication is effective at reducing storage costs. Schemes supporting deduplication of encrypted data that are not vulnerable to content guessing attacks (such as Message Locked Encryption) have been proposed recently [Bellare et al. 2013, Li et al. 2015]. However in all these schemes, there is a key derivation phase that solely depends on a short hash of the data and not the data itself. Therefore, a file specofic key can be obtained by anyone possessing the hash. Since hash values are usually not meant to be secret, a desired solution will be a more robust oblivious key generation protocol where file hashes need not be kept private. Motivated by this use-case, we propose a new primitive for oblivious pseudorandom function (OPRF) on committed vector inputs in the universal composable (UC) framework. We formalize this functionality as FOOPRF\mathcal{F}_\mathsf{OOPRF}, where OOPRF\mathsf{OOPRF} stands for Ownership-based Oblivious PRF. FOOPRF\mathcal{F}_\mathsf{OOPRF} produces a unique random key on input a vector digest provided the client proves knowledge of a (parametrisable) number of random positions of the input vector. To construct an efficient OOPRF\mathsf{OOPRF} protocol, we carefully combine a hiding vector commitment scheme, a variant of the PRF scheme of Dodis- Yampolskiy [Dodis et al. 2005] and a homomorphic encryption scheme glued together with concrete, efficient instantiations of proofs of knowledge. To the best of our knowledge, our work shows for the first time how these primitives can be combined in a secure, efficient and useful way. We also propose a new vector commitment scheme with constant sized public parameters but (log⁥n)(\log n) size witnesses where n is the length of the committed vector. This can be of independent interest

    SoK: Oblivious Pseudorandom Functions

    Get PDF
    In recent years, oblivious pseudorandom functions (OPRFs) have become a ubiquitous primitive used in cryptographic protocols and privacy-preserving technologies. The growing interest in OPRFs, both theoretical and applied, has produced a vast number of different constructions and functionality variations. In this paper, we provide a systematic overview of how to build and use OPRFs. We first categorize existing OPRFs into essentially four families based on their underlying PRF (Naor-Reingold, Dodis-Yampolskiy, Hashed Diffie-Hellman, and generic constructions). This categorization allows us to give a unified presentation of all oblivious evaluation methods in the literature, and to understand which properties OPRFs can (or cannot) have. We further demonstrate the theoretical and practical power of OPRFs by visualizing them in the landscape of cryptographic primitives, and by providing a comprehensive overview of how OPRFs are leveraged for improving the privacy of internet users. Our work systematizes 15 years of research on OPRFs and provides inspiration for new OPRF constructions and applications thereof

    Secure data storage and retrieval in cloud computing

    Get PDF
    Nowadays cloud computing has been widely recognised as one of the most inuential information technologies because of its unprecedented advantages. In spite of its widely recognised social and economic benefits, in cloud computing customers lose the direct control of their data and completely rely on the cloud to manage their data and computation, which raises significant security and privacy concerns and is one of the major barriers to the adoption of public cloud by many organisations and individuals. Therefore, it is desirable to apply practical security approaches to address the security risks for the wide adoption of cloud computing

    Secure and efficient processing of outsourced data structures using trusted execution environments

    Full text link
    In recent years, more and more companies make use of cloud computing; in other words, they outsource data storage and data processing to a third party, the cloud provider. From cloud computing, the companies expect, for example, cost reductions, fast deployment time, and improved security. However, security also presents a significant challenge as demonstrated by many cloud computing–related data breaches. Whether it is due to failing security measures, government interventions, or internal attackers, data leakages can have severe consequences, e.g., revenue loss, damage to brand reputation, and loss of intellectual property. A valid strategy to mitigate these consequences is data encryption during storage, transport, and processing. Nevertheless, the outsourced data processing should combine the following three properties: strong security, high efficiency, and arbitrary processing capabilities. Many approaches for outsourced data processing based purely on cryptography are available. For instance, encrypted storage of outsourced data, property-preserving encryption, fully homomorphic encryption, searchable encryption, and functional encryption. However, all of these approaches fail in at least one of the three mentioned properties. Besides approaches purely based on cryptography, some approaches use a trusted execution environment (TEE) to process data at a cloud provider. TEEs provide an isolated processing environment for user-defined code and data, i.e., the confidentiality and integrity of code and data processed in this environment are protected against other software and physical accesses. Additionally, TEEs promise efficient data processing. Various research papers use TEEs to protect objects at different levels of granularity. On the one end of the range, TEEs can protect entire (legacy) applications. This approach facilitates the development effort for protected applications as it requires only minor changes. However, the downsides of this approach are that the attack surface is large, it is difficult to capture the exact leakage, and it might not even be possible as the isolated environment of commercially available TEEs is limited. On the other end of the range, TEEs can protect individual, stateless operations, which are called from otherwise unchanged applications. This approach does not suffer from the problems stated before, but it leaks the (encrypted) result of each operation and the detailed control flow through the application. It is difficult to capture the leakage of this approach, because it depends on the processed operation and the operation’s location in the code. In this dissertation, we propose a trade-off between both approaches: the TEE-based processing of data structures. In this approach, otherwise unchanged applications call a TEE for self-contained data structure operations and receive encrypted results. We examine three data structures: TEE-protected B+-trees, TEE-protected database dictionaries, and TEE-protected file systems. Using these data structures, we design three secure and efficient systems: an outsourced system for index searches; an outsourced, dictionary-encoding–based, column-oriented, in-memory database supporting analytic queries on large datasets; and an outsourced system for group file sharing supporting large and dynamic groups. Due to our approach, the systems have a small attack surface, a low likelihood of security-relevant bugs, and a data owner can easily perform a (formal) code verification of the sensitive code. At the same time, we prevent low-level leakage of individual operation results. For all systems, we present a thorough security evaluation showing lower bounds of security. Additionally, we use prototype implementations to present upper bounds on performance. For our implementations, we use a widely available TEE that has a limited isolated environment—Intel Software Guard Extensions. By comparing our systems to related work, we show that they provide a favorable trade-off regarding security and efficiency

    A Taxonomy of Privacy-Preserving Record Linkage Techniques

    Get PDF
    The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage has various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because commonly personal identifying data, such as names, addresses and dates of birth of individuals, are used in the linkage process. When databases are linked across organizations, the issue of how to protect the privacy and confidentiality of such sensitive information is crucial to successful application of record linkage. In this paper we present an overview of techniques that allow the linking of databases between organizations while at the same time preserving the privacy of these data. Known as 'privacy-preserving record linkage' (PPRL), various such techniques have been developed. We present a taxonomy of PPRL techniques to characterize these techniques along 15 dimensions, and conduct a survey of PPRL techniques. We then highlight shortcomings of current techniques and discuss avenues for future research

    Scalable and approximate privacy-preserving record linkage

    No full text
    Record linkage, the task of linking multiple databases with the aim to identify records that refer to the same entity, is occurring increasingly in many application areas. Generally, unique entity identifiers are not available in all the databases to be linked. Therefore, record linkage requires the use of personal identifying attributes, such as names and addresses, to identify matching records that need to be reconciled to the same entity. Often, it is not permissible to exchange personal identifying data across different organizations due to privacy and confidentiality concerns or regulations. This has led to the novel research area of privacy-preserving record linkage (PPRL). PPRL addresses the problem of how to link different databases to identify records that correspond to the same real-world entities, without revealing the identities of these entities or any private or confidential information to any party involved in the process, or to any external party, such as a researcher. The three key challenges that a PPRL solution in a real-world context needs to address are (1) scalability to largedatabases by efficiently conducting linkage; (2) achieving high quality of linkage through the use of approximate (string) matching and effective classification of the compared record pairs into matches (i.e. pairs of records that refer to the same entity) and non-matches (i.e. pairs of records that refer to different entities); and (3) provision of sufficient privacy guarantees such that the interested parties only learn the actual values of certain attributes of the records that were classified as matches, and the process is secure with regard to any internal or external adversary. In this thesis, we present extensive research in PPRL, where we have addressed several gaps and problems identified in existing PPRL approaches. First, we begin the thesis with a review of the literature and we propose a taxonomy of PPRL to characterize existing techniques. This allows us to identify gaps and research directions. In the remainder of the thesis, we address several of the identified shortcomings. One main shortcoming we address is a framework for empirical and comparative evaluation of different PPRL solutions, which has not been studied in the literature so far. Second, we propose several novel algorithms for scalable and approximate PPRL by addressing the three main challenges of PPRL. We propose efficient private blocking techniques, for both three-party and two-party scenarios, based on sorted neighborhood clustering to address the scalability challenge. Following, we propose two efficient two-party techniques for private matching and classification to address the linkage quality challenge in terms of approximate matching and effective classification. Privacy is addressed in these approaches using efficient data perturbation techniques including k-anonymous mapping, reference values, and Bloom filters. Finally, the thesis reports on an extensive comparative evaluation of our proposed solutions with several other state-of-the-art techniques on real-world datasets, which shows that our solutions outperform others in terms of all three key challenges
    corecore