
    Privacy-Preserving Record Linkage for Big Data: Current Approaches and Research Challenges

    The growth of Big Data, especially personal data dispersed across multiple data sources, presents enormous opportunities and insights for businesses to explore and leverage the value of linked and integrated data. However, privacy concerns impede the sharing or exchange of data for linkage across different organizations. Privacy-preserving record linkage (PPRL) aims to address this problem by identifying and linking records that correspond to the same real-world entity across several data sources held by different parties, without revealing any sensitive information about these entities. PPRL is increasingly required in many real-world application areas, ranging from public health surveillance to crime and fraud detection and national security. PPRL for Big Data poses several challenges, the three major ones being (1) scalability to multiple large databases, due to their massive volume and the flow of data within Big Data applications, (2) achieving high-quality linkage results in the presence of the variety and veracity of Big Data, and (3) preserving the privacy and confidentiality of the entities represented in Big Data collections. In this chapter, we describe the challenges of PPRL in the context of Big Data, survey existing techniques for PPRL, and provide directions for future research.
    This work was partially funded by the Australian Research Council under Discovery Project DP130101801, by the German Academic Exchange Service (DAAD) and Universities Australia (UA) under the Joint Research Co-operation Scheme, and by the German Federal Ministry of Education and Research within the project Competence Center for Scalable Data Services and Solutions (ScaDS) Dresden/Leipzig (BMBF 01IS14014B).
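    To make the idea of linking records without revealing their identifiers concrete, here is a minimal Python sketch of one encoding that is widely used in the PPRL literature: a name is split into character bigrams, the bigrams are hashed with keyed hash functions into a Bloom filter, and two encodings are compared with the Dice coefficient. This is an illustration under stated assumptions, not the method of the chapter above; the function names, the shared secret key, the filter length of 1,000 bits and the 20 hash functions are all hypothetical choices. Plain Bloom filter encodings are also known to be vulnerable to frequency-based cryptanalysis, so the sketch should not be read as a complete privacy solution.

```python
import hashlib
import hmac


def qgrams(value: str, q: int = 2) -> set[str]:
    """Split a normalised string into overlapping q-grams, padded with '_' at both ends."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value: str, key: bytes, num_bits: int = 1000, num_hashes: int = 20) -> set[int]:
    """Return the set of bit positions a keyed Bloom filter sets for the value's q-grams.

    Uses double hashing, position_i = (h1 + i * h2) mod num_bits, where h1 and h2
    are HMAC-SHA256 values under the data owners' shared secret key (an assumption).
    """
    positions: set[int] = set()
    for gram in qgrams(value):
        h1 = int.from_bytes(hmac.new(key, b"h1:" + gram.encode(), hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, b"h2:" + gram.encode(), hashlib.sha256).digest(), "big")
        for i in range(num_hashes):
            positions.add((h1 + i * h2) % num_bits)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two Bloom filters represented as sets of set-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


key = b"secret shared by the data owners"  # hypothetical shared key
peter, pete, mary = (bloom_encode(name, key) for name in ("Peter", "Pete", "Mary"))
print(f"Peter/Pete: {dice(peter, pete):.2f}, Peter/Mary: {dice(peter, mary):.2f}")
```

    In this setting the data owners agree on the key and send only the encodings to a linkage unit, which can score "Peter" and "Pete" as similar without ever seeing either name in plaintext.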

    A Scalable Blocking Framework for Multidatabase Privacy-preserving Record Linkage

    Today many application domains, such as national statistics, healthcare, business analytics, fraud detection, and national security, require data to be integrated from multiple databases. Record linkage (RL) is a data integration process that links multiple databases to identify matching records that belong to the same entity. RL increases the usefulness of data by removing duplicates, errors, and inconsistencies, which improves the effectiveness of decision making in data analytics applications. Often, organisations are not willing or not authorised to share the sensitive information in their databases with any other party due to privacy and confidentiality regulations. Linking the databases of different organisations without compromising this sensitive information is an emerging research area known as privacy-preserving record linkage (PPRL). PPRL facilitates the linkage of databases while ensuring the privacy of the entities in these databases. In the multidatabase (MD) context, PPRL is significantly challenged by the number of potential record comparisons, which grows exponentially with the number of databases. Such linkage often requires significant time and computational resources to produce the resulting sets of matching records. Because the risk of collusion increases, preserving the privacy of the data becomes more problematic as the number of parties involved in the linkage process grows. Blocking is commonly used to scale the linkage of large databases. The aim of blocking is to remove those record pairs that correspond to non-matches (that is, pairs that refer to different entities). Many techniques have been proposed for blocking two databases in RL and PPRL. However, many of these techniques are not suitable for blocking multiple databases. Since real-world applications increasingly require linking more than two databases, there is a need for blocking techniques designed for the multidatabase linkage context. This thesis is the first to conduct extensive research on blocking for multidatabase privacy-preserving record linkage (MD-PPRL).
We consider several research problems in blocking for MD-PPRL. First, we present a broad review of the background literature on PPRL, which allows us to identify the main research gaps that need to be investigated in MD-PPRL. Second, we introduce a blocking framework for MD-PPRL that gives database owners more flexibility and control in the block generation process. Third, we propose different techniques used in our framework for (1) blocking multiple databases, (2) identifying blocks that need to be compared across subgroups of these databases, and (3) filtering redundant record pair comparisons through efficient scheduling of block comparisons to improve the scalability of MD-PPRL. Each of these techniques covers an important aspect of blocking in real-world MD-PPRL applications. Finally, this thesis reports on an extensive evaluation of the combined application of these methods on real datasets, which shows that they outperform existing approaches in terms of scalability, accuracy, and privacy.
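    The growth in comparisons and the effect of blocking can be illustrated with a minimal Python sketch of standard key-based blocking across several databases. It assumes a hypothetical blocking key (first letter of the surname plus year of birth), works on plaintext records, and counts only candidate tuples that contain one record from every database. It is not the thesis's framework, which additionally handles comparisons across subgroups of databases, privacy-preserving block generation, and scheduling of block comparisons.

```python
from collections import defaultdict
from itertools import product
from math import prod

Record = dict  # e.g. {"surname": "Smith", "birth_year": 1980}


def blocking_key(record: Record) -> str:
    """Hypothetical blocking key: first letter of the surname plus year of birth."""
    return record["surname"][0].lower() + str(record["birth_year"])


def build_blocks(databases: list[list[Record]]) -> dict[str, list[list[Record]]]:
    """Group each database's records by blocking key, keeping one list per database."""
    blocks: dict[str, list[list[Record]]] = defaultdict(lambda: [[] for _ in databases])
    for db_index, records in enumerate(databases):
        for record in records:
            blocks[blocking_key(record)][db_index].append(record)
    return dict(blocks)


def full_comparison_count(databases: list[list[Record]]) -> int:
    """Without blocking, every combination of one record per database is a candidate (n**p)."""
    return prod(len(records) for records in databases)


def candidate_tuples(blocks: dict[str, list[list[Record]]]):
    """Yield only combinations (one record per database) whose records share a block."""
    for parts in blocks.values():
        yield from product(*parts)


def blocked_comparison_count(blocks: dict[str, list[list[Record]]]) -> int:
    return sum(prod(len(part) for part in parts) for parts in blocks.values())


databases = [
    [{"surname": "Smith", "birth_year": 1980}, {"surname": "Jones", "birth_year": 1975},
     {"surname": "Brown", "birth_year": 1990}],
    [{"surname": "Smyth", "birth_year": 1980}, {"surname": "Jonas", "birth_year": 1975},
     {"surname": "Black", "birth_year": 1985}],
    [{"surname": "Smith", "birth_year": 1980}, {"surname": "Johns", "birth_year": 1975},
     {"surname": "Brown", "birth_year": 1990}],
]
blocks = build_blocks(databases)
print(full_comparison_count(databases), blocked_comparison_count(blocks))
```

    With p databases of n records each, the unblocked count is n to the power p, so three toy databases of three records already yield 27 candidate tuples, of which this key keeps only two. A block that is empty in any database contributes no full-match candidates, which is one reason the thesis also considers subgroups of databases.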

    Contextual Social Networking

    This thesis centers on the multifaceted research question of how contexts can be detected and derived so that they can be used for new context-aware Social Networking services and for improving the usefulness of existing Social Networking services, giving rise to the notion of Contextual Social Networking. In a first, foundational part, we characterize the closely related fields of Contextual, Mobile, and Decentralized Social Networking using different methods and focusing on different detailed aspects. A second part focuses on the question of how short-term and long-term social contexts, as especially interesting forms of context for Social Networking, can be derived. We focus on NLP-based methods for characterizing social relations as a typical form of long-term social context, and on Mobile Social Signal Processing methods for deriving short-term social contexts on the basis of interaction geometry and audio. We furthermore investigate how personal social agents may combine such social context elements at various levels of abstraction. The third part discusses new and improved concepts for context-aware Social Networking services. We investigate special forms of awareness services, new forms of social information retrieval, social recommender systems, context-aware privacy concepts and services, and platforms supporting Open Innovation and creative processes. This version of the thesis does not contain the included publications because of journal copyrights. Contact regarding the version with all included publications: Georg Groh, [email protected]
The central subject of this work is the multilayered question of how contexts can be detected and derived that can serve to create novel context-aware Social Networking services and to improve the usefulness of existing services. The (not yet completed) successful realization of this program leads to a concept that can be called Contextual Social Networking. In a foundational first part, the closely related fields of Contextual Social Networking, Mobile Social Networking, and Decentralized Social Networking are examined in more detail and related to one another, using various methods and focusing on various detailed aspects. A second part addresses the question of how short-term and long-term social contexts, as forms of context that are especially interesting for Social Networking, can be measured and derived. One focus here is on NLP methods for characterizing social relations as a typical form of long-term social context. Another focus is on methods from Mobile Social Signal Processing for deriving meaningful short-term social contexts on the basis of interaction geometries and audio data. It is further investigated how personal social agents can combine context elements of different levels of abstraction. The third part deals with novel and improved concepts for context-aware Social Networking services. Special forms of awareness services, new forms of social information retrieval, concepts for context-aware privacy management, and services and platforms supporting Open Innovation and creativity are investigated and presented. To avoid copyright violations with respect to the journals and others, this version of the habilitation thesis does not contain the included publications. Contact regarding the version with all included publications: Georg Groh, [email protected]

    Semantic discovery and reuse of business process patterns

    Patterns currently play an important role in modern information systems (IS) development, and their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable solution for promoting the reuse of recurrent generalized models in the very early stages of development. As a statement of research in progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.

    Early Modern Privacy

    An examination of instances, experiences, and spaces of early modern privacy. It opens new avenues for understanding the structures and dynamics that shape early modern societies by examining a wide array of sources, discourses, practices, and spatial programmes. Readership: Because of its comprehensive disciplinary scope, this volume is of interest to scholars and students of early modern culture in all its facets. Keywords: early modern, intimacy, legal history, religious history, history of art, history of architecture, secrecy, theology, ego-documents, history of science, literary studies, China, Europe, private life, privacy, Jewish history, theory

    Analyzing and Applying Cryptographic Mechanisms to Protect Privacy in Applications

    Privacy-Enhancing Technologies (PETs) emerged as a technology-based response to the increased collection and storage of data and the associated threats to individuals' privacy in modern applications. They rely on a variety of cryptographic mechanisms that allow computations to be performed without directly obtaining knowledge of plaintext information. However, many challenges have so far prevented their effective real-world use in many existing applications. For one, some mechanisms leak some information or have been proposed outside the security models established within the cryptographic community, leaving open how effectively they protect privacy in various applications. Additionally, a major challenge causing PETs to remain largely academic is their practicality, in terms of both efficiency and usability. Cryptographic mechanisms introduce substantial overhead, which is largely prohibitive, and, due to a lack of high-level tools, they are very hard for outsiders to integrate. In this thesis, we move towards making PETs more effective and practical in protecting privacy in numerous applications. We take a two-sided approach: we first analyze the effective security of candidate mechanisms (cryptanalysis) and then build constructions and tools (cryptographic engineering) for practical use in specific emerging machine learning applications that are crucial to modern use cases. In the process, we incorporate an interdisciplinary perspective by analyzing mechanisms and by collaboratively building privacy-preserving architectures with requirements from application-domain experts.
Cryptanalysis. While mechanisms like Homomorphic Encryption (HE) or Secure Multi-Party Computation (SMPC) provably leak no additional information, Encrypted Search Algorithms (ESAs) and Randomization-only Two-Party Computation (RoTPC) have additional properties that require cryptanalysis to determine how effectively they protect privacy.
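    To illustrate what "provably leaks no additional information" means for SMPC, here is a minimal Python sketch of a two-party scalar product over additive secret sharing in the ring of integers modulo 2^64, where each multiplication consumes one Beaver multiplication triple. It assumes a trusted dealer that supplies the triples and a semi-honest setting, and it simulates both parties in a single process; all names are hypothetical, and this is neither the protocol by Lu et al. nor a construction from the thesis. The values opened during multiplication are masked by uniformly random triple elements, so, unlike a randomization-only protocol, the transcript reveals nothing about the inputs beyond the final result.

```python
import secrets

MOD = 2**64  # all arithmetic happens in the ring of integers modulo 2**64


def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares; each share on its own is uniformly random."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reconstruct(s0: int, s1: int) -> int:
    return (s0 + s1) % MOD


def beaver_triple():
    """Trusted dealer (an assumption of this sketch): shares of random a, b and c = a*b."""
    a, b = secrets.randbelow(MOD), secrets.randbelow(MOD)
    return share(a), share(b), share(a * b % MOD)


def secure_multiply(x_sh: tuple[int, int], y_sh: tuple[int, int]) -> tuple[int, int]:
    """Multiply secret-shared x and y using one Beaver triple."""
    (a0, a1), (b0, b1), (c0, c1) = beaver_triple()
    # Both parties open d = x - a and e = y - b; a and b are random masks.
    d = reconstruct(x_sh[0] - a0, x_sh[1] - a1)
    e = reconstruct(y_sh[0] - b0, y_sh[1] - b1)
    # Locally: c + d*b + e*a + d*e = x*y; only party 0 adds the public term d*e.
    z0 = (c0 + d * b0 + e * a0 + d * e) % MOD
    z1 = (c1 + d * b1 + e * a1) % MOD
    return z0, z1


def secure_scalar_product(xs: list[int], ys: list[int]) -> int:
    """Share both vectors, multiply element-wise under SMPC, and sum the shares locally."""
    z0 = z1 = 0
    for x, y in zip(xs, ys):
        p0, p1 = secure_multiply(share(x), share(y))
        z0, z1 = (z0 + p0) % MOD, (z1 + p1) % MOD
    return reconstruct(z0, z1)  # only the final result is opened


print(secure_scalar_product([1, 2, 3], [4, 5, 6]))
```

    Results are exact modulo 2^64, so this sketch is only meaningful for non-negative integer inputs whose true scalar product stays below that modulus.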
ESAs allow search on encrypted data, an important functionality in many applications. Most efficient ESAs have some form of well-defined information leakage, which is cryptanalyzed via a breadth of so-called leakage attacks proposed in the literature. However, it is difficult to assess the practical effectiveness of these attacks, given that previous evaluations were closed-source, used restricted data, and made assumptions about (among other things) the query distribution, because real-world query data is very hard to find. For these reasons, we re-implement known leakage attacks in an open-source framework and perform a systematic empirical re-evaluation of them using a variety of new data sources that, for the first time, contain real-world query data. We obtain many more complete and novel results, in which attacks work much better or much worse than expected based on previous evaluations. RoTPC mechanisms require cryptanalysis because they do not rely on established techniques and security models, instead obfuscating messages using only randomization. A prominent example is the privacy-preserving scalar product protocol by Lu et al. (IEEE TPDS'13). We show that this protocol is formally insecure and that this translates to practical insecurity by presenting attacks that even allow testing for certain inputs, making the case for more scrutiny of RoTPC protocols used as PETs. This part of the thesis is based on the following two publications: [KKM+22] S. KAMARA, A. KATI, T. MOATAZ, T. SCHNEIDER, A. TREIBER, M. YONLI. “SoK: Cryptanalysis of Encrypted Search with LEAKER - A framework for LEakage AttacK Evaluation on Real-world data”. In: 7th IEEE European Symposium on Security and Privacy (EuroS&P’22). Full version: https://ia.cr/2021/1035. Code: https://encrypto.de/code/LEAKER. IEEE, 2022, pp. 90–108. Appendix A. [ST20] T. SCHNEIDER, A. TREIBER. “A Comment on Privacy-Preserving Scalar Product Protocols as proposed in ‘SPOC’”.
In: IEEE Transactions on Parallel and Distributed Systems (TPDS) 31.3 (2020). Full version: https://arxiv.org/abs/1906.04862. Code: https://encrypto.de/code/SPOCattack, pp. 543–546. CORE Rank A*. Appendix B.
Cryptographic Engineering. Given the above cryptanalysis results, we investigate using the leakage-free and provably secure cryptographic mechanisms HE and SMPC to protect privacy in machine learning applications. As much of the cryptographic community has focused on PETs for neural network applications, we focus on two other important applications and models: speaker recognition and sum-product networks. In particular, we show the efficiency of our solutions in possible real-world scenarios and provide tools usable by non-domain experts. In speaker recognition, a user's voice data is matched against reference data stored at the service provider. Using HE and SMPC, we build the first privacy-preserving speaker recognition system that includes the state-of-the-art technique of cohort score normalization, using cohort pruning via SMPC. We then build a privacy-preserving speaker recognition system relying solely on SMPC, which we show outperforms previous HE-based solutions by a factor of up to 4000. We show that both our solutions comply with specific standards for biometric information protection and are thus effective and practical PETs for speaker recognition. Sum-Product Networks (SPNs) are noteworthy probabilistic graphical models that, like neural networks, need efficient methods for privacy-preserving inference as a PET. We present CryptoSPN, which uses SMPC for privacy-preserving inference of SPNs and, owing to a combination of machine learning and cryptographic techniques and in contrast to most works on neural networks, even hides the network structure. Our implementation is integrated into the prominent SPN framework SPFlow and evaluates medium-sized SPNs within seconds.
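    To show what SPN inference actually computes, here is a minimal plaintext Python sketch of a tiny SPN over two binary variables: leaves are Bernoulli distributions, product nodes multiply their children, and sum nodes take weighted sums. Setting a variable to None marginalizes it out, because its leaves then return 1, which is what makes marginal queries a single bottom-up pass. The structure, weights and class names are hypothetical, and this is not the CryptoSPN or SPFlow API; per the abstract, CryptoSPN performs this kind of evaluation under SMPC and even hides the network structure.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Sequence

Assignment = Sequence[Optional[int]]  # one entry per variable: 0, 1, or None (marginalized)


@dataclass
class Leaf:
    """Bernoulli leaf over a single binary variable."""
    var: int
    p_true: float

    def value(self, x: Assignment) -> float:
        if x[self.var] is None:  # marginalize: p + (1 - p) = 1
            return 1.0
        return self.p_true if x[self.var] == 1 else 1.0 - self.p_true


@dataclass
class Product:
    """Product node; children are assumed to have disjoint variable scopes."""
    children: list

    def value(self, x: Assignment) -> float:
        return prod(child.value(x) for child in self.children)


@dataclass
class Sum:
    """Sum node; weights are assumed non-negative and summing to 1."""
    weights: list
    children: list

    def value(self, x: Assignment) -> float:
        return sum(w * child.value(x) for w, child in zip(self.weights, self.children))


spn = Sum(
    weights=[0.6, 0.4],
    children=[
        Product([Leaf(0, 0.9), Leaf(1, 0.2)]),
        Product([Leaf(0, 0.1), Leaf(1, 0.7)]),
    ],
)

joint = spn.value((1, 1))        # P(X0 = 1, X1 = 1)
marginal = spn.value((1, None))  # P(X0 = 1), with X1 summed out
print(joint, marginal)
```

    Here P(X0 = 1, X1 = 1) = 0.6 · 0.9 · 0.2 + 0.4 · 0.1 · 0.7 = 0.136 and P(X0 = 1) = 0.6 · 0.9 + 0.4 · 0.1 = 0.58, both obtained from the same evaluation routine.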
This part of the thesis is based on the following three publications: [NPT+19] A. NAUTSCH, J. PATINO, A. TREIBER, T. STAFYLAKIS, P. MIZERA, M. TODISCO, T. SCHNEIDER, N. EVANS. “Privacy-Preserving Speaker Recognition with Cohort Score Normalisation”. In: 20th Conference of the International Speech Communication Association (INTERSPEECH’19). Online: https://arxiv.org/abs/1907.03454. International Speech Communication Association (ISCA), 2019, pp. 2868–2872. CORE Rank A. Appendix C. [TNK+19] A. TREIBER, A. NAUTSCH, J. KOLBERG, T. SCHNEIDER, C. BUSCH. “Privacy-Preserving PLDA Speaker Verification using Outsourced Secure Computation”. In: Speech Communication 114 (2019). Online: https://encrypto.de/papers/TNKSB19.pdf. Code: https://encrypto.de/code/PrivateASV, pp. 60–71. CORE Rank B. Appendix D. [TMW+20] A. TREIBER, A. MOLINA, C. WEINERT, T. SCHNEIDER, K. KERSTING. “CryptoSPN: Privacy-preserving Sum-Product Network Inference”. In: 24th European Conference on Artificial Intelligence (ECAI’20). Full version: https://arxiv.org/abs/2002.00801. Code: https://encrypto.de/code/CryptoSPN. IOS Press, 2020, pp. 1946–1953. CORE Rank A. Appendix E. Overall, this thesis contributes to a broader security analysis of cryptographic mechanisms and provides new systems and tools that effectively protect privacy in various sought-after applications.
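    The kind of leakage that the ESA leakage attacks in the abstract above exploit can be illustrated with a deliberately simplified Python sketch in the spirit of count-style attacks: the server observes only how many documents each encrypted query returns (volume leakage), and an adversary who knows the indexed documents matches each observed volume to the keyword with that unique document count. The sketch assumes full knowledge of the document collection, which is exactly the kind of assumption a realistic re-evaluation has to scrutinize; it is not an attack implementation from LEAKER, and the function name, tokens and keywords are hypothetical.

```python
from collections import defaultdict


def volume_matching_attack(aux_index: dict[str, set[int]], observed: dict[str, int]) -> dict[str, str]:
    """Map query tokens to keywords whose document count uniquely matches the observed volume.

    aux_index: adversary's knowledge, keyword -> IDs of documents containing it.
    observed:  leakage seen by the server, query token -> number of documents returned.
    """
    keywords_by_count: dict[int, list[str]] = defaultdict(list)
    for keyword, doc_ids in aux_index.items():
        keywords_by_count[len(doc_ids)].append(keyword)

    recovered: dict[str, str] = {}
    for token, volume in observed.items():
        candidates = keywords_by_count.get(volume, [])
        if len(candidates) == 1:  # ambiguous volumes are left unresolved
            recovered[token] = candidates[0]
    return recovered


aux_index = {"flu": {1, 2, 3}, "covid": {2, 4}, "fever": {1, 2, 3, 4, 5}, "cough": {3, 5}}
observed = {"t1": 3, "t2": 2, "t3": 5}
print(volume_matching_attack(aux_index, observed))
```

    In this toy example, tokens t1 and t3 are recovered as "flu" and "fever", while t2 stays unresolved because two keywords share a document count of 2; real attacks resolve such ties with additional leakage such as keyword co-occurrence.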
    • …