104 research outputs found

    The Crypto-democracy and the Trustworthy

    Full text link
    In the current architecture of the Internet, there is a strong asymmetry in terms of power between the entities that gather and process personal data (e.g., major Internet companies, telecom operators, cloud providers, ...) and the individuals from which this personal data is issued. In particular, individuals have no choice but to blindly trust that these entities will respect their privacy and protect their personal data. In this position paper, we address this issue by proposing an utopian crypto-democracy model based on existing scientific achievements from the field of cryptography. More precisely, our main objective is to show that cryptographic primitives, including in particular secure multiparty computation, offer a practical solution to protect privacy while minimizing the trust assumptions. In the crypto-democracy envisioned, individuals do not have to trust a single physical entity with their personal data but rather their data is distributed among several institutions. Together these institutions form a virtual entity called the Trustworthy that is responsible for the storage of this data but which can also compute on it (provided first that all the institutions agree on this). Finally, we also propose a realistic proof-of-concept of the Trustworthy, in which the roles of institutions are played by universities. This proof-of-concept would have an important impact in demonstrating the possibilities offered by the crypto-democracy paradigm.Comment: DPM 201

    MedCo: Enabling Secure and Privacy-Preserving Exploration of Distributed Clinical and Genomic Data

    Get PDF
    The increasing number of health-data breaches is creating a complicated environment for medical-data sharing and, consequently, for medical progress. Therefore, the development of new solutions that can reassure clinical sites by enabling privacy-preserving sharing of sensitive medical data in compliance with stringent regulations (e.g., HIPAA, GDPR) is now more urgent than ever. In this work, we introduce MedCo, the first operational system that enables a group of clinical sites to federate and collectively protect their data in order to share them with external investigators without worrying about security and privacy concerns. MedCo uses (a) collective homomorphic encryption to provide trust decentralization and end-to-end confidentiality protection, and (b) obfuscation techniques to achieve formal notions of privacy, such as differential privacy. A critical feature of MedCo is that it is fully integrated within the i2b2 (Informatics for Integrating Biology and the Bedside) framework, currently used in more than 300 hospitals worldwide. Therefore, it is easily adoptable by clinical sites. We demonstrate MedCo’s practicality by testing it on data from The Cancer Genome Atlas in a simulated network of three institutions. Its performance is comparable to the ones of SHRINE (networked i2b2), which, in contrast, does not provide any data protection guarantee

    Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations

    Full text link
    In the growing world of artificial intelligence, federated learning is a distributed learning framework enhanced to preserve the privacy of individuals' data. Federated learning lays the groundwork for collaborative research in areas where the data is sensitive. Federated learning has several implications for real-world problems. In times of crisis, when real-time decision-making is critical, federated learning allows multiple entities to work collectively without sharing sensitive data. This distributed approach enables us to leverage information from multiple sources and gain more diverse insights. This paper is a systematic review of the literature on privacy-preserving machine learning in the last few years based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we have presented an extensive review of supervised/unsupervised machine learning algorithms, ensemble methods, meta-heuristic approaches, blockchain technology, and reinforcement learning used in the framework of federated learning, in addition to an overview of federated learning applications. This paper reviews the literature on the components of federated learning and its applications in the last few years. The main purpose of this work is to provide researchers and practitioners with a comprehensive overview of federated learning from the machine learning point of view. A discussion of some open problems and future research directions in federated learning is also provided

    GenoPPML – a framework for genomic privacy-preserving machine learning

    Get PDF
    We present a framework GenoPPML for privacy-preserving machine learning in the context of sensitive genomic data processing. The technology combines secure multiparty computation techniques based on the recently proposed Manticore secure multiparty computation framework for model training and fully homomorphic encryption based on TFHE for model inference. The framework was successfully used to solve breast cancer prediction problems on gene expression datasets coming from distinct private sources while preserving their privacy - the solution winning 1st place for both Tracks I and III of the genomic privacy competition iDASH\u272020. Extensive benchmarks and comparisons to existing works are performed. Our 2-party logistic regression computation is 11×11\times faster than the one in De Cock et al. on the same dataset and it uses only a single CPU core

    Harnessing the Power of Distributed Computing: Advancements in Scientific Applications, Homomorphic Encryption, and Federated Learning Security

    Get PDF
    Data explosion poses lot of challenges to the state-of-the art systems, applications, and methodologies. It has been reported that 181 zettabytes of data are expected to be generated in 2025 which is over 150\% increase compared to the data that is expected to be generated in 2023. However, while system manufacturers are consistently developing devices with larger storage spaces and providing alternative storage capacities in the cloud at affordable rates, another key challenge experienced is how to effectively process the fraction of large scale of stored data in time-critical conventional systems. One transformative paradigm revolutionizing the processing and management of these large data is distributed computing whose application requires deep understanding. This dissertation focuses on exploring the potential impact of applying efficient distributed computing concepts to long existing challenges or issues in (i) a widely data-intensive scientific application (ii) applying homomorphic encryption to data intensive workloads found in outsourced databases and (iii) security of tokenized incentive mechanism for Federated learning (FL) systems.The first part of the dissertation tackles the Microelectrode arrays (MEAs) parameterization problem from an orthogonal viewpoint enlightened by algebraic topology, which allows us to algebraically parametrize MEAs whose structure and intrinsic parallelism are hard to identify otherwise. We implement a new paradigm, namely Parma, to demonstrate the effectiveness of the proposed approach and report how it outperforms the state-of-the-practice in time, scalability, and memory usage.The second part discusses our work on introducing the concept of parallel caching of secure aggregation to mitigate the performance overhead incurred by the HE module in outsourced databases. The key idea of this optimization approach is caching selected radix-ciphertexts in parallel without violating existing security guarantees of the primitive/base HE scheme. A new radix HE algorithm was designed and applied to both batch and incremental HE schemes, and experiments carried out on six workloads show that the proposed caching boost state-of-the-art HE schemes by high orders of magnitudes.In the third part, I will discuss our work on leveraging the security benefit of blockchains to enhance or protect the fairness and reliability of tokenized incentive mechanism for FL systems. We designed a blockchain-based auditing protocol to mitigate Gaussian attacks and carried out experiments with multiple FL aggregation algorithms, popular data sets and a variety of scales to validate its effectiveness

    Verifiable Encodings for Secure Homomorphic Analytics

    Full text link
    Homomorphic encryption, which enables the execution of arithmetic operations directly on ciphertexts, is a promising solution for protecting privacy of cloud-delegated computations on sensitive data. However, the correctness of the computation result is not ensured. We propose two error detection encodings and build authenticators that enable practical client-verification of cloud-based homomorphic computations under different trade-offs and without compromising on the features of the encryption algorithm. Our authenticators operate on top of trending ring learning with errors based fully homomorphic encryption schemes over the integers. We implement our solution in VERITAS, a ready-to-use system for verification of outsourced computations executed over encrypted data. We show that contrary to prior work VERITAS supports verification of any homomorphic operation and we demonstrate its practicality for various applications, such as ride-hailing, genomic-data analysis, encrypted search, and machine-learning training and inference.Comment: update authors, typos corrected, scheme update

    Revealing the Landscape of Privacy-Enhancing Technologies in the Context of Data Markets for the IoT: A Systematic Literature Review

    Get PDF
    IoT data markets in public and private institutions have become increasingly relevant in recent years because of their potential to improve data availability and unlock new business models. However, exchanging data in markets bears considerable challenges related to disclosing sensitive information. Despite considerable research focused on different aspects of privacy-enhancing data markets for the IoT, none of the solutions proposed so far seems to find a practical adoption. Thus, this study aims to organize the state-of-the-art solutions, analyze and scope the technologies that have been suggested in this context, and structure the remaining challenges to determine areas where future research is required. To accomplish this goal, we conducted a systematic literature review on privacy enhancement in data markets for the IoT, covering 50 publications dated up to July 2020, and provided updates with 24 publications dated up to May 2022. Our results indicate that most research in this area has emerged only recently, and no IoT data market architecture has established itself as canonical. Existing solutions frequently lack the required combination of anonymization and secure computation technologies. Furthermore, there is no consensus on the appropriate use of blockchain technology for IoT data markets and a low degree of leveraging existing libraries or reusing generic data market architectures. We also identified significant challenges remaining, such as the copy problem and the recursive enforcement problem that-while solutions have been suggested to some extent-are often not sufficiently addressed in proposed designs. We conclude that privacy-enhancing technologies need further improvements to positively impact data markets so that, ultimately, the value of data is preserved through data scarcity and users' privacy and businesses-critical information are protected.Comment: 49 pages, 17 figures, 11 table

    Enabling privacy-preserving sharing of genomic data for GWASs in decentralized networks

    Get PDF
    The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In this work, we propose a preventive approach for privacy-preserving sharing of genomic data in decentralized networks for Genome-wide association studies (GWASs), which have been widely used in discovering the association between genotypes and phenotypes. The key components of this work are: a decentralized secure network, with a privacy-preserving sharing protocol, and a gene fragmentation framework that is trainable in an end-to-end manner. Our experiments on real datasets show the effectiveness of our privacy-preserving approaches as well as significant improvements in efficiency when compared with recent, related algorithms
    corecore