104 research outputs found
The Crypto-democracy and the Trustworthy
In the current architecture of the Internet, there is a strong asymmetry in
terms of power between the entities that gather and process personal data
(e.g., major Internet companies, telecom operators, cloud providers, ...) and
the individuals from which this personal data is issued. In particular,
individuals have no choice but to blindly trust that these entities will
respect their privacy and protect their personal data. In this position paper,
we address this issue by proposing an utopian crypto-democracy model based on
existing scientific achievements from the field of cryptography. More
precisely, our main objective is to show that cryptographic primitives,
including in particular secure multiparty computation, offer a practical
solution to protect privacy while minimizing the trust assumptions. In the
crypto-democracy envisioned, individuals do not have to trust a single physical
entity with their personal data but rather their data is distributed among
several institutions. Together these institutions form a virtual entity called
the Trustworthy that is responsible for the storage of this data but which can
also compute on it (provided first that all the institutions agree on this).
Finally, we also propose a realistic proof-of-concept of the Trustworthy, in
which the roles of institutions are played by universities. This
proof-of-concept would have an important impact in demonstrating the
possibilities offered by the crypto-democracy paradigm.Comment: DPM 201
MedCo: Enabling Secure and Privacy-Preserving Exploration of Distributed Clinical and Genomic Data
The increasing number of health-data breaches is creating a complicated environment for medical-data sharing and, consequently, for medical progress. Therefore, the development of new solutions that can reassure clinical sites by enabling privacy-preserving sharing of sensitive medical data in compliance with stringent regulations (e.g., HIPAA, GDPR) is now more urgent than ever. In this work, we introduce MedCo, the first operational system that enables a group of clinical sites to federate and collectively protect their data in order to share them with external investigators without worrying about security and privacy concerns. MedCo uses (a) collective homomorphic encryption to provide trust decentralization and end-to-end confidentiality protection, and (b) obfuscation techniques to achieve formal notions of privacy, such as differential privacy. A critical feature of MedCo is that it is fully integrated within the i2b2 (Informatics for Integrating Biology and the Bedside) framework, currently used in more than 300 hospitals worldwide. Therefore, it is easily adoptable by clinical sites. We demonstrate MedCo’s practicality by testing it on data from The Cancer Genome Atlas in a simulated network of three institutions. Its performance is comparable to the ones of SHRINE (networked i2b2), which, in contrast, does not provide any data protection guarantee
Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations
In the growing world of artificial intelligence, federated learning is a
distributed learning framework enhanced to preserve the privacy of individuals'
data. Federated learning lays the groundwork for collaborative research in
areas where the data is sensitive. Federated learning has several implications
for real-world problems. In times of crisis, when real-time decision-making is
critical, federated learning allows multiple entities to work collectively
without sharing sensitive data. This distributed approach enables us to
leverage information from multiple sources and gain more diverse insights. This
paper is a systematic review of the literature on privacy-preserving machine
learning in the last few years based on the Preferred Reporting Items for
Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we have
presented an extensive review of supervised/unsupervised machine learning
algorithms, ensemble methods, meta-heuristic approaches, blockchain technology,
and reinforcement learning used in the framework of federated learning, in
addition to an overview of federated learning applications. This paper reviews
the literature on the components of federated learning and its applications in
the last few years. The main purpose of this work is to provide researchers and
practitioners with a comprehensive overview of federated learning from the
machine learning point of view. A discussion of some open problems and future
research directions in federated learning is also provided
GenoPPML – a framework for genomic privacy-preserving machine learning
We present a framework GenoPPML for privacy-preserving machine learning in the context of sensitive genomic data processing.
The technology combines secure multiparty computation techniques based on the recently proposed Manticore secure multiparty computation framework for model training and fully homomorphic encryption based on TFHE for model inference.
The framework was successfully used to solve breast cancer prediction problems on gene expression datasets coming from distinct private sources while preserving their privacy - the solution winning 1st place for both Tracks I and III of the genomic privacy competition iDASH\u272020.
Extensive benchmarks and comparisons to existing works are performed.
Our 2-party logistic regression computation is faster than the one in De Cock et al. on the same dataset and it uses only a single CPU core
Harnessing the Power of Distributed Computing: Advancements in Scientific Applications, Homomorphic Encryption, and Federated Learning Security
Data explosion poses lot of challenges to the state-of-the art systems, applications, and methodologies. It has been reported that 181 zettabytes of data are expected to be generated in 2025 which is over 150\% increase compared to the data that is expected to be generated in 2023. However, while system manufacturers are consistently developing devices with larger storage spaces and providing alternative storage capacities in the cloud at affordable rates, another key challenge experienced is how to effectively process the fraction of large scale of stored data in time-critical conventional systems. One transformative paradigm revolutionizing the processing and management of these large data is distributed computing whose application requires deep understanding. This dissertation focuses on exploring the potential impact of applying efficient distributed computing concepts to long existing challenges or issues in (i) a widely data-intensive scientific application (ii) applying homomorphic encryption to data intensive workloads found in outsourced databases and (iii) security of tokenized incentive mechanism for Federated learning (FL) systems.The first part of the dissertation tackles the Microelectrode arrays (MEAs) parameterization problem from an orthogonal viewpoint enlightened by algebraic topology, which allows us to algebraically parametrize MEAs whose structure and intrinsic parallelism are hard to identify otherwise. We implement a new paradigm, namely Parma, to demonstrate the effectiveness of the proposed approach and report how it outperforms the state-of-the-practice in time, scalability, and memory usage.The second part discusses our work on introducing the concept of parallel caching of secure aggregation to mitigate the performance overhead incurred by the HE module in outsourced databases. The key idea of this optimization approach is caching selected radix-ciphertexts in parallel without violating existing security guarantees of the primitive/base HE scheme. A new radix HE algorithm was designed and applied to both batch and incremental HE schemes, and experiments carried out on six workloads show that the proposed caching boost state-of-the-art HE schemes by high orders of magnitudes.In the third part, I will discuss our work on leveraging the security benefit of blockchains to enhance or protect the fairness and reliability of tokenized incentive mechanism for FL systems. We designed a blockchain-based auditing protocol to mitigate Gaussian attacks and carried out experiments with multiple FL aggregation algorithms, popular data sets and a variety of scales to validate its effectiveness
Verifiable Encodings for Secure Homomorphic Analytics
Homomorphic encryption, which enables the execution of arithmetic operations
directly on ciphertexts, is a promising solution for protecting privacy of
cloud-delegated computations on sensitive data. However, the correctness of the
computation result is not ensured. We propose two error detection encodings and
build authenticators that enable practical client-verification of cloud-based
homomorphic computations under different trade-offs and without compromising on
the features of the encryption algorithm. Our authenticators operate on top of
trending ring learning with errors based fully homomorphic encryption schemes
over the integers. We implement our solution in VERITAS, a ready-to-use system
for verification of outsourced computations executed over encrypted data. We
show that contrary to prior work VERITAS supports verification of any
homomorphic operation and we demonstrate its practicality for various
applications, such as ride-hailing, genomic-data analysis, encrypted search,
and machine-learning training and inference.Comment: update authors, typos corrected, scheme update
Revealing the Landscape of Privacy-Enhancing Technologies in the Context of Data Markets for the IoT: A Systematic Literature Review
IoT data markets in public and private institutions have become increasingly
relevant in recent years because of their potential to improve data
availability and unlock new business models. However, exchanging data in
markets bears considerable challenges related to disclosing sensitive
information. Despite considerable research focused on different aspects of
privacy-enhancing data markets for the IoT, none of the solutions proposed so
far seems to find a practical adoption. Thus, this study aims to organize the
state-of-the-art solutions, analyze and scope the technologies that have been
suggested in this context, and structure the remaining challenges to determine
areas where future research is required. To accomplish this goal, we conducted
a systematic literature review on privacy enhancement in data markets for the
IoT, covering 50 publications dated up to July 2020, and provided updates with
24 publications dated up to May 2022. Our results indicate that most research
in this area has emerged only recently, and no IoT data market architecture has
established itself as canonical. Existing solutions frequently lack the
required combination of anonymization and secure computation technologies.
Furthermore, there is no consensus on the appropriate use of blockchain
technology for IoT data markets and a low degree of leveraging existing
libraries or reusing generic data market architectures. We also identified
significant challenges remaining, such as the copy problem and the recursive
enforcement problem that-while solutions have been suggested to some extent-are
often not sufficiently addressed in proposed designs. We conclude that
privacy-enhancing technologies need further improvements to positively impact
data markets so that, ultimately, the value of data is preserved through data
scarcity and users' privacy and businesses-critical information are protected.Comment: 49 pages, 17 figures, 11 table
Enabling privacy-preserving sharing of genomic data for GWASs in decentralized networks
The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In this work, we propose a preventive approach for privacy-preserving sharing of genomic data in decentralized networks for Genome-wide association studies (GWASs), which have been widely used in discovering the association between genotypes and phenotypes. The key components of this work are: a decentralized secure network, with a privacy-preserving sharing protocol, and a gene fragmentation framework that is trainable in an end-to-end manner. Our experiments on real datasets show the effectiveness of our privacy-preserving approaches as well as significant improvements in efficiency when compared with recent, related algorithms
- …