1,330 research outputs found
Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation
Background
Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step.
Methods
We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network.
Results
The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N − 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem.
Conclusions
The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians
Conclave: secure multi-party computation on big data (extended TR)
Secure Multi-Party Computation (MPC) allows mutually distrusting parties to
run joint computations without revealing private data. Current MPC algorithms
scale poorly with data size, which makes MPC on "big data" prohibitively slow
and inhibits its practical use.
Many relational analytics queries can maintain MPC's end-to-end security
guarantee without using cryptographic MPC techniques for all operations.
Conclave is a query compiler that accelerates such queries by transforming them
into a combination of data-parallel, local cleartext processing and small MPC
steps. When parties trust others with specific subsets of the data, Conclave
applies new hybrid MPC-cleartext protocols to run additional steps outside of
MPC and improve scalability further.
Our Conclave prototype generates code for cleartext processing in Python and
Spark, and for secure MPC using the Sharemind and Obliv-C frameworks. Conclave
scales to data sets between three and six orders of magnitude larger than
state-of-the-art MPC frameworks support on their own. Thanks to its hybrid
protocols, Conclave also substantially outperforms SMCQL, the most similar
existing system.Comment: Extended technical report for EuroSys 2019 pape
Hang With Your Buddies to Resist Intersection Attacks
Some anonymity schemes might in principle protect users from pervasive
network surveillance - but only if all messages are independent and unlinkable.
Users in practice often need pseudonymity - sending messages intentionally
linkable to each other but not to the sender - but pseudonymity in dynamic
networks exposes users to intersection attacks. We present Buddies, the first
systematic design for intersection attack resistance in practical anonymity
systems. Buddies groups users dynamically into buddy sets, controlling message
transmission to make buddies within a set behaviorally indistinguishable under
traffic analysis. To manage the inevitable tradeoffs between anonymity
guarantees and communication responsiveness, Buddies enables users to select
independent attack mitigation policies for each pseudonym. Using trace-based
simulations and a working prototype, we find that Buddies can guarantee
non-trivial anonymity set sizes in realistic chat/microblogging scenarios, for
both short-lived and long-lived pseudonyms.Comment: 15 pages, 8 figure
SoK: Differential Privacies
Shortly after it was first introduced in 2006, differential privacy became
the flagship data privacy definition. Since then, numerous variants and
extensions were proposed to adapt it to different scenarios and attacker
models. In this work, we propose a systematic taxonomy of these variants and
extensions. We list all data privacy definitions based on differential privacy,
and partition them into seven categories, depending on which aspect of the
original definition is modified.
These categories act like dimensions: variants from the same category cannot
be combined, but variants from different categories can be combined to form new
definitions. We also establish a partial ordering of relative strength between
these notions by summarizing existing results. Furthermore, we list which of
these definitions satisfy some desirable properties, like composition,
post-processing, and convexity by either providing a novel proof or collecting
existing ones.Comment: This is the full version of the SoK paper with the same title,
accepted at PETS (Privacy Enhancing Technologies Symposium) 202
PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database
Privacy and security issues limit the query executions over genomics datasets, notably single nucleotide polymorphisms (SNPs), raised by the sensitivity of this type of data. Therefore, it is important to ensure that executing queries on these datasets do not reveal sensitive information, such as the identity of the individuals and their genetic traits, to a data server. In this paper, we propose and present a novel model, we call PrivGenDB, to ensure the confidentiality of SNP-phenotype data while executing queries. The confidentiality in PrivGenDB is enabled by its system architecture and the search functionality provided by searchable symmetric encryption (SSE). To the best of our knowledge, PrivGenDB construction is the first SSE-based approach ensuring the confidentiality of SNP-phenotype data as the current SSE-based approaches for genomic data are limited only to substring search and range queries on a sequence of genomic data. Besides, a new data encoding mechanism is proposed and incorporated in the PrivGenDB model. This enables PrivGenDB to handle the dataset containing both genotype and phenotype and also support storing and managing other metadata, like gender and ethnicity, privately. Furthermore, different queries, namely Count, Boolean, Negation and k′-out-of-k match queries used for genomic data analysis, are supported and executed by PrivGenDB. The execution of these queries on genomic data in PrivGenDB is efficient and scalable for biomedical research and services. These are demonstrated by our analytical and empirical analysis presented in this paper. Specifically, our empirical studies on a dataset with 5000 entries (records) containing 1000 SNPs demonstrate that a count/Boolean query and a k′-out-of-k match query over 40 SNPs take approximately 4.3s and 86.4μs, respectively, outperforming the existing schemes
- …