Quantifying and mitigating privacy risks in biomedical data
The decreasing costs of molecular profiling have fueled the biomedical research community with a plethora of new types of biomedical data, allowing for a breakthrough towards more precise and personalized medicine. However, the release of these intrinsically highly sensitive, interdependent data poses a severe new privacy threat. So far, the security community has mostly focused on privacy risks arising from genomic data, while the manifold privacy risks stemming from other types of biomedical data – and epigenetic data in particular – have been largely overlooked. In this thesis, we provide means to quantify and protect the privacy of individuals' biomedical data. Besides the genome, we specifically focus on two of the most important epigenetic elements influencing human health: microRNAs and DNA methylation. We quantify the privacy risks for multiple realistic attack scenarios, namely (1) linkability attacks along the temporal dimension, between different types of data, and between related individuals, (2) membership attacks, and (3) inference attacks. Our results underline that the privacy risks inherent to biomedical data have to be taken seriously. Moreover, we present and evaluate solutions to preserve the privacy of individuals. Our mitigation techniques range from the differentially private release of epigenetic data, taking its utility into account, to cryptographic constructions for securely and privately evaluating a random forest on a patient's data.
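The thesis mentions the differentially private release of epigenetic data while considering its utility. As a purely illustrative sketch (not the mechanism evaluated in the thesis), the standard Laplace mechanism applied to DNA-methylation beta values, which lie in [0, 1], could look as follows; the function name, epsilon values, and per-value sensitivity are assumptions.

```python
import numpy as np

def laplace_release(beta_values, epsilon, sensitivity=1.0):
    """Release DNA-methylation beta values (each in [0, 1]) with epsilon-DP noise.

    Adds Laplace noise scaled to sensitivity / epsilon and clips the result back
    to the valid range. This is a generic textbook mechanism, not the exact
    scheme evaluated in the thesis.
    """
    beta_values = np.asarray(beta_values, dtype=float)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon, size=beta_values.shape)
    return np.clip(beta_values + noise, 0.0, 1.0)

# Example: a smaller epsilon gives stronger privacy but noisier released values.
profile = np.array([0.12, 0.85, 0.43])
print(laplace_release(profile, epsilon=1.0))
print(laplace_release(profile, epsilon=0.1))
```

The choice of epsilon directly controls the noise scale, which illustrates the privacy-utility tension the abstract alludes to.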
Measuring Conditional Anonymity—A Global Study
The realm of digital health is experiencing a global surge, with mobile applications extending their reach into various facets of daily life. From tracking daily eating habits and vital functions to monitoring sleep patterns and even the menstrual cycle, these apps have become ubiquitous in their pursuit of comprehensive health insights. Many of these apps collect sensitive data and promise to protect users' privacy, often through pseudonymization. We analyze the real anonymity that users can expect from this approach and report on our findings. More concretely:
1. We introduce the notion of conditional anonymity sets derived from statistical properties of the population.
2. We measure anonymity sets for two real-world applications and present overarching findings from 39 countries.
3. We develop a graphical tool for people to explore their own anonymity set.
One of our case studies is a popular app for tracking the menstrual cycle. Our findings for this app show that, despite its promise to protect privacy, the collected data can be used to narrow users down to groups of at most 5 people in 97% of all US counties, allowing the de-anonymization of individuals. Given that the US Supreme Court recently overturned abortion rights, the possibility of identifying individuals is a calamity.
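The notion of a conditional anonymity set derived from population statistics can be illustrated with a back-of-the-envelope calculation. The sketch below assumes independent attribute frequencies, which is a simplification; the paper's actual estimator and data sources may differ.

```python
def conditional_anonymity_set(population, attribute_frequencies):
    """Estimate the expected number of people sharing a combination of attributes.

    population: number of people in the region (e.g., a US county).
    attribute_frequencies: fraction of the population matching each observed
        attribute (e.g., an age bracket or cycle-length bucket), assumed independent.
    """
    share = 1.0
    for freq in attribute_frequencies:
        share *= freq
    return population * share

# Hypothetical example: a county of 50,000 people and three attributes, each
# matched by 10% of the population, gives an expected anonymity set of 50 people.
print(conditional_anonymity_set(50_000, [0.1, 0.1, 0.1]))
```

An expected set size of 5 or fewer, as reported for 97% of US counties in the menstrual-cycle case study, means a pseudonymized record can often be re-linked to a handful of candidates.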
Albatross: An optimistic consensus algorithm
The area of distributed ledgers is a vast and quickly developing landscape. At the heart of most distributed ledgers is their consensus protocol, which describes the way participants in a distributed network interact with each other to obtain and agree on a shared state. While classical Byzantine fault tolerant (BFT) consensus algorithms are designed to work only in closed, size-limited networks, modern distributed ledgers, and blockchains in particular, often focus on open, permissionless networks. In this paper, we present a novel blockchain consensus algorithm, called Albatross, inspired by speculative BFT algorithms. Transactions in Albatross benefit from strong probabilistic finality. We describe the technical specification of Albatross in detail and analyse its security and performance. We conclude that the protocol is secure under regular PBFT security assumptions and has a performance close to the theoretical maximum for single-chain Proof-of-Stake consensus algorithms.
A framework for constructing Single Secret Leader Election from MPC
The emergence of distributed digital currencies has raised the need for a reliable consensus mechanism. In proof-of-stake cryptocurrencies, the participants periodically choose a closed set of validators, who can vote and append transactions to the blockchain. Each validator can become a leader with probability proportional to its stake. Keeping the leader private yet unique until it publishes a new block can significantly reduce the attack vector of an adversary and improve the throughput of the network. The problem of Single Secret Leader Election (SSLE) was first formally defined by Boneh et al. in 2020. In this work, we propose a novel framework for constructing SSLE protocols, which relies on secure multi-party computation (MPC) and satisfies the desired security properties. Our framework does not use any shuffle or sort operations and has a computational cost for N parties as low as O(N) basic MPC operations per party. We improve the state of the art for SSLE protocols that do not assume a trusted setup. Moreover, our SSLE scheme efficiently handles weighted elections: for a total weight S over N parties, the associated costs increase only by a factor of log S. When the MPC layer is instantiated with techniques based on Shamir's secret sharing, our SSLE has a communication cost of O(N^2) spread over O(log N) rounds, can tolerate up to t < N/2 faulty nodes without restarting the protocol, and its security relies on DDH in the random oracle model. When the MPC layer is instantiated with more efficient techniques based on garbled circuits, our SSLE requires all parties to participate, up to N−1 of which can be malicious, and its security is based on the random oracle model.
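The abstract states that each validator becomes leader with probability proportional to its stake. The sketch below shows only this public, non-secret selection rule for intuition; the paper's contribution is evaluating an equivalent weighted election inside MPC so the winner stays hidden until it publishes a block. The helper name and the floating-point guard are illustrative.

```python
import random

def weighted_leader_election(stakes, rng=None):
    """Pick a validator index with probability proportional to its stake.

    This is the plain, non-private selection rule; the SSLE protocol computes
    the same distribution under MPC so that no one learns the winner early.
    """
    rng = rng or random.Random()
    total = sum(stakes)
    ticket = rng.uniform(0, total)
    running = 0.0
    for index, stake in enumerate(stakes):
        running += stake
        if ticket < running:
            return index
    return len(stakes) - 1  # guard against floating-point rounding at the boundary

# Example: with stakes [100, 300, 600], validator 2 is elected about 60% of the time.
print(weighted_leader_election([100, 300, 600]))
```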
Link Stealing Attacks Against Inductive Graph Neural Networks
A graph neural network (GNN) is a type of neural network that is specifically designed to process graph-structured data. Typically, GNNs can be implemented in two settings: the transductive setting and the inductive setting. In the transductive setting, the trained model can only predict the labels of nodes that were observed at training time. In the inductive setting, the trained model generalizes to new nodes and graphs. Due to this flexibility, the inductive setting is currently the most popular GNN setting. Previous work has shown that transductive GNNs are vulnerable to a series of privacy attacks. However, a comprehensive privacy analysis of inductive GNN models is still missing. This paper fills the gap by conducting a systematic privacy analysis of inductive GNNs through the lens of link stealing attacks. We propose two types of link stealing attacks, i.e., posterior-only attacks and combined attacks. We define threat models of the posterior-only attacks with respect to node topology and of the combined attacks by considering combinations of posteriors, node attributes, and graph features. Extensive evaluation on six real-world datasets demonstrates that inductive GNNs leak rich information that enables link stealing attacks with advantageous properties. Even attacks with no knowledge about graph structures can be effective. We also show that our attacks are robust to different node similarities and different graph features. As a counterpart, we investigate two possible defenses and discover they are ineffective against our attacks, which calls for more effective defenses.
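The posterior-only attacks rely on the observation that nodes connected by an edge tend to receive similar posteriors from the target GNN. The following is a minimal, hypothetical decision rule built on that signal; the paper trains attack classifiers (and, for the combined attacks, also uses node attributes and graph features), so this thresholded cosine similarity is only an illustration, and the threshold value is an assumption.

```python
import numpy as np

def posterior_similarity(p_u, p_v):
    """Cosine similarity between the posteriors of two nodes."""
    p_u, p_v = np.asarray(p_u, dtype=float), np.asarray(p_v, dtype=float)
    return float(p_u @ p_v / (np.linalg.norm(p_u) * np.linalg.norm(p_v) + 1e-12))

def predict_link(p_u, p_v, threshold=0.9):
    """Guess that an edge (u, v) exists if the two posteriors are similar enough.

    The actual attacks learn this decision from shadow data; the fixed
    threshold here only illustrates the signal being exploited.
    """
    return posterior_similarity(p_u, p_v) >= threshold

# Hypothetical posteriors queried from an inductive GNN.
print(predict_link([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))    # likely linked
print(predict_link([0.7, 0.2, 0.1], [0.05, 0.1, 0.85]))  # likely not linked
```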
Quantifying Privacy Risks of Prompts in Visual Prompt Learning
Large-scale pre-trained models are increasingly adapted to downstream tasks through a new paradigm called prompt learning. In contrast to fine-tuning, prompt learning does not update the pre-trained model's parameters. Instead, it only learns an input perturbation, namely a prompt, to be added to the downstream task data for predictions. Given the fast development of prompt learning, a well-generalized prompt inevitably becomes a valuable asset, as significant effort and proprietary data are used to create it. This naturally raises the question of whether a prompt may leak the proprietary information of its training data. In this paper, we perform the first comprehensive privacy assessment of prompts learned by visual prompt learning through the lens of property inference and membership inference attacks. Our empirical evaluation shows that the prompts are vulnerable to both attacks. We also demonstrate that the adversary can mount a successful property inference attack at limited cost. Moreover, we show that membership inference attacks against prompts can be successful under relaxed adversarial assumptions. We further make some initial investigations into defenses and observe that our method can mitigate the membership inference attacks with a decent utility-defense trade-off but fails to defend against property inference attacks. We hope our results can shed light on the privacy risks of the popular prompt learning paradigm. To facilitate research in this direction, we will share our code and models with the community.
Comment: To appear in the 33rd USENIX Security Symposium, August 14-16, 2024.
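Visual prompt learning, as described above, keeps the pre-trained model frozen and learns only an input perturbation added to downstream images. A minimal PyTorch-style sketch of that idea is shown below; the additive full-image prompt, the image shape, and the training-loop names are assumptions rather than the paper's exact construction.

```python
import torch
import torch.nn as nn

class VisualPrompt(nn.Module):
    """A learnable additive perturbation applied to every input image.

    The pre-trained model stays frozen; only `self.delta` is optimized on the
    downstream task, which is why a well-trained prompt can itself leak
    information about the data it was trained on.
    """

    def __init__(self, image_shape=(3, 224, 224)):
        super().__init__()
        self.delta = nn.Parameter(torch.zeros(image_shape))

    def forward(self, images):
        # Broadcast the single prompt over the batch dimension.
        return images + self.delta

# Sketch of the adaptation loop (names are illustrative):
# frozen_model.requires_grad_(False)
# prompt = VisualPrompt()
# optimizer = torch.optim.Adam(prompt.parameters(), lr=1e-3)
# logits = frozen_model(prompt(batch_of_images))
```

Because only the prompt is optimized on the downstream data, the prompt itself becomes the asset whose property and membership leakage the paper measures.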