Private Graph Data Release: A Survey
The application of graph analytics to various domains has yielded tremendous
societal and economic benefits in recent years. However, the increasingly
widespread adoption of graph analytics comes with a commensurate increase in
the need to protect private information in graph databases, especially in light
of the many privacy breaches in real-world graph datasets that were supposed to
protect sensitive information. This paper provides a comprehensive survey of
private graph data release algorithms that seek to achieve the fine balance
between privacy and utility, with a specific focus on provably private
mechanisms. Many of these mechanisms fall under natural extensions of the
Differential Privacy framework to graph data, but we also investigate more
general privacy formulations like Pufferfish Privacy that can deal with the
limitations of Differential Privacy. A wide-ranging survey of the applications
of private graph data release mechanisms to social networks, finance, supply
chain, health and energy is also provided. This survey paper and the taxonomy
it provides should benefit practitioners and researchers alike in the
increasingly important area of private graph data release and analysis.
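As background for the graph differential-privacy mechanisms the survey covers, the canonical starting point is the Laplace mechanism under edge-level privacy. The sketch below is our own illustration, not taken from the paper; the function names, the epsilon value, and the example graph are assumptions for exposition:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw Laplace(0, scale) noise via inverse-transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_edge_count(edges: set, epsilon: float) -> float:
    """Release |E| under edge-level differential privacy.

    Adding or removing one edge changes the count by exactly 1, so the
    global sensitivity is 1 and Laplace(1/epsilon) noise suffices.
    """
    return len(edges) + laplace_noise(1.0 / epsilon)

# Toy undirected graph on 4 nodes with 5 edges.
edges = {(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)}
noisy = private_edge_count(edges, epsilon=1.0)
```

The edge count is the easy case: neighbouring graphs differ in a single edge, so sensitivity is 1. Richer statistics such as degree sequences or subgraph counts have higher sensitivity and need larger noise or smarter encodings, which is precisely the design space this survey maps out.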
Novel approaches to anonymity and privacy in decentralized, open settings
The Internet has undergone dramatic changes in the last two decades, evolving from a mere communication network into a global multimedia platform on which billions of users actively exchange information. While this transformation has brought tremendous benefits to society, it has also created new threats to online privacy with which existing technology is failing to keep pace. In this dissertation, we present the results of two lines of research, each of which developed a novel approach to anonymity and privacy in decentralized, open settings. First, we examine the issue of attribute and identity disclosure in open settings and develop the novel notion of (k,d)-anonymity for open settings, which we extensively study and validate experimentally. Furthermore, we investigate the relationship between anonymity and linkability using the notion of (k,d)-anonymity and show that, in contrast to the traditional closed setting, anonymity within one online community does not necessarily imply unlinkability across different online communities in the decentralized, open setting. Second, we consider the transitive diffusion of information that is shared in social networks and spread through pairwise interactions of users connected in the network. We develop the novel approach of exposure minimization to control the diffusion of information within an open network, allowing the information's owner to minimize its exposure by suitably choosing whom to share it with. We implement our algorithms and investigate the practical limitations of user-side exposure minimization in large social networks.
At their core, both of these approaches present a departure from the provable privacy guarantees that we can achieve in closed settings and a step towards sound assessments of privacy risks in decentralized, open settings.
Decentralised, Scalable and Privacy-Preserving Synthetic Data Generation
Synthetic data is emerging as a promising way to harness the value of data,
while reducing privacy risks. The potential of synthetic data is not limited to
privacy-friendly data release, but also includes complementing real data in
use cases such as training machine learning algorithms that are fairer and more
robust to distribution shifts. There is considerable interest in algorithmic
advances in synthetic data generation for providing better privacy and
statistical guarantees and for better utilisation in machine learning
pipelines. However, for responsible and trustworthy synthetic data generation,
it is not sufficient to focus on these algorithmic aspects alone; instead, a
holistic view of the synthetic data generation pipeline must be considered. We
build a novel system that allows the contributors of real data to autonomously
participate in differentially private synthetic data generation without relying
on a trusted centre. Our modular, general and scalable solution is based on
three building blocks: Solid (Social Linked Data), MPC (Secure
Multi-Party Computation) and Trusted Execution Environments (TEEs). Solid is a
specification that lets people store their data securely in decentralised data
stores called Pods and control access to their data. MPC refers to the set of
cryptographic methods for different parties to jointly compute a function over
their inputs while keeping those inputs private. TEEs such as Intel SGX rely on
hardware-based features for confidentiality and integrity of code and data. We
show how these three technologies can be effectively used to address various
challenges in responsible and trustworthy synthetic data generation by
ensuring: 1) contributor autonomy, 2) decentralisation, 3) privacy and 4)
scalability. We support our claims with rigorous empirical results on simulated
and real datasets and different synthetic data generation algorithms.
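To make the MPC building block concrete, here is a toy additive-secret-sharing sketch of how parties can jointly compute a sum while keeping individual inputs private. This is our own illustration, not the paper's protocol; the function names, the modulus, and the three-party example are assumptions:

```python
import random

MODULUS = 2**61 - 1  # size of the ring the shares live in

def share(secret: int, n_parties: int) -> list:
    """Split a secret into n additive shares that sum to it mod MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares: list) -> int:
    """Recombine shares; any strict subset reveals nothing about the secret."""
    return sum(shares) % MODULUS

# Three data contributors privately sum their inputs.
inputs = [17, 42, 8]
all_shares = [share(x, 3) for x in inputs]
# Each party locally adds the one share it holds from every contributor.
party_sums = [sum(col) % MODULUS for col in zip(*all_shares)]
total = reconstruct(party_sums)  # equals 17 + 42 + 8 = 67
```

Because addition commutes with sharing, the parties never see any raw input, only uniformly random shares; full MPC frameworks extend this idea to multiplication and general circuits.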
Privacy-Preserving Data in IoT-based Cloud Systems: A Comprehensive Survey with AI Integration
As the integration of Internet of Things devices with cloud computing
proliferates, the paramount importance of privacy preservation comes to the
forefront. This survey paper meticulously explores the landscape of privacy
issues in the dynamic intersection of IoT and cloud systems. The comprehensive
literature review synthesizes existing research, illuminating key challenges
and discerning emerging trends in privacy-preserving techniques. The
categorization of diverse approaches unveils a nuanced understanding of
encryption techniques, anonymization strategies, access control mechanisms, and
the burgeoning integration of artificial intelligence. Notable trends include
the infusion of machine learning for dynamic anonymization, homomorphic
encryption for secure computation, and AI-driven access control systems. The
culmination of this survey contributes a holistic view, laying the groundwork
for understanding the multifaceted strategies employed in securing sensitive
data within IoT-based cloud environments. The insights garnered from this
survey provide a valuable resource for researchers, practitioners, and
policymakers navigating the complex terrain of privacy preservation in the
evolving landscape of IoT and cloud computing.
Comment: 33 pages
PrivGraph: Differentially Private Graph Data Publication by Exploiting Community Information
Graph data is used in a wide range of applications, while analyzing graph
data without protection is prone to privacy breach risks. To mitigate the
privacy risks, we resort to the standard technique of differential privacy to
publish a synthetic graph. However, existing differentially private graph
synthesis approaches either introduce excessive noise by directly perturbing
the adjacency matrix, or suffer significant information loss during the graph
encoding process. In this paper, we propose an effective graph synthesis
algorithm PrivGraph by exploiting the community information. Concretely,
PrivGraph differentially privately partitions the private graph into
communities, extracts intra-community and inter-community information, and
reconstructs the graph from the extracted graph information. We validate the
effectiveness of PrivGraph on six real-world graph datasets and seven commonly
used graph metrics.
Comment: To appear in the 32nd USENIX Security Symposium
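The intra-/inter-community counting step can be loosely sketched as follows. This is an illustrative approximation of the idea only, not the authors' implementation; the function names, the noise helper, and the example partition are our own assumptions:

```python
import math
import random

def laplace(scale: float) -> float:
    """Draw Laplace(0, scale) noise via inverse-transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_block_counts(edges, community, epsilon):
    """Count intra- and inter-community edges, then add Laplace noise.

    Each edge falls into exactly one (community, community) block, so
    adding or removing one edge changes one count by 1: the edge-level
    sensitivity of the whole count vector is 1.
    """
    counts = {}
    for u, v in edges:
        key = tuple(sorted((community[u], community[v])))
        counts[key] = counts.get(key, 0) + 1
    return {k: c + laplace(1.0 / epsilon) for k, c in counts.items()}
```

Aggregating edges per community block concentrates the signal into a few large counts, so the same noise scale hurts far less than perturbing every adjacency-matrix entry; a synthetic graph can then be sampled to match the noisy block counts.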
Seeding with Differentially Private Network Information
When designing interventions in public health, development, and education,
decision makers rely on social network data to target a small number of people,
capitalizing on peer effects and social contagion to bring about the most
welfare benefits to the population. Developing new methods that are
privacy-preserving for network data collection and targeted interventions is
critical for designing sustainable public health and development interventions
on social networks. In a similar vein, social media platforms rely on network
data and information from past diffusions to organize their ad campaign and
improve the efficacy of targeted advertising. Ensuring that these network
operations do not violate users' privacy is critical to the sustainability of
social media platforms and their ad economies. We study privacy guarantees for
influence maximization algorithms when the social network is unknown, and the
inputs are samples of prior influence cascades that are collected at random.
Building on recent results that address seeding with costly network
information, our privacy-preserving algorithms introduce randomization in the
collected data or the algorithm output, and can bound each node's (or group of
nodes') privacy loss in deciding whether or not their data should be included
in the algorithm input. We provide theoretical guarantees of the seeding
performance with a limited sample size subject to differential privacy budgets
in both central and local privacy regimes. Simulations on synthetic and
empirical network datasets reveal the diminishing value of network information
with decreasing privacy budget in both regimes.
Comment: Preliminary version in AAMAS 2023:
https://dl.acm.org/doi/10.5555/3545946.3599081 -- Code and data:
https://github.com/aminrahimian/dp-inf-ma
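The local privacy regime mentioned above can be illustrated with single-bit randomized response, the textbook mechanism in which each node perturbs its own report before submission. This generic sketch is our own illustration, not the paper's algorithm; all names are assumptions:

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    """Report a private bit under epsilon-local differential privacy.

    The true bit is kept with probability e^eps / (e^eps + 1) and
    flipped otherwise, which satisfies epsilon-LDP for one bit.
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

def debiased_mean(reports: list, epsilon: float) -> float:
    """Unbiased estimate of the true mean from randomized reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return (sum(reports) / len(reports) - (1 - p)) / (2 * p - 1)
```

As epsilon shrinks, the flip probability approaches 1/2 and the debiased estimate's variance blows up, which is the mechanism behind the diminishing value of network information at small privacy budgets reported above.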