999 research outputs found
On the k-anonymization of time-varying and multi-layer social graphs
The popularity of online social media platforms provides an unprecedented opportunity to study real-world complex networks of interactions. However, releasing this data to researchers and the public comes at the cost of potentially exposing private and sensitive user information. It has been shown that a naive anonymization of a network by removing the identity of the nodes is not sufficient to preserve users’ privacy. In order to deal with malicious attacks, k-anonymity solutions have been proposed to partially obfuscate topological information that can be used to infer nodes’ identity. In this paper, we study the problem of ensuring k-anonymity in time-varying graphs, i.e., graphs with a structure that changes over time, and multi-layer graphs, i.e., graphs with multiple types of links. More specifically, we examine the case in which the attacker has access to the degree of the nodes. The goal is to generate a new graph where, given the degree of a node in each (temporal) layer of the graph, such a node remains indistinguishable from k-1 other nodes in the graph. In order to achieve this, we find the optimal partitioning of the graph nodes such that the cost of anonymizing the degree information within each group is minimum. We show that this reduces to a special case of a Generalized Assignment Problem, and we propose a simple yet effective algorithm to solve it. Finally, we introduce an iterated linear programming approach to enforce the realizability of the anonymized degree sequences. The efficacy of the method is assessed through an extensive set of experiments on synthetic and real-world graphs.
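The anonymity condition stated in this abstract (each node shares its per-layer degrees with at least k-1 other nodes) can be illustrated with a minimal sketch. This only checks the condition; it is not the paper's partitioning or linear programming method. The function names and the edge-list representation of layers are assumptions made for illustration, and the graphs are assumed to be simple and undirected.

```python
from collections import Counter


def degree_vectors(layers, nodes):
    """Map each node to its tuple of degrees, one entry per layer.

    `layers` is a list of edge lists (hypothetical representation);
    each edge is an undirected pair (a, b) with a != b.
    """
    vectors = {}
    for v in nodes:
        vectors[v] = tuple(
            sum(1 for (a, b) in edges if v in (a, b)) for edges in layers
        )
    return vectors


def is_k_degree_anonymous(layers, nodes, k):
    """True if every per-layer degree vector is shared by at least k nodes."""
    counts = Counter(degree_vectors(layers, nodes).values())
    return all(c >= k for c in counts.values())


nodes = [0, 1, 2, 3]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # layer 1: every node has degree 2
single = [(0, 1)]                          # layer 2: only nodes 0 and 1 are linked
print(is_k_degree_anonymous([cycle, single], nodes, k=2))
print(is_k_degree_anonymous([cycle, single], nodes, k=3))
```

In this toy example, nodes 0 and 1 share the degree vector (2, 1) and nodes 2 and 3 share (2, 0), so the graph satisfies the condition for k = 2 but not for k = 3.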
Privacy and Anonymization of Neighborhoods in Multiplex Networks
Since the beginning of the digital age, the amount of available data on human behaviour has dramatically increased, along with the risk for the privacy of the represented subjects. Since the analysis of those data can bring advances to science, it is important to share them while preserving the subjects' anonymity. A significant portion of the available information can be modelled as networks, introducing an additional privacy risk related to the structure of the data themselves. For instance, in a social network, people can be uniquely identifiable because of the structure of their neighborhood, formed by the number of their friends and the connections between them. The neighborhood's structure is the target of an identity disclosure attack on released social network data, called a neighborhood attack. To mitigate this threat, algorithms to anonymize networks have been proposed. However, this problem has not been deeply studied on multiplex networks, which combine different social network data into a single representation. The multiplex network representation makes the neighborhood attack setting more complicated, and adds information that an attacker can use to re-identify subjects.
This thesis aims to understand how multiplex networks behave in terms of anonymization difficulty and neighborhood attack. We present two definitions of multiplex neighborhoods, and discuss how the fraction of nodes with unique neighborhoods can be affected.
Through analysis of network models, we study the variation of the uniqueness of neighborhoods in networks with different structure and characteristics. We show that the uniqueness of neighborhoods has a linear trend depending on the network size and average degree. If the network has a more random structure, the uniqueness decreases significantly when the network size increases. On the other hand, if the local structure is more pronounced, the uniqueness is not strongly influenced by the number of nodes. We also conduct a motif analysis to study the recurring patterns that can make social networks' neighborhoods less unique.
Lastly, we propose an algorithm to anonymize a pair of multiplex neighborhoods. This algorithm is the core building block that can be used in a method to prevent neighborhood attacks on multiplex networks.
k-Anonymity on Graphs using the Szemerédi Regularity Lemma
Graph anonymisation aims at reducing the ability of an attacker to identify the nodes of a graph by obfuscating its structural information. In k-anonymity, this means making each node indistinguishable from at least k-1 other nodes. Simply stripping the nodes of a graph of their identifying label is insufficient, as with enough structural knowledge an attacker can still recover the nodes' identities. We propose an algorithm to enforce k-anonymity based on the Szemerédi regularity lemma. Given a graph, we start by computing a regular partition of its nodes. The Szemerédi regularity lemma ensures that such a partition exists and that the edges between the sets of nodes behave quasi-randomly. With this partition to hand, we anonymize the graph by randomizing the edges within each set, obtaining a graph that is structurally similar to the original one yet the nodes within each set are structurally indistinguishable. Unlike other k-anonymisation methods, our approach does not consider a single type of attack, but instead it aims to prevent any structure-based de-anonymisation attempt. We test our framework on a wide range of real-world networks and we compare it against another simple yet widely used k-anonymisation technique, demonstrating the effectiveness of our approach.
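The step of randomizing the edges within each set can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the partition is taken as given (computing a regular partition is not shown), the graph is assumed simple and undirected with sortable node labels, and keeping the number of edges inside each set unchanged is a choice made here. The name `randomize_within_parts` is hypothetical.

```python
import itertools
import random


def randomize_within_parts(edges, partition, seed=0):
    """Resample the edges inside each part; keep edges between parts as-is.

    `edges` is an iterable of undirected pairs, `partition` a list of
    node lists. Each part keeps its original number of internal edges
    (an assumption of this sketch), placed uniformly at random.
    """
    rng = random.Random(seed)
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    normalized = {tuple(sorted(e)) for e in edges}

    # Edges whose endpoints lie in different parts are left untouched.
    result = {e for e in normalized if part_of[e[0]] != part_of[e[1]]}

    for i, part in enumerate(partition):
        internal = sum(
            1 for (a, b) in normalized if part_of[a] == i and part_of[b] == i
        )
        candidates = list(itertools.combinations(sorted(part), 2))
        result.update(rng.sample(candidates, internal))
    return result


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (2, 3)]
partition = [[0, 1, 2], [3, 4, 5]]
print(sorted(randomize_within_parts(edges, partition, seed=1)))
```

Because the placement of internal edges no longer depends on the original graph, nodes in the same set cannot be told apart by the edges inside that set, while the total edge count and the edges between sets are unchanged.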
You Can't See Me: Anonymizing Graphs Using the Szemerédi Regularity Lemma.
Complex networks gathered from our online interactions provide a rich source of information that can be used to try to model and predict our behavior. While this has very tangible benefits that we have all grown accustomed to, there is a concrete privacy risk in sharing potentially sensitive data about ourselves and the people we interact with, especially when this data is publicly available online and unprotected from malicious attacks. k-anonymity is a technique aimed at reducing this risk by obfuscating the topological information of a graph that can be used to infer the nodes' identity. In this paper we propose a novel algorithm to enforce k-anonymity based on a well-known result in extremal graph theory, the Szemerédi regularity lemma. Given a graph, we start by computing a regular partition of its nodes. The Szemerédi regularity lemma ensures that such a partition exists and that the edges between the sets of nodes behave almost randomly. With this partition, we anonymize the graph by randomizing the edges within each set, obtaining a graph that is structurally similar to the original one, yet the nodes within each set are structurally indistinguishable. We test the proposed approach on real-world networks extracted from Facebook. Our experimental results show that the proposed approach is able to anonymize a graph while retaining most of its structural information.
ePRIVO: an enhanced PRIvacy-preserVing opportunistic routing protocol for vehicular delay-tolerant networks
This article proposes an enhanced PRIvacy preserVing Opportunistic routing protocol (ePRIVO) for Vehicular Delay-Tolerant Networks (VDTN). ePRIVO models a VDTN as a time-varying neighboring graph where edges correspond to neighboring relationships between pairs of vehicles. It addresses the problem of vehicles making routing decisions while keeping their information private, i.e., vehicles compute their similarity and/or compare their routing metrics in a private manner using the Paillier homomorphic encryption scheme.
The effectiveness of ePRIVO is supported through extensive simulations with synthetic mobility models and a real mobility trace. Simulation results show that ePRIVO presents on average very low cryptographic costs in most scenarios. Additionally, ePRIVO presents on average gains of approximately 29% and 238% in terms of delivery ratio for the real and synthetic scenarios considered, respectively, compared to other privacy-preserving routing protocols.
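The additive property of the Paillier scheme mentioned in this abstract can be shown with a toy sketch. It is not the ePRIVO protocol, and the tiny primes are for illustration only and offer no security. The function names are hypothetical; the generator g = n + 1 is a common simplification assumed here. Python 3.8 or later is needed for the modular inverse via `pow`.

```python
import math
import random


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1               # simplifying choice of generator (assumption)
    mu = pow(lam, -1, n)    # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, g), (lam, mu)


def encrypt(pub, m, rng=random):
    n, g = pub
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen(1009, 1013)   # toy primes, insecure by design
c_sum = add_encrypted(pub, encrypt(pub, 20), encrypt(pub, 22))
print(decrypt(pub, priv, c_sum))  # prints 42
```

A party holding only the public key can combine encrypted values in this way without learning them, which is the kind of private computation the abstract alludes to when vehicles compare routing metrics.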
Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey
In graph machine learning, data collection, sharing, and analysis often
involve multiple parties, each of which may require varying levels of data
security and privacy. To this end, preserving privacy is of great importance in
protecting sensitive information. In the era of big data, the relationships
among data entities have become unprecedentedly complex, and more applications
utilize advanced data structures (i.e., graphs) that can support network
structures and relevant attribute information. To date, many graph-based AI
models have been proposed (e.g., graph neural networks) for various domain
tasks, like computer vision and natural language processing. In this paper, we
focus on reviewing privacy-preserving techniques of graph machine learning. We
systematically review related works from the data to the computational aspects.
We first review methods for generating privacy-preserving graph data. Then we
describe methods for transmitting privacy-preserved information (e.g., graph
model parameters) to realize the optimization-based computation when data
sharing among multiple parties is risky or impossible. In addition to
discussing relevant theoretical methodology and software tools, we also discuss
current challenges and highlight several possible future research opportunities
for privacy-preserving graph machine learning. Finally, we envision a unified
and comprehensive secure graph machine learning system.
Comment: Accepted by SIGKDD Explorations 2023, Volume 25, Issue
Vertical Federated Graph Neural Network for Recommender System
Conventional recommender systems are required to train the recommendation
model using a centralized database. However, due to data privacy concerns, this
is often impractical when multiple parties are involved in recommender system
training. Federated learning appears as an excellent solution to the data
isolation and privacy problem. Recently, graph neural networks (GNNs) have become
a promising approach for federated recommender systems. However, a key
challenge is to conduct embedding propagation while preserving the privacy of
the graph structure. Few studies have been conducted on the federated GNN-based
recommender system. Our study proposes the first vertical federated GNN-based
recommender system, called VerFedGNN. We design a framework to transmit: (i)
the summation of neighbor embeddings using random projection, and (ii)
gradients of public parameters perturbed by a ternary quantization mechanism.
Empirical studies show that VerFedGNN has prediction accuracy competitive with
existing privacy-preserving GNN frameworks while enhancing privacy protection
for users' interaction information.
Comment: 17 pages, 9 figures
ToR K-Anonymity against deep learning watermarking attacks
It is known that totalitarian regimes often perform surveillance and censorship of their
communication networks. The Tor anonymity network allows users to browse the Internet
anonymously to circumvent censorship filters and possible prosecution. This has made
Tor an enticing target for state-level actors and cooperative state-level adversaries, with
privileged access to network traffic captured at the level of Autonomous Systems (ASs) or
Internet Exchange Points (IXPs).
This thesis studied the attack typologies involved, with a particular focus on traffic
correlation techniques for de-anonymization of Tor endpoints. Our goal was to design a
test-bench environment and tool, based on recently researched deep learning techniques
for traffic analysis, to evaluate the effectiveness of countermeasures provided by recent
approaches that try to strengthen Tor’s anonymity protection. The targeted solution is based
on K-anonymity input covert channels organized as a pre-staged multipath network.
The research challenge was to design a test-bench environment and tool, to launch
active correlation attacks leveraging traffic flow correlation through the detection of
induced watermarks in Tor traffic. To de-anonymize Tor connection endpoints, our tool
analyses intrinsic time patterns of Tor synthetic egress traffic to detect flows with
previously injected time-based watermarks.
With the obtained results and conclusions, we contributed to the evaluation of the
security guarantees that the targeted K-anonymity solution provides as a countermeasure
against de-anonymization attacks.
It has been widely observed that in many countries governed by totalitarian regimes,
the communication media in use are monitored and consequently censored.
Tor allows its users to browse the internet with guarantees of privacy and
anonymity, so as to avoid blocking, censorship, and legal proceedings imposed by the
governing entity. These properties have made the Tor network a target for several
governments and for joint actions by multiple entities with privileged access to large
areas of the network and to several of its access points.
This thesis studies the types of attacks that break the anonymity of the Tor
network, with a particular focus on traffic correlation techniques. Our goal is to build
a study environment and tool, based on recent deep learning and watermark injection
techniques, to evaluate the effectiveness of recently investigated countermeasures
that try to strengthen the anonymity of the Tor network. The countermeasure we
intend to evaluate is based on the creation of covert multi-circuits, using input
TLS tunnels, so as to couple the traffic of an anonymous group of K users. The
solution to be developed must launch a traffic correlation attack using active
watermark induction techniques. This tool must be able to correlate synthetic egress
traffic from Tor circuits, injecting watermarks at the entry so that they can be
detected at a second observation point. Applied to a real scenario, the purpose of the
tool is to break the anonymity of hidden services provided by the Tor network, as well
as that of their users.
The expected results will contribute to the evaluation of the aforementioned K-user
anonymity solution, which is seen as a countermeasure against de-anonymization
attacks.
GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training
Graph representation learning has emerged as a powerful technique for
addressing real-world problems. Various downstream graph learning tasks have
benefited from its recent developments, such as node classification, similarity
search, and graph classification. However, prior work on graph representation
learning focuses on domain-specific problems and trains a dedicated model for each
graph dataset, which is usually non-transferable to out-of-domain data.
Inspired by the recent advances in pre-training from natural language
processing and computer vision, we design Graph Contrastive Coding (GCC) -- a
self-supervised graph neural network pre-training framework -- to capture the
universal network topological properties across multiple networks. We design
GCC's pre-training task as subgraph instance discrimination in and across
networks and leverage contrastive learning to empower graph neural networks to
learn the intrinsic and transferable structural representations. We conduct
extensive experiments on three graph learning tasks and ten graph datasets. The
results show that GCC pre-trained on a collection of diverse datasets can
achieve performance competitive with or better than its task-specific and
trained-from-scratch counterparts. This suggests that the pre-training and
fine-tuning paradigm presents great potential for graph representation
learning.
Comment: 11 pages, 5 figures, to appear in KDD 2020 proceedings