999 research outputs found

    On the k-anonymization of time-varying and multi-layer social graphs

    The popularity of online social media platforms provides an unprecedented opportunity to study real-world complex networks of interactions. However, releasing this data to researchers and the public comes at the cost of potentially exposing private and sensitive user information. It has been shown that naively anonymizing a network by removing the identity of the nodes is not sufficient to preserve users' privacy. To deal with malicious attacks, k-anonymity solutions have been proposed to partially obfuscate the topological information that can be used to infer nodes' identity. In this paper, we study the problem of ensuring k-anonymity in time-varying graphs, i.e., graphs with a structure that changes over time, and multi-layer graphs, i.e., graphs with multiple types of links. More specifically, we examine the case in which the attacker has access to the degree of the nodes. The goal is to generate a new graph where, given the degree of a node in each (temporal) layer of the graph, that node remains indistinguishable from k-1 other nodes in the graph. To achieve this, we find the optimal partitioning of the graph nodes such that the cost of anonymizing the degree information within each group is minimized. We show that this reduces to a special case of a Generalized Assignment Problem, and we propose a simple yet effective algorithm to solve it. Finally, we introduce an iterated linear programming approach to enforce the realizability of the anonymized degree sequences. The efficacy of the method is assessed through an extensive set of experiments on synthetic and real-world graphs.
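
The anonymity condition stated in this abstract (every node shares its per-layer degrees with at least k-1 other nodes) can be illustrated with a minimal check. This is only a sketch of the condition, not the paper's partitioning algorithm; the function name `is_k_degree_anonymous` and the toy data are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Tuple


def is_k_degree_anonymous(degrees: Dict[Hashable, Tuple[int, ...]], k: int) -> bool:
    """Return True if every per-layer degree vector is shared by at least k nodes.

    `degrees` maps each node to a tuple holding its degree in each layer
    (or time step) of the graph.
    """
    # Count how many nodes carry each distinct degree vector.
    counts = Counter(degrees.values())
    return all(count >= k for count in counts.values())


# Hypothetical toy data: node -> (degree in layer 1, degree in layer 2).
toy = {"a": (2, 1), "b": (2, 1), "c": (3, 0), "d": (3, 0)}
print(is_k_degree_anonymous(toy, 2))
print(is_k_degree_anonymous(toy, 3))
```

An attacker who knows only the degree vector of a target cannot narrow the target down to fewer than k candidates when this check passes.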

    Privacy and Anonymization of Neighborhoods in Multiplex Networks

    Since the beginning of the digital age, the amount of available data on human behaviour has dramatically increased, along with the risk to the privacy of the represented subjects. Since the analysis of those data can bring advances to science, it is important to share them while preserving the subjects' anonymity. A significant portion of the available information can be modelled as networks, introducing an additional privacy risk related to the structure of the data themselves. For instance, in a social network, people can be uniquely identifiable because of the structure of their neighborhood, formed by the number of their friends and the connections between them. The neighborhood's structure is the target of an identity disclosure attack on released social network data, called a neighborhood attack. To mitigate this threat, algorithms to anonymize networks have been proposed. However, this problem has not been deeply studied on multiplex networks, which combine different social network data into a single representation. The multiplex network representation makes the neighborhood attack setting more complicated, and adds information that an attacker can use to re-identify subjects. This thesis aims to understand how multiplex networks behave in terms of anonymization difficulty and neighborhood attack. We present two definitions of multiplex neighborhoods, and discuss how the fraction of nodes with unique neighborhoods can be affected. Through analysis of network models, we study the variation of the uniqueness of neighborhoods in networks with different structure and characteristics. We show that the uniqueness of neighborhoods has a linear trend depending on the network size and average degree. If the network has a more random structure, the uniqueness decreases significantly when the network size increases. On the other hand, if the local structure is more pronounced, the uniqueness is not strongly influenced by the number of nodes. We also conduct a motif analysis to study the recurring patterns that can make social networks' neighborhoods less unique. Lastly, we propose an algorithm to anonymize a pair of multiplex neighborhoods. This algorithm is the core building block that can be used in a method to prevent neighborhood attacks on multiplex networks.

    k-Anonymity on Graphs using the Szemerédi Regularity Lemma

    Graph anonymisation aims at reducing the ability of an attacker to identify the nodes of a graph by obfuscating its structural information. In k-anonymity, this means making each node indistinguishable from at least k-1 other nodes. Simply stripping the nodes of a graph of their identifying labels is insufficient, as with enough structural knowledge an attacker can still recover the nodes' identities. We propose an algorithm to enforce k-anonymity based on the Szemerédi regularity lemma. Given a graph, we start by computing a regular partition of its nodes. The Szemerédi regularity lemma ensures that such a partition exists and that the edges between the sets of nodes behave quasi-randomly. With this partition to hand, we anonymize the graph by randomizing the edges within each set, obtaining a graph that is structurally similar to the original one yet in which the nodes within each set are structurally indistinguishable. Unlike other k-anonymisation methods, our approach does not consider a single type of attack; instead, it aims to prevent any structure-based de-anonymisation attempt. We test our framework on a wide range of real-world networks and compare it against another simple yet widely used k-anonymisation technique, demonstrating the effectiveness of our approach.
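
The step of "randomizing the edges within each set" might be sketched as follows. The sketch assumes the partition is already given (computing a regular partition is not shown), assumes a simple undirected graph with sortable node labels, and assumes that each set keeps its original number of internal edges; that last choice is made here for illustration and is not stated in the abstract. The function name `randomize_within_sets` is hypothetical.

```python
import random
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def randomize_within_sets(
    edges: Sequence[Edge], partition: Sequence[Sequence[Hashable]], seed: int = 0
) -> List[Edge]:
    """Resample the edges inside each partition set uniformly at random.

    Edges between different sets are kept unchanged. Inside each set, the
    same number of edges is redrawn from all possible node pairs of that set.
    """
    rng = random.Random(seed)
    block_of = {v: i for i, block in enumerate(partition) for v in block}

    kept: List[Edge] = []
    intra_counts = [0] * len(partition)
    for u, v in edges:
        if block_of[u] == block_of[v]:
            intra_counts[block_of[u]] += 1  # edge inside a set: to be redrawn
        else:
            kept.append((u, v))  # edge between sets: left as is

    for block, m in zip(partition, intra_counts):
        candidates = list(combinations(sorted(block), 2))
        kept.extend(rng.sample(candidates, m))  # m distinct random pairs
    return kept


# Hypothetical toy graph with two sets of three nodes each.
partition = [[0, 1, 2], [3, 4, 5]]
edges = [(0, 1), (1, 2), (3, 4), (2, 3)]
print(randomize_within_sets(edges, partition, seed=1))
```

Because every pair inside a set is equally likely to be chosen, nodes in the same set are treated identically by the sampling procedure, which is the intuition behind their indistinguishability.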

    You Can't See Me: Anonymizing Graphs Using the Szemerédi Regularity Lemma.

    Complex networks gathered from our online interactions provide a rich source of information that can be used to try to model and predict our behavior. While this has very tangible benefits that we have all grown accustomed to, there is a concrete privacy risk in sharing potentially sensitive data about ourselves and the people we interact with, especially when this data is publicly available online and unprotected from malicious attacks. k-anonymity is a technique aimed at reducing this risk by obfuscating the topological information of a graph that can be used to infer the nodes' identity. In this paper we propose a novel algorithm to enforce k-anonymity based on a well-known result in extremal graph theory, the Szemerédi regularity lemma. Given a graph, we start by computing a regular partition of its nodes. The Szemerédi regularity lemma ensures that such a partition exists and that the edges between the sets of nodes behave almost randomly. With this partition, we anonymize the graph by randomizing the edges within each set, obtaining a graph that is structurally similar to the original one yet in which the nodes within each set are structurally indistinguishable. We test the proposed approach on real-world networks extracted from Facebook. Our experimental results show that the proposed approach is able to anonymize a graph while retaining most of its structural information.

    Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey

    In graph machine learning, data collection, sharing, and analysis often involve multiple parties, each of which may require varying levels of data security and privacy. To this end, preserving privacy is of great importance in protecting sensitive information. In the era of big data, the relationships among data entities have become unprecedentedly complex, and more applications utilize advanced data structures (i.e., graphs) that can support network structures and relevant attribute information. To date, many graph-based AI models have been proposed (e.g., graph neural networks) for various domain tasks, like computer vision and natural language processing. In this paper, we focus on reviewing privacy-preserving techniques of graph machine learning. We systematically review related works from the data to the computational aspects. We first review methods for generating privacy-preserving graph data. Then we describe methods for transmitting privacy-preserved information (e.g., graph model parameters) to realize the optimization-based computation when data sharing among multiple parties is risky or impossible. In addition to discussing relevant theoretical methodology and software tools, we also discuss current challenges and highlight several possible future research opportunities for privacy-preserving graph machine learning. Finally, we envision a unified and comprehensive secure graph machine learning system. Comment: Accepted by SIGKDD Explorations 2023, Volume 25, Issue

    Vertical Federated Graph Neural Network for Recommender System

    Conventional recommender systems are required to train the recommendation model using a centralized database. However, due to data privacy concerns, this is often impractical when multiple parties are involved in recommender system training. Federated learning appears as an excellent solution to the data isolation and privacy problem. Recently, graph neural networks (GNNs) have become a promising approach for federated recommender systems. However, a key challenge is to conduct embedding propagation while preserving the privacy of the graph structure. Few studies have been conducted on federated GNN-based recommender systems. Our study proposes the first vertical federated GNN-based recommender system, called VerFedGNN. We design a framework to transmit: (i) the summation of neighbor embeddings using random projection, and (ii) gradients of public parameters perturbed by a ternary quantization mechanism. Empirical studies show that VerFedGNN achieves prediction accuracy competitive with existing privacy-preserving GNN frameworks while enhancing privacy protection for users' interaction information. Comment: 17 pages, 9 figures
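
To give a concrete picture of what perturbing gradients by ternary quantization can look like, here is a minimal sketch of one common stochastic ternary quantizer (in the style of TernGrad). It is a generic illustration and not necessarily the mechanism used by VerFedGNN; the function name `ternary_quantize` and the example gradient are assumptions.

```python
import random
from typing import List


def ternary_quantize(grad: List[float], seed: int = 0) -> List[float]:
    """Map each gradient entry to one of {-s, 0, +s}, where s = max |g_i|.

    Entry g_i is kept (as sign(g_i) * s) with probability |g_i| / s and set
    to 0 otherwise, so the quantized vector equals `grad` in expectation.
    """
    rng = random.Random(seed)
    scale = max((abs(g) for g in grad), default=0.0)
    if scale == 0.0:
        return [0.0 for _ in grad]  # all-zero gradient: nothing to quantize

    quantized = []
    for g in grad:
        keep = rng.random() < abs(g) / scale
        sign = 1.0 if g > 0 else -1.0
        quantized.append(sign * scale if keep else 0.0)
    return quantized


# Hypothetical gradient vector.
print(ternary_quantize([0.5, -1.0, 0.0, 0.25]))
```

Only the scale and one of three symbols per entry need to be transmitted, and the randomness hides the exact magnitude of each individual entry.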

    ToR K-Anonymity against deep learning watermarking attacks

    It is known that totalitarian regimes often perform surveillance and censorship of their communication networks. The Tor anonymity network allows users to browse the Internet anonymously to circumvent censorship filters and possible prosecution. This has made Tor an enticing target for state-level actors and cooperative state-level adversaries with privileged access to network traffic captured at the level of Autonomous Systems (ASs) or Internet Exchange Points (IXPs). This thesis studied the attack typologies involved, with a particular focus on traffic correlation techniques for de-anonymization of Tor endpoints. Our goal was to design a test-bench environment and tool, based on recently researched deep learning techniques for traffic analysis, to evaluate the effectiveness of countermeasures provided by recent approaches that try to strengthen Tor's anonymity protection. The targeted solution is based on K-anonymity input covert channels organized as a pre-staged multipath network. The research challenge was to design a test-bench environment and tool to launch active correlation attacks leveraging traffic flow correlation through the detection of induced watermarks in Tor traffic. To de-anonymize Tor connection endpoints, our tool analyses intrinsic time patterns of Tor synthetic egress traffic to detect flows with previously injected time-based watermarks. With the obtained results and conclusions, we contributed to the evaluation of the security guarantees that the targeted K-anonymity solution provides as a countermeasure against de-anonymization attacks.
    It has been widely observed that, in several countries governed by totalitarian regimes, the communication media in use are monitored and consequently censored. Tor allows its users to browse the Internet with guarantees of privacy and anonymity, so as to avoid blocking, censorship, and legal proceedings imposed by the governing entity. These properties have made the Tor network a target for attack by several governments and by joint actions of multiple entities with privileged access to large portions of the network and to several of its access points. This thesis studies the typologies of attacks that break the anonymity of the Tor network, with a special focus on traffic correlation techniques. Our goal is to build a test environment and tool, based on recent deep learning and watermark injection techniques, to evaluate the effectiveness of recently investigated countermeasures that try to strengthen the anonymity of the Tor network. The countermeasure we intend to evaluate is based on the creation of covert multi-circuits, using input TLS tunnels, so as to couple the traffic of an anonymous group of K users. The solution to be developed must launch a traffic correlation attack using active watermark induction techniques. The tool must be able to correlate synthetic egress traffic from Tor circuits by injecting watermarks at the entry so that they can be detected at a second observation point. Applied to a real scenario, the purpose of the tool falls within breaking the anonymity of hidden services provided by the Tor network, as well as that of their users. The expected results will contribute to the evaluation of the aforementioned K-user anonymity solution, which is seen as a countermeasure against de-anonymization attacks.

    GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training

    Graph representation learning has emerged as a powerful technique for addressing real-world problems. Various downstream graph learning tasks have benefited from its recent developments, such as node classification, similarity search, and graph classification. However, prior work on graph representation learning focuses on domain-specific problems and trains a dedicated model for each graph dataset, which is usually non-transferable to out-of-domain data. Inspired by the recent advances in pre-training from natural language processing and computer vision, we design Graph Contrastive Coding (GCC) -- a self-supervised graph neural network pre-training framework -- to capture the universal network topological properties across multiple networks. We design GCC's pre-training task as subgraph instance discrimination in and across networks and leverage contrastive learning to empower graph neural networks to learn the intrinsic and transferable structural representations. We conduct extensive experiments on three graph learning tasks and ten graph datasets. The results show that GCC pre-trained on a collection of diverse datasets can achieve performance competitive with or better than its task-specific and trained-from-scratch counterparts. This suggests that the pre-training and fine-tuning paradigm presents great potential for graph representation learning. Comment: 11 pages, 5 figures, to appear in KDD 2020 proceedings
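
Instance discrimination with contrastive learning is commonly trained with an InfoNCE-style loss, which rewards a query embedding for being closer to its positive than to a set of negatives. The sketch below shows that loss on plain Python lists. It assumes the embeddings have already been produced by some encoder; the function name `info_nce`, the temperature default, and the toy vectors are illustrative assumptions and are not taken from the abstract.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """InfoNCE loss: negative log-softmax score of the positive key."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum - logits[0]


# Hypothetical 2-D embeddings of a query subgraph and candidate subgraphs.
query = [1.0, 0.0]
print(info_nce(query, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]))  # small loss
print(info_nce(query, [-1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]]))  # large loss
```

The loss is close to zero when the positive is the best match for the query and grows when some negative matches better, which is what pushes an encoder to give similar subgraphs similar embeddings.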