Privacy preserving analysis of graph structured data

Abstract

Graph structured data is ubiquitous in the real world: social networks, communication networks, and logistics networks can all be modeled as graphs. Many concepts and theories have been proposed to deepen the understanding of graph data and to solve practical problems represented by graphs. However, little of this work takes privacy concerns into account. The objective of this dissertation is to investigate the problem of preserving the privacy of graph structured data while still enabling useful analysis. To this end, we have addressed the following research issues.

First, we have investigated the Privacy Preserving Link Discovery problem. Link discovery is the process of identifying associations among different entities in a complex network structure. We show that the problem of privacy preserving link discovery can be reduced to securely computing the transitive closure of a distributed graph. We have proposed protocols for secure transitive closure computation and, to improve computational efficiency, two more efficient alternatives.

While link discovery is quite useful, applications such as financial fraud detection or terrorist detection may need to determine whether certain entities are related by transactions having certain properties. To this end, we have investigated more complex problems, such as computing the maximum flow between entities across transactions. We formulate the privacy preserving maximum-flow problem in distributed graphs and propose a novel edge expansion technique for graph transformation. We show that the proposed technique ensures the required privacy while guaranteeing the correctness of the maximum-flow computation. Since the graphs are distributed among the parties, we present a secure integration procedure that protects the structure of each private graph involved. In addition, the procedure avoids revealing which party each edge or node in the final integrated graph originates from.

One important problem with centralized graphs is how to anonymize them effectively. This is especially important in the domain of social networks, where subgraph structure could be used to breach individual privacy. We have proposed structure-aware anonymization techniques that maximally preserve the structure of the original graph as well as its structural properties. Moreover, since grouping and matching local structures is the most important step in the proposed anonymization, we further explore and propose alternative grouping and matching techniques.

Because the nodes are interconnected, it remains a challenge to incorporate the goal of preserving graph properties directly into the anonymization process without breaching privacy. Still, it is desirable to derive the original graph properties by making use of known facts about the randomization process. To address this challenge, we have explored randomization-based perturbation techniques to protect graph privacy and proposed iterative procedures to derive important graph properties such as node reachability and degree distribution.

Ph. D.
Includes bibliographical references
Includes vita
by Xiaoyun H
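
To make the reduction from link discovery to transitive closure concrete, the following minimal sketch computes a plain, non-secure transitive closure with Warshall's algorithm. It is not one of the secure protocols proposed in the dissertation: the edges are pooled in the clear, which is exactly what a secure protocol would avoid. The party names, node names, and the function `transitive_closure` are hypothetical and chosen only for illustration.

```python
def transitive_closure(nodes, edges):
    """Warshall's algorithm on a directed graph.

    Returns a nested dict where reach[u][v] is True iff there is a
    path of one or more edges from u to v.
    """
    reach = {u: {v: False for v in nodes} for u in nodes}
    for u, v in edges:
        reach[u][v] = True
    # Allow each node k in turn to act as an intermediate hop.
    for k in nodes:
        for i in nodes:
            if reach[i][k]:
                for j in nodes:
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


# Hypothetical setting: two parties each hold part of the edge set.
nodes = ["alice", "bob", "carol", "dave"]
party_a_edges = [("alice", "bob")]
party_b_edges = [("bob", "carol")]

# ASSUMPTION: edges are merged in plaintext here purely for illustration.
closure = transitive_closure(nodes, party_a_edges + party_b_edges)

# A link between two entities exists iff the closure entry is True.
alice_linked_to_carol = closure["alice"]["carol"]
alice_linked_to_dave = closure["alice"]["dave"]
```

In this toy example neither party alone can see that alice is linked to carol; the link only appears once both edge sets contribute to the closure, which is why the closure has to be computed jointly.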
