Understanding the importance of side information in graph matching problem

Abstract

Graph matching algorithms rely on the availability of seed vertex pairs as side information to deanonymize users across networks. Although such algorithms work well in practice, there are other types of side information available which are potentially useful to an attacker. In this thesis, we consider the problem of matching two correlated graphs when an attacker has access to side information either in the form of community labels or an imperfect initial matching. First, we propose a naive graph matching algorithm by introducing the community degree vectors which harness the information from community labels in an e cient manner. Next, we analyze the basic percolation algorithm for graphs with community structure. Finally, we propose a novel percolation algorithm with two thresholds which uses an imperfect matching as input to match correlated graphs. We also analyze these algorithms and provide theoretical guarantees for matching graphs generated using the Stochastic Block Model. We evaluate the proposed algorithms on synthetic as well as real world datasets using various experiments. The experimental results demonstrate the importance of communities as side information especially when the number of seeds is small and the networks are weakly correlated. These results motivate the study of other types of potential side information available to the attacker. Such studies could assist in devising mechanisms to counter the effects of side information in network deanonymization

    Similar works