Growing a Graph Matching from a Handful of Seeds
In many graph-mining problems, two networks from different domains have to be matched. In the absence of reliable node attributes, graph matching has to rely on only the link structures of the two networks, which amounts to a generalization of the classic graph isomorphism problem. Graph matching has applications in social-network reconciliation and de-anonymization, protein-network alignment in biology, and computer vision. The most scalable graph-matching approaches use ideas from percolation theory, where a matched node pair “infects” neighbouring pairs as additional potential matches. This class of matching algorithm requires an initial seed set of known matches to start the percolation. The size and correctness of the matching are very sensitive to the size of the seed set. In this paper, we give a new graph-matching algorithm that can operate with a much smaller seed set than previous approaches, with only a small increase in matching errors. We characterize a phase transition in matching performance as a function of the seed set size, using a random bigraph model and ideas from bootstrap percolation theory. We also show the excellent performance in matching several real large-scale social networks, using only a handful of seeds.
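The percolation idea described in this abstract can be illustrated with a minimal sketch. This is the basic threshold-based percolation scheme, not the new algorithm the paper proposes. The function name `percolation_match`, the adjacency-dict representation, and the threshold parameter `r` are assumptions made for illustration.

```python
from collections import defaultdict


def percolation_match(adj1, adj2, seeds, r=2):
    """Basic percolation matching sketch (illustrative, not the paper's algorithm).

    adj1, adj2: dict mapping each node to the set of its neighbours.
    seeds: list of (u, v) pairs known to match, u in graph 1, v in graph 2.
    r: number of marks a candidate pair needs before it is matched.
    Returns a dict mapping matched nodes of graph 1 to nodes of graph 2.
    """
    marks = defaultdict(int)   # marks collected by each candidate pair
    matched1 = {}              # graph-1 node -> graph-2 node
    matched2 = set()           # graph-2 nodes already used
    stack = []
    for u, v in seeds:
        matched1[u] = v
        matched2.add(v)
        stack.append((u, v))

    while stack:
        u, v = stack.pop()
        # The matched pair (u, v) "infects" every pair of unmatched neighbours.
        for a in adj1[u]:
            if a in matched1:
                continue
            for b in adj2[v]:
                if b in matched2:
                    continue
                marks[(a, b)] += 1
                if marks[(a, b)] >= r:
                    matched1[a] = b
                    matched2.add(b)
                    stack.append((a, b))
                    break  # a is now matched; move on to the next neighbour
    return matched1


# Toy example: the same small graph on both sides.
edges = [(0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
adj = defaultdict(set)
for x, y in edges:
    adj[x].add(y)
    adj[y].add(x)

full = percolation_match(adj, adj, seeds=[(0, 0), (1, 1)], r=2)
stuck = percolation_match(adj, adj, seeds=[(0, 0)], r=2)
```

In this toy case two seeds are enough for the matching to spread to all five nodes, while a single seed never pushes any candidate pair to the threshold, which mirrors the sensitivity to seed set size noted in the abstract.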
Seeded Graph Matching: Efficient Algorithms and Theoretical Guarantees
In this paper, a new information theoretic framework for graph matching is
introduced. Using this framework, the graph isomorphism and seeded graph
matching problems are studied. The maximum degree algorithm for graph
isomorphism is analyzed and sufficient conditions for successful matching are
rederived using type analysis. Furthermore, a new seeded matching algorithm
with polynomial time complexity is introduced. The algorithm uses 'typicality
matching' and techniques from point-to-point communications for reliable
matching. Assuming an Erdős-Rényi model on the correlated graph pair, it is
shown that successful matching is guaranteed when the number of seeds grows
logarithmically with the number of vertices in the graphs. The logarithmic
coefficient is shown to be inversely proportional to the mutual information
between the edge variables in the two graphs.
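The scaling stated in this abstract (seed count growing like log n, with a coefficient inversely proportional to the mutual information between edge variables) can be made concrete with a small sketch. The abstract does not give the proportionality constant, so the helper below reports only the ratio log(n) / I(X;Y). The function names and the 2x2 joint-distribution input are assumptions made for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in nats for two binary edge indicators.

    joint[x][y] = P(X = x, Y = y), where X and Y indicate whether a given
    vertex pair is connected in the first and second graph respectively.
    """
    px = [sum(row) for row in joint]           # marginal of X
    py = [sum(col) for col in zip(*joint)]     # marginal of Y
    mi = 0.0
    for x in range(2):
        for y in range(2):
            p = joint[x][y]
            if p > 0:
                mi += p * math.log(p / (px[x] * py[y]))
    return mi


def seed_count_scale(n, joint):
    """log(n) / I(X;Y): the scaling only, up to an unspecified constant."""
    mi = mutual_information(joint)
    if mi <= 0:
        return math.inf  # independent graphs carry no matching information
    return math.log(n) / mi


# Hypothetical joint distributions for the edge indicators.
identical = [[0.5, 0.0], [0.0, 0.5]]        # graphs agree on every pair
noisy = [[0.45, 0.05], [0.05, 0.45]]        # graphs disagree 10% of the time
independent = [[0.25, 0.25], [0.25, 0.25]]  # graphs are unrelated
```

Weaker correlation between the two graphs lowers the mutual information and so raises the seed requirement, with the requirement becoming unbounded when the edge variables are independent.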
An Automated Social Graph De-anonymization Technique
We present a generic and automated approach to re-identifying nodes in
anonymized social networks which enables novel anonymization techniques to be
quickly evaluated. It uses machine learning (decision forests) to match
pairs of nodes in disparate anonymized sub-graphs. The technique uncovers
artefacts and invariants of any black-box anonymization scheme from a small set
of examples. Despite a high degree of automation, classification succeeds with
significant true positive rates even when small false positive rates are
sought. Our evaluation uses publicly available real world datasets to study the
performance of our approach against real-world anonymization strategies, namely
the schemes used to protect datasets of The Data for Development (D4D)
Challenge. We show that the technique is effective even when only small numbers
of samples are used for training. Further, since it detects weaknesses in the
black-box anonymization scheme it can re-identify nodes in one social network
when trained on another.