4 research outputs found

    Significance of Side Information in the Graph Matching Problem

    Full text link
    Percolation based graph matching algorithms rely on the availability of seed vertex pairs as side information to efficiently match users across networks. Although such algorithms work well in practice, there are other types of side information available which are potentially useful to an attacker. In this paper, we consider the problem of matching two correlated graphs when an attacker has access to side information, either in the form of community labels or an imperfect initial matching. In the former case, we propose a naive graph matching algorithm by introducing the community degree vectors which harness the information from community labels in an efficient manner. Furthermore, we analyze a variant of the basic percolation algorithm proposed in literature for graphs with community structure. In the latter case, we propose a novel percolation algorithm with two thresholds which uses an imperfect matching as input to match correlated graphs. We evaluate the proposed algorithms on synthetic as well as real world datasets using various experiments. The experimental results demonstrate the importance of communities as side information especially when the number of seeds is small and the networks are weakly correlated

    Privacy of Dependent Users Against Statistical Matching

    Full text link
    Modern applications significantly enhance user experience by adapting to each user's individual condition and/or preferences. While this adaptation can greatly improve a user's experience or be essential for the application to work, the exposure of user data to the application presents a significant privacy threat to the users\textemdash even when the traces are anonymized\textemdash since the statistical matching of an anonymized trace to prior user behavior can identify a user and their habits. Because of the current and growing algorithmic and computational capabilities of adversaries, provable privacy guarantees as a function of the degree of anonymization and obfuscation of the traces are necessary. Our previous work has established the requirements on anonymization and obfuscation in the case that data traces are independent between users. However, the data traces of different users will be dependent in many applications, and an adversary can potentially exploit such. In this paper, we consider the impact of dependency between user traces on their privacy. First, we demonstrate that the adversary can readily identify the association graph of the obfuscated and anonymized version of the data, revealing which user data traces are dependent. Next, we demonstrate that the adversary can use this association graph to break user privacy with significantly shorter traces than in the case of independent users, and that obfuscating data traces independently across users is often insufficient to remedy such leakage. Finally, we discuss how users can improve privacy by employing joint obfuscation that removes or reduces the data dependency.Comment: Submitted to IEEE Transaction on Information Theor

    On Graph Matching Using Generalized Seed Side-Information

    Full text link
    In this paper, matching pairs of stocahstically generated graphs in the presence of generalized seed side-information is considered. The graph matching problem emerges naturally in various applications such as social network de-anonymization, image processing, DNA sequencing, and natural language processing. A pair of randomly generated labeled Erdos-Renyi graphs with pairwise correlated edges are considered. It is assumed that the matching strategy has access to the labeling of the vertices in the first graph, as well as a collection of shortlists -- called ambiguity sets -- of possible labels for the vertices of the second graph. The objective is to leverage the correlation among the edges of the graphs along with the side-information provided in the form of ambiguity sets to recover the labels of the vertices in the second graph. This scenario can be viewed as a generalization of the seeded graph matching problem, where the ambiguity sets take a specific form such that the exact labels for a subset of vertices in the second graph are known prior to matching. A matching strategy is proposed which operates by evaluating the joint typicality of the adjacency matrices of the graphs. Sufficient conditions on the edge statistics as well as ambiguity set statistics are derived under which the proposed matching strategy successfully recovers the labels of the vertices in the second graph. Additionally, Fano-type arguments are used to derive general necessary conditions for successful matching.Comment: arXiv admin note: text overlap with arXiv:2009.0046

    A Concentration of Measure Approach to Correlated Graph Matching

    Full text link
    The graph matching problem emerges naturally in various applications such as web privacy, image processing and computational biology. In this paper, graph matching is considered under a stochastic model, where a pair of randomly generated graphs with pairwise correlated edges are to be matched such that given the labeling of the vertices in the first graph, the labels in the second graph are recovered by leveraging the correlation among their edges. The problem is considered under various settings and graph models. In the first step, the Correlated Erd\"{o}s-R\'enyi (CER) graph model is studied, where all edge pairs whose vertices have similar labels are generated based on identical distributions and independently of other edges. A matching scheme called the \textit{typicality matching scheme} is introduced. The scheme operates by investigating the joint typicality of the adjacency matrices of the two graphs. New results on the typicality of permutations of sequences lead to necessary and sufficient conditions for successful matching based on the parameters of the CER model. In the next step, the results are extended to graphs with community structure generated based on the Stochastic Block Model (SBM). The SBM model is a generalization of the CER model where each vertex in the graph is associated with a community label, which affects its edge statistics. The results are further extended to matching of ensembles of more than two correlated graphs. Lastly, the problem of seeded graph matching is investigated where a subset of the labels in the second graph are known prior to matching. In this scenario, in addition to obtaining necessary and sufficient conditions for successful matching, a polytime matching algorithm is proposed.Comment: arXiv admin note: text overlap with arXiv:2001.06962, arXiv:1810.1334
    corecore