Significance of Side Information in the Graph Matching Problem
Percolation-based graph matching algorithms rely on the availability of seed
vertex pairs as side information to efficiently match users across networks.
Although such algorithms work well in practice, there are other types of side
information available which are potentially useful to an attacker. In this
paper, we consider the problem of matching two correlated graphs when an
attacker has access to side information, either in the form of community labels
or an imperfect initial matching. In the former case, we propose a naive graph
matching algorithm by introducing the community degree vectors which harness
the information from community labels in an efficient manner. Furthermore, we
analyze a variant of the basic percolation algorithm proposed in the literature for
graphs with community structure. In the latter case, we propose a novel
percolation algorithm with two thresholds which uses an imperfect matching as
input to match correlated graphs.
We evaluate the proposed algorithms on synthetic as well as real-world
datasets using various experiments. The experimental results demonstrate the
importance of communities as side information, especially when the number of
seeds is small and the networks are weakly correlated.
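As an illustration of the community degree vector idea mentioned in the abstract above, the following is a minimal sketch, not the paper's algorithm. It assumes that a vertex's community degree vector counts its neighbours in each community, and it uses a simple greedy nearest-vector rule restricted to vertices with the same community label. The function names, the greedy rule, and the squared-distance criterion are all assumptions made for this example.

```python
from collections import Counter


def community_degree_vectors(adj, community, num_communities):
    """Return, for each vertex, the number of its neighbours in each community.

    adj: dict mapping vertex -> set of neighbouring vertices
    community: dict mapping vertex -> community label in range(num_communities)
    """
    vectors = {}
    for v, neighbours in adj.items():
        counts = Counter(community[u] for u in neighbours)
        vectors[v] = tuple(counts.get(c, 0) for c in range(num_communities))
    return vectors


def match_by_community_degree(adj1, comm1, adj2, comm2, num_communities):
    """Greedy matching (an assumed rule, for illustration only).

    Each vertex of graph 1 is paired with the unused vertex of graph 2 that
    has the same community label and the closest community degree vector.
    """
    vec1 = community_degree_vectors(adj1, comm1, num_communities)
    vec2 = community_degree_vectors(adj2, comm2, num_communities)
    matching, used = {}, set()
    for v in sorted(vec1):
        candidates = [w for w in vec2
                      if w not in used and comm2[w] == comm1[v]]
        if not candidates:
            continue  # no vertex with this community label is left
        best = min(
            candidates,
            key=lambda w: (sum((a - b) ** 2
                               for a, b in zip(vec1[v], vec2[w])), w),
        )
        matching[v] = best
        used.add(best)
    return matching
```

On identical graphs whose community degree vectors are distinct within each community, this rule recovers the hidden relabeling. With weakly correlated graphs the vectors only agree approximately, so errors are to be expected.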
Privacy of Dependent Users Against Statistical Matching
Modern applications significantly enhance user experience by adapting to each
user's individual condition and/or preferences. While this adaptation can
greatly improve a user's experience or be essential for the application to
work, the exposure of user data to the application presents a significant
privacy threat to the users, even when the traces are
anonymized, since the statistical matching of an anonymized trace to
prior user behavior can identify a user and their habits. Because of the
current and growing algorithmic and computational capabilities of adversaries,
provable privacy guarantees as a function of the degree of anonymization and
obfuscation of the traces are necessary. Our previous work has established the
requirements on anonymization and obfuscation in the case that data traces are
independent between users. However, the data traces of different users will be
dependent in many applications, and an adversary can potentially exploit such dependency.
In this paper, we consider the impact of dependency between user traces on
their privacy. First, we demonstrate that the adversary can readily identify
the association graph of the obfuscated and anonymized version of the data,
revealing which user data traces are dependent. Next, we demonstrate that the
adversary can use this association graph to break user privacy with
significantly shorter traces than in the case of independent users, and that
obfuscating data traces independently across users is often insufficient to
remedy such leakage. Finally, we discuss how users can improve privacy by
employing joint obfuscation that removes or reduces the data dependency.
Comment: Submitted to IEEE Transactions on Information Theory
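The statistical matching attack described in the abstract above can be illustrated with a toy sketch. It makes strong simplifying assumptions that are not taken from the paper: traces are binary and independent across users, each user's prior behavior is summarized by a single known probability, and there is no obfuscation. The adversary computes the empirical frequency of each anonymized trace and pairs traces with users in sorted order, which minimizes the total squared difference in this one-dimensional setting. All names are hypothetical.

```python
import random


def anonymize(traces, rng):
    """Shuffle the traces; perm[i] is the true user behind anonymized trace i."""
    perm = list(range(len(traces)))
    rng.shuffle(perm)
    return [traces[u] for u in perm], perm


def statistical_match(prior_probs, anon_traces):
    """Guess the user behind each anonymized binary trace.

    prior_probs[u] is the adversary's prior estimate of P(sample = 1) for
    user u. Traces and users are both sorted by frequency and paired in order.
    """
    empirical = [sum(trace) / len(trace) for trace in anon_traces]
    users_by_prior = sorted(range(len(prior_probs)),
                            key=lambda u: prior_probs[u])
    traces_by_freq = sorted(range(len(empirical)),
                            key=lambda i: empirical[i])
    guess = [None] * len(anon_traces)
    for user, trace_index in zip(users_by_prior, traces_by_freq):
        guess[trace_index] = user
    return guess
```

Comparing the returned guess with the hidden permutation from `anonymize` gives the fraction of users de-anonymized. This sketch does not model dependency between users or the association graph discussed in the abstract.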
On Graph Matching Using Generalized Seed Side-Information
In this paper, matching pairs of stochastically generated graphs in the
presence of generalized seed side-information is considered. The graph matching
problem emerges naturally in various applications such as social network
de-anonymization, image processing, DNA sequencing, and natural language
processing. A pair of randomly generated labeled Erdős-Rényi graphs with
pairwise correlated edges is considered. It is assumed that the matching
strategy has access to the labeling of the vertices in the first graph, as well
as a collection of shortlists -- called ambiguity sets -- of possible labels
for the vertices of the second graph. The objective is to leverage the
correlation among the edges of the graphs along with the side-information
provided in the form of ambiguity sets to recover the labels of the vertices in
the second graph. This scenario can be viewed as a generalization of the seeded
graph matching problem, where the ambiguity sets take a specific form such that
the exact labels for a subset of vertices in the second graph are known prior
to matching. A matching strategy is proposed which operates by evaluating the
joint typicality of the adjacency matrices of the graphs. Sufficient conditions
on the edge statistics as well as ambiguity set statistics are derived under
which the proposed matching strategy successfully recovers the labels of the
vertices in the second graph. Additionally, Fano-type arguments are used to
derive general necessary conditions for successful matching.
Comment: arXiv admin note: text overlap with arXiv:2009.0046
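The remark in the abstract above that seeded graph matching is a special case of ambiguity-set side information can be made concrete with a small sketch. The representation is an assumption for illustration: a seeded vertex gets a singleton ambiguity set containing its known label, and every other vertex gets the full label set, meaning no side information. The function name and data layout are hypothetical.

```python
def seeds_as_ambiguity_sets(vertices, labels, seeds):
    """Express seed side information as ambiguity sets.

    vertices: vertices of the second graph
    labels: all possible labels
    seeds: dict mapping a seeded vertex -> its known label
    """
    all_labels = set(labels)
    return {
        # A seed pins the label exactly; otherwise every label stays possible.
        v: {seeds[v]} if v in seeds else set(all_labels)
        for v in vertices
    }
```

Since a labeling is a bijection, the sets of unseeded vertices could also be pruned by removing labels already taken by seeds. That refinement is omitted here.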
A Concentration of Measure Approach to Correlated Graph Matching
The graph matching problem emerges naturally in various applications such as
web privacy, image processing and computational biology. In this paper, graph
matching is considered under a stochastic model, where a pair of randomly
generated graphs with pairwise correlated edges is to be matched such that
given the labeling of the vertices in the first graph, the labels in the second
graph are recovered by leveraging the correlation among their edges. The
problem is considered under various settings and graph models. In the first
step, the Correlated Erdős-Rényi (CER) graph model is studied, where all
edge pairs whose vertices have similar labels are generated based on identical
distributions and independently of other edges. A matching scheme called the
typicality matching scheme is introduced. The scheme operates by
investigating the joint typicality of the adjacency matrices of the two graphs.
New results on the typicality of permutations of sequences lead to necessary
and sufficient conditions for successful matching based on the parameters of
the CER model. In the next step, the results are extended to graphs with
community structure generated based on the Stochastic Block Model (SBM). The
SBM is a generalization of the CER model where each vertex in the graph
is associated with a community label, which affects its edge statistics. The
results are further extended to matching of ensembles of more than two
correlated graphs. Lastly, the problem of seeded graph matching is investigated
where a subset of the labels in the second graph are known prior to matching.
In this scenario, in addition to obtaining necessary and sufficient conditions
for successful matching, a polynomial-time matching algorithm is proposed.
Comment: arXiv admin note: text overlap with arXiv:2001.06962, arXiv:1810.1334
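To make the correlated Erdős-Rényi setting in the abstract above concrete, the following sketch samples a pair of graphs whose edge pairs are drawn independently from one joint distribution, then hides the labels of one graph with a random permutation. The function names and the dictionary representation of the joint distribution are assumptions for illustration. The typicality matching scheme itself is not implemented here.

```python
import random


def sample_cer_pair(n, joint, rng):
    """Sample two correlated graphs on n vertices as adjacency matrices.

    joint: dict mapping (e1, e2) in {0, 1}^2 -> probability, giving the joint
    law of the edge indicators for the same vertex pair in the two graphs.
    Edge pairs for different vertex pairs are drawn independently.
    """
    outcomes = list(joint)
    weights = [joint[o] for o in outcomes]
    g1 = [[0] * n for _ in range(n)]
    g2 = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            e1, e2 = rng.choices(outcomes, weights=weights)[0]
            g1[i][j] = g1[j][i] = e1
            g2[i][j] = g2[j][i] = e2
    return g1, g2


def relabel(adj, rng):
    """Apply a hidden random permutation: vertex i becomes perm[i]."""
    n = len(adj)
    perm = list(range(n))
    rng.shuffle(perm)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[perm[i]][perm[j]] = adj[i][j]
    return out, perm
```

A matching strategy would receive `g1` and the relabeled copy of `g2`, and would try to recover `perm` from the edge correlation alone.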