Detecting correlated Gaussian databases
CCF-1955981 - National Science Foundation
https://arxiv.org/abs/2206.12011
First author draft
The Umeyama algorithm for matching correlated Gaussian geometric models in the low-dimensional regime
Motivated by the problem of matching two correlated random geometric graphs,
we study the problem of matching two Gaussian geometric models correlated
through a latent node permutation. Specifically, given an unknown permutation
$\pi^*$ on $\{1, \ldots, n\}$ and given $n$ i.i.d. pairs of correlated Gaussian
vectors $\{X_{\pi^*(i)}, Y_i\}$ in $\mathbb{R}^d$ with noise parameter $\sigma$,
we consider two types of (correlated) weighted complete graphs with edge
weights given by $A_{i,j} = \langle X_i, X_j \rangle$ and $B_{i,j} = \langle Y_i, Y_j \rangle$.
The goal is to recover the hidden vertex correspondence $\pi^*$ based
on the observed matrices $A$ and $B$. For the low-dimensional regime where
$d = O(\log n)$, Wang, Wu, Xu, and Yolou [WWXY22+] established the information
thresholds for exact and almost exact recovery in matching correlated Gaussian
geometric models. They also conducted numerical experiments for the classical
Umeyama algorithm. In our work, we prove that this algorithm achieves exact
recovery of the hidden permutation $\pi^*$ when the noise parameter
$\sigma = o(d^{-3} n^{-2/d})$, and almost exact recovery when
$\sigma = o(d^{-3} n^{-1/d})$. Our results approach the
information thresholds up to a $\operatorname{poly}(d)$ factor in the
low-dimensional regime.
Comment: 31 pages
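The abstract does not spell out the Umeyama step, so the following is only a rough Python sketch of the classical (textbook) variant, assuming numpy and scipy are available. It assumes the model $X_i \sim N(0, I_d)$ and $Y_i = X_{\pi^*(i)} + \sigma Z_i$, keeps only the top $d$ eigenvectors of $A$ and $B$ (both have rank at most $d$ under this model), removes the sign ambiguity of eigenvectors by taking absolute values, and then solves a maximum-weight linear assignment problem. The helper names `sample_gaussian_geometric_models` and `umeyama_match` are hypothetical, and this is not necessarily the exact variant analyzed in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def sample_gaussian_geometric_models(n, d, sigma, rng):
    """Assumed model: X_i ~ N(0, I_d) and Y_i = X_{pi(i)} + sigma * Z_i."""
    x = rng.standard_normal((n, d))
    pi = rng.permutation(n)  # hidden permutation: Y[i] is a noisy copy of X[pi[i]]
    y = x[pi] + sigma * rng.standard_normal((n, d))
    a = x @ x.T  # A_{ij} = <X_i, X_j>
    b = y @ y.T  # B_{ij} = <Y_i, Y_j>
    return a, b, pi


def umeyama_match(a, b, d):
    """Classical Umeyama-style matching restricted to the top-d eigenvectors."""
    # eigh sorts eigenvalues in ascending order, so the last d columns span the
    # leading eigenspace; A and B have rank at most d in this model.
    _, u = np.linalg.eigh(a)
    _, v = np.linalg.eigh(b)
    u_top = np.abs(u[:, -d:])  # absolute values remove the per-column sign ambiguity
    v_top = np.abs(v[:, -d:])
    # similarity[i, k] = <|V_i|, |U_k|>: how well node i of B fits node k of A
    similarity = v_top @ u_top.T
    _, cols = linear_sum_assignment(similarity, maximize=True)
    return cols  # cols[i] is the estimate of pi(i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b, pi = sample_gaussian_geometric_models(n=30, d=3, sigma=1e-6, rng=rng)
    est = umeyama_match(a, b, d=3)
    print("fraction of nodes matched correctly:", np.mean(est == pi))
```

Taking absolute values is the classical way to sidestep eigenvector signs, but it can confuse points that differ only by a reflection along a principal axis, so this sketch is best read as a baseline for experiments rather than as the analyzed procedure.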
Joint Correlation Detection and Alignment of Gaussian Databases
In this work, we propose an efficient two-stage algorithm solving a joint
problem of correlation detection and permutation recovery between two Gaussian
databases. Correlation detection is a hypothesis testing problem; under the
null hypothesis, the databases are independent, and under the alternative
hypothesis, they are correlated under an unknown row permutation. We develop
relatively tight bounds on the type-I and type-II error probabilities, and show
that the analyzed detector performs better than a recently proposed detector,
at least for some specific parameter choices. To bound the type-I error
probability of the proposed detector, which relies on a statistic that is a
sum of dependent indicator random variables, we develop a novel
graph-theoretic technique for bounding the higher-order moments of such
statistics. When the databases are accepted as correlated, the algorithm also
outputs an estimate of the underlying row permutation. By comparison with known
converse results for this problem, we prove that the alignment error
probability converges to zero under the asymptotically lowest possible
correlation coefficient.
Comment: 41 pages, 7 figures
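To make the two-stage structure concrete, here is a toy Python sketch (numpy and scipy assumed). The counting statistic below, the number of row pairs whose empirical correlation exceeds a threshold `tau`, is a hypothetical stand-in chosen only because it is a sum of dependent indicator random variables (indicators sharing a row are dependent); it is not claimed to be the detector analyzed in the paper. If the count reaches `count_threshold`, the databases are declared correlated and the row permutation is estimated by a maximum-weight assignment on the same scores. The alternative model $Y_{\pi(i)} = \rho X_i + \sqrt{1-\rho^2}\, Z_i$ used in the demo is also an assumption, as are all function names.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def indicator_count_statistic(x, y, tau):
    """Hypothetical statistic T = sum_{i,j} 1{<x_i, y_j> / d > tau}.

    Indicators that share a row of x or y are dependent, so T is a sum of
    dependent indicator random variables.
    """
    d = x.shape[1]
    scores = (x @ y.T) / d  # empirical correlation of every (row of x, row of y) pair
    return int(np.count_nonzero(scores > tau)), scores


def detect_then_align(x, y, tau, count_threshold):
    """Stage 1: test H0 (independent) vs H1 (correlated); stage 2: estimate the permutation."""
    t, scores = indicator_count_statistic(x, y, tau)
    if t < count_threshold:
        return False, None  # accept H0: no alignment is produced
    _, cols = linear_sum_assignment(scores, maximize=True)
    return True, cols  # cols[i] is the row of y matched to row i of x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, rho = 50, 200, 0.9
    x = rng.standard_normal((n, d))
    pi = rng.permutation(n)
    y = np.empty_like(x)
    # assumed alternative: row pi[i] of y is a rho-correlated copy of row i of x
    y[pi] = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal((n, d))
    print("correlated pair accepted:", detect_then_align(x, y, 0.5, n // 2)[0])
    print("independent pair accepted:", detect_then_align(x, rng.standard_normal((n, d)), 0.5, n // 2)[0])
```

Reusing the detection scores for the alignment stage keeps the sketch short; a real two-stage procedure may well use a different estimator in the second stage.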
Database Matching Under Noisy Synchronization Errors
The re-identification or de-anonymization of users from anonymized data
through matching with publicly available correlated user data has raised
privacy concerns, leading to the complementary measure of obfuscation in
addition to anonymization. Recent research provides a fundamental understanding
of the conditions under which privacy attacks, in the form of database
matching, are successful in the presence of obfuscation. Motivated by
synchronization errors stemming from the sampling of time-indexed databases,
this paper presents a unified framework considering both obfuscation and
synchronization errors and investigates the matching of databases under noisy
entry repetitions. By investigating different structures for the repetition
pattern, we devise replica detection and seeded deletion detection algorithms
and derive necessary and sufficient conditions for successful matching.
Finally, we discuss how variations of the underlying assumptions, such as
the adversarial deletion model, seedless database matching, and the zero-rate
regime, affect the results. Overall, our results provide insights
into the privacy-preserving publication of anonymized and obfuscated
time-indexed data as well as the closely related problem of the capacity of
synchronization channels.
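The abstract does not describe the replica detection algorithm itself. As a loose illustration only, the Python sketch below (numpy assumed) takes a binary database whose columns are each repeated a random number of times (zero meaning deletion), with every replica passed through independent bit flips with probability $p$. Adjacent replicas of the same column then disagree in a fraction of roughly $2p(1-p)$ of the rows, while copies of distinct independent uniform columns disagree in roughly half of them, so thresholding the normalized Hamming distance between neighboring columns groups replicas into runs. The model, the function names and the threshold are all hypothetical. This recovers only the lengths of the nonzero runs; deleted columns leave no trace, which is why a separate mechanism such as the seeded deletion detection mentioned above is needed.

```python
import numpy as np


def repeat_columns_with_noise(db, repeats, flip_prob, rng):
    """Assumed model: column j appears repeats[j] times (0 = deleted);
    each replica flips every bit independently with probability flip_prob."""
    replicas = []
    for j, r in enumerate(repeats):
        for _ in range(r):
            flips = rng.random(db.shape[0]) < flip_prob
            replicas.append(db[:, j] ^ flips)
    return np.stack(replicas, axis=1)


def detect_replica_runs(observed, threshold):
    """Group adjacent columns whose normalized Hamming distance is below threshold.

    Returns the run lengths, i.e. the estimated nonzero repetition counts.
    """
    n_rows, n_cols = observed.shape
    if n_cols == 0:
        return []
    runs = [1]
    for k in range(1, n_cols):
        dist = np.count_nonzero(observed[:, k] != observed[:, k - 1]) / n_rows
        if dist < threshold:
            runs[-1] += 1  # same underlying column: extend the current run
        else:
            runs.append(1)  # a new underlying column starts here
    return runs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = rng.random((500, 6)) < 0.5  # uniform binary database: 500 users x 6 attributes
    observed = repeat_columns_with_noise(db, [1, 2, 0, 3, 1, 2], flip_prob=0.05, rng=rng)
    print(detect_replica_runs(observed, threshold=0.25))
```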
Database Alignment with Gaussian Features
We consider the problem of aligning a pair of databases with jointly Gaussian features. We consider two algorithms: complete database alignment via MAP estimation among all possible database alignments, and partial alignment via thresholding of log-likelihood ratios. We derive conditions on the mutual information between feature pairs, identifying the regimes where the algorithms are guaranteed to perform reliably and those where they cannot be expected to succeed.
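Both algorithms can be sketched in a few lines under an assumed model: every feature coordinate is standard Gaussian, and a matched pair of $d$-dimensional feature vectors $(a, b)$ has i.i.d. coordinate pairs with known correlation $\rho$. The log-likelihood ratio of "matched" against "independent" is then
$\mathrm{LLR}(a, b) = -\tfrac{d}{2}\log(1-\rho^2) - \dfrac{\rho^2\|a\|^2 - 2\rho\langle a, b\rangle + \rho^2\|b\|^2}{2(1-\rho^2)}$.
With a uniform prior over alignments, the MAP alignment maximizes $\sum_i \mathrm{LLR}(a_i, b_{\sigma(i)})$, which is a linear assignment problem, while the partial alignment keeps every pair whose LLR exceeds a threshold. The Python sketch below (numpy and scipy assumed) follows this reasoning; the function names and the threshold choice are hypothetical, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_llr(a, b, rho):
    """LLR[i, j] = log f_rho(a_i, b_j) - log f_0(a_i) f_0(b_j), summed over d coordinates.

    Assumes standard Gaussian coordinates with known correlation rho for matched pairs.
    """
    d = a.shape[1]
    sq_a = np.sum(a ** 2, axis=1)[:, None]  # ||a_i||^2 as a column
    sq_b = np.sum(b ** 2, axis=1)[None, :]  # ||b_j||^2 as a row
    cross = a @ b.T  # <a_i, b_j>
    quad = rho ** 2 * sq_a - 2 * rho * cross + rho ** 2 * sq_b
    return -0.5 * d * np.log(1 - rho ** 2) - quad / (2 * (1 - rho ** 2))


def map_alignment(llr):
    """Complete alignment: with a uniform prior, MAP = max-weight assignment on the LLRs."""
    _, cols = linear_sum_assignment(llr, maximize=True)
    return cols  # cols[i] is the row of b aligned with row i of a


def threshold_alignment(llr, threshold):
    """Partial alignment: report every pair (i, j) whose LLR exceeds the threshold."""
    rows, cols = np.nonzero(llr > threshold)
    return list(zip(rows.tolist(), cols.tolist()))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d, rho = 30, 100, 0.9
    a = rng.standard_normal((n, d))
    pi = rng.permutation(n)
    b = np.empty_like(a)
    b[pi] = rho * a + np.sqrt(1 - rho ** 2) * rng.standard_normal((n, d))  # b_{pi(i)} ~ a_i
    llr = pairwise_llr(a, b, rho)
    # hypothetical threshold: half the expected LLR of a matched pair
    tau = 0.25 * d * np.log(1 / (1 - rho ** 2))
    print("MAP accuracy:", np.mean(map_alignment(llr) == pi))
    print("pairs above threshold:", len(threshold_alignment(llr, tau)))
```

The expected LLR of a matched pair is $d\,I(\rho)$ with $I(\rho) = -\tfrac{1}{2}\log(1-\rho^2)$ the per-coordinate mutual information, which is one way to see why conditions on mutual information govern when these procedures succeed.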