25 research outputs found
Information Recovery from Pairwise Measurements
A variety of information processing tasks in practice involve recovering
objects from single-shot graph-based measurements, particularly those taken
over the edges of some measurement graph. This paper concerns the
situation where each object takes value over a group of different values,
and where one is interested in recovering all these values based on observations
of certain pairwise relations over the graph. The imperfection of
measurements presents two major challenges for information recovery: 1)
a (dominant) portion of measurements are
corrupted; 2) a significant fraction of pairs are
unobservable, i.e. the measurement graph can be highly sparse.
Under a natural random outlier model, we characterize the minimax recovery rate, that is, the critical threshold of non-corruption rate
below which exact information recovery is infeasible. This accommodates a very
general class of pairwise relations. For various homogeneous random graph
models (e.g. Erdos Renyi random graphs, random geometric graphs, small world
graphs), the minimax recovery rate depends almost exclusively on the edge
sparsity of the measurement graph irrespective of other graphical
metrics. This fundamental limit decays with the group size at a square root
rate before entering a connectivity-limited regime. Under the Erdos Renyi
random graph, a tractable combinatorial algorithm is proposed to approach the
limit for large group sizes, while order-optimal recovery is
enabled by semidefinite programs in the small group-size regime.
The extended (and most updated) version of this work can be found at
http://arxiv.org/abs/1504.01369.
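The measurement setup described in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's construction: the pairwise relation is taken to be the difference modulo the group size, a corrupted measurement is replaced by a uniformly random group element, and the measurement graph is an Erdos Renyi graph. The function names and the parameters p_obs and p_true are hypothetical.

```python
import random

def erdos_renyi_edges(n, p_obs, seed=0):
    """Sample the measurement graph: each pair (i, j), i < j, is observed
    independently with probability p_obs (assumed Erdos Renyi model)."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p_obs]

def simulate_measurements(values, edges, group_size, p_true, seed=0):
    """Return {(i, j): observation} for every observed pair.

    Assumption: the pairwise relation is the modular difference
    values[i] - values[j]. With probability p_true the observation is
    correct; otherwise it is an outlier drawn uniformly from the group.
    """
    rng = random.Random(seed)
    observed = {}
    for i, j in edges:
        if rng.random() < p_true:
            observed[(i, j)] = (values[i] - values[j]) % group_size
        else:
            observed[(i, j)] = rng.randrange(group_size)
    return observed

if __name__ == "__main__":
    group_size = 5
    truth = [random.Random(1).randrange(group_size) for _ in range(20)]
    edges = erdos_renyi_edges(len(truth), p_obs=0.3)
    obs = simulate_measurements(truth, edges, group_size, p_true=0.6)
    correct = sum(obs[(i, j)] == (truth[i] - truth[j]) % group_size
                  for i, j in edges)
    print(f"{len(edges)} observed pairs, {correct} consistent with the truth")
```

Here p_true plays the role of the non-corruption rate and p_obs controls the edge sparsity of the measurement graph. Note that an outlier can coincide with the true difference by chance, so the consistent count may exceed the number of uncorrupted draws.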
Clustering from Sparse Pairwise Measurements
We consider the problem of grouping items into clusters based on few random
pairwise comparisons between the items. We introduce three closely related
algorithms for this task: a belief propagation algorithm approximating the
Bayes optimal solution, and two spectral algorithms based on the
non-backtracking and Bethe Hessian operators. For the case of two symmetric
clusters, we conjecture that these algorithms are asymptotically optimal in
that they detect the clusters as soon as it is information theoretically
possible to do so. We substantiate this claim for one of the spectral
approaches we introduce.
Fundamental Limits on Data Acquisition: Trade-offs between Sample Complexity and Query Difficulty
We consider query-based data acquisition and the corresponding information
recovery problem, where the goal is to recover binary variables
(information bits) from parity measurements of those variables. The queries and
the corresponding parity measurements are designed using the encoding rule of
Fountain codes. By using Fountain codes, we can design a potentially limitless
number of queries, and corresponding parity measurements, and guarantee that
the original information bits can be recovered with high probability from
any sufficiently large set of measurements. In the query design,
the average number of information bits associated with one parity
measurement is called the query difficulty, and the minimum number of
measurements required to recover the information bits for a fixed
query difficulty is called the sample complexity. We analyze the fundamental trade-offs
between the query difficulty and the sample complexity, and show that the
sample complexity necessary and sufficient to recover the information bits
with high probability can be characterized asymptotically up to a constant
factor.
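The query and parity-measurement idea in the abstract above can be sketched as follows. This is an illustration under assumptions, not the paper's scheme: measurements are taken to be noise-free, query degrees are drawn from the ideal soliton distribution (one standard Fountain-code choice), and recovery uses plain Gaussian elimination over GF(2) rather than any decoder the paper may analyze. All function names are hypothetical.

```python
import random

def ideal_soliton_degree(k, rng):
    """Sample a query degree from the ideal soliton distribution on 1..k:
    P(1) = 1/k and P(d) = 1/(d(d-1)) for d >= 2."""
    u = rng.random()
    cumulative = 1.0 / k
    if u < cumulative:
        return 1
    for d in range(2, k + 1):
        cumulative += 1.0 / (d * (d - 1))
        if u < cumulative:
            return d
    return k  # guard against floating-point round-off

def make_query(k, rng):
    """A query is the set of bit positions whose parity is requested."""
    return sorted(rng.sample(range(k), ideal_soliton_degree(k, rng)))

def parity(bits, query):
    """Noise-free parity measurement: XOR of the queried bits."""
    return sum(bits[i] for i in query) % 2

def solve_gf2(queries, answers, k):
    """Recover k bits by Gaussian elimination over GF(2).
    Returns None while the measurements do not determine every bit."""
    pivots = {}  # pivot column -> (row bitmask, right-hand side)
    for query, answer in zip(queries, answers):
        mask, rhs = sum(1 << i for i in query), answer
        while mask:
            col = mask.bit_length() - 1  # highest set bit is the pivot
            if col not in pivots:
                pivots[col] = (mask, rhs)
                break
            pmask, prhs = pivots[col]
            mask ^= pmask
            rhs ^= prhs
    if len(pivots) < k:
        return None
    bits = [0] * k
    for col in range(k):  # each pivot row only involves columns <= col
        mask, value = pivots[col]
        for j in range(col):
            if (mask >> j) & 1:
                value ^= bits[j]
        bits[col] = value
    return bits

if __name__ == "__main__":
    rng = random.Random(0)
    k = 16
    bits = [rng.randrange(2) for _ in range(k)]
    queries, recovered = [], None
    while recovered is None:
        queries.append(make_query(k, rng))
        recovered = solve_gf2(queries, [parity(bits, q) for q in queries], k)
    mean_degree = sum(len(q) for q in queries) / len(queries)
    print(f"recovered {k} bits from {len(queries)} queries, "
          f"mean query degree {mean_degree:.2f}")
```

In this sketch the mean query degree stands in for the query difficulty and the number of queries needed before elimination succeeds stands in for the sample complexity, so rerunning it with different degree distributions gives an informal feel for the trade-off the abstract describes.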