
    Information Recovery from Pairwise Measurements

    A variety of information processing tasks in practice involve recovering $n$ objects from single-shot graph-based measurements, particularly those taken over the edges of some measurement graph $\mathcal{G}$. This paper concerns the situation where each object takes value over a group of $M$ different values, and where one is interested in recovering all these values based on observations of certain pairwise relations over $\mathcal{G}$. The imperfection of measurements presents two major challenges for information recovery: 1) inaccuracy: a (dominant) portion $1-p$ of measurements are corrupted; 2) incompleteness: a significant fraction of pairs are unobservable, i.e. $\mathcal{G}$ can be highly sparse. Under a natural random outlier model, we characterize the minimax recovery rate, that is, the critical threshold of the non-corruption rate $p$ below which exact information recovery is infeasible. This accommodates a very general class of pairwise relations. For various homogeneous random graph models (e.g. Erdős-Rényi random graphs, random geometric graphs, small-world graphs), the minimax recovery rate depends almost exclusively on the edge sparsity of the measurement graph $\mathcal{G}$, irrespective of other graphical metrics. This fundamental limit decays with the group size $M$ at a square-root rate before entering a connectivity-limited regime. Under the Erdős-Rényi random graph, a tractable combinatorial algorithm is proposed to approach the limit for large $M$ ($M = n^{\Omega(1)}$), while order-optimal recovery is enabled by semidefinite programs in the small-$M$ regime. The extended (and most updated) version of this work can be found at (http://arxiv.org/abs/1504.01369). Comment: This version is no longer updated; please find the latest version at (arXiv:1504.01369).
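To make the random outlier model concrete, here is a minimal sketch that takes the pairwise relation to be the difference $x_i - x_j \bmod M$ (one instance of the general class the abstract mentions), samples an Erdős-Rényi measurement graph, and replaces each observed measurement with a uniform value in $\mathbb{Z}_M$ with probability $1-p$. The recovery step is a hypothetical two-hop majority vote, not the paper's combinatorial algorithm or semidefinite program, and all function names and parameters are illustrative assumptions. Values are only identifiable up to a global shift, so the sketch anchors vertex 0.

```python
import random
from collections import Counter


def sample_measurements(x, M, edge_prob, p, rng):
    """Random outlier model (sketch): each pair i < j is an edge of an
    Erdos-Renyi graph with probability edge_prob; an observed edge carries
    (x[i] - x[j]) mod M with probability p, else a uniform value in Z_M."""
    n = len(x)
    y = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob:
                if rng.random() < p:
                    y[(i, j)] = (x[i] - x[j]) % M
                else:
                    y[(i, j)] = rng.randrange(M)  # outlier
    return y


def rel(y, i, j, M):
    """Measured x[i] - x[j] (mod M), or None if the pair is unobserved."""
    if i < j:
        return y.get((i, j))
    v = y.get((j, i))
    return None if v is None else (-v) % M


def recover_two_hop(y, n, M):
    """Hypothetical heuristic: estimate x[i] - x[0] (mod M) by majority vote
    over the direct edge 0-i and every two-hop path 0-k-i."""
    est = [0] * n  # vertex 0 is the anchor: x[0] - x[0] = 0
    for i in range(1, n):
        votes = Counter()
        d = rel(y, 0, i, M)
        if d is not None:
            votes[d] += 1
        for k in range(1, n):
            if k == i:
                continue
            a, b = rel(y, 0, k, M), rel(y, k, i, M)
            if a is not None and b is not None:
                votes[(a + b) % M] += 1  # (x0 - xk) + (xk - xi)
        # Each vote estimates x[0] - x[i]; negate it. None if no path exists.
        est[i] = (-votes.most_common(1)[0][0]) % M if votes else None
    return est


def accuracy(x, est, M):
    """Fraction of vertices whose value is recovered up to the global shift."""
    truth = [(xi - x[0]) % M for xi in x]
    return sum(e == t for e, t in zip(est, truth)) / len(x)


if __name__ == "__main__":
    rng = random.Random(0)
    M, n = 101, 50
    x = [rng.randrange(M) for _ in range(n)]
    y = sample_measurements(x, M, edge_prob=0.8, p=0.8, rng=rng)
    print(accuracy(x, recover_two_hop(y, n, M), M))
```

Because a two-hop vote is correct only when both of its edges are uncorrupted, this heuristic degrades quickly as $p$ or the edge density falls; it illustrates the measurement model rather than approaching the minimax recovery rate.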

    Clustering from Sparse Pairwise Measurements

    We consider the problem of grouping items into clusters based on a few random pairwise comparisons between the items. We introduce three closely related algorithms for this task: a belief propagation algorithm approximating the Bayes-optimal solution, and two spectral algorithms based on the non-backtracking and Bethe Hessian operators. For the case of two symmetric clusters, we conjecture that these algorithms are asymptotically optimal in that they detect the clusters as soon as it is information-theoretically possible to do so. We substantiate this claim for one of the spectral approaches we introduce.
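A minimal sketch of the Bethe Hessian idea for two symmetric clusters, assuming each observed pair reports $+1$ ("same cluster") or $-1$ ("different clusters") and is flipped with probability `flip_prob`. It builds a signed Bethe Hessian of the form $H_{ii} = 1 + \sinh^2(\beta)\,\deg_i$, $H_{ij} = -\tfrac{1}{2}\sinh(2\beta)\,J_{ij}$ and labels vertices by the sign of the eigenvector of its most negative eigenvalue. The generator, the default $\tanh\beta = 1/\sqrt{\text{mean degree}}$, and all function names are assumptions for illustration; the paper's exact operator and parameter choice may differ, and the belief propagation and non-backtracking variants are not shown.

```python
import numpy as np


def sample_pairwise(n, avg_degree, flip_prob, rng):
    """Hypothetical generator: labels sigma in {-1, +1}^n; each pair is observed
    with probability avg_degree / n and reports sigma_i * sigma_j, flipped
    (i.e. corrupted) with probability flip_prob."""
    sigma = np.where(rng.random(n) < 0.5, 1, -1)
    observed = np.triu(rng.random((n, n)) < avg_degree / n, k=1)
    flips = np.where(rng.random((n, n)) < flip_prob, -1, 1)
    J = np.outer(sigma, sigma) * flips * observed  # upper triangle only
    return sigma, J + J.T  # symmetric, zero diagonal


def bethe_hessian_labels(J, beta=None):
    """Label vertices by the sign of the eigenvector of the most negative
    eigenvalue of H(beta), where
        H_ii = 1 + sinh(beta)^2 * deg_i,   H_ij = -sinh(2 beta) / 2 * J_ij.
    Requires mean degree > 1 when beta is left at its default."""
    deg = np.abs(J).sum(axis=1)
    if beta is None:
        # Assumption: tanh(beta) = 1 / sqrt(mean degree), mirroring the
        # r = sqrt(c) choice commonly used for the unsigned Bethe Hessian.
        beta = np.arctanh(1.0 / np.sqrt(deg.mean()))
    H = np.diag(1.0 + np.sinh(beta) ** 2 * deg) - 0.5 * np.sinh(2 * beta) * J
    _, vecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    return np.where(vecs[:, 0] >= 0, 1, -1)


def overlap(sigma, labels):
    """Agreement with the truth up to the global sign flip (1.0 = perfect)."""
    return abs(np.mean(sigma * labels))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sigma, J = sample_pairwise(400, avg_degree=10, flip_prob=0.1, rng=rng)
    print(overlap(sigma, bethe_hessian_labels(J)))
```

A dense eigendecomposition keeps the sketch short; for large sparse measurement graphs a sparse eigensolver such as `scipy.sparse.linalg.eigsh` would be the natural replacement.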

    Fundamental Limits on Data Acquisition: Trade-offs between Sample Complexity and Query Difficulty

    We consider query-based data acquisition and the corresponding information recovery problem, where the goal is to recover $k$ binary variables (information bits) from parity measurements of those variables. The queries and the corresponding parity measurements are designed using the encoding rule of Fountain codes. By using Fountain codes, we can design a potentially limitless number of queries and corresponding parity measurements, and guarantee that the original $k$ information bits can be recovered with high probability from any sufficiently large set of measurements of size $n$. In the query design, the average number of information bits associated with one parity measurement is called the query difficulty ($\bar{d}$), and the minimum number of measurements required to recover the $k$ information bits for a fixed $\bar{d}$ is called the sample complexity ($n$). We analyze the fundamental trade-offs between the query difficulty and the sample complexity, and show that a sample complexity of $n = c\max\{k, (k\log k)/\bar{d}\}$ for some constant $c > 0$ is necessary and sufficient to recover $k$ information bits with high probability as $k \to \infty$.
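A small sketch of the query model under loudly simplified assumptions: each hypothetical query includes every bit independently with probability $\bar{d}/k$ (so its expected degree is $\bar{d}$), standing in for the Fountain-code degree distribution the paper uses, and decoding is done by Gaussian elimination over GF(2) rather than the peeling decoder usually paired with Fountain codes. The helper `sample_complexity` evaluates $c\max\{k, (k\log k)/\bar{d}\}$ with a placeholder $c$, since the abstract leaves the constant unspecified.

```python
import math
import random


def make_queries(bits, n, d_bar, rng):
    """Hypothetical query design: each of n queries includes every bit
    independently with probability d_bar / k, so its expected degree (query
    difficulty) is d_bar. Returns (mask, parity) pairs; mask is an int bitmask."""
    k = len(bits)
    value = sum(b << i for i, b in enumerate(bits))
    queries = []
    for _ in range(n):
        mask = 0
        for i in range(k):
            if rng.random() < d_bar / k:
                mask |= 1 << i
        parity = bin(mask & value).count("1") % 2  # XOR of the chosen bits
        queries.append((mask, parity))
    return queries


def decode_gf2(queries, k):
    """Gaussian elimination over GF(2). Returns the k bits, or None if the
    parity checks do not pin them down uniquely (rank < k)."""
    pivots = {}  # lowest set bit -> reduced (mask, parity) row
    for mask, parity in queries:
        while mask:
            low = (mask & -mask).bit_length() - 1
            if low not in pivots:
                pivots[low] = (mask, parity)
                break
            pmask, pparity = pivots[low]
            mask ^= pmask  # clears bit `low`; remaining bits are all higher
            parity ^= pparity
    if len(pivots) < k:
        return None
    # Back-substitute from the highest pivot: every other bit in a row is higher.
    value = 0
    for i in sorted(pivots, reverse=True):
        mask, parity = pivots[i]
        rest = mask & ~(1 << i)
        value |= (parity ^ (bin(rest & value).count("1") % 2)) << i
    return [(value >> i) & 1 for i in range(k)]


def sample_complexity(k, d_bar, c=1.0):
    """n = c * max{k, (k log k) / d_bar}; c = 1.0 is only a placeholder."""
    return math.ceil(c * max(k, k * math.log(k) / d_bar))


if __name__ == "__main__":
    rng = random.Random(0)
    k, d_bar = 64, 4.0
    bits = [rng.randrange(2) for _ in range(k)]
    n = sample_complexity(k, d_bar, c=2.0)
    print(n, decode_gf2(make_queries(bits, n, d_bar, rng), k) == bits)
```

With small $\bar{d}$, many bits appear in no query at all until roughly $(k\log k)/\bar{d}$ queries have been drawn (a coupon-collector effect), which is one way to see where the second term of the bound comes from.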