Community detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and more generally provides fertile ground to study the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, both with respect to
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial and weak recovery (a.k.a., detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
The note also covers some of the algorithms developed in the quest to
achieve these limits, in particular two-round algorithms via graph-splitting,
semi-definite programming, linearized belief propagation, and classical and
nonbacktracking spectral methods. A few open problems are also discussed.
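To make the "random graph model with planted clusters" concrete, the following is a minimal sketch of sampling a symmetric SBM. The function name `sample_sbm` and the two-parameter form (one within-community probability `p_in`, one across-community probability `p_out`) are assumptions made for illustration; the general SBM allows an arbitrary matrix of connection probabilities between communities.

```python
import random


def sample_sbm(sizes, p_in, p_out, seed=None):
    """Sample an undirected graph from a symmetric SBM (illustrative sketch).

    sizes : list of community sizes (the planted clusters)
    p_in  : edge probability for two vertices in the same community
    p_out : edge probability for two vertices in different communities
    Returns (labels, edges), where labels[v] is the community of vertex v
    and edges is a set of pairs (u, v) with u < v.
    """
    rng = random.Random(seed)
    # Vertices are numbered consecutively, community by community.
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    n = len(labels)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            # Each pair is connected independently of all other pairs.
            if rng.random() < p:
                edges.add((u, v))
    return labels, edges


labels, edges = sample_sbm([5, 5], p_in=0.8, p_out=0.1, seed=0)
print(len(labels), "vertices,", len(edges), "edges")
```

Community detection is the inverse problem: given only `edges`, recover `labels` exactly, partially, or better than chance, depending on the recovery requirement.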
Communication-Computation Efficient Gradient Coding
This paper develops coding techniques to reduce the running time of
distributed learning tasks. It characterizes the fundamental tradeoff to
compute gradients (and more generally vector summations) in terms of three
parameters: computation load, straggler tolerance and communication cost. It
further gives an explicit coding scheme that achieves the optimal tradeoff
based on recursive polynomial constructions, coding both across data subsets
and vector components. As a result, the proposed scheme makes it possible to
minimize the running time for gradient computations. Implementations are made
on Amazon EC2 clusters using Python with the mpi4py package. Results show that the proposed
scheme maintains the same generalization error while reducing the running time
relative to both uncoded schemes and prior coded
schemes focusing only on stragglers (Tandon et al., ICML 2017).
Polarization of the Renyi Information Dimension with Applications to Compressed Sensing
In this paper, we show that the Hadamard matrix acts as an extractor over the
reals of the Renyi information dimension (RID), in an analogous way to how it
acts as an extractor of the discrete entropy over finite fields. More
precisely, we prove that the RID of an i.i.d. sequence of mixture random
variables polarizes to the extremal values of 0 and 1 (corresponding to
discrete and continuous distributions) when transformed by a Hadamard matrix.
Further, we prove that the polarization pattern of the RID admits a closed form
expression and follows exactly the Binary Erasure Channel (BEC) polarization
pattern in the discrete setting. We also extend the results from the single- to
the multi-terminal setting, obtaining a Slepian-Wolf counterpart of the RID
polarization. We discuss applications of the RID polarization to Compressed
Sensing of i.i.d. sources. In particular, we use the RID polarization to
construct a family of deterministic -valued sensing matrices for
Compressed Sensing. We run numerical simulations to compare the performance of
the resulting matrices with that of random Gaussian and random Hadamard
matrices. The results indicate that the proposed matrices afford competitive
performance while being explicitly constructed.
Comment: 12 pages, 2 figures
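The BEC polarization pattern mentioned in the abstract can be sketched numerically. The recursion below is the standard one for the binary erasure channel: a parameter z in [0, 1] splits into 2z - z^2 and z^2 at each level. Reading z as the initial RID of the source is an assumption made here for illustration, based only on the abstract's statement that the RID pattern follows the BEC pattern; the function name `bec_polarize` is hypothetical.

```python
def bec_polarize(z0, levels):
    """Apply the BEC polarization recursion for a given number of levels.

    Starting from a single value z0 in [0, 1], each level replaces every
    value z by the pair (2z - z^2, z^2). After `levels` steps there are
    2**levels values, and their average is still z0.
    """
    values = [z0]
    for _ in range(levels):
        nxt = []
        for z in values:
            nxt.append(2 * z - z * z)  # "worse" branch, pushed toward 1
            nxt.append(z * z)          # "better" branch, pushed toward 0
        values = nxt
    return values


vals = bec_polarize(0.5, 10)
# Fraction of values that have moved close to the extremal values 0 or 1.
extremal = sum(1 for z in vals if z < 0.01 or z > 0.99) / len(vals)
print(f"{extremal:.2%} of {len(vals)} values are within 0.01 of 0 or 1")
```

As the number of levels grows, the fraction of values near 0 or 1 tends to one, which is the polarization phenomenon the abstract refers to.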
Polynomial complexity of polar codes for non-binary alphabets, key agreement and Slepian-Wolf coding
We consider polar codes for memoryless sources with side information and show
that the blocklength, construction, encoding and decoding complexities are
bounded by a polynomial of the reciprocal of the gap between the compression
rate and the conditional entropy. This extends the recent results of Guruswami
and Xia to a slightly more general setting, which in turn can be applied to (1)
sources with non-binary alphabets, (2) key generation for discrete and Gaussian
sources, and (3) Slepian-Wolf coding and multiple accessing. In each of these
cases, the complexity scaling with respect to the number of users is also
controlled. In particular, we construct coding schemes for these multi-user
information theory problems which achieve optimal rates with an overall
polynomial complexity.
Comment: 6 pages; presented at CISS 201
High-Girth Matrices and Polarization
The girth of a matrix is the least number of linearly dependent columns, in
contrast to the rank which is the largest number of linearly independent
columns. This paper considers the construction of high-girth matrices,
whose probabilistic girth is close to their rank. Random matrices can be used to
show the existence of high-girth matrices with constant relative rank, but the
construction is non-explicit. This paper uses a polar-like construction to
obtain a deterministic and efficient construction of high-girth matrices for
arbitrary fields and relative ranks. Applications to coding and sparse recovery
are discussed.
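The definition of girth in the first sentence of this abstract can be checked by brute force on small matrices. The sketch below works over the rationals (an assumption; the paper treats arbitrary fields) and covers only the deterministic girth, not the probabilistic girth the paper studies. The helper names `rank` and `girth` are illustrative, and the search is exponential in the number of columns.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        # Find a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def girth(matrix):
    """Least number of linearly dependent columns, or None if all columns
    are linearly independent."""
    ncols = len(matrix[0])
    for k in range(1, ncols + 1):
        for cols in combinations(range(ncols), k):
            sub = [[row[c] for c in cols] for row in matrix]
            if rank(sub) < k:  # these k columns are linearly dependent
                return k
    return None


A = [[1, 0, 1],
     [0, 1, 1]]
print("rank:", rank(A), "girth:", girth(A))
```

For this matrix any two columns are independent but all three are dependent, so the girth exceeds the rank by one, the largest value a girth can take.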