18 research outputs found
Comparison of Channels: Criteria for Domination by a Symmetric Channel
This paper studies the basic question of whether a given channel can be
dominated (in the precise sense of being more noisy) by a q-ary symmetric
channel. The concept of "less noisy" relation between channels originated in
network information theory (broadcast channels) and is defined in terms of
mutual information or Kullback-Leibler divergence. We provide an equivalent
characterization in terms of χ²-divergence. Furthermore, we develop a
simple criterion for domination by a q-ary symmetric channel in terms of the
minimum entry of the stochastic matrix defining the channel. The criterion
is strengthened for the special case of additive noise channels over finite
Abelian groups. Finally, it is shown that domination by a symmetric channel
implies (via comparison of Dirichlet forms) a logarithmic Sobolev inequality
for the original channel.
Comment: 31 pages, 2 figures. Presented at 2017 IEEE International Symposium
on Information Theory (ISIT)
Broadcasting on Random Directed Acyclic Graphs
We study a generalization of the well-known model of broadcasting on trees.
Consider a directed acyclic graph (DAG) with a unique source vertex X, and
suppose all other vertices have indegree d ≥ 2. Let the vertices at
distance k from X be called layer k. At layer 0, X is given a random
bit. At layer k ≥ 1, each vertex receives d bits from its parents in
layer k − 1, which are transmitted along independent binary symmetric channel
edges, and combines them using a d-ary Boolean processing function. The goal
is to reconstruct X with probability of error bounded away from 1/2 using
the values of all vertices at an arbitrarily deep layer. This question is
closely related to models of reliable computation and storage, and information
flow in biological networks.
In this paper, we analyze randomly constructed DAGs, for which we show that
broadcasting is only possible if the noise level is below a certain degree and
function dependent critical threshold. For d ≥ 3, and random DAGs with
layer sizes Ω(log k) and majority processing functions, we identify the
critical threshold. For d = 2, we establish a similar result for NAND
processing functions. We also prove a partial converse for odd d ≥ 3
illustrating that the identified thresholds are impossible to improve by
selecting different processing functions if the decoder is restricted to using
a single vertex.
Finally, for any noise level, we construct explicit DAGs (using expander
graphs) with bounded degree and layer sizes Θ(log k) admitting
reconstruction. In particular, we show that such DAGs can be generated in
deterministic quasi-polynomial time or randomized polylogarithmic time in the
depth. These results portray a doubly-exponential advantage for storing a bit
in DAGs compared to trees, where d = 1 but layer sizes must grow exponentially
with depth in order to enable broadcasting.
Comment: 33 pages, double column format. arXiv admin note: text overlap with
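The broadcasting model described in this abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' construction: the constant layer size, the choice of parents uniformly at random with replacement, the majority decoder over the final layer, and all function names (`majority`, `broadcast_once`, `error_rate`) are assumptions made for illustration only.

```python
import random

def majority(bits):
    """Majority vote; assumes an odd number of bits so ties cannot occur."""
    return int(2 * sum(bits) > len(bits))

def broadcast_once(depth, layer_size, d, delta, rng):
    """Run one broadcast and return (source_bit, bits_of_final_layer)."""
    source = rng.randint(0, 1)
    layer = [source]  # layer 0 contains only the source vertex
    for _ in range(depth):
        new_layer = []
        for _ in range(layer_size):
            # Assumption: each vertex picks d parents uniformly with replacement.
            parents = [rng.choice(layer) for _ in range(d)]
            # Each edge is an independent binary symmetric channel:
            # the transmitted bit is flipped with probability delta.
            received = [b ^ (rng.random() < delta) for b in parents]
            new_layer.append(majority(received))
        layer = new_layer
    return source, layer

def error_rate(trials, depth, layer_size, d, delta, seed=0):
    """Fraction of trials in which a majority decoder over the last layer errs."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        source, layer = broadcast_once(depth, layer_size, d, delta, rng)
        errors += majority(layer) != source
    return errors / trials

if __name__ == "__main__":
    for delta in (0.0, 0.05, 0.3):
        print(delta, error_rate(200, 20, 51, 3, delta))
```

Running the script for a few noise levels gives a rough empirical feel for how the decoding error depends on the noise level; it does not reproduce the thresholds proved in the paper.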
Probabilistic Clustering Using Maximal Matrix Norm Couplings
In this paper, we present a local information theoretic approach to
explicitly learn probabilistic clustering of a discrete random variable. Our
formulation yields a convex maximization problem for which it is NP-hard to
find the global optimum. In order to algorithmically solve this optimization
problem, we propose two relaxations that are solved via gradient ascent and
alternating maximization. Experiments on the MSR Sentence Completion Challenge,
MovieLens 100K, and Reuters21578 datasets demonstrate that our approach is
competitive with existing techniques and worthy of further investigation.
Comment: Presented at 56th Annual Allerton Conference on Communication,
Control, and Computing, 201
Doeblin Coefficients and Related Measures
Doeblin coefficients are a classical tool for analyzing the ergodicity and
exponential convergence rates of Markov chains. Propelled by recent works on
contraction coefficients of strong data processing inequalities, we investigate
whether Doeblin coefficients also exhibit some of the notable properties of
canonical contraction coefficients. In this paper, we present several new
structural and geometric properties of Doeblin coefficients. Specifically, we
show that Doeblin coefficients form a multi-way divergence, exhibit
tensorization, and possess an extremal trace characterization. We then show
that they also have extremal coupling and simultaneously maximal coupling
characterizations. By leveraging these characterizations, we demonstrate that
Doeblin coefficients act as a nice generalization of the well-known total
variation (TV) distance to a multi-way divergence, enabling us to measure the
"distance" between multiple distributions rather than just two. We then prove
that Doeblin coefficients exhibit contraction properties over Bayesian networks
similar to other canonical contraction coefficients. We additionally derive
some other results and discuss an application of Doeblin coefficients to
distribution fusion. Finally, in a complementary vein, we introduce and discuss
three new quantities: max-Doeblin coefficient, max-DeGroot distance, and
min-DeGroot distance. The max-Doeblin coefficient shares a connection with the
concept of maximal leakage in information security; we explore its properties
and provide a coupling characterization. On the other hand, the max-DeGroot and
min-DeGroot measures extend the concept of DeGroot distance to multiple
distributions.
Comment: 26 pages, 1 figure
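The link between Doeblin coefficients and total variation distance mentioned in this abstract can be checked numerically. The sketch below assumes the standard definition, which the abstract itself does not state: the Doeblin coefficient of a collection of distributions on a common finite alphabet is the sum, over outcomes, of the smallest probability any of them assigns to that outcome. The function names are hypothetical.

```python
def doeblin_coefficient(rows):
    """Sum over outcomes of the minimum probability across all distributions.

    `rows` is a list of probability vectors over the same finite alphabet
    (equivalently, the rows of a row-stochastic matrix).
    """
    return sum(min(column) for column in zip(*rows))

def total_variation(p, q):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    q = [0.2, 0.3, 0.5]
    r = [0.1, 0.8, 0.1]
    # For two distributions the coefficient equals 1 - TV(p, q).
    print(doeblin_coefficient([p, q]), 1 - total_variation(p, q))
    # With three distributions it measures their common overlap.
    print(doeblin_coefficient([p, q, r]))
```

Under this definition, adding a distribution can only decrease the coefficient, which is consistent with reading it as a multi-way notion of overlap.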
Broadcasting on Two-Dimensional Regular Grids
We study a specialization of the problem of broadcasting on directed acyclic
graphs, namely, broadcasting on 2D regular grids. Consider a 2D regular grid
with source vertex X at layer 0 and k + 1 vertices at layer k ≥ 1,
which are at distance k from X. Every vertex of the 2D regular grid has
outdegree 2, the vertices at the boundary have indegree 1, and all other
vertices have indegree 2. At time 0, X is given a random bit. At time
k ≥ 1, each vertex in layer k receives transmitted bits from its parents
in layer k − 1, where the bits pass through binary symmetric channels with
noise level δ ∈ (0, 1/2). Then, each vertex combines its received bits
using a common Boolean processing function to produce an output bit. The
objective is to recover X with probability of error better than 1/2 from
all vertices at layer k as k → ∞. Besides their natural
interpretation in communication networks, such broadcasting processes can be
construed as 1D probabilistic cellular automata (PCA) with boundary conditions
that limit the number of sites at each time k to k + 1. We conjecture that it
is impossible to propagate information in a 2D regular grid regardless of the
noise level and the choice of processing function. In this paper, we make
progress towards establishing this conjecture, and prove using ideas from
percolation and coding theory that recovery of X is impossible for any δ
provided that all vertices use either AND or XOR processing functions.
Furthermore, we propose a martingale-based approach that establishes the
impossibility of recovering X for any δ when all NAND processing
functions are used if certain supermartingales can be rigorously constructed.
We also provide numerical evidence for the existence of these supermartingales
by computing explicit examples for different values of δ via linear
programming.
Comment: 52 pages, 2 figures
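The grid process described in this abstract can be sketched as a layer-by-layer update. The indexing below is an assumption: vertex j of a layer has parents j − 1 and j in the previous layer, and a boundary vertex, which has a single parent, simply forwards its one received bit. The names `bsc`, `next_layer`, and `run_grid` are hypothetical.

```python
import random

def bsc(bit, delta, rng):
    """Binary symmetric channel: flip the bit with probability delta."""
    return bit ^ (rng.random() < delta)

def next_layer(layer, gate, delta, rng):
    """Compute the next layer (one vertex longer) from the current layer."""
    k = len(layer)
    out = []
    for j in range(k + 1):
        # Parents of vertex j are positions j - 1 and j, when they exist.
        parents = [layer[i] for i in (j - 1, j) if 0 <= i < k]
        received = [bsc(b, delta, rng) for b in parents]
        if len(received) == 1:
            # Assumption: a boundary vertex forwards its single received bit.
            out.append(received[0])
        else:
            out.append(gate(received[0], received[1]))
    return out

def run_grid(source, depth, gate, delta, rng):
    """Return the bits at the layer reached after `depth` update steps."""
    layer = [source]
    for _ in range(depth):
        layer = next_layer(layer, gate, delta, rng)
    return layer

if __name__ == "__main__":
    xor = lambda a, b: a ^ b
    nand = lambda a, b: 1 - (a & b)
    rng = random.Random(0)
    print(run_grid(1, 4, xor, 0.0, rng))   # noiseless XOR grid
    print(run_grid(1, 8, nand, 0.1, rng))  # noisy NAND grid
```

With zero noise and XOR gates, a source bit of 1 produces the rows of Pascal's triangle modulo 2, which is a convenient sanity check of the indexing.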
Bounds between contraction coefficients
In this paper, we delineate how the contraction coefficient of the strong data processing inequality for KL divergence can be used to learn likelihood models. We then present an alternative formulation that forces the input KL divergence to vanish, and achieves a contraction coefficient equivalent to the squared maximal correlation using a linear algebraic solution. To analyze the performance loss in using this simple but suboptimal procedure, we bound these coefficients in the discrete and finite regime, and prove their equivalence in the Gaussian regime.
A study of local approximations in information theory
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 171-173).
The intractability of many information theoretic problems arises from the meaningful but nonlinear definition of Kullback-Leibler (KL) divergence between two probability distributions. Local information theory addresses this issue by assuming all distributions of interest are perturbations of certain reference distributions, and then approximating KL divergence with a squared weighted Euclidean distance, thereby linearizing such problems. We show that large classes of statistical divergence measures, such as f-divergences and Bregman divergences, can be approximated in an analogous manner to local metrics which are very similar in form. We then capture the cost of making local approximations of KL divergence instead of using its global value. This is achieved by appropriately bounding the tightness of the Data Processing Inequality in the local and global scenarios. This task turns out to be equivalent to bounding the chordal slope of the hypercontractivity ribbon at infinity and the Hirschfeld-Gebelein-Renyi maximal correlation with each other. We derive such bounds for the discrete and finite, as well as the Gaussian regimes. An application of the local approximation technique is in understanding the large deviation behavior of sources and channels. We elucidate a source-channel decomposition of the large deviation characteristics of i.i.d. sources going through discrete memoryless channels. This is used to derive an additive Gaussian noise channel model for the local perturbations of probability distributions.
We next shift our focus to infinite alphabet channels instead of discrete and finite channels. On this front, existing literature has demonstrated that the singular vectors of additive white Gaussian noise channels are Hermite polynomials, and the singular vectors of Poisson channels are Laguerre polynomials. We characterize the set of infinite alphabet channels whose singular value decompositions produce singular vectors that are orthogonal polynomials by providing equivalent conditions on the conditional moments. In doing so, we also unveil the elegant relationship between certain natural exponential families with quadratic variance functions, their conjugate priors, and their corresponding orthogonal polynomial singular vectors. Finally, we propose various related directions for future research in the hope that our work will beget more research concerning local approximation methods in information theory.
by Anuran Makur.
S.M.
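The local approximation idea in this abstract (replacing KL divergence by a squared weighted Euclidean distance when one distribution is a small perturbation of a reference) can be checked numerically. The sketch below assumes the common form of that approximation, half the chi-squared divergence with weights given by the reference distribution; the example distributions and all function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats for strictly positive q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def local_approximation(p, q):
    """Half the chi-squared divergence: squared distance weighted by 1/q."""
    return 0.5 * sum((a - b) ** 2 / b for a, b in zip(p, q))

def perturb(q, direction, eps):
    """Move the reference q by eps along a direction whose entries sum to zero."""
    return [b + eps * j for b, j in zip(q, direction)]

def relative_gap(q, direction, eps):
    """Relative difference between KL and its local approximation."""
    p = perturb(q, direction, eps)
    exact = kl_divergence(p, q)
    return abs(exact - local_approximation(p, q)) / exact

if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]
    direction = [1.0, -1.0, 0.0]  # sums to zero, so p stays normalized
    for eps in (0.1, 0.01, 0.001):
        print(eps, relative_gap(q, direction, eps))
```

The printed relative gap shrinks as the perturbation size decreases, which is the sense in which the quadratic form is a local surrogate for KL divergence.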