
    Comparison of Channels: Criteria for Domination by a Symmetric Channel

    This paper studies the basic question of whether a given channel $V$ can be dominated (in the precise sense of being more noisy) by a $q$-ary symmetric channel. The concept of the "less noisy" relation between channels originated in network information theory (broadcast channels) and is defined in terms of mutual information or Kullback-Leibler divergence. We provide an equivalent characterization in terms of $\chi^2$-divergence. Furthermore, we develop a simple criterion for domination by a $q$-ary symmetric channel in terms of the minimum entry of the stochastic matrix defining the channel $V$. The criterion is strengthened for the special case of additive noise channels over finite Abelian groups. Finally, it is shown that domination by a symmetric channel implies (via comparison of Dirichlet forms) a logarithmic Sobolev inequality for the original channel. Comment: 31 pages, 2 figures. Presented at 2017 IEEE International Symposium on Information Theory (ISIT)

    Broadcasting on Random Directed Acyclic Graphs

    We study a generalization of the well-known model of broadcasting on trees. Consider a directed acyclic graph (DAG) with a unique source vertex $X$, and suppose all other vertices have indegree $d \geq 2$. Let the vertices at distance $k$ from $X$ be called layer $k$. At layer $0$, $X$ is given a random bit. At layer $k \geq 1$, each vertex receives $d$ bits from its parents in layer $k-1$, which are transmitted along independent binary symmetric channel edges, and combines them using a $d$-ary Boolean processing function. The goal is to reconstruct $X$ with probability of error bounded away from $1/2$ using the values of all vertices at an arbitrarily deep layer. This question is closely related to models of reliable computation and storage, and information flow in biological networks. In this paper, we analyze randomly constructed DAGs, for which we show that broadcasting is only possible if the noise level is below a certain degree and function dependent critical threshold. For $d \geq 3$, and random DAGs with layer sizes $\Omega(\log k)$ and majority processing functions, we identify the critical threshold. For $d = 2$, we establish a similar result for NAND processing functions. We also prove a partial converse for odd $d \geq 3$ illustrating that the identified thresholds are impossible to improve by selecting different processing functions if the decoder is restricted to using a single vertex. Finally, for any noise level, we construct explicit DAGs (using expander graphs) with bounded degree and layer sizes $\Theta(\log k)$ admitting reconstruction. In particular, we show that such DAGs can be generated in deterministic quasi-polynomial time or randomized polylogarithmic time in the depth. These results portray a doubly-exponential advantage for storing a bit in DAGs compared to trees, where $d = 1$ but layer sizes must grow exponentially with depth in order to enable broadcasting. Comment: 33 pages, double column format.
arXiv admin note: text overlap with arXiv:1803.0752
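
As a rough illustration of the threshold behavior described in this abstract, the sketch below iterates an idealized mean-field recursion for $d = 3$ with majority processing. It assumes that every layer is so large that the three parent bits of a vertex behave as i.i.d. draws from the previous layer; this simplification, and the names `majority3`, `next_layer_fraction`, and `iterate_layers`, are assumptions of the sketch and not the paper's analysis.

```python
def majority3(p: float) -> float:
    """Probability that the majority of three i.i.d. Bernoulli(p) bits equals 1."""
    return 3 * p**2 - 2 * p**3


def next_layer_fraction(sigma: float, delta: float) -> float:
    """Map the fraction of 1s in one layer to the expected fraction in the next.

    Each parent bit passes through a binary symmetric channel that flips it
    with probability delta, then three such bits are combined by majority.
    """
    p = sigma * (1 - delta) + (1 - sigma) * delta
    return majority3(p)


def iterate_layers(delta: float, depth: int = 200, sigma0: float = 1.0) -> float:
    """Iterate the recursion for `depth` layers, starting from all-ones (source bit 1)."""
    sigma = sigma0
    for _ in range(depth):
        sigma = next_layer_fraction(sigma, delta)
    return sigma


if __name__ == "__main__":
    for delta in (0.10, 0.25):
        print(f"delta={delta:.2f}: fraction of 1s after 200 layers = {iterate_layers(delta):.6f}")
```

In this idealized recursion, linearizing around the fixed point $1/2$ gives slope $\tfrac{3}{2}(1 - 2\delta)$, so the uninformative fixed point is unstable (and the layer retains a bias toward the source bit) only when $\delta < 1/6$. For larger $\delta$ the fraction of 1s collapses to $1/2$.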

    Probabilistic Clustering Using Maximal Matrix Norm Couplings

    In this paper, we present a local information theoretic approach to explicitly learn probabilistic clustering of a discrete random variable. Our formulation yields a convex maximization problem for which it is NP-hard to find the global optimum. In order to algorithmically solve this optimization problem, we propose two relaxations that are solved via gradient ascent and alternating maximization. Experiments on the MSR Sentence Completion Challenge, MovieLens 100K, and Reuters21578 datasets demonstrate that our approach is competitive with existing techniques and worthy of further investigation. Comment: Presented at 56th Annual Allerton Conference on Communication, Control, and Computing, 201

    Doeblin Coefficients and Related Measures

    Doeblin coefficients are a classical tool for analyzing the ergodicity and exponential convergence rates of Markov chains. Propelled by recent works on contraction coefficients of strong data processing inequalities, we investigate whether Doeblin coefficients also exhibit some of the notable properties of canonical contraction coefficients. In this paper, we present several new structural and geometric properties of Doeblin coefficients. Specifically, we show that Doeblin coefficients form a multi-way divergence, exhibit tensorization, and possess an extremal trace characterization. We then show that they also have extremal coupling and simultaneously maximal coupling characterizations. By leveraging these characterizations, we demonstrate that Doeblin coefficients act as a nice generalization of the well-known total variation (TV) distance to a multi-way divergence, enabling us to measure the "distance" between multiple distributions rather than just two. We then prove that Doeblin coefficients exhibit contraction properties over Bayesian networks similar to other canonical contraction coefficients. We additionally derive some other results and discuss an application of Doeblin coefficients to distribution fusion. Finally, in a complementary vein, we introduce and discuss three new quantities: max-Doeblin coefficient, max-DeGroot distance, and min-DeGroot distance. The max-Doeblin coefficient shares a connection with the concept of maximal leakage in information security; we explore its properties and provide a coupling characterization. On the other hand, the max-DeGroot and min-DeGroot measures extend the concept of DeGroot distance to multiple distributions. Comment: 26 pages, 1 figure
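
To make the link to total variation concrete, here is a minimal sketch that assumes the standard definition of the Doeblin coefficient of a row-stochastic matrix $P$, namely $\tau(P) = \sum_y \min_x P(y \mid x)$. The function names are hypothetical and chosen for illustration; the abstract itself does not spell out the formula.

```python
from typing import Sequence


def doeblin_coefficient(rows: Sequence[Sequence[float]]) -> float:
    """Sum over output symbols of the smallest probability any row assigns to that symbol.

    Each element of `rows` is one probability distribution over the same alphabet
    (equivalently, one row of a row-stochastic matrix).
    """
    return sum(min(column) for column in zip(*rows))


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two distributions on the same alphabet."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    # For exactly two distributions, the Doeblin coefficient equals 1 - TV.
    print(doeblin_coefficient([p, q]), 1 - total_variation(p, q))
    # With more rows, the same formula measures the overlap of all of them at once.
    print(doeblin_coefficient([[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.1, 0.8, 0.1]]))
```

With two rows, the identity $\sum_y \min(p_y, q_y) = 1 - \mathrm{TV}(p, q)$ recovers total variation, which is the sense in which the coefficient extends TV to several distributions.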

    Broadcasting on Two-Dimensional Regular Grids

    We study a specialization of the problem of broadcasting on directed acyclic graphs, namely, broadcasting on 2D regular grids. Consider a 2D regular grid with source vertex $X$ at layer $0$ and $k+1$ vertices at layer $k \geq 1$, which are at distance $k$ from $X$. Every vertex of the 2D regular grid has outdegree $2$, the vertices at the boundary have indegree $1$, and all other vertices have indegree $2$. At time $0$, $X$ is given a random bit. At time $k \geq 1$, each vertex in layer $k$ receives transmitted bits from its parents in layer $k-1$, where the bits pass through binary symmetric channels with noise level $\delta \in (0, 1/2)$. Then, each vertex combines its received bits using a common Boolean processing function to produce an output bit. The objective is to recover $X$ with probability of error better than $1/2$ from all vertices at layer $k$ as $k \rightarrow \infty$. Besides their natural interpretation in communication networks, such broadcasting processes can be construed as 1D probabilistic cellular automata (PCA) with boundary conditions that limit the number of sites at each time $k$ to $k+1$. We conjecture that it is impossible to propagate information in a 2D regular grid regardless of the noise level and the choice of processing function. In this paper, we make progress towards establishing this conjecture, and prove using ideas from percolation and coding theory that recovery of $X$ is impossible for any $\delta$ provided that all vertices use either AND or XOR processing functions. Furthermore, we propose a martingale-based approach that establishes the impossibility of recovering $X$ for any $\delta$ when all NAND processing functions are used if certain supermartingales can be rigorously constructed. We also provide numerical evidence for the existence of these supermartingales by computing explicit examples for different values of $\delta$ via linear programming. Comment: 52 pages, 2 figures

    Bounds between contraction coefficients

    In this paper, we delineate how the contraction coefficient of the strong data processing inequality for KL divergence can be used to learn likelihood models. We then present an alternative formulation that forces the input KL divergence to vanish, and achieves a contraction coefficient equivalent to the squared maximal correlation using a linear algebraic solution. To analyze the performance loss in using this simple but suboptimal procedure, we bound these coefficients in the discrete and finite regime, and prove their equivalence in the Gaussian regime.

    A study of local approximations in information theory

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 171-173). The intractability of many information theoretic problems arises from the meaningful but nonlinear definition of Kullback-Leibler (KL) divergence between two probability distributions. Local information theory addresses this issue by assuming all distributions of interest are perturbations of certain reference distributions, and then approximating KL divergence with a squared weighted Euclidean distance, thereby linearizing such problems. We show that large classes of statistical divergence measures, such as f-divergences and Bregman divergences, can be approximated in an analogous manner to local metrics which are very similar in form. We then capture the cost of making local approximations of KL divergence instead of using its global value. This is achieved by appropriately bounding the tightness of the Data Processing Inequality in the local and global scenarios. This task turns out to be equivalent to bounding the chordal slope of the hypercontractivity ribbon at infinity and the Hirschfeld-Gebelein-Renyi maximal correlation with each other. We derive such bounds for the discrete and finite, as well as the Gaussian regimes. An application of the local approximation technique is in understanding the large deviation behavior of sources and channels. We elucidate a source-channel decomposition of the large deviation characteristics of i.i.d. sources going through discrete memoryless channels. This is used to derive an additive Gaussian noise channel model for the local perturbations of probability distributions.
We next shift our focus to infinite alphabet channels instead of discrete and finite channels. On this front, existing literature has demonstrated that the singular vectors of additive white Gaussian noise channels are Hermite polynomials, and the singular vectors of Poisson channels are Laguerre polynomials. We characterize the set of infinite alphabet channels whose singular value decompositions produce singular vectors that are orthogonal polynomials by providing equivalent conditions on the conditional moments. In doing so, we also unveil the elegant relationship between certain natural exponential families with quadratic variance functions, their conjugate priors, and their corresponding orthogonal polynomial singular vectors. Finally, we propose various related directions for future research in the hope that our work will beget more research concerning local approximation methods in information theory. by Anuran Makur. S.M.
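
The local approximation mentioned in this thesis abstract (KL divergence replaced by a squared weighted Euclidean distance) can be checked numerically. The sketch below assumes the usual second-order form $D(P \| Q) \approx \tfrac{1}{2} \sum_x (P(x) - Q(x))^2 / Q(x)$ for $P = Q + \epsilon J$ with $\sum_x J(x) = 0$; the helper names and the example distribution are hypothetical and not taken from the thesis.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(p || q) in nats; terms with p(x) = 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def local_approximation(p: Sequence[float], q: Sequence[float]) -> float:
    """Half the chi-squared divergence: a squared Euclidean distance weighted by 1/q."""
    return 0.5 * sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def perturb(q: Sequence[float], direction: Sequence[float], eps: float) -> List[float]:
    """Return q + eps * direction; `direction` must sum to zero so the result sums to one."""
    return [qi + eps * ji for qi, ji in zip(q, direction)]


if __name__ == "__main__":
    q = [0.2, 0.3, 0.5]
    direction = [1.0, -1.0, 0.0]
    for eps in (1e-1, 1e-2, 1e-3):
        p = perturb(q, direction, eps)
        ratio = kl_divergence(p, q) / local_approximation(p, q)
        print(f"eps={eps:g}: KL / local approximation = {ratio:.6f}")
```

The printed ratio approaches 1 as `eps` shrinks, which is the sense in which the quadratic form linearizes problems stated in terms of KL divergence near a reference distribution.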