17 research outputs found

    Comparison of Channels: Criteria for Domination by a Symmetric Channel

    This paper studies the basic question of whether a given channel V can be dominated (in the precise sense of being more noisy) by a q-ary symmetric channel. The concept of the "less noisy" relation between channels originated in network information theory (broadcast channels) and is defined in terms of mutual information or Kullback-Leibler divergence. We provide an equivalent characterization in terms of χ²-divergence. Furthermore, we develop a simple criterion for domination by a q-ary symmetric channel in terms of the minimum entry of the stochastic matrix defining the channel V. The criterion is strengthened for the special case of additive noise channels over finite Abelian groups. Finally, it is shown that domination by a symmetric channel implies (via comparison of Dirichlet forms) a logarithmic Sobolev inequality for the original channel. Comment: 31 pages, 2 figures. Presented at the 2017 IEEE International Symposium on Information Theory (ISIT).
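    For readers unfamiliar with the preorder involved: the KL-divergence form of the "less noisy" relation is standard, and the abstract's equivalent characterization replaces KL with χ². In our notation (W the dominating channel, V the given channel):

```latex
% W is less noisy than V iff, for every pair of input distributions P_X, Q_X,
D\!\left(P_X W \,\middle\|\, Q_X W\right) \;\geq\; D\!\left(P_X V \,\middle\|\, Q_X V\right),
% and, per the equivalent characterization referenced above, iff
\chi^2\!\left(P_X W \,\middle\|\, Q_X W\right) \;\geq\; \chi^2\!\left(P_X V \,\middle\|\, Q_X V\right).
```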

    Broadcasting on Random Directed Acyclic Graphs

    We study a generalization of the well-known model of broadcasting on trees. Consider a directed acyclic graph (DAG) with a unique source vertex X, and suppose all other vertices have indegree d ≥ 2. Let the vertices at distance k from X be called layer k. At layer 0, X is given a random bit. At layer k ≥ 1, each vertex receives d bits from its parents in layer k-1, which are transmitted along independent binary symmetric channel edges, and combines them using a d-ary Boolean processing function. The goal is to reconstruct X with probability of error bounded away from 1/2 using the values of all vertices at an arbitrarily deep layer. This question is closely related to models of reliable computation and storage, and information flow in biological networks. In this paper, we analyze randomly constructed DAGs, for which we show that broadcasting is only possible if the noise level is below a certain degree- and function-dependent critical threshold. For d ≥ 3, and random DAGs with layer sizes Ω(log k) and majority processing functions, we identify the critical threshold. For d = 2, we establish a similar result for NAND processing functions. We also prove a partial converse for odd d ≥ 3 illustrating that the identified thresholds are impossible to improve by selecting different processing functions if the decoder is restricted to using a single vertex. Finally, for any noise level, we construct explicit DAGs (using expander graphs) with bounded degree and layer sizes Θ(log k) admitting reconstruction. In particular, we show that such DAGs can be generated in deterministic quasi-polynomial time or randomized polylogarithmic time in the depth. These results portray a doubly-exponential advantage for storing a bit in DAGs compared to trees, where d = 1 but layer sizes must grow exponentially with depth in order to enable broadcasting. Comment: 33 pages, double column format. arXiv admin note: text overlap with arXiv:1803.0752
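    The random-DAG model described in this abstract is easy to simulate. Below is a minimal sketch (our own illustrative code with arbitrary toy parameters, including a constant layer size rather than the paper's Ω(log k)): every vertex draws d parents uniformly at random from the previous layer, receives their bits through BSC(δ) edges, and takes a majority vote.

```python
import random

def broadcast_sim(depth, layer_size, d=3, delta=0.1, seed=0):
    """Fraction of layer-`depth` vertices agreeing with the source bit.

    Toy model: layer 0 is a single source vertex holding bit 0; every vertex in
    later layers draws d parents uniformly at random from the previous layer,
    receives their bits through independent BSC(delta) edges, and outputs the
    majority of the received bits (d should be odd).
    """
    rng = random.Random(seed)
    layer = [0]  # the source bit
    for _ in range(depth):
        nxt = []
        for _ in range(layer_size):
            # Each received bit is a random parent's bit, flipped with prob. delta.
            received = [rng.choice(layer) ^ (1 if rng.random() < delta else 0)
                        for _ in range(d)]
            nxt.append(1 if sum(received) > d // 2 else 0)
        layer = nxt
    return layer.count(0) / layer_size
```

    With d = 3 and majority processing, low noise keeps the agreement fraction near 1, while noise approaching 1/2 drives it toward 1/2, consistent with the threshold behavior the abstract describes (the precise critical threshold is derived in the paper).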

    Probabilistic Clustering Using Maximal Matrix Norm Couplings

    In this paper, we present a local information theoretic approach to explicitly learn probabilistic clusterings of a discrete random variable. Our formulation yields a convex maximization problem for which it is NP-hard to find the global optimum. In order to algorithmically solve this optimization problem, we propose two relaxations that are solved via gradient ascent and alternating maximization. Experiments on the MSR Sentence Completion Challenge, MovieLens 100K, and Reuters-21578 datasets demonstrate that our approach is competitive with existing techniques and worthy of further investigation. Comment: Presented at the 56th Annual Allerton Conference on Communication, Control, and Computing, 201

    Broadcasting on Two-Dimensional Regular Grids

    We study a specialization of the problem of broadcasting on directed acyclic graphs, namely, broadcasting on 2D regular grids. Consider a 2D regular grid with source vertex X at layer 0 and k+1 vertices at layer k ≥ 1, which are at distance k from X. Every vertex of the 2D regular grid has outdegree 2, the vertices at the boundary have indegree 1, and all other vertices have indegree 2. At time 0, X is given a random bit. At time k ≥ 1, each vertex in layer k receives transmitted bits from its parents in layer k-1, where the bits pass through binary symmetric channels with noise level δ ∈ (0, 1/2). Then, each vertex combines its received bits using a common Boolean processing function to produce an output bit. The objective is to recover X with probability of error better than 1/2 from all vertices at layer k as k → ∞. Besides their natural interpretation in communication networks, such broadcasting processes can be construed as 1D probabilistic cellular automata (PCA) with boundary conditions that limit the number of sites at each time k to k+1. We conjecture that it is impossible to propagate information in a 2D regular grid regardless of the noise level and the choice of processing function. In this paper, we make progress towards establishing this conjecture, and prove using ideas from percolation and coding theory that recovery of X is impossible for any δ provided that all vertices use either AND or XOR processing functions. Furthermore, we propose a martingale-based approach that establishes the impossibility of recovering X for any δ when all NAND processing functions are used, if certain supermartingales can be rigorously constructed. We also provide numerical evidence for the existence of these supermartingales by computing explicit examples for different values of δ via linear programming. Comment: 52 pages, 2 figures
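    The XOR case of this model is simple enough to simulate, and it exhibits the linear (mod-2) structure that a coding-theoretic argument can exploit: each vertex is a mod-2 sum of the source bit and edge-noise bits, so flipping the source flips exactly the vertices i whose path count from the source, C(k, i), is odd (Pascal's triangle mod 2). The sketch below is our own illustrative code, not the paper's construction.

```python
import random

def grid_layer(depth, delta, x0, seed=0):
    """Bits at layer `depth` of the 2D regular grid with all-XOR processing.

    Layer k has k+1 vertices; vertex i of layer k receives noisy bits (through
    BSC(delta) edges) from vertices i-1 and i of layer k-1, boundary vertices
    having a single parent, and outputs the XOR of what it receives.
    """
    rng = random.Random(seed)
    layer = [x0]
    for k in range(1, depth + 1):
        nxt = []
        for i in range(k + 1):
            bit = 0
            for j in (i - 1, i):
                if 0 <= j < k:  # valid parent index in layer k-1 (which has size k)
                    bit ^= layer[j] ^ (1 if rng.random() < delta else 0)
            nxt.append(bit)
        layer = nxt
    return layer
```

    Running the simulation with the same seed for x0 = 0 and x0 = 1 confirms the parity pattern; at depth 8, for instance, only the two boundary vertices carry any source dependence mod 2, hinting at why reliable recovery fails for XOR at every noise level, as the paper proves.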

    Bounds between contraction coefficients

    In this paper, we delineate how the contraction coefficient of the strong data processing inequality for KL divergence can be used to learn likelihood models. We then present an alternative formulation that forces the input KL divergence to vanish, and achieves a contraction coefficient equivalent to the squared maximal correlation using a linear algebraic solution. To analyze the performance loss in using this simple but suboptimal procedure, we bound these coefficients in the discrete and finite regime, and prove their equivalence in the Gaussian regime.
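    For reference, the two contraction coefficients being compared are (standard definitions; ρ denotes the Hirschfeld-Gebelein-Rényi maximal correlation of (X, Y) with Y the channel output):

```latex
\eta_{\mathrm{KL}}(P_X, W) \;=\; \sup_{Q_X \neq P_X} \frac{D(Q_X W \,\|\, P_X W)}{D(Q_X \,\|\, P_X)},
\qquad
\eta_{\chi^2}(P_X, W) \;=\; \sup_{Q_X \neq P_X} \frac{\chi^2(Q_X W \,\|\, P_X W)}{\chi^2(Q_X \,\|\, P_X)} \;=\; \rho^2(X; Y).
```

    The "vanishing input divergence" formulation in the abstract corresponds to the local limit Q_X → P_X, which recovers η_χ² and hence the squared maximal correlation; the inequality η_χ²(P_X, W) ≤ η_KL(P_X, W) quantifies why this simpler coefficient is suboptimal.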

    A study of local approximations in information theory

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 171-173). The intractability of many information theoretic problems arises from the meaningful but nonlinear definition of Kullback-Leibler (KL) divergence between two probability distributions. Local information theory addresses this issue by assuming all distributions of interest are perturbations of certain reference distributions, and then approximating KL divergence with a squared weighted Euclidean distance, thereby linearizing such problems. We show that large classes of statistical divergence measures, such as f-divergences and Bregman divergences, can be approximated in an analogous manner by local metrics which are very similar in form. We then capture the cost of making local approximations of KL divergence instead of using its global value. This is achieved by appropriately bounding the tightness of the Data Processing Inequality in the local and global scenarios. This task turns out to be equivalent to bounding the chordal slope of the hypercontractivity ribbon at infinity and the Hirschfeld-Gebelein-Rényi maximal correlation with each other. We derive such bounds for the discrete and finite, as well as the Gaussian, regimes. An application of the local approximation technique is in understanding the large deviation behavior of sources and channels. We elucidate a source-channel decomposition of the large deviation characteristics of i.i.d. sources going through discrete memoryless channels. This is used to derive an additive Gaussian noise channel model for the local perturbations of probability distributions.
We next shift our focus to infinite alphabet channels instead of discrete and finite channels. On this front, existing literature has demonstrated that the singular vectors of additive white Gaussian noise channels are Hermite polynomials, and the singular vectors of Poisson channels are Laguerre polynomials. We characterize the set of infinite alphabet channels whose singular value decompositions produce singular vectors that are orthogonal polynomials by providing equivalent conditions on the conditional moments. In doing so, we also unveil the elegant relationship between certain natural exponential families with quadratic variance functions, their conjugate priors, and their corresponding orthogonal polynomial singular vectors. Finally, we propose various related directions for future research in the hope that our work will beget more research concerning local approximation methods in information theory. by Anuran Makur. S.M.
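    The central approximation in this thesis, KL divergence behaving locally like one half of the χ²-divergence, is easy to check numerically. The sketch below is a small self-contained illustration of this standard fact (our own code, not from the thesis):

```python
import math

def kl(p, q):
    """KL divergence D(p || q) for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def chi2(p, q):
    """Chi-squared divergence: the squared weighted Euclidean distance to q."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

# Reference distribution q and a small additive perturbation J with sum(J) = 0,
# so that p = q + J is again a distribution.
q = [0.2, 0.3, 0.5]
J = [0.01, -0.02, 0.01]
p = [qi + ji for qi, ji in zip(q, J)]

# Locally, D(p || q) ≈ (1/2) χ²(p || q); the error is third order in J.
print(kl(p, q), 0.5 * chi2(p, q))
```

    Shrinking J further makes the two quantities agree to ever more digits, which is exactly the linearization that turns divergence problems into weighted least-squares problems.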

    Information contraction and decomposition

    Thesis: Sc.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 327-350). Information contraction is one of the most fundamental concepts in information theory, as evidenced by the numerous classical converse theorems that utilize it. In this dissertation, we study several problems aimed at better understanding this notion, broadly construed, within the intertwined realms of information theory, statistics, and discrete probability theory. In information theory, the contraction of f-divergences, such as Kullback-Leibler (KL) divergence, χ²-divergence, and total variation (TV) distance, through channels (or the contraction of mutual f-information along Markov chains) is quantitatively captured by the well-known data processing inequalities. These inequalities can be tightened to produce "strong" data processing inequalities (SDPIs), which are obtained by introducing appropriate channel-dependent or source-channel-dependent "contraction coefficients." We first prove various properties of contraction coefficients of source-channel pairs, and derive linear bounds on specific classes of such contraction coefficients in terms of the contraction coefficient for χ²-divergence (or the Hirschfeld-Gebelein-Rényi maximal correlation). Then, we extend the notion of an SDPI for KL divergence by analyzing when a q-ary symmetric channel dominates a given channel in the "less noisy" sense.
Specifically, we develop sufficient conditions for less noisy domination using ideas of degradation and majorization, and strengthen these conditions for additive noise channels over finite Abelian groups. Furthermore, we also establish equivalent characterizations of the less noisy preorder over channels using non-linear operator convex f-divergences, and illustrate the relationship between less noisy domination and important functional inequalities such as logarithmic Sobolev inequalities. Next, adopting a more statistical and machine learning perspective, we elucidate the elegant geometry of SDPIs for χ²-divergence by developing modal decompositions of bivariate distributions based on singular value decompositions of conditional expectation operators. In particular, we demonstrate that maximal correlation functions meaningfully decompose the information contained in categorical bivariate data in a local information geometric sense and serve as suitable embeddings of this data into Euclidean spaces. Moreover, we propose an extension of the well-known alternating conditional expectations algorithm to estimate maximal correlation functions from training data for the purposes of feature extraction and dimensionality reduction. We then analyze the sample complexity of this algorithm using basic matrix perturbation theory and standard concentration of measure inequalities. On a related but tangential front, we also define and study the information capacity of permutation channels.
Finally, we consider the discrete probability problem of broadcasting on bounded indegree directed acyclic graphs (DAGs), which corresponds to examining the contraction of TV distance in Bayesian networks whose vertices combine their noisy input signals using Boolean processing functions. This generalizes the classical problem of broadcasting on trees and Ising models, and is closely related to results on reliable computation using noisy circuits, probabilistic cellular automata, and information flow in biological networks. Specifically, we establish phase transition phenomena for random DAGs which imply (via the probabilistic method) the existence of DAGs with logarithmic layer size where broadcasting is possible. We also use expander graphs to construct explicit DAGs where broadcasting is possible, in deterministic quasi-polynomial time or randomized polylogarithmic time in the depth. Lastly, we show that broadcasting is impossible for certain two-dimensional regular grids using techniques from percolation theory and coding theory. by Anuran Makur. Sc.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science.
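    The modal-decomposition viewpoint summarized above rests on a standard computation: for a bivariate pmf, the matrix B[x, y] = P[x, y] / sqrt(P_X[x] P_Y[y]) has largest singular value 1, and its second singular value is the Hirschfeld-Gebelein-Rényi maximal correlation, with the corresponding singular vectors giving the maximal correlation functions. A minimal sketch (our own code, assuming strictly positive marginals):

```python
import numpy as np

def maximal_correlation(P):
    """HGR maximal correlation of a joint pmf P[x, y] via SVD.

    Builds B[x, y] = P[x, y] / sqrt(P_X[x] * P_Y[y]); its singular values are
    1 = s_0 >= s_1 >= ..., and s_1 is the maximal correlation of (X, Y).
    Assumes all marginal probabilities are strictly positive.
    """
    Px = P.sum(axis=1)
    Py = P.sum(axis=0)
    B = P / np.sqrt(np.outer(Px, Py))
    s = np.linalg.svd(B, compute_uv=False)  # singular values in descending order
    return s[1]
```

    For example, a uniform input through a binary symmetric channel with crossover δ has maximal correlation 1 - 2δ, while an independent pair has maximal correlation 0 (B is rank one).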