14 research outputs found

    A Perspective on Unique Information: Directionality, Intuitions, and Secret Key Agreement

    Recently, the partial information decomposition emerged as a promising framework for identifying the meaningful components of the information contained in a joint distribution. Its adoption and practical application, however, have been stymied by the lack of a generally accepted method of quantifying its components. Here, we briefly discuss the bivariate (two-source) partial information decomposition and two implicitly directional interpretations used to intuitively motivate alternative component definitions. Drawing parallels with secret key agreement rates from information-theoretic cryptography, we demonstrate that these intuitions are mutually incompatible and suggest that this underlies the persistence of competing definitions and interpretations. Having highlighted this hitherto unacknowledged issue, we outline several possible solutions. Comment: 5 pages, 3 tables; http://csc.ucdavis.edu/~cmg/compmech/pubs/pid_intuition.ht

    Unique Information and Secret Key Agreement

    The partial information decomposition (PID) is a promising framework for decomposing a joint random variable into the amount of influence each source variable Xi has on a target variable Y, relative to the other sources. For two sources, influence breaks down into the information that both X0 and X1 redundantly share with Y, what X0 uniquely shares with Y, what X1 uniquely shares with Y, and finally what X0 and X1 synergistically share with Y. Unfortunately, considerable disagreement has arisen as to how these four components should be quantified. Drawing from cryptography, we consider the secret key agreement rate as an operational method of quantifying unique informations. Secret key agreement rate comes in several forms, depending upon which parties are permitted to communicate. We demonstrate that three of these four forms are inconsistent with the PID. The remaining form implies certain interpretations as to the PID's meaning---interpretations not present in PID's definition but that, we argue, need to be explicit. These reveal an inconsistency between third-order connected information, two-way secret key agreement rate, and synergy. Similar difficulties arise with a popular PID measure in light of the results here as well as from a maximum entropy viewpoint. We close by reviewing the challenges facing the PID. Comment: 9 pages, 3 figures, 4 tables; http://csc.ucdavis.edu/~cmg/compmech/pubs/pid_skar.htm. arXiv admin note: text overlap with arXiv:1808.0860
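    The four-way breakdown described in this abstract can be written as a small system of constraints. The sketch below uses the standard bivariate PID bookkeeping; the symbols R (redundant), U_0 and U_1 (unique), and S (synergistic) are labels assumed here for illustration and do not appear in the abstract.

```latex
\begin{align}
  I(X_0 X_1 ; Y) &= R + U_0 + U_1 + S, \\
  I(X_0 ; Y)     &= R + U_0, \\
  I(X_1 ; Y)     &= R + U_1 .
\end{align}
```

    These are three equations in four unknowns, so the mutual informations alone do not fix the components; one further quantity (for example, a definition of unique information such as a secret key agreement rate) is needed to close the system. As a quick check, if Y is the XOR of two independent fair bits, both single-source mutual informations are zero and the joint one is 1 bit, so with nonnegative components R = U_0 = U_1 = 0 and S = 1 bit.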

    Unique Information via Dependency Constraints

    The partial information decomposition (PID) is perhaps the leading proposal for resolving information shared between a set of sources and a target into redundant, synergistic, and unique constituents. Unfortunately, the PID framework has been hindered by a lack of a generally agreed-upon, multivariate method of quantifying the constituents. Here, we take a step toward rectifying this by developing a decomposition based on a new method that quantifies unique information. We first develop a broadly applicable method---the dependency decomposition---that delineates how statistical dependencies influence the structure of a joint distribution. The dependency decomposition then allows us to define a measure of the information about a target that can be uniquely attributed to a particular source as the least amount which the source-target statistical dependency can influence the information shared between the sources and the target. The result is the first measure that satisfies the core axioms of the PID framework while not satisfying the Blackwell relation, which depends on a particular interpretation of how the variables are related. This makes a key step forward to a practical PID. Comment: 15 pages, 7 figures, 2 tables, 3 appendices; http://csc.ucdavis.edu/~cmg/compmech/pubs/idep.ht

    Understanding Individual Neuron Importance Using Information Theory

    In this work, we investigate the use of three information-theoretic quantities -- entropy, mutual information with the class variable, and a class selectivity measure based on Kullback-Leibler divergence -- to understand and study the behavior of already trained fully-connected feed-forward neural networks. We analyze the connection between these information-theoretic quantities and classification performance on the test set by cumulatively ablating neurons in networks trained on MNIST, FashionMNIST, and CIFAR-10. Our results parallel those recently published by Morcos et al., indicating that class selectivity is not a good indicator for classification performance. However, looking at individual layers separately, both mutual information and class selectivity are positively correlated with classification performance, at least for networks with ReLU activation functions. We provide explanations for this phenomenon and conclude that it is ill-advised to compare the proposed information-theoretic quantities across layers. Finally, we briefly discuss future prospects of employing information-theoretic quantities for different purposes, including neuron pruning and studying the effect that different regularizers and architectures have on the trained neural network. We also draw connections to the information bottleneck theory of neural networks. Comment: 30 page
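    As a rough illustration of two of the quantities named in this abstract, the sketch below estimates the entropy of a single neuron's output and its mutual information with the class label from samples. The abstract does not say how activations are discretized; binarizing a ReLU output at zero is an assumption made here, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in (empirical) Shannon entropy in bits of a list of hashable samples."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def binarize(activations, threshold=0.0):
    """Assumed discretization: 1 if the neuron is active, else 0."""
    return [1 if a > threshold else 0 for a in activations]


# Hypothetical ReLU outputs of one neuron and the matching class labels.
activations = [0.0, 0.0, 1.3, 2.1, 0.0, 0.7, 0.0, 1.9]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

fired = binarize(activations)
neuron_entropy = entropy(fired)
neuron_mi = mutual_information(fired, labels)
print(neuron_entropy, neuron_mi)
```

    In this toy data the neuron fires exactly on class 1, so both estimates reach the maximum for a binary variable. Plug-in estimates of this kind are biased for small sample sizes, which is one practical reason to be careful when comparing values across neurons or layers.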

    Introducing a differentiable measure of pointwise shared information

    Partial information decomposition (PID) of the multivariate mutual information describes the distinct ways in which a set of source variables contains information about a target variable. The groundbreaking work of Williams and Beer has shown that this decomposition cannot be determined from classic information theory without making additional assumptions, and several candidate measures have been proposed, often drawing on principles from related fields such as decision theory. None of these measures is differentiable with respect to the underlying probability mass function. We here present a novel measure that satisfies this property, emerges solely from information-theoretic principles, and has the form of a local mutual information. We show how the measure can be understood from the perspective of exclusions of probability mass, a principle that is foundational to the original definition of the mutual information by Fano. Since our measure is well-defined for individual realizations of the random variables, it lends itself, for example, to local learning in artificial neural networks. We also show that it has a meaningful Möbius inversion on a redundancy lattice and obeys a target chain rule. We give an operational interpretation of the measure based on the decisions that an agent should take if given only the shared information. Comment: 19 pages, 6 figures; title modified, text modified, typos corrected, manuscript publishe

    A New Framework for Decomposing Multivariate Information

    What are the distinct ways in which a set of predictor variables can provide information about a target variable? When does a variable provide unique information, when do variables share redundant information, and when do variables combine synergistically to provide complementary information? The redundancy lattice from the partial information decomposition of Williams and Beer provided a promising glimpse at the answer to these questions. However, this structure was constructed using a much-criticised measure of redundant information, and despite sustained research, no completely satisfactory replacement measure has been proposed. This thesis presents a new framework for information decomposition that is based upon the decomposition of pointwise mutual information rather than mutual information. The framework is derived in two separate ways. The first of these derivations is based upon a modified version of the original axiomatic approach taken by Williams and Beer. However, to overcome the difficulty associated with signed pointwise mutual information, the decomposition is applied separately to the unsigned entropic components of pointwise mutual information which are referred to as the specificity and ambiguity. This yields a separate redundancy lattice for each component. Based upon an operational interpretation of redundancy, measures of redundant specificity and redundant ambiguity are defined which enables one to evaluate the partial information atoms separately for each lattice. These separate atoms can then be recombined to yield the sought-after multivariate information decomposition. This framework is applied to canonical examples from the literature and the results and various properties of the decomposition are discussed. In particular, the pointwise decomposition using specificity and ambiguity is shown to satisfy a chain rule over target variables, which provides new insights into the so-called two-bit-copy example. 
The second approach begins by considering the distinct ways in which two marginal observers can share their information with the non-observing individual third party. Several novel measures of information content are introduced, namely the union, intersection and unique information contents. Next, the algebraic structure of these new measures of shared marginal information is explored, and it is shown that the structure of shared marginal information is that of a distributive lattice. Furthermore, by using the fundamental theorem of distributive lattices, it is shown that these new measures are isomorphic to a ring of sets. Finally, by combining this structure together with the semi-lattice of joint information, the redundancy lattice from the partial information decomposition is found to be embedded within this larger algebraic structure. However, since this structure considers information contents, it is actually equivalent to the specificity lattice from the first derivation of pointwise partial information decomposition. The thesis then closes with a discussion about whether or not one should combine the information contents from the specificity and ambiguity lattices.

    Pointwise Partial Information Decomposition using the Specificity and Ambiguity Lattices

    What are the distinct ways in which a set of predictor variables can provide information about a target variable? When does a variable provide unique information, when do variables share redundant information, and when do variables combine synergistically to provide complementary information? The redundancy lattice from the partial information decomposition of Williams and Beer provided a promising glimpse at the answer to these questions. However, this structure was constructed using a much-criticised measure of redundant information, and despite sustained research, no completely satisfactory replacement measure has been proposed. In this paper, we take a different approach, applying the axiomatic derivation of the redundancy lattice to a single realisation from a set of discrete variables. To overcome the difficulty associated with signed pointwise mutual information, we apply this decomposition separately to the unsigned entropic components of pointwise mutual information, which we refer to as the specificity and ambiguity. This yields a separate redundancy lattice for each component. Then, based upon an operational interpretation of redundancy, we define measures of redundant specificity and ambiguity, enabling us to evaluate the partial information atoms in each lattice. These atoms can be recombined to yield the sought-after multivariate information decomposition. We apply this framework to canonical examples from the literature and discuss the results and the various properties of the decomposition. In particular, the pointwise decomposition using specificity and ambiguity satisfies a chain rule over target variables, which provides new insights into the so-called two-bit-copy example. Comment: 31 pages, 10 figures. (v1: preprint; v2: as accepted; v3: title corrected
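    The split of signed pointwise mutual information into two unsigned entropic components can be sketched in a few lines. The code assumes the convention i(s;t) = h(s) - h(s|t), with the specificity taken to be h(s) = -log2 p(s) and the ambiguity taken to be h(s|t) = -log2 p(s|t); this identification, the function name, and the toy distribution are assumptions for illustration and are not spelled out in the abstract.

```python
import math


def pointwise_terms(joint, s, t):
    """Return (specificity, ambiguity, pmi) in bits for one realisation (s, t).

    joint maps (source value, target value) pairs to probabilities.
    Assumed convention: specificity = h(s), ambiguity = h(s|t).
    """
    p_st = joint[(s, t)]
    p_s = sum(p for (si, _), p in joint.items() if si == s)
    p_t = sum(p for (_, ti), p in joint.items() if ti == t)
    specificity = -math.log2(p_s)        # h(s): surprise of the source value
    ambiguity = -math.log2(p_st / p_t)   # h(s|t): surprise left once t is known
    return specificity, ambiguity, specificity - ambiguity


# Hypothetical noisy binary channel: source and target agree 80% of the time.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

agree = pointwise_terms(joint, 0, 0)
disagree = pointwise_terms(joint, 0, 1)
print(agree)
print(disagree)
```

    For the agreeing realisation the pointwise mutual information is positive, and for the disagreeing one it is negative, yet the specificity and ambiguity are nonnegative in both cases. That is the property that makes it possible to decompose each component on its own lattice.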