A Perspective on Unique Information: Directionality, Intuitions, and Secret Key Agreement
Recently, the partial information decomposition emerged as a promising
framework for identifying the meaningful components of the information
contained in a joint distribution. Its adoption and practical application,
however, have been stymied by the lack of a generally accepted method of
quantifying its components. Here, we briefly discuss the bivariate (two-source)
partial information decomposition and two implicitly directional
interpretations used to intuitively motivate alternative component definitions.
Drawing parallels with secret key agreement rates from information-theoretic
cryptography, we demonstrate that these intuitions are mutually incompatible
and suggest that this underlies the persistence of competing definitions and
interpretations. Having highlighted this hitherto unacknowledged issue, we
outline several possible solutions.
Comment: 5 pages, 3 tables; http://csc.ucdavis.edu/~cmg/compmech/pubs/pid_intuition.ht
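The bivariate decomposition discussed above can be made concrete with a small calculation. The sketch below (plain Python; the distribution and function names are our own, not from the paper) computes the three classical mutual informations that constrain the four PID components, for the two-bit-copy distribution:

```python
import math
from itertools import product

def mutual_information(joint, sources, target):
    """I(sources; target) for a joint pmf given as {outcome_tuple: probability}."""
    def marg(idx):
        # Marginalize the joint pmf onto the variables at positions `idx`.
        m = {}
        for outcome, p in joint.items():
            key = tuple(outcome[i] for i in idx)
            m[key] = m.get(key, 0.0) + p
        return m
    ps, pt, pst = marg(sources), marg(target), marg(sources + target)
    return sum(p * math.log2(p / (ps[k[:len(sources)]] * pt[k[len(sources):]]))
               for k, p in pst.items() if p > 0)

# Two-bit copy: the target Y is the pair (X0, X1) of independent fair bits.
joint = {(x0, x1, (x0, x1)): 0.25 for x0, x1 in product((0, 1), repeat=2)}

i_x0 = mutual_information(joint, (0,), (2,))      # = R + U0 = 1 bit
i_x1 = mutual_information(joint, (1,), (2,))      # = R + U1 = 1 bit
i_both = mutual_information(joint, (0, 1), (2,))  # = R + U0 + U1 + S = 2 bits
```

The atoms satisfy I(X0;Y) = R + U0, I(X1;Y) = R + U1, and I(X0,X1;Y) = R + U0 + U1 + S, but these three equations leave one degree of freedom; fixing it requires a fourth defining measure, which is precisely where the competing definitions diverge.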
Unique Information and Secret Key Agreement
The partial information decomposition (PID) is a promising framework for
decomposing a joint random variable into the amount of influence each source
variable Xi has on a target variable Y, relative to the other sources. For two
sources, influence breaks down into the information that both X0 and X1
redundantly share with Y, what X0 uniquely shares with Y, what X1 uniquely
shares with Y, and finally what X0 and X1 synergistically share with Y.
Unfortunately, considerable disagreement has arisen as to how these four
components should be quantified. Drawing from cryptography, we consider the
secret key agreement rate as an operational method of quantifying unique
informations. Secret key agreement rate comes in several forms, depending upon
which parties are permitted to communicate. We demonstrate that three of these
four forms are inconsistent with the PID. The remaining form implies certain
interpretations as to the PID's meaning---interpretations not present in the
PID's definition but that, we argue, need to be explicit. These reveal an
inconsistency between third-order connected information, two-way secret key
agreement rate, and synergy. Similar difficulties arise with a popular PID
measure in light of the results here, as well as from a maximum entropy
viewpoint. We close by reviewing the challenges facing the PID.
Comment: 9 pages, 3 figures, 4 tables; http://csc.ucdavis.edu/~cmg/compmech/pubs/pid_skar.htm. arXiv admin note: text overlap with arXiv:1808.0860
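To illustrate why a separate notion of synergy is needed at all, consider the XOR distribution: classical mutual information assigns zero bits to each source alone, yet one bit to the pair. A minimal sketch (our own toy code, not taken from the paper):

```python
import math

def mi(pairs):
    """I(X;Y) from a joint pmf over (x, y) pairs."""
    px, py = {}, {}
    for (x, y), p in pairs.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in pairs.items() if p > 0)

# XOR gate: Y = X0 ^ X1, with X0 and X1 independent fair bits.
joint = [(x0, x1, x0 ^ x1, 0.25) for x0 in (0, 1) for x1 in (0, 1)]

def project(fst, snd):
    # Collapse the full joint onto a pmf over (fst(outcome), snd(outcome)).
    out = {}
    for x0, x1, y, p in joint:
        key = (fst((x0, x1, y)), snd((x0, x1, y)))
        out[key] = out.get(key, 0.0) + p
    return out

i_x0 = mi(project(lambda v: v[0], lambda v: v[2]))             # I(X0;Y) = 0
i_x1 = mi(project(lambda v: v[1], lambda v: v[2]))             # I(X1;Y) = 0
i_joint = mi(project(lambda v: (v[0], v[1]), lambda v: v[2]))  # I(X0,X1;Y) = 1
```

The whole bit of target information appears only when the sources are considered jointly, which is the behaviour any candidate synergy measure must capture.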
Unique Information via Dependency Constraints
The partial information decomposition (PID) is perhaps the leading proposal
for resolving information shared between a set of sources and a target into
redundant, synergistic, and unique constituents. Unfortunately, the PID
framework has been hindered by the lack of a generally agreed-upon multivariate
method of quantifying the constituents. Here, we take a step toward rectifying
this by developing a decomposition based on a new method that quantifies unique
information. We first develop a broadly applicable method---the dependency
decomposition---that delineates how statistical dependencies influence the
structure of a joint distribution. The dependency decomposition then allows us
to define a measure of the information about a target that can be uniquely
attributed to a particular source as the least amount which the source-target
statistical dependency can influence the information shared between the sources
and the target. The result is the first measure that satisfies the core axioms
of the PID framework while not satisfying the Blackwell relation, which depends
on a particular interpretation of how the variables are related. This marks a
key step toward a practical PID.
Comment: 15 pages, 7 figures, 2 tables, 3 appendices; http://csc.ucdavis.edu/~cmg/compmech/pubs/idep.ht
Understanding Individual Neuron Importance Using Information Theory
In this work, we investigate the use of three information-theoretic
quantities -- entropy, mutual information with the class variable, and a class
selectivity measure based on Kullback-Leibler divergence -- to understand and
study the behavior of already trained fully-connected feed-forward neural
networks. We analyze the connection between these information-theoretic
quantities and classification performance on the test set by cumulatively
ablating neurons in networks trained on MNIST, FashionMNIST, and CIFAR-10. Our
results parallel those recently published by Morcos et al., indicating that
class selectivity is not a good indicator for classification performance.
However, looking at individual layers separately, both mutual information and
class selectivity are positively correlated with classification performance, at
least for networks with ReLU activation functions. We provide explanations for
this phenomenon and conclude that it is ill-advised to compare the proposed
information-theoretic quantities across layers. Finally, we briefly discuss
future prospects of employing information-theoretic quantities for different
purposes, including neuron pruning and studying the effect that different
regularizers and architectures have on the trained neural network. We also draw
connections to the information bottleneck theory of neural networks.
Comment: 30 pages
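As a rough illustration of the per-neuron quantities this abstract works with, the sketch below estimates entropy and mutual information with the class variable from quantized activations. The random data, bin count, and function names are illustrative assumptions of ours, not the authors' exact experimental protocol:

```python
import numpy as np

# Stand-ins for recorded layer activations (n_samples x n_neurons) and labels.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 8))
labels = rng.integers(0, 10, size=1000)

def neuron_entropy_and_mi(a, y, bins=30):
    """Entropy H(A) and mutual information I(A;Y) of one quantized neuron."""
    edges = np.histogram_bin_edges(a, bins=bins)
    q = np.digitize(a, edges)                   # discretize the activations
    joint = np.zeros((q.max() + 1, y.max() + 1))
    for qi, yi in zip(q, y):
        joint[qi, yi] += 1.0
    joint /= joint.sum()                        # empirical joint pmf p(a, y)
    pa, py = joint.sum(axis=1), joint.sum(axis=0)
    h = -np.sum(pa[pa > 0] * np.log2(pa[pa > 0]))
    mask = joint > 0
    mi = np.sum(joint[mask] * np.log2(joint[mask] / np.outer(pa, py)[mask]))
    return h, mi

h0, mi0 = neuron_entropy_and_mi(acts[:, 0], labels)
```

Since the labels here are independent of the activations, the estimated I(A;Y) is small (nonzero only through finite-sample bias), while H(A) reflects the spread of the quantized neuron; comparing such quantities across layers is exactly what the paper cautions against.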
Introducing a differentiable measure of pointwise shared information
Partial information decomposition (PID) of the multivariate mutual
information describes the distinct ways in which a set of source variables
contains information about a target variable. The groundbreaking work of
Williams and Beer has shown that this decomposition cannot be determined from
classic information theory without making additional assumptions, and several
candidate measures have been proposed, often drawing on principles from related
fields such as decision theory. None of these measures is differentiable with
respect to the underlying probability mass function. We here present a novel
measure that satisfies this property, emerges solely from information-theoretic
principles, and has the form of a local mutual information. We show how the
measure can be understood from the perspective of exclusions of probability
mass, a principle that is foundational to the original definition of the mutual
information by Fano. Since our measure is well-defined for individual
realizations of the random variables, it lends itself, for example, to local
learning in artificial neural networks. We also show that it has a meaningful
M\"{o}bius inversion on a redundancy lattice and obeys a target chain rule. We
give an operational interpretation of the measure based on the decisions that
an agent should take if given only the shared information.
Comment: 19 pages, 6 figures; title modified, text modified, typos corrected, manuscript published
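The local (pointwise) mutual information this abstract builds on is straightforward to compute. The toy sketch below (our own example distribution) shows that individual realizations can carry negative, i.e. misinformative, values even though their average — the classical mutual information — is non-negative:

```python
import math

# A toy joint pmf p(x, y) with correlated variables.
joint = {('a', 0): 0.4, ('a', 1): 0.1, ('b', 0): 0.1, ('b', 1): 0.4}

# Marginals p(x) and p(y).
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

# Pointwise mutual information i(x;y) = log2 p(x,y) / (p(x) p(y)).
pmi = {(x, y): math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items()}

# Averaging the pointwise values recovers the (non-negative) I(X;Y).
avg_mi = sum(joint[k] * pmi[k] for k in joint)
```

Here the off-diagonal realizations such as ('a', 1) have negative pointwise values — observing them makes the wrong target more plausible — which is exactly the sign problem that a pointwise decomposition must handle.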
A New Framework for Decomposing Multivariate Information
What are the distinct ways in which a set of predictor variables can provide information about a target variable? When does a variable provide unique information, when do variables share redundant information, and when do variables combine synergistically to provide complementary information? The redundancy lattice from the partial information decomposition of Williams and Beer provided a promising glimpse at the answer to these questions. However, this structure was constructed using a much-criticised measure of redundant information, and despite sustained research, no completely satisfactory replacement measure has been proposed.

This thesis presents a new framework for information decomposition that is based upon the decomposition of pointwise mutual information rather than mutual information. The framework is derived in two separate ways. The first of these derivations is based upon a modified version of the original axiomatic approach taken by Williams and Beer. However, to overcome the difficulty associated with signed pointwise mutual information, the decomposition is applied separately to the unsigned entropic components of pointwise mutual information, which are referred to as the specificity and ambiguity. This yields a separate redundancy lattice for each component. Based upon an operational interpretation of redundancy, measures of redundant specificity and redundant ambiguity are defined, which enables one to evaluate the partial information atoms separately for each lattice. These separate atoms can then be recombined to yield the sought-after multivariate information decomposition. This framework is applied to canonical examples from the literature, and the results and various properties of the decomposition are discussed. In particular, the pointwise decomposition using specificity and ambiguity is shown to satisfy a chain rule over target variables, which provides new insights into the so-called two-bit-copy example.

The second approach begins by considering the distinct ways in which two marginal observers can share their information with a non-observing third party. Several novel measures of information content are introduced, namely the union, intersection, and unique information contents. Next, the algebraic structure of these new measures of shared marginal information is explored, and it is shown that the structure of shared marginal information is that of a distributive lattice. Furthermore, by using the fundamental theorem of distributive lattices, it is shown that these new measures are isomorphic to a ring of sets. Finally, by combining this structure with the semi-lattice of joint information, the redundancy lattice from partial information decomposition is found to be embedded within this larger algebraic structure. However, since this structure considers information contents, it is actually equivalent to the specificity lattice from the first derivation of pointwise partial information decomposition. The thesis then closes with a discussion about whether or not one should combine the information contents from the specificity and ambiguity lattices.
Pointwise Partial Information Decomposition using the Specificity and Ambiguity Lattices
What are the distinct ways in which a set of predictor variables can provide
information about a target variable? When does a variable provide unique
information, when do variables share redundant information, and when do
variables combine synergistically to provide complementary information? The
redundancy lattice from the partial information decomposition of Williams and
Beer provided a promising glimpse at the answer to these questions. However,
this structure was constructed using a much-criticised measure of redundant
information, and despite sustained research, no completely satisfactory
replacement measure has been proposed. In this paper, we take a different
approach, applying the axiomatic derivation of the redundancy lattice to a
single realisation from a set of discrete variables. To overcome the difficulty
associated with signed pointwise mutual information, we apply this
decomposition separately to the unsigned entropic components of pointwise
mutual information which we refer to as the specificity and ambiguity. This
yields a separate redundancy lattice for each component. Then based upon an
operational interpretation of redundancy, we define measures of redundant
specificity and ambiguity enabling us to evaluate the partial information atoms
in each lattice. These atoms can be recombined to yield the sought-after
multivariate information decomposition. We apply this framework to canonical
examples from the literature and discuss the results and the various properties
of the decomposition. In particular, the pointwise decomposition using
specificity and ambiguity satisfies a chain rule over target variables, which
provides new insights into the so-called two-bit-copy example.
Comment: 31 pages, 10 figures. (v1: preprint; v2: as accepted; v3: title corrected)
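The specificity/ambiguity split can be illustrated numerically. In the sketch below (a toy distribution of our own choosing), each realization's pointwise mutual information i(s;t) is written as the difference of two non-negative entropic terms: the specificity h(t) = -log2 p(t) and the ambiguity h(t|s) = -log2 p(t|s):

```python
import math

# A toy joint pmf p(s, t) over a source S and target T.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(s) and p(t).
ps, pt = {}, {}
for (s, t), p in joint.items():
    ps[s] = ps.get(s, 0.0) + p
    pt[t] = pt.get(t, 0.0) + p

rows = {}
for (s, t), p in joint.items():
    spec = -math.log2(pt[t])       # specificity h(t): always >= 0
    amb = -math.log2(p / ps[s])    # ambiguity h(t|s): always >= 0
    rows[(s, t)] = (spec, amb, spec - amb)  # difference = pointwise MI i(s;t)
```

Both components are unsigned, so a separate redundancy lattice can be built for each, even though their difference — as at the misinformative realization (s, t) = (0, 1) here — can be negative.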