Introducing a differentiable measure of pointwise shared information
Partial information decomposition (PID) of the multivariate mutual
information describes the distinct ways in which a set of source variables
contains information about a target variable. The groundbreaking work of
Williams and Beer has shown that this decomposition cannot be determined from
classic information theory without making additional assumptions, and several
candidate measures have been proposed, often drawing on principles from related
fields such as decision theory. None of these measures is differentiable with
respect to the underlying probability mass function. We here present a novel
measure that satisfies this property, emerges solely from information-theoretic
principles, and has the form of a local mutual information. We show how the
measure can be understood from the perspective of exclusions of probability
mass, a principle that is foundational to the original definition of the mutual
information by Fano. Since our measure is well-defined for individual
realizations of the random variables, it lends itself, for example, to local
learning in artificial neural networks. We also show that it has a meaningful
M\"{o}bius inversion on a redundancy lattice and obeys a target chain rule. We
give an operational interpretation of the measure based on the decisions that
an agent should take if given only the shared information.Comment: 19 pages, 6 figures; title modified, text modified, typos corrected,
manuscript publishe
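The "local mutual information" form mentioned in the abstract can be made concrete. Below is a minimal Python sketch of the pointwise quantity such a measure builds on: the local mutual information of a single realization, whose expectation recovers the classical mutual information. The joint distribution is illustrative, not taken from the paper.

```python
import math

def pointwise_mi(p_st, p_s, p_t):
    """Local (pointwise) mutual information for one realization (s, t):
    i(s; t) = log2[ p(s, t) / (p(s) * p(t)) ].
    Positive for informative event pairs, negative for misinformative ones;
    averaging over the joint pmf gives the usual mutual information I(S; T)."""
    return math.log2(p_st / (p_s * p_t))

# Illustrative joint pmf over two correlated binary variables.
p_joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_s = {0: 0.5, 1: 0.5}
p_t = {0: 0.5, 1: 0.5}

local = {(s, t): pointwise_mi(p, p_s[s], p_t[t]) for (s, t), p in p_joint.items()}
avg_mi = sum(p_joint[st] * local[st] for st in p_joint)  # classical I(S;T) in bits
```

Because the quantity is defined per realization (and is a smooth function of the pmf values), it can serve as a per-sample signal, which is what makes local learning rules conceivable.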
Pointwise differentiability of higher order for sets
The present paper develops two concepts of pointwise differentiability of
higher order for arbitrary subsets of Euclidean space defined by comparing
their distance functions to those of smooth submanifolds. Results include that
differentials are Borel functions, higher order rectifiability of the set of
differentiability points, and a Rademacher result. One concept is characterised
by a limit procedure involving inhomogeneously dilated sets.
The original motivation to formulate the concepts stems from studying the
support of stationary integral varifolds. In particular, strong pointwise
differentiability of every positive integer order is shown at almost all points
of the intersection of the support with a given plane.

Comment: Description of subsequent work added to the introduction, references and affiliations updated, typographical corrections made; 34 pages
On Sampling Strategies for Neural Network-based Collaborative Filtering
Recent advances in neural networks have inspired people to design hybrid
recommendation algorithms that can incorporate both (1) user-item interaction
information and (2) content information including image, audio, and text.
Despite their promising results, neural network-based recommendation algorithms
incur substantial computational costs, which makes them hard to scale and
improve upon. In this paper, we propose a general neural network-based recommendation
framework, which subsumes several existing state-of-the-art recommendation
algorithms, and address the efficiency issue by investigating sampling
strategies in the stochastic gradient descent training for the framework. We
tackle this issue by first establishing a connection between the loss functions
and the user-item interaction bipartite graph, where the loss function terms
are defined on links while major computation burdens are located at nodes. We
call these "graph-based" loss functions, for which different mini-batch
sampling strategies incur different computational costs. Based on
this insight, we propose three novel sampling strategies that can
significantly improve the training efficiency of the proposed framework (up to
times speedup in our experiments) as well as improve the
recommendation performance. We also provide theoretical analysis of both the
computational cost and the convergence. We believe the study of sampling
strategies has further implications for general graph-based loss functions and
will also enable more research under the neural network-based recommendation
framework.

Comment: This is a longer version (with supplementary material attached) of the KDD'17 paper
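The key observation above is that loss terms live on the links of the user-item bipartite graph while the expensive computation lives at the nodes. The sketch below illustrates that structure with link-sampled SGD using a BPR-style pairwise loss as a stand-in; it does not reproduce the paper's framework or its specific sampling strategies, and all data and names are illustrative.

```python
import math
import random

random.seed(0)

# Toy bipartite interaction graph: each link (user, item) contributes one loss term.
links = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)]
link_set = set(links)
n_users, n_items, dim = 3, 3, 4

U = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
V = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]

def score(u, i):
    return sum(a * b for a, b in zip(U[u], V[i]))

def sgd_step(lr=0.05):
    """Sample one link (the loss term sits on the edge) plus a uniform
    negative item, then update only the node embeddings that term touches."""
    u, i = random.choice(links)
    j = random.randrange(n_items)
    while (u, j) in link_set:
        j = random.randrange(n_items)
    # Gradient of -log sigmoid(score(u,i) - score(u,j)) w.r.t. the margin.
    g = 1.0 / (1.0 + math.exp(score(u, i) - score(u, j)))
    u_old = U[u][:]
    for k in range(dim):
        U[u][k] += lr * g * (V[i][k] - V[j][k])
        V[i][k] += lr * g * u_old[k]
        V[j][k] -= lr * g * u_old[k]

for _ in range(3000):
    sgd_step()

pos_mean = sum(score(u, i) for u, i in links) / len(links)
neg_pairs = [(u, j) for u in range(n_users) for j in range(n_items)
             if (u, j) not in link_set]
neg_mean = sum(score(u, j) for u, j in neg_pairs) / len(neg_pairs)
```

How positives and negatives are grouped into mini-batches changes how often each node's computation is repeated, which is exactly the cost knob the abstract's sampling strategies turn.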
Measuring multivariate redundant information with pointwise common change in surprisal
The problem of how to properly quantify redundant information is an open question that has been the subject of much recent research. Redundant information refers to information about a target variable S that is common to two or more predictor variables X_i. It can be thought of as quantifying overlapping information content or similarities in the representation of S between the X_i. We present a new measure of redundancy which measures the common change in surprisal shared between variables at the local or pointwise level. We provide a game-theoretic operational definition of unique information, and use this to derive constraints which are used to obtain a maximum entropy distribution. Redundancy is then calculated from this maximum entropy distribution by counting only those local co-information terms which admit an unambiguous interpretation as redundant information. We show how this redundancy measure can be used within the framework of the Partial Information Decomposition (PID) to give an intuitive decomposition of the multivariate mutual information into redundant, unique and synergistic contributions. We compare our new measure to existing approaches over a range of example systems, including continuous Gaussian variables. Matlab code for the measure is provided, including all considered examples.
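The local co-information terms the abstract counts can be illustrated directly. The Python sketch below computes the pointwise co-information i(s; x1) + i(s; x2) - i(s; (x1, x2)) on a fully redundant toy system (X1 = X2 = S, with S uniform and binary), where it equals 1 bit at every realization, matching H(S). The maximum-entropy construction and the sign-based filtering of the actual measure are omitted; the distribution is illustrative.

```python
import math
from collections import defaultdict

# Fully redundant toy system: X1 = X2 = S, S uniform on {0, 1}.
# Joint pmf p(s, x1, x2) is supported on the two "copy" events only.
p = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

def marg(keep):
    """Marginal pmf over the coordinates listed in `keep` (0=S, 1=X1, 2=X2)."""
    out = defaultdict(float)
    for event, pr in p.items():
        out[tuple(event[k] for k in keep)] += pr
    return out

p_s, p_x1, p_x2 = marg([0]), marg([1]), marg([2])
p_sx1, p_sx2 = marg([0, 1]), marg([0, 2])
p_x12, p_sx12 = marg([1, 2]), marg([0, 1, 2])

def local_mi(pj, pa, pb):
    """Pointwise mutual information in bits."""
    return math.log2(pj / (pa * pb))

def local_coinfo(s, x1, x2):
    """Local co-information: i(s;x1) + i(s;x2) - i(s;(x1,x2))."""
    return (local_mi(p_sx1[(s, x1)], p_s[(s,)], p_x1[(x1,)])
            + local_mi(p_sx2[(s, x2)], p_s[(s,)], p_x2[(x2,)])
            - local_mi(p_sx12[(s, x1, x2)], p_s[(s,)], p_x12[(x1, x2)]))

# For the copy system every supported event contributes 1 bit of overlap,
# so the expectation equals H(S) = 1 bit.
redundancy = sum(pr * local_coinfo(*e) for e, pr in p.items())
```

In less symmetric systems, individual local co-information terms can be negative or ambiguous in sign, which is why the measure restricts the sum to terms with an unambiguous redundant interpretation.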