2,815 research outputs found

    Introducing a differentiable measure of pointwise shared information

    Partial information decomposition (PID) of the multivariate mutual information describes the distinct ways in which a set of source variables contains information about a target variable. The groundbreaking work of Williams and Beer has shown that this decomposition cannot be determined from classic information theory without making additional assumptions, and several candidate measures have been proposed, often drawing on principles from related fields such as decision theory. None of these measures is differentiable with respect to the underlying probability mass function. We here present a novel measure that satisfies this property, emerges solely from information-theoretic principles, and has the form of a local mutual information. We show how the measure can be understood from the perspective of exclusions of probability mass, a principle that is foundational to the original definition of the mutual information by Fano. Since our measure is well-defined for individual realizations of the random variables, it lends itself, for example, to local learning in artificial neural networks. We also show that it has a meaningful Möbius inversion on a redundancy lattice and obeys a target chain rule. We give an operational interpretation of the measure based on the decisions that an agent should take if given only the shared information.
    Comment: 19 pages, 6 figures; title modified, text modified, typos corrected, manuscript published
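For readers unfamiliar with the term, the "local mutual information" that the abstract above refers to is the standard pointwise quantity i(s; x) = log2 p(s, x) / (p(s) p(x)), defined for a single realization (s, x). The following is a minimal Python sketch of that generic quantity only; it is not the shared-information measure proposed in the paper, and the function names and the toy distribution are assumptions made for illustration.

```python
import math


def local_mutual_information(joint, s, x):
    """Pointwise mutual information i(s; x) in bits.

    `joint` is a dict mapping (s, x) pairs to probabilities
    (a hypothetical input format chosen for this sketch).
    """
    p_sx = joint[(s, x)]
    # Marginals are obtained by summing the joint over the other variable.
    p_s = sum(p for (si, _), p in joint.items() if si == s)
    p_x = sum(p for (_, xi), p in joint.items() if xi == x)
    return math.log2(p_sx / (p_s * p_x))


def mutual_information(joint):
    """Average mutual information: the expectation of the local values."""
    return sum(
        p * local_mutual_information(joint, s, x)
        for (s, x), p in joint.items()
        if p > 0
    )


# Toy example (assumed): a binary symmetric channel with 20% error rate.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

print(local_mutual_information(joint, 0, 0))  # positive: x = 0 makes s = 0 more likely
print(local_mutual_information(joint, 0, 1))  # negative: x = 1 is misinformative about s = 0
print(mutual_information(joint))              # the average is non-negative
```

Local values can be negative for individual realizations even though their average is not, which is one reason pointwise measures need separate care.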

    Pointwise differentiability of higher order for sets

    The present paper develops two concepts of pointwise differentiability of higher order for arbitrary subsets of Euclidean space defined by comparing their distance functions to those of smooth submanifolds. Results include that differentials are Borel functions, higher order rectifiability of the set of differentiability points, and a Rademacher result. One concept is characterised by a limit procedure involving inhomogeneously dilated sets. The original motivation to formulate the concepts stems from studying the support of stationary integral varifolds. In particular, strong pointwise differentiability of every positive integer order is shown at almost all points of the intersection of the support with a given plane.
    Comment: Description of subsequent work added to the introduction, references and affiliations updated, typographical corrections made; 34 pages

    On Sampling Strategies for Neural Network-based Collaborative Filtering

    Recent advances in neural networks have inspired people to design hybrid recommendation algorithms that can incorporate both (1) user-item interaction information and (2) content information including image, audio, and text. Despite their promising results, neural network-based recommendation algorithms incur substantial computational costs, making them challenging to scale and improve. In this paper, we propose a general neural network-based recommendation framework, which subsumes several existing state-of-the-art recommendation algorithms, and address the efficiency issue by investigating sampling strategies in the stochastic gradient descent training for the framework. We tackle this issue by first establishing a connection between the loss functions and the user-item interaction bipartite graph, where the loss function terms are defined on links while major computation burdens are located at nodes. We call this type of loss function a "graph-based" loss function, for which varied mini-batch sampling strategies can have different computational costs. Based on this insight, three novel sampling strategies are proposed, which can significantly improve the training efficiency of the proposed framework (up to a 30× speedup in our experiments), and also improve the recommendation performance. Theoretical analysis is also provided for both the computational cost and the convergence. We believe the study of sampling strategies has further implications for general graph-based loss functions, and would also enable more research under the neural network-based recommendation framework.
    Comment: This is a longer version (with supplementary attached) of the KDD'17 paper

    Measuring multivariate redundant information with pointwise common change in surprisal

    The problem of how to properly quantify redundant information is an open question that has been the subject of much recent research. Redundant information refers to information about a target variable S that is common to two or more predictor variables X_i. It can be thought of as quantifying overlapping information content or similarities in the representation of S between the X_i. We present a new measure of redundancy which measures the common change in surprisal shared between variables at the local or pointwise level. We provide a game-theoretic operational definition of unique information, and use this to derive constraints which are used to obtain a maximum entropy distribution. Redundancy is then calculated from this maximum entropy distribution by counting only those local co-information terms which admit an unambiguous interpretation as redundant information. We show how this redundancy measure can be used within the framework of the Partial Information Decomposition (PID) to give an intuitive decomposition of the multivariate mutual information into redundant, unique and synergistic contributions. We compare our new measure to existing approaches over a range of example systems, including continuous Gaussian variables. Matlab code for the measure is provided, including all considered examples.
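As a rough illustration of the "local co-information terms" mentioned in the abstract above, the sketch below computes the standard pointwise co-information c(x1; x2; s) = i(s; x1) + i(s; x2) - i(s; x1, x2) for two predictors. It covers only that textbook building block: the paper's maximum entropy step and its rule for selecting which terms count as redundant are not reproduced here, and the function names and toy distributions are assumptions for illustration (in Python rather than the Matlab code the abstract mentions).

```python
import math


def marginal(joint, keep):
    """Marginalize a joint pmf (dict of tuples -> prob) onto the indices in `keep`."""
    out = {}
    for event, p in joint.items():
        key = tuple(event[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def local_mi(joint, a_idx, b_idx, event):
    """Pointwise mutual information (bits) between the variable groups a_idx and b_idx."""
    a = tuple(event[i] for i in a_idx)
    b = tuple(event[i] for i in b_idx)
    p_ab = marginal(joint, a_idx + b_idx)[a + b]
    p_a = marginal(joint, a_idx)[a]
    p_b = marginal(joint, b_idx)[b]
    return math.log2(p_ab / (p_a * p_b))


def local_co_information(joint, event):
    """Pointwise co-information for events ordered as (x1, x2, s)."""
    s, x1, x2 = (2,), (0,), (1,)
    return (
        local_mi(joint, s, x1, event)
        + local_mi(joint, s, x2, event)
        - local_mi(joint, s, x1 + x2, event)
    )


# Assumed toy systems: both predictors copy S (redundant), and S = X1 XOR X2 (synergistic).
redundant = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
xor = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}

print(local_co_information(redundant, (0, 0, 0)))  # positive: overlapping information
print(local_co_information(xor, (0, 1, 1)))        # negative: synergy dominates
```

The sign of each local term is what makes its interpretation ambiguous in general, which is consistent with the abstract's statement that only some terms are counted as redundancy.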