
    On the Difference Between the Information Bottleneck and the Deep Information Bottleneck

    Combining the Information Bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proved successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the Deep Variational Information Bottleneck and the assumptions needed for its derivation. The two assumed properties of the data X, Y and their latent representation T take the form of two Markov chains T - X - Y and X - T - Y. Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions P(X, Y, T). We therefore show how to circumvent this limitation by optimising a lower bound for I(T; Y) for which only the latter Markov chain has to be satisfied. The actual mutual information consists of the lower bound which is optimised in DVIB and cognate models in practice, and of two terms measuring how much the former requirement T - X - Y is violated. Finally, we propose to interpret the family of information bottleneck models as directed graphical models and show that in this framework the original and deep information bottlenecks are special cases of a fundamental IB model.
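    The paper's two violation terms are derived in the full text; as a hedged illustration of the structure the abstract describes, the standard variational treatment (with q(y|t) an assumed variational decoder, notation not taken from the abstract) decomposes I(T; Y) into the bound optimised in practice plus a non-negative gap:

    ```latex
    % Standard variational decomposition of I(T;Y); q(y|t) is an assumed
    % variational decoder, not notation from the paper itself.
    \begin{align*}
    I(T;Y) &= H(Y) + \mathbb{E}_{p(t,y)}\bigl[\log p(y \mid t)\bigr] \\
           &= \underbrace{H(Y) + \mathbb{E}_{p(t,y)}\bigl[\log q(y \mid t)\bigr]}_{\text{lower bound optimised in DVIB-style models}}
            + \underbrace{\mathbb{E}_{p(t)}\bigl[\mathrm{KL}\bigl(p(y \mid t)\,\|\,q(y \mid t)\bigr)\bigr]}_{\ge\, 0}
    \end{align*}
    ```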

    Semantic Compression of Episodic Memories

    Storing knowledge of an agent's environment in the form of a probabilistic generative model has been established as a crucial ingredient in a multitude of cognitive tasks. Perception has been formalised as probabilistic inference over the state of latent variables, whereas in decision making the model of the environment is used to predict likely consequences of actions. Such generative models have earlier been proposed to underlie semantic memory, but it remained unclear whether the same model also underlies the efficient storage of experiences in episodic memory. We formalise the compression of episodes in the normative framework of information theory and argue that semantic memory provides the distortion function for the compression of experiences. Recent advances and insights from machine learning allow us to approximate semantic compression in naturalistic domains and to contrast the resulting deviations in compressed episodes with memory errors observed in the experimental literature on human memory.
    Comment: CogSci201
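    As a toy illustration of the rate-distortion reading of the abstract (all names and probabilities below are invented for the sketch; the paper's actual models are far richer), an episode can be compressed by choosing the gist that minimises code length under a semantic prior plus a distortion term supplied by the generative model:

    ```python
    import math

    # Toy semantic memory: prior probabilities of gist-level descriptions.
    # All names and numbers are illustrative, not from the paper.
    semantic_model = {
        "breakfast at home": 0.5,
        "breakfast at a cafe": 0.3,
        "breakfast on a train": 0.2,
    }

    # Toy likelihoods p(episode details | gist) from the generative model.
    likelihood = {
        "breakfast at home": {"coffee and toast in the kitchen": 0.7},
        "breakfast at a cafe": {"coffee and toast in the kitchen": 0.05},
        "breakfast on a train": {"coffee and toast in the kitchen": 0.01},
    }

    def rate(gist):
        # Rate: optimal code length of the gist under the semantic prior (bits).
        return -math.log2(semantic_model[gist])

    def distortion(episode, gist):
        # Distortion: how poorly the gist predicts the episode's details,
        # measured as negative log-likelihood under the generative model.
        return -math.log2(likelihood[gist].get(episode, 1e-6))

    def compress_episode(episode, beta=1.0):
        # Rate-distortion trade-off: pick the gist minimising
        # rate + beta * distortion.
        return min(semantic_model,
                   key=lambda g: rate(g) + beta * distortion(episode, g))

    print(compress_episode("coffee and toast in the kitchen"))
    # -> "breakfast at home"; details the gist fails to predict are the
    #    systematic distortions contrasted with human memory errors.
    ```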

    Learning Extremal Representations with Deep Archetypal Analysis

    Archetypes are typical population representatives in an extremal sense, where typicality is understood as the most extreme manifestation of a trait or feature. In linear feature space, archetypes approximate the convex hull of the data, allowing all data points to be expressed as convex mixtures of archetypes. However, it might not always be possible to identify meaningful archetypes in a given feature space. Learning an appropriate feature space and identifying suitable archetypes simultaneously addresses this problem. This paper introduces a generative formulation of the linear archetype model, parameterised by neural networks. By introducing the distance-dependent archetype loss, the linear archetype model can be integrated into the latent space of a variational autoencoder, and an optimal representation with respect to the unknown archetypes can be learned end-to-end. The reformulation of linear archetypal analysis as a deep variational information bottleneck allows the incorporation of arbitrarily complex side information during training. Furthermore, an alternative prior, based on a modified Dirichlet distribution, is proposed. The real-world applicability of the proposed method is demonstrated by exploring archetypes of female facial expressions, using multi-rater emotion scores of these expressions as side information. A second application illustrates the exploration of the chemical space of small organic molecules. In this experiment, it is demonstrated that exchanging the side information while keeping the same set of molecules, e.g. using the heat capacity of each molecule instead of the band gap energy, results in the identification of different archetypes. As an application, these learned representations of chemical space might reveal distinct starting points for de novo molecular design.
    Comment: Under review for publication at the International Journal of Computer Vision (IJCV). Extended version of our GCPR2019 paper "Deep Archetypal Analysis".
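    A minimal numpy sketch of the convex-mixture property the linear archetype model builds on (archetypes are assumed fixed and known here, whereas the paper learns them jointly with the representation; the function names below are illustrative):

    ```python
    import numpy as np

    def project_to_simplex(v):
        # Euclidean projection of v onto the probability simplex
        # (standard sorted-threshold algorithm).
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
        theta = (1.0 - css[rho]) / (rho + 1.0)
        return np.maximum(v + theta, 0.0)

    def convex_weights(x, Z, steps=500, lr=0.1):
        # Express x as a convex mixture of the archetypes Z: minimise
        # ||x - a @ Z||^2 over the simplex by projected gradient descent.
        k = Z.shape[0]
        a = np.full(k, 1.0 / k)
        for _ in range(steps):
            grad = 2.0 * (a @ Z - x) @ Z.T
            a = project_to_simplex(a - lr * grad)
        return a

    # Toy example: three archetypes in 2-D, one point inside their hull.
    Z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    x = np.array([0.3, 0.4])
    a = convex_weights(x, Z)
    print(a, a @ Z)  # weights lie on the simplex; a @ Z reconstructs x
    ```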