    Optimal Kullback-Leibler Aggregation via Information Bottleneck

    In this paper, we present a method for reducing a regular, discrete-time Markov chain (DTMC) to another DTMC with a given, typically much smaller, number of states. The cost of reduction is defined as the Kullback-Leibler divergence rate between a projection of the original process through a partition function and a DTMC on the correspondingly partitioned state space. Finding the reduced model with minimal cost is computationally expensive, as it requires an exhaustive search among all state-space partitions and an exact evaluation of the reduction cost for each candidate partition. Our approach deals with the latter problem by minimizing an upper bound on the reduction cost instead of the exact cost; the proposed upper bound is easy to compute and is tight if the original chain is lumpable with respect to the partition. We then express the problem in the form of an information bottleneck optimization and propose using the agglomerative information bottleneck algorithm to search for a sub-optimal partition greedily rather than exhaustively. The theory is illustrated with examples and one application scenario in the context of modeling bio-molecular interactions. Comment: 13 pages, 4 figures.
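
    To make the greedy search concrete, the sketch below applies the standard agglomerative information bottleneck merge rule to chain aggregation, taking X as the current state, Y as the next state, and the stationary distribution as the state weights p(x); at each step it merges the pair of groups whose predictive next-state distributions are closest in the weighted Jensen-Shannon sense. These modeling choices and all names are illustrative assumptions, not the paper's exact bound or implementation.

        import numpy as np

        def kl(p, q):
            """KL divergence with the convention 0 * log(0 / q) = 0."""
            mask = p > 0
            return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

        def merge_cost(w_i, w_j, p_i, p_j):
            """Weighted Jensen-Shannon divergence between the predictive
            distributions of two groups: the information lost by merging them."""
            w = w_i + w_j
            m = (w_i * p_i + w_j * p_j) / w
            return w * (w_i / w * kl(p_i, m) + w_j / w * kl(p_j, m))

        def agglomerative_ib(P, pi, n_groups):
            """Greedy bottom-up aggregation of a DTMC.

            P  : (n, n) transition matrix, rows p(y | x)
            pi : stationary distribution, used as the state weights p(x)
            Returns n_groups lists of original state indices."""
            P = np.asarray(P, dtype=float)
            pi = np.asarray(pi, dtype=float)
            groups = [[i] for i in range(len(pi))]
            weights = list(pi)                               # p(t)
            preds = [P[i].copy() for i in range(len(pi))]    # p(y | t)
            while len(groups) > n_groups:
                # merge the pair that loses the least predictive information
                _, i, j = min(
                    (merge_cost(weights[a], weights[b], preds[a], preds[b]), a, b)
                    for a in range(len(groups)) for b in range(a + 1, len(groups)))
                w = weights[i] + weights[j]
                preds[i] = (weights[i] * preds[i] + weights[j] * preds[j]) / w
                weights[i] = w
                groups[i].extend(groups[j])
                del groups[j], weights[j], preds[j]
            return groups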

    Approximate inference in hidden Markov models using iterative active state selection


    Reduction of Markov Chains using a Value-of-Information-Based Approach

    In this paper, we propose an approach for obtaining reduced-order models of Markov chains. Our approach is composed of two information-theoretic processes. The first is a means of comparing pairs of stationary chains on different state spaces, which is done via the negative Kullback-Leibler divergence defined on a model joint space. Model reduction is achieved by solving a value-of-information criterion with respect to this divergence. Optimizing the criterion leads to a probabilistic partitioning of the states in the high-order Markov chain. A single free parameter that emerges through the optimization process dictates both the partition uncertainty and the number of state groups. We provide a data-driven means of choosing the 'optimal' value of this free parameter, which sidesteps the need to know a priori the number of state groups in an arbitrary chain. Comment: Submitted to Entropy.
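
    The value-of-information optimization itself is not spelled out in the abstract, so the sketch below only illustrates the general shape of such probabilistic partitions: a Gibbs-style fixed point in which a single parameter beta (standing in for the free parameter mentioned above) trades partition uncertainty against how well each group's predictive next-state distribution matches its members. The specific distortion (KL between transition rows) and all names are assumptions for illustration, not the paper's criterion.

        import numpy as np

        def soft_partition(P, pi, n_groups, beta, n_iters=200, seed=0):
            """Probabilistic partition of a DTMC's states.

            P[x, y] : transition matrix, pi[x] : stationary distribution.
            beta    : single free parameter; small values give diffuse
                      memberships and few effective groups, large values
                      a near-hard partition.
            Returns q[x, t] = q(t | x), the membership of state x in group t."""
            P = np.asarray(P, dtype=float)
            pi = np.asarray(pi, dtype=float)
            rng = np.random.default_rng(seed)
            q = rng.dirichlet(np.ones(n_groups), size=len(pi))    # q(t | x)
            log_rows = np.zeros_like(P)
            np.log(P, out=log_rows, where=P > 0)
            neg_entropy = (P * log_rows).sum(axis=1)              # sum_y p log p
            for _ in range(n_iters):
                w = np.clip(pi @ q, 1e-300, None)                 # q(t)
                proto = (q * pi[:, None]).T @ P / w[:, None]      # p(y | t)
                # d[x, t] = KL(P[x, :] || proto[t, :])
                d = neg_entropy[:, None] - P @ np.log(np.clip(proto, 1e-300, None)).T
                # Gibbs-style update: states join groups with similar dynamics
                q = w[None, :] * np.exp(-beta * (d - d.min(axis=1, keepdims=True)))
                q /= q.sum(axis=1, keepdims=True)
            return q

    Sweeping beta and counting the groups that retain non-negligible total membership (pi @ q) is one simple way to see how a single parameter can govern both the softness of the partition and the effective number of state groups, in the spirit of the data-driven selection mentioned above.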

    Empirical and Strong Coordination via Soft Covering with Polar Codes

    We design polar codes for empirical coordination and strong coordination in two-node networks. Our constructions hinge on the fact that polar codes enable explicit low-complexity schemes for soft covering. We leverage this property to propose explicit and low-complexity coding schemes that achieve the capacity regions of both empirical coordination and strong coordination for sequences of actions taking values in an alphabet of prime cardinality. Our results improve on previously known polar coding schemes, which (i) were restricted to uniform distributions and to actions obtained via binary symmetric channels for strong coordination, (ii) required a non-negligible amount of common randomness for empirical coordination, and (iii) assumed that the simulation of discrete memoryless channels could be perfectly implemented. As a by-product of our results, we obtain a polar coding scheme that achieves channel resolvability for an arbitrary discrete memoryless channel whose input alphabet has prime cardinality. Comment: 14 pages, two-column, 5 figures; accepted to IEEE Transactions on Information Theory.
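
    As background only: the low-complexity encoding that such constructions build on is, in the binary (q = 2) special case, the Arikan polar transform sketched below. The paper's schemes operate over prime-cardinality alphabets and add the coordination-specific selection of which positions carry information, randomization, or frozen symbols; none of that is reproduced here, and this sketch is not the authors' construction.

        def polar_transform(u):
            """Binary Arikan transform x = u * F_n over GF(2), where F_n is
            the n-fold Kronecker power of F = [[1, 0], [1, 1]] and len(u)
            is a power of two; O(N log N) by recursion."""
            n = len(u)
            if n == 1:
                return list(u)
            top = polar_transform(u[:n // 2])
            bottom = polar_transform(u[n // 2:])
            # x = [top XOR bottom, bottom], applied recursively
            return [a ^ b for a, b in zip(top, bottom)] + bottom

    For example, polar_transform([1, 0, 1, 1]) returns [1, 1, 0, 1].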

    Information theoretic novelty detection

    We present a novel approach to online change detection problems when the training sample size is small. The proposed approach is based on estimating the expected information content of a new data point and allows accurate control of the false positive rate even for small data sets. In the case of the Gaussian distribution, our approach is analytically tractable and closely related to classical statistical tests. We then propose an approximation scheme to extend our approach to the case of a mixture of Gaussians. We evaluate our approach extensively on synthetic data and on three real benchmark data sets. The experimental validation shows that our method maintains good overall accuracy while significantly improving control over the false positive rate.
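
    The Gaussian case mentioned above is closely related to classical Mahalanobis-distance tests. The sketch below, offered only as background, scores a new point by its information content (negative log density) under a Gaussian fitted to the training sample and thresholds the equivalent squared Mahalanobis distance at a chi-squared quantile; it assumes known parameters and does not reproduce the paper's small-sample calibration of the false positive rate.

        import numpy as np
        from scipy import stats

        def novelty_score(x, mu, cov):
            """Information content (surprise) of x under the fitted Gaussian:
            its negative log density, monotone in the Mahalanobis distance."""
            return -stats.multivariate_normal(mu, cov).logpdf(x)

        def is_novel(x, mu, cov, alpha=0.01):
            """Flag x as novel when its squared Mahalanobis distance exceeds
            the chi-squared quantile targeting a false positive rate of alpha
            (exact only when mu and cov are the true parameters)."""
            diff = np.asarray(x) - mu
            m2 = diff @ np.linalg.solve(cov, diff)
            return m2 > stats.chi2.ppf(1.0 - alpha, df=len(mu))

        # Fit on a (small) training sample X of shape (n, d), then test a point:
        # mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)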

    The information bottleneck method

    We define the relevant information in a signal $x \in X$ as being the information that this signal provides about another signal $y \in Y$. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal $x$ requires more than just predicting $y$; it also requires specifying which features of $X$ play a role in the prediction. We formalize this problem as that of finding a short code for $X$ that preserves the maximum information about $Y$. That is, we squeeze the information that $X$ provides about $Y$ through a `bottleneck' formed by a limited set of codewords $\tilde{X}$. This constrained optimization problem can be seen as a generalization of rate-distortion theory, in which the distortion measure $d(x, \tilde{x})$ emerges from the joint statistics of $X$ and $Y$. This approach yields an exact set of self-consistent equations for the coding rules $X \to \tilde{X}$ and $\tilde{X} \to Y$. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
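
    The self-consistent equations and the Blahut-Arimoto-style re-estimation described above can be iterated directly. The sketch below is a standard iterative information bottleneck routine for a finite joint distribution p(x, y), with the trade-off exposed through a parameter beta; the variable names, initialization, and fixed iteration count are illustrative choices, not taken from the paper.

        import numpy as np

        def information_bottleneck(pxy, n_codewords, beta, n_iters=300, seed=0):
            """Iterative IB for a finite joint distribution pxy[x, y] = p(x, y),
            assuming every x has positive probability. Alternates the three
            self-consistent equations:
                p(t | x)  proportional to  p(t) * exp(-beta * KL(p(y|x) || p(y|t)))
                p(t)      = sum_x p(x) p(t | x)
                p(y | t)  = sum_x p(y | x) p(x | t)
            Returns the soft encoder p(t | x) and the decoder p(y | t)."""
            pxy = np.asarray(pxy, dtype=float)
            px = pxy.sum(axis=1)
            py_x = pxy / px[:, None]                                  # p(y | x)
            rng = np.random.default_rng(seed)
            qt_x = rng.dirichlet(np.ones(n_codewords), size=len(px))  # p(t | x)
            log_py_x = np.zeros_like(py_x)
            np.log(py_x, out=log_py_x, where=py_x > 0)
            neg_entropy = (py_x * log_py_x).sum(axis=1)
            for _ in range(n_iters):
                qt = np.clip(px @ qt_x, 1e-300, None)                 # p(t)
                qy_t = (qt_x * px[:, None]).T @ py_x / qt[:, None]    # p(y | t)
                # KL(p(y|x) || p(y|t)) for every pair (x, t)
                kl = neg_entropy[:, None] - py_x @ np.log(np.clip(qy_t, 1e-300, None)).T
                qt_x = qt[None, :] * np.exp(-beta * (kl - kl.min(axis=1, keepdims=True)))
                qt_x /= qt_x.sum(axis=1, keepdims=True)
            return qt_x, qy_t

    Larger values of beta preserve more information about $Y$ at the price of a longer description of $X$; as beta shrinks, the encoder collapses toward a single codeword.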