    Optimal Kullback-Leibler Aggregation via Information Bottleneck

    In this paper, we present a method for reducing a regular, discrete-time Markov chain (DTMC) to another DTMC with a given, typically much smaller number of states. The cost of reduction is defined as the Kullback-Leibler divergence rate between a projection of the original process through a partition function and a DTMC on the correspondingly partitioned state space. Finding the reduced model with minimal cost is computationally expensive, as it requires an exhaustive search among all state space partitions, and an exact evaluation of the reduction cost for each candidate partition. Our approach deals with the latter problem by minimizing an upper bound on the reduction cost instead of minimizing the exact cost. The proposed upper bound is easy to compute, and it is tight if the original chain is lumpable with respect to the partition. Then, we express the problem in the form of information bottleneck optimization, and propose using the agglomerative information bottleneck algorithm to search for a sub-optimal partition greedily, rather than exhaustively. The theory is illustrated with examples and one application scenario in the context of modeling bio-molecular interactions. Comment: 13 pages, 4 figures
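
    The notions of aggregation and lumpability used above can be illustrated with a minimal sketch. It assumes that the reduced chain is built by weighting the rows of each block with the stationary distribution, which is one common choice and not necessarily the exact construction of the paper. The function names (`stationary`, `aggregate`, `is_lumpable`) and the example matrix `P` are hypothetical.

```python
def stationary(P, iters=10_000, tol=1e-13):
    """Approximate the stationary distribution of a regular chain by power iteration."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    return mu


def aggregate(P, partition):
    """Reduced chain on the blocks; rows are weighted by the stationary law (assumption)."""
    mu = stationary(P)
    k = max(partition) + 1
    mass = [0.0] * k
    Q = [[0.0] * k for _ in range(k)]
    for x, bx in enumerate(partition):
        mass[bx] += mu[x]
        for y, by in enumerate(partition):
            Q[bx][by] += mu[x] * P[x][y]
    return [[q / mass[b] for q in row] for b, row in enumerate(Q)]


def is_lumpable(P, partition, tol=1e-9):
    """Strong lumpability: all states of a block send the same mass to every block."""
    k = max(partition) + 1
    reference = {}
    for x, bx in enumerate(partition):
        row = [0.0] * k
        for y, by in enumerate(partition):
            row[by] += P[x][y]
        ref = reference.setdefault(bx, row)
        if any(abs(a - b) > tol for a, b in zip(ref, row)):
            return False
    return True


# Hypothetical 3-state chain; partition[x] is the block index of state x.
P = [[0.5, 0.25, 0.25],
     [0.2, 0.4, 0.4],
     [0.2, 0.3, 0.5]]
Q = aggregate(P, [0, 1, 1])
```

    In this example states 1 and 2 move to block 0 with the same probability, so the partition `[0, 1, 1]` is lumpable, whereas `[0, 0, 1]` is not.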

    The information bottleneck method

    We define the relevant information in a signal $x \in X$ as being the information that this signal provides about another signal $y \in Y$. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal $x$ requires more than just predicting $y$; it also requires specifying which features of $X$ play a role in the prediction. We formalize this problem as that of finding a short code for $X$ that preserves the maximum information about $Y$. That is, we squeeze the information that $X$ provides about $Y$ through a 'bottleneck' formed by a limited set of codewords $\tilde{X}$. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure $d(x,\tilde{x})$ emerges from the joint statistics of $X$ and $Y$. This approach yields an exact set of self-consistent equations for the coding rules $X \to \tilde{X}$ and $\tilde{X} \to Y$. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
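
    The re-estimation scheme mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the commonly quoted form of the updates, in which the encoder is proportional to p(t) exp(-beta * KL(p(y|x) || p(y|t))), with `t` standing for a codeword and `beta` for a trade-off parameter that the abstract does not name. The joint distribution `pxy` is hypothetical and assumed strictly positive.

```python
import math
import random


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; q must be positive where p is."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def mutual_information(joint):
    """Mutual information in nats of a joint distribution given as a list of rows."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(v * math.log(v / (pa[i] * pb[j]))
               for i, row in enumerate(joint) for j, v in enumerate(row) if v > 0)


def information_bottleneck(pxy, n_codes, beta, iters=100, seed=0):
    """Alternate the encoder, codeword-marginal and decoder updates; return p(t|x)."""
    rng = random.Random(seed)
    nx, ny = len(pxy), len(pxy[0])
    px = [sum(row) for row in pxy]
    py_x = [[v / px[x] for v in pxy[x]] for x in range(nx)]
    enc = []  # enc[x][t] = p(t|x), initialised as a random soft assignment
    for _ in range(nx):
        w = [rng.random() + 0.1 for _ in range(n_codes)]
        enc.append([v / sum(w) for v in w])
    for _ in range(iters):
        pt = [sum(px[x] * enc[x][t] for x in range(nx)) for t in range(n_codes)]
        live = [t for t in range(n_codes) if pt[t] > 1e-12]  # skip emptied codewords
        py_t = {t: [sum(px[x] * enc[x][t] * py_x[x][y] for x in range(nx)) / pt[t]
                    for y in range(ny)] for t in live}
        new_enc = []
        for x in range(nx):
            w = [0.0] * n_codes
            for t in live:
                w[t] = pt[t] * math.exp(-beta * kl(py_x[x], py_t[t]))
            total = sum(w)
            new_enc.append([v / total for v in w])
        enc = new_enc
    return enc


# Hypothetical joint distribution over 4 values of x and 2 values of y.
pxy = [[0.20, 0.05], [0.18, 0.07], [0.05, 0.20], [0.07, 0.18]]
enc = information_bottleneck(pxy, n_codes=2, beta=5.0)
```

    Each row of `enc` is a distribution over codewords for one value of x. By the data-processing inequality, the information the codewords carry about y cannot exceed the information that x carries about y.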

    Information-Distilling Quantizers

    Let $X$ and $Y$ be dependent random variables. This paper considers the problem of designing a scalar quantizer for $Y$ to maximize the mutual information between the quantizer's output and $X$, and develops fundamental properties and bounds for this form of quantization, which is connected to the log-loss distortion criterion. The main focus is the regime of low $I(X;Y)$, where it is shown that, if $X$ is binary, a constant fraction of the mutual information can always be preserved using $\mathcal{O}(\log(1/I(X;Y)))$ quantization levels, and there exist distributions for which this many quantization levels are necessary. Furthermore, for larger finite alphabets $2 < |\mathcal{X}| < \infty$, it is established that an $\eta$-fraction of the mutual information can be preserved using roughly $(\log(|\mathcal{X}|/I(X;Y)))^{\eta\cdot(|\mathcal{X}|-1)}$ quantization levels.
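
    The design problem stated above can be made concrete with a small brute-force sketch. It simply enumerates every deterministic map from the values of Y to a fixed number of levels and keeps the one with the largest mutual information; this is only feasible for tiny alphabets and is not a method from the paper. The function names and the joint distribution `pxy` are hypothetical.

```python
import itertools
import math


def mutual_information(joint):
    """Mutual information in nats of a joint distribution given as a list of rows."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(v * math.log(v / (pa[i] * pb[j]))
               for i, row in enumerate(joint) for j, v in enumerate(row) if v > 0)


def best_quantizer(pxy, levels):
    """Exhaustively search maps q: Y -> {0, ..., levels-1} maximizing I(X; q(Y))."""
    nx, ny = len(pxy), len(pxy[0])
    best_mi, best_q = -1.0, None
    for q in itertools.product(range(levels), repeat=ny):
        joint = [[0.0] * levels for _ in range(nx)]
        for x in range(nx):
            for y in range(ny):
                joint[x][q[y]] += pxy[x][y]  # merge the y values sharing a level
        mi = mutual_information(joint)
        if mi > best_mi:
            best_mi, best_q = mi, q
    return best_mi, best_q


# Hypothetical joint distribution: binary X (rows), four values of Y (columns).
pxy = [[0.20, 0.15, 0.10, 0.05],
       [0.05, 0.10, 0.15, 0.20]]
full = mutual_information(pxy)
mi_two_levels, q_two_levels = best_quantizer(pxy, 2)
```

    The ratio `mi_two_levels / full` is the fraction of mutual information preserved by the best two-level quantizer for this particular distribution.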

    A multistage linear array assignment problem

    The implementation of certain algorithms on parallel processing computing architectures can involve partitioning contiguous elements into a fixed number of groups, each of which is to be handled by a single processor. It is desired to find an assignment of elements to processors that minimizes the sum of the maximum workloads experienced at each stage. This problem can be viewed as a multi-objective network optimization problem. Polynomially-bounded algorithms are developed for the case of two stages, whereas the associated decision problem (for an arbitrary number of stages) is shown to be NP-complete. Heuristic procedures are therefore proposed and analyzed for the general problem. Computational experience with one of the exact algorithms, incorporating certain pruning rules, is presented. Empirical results also demonstrate that one of the heuristic procedures is especially effective in practice.
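
    The objective described above can be illustrated by exhaustive enumeration on a toy instance. The sketch assumes one workload per element and per stage, and non-empty contiguous groups; these modelling details, the function names and the data are assumptions, and the enumeration is not one of the algorithms of the paper.

```python
from itertools import combinations


def stage_cost(weights, cuts):
    """Maximum group workload in one stage for the given cut positions."""
    bounds = [0, *cuts, len(weights)]
    return max(sum(weights[a:b]) for a, b in zip(bounds, bounds[1:]))


def best_assignment(stages, groups):
    """Try every way to cut the array into `groups` contiguous, non-empty parts."""
    n = len(stages[0])
    best = None
    for cuts in combinations(range(1, n), groups - 1):
        cost = sum(stage_cost(weights, cuts) for weights in stages)
        if best is None or cost < best[0]:
            best = (cost, cuts)
    return best


# Hypothetical instance: two stages, four elements, two processors.
stages = [[4, 1, 1, 4],
          [1, 4, 4, 1]]
print(best_assignment(stages, 2))  # (10, (2,)): cut after the second element
```

    Cutting after the first or third element costs 6 + 9 = 15, while the middle cut balances both stages at 5 + 5 = 10.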

    Speeding up Martins' algorithm for multiple objective shortest path problems

    The latest transportation systems require the best routes in a large network with respect to multiple objectives simultaneously to be calculated in a very short time. The label setting algorithm of Martins efficiently finds this set of Pareto optimal paths, but sometimes tends to be slow, especially for large networks such as transportation networks. In this article we investigate a number of speedup measures, resulting in new algorithms. It is shown that the calculation time to find the Pareto optimal set can be reduced considerably. Moreover, it is mathematically proven that these algorithms still produce the Pareto optimal set of paths
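
    For orientation, a label setting scheme of the kind referred to above can be sketched as follows. This is a simplified variant with lazy deletion of dominated labels, written from the general idea of lexicographic label selection; it does not include the speedup measures of the article and returns only cost vectors, not the paths themselves. The function names and the example graph are hypothetical, and arc costs are assumed non-negative.

```python
import heapq


def dominates(a, b):
    """True if cost vector a is no worse than b in every objective and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_costs(graph, source, target, n_objectives):
    """Return the Pareto-optimal cost vectors of paths from source to target.

    graph maps a node to a list of (neighbor, cost_tuple) arcs.
    """
    permanent = {}  # node -> list of permanent (non-dominated) labels
    heap = [((0,) * n_objectives, source)]
    while heap:
        cost, node = heapq.heappop(heap)  # lexicographically smallest temporary label
        labels = permanent.setdefault(node, [])
        if any(p == cost or dominates(p, cost) for p in labels):
            continue  # lazily discard duplicates and dominated labels
        labels.append(cost)
        for neighbor, arc in graph.get(node, []):
            new = tuple(c + a for c, a in zip(cost, arc))
            known = permanent.get(neighbor, [])
            if not any(p == new or dominates(p, new) for p in known):
                heapq.heappush(heap, (new, neighbor))
    return sorted(permanent.get(target, []))


# Hypothetical network with two objectives per arc, e.g. (time, cost).
graph = {
    "s": [("a", (1, 4)), ("b", (3, 1)), ("t", (5, 9))],
    "a": [("t", (1, 4)), ("b", (1, 1))],
    "b": [("t", (1, 1))],
}
print(pareto_costs(graph, "s", "t", 2))  # [(2, 8), (3, 6), (4, 2)]
```

    The direct arc from `s` to `t` with cost (5, 9) is dominated by the path through `a` with cost (2, 8), so it does not appear in the result.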

    Breaking Instance-Independent Symmetries In Exact Graph Coloring

    Code optimization and high level synthesis can be posed as constraint satisfaction and optimization problems, such as graph coloring used in register allocation. Graph coloring is also used to model more traditional CSPs relevant to AI, such as planning, time-tabling and scheduling. Provably optimal solutions may be desirable for commercial and defense applications. Additionally, for applications such as register allocation and code optimization, naturally-occurring instances of graph coloring are often small and can be solved optimally. A recent wave of improvements in algorithms for Boolean satisfiability (SAT) and 0-1 Integer Linear Programming (ILP) suggests generic problem-reduction methods, rather than problem-specific heuristics, because (1) heuristics may be upset by new constraints, (2) heuristics tend to ignore structure, and (3) many relevant problems are provably inapproximable. Problem reductions often lead to highly symmetric SAT instances, and symmetries are known to slow down SAT solvers. In this work, we compare several avenues for symmetry breaking, in particular when certain kinds of symmetry are present in all generated instances. Our focus on reducing CSPs to SAT allows us to leverage recent dramatic improvement in SAT solvers and automatically benefit from future progress. We can use a variety of black-box SAT solvers without modifying their source code because our symmetry-breaking techniques are static, i.e., we detect symmetries and add symmetry breaking predicates (SBPs) during pre-processing. An important result of our work is that among the types of instance-independent SBPs we studied and their combinations, the simplest and least complete constructions are the most effective. Our experiments also clearly indicate that instance-independent symmetries should mostly be processed together with instance-specific symmetries rather than at the specification level, contrary to what has been suggested in the literature
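
    The reduction from graph coloring to SAT with a static symmetry-breaking predicate can be sketched on a toy instance. The encoding below is a standard direct encoding, and the predicate simply pins two different colors on the endpoints of one edge, which removes part of the color-permutation symmetry present in every instance. It is an illustrative assumption and may not coincide with any of the constructions evaluated in the paper. The function names are hypothetical, and models are counted by brute force instead of calling a SAT solver.

```python
from itertools import product


def var(v, c, k):
    """DIMACS-style variable index: true when vertex v receives color c."""
    return v * k + c + 1


def coloring_cnf(n_vertices, edges, k):
    """Clauses stating that every vertex has exactly one of k colors and edges are proper."""
    clauses = []
    for v in range(n_vertices):
        clauses.append([var(v, c, k) for c in range(k)])            # at least one color
        for c in range(k):
            for d in range(c + 1, k):
                clauses.append([-var(v, c, k), -var(v, d, k)])      # at most one color
    for u, v in edges:
        for c in range(k):
            clauses.append([-var(u, c, k), -var(v, c, k)])          # endpoints differ
    return clauses


def pin_edge_sbp(edge, k):
    """Unit clauses fixing color 0 and color 1 on one edge (requires k >= 2)."""
    u, v = edge
    return [[var(u, 0, k)], [var(v, 1, k)]]


def count_models(clauses, n_vars):
    """Count satisfying assignments by enumeration; only viable for tiny formulas."""
    count = 0
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count


triangle = [(0, 1), (1, 2), (0, 2)]
base = coloring_cnf(3, triangle, 3)
print(count_models(base, 9))                                  # 6
print(count_models(base + pin_edge_sbp(triangle[0], 3), 9))   # 1
```

    The predicate is added during pre-processing, so the resulting clause list could be handed to any black-box SAT solver without modification. It preserves satisfiability because the endpoints of an edge always receive distinct colors, which a color permutation can map to colors 0 and 1.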