23,114 research outputs found

    Squarepants in a Tree: Sum of Subtree Clustering and Hyperbolic Pants Decomposition

    Full text link
    We provide efficient constant factor approximation algorithms for the problems of finding a hierarchical clustering of a point set in any metric space, minimizing the sum of minimimum spanning tree lengths within each cluster, and in the hyperbolic or Euclidean planes, minimizing the sum of cluster perimeters. Our algorithms for the hyperbolic and Euclidean planes can also be used to provide a pants decomposition, that is, a set of disjoint simple closed curves partitioning the plane minus the input points into subsets with exactly three boundary components, with approximately minimum total length. In the Euclidean case, these curves are squares; in the hyperbolic case, they combine our Euclidean square pants decomposition with our tree clustering method for general metric spaces.Comment: 22 pages, 14 figures. This version replaces the proof of what is now Lemma 5.2, as the previous proof was erroneou

    Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems

    Get PDF
    To comprehend the hierarchical organization of large integrated systems, we introduce the hierarchical map equation, which reveals multilevel structures in networks. In this information-theoretic approach, we exploit the duality between compression and pattern detection; by compressing a description of a random walker as a proxy for real flow on a network, we find regularities in the network that induce this system-wide flow. Finding the shortest multilevel description of the random walker therefore gives us the best hierarchical clustering of the network, the optimal number of levels and modular partition at each level, with respect to the dynamics on the network. With a novel search algorithm, we extract and illustrate the rich multilevel organization of several large social and biological networks. For example, from the global air traffic network we uncover countries and continents, and from the pattern of scientific communication we reveal more than 100 scientific fields organized in four major disciplines: life sciences, physical sciences, ecology and earth sciences, and social sciences. In general, we find shallow hierarchical structures in globally interconnected systems, such as neural networks, and rich multilevel organizations in systems with highly separated regions, such as road networks.Comment: 11 pages, 5 figures. For associated code, see http://www.tp.umu.se/~rosvall/code.htm

    Scalable Compression of Deep Neural Networks

    Full text link
    Deep neural networks generally involve some layers with mil- lions of parameters, making them difficult to be deployed and updated on devices with limited resources such as mobile phones and other smart embedded systems. In this paper, we propose a scalable representation of the network parameters, so that different applications can select the most suitable bit rate of the network based on their own storage constraints. Moreover, when a device needs to upgrade to a high-rate network, the existing low-rate network can be reused, and only some incremental data are needed to be downloaded. We first hierarchically quantize the weights of a pre-trained deep neural network to enforce weight sharing. Next, we adaptively select the bits assigned to each layer given the total bit budget. After that, we retrain the network to fine-tune the quantized centroids. Experimental results show that our method can achieve scalable compression with graceful degradation in the performance.Comment: 5 pages, 4 figures, ACM Multimedia 201

    Clustering by compression

    Full text link
    We present a new method for clustering based on compression. The method doesn't use subject-specific features or background knowledge, and works as follows: First, we determine a universal similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is universal in that it is not restricted to a specific application area, and works across application area boundaries. A theoretical precursor, the normalized information distance, co-developed by one of the authors, is provably optimal but uses the non-computable notion of Kolmogorov complexity. We propose precise notions of similarity metric, normal compressor, and show that the NCD based on a normal compressor is a similarity metric that approximates universality. To extract a hierarchy of clusters from the distance matrix, we determine a dendrogram (binary tree) by a new quartet method and a fast heuristic to implement it. The method is implemented and available as public software, and is robust under choice of different compressors. To substantiate our claims of universality and robustness, we report evidence of successful application in areas as diverse as genomics, virology, languages, literature, music, handwritten digits, astronomy, and combinations of objects from completely different domains, using statistical, dictionary, and block sorting compressors. In genomics we presented new evidence for major questions in Mammalian evolution, based on whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta hypothesis against the Theria hypothesis.Comment: LaTeX, 27 pages, 20 figure

    Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning

    Full text link
    Many problems in sequential decision making and stochastic control often have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous statespaces.Comment: 86 pages, 15 figure

    Fast Hierarchical Clustering Based on Compressed Data and OPTICS

    Full text link
    • …
    corecore