12,603 research outputs found

    Clustering and Community Detection with Imbalanced Clusters

    Full text link
    Spectral clustering methods which are frequently used in clustering and community detection applications are sensitive to the specific graph constructions particularly when imbalanced clusters are present. We show that ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced cluster sizes. Our approach parameterizes a family of graphs by adaptively modulating node degrees on a fixed node set, yielding a set of parameter dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach and demonstrate the superiority of our method through experiments on synthetic and real datasets for data clustering, semi-supervised learning and community detection.Comment: Extended version of arXiv:1309.2303 with new applications. Accepted to IEEE TSIP

    On morphological hierarchical representations for image processing and spatial data clustering

    Full text link
    Hierarchical data representations in the context of classi cation and data clustering were put forward during the fties. Recently, hierarchical image representations have gained renewed interest for segmentation purposes. In this paper, we briefly survey fundamental results on hierarchical clustering and then detail recent paradigms developed for the hierarchical representation of images in the framework of mathematical morphology: constrained connectivity and ultrametric watersheds. Constrained connectivity can be viewed as a way to constrain an initial hierarchy in such a way that a set of desired constraints are satis ed. The framework of ultrametric watersheds provides a generic scheme for computing any hierarchical connected clustering, in particular when such a hierarchy is constrained. The suitability of this framework for solving practical problems is illustrated with applications in remote sensing

    Solving weighted and counting variants of connectivity problems parameterized by treewidth deterministically in single exponential time

    Full text link
    It is well known that many local graph problems, like Vertex Cover and Dominating Set, can be solved in 2^{O(tw)}|V|^{O(1)} time for graphs G=(V,E) with a given tree decomposition of width tw. However, for nonlocal problems, like the fundamental class of connectivity problems, for a long time we did not know how to do this faster than tw^{O(tw)}|V|^{O(1)}. Recently, Cygan et al. (FOCS 2011) presented Monte Carlo algorithms for a wide range of connectivity problems running in time $c^{tw}|V|^{O(1)} for a small constant c, e.g., for Hamiltonian Cycle and Steiner tree. Naturally, this raises the question whether randomization is necessary to achieve this runtime; furthermore, it is desirable to also solve counting and weighted versions (the latter without incurring a pseudo-polynomial cost in terms of the weights). We present two new approaches rooted in linear algebra, based on matrix rank and determinants, which provide deterministic c^{tw}|V|^{O(1)} time algorithms, also for weighted and counting versions. For example, in this time we can solve the traveling salesman problem or count the number of Hamiltonian cycles. The rank-based ideas provide a rather general approach for speeding up even straightforward dynamic programming formulations by identifying "small" sets of representative partial solutions; we focus on the case of expressing connectivity via sets of partitions, but the essential ideas should have further applications. The determinant-based approach uses the matrix tree theorem for deriving closed formulas for counting versions of connectivity problems; we show how to evaluate those formulas via dynamic programming.Comment: 36 page

    RASCAL: calculation of graph similarity using maximum common edge subgraphs

    Get PDF
    A new graph similarity calculation procedure is introduced for comparing labeled graphs. Given a minimum similarity threshold, the procedure consists of an initial screening process to determine whether it is possible for the measure of similarity between the two graphs to exceed the minimum threshold, followed by a rigorous maximum common edge subgraph (MCES) detection algorithm to compute the exact degree and composition of similarity. The proposed MCES algorithm is based on a maximum clique formulation of the problem and is a significant improvement over other published algorithms. It presents new approaches to both lower and upper bounding as well as vertex selection

    Compressing networks with super nodes

    Full text link
    Community detection is a commonly used technique for identifying groups in a network based on similarities in connectivity patterns. To facilitate community detection in large networks, we recast the network to be partitioned into a smaller network of 'super nodes', each super node comprising one or more nodes in the original network. To define the seeds of our super nodes, we apply the 'CoreHD' ranking from dismantling and decycling. We test our approach through the analysis of two common methods for community detection: modularity maximization with the Louvain algorithm and maximum likelihood optimization for fitting a stochastic block model. Our results highlight that applying community detection to the compressed network of super nodes is significantly faster while successfully producing partitions that are more aligned with the local network connectivity, more stable across multiple (stochastic) runs within and between community detection algorithms, and overlap well with the results obtained using the full network
    • …
    corecore