
    Clustering and Community Detection with Imbalanced Clusters

    Spectral clustering methods, which are frequently used in clustering and community detection applications, are sensitive to the specific graph construction, particularly when imbalanced clusters are present. We show that ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced cluster sizes. Our approach parameterizes a family of graphs by adaptively modulating node degrees on a fixed node set, yielding a set of parameter-dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach and demonstrate the superiority of our method through experiments on synthetic and real datasets for data clustering, semi-supervised learning and community detection. Comment: Extended version of arXiv:1309.2303 with new applications. Accepted to IEEE TSIP.
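
To make the RCut and NCut objectives named in this abstract concrete, the following is a minimal sketch using their standard two-way definitions: RCut(A, B) = cut(A, B)/|A| + cut(A, B)/|B| and NCut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B). The toy graph, its edge weights, and all function names are illustrative assumptions; this is not the method proposed in the paper.

```python
def build_weights(n, weighted_edges):
    """Symmetric weight matrix for an undirected graph on n nodes."""
    W = [[0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        W[u][v] = w
        W[v][u] = w
    return W


def cut_value(W, A, B):
    """Total weight of edges with one endpoint in A and the other in B."""
    return sum(W[i][j] for i in A for j in B)


def volume(W, S):
    """Sum of weighted degrees of the nodes in S."""
    return sum(sum(W[i]) for i in S)


def ratio_cut(W, A, B):
    """Two-way ratio cut: cut value scaled by the inverse cluster sizes."""
    c = cut_value(W, A, B)
    return c / len(A) + c / len(B)


def normalized_cut(W, A, B):
    """Two-way normalized cut: cut value scaled by the inverse cluster volumes."""
    c = cut_value(W, A, B)
    return c / volume(W, A) + c / volume(W, B)


# Hypothetical toy graph: a weighted path 0-1-2-3-4-5.
# The weakest link is 1-2 (weight 10); the next weakest is 2-3 (weight 11).
edges = [(0, 1, 20), (1, 2, 10), (2, 3, 11), (3, 4, 20), (4, 5, 20)]
W = build_weights(6, edges)

imbalanced = ([0, 1], [2, 3, 4, 5])  # cuts the weakest edge, sizes 2 and 4
balanced = ([0, 1, 2], [3, 4, 5])    # cuts a heavier edge, sizes 3 and 3

for name, (A, B) in [("imbalanced", imbalanced), ("balanced", balanced)]:
    print(name, cut_value(W, A, B), ratio_cut(W, A, B), normalized_cut(W, A, B))
```

In this toy graph the 2-versus-4 split has the smaller cut value (10 versus 11), yet both objectives score the 3-versus-3 split lower, which is one way to read the abstract's remark about emphasizing cut sizes over cut values.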

    Deep Divergence-Based Approach to Clustering

    A promising direction in deep learning research is to learn representations and simultaneously discover cluster structure in unlabeled data by optimizing a discriminative loss function. As opposed to supervised deep learning, this line of research is in its infancy, and how to design and optimize suitable loss functions to train deep neural networks for clustering is still an open question. Our contribution to this emerging field is a new deep clustering network that leverages the discriminative power of information-theoretic divergence measures, which have been shown to be effective in traditional clustering. We propose a novel loss function that incorporates geometric regularization constraints, thus avoiding degenerate structures of the resulting clustering partition. Experiments on synthetic benchmarks and real datasets show that the proposed network achieves competitive performance with respect to other state-of-the-art methods, scales well to large datasets, and does not require pre-training steps.

    On interference among moving sensors and related problems

    We show that for any set of $n$ points moving along "simple" trajectories (i.e., each coordinate is described with a polynomial of bounded degree) in $\Re^d$ and any parameter $2 \le k \le n$, one can select a fixed non-empty subset of the points of size $O(k \log k)$, such that the Voronoi diagram of this subset is "balanced" at any given time (i.e., it contains $O(n/k)$ points per cell). We also show that the bound $O(k \log k)$ is near optimal even for the one-dimensional case in which points move linearly in time. As applications, we show that one can assign communication radii to the sensors of a network of $n$ moving sensors so that at any given time their interference is $O(\sqrt{n\log n})$. We also show some results in kinetic approximate range counting and kinetic discrepancy. In order to obtain these results, we extend well-known results from $\varepsilon$-net theory to kinetic environments.
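
The notion of a "balanced" Voronoi diagram in this abstract can be illustrated with a small sketch for the one-dimensional case with linear motion: count how many points fall into the Voronoi cell of each selected site at a given time. The point set, the hand-picked sites, and the function names are hypothetical; the sketch only measures cell loads and does not implement the subset selection from the paper.

```python
def positions(points, t):
    """Positions at time t of points given as (start, velocity) pairs."""
    return [x0 + v * t for x0, v in points]


def cell_loads(points, sites, t):
    """Number of points in each site's 1D Voronoi cell at time t.

    `sites` holds indices into `points`. Each point is assigned to its
    nearest site; ties go to the site with the smaller index. A site
    counts itself.
    """
    pos = positions(points, t)
    loads = {s: 0 for s in sites}
    for p in pos:
        nearest = min(sites, key=lambda s: (abs(pos[s] - p), s))
        loads[nearest] += 1
    return loads


# Hypothetical input: two groups of three points moving toward each other.
points = [(0, 1), (1, 1), (2, 1), (10, -1), (11, -1), (12, -1)]
sites = [1, 4]  # one hand-picked site per group

print(cell_loads(points, sites, 0))  # each cell holds 3 points
print(cell_loads(points, sites, 5))  # both sites coincide; all 6 points land in one cell
```

The same subset is balanced at time 0 and maximally unbalanced at time 5, which hints at why a fixed subset that stays balanced at all times needs more care than a static selection.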

    Evaluating Stability in Massive Social Networks: Efficient Streaming Algorithms for Structural Balance

    Structural balance theory studies stability in networks. Given an $n$-vertex complete graph $G=(V,E)$ whose edges are labeled positive or negative, the graph is considered balanced if every triangle consists of either three positive edges (three mutual "friends") or one positive edge and two negative edges (two "friends" with a common "enemy"). From a computational perspective, structural balance turns out to be a special case of correlation clustering with the number of clusters at most two. The two main algorithmic problems of interest are: (i) detecting whether a given graph is balanced, or (ii) finding a partition that approximates the frustration index, i.e., the minimum number of edge flips that turn the graph balanced. We study these problems in the streaming model, where edges are given one by one, and focus on memory efficiency. We provide randomized single-pass algorithms for: (i) determining whether an input graph is balanced with $O(\log n)$ memory, and (ii) finding a partition that induces a $(1+\varepsilon)$-approximation to the frustration index with $O(n \cdot \text{polylog}(n))$ memory. We further provide several new lower bounds, complementing different aspects of our algorithms such as the need for randomization or approximation. To obtain our main results, we develop a method using pseudorandom generators (PRGs) to sample edges between independently chosen vertices in graph streaming. Furthermore, our algorithm that approximates the frustration index improves the running time of the state-of-the-art correlation clustering with two clusters (Giotis-Guruswami algorithm [SODA 2006]) from $n^{O(1/\varepsilon^2)}$ to $O(n^2\log^3 n/\varepsilon^2 + n\log n \cdot (1/\varepsilon)^{O(1/\varepsilon^4)})$ time for $(1+\varepsilon)$-approximation. These results may be of independent interest.
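
The two definitions in this abstract, balance and the frustration index, can be checked directly on small inputs. The sketch below is a brute-force, non-streaming illustration: it tests every triangle and enumerates every two-sided partition, so it is exponential in $n$ and unrelated to the memory-efficient algorithms the paper describes. The function names and the example graphs are assumptions made for illustration.

```python
from itertools import combinations


def make_signs(n, negative_edges):
    """Complete signed graph on vertices 0..n-1.

    Returns a dict mapping each pair (u, v) with u < v to -1 if the pair
    is listed in negative_edges, otherwise +1.
    """
    neg = {tuple(sorted(e)) for e in negative_edges}
    return {(u, v): (-1 if (u, v) in neg else 1)
            for u, v in combinations(range(n), 2)}


def sign(signs, u, v):
    """Sign of the edge between u and v, in either order."""
    return signs[(u, v) if u < v else (v, u)]


def is_balanced(n, signs):
    """True if every triangle has three positive edges or exactly one.

    Both allowed patterns are exactly those whose sign product is +1.
    """
    return all(
        sign(signs, a, b) * sign(signs, b, c) * sign(signs, a, c) == 1
        for a, b, c in combinations(range(n), 3)
    )


def disagreements(signs, side):
    """Edges violating a partition: positive across sides or negative within one."""
    return sum(
        1 for (u, v), s in signs.items()
        if (s == 1) != (side[u] == side[v])
    )


def frustration_index(n, signs):
    """Minimum number of edge flips to reach balance, by brute force (n >= 1)."""
    best = None
    for mask in range(2 ** (n - 1)):  # vertex n-1 is fixed on side 0
        side = [(mask >> v) & 1 for v in range(n)]
        d = disagreements(signs, side)
        if best is None or d < best:
            best = d
    return best


# Two camps {0, 1} and {2, 3}: friends inside a camp, enemies across camps.
camps = make_signs(4, [(0, 2), (0, 3), (1, 2), (1, 3)])
# Same graph, but 0 and 1 have fallen out with each other.
feud = make_signs(4, [(0, 2), (0, 3), (1, 2), (1, 3), (0, 1)])

print(is_balanced(4, camps), frustration_index(4, camps))
print(is_balanced(4, feud), frustration_index(4, feud))
```

The brute-force frustration index relies on the fact that a complete signed graph is balanced exactly when its vertices split into at most two sides with positive edges inside each side and negative edges across, so the minimum number of flips equals the minimum number of disagreements over all such splits.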