268 research outputs found

    Multiclass Total Variation Clustering

    Get PDF
    Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches

    Tight Continuous Relaxation of the Balanced kk-Cut Problem

    Full text link
    Spectral Clustering as a relaxation of the normalized/ratio cut has become one of the standard graph-based clustering methods. Existing methods for the computation of multiple clusters, corresponding to a balanced kk-cut of the graph, are either based on greedy techniques or heuristics which have weak connection to the original motivation of minimizing the normalized cut. In this paper we propose a new tight continuous relaxation for any balanced kk-cut problem and show that a related recently proposed relaxation is in most cases loose leading to poor performance in practice. For the optimization of our tight continuous relaxation we propose a new algorithm for the difficult sum-of-ratios minimization problem which achieves monotonic descent. Extensive comparisons show that our method outperforms all existing approaches for ratio cut and other balanced kk-cut criteria.Comment: Long version of paper accepted at NIPS 201

    An Adaptive Total Variation Algorithm for Computing the Balanced Cut of a Graph

    Get PDF
    We propose an adaptive version of the total variation algorithm proposed in [3] for computing the balanced cut of a graph. The algorithm from [3] used a sequence of inner total variation minimizations to guarantee descent of the balanced cut energy as well as convergence of the algorithm. In practice the total variation minimization step is never solved exactly. Instead, an accuracy parameter is specified and the total variation minimization terminates once this level of accuracy is reached. The choice of this parameter can vastly impact both the computational time of the overall algorithm as well as the accuracy of the result. Moreover, since the total variation minimization step is not solved exactly, the algorithm is not guarantied to be monotonic. In the present work we introduce a new adaptive stopping condition for the total variation minimization that guarantees monotonicity. This results in an algorithm that is actually monotonic in practice and is also significantly faster than previous, non-adaptive algorithms

    Convergence of a Steepest Descent Algorithm for Ratio Cut Clustering

    Get PDF
    Unsupervised clustering of scattered, noisy and high-dimensional data points is an important and difficult problem. Tight continuous relaxations of balanced cut problems have recently been shown to provide excellent clustering results. In this paper, we present an explicit-implicit gradient flow scheme for the relaxed ratio cut problem, and prove that the algorithm converges to a critical point of the energy. We also show the efficiency of the proposed algorithm on the two moons dataset

    Multiclass Data Segmentation using Diffuse Interface Methods on Graphs

    Full text link
    We present two graph-based algorithms for multiclass segmentation of high-dimensional data. The algorithms use a diffuse interface model based on the Ginzburg-Landau functional, related to total variation compressed sensing and image processing. A multiclass extension is introduced using the Gibbs simplex, with the functional's double-well potential modified to handle the multiclass case. The first algorithm minimizes the functional using a convex splitting numerical scheme. The second algorithm is a uses a graph adaptation of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates between diffusion and thresholding. We demonstrate the performance of both algorithms experimentally on synthetic data, grayscale and color images, and several benchmark data sets such as MNIST, COIL and WebKB. We also make use of fast numerical solvers for finding the eigenvectors and eigenvalues of the graph Laplacian, and take advantage of the sparsity of the matrix. Experiments indicate that the results are competitive with or better than the current state-of-the-art multiclass segmentation algorithms.Comment: 14 page

    Community detection in networks via nonlinear modularity eigenvectors

    Get PDF
    Revealing a community structure in a network or dataset is a central problem arising in many scientific areas. The modularity function QQ is an established measure quantifying the quality of a community, being identified as a set of nodes having high modularity. In our terminology, a set of nodes with positive modularity is called a \textit{module} and a set that maximizes QQ is thus called \textit{leading module}. Finding a leading module in a network is an important task, however the dimension of real-world problems makes the maximization of QQ unfeasible. This poses the need of approximation techniques which are typically based on a linear relaxation of QQ, induced by the spectrum of the modularity matrix MM. In this work we propose a nonlinear relaxation which is instead based on the spectrum of a nonlinear modularity operator M\mathcal M. We show that extremal eigenvalues of M\mathcal M provide an exact relaxation of the modularity measure QQ, however at the price of being more challenging to be computed than those of MM. Thus we extend the work made on nonlinear Laplacians, by proposing a computational scheme, named \textit{generalized RatioDCA}, to address such extremal eigenvalues. We show monotonic ascent and convergence of the method. We finally apply the new method to several synthetic and real-world data sets, showing both effectiveness of the model and performance of the method

    Efficient Flow-based Approximation Algorithms for Submodular Hypergraph Partitioning via a Generalized Cut-Matching Game

    Full text link
    In the past 20 years, increasing complexity in real world data has lead to the study of higher-order data models based on partitioning hypergraphs. However, hypergraph partitioning admits multiple formulations as hyperedges can be cut in multiple ways. Building upon a class of hypergraph partitioning problems introduced by Li & Milenkovic, we study the problem of minimizing ratio-cut objectives over hypergraphs given by a new class of cut functions, monotone submodular cut functions (mscf's), which captures hypergraph expansion and conductance as special cases. We first define the ratio-cut improvement problem, a family of local relaxations of the minimum ratio-cut problem. This problem is a natural extension of the Andersen & Lang cut improvement problem to the hypergraph setting. We demonstrate the existence of efficient algorithms for approximately solving this problem. These algorithms run in almost-linear time for the case of hypergraph expansion, and when the hypergraph rank is at most O(1)O(1). Next, we provide an efficient O(logn)O(\log n)-approximation algorithm for finding the minimum ratio-cut of GG. We generalize the cut-matching game framework of Khandekar et. al. to allow for the cut player to play unbalanced cuts, and matching player to route approximate single-commodity flows. Using this framework, we bootstrap our algorithms for the ratio-cut improvement problem to obtain approximation algorithms for minimum ratio-cut problem for all mscf's. This also yields the first almost-linear time O(logn)O(\log n)-approximation algorithms for hypergraph expansion, and constant hypergraph rank. Finally, we extend a result of Louis & Makarychev to a broader set of objective functions by giving a polynomial time O(logn)O\big(\sqrt{\log n}\big)-approximation algorithm for the minimum ratio-cut problem based on rounding 22\ell_2^2-metric embeddings.Comment: Comments and feedback welcom
    corecore