Multiclass Total Variation Clustering
Ideas from the image processing literature have recently motivated a new set
of clustering algorithms that rely on the concept of total variation. While
these algorithms perform well for bi-partitioning tasks, their recursive
extensions yield unimpressive results for multiclass clustering tasks. This
paper presents a general framework for multiclass total variation clustering
that does not rely on recursion. The results greatly outperform previous total
variation algorithms and compare well with state-of-the-art NMF approaches.
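The abstract does not write out the objective. The sketch below assumes the standard graph total variation TV(f) = sum over edges of w_ij |f_i - f_j|, which is not taken from the paper. Under that assumption, the multiclass total variation of a hard k-way partition is the sum of TV over the k class-indicator vectors. Each edge joining two different classes is counted once for each of its two classes, so the sum equals twice the weight of the cut edges. The helpers graph_tv, multiclass_tv and cut_weight are hypothetical names.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (i, j, w_ij), each undirected edge listed once


def graph_tv(edges: List[Edge], f: List[float]) -> float:
    """Graph total variation: sum over edges of w_ij * |f_i - f_j| (assumed definition)."""
    return sum(w * abs(f[i] - f[j]) for i, j, w in edges)


def multiclass_tv(edges: List[Edge], labels: List[int], k: int) -> float:
    """Sum of TV over the k class-indicator vectors of a hard partition."""
    total = 0.0
    for r in range(k):
        indicator = [1.0 if lab == r else 0.0 for lab in labels]
        total += graph_tv(edges, indicator)
    return total


def cut_weight(edges: List[Edge], labels: List[int]) -> float:
    """Total weight of edges whose endpoints lie in different classes."""
    return sum(w for i, j, w in edges if labels[i] != labels[j])
```

For example, two triangles joined by a single unit-weight edge and labeled [0, 0, 0, 1, 1, 1] have a multiclass TV of 2 and a cut weight of 1.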
Tight Continuous Relaxation of the Balanced k-Cut Problem
Spectral clustering as a relaxation of the normalized/ratio cut has become
one of the standard graph-based clustering methods. Existing methods for the
computation of multiple clusters, corresponding to a balanced k-cut of the
graph, are either based on greedy techniques or on heuristics which have only a weak
connection to the original motivation of minimizing the normalized cut. In this
paper we propose a new tight continuous relaxation for any balanced k-cut
problem and show that a related, recently proposed relaxation is in most cases
loose, leading to poor performance in practice. For the optimization of our
tight continuous relaxation we propose a new algorithm for the difficult
sum-of-ratios minimization problem which achieves monotonic descent. Extensive
comparisons show that our method outperforms all existing approaches for ratio
cut and other balanced k-cut criteria.
Comment: Long version of paper accepted at NIPS 201
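To make the balanced k-cut criteria concrete, the following sketch evaluates sum_r cut(C_r, V \ C_r) / S(C_r) for a given partition. The balance S(C) = |C| gives the ratio cut and S(C) = vol(C) gives the normalized cut. It only evaluates the discrete objective: it is not the paper's continuous relaxation or its sum-of-ratios algorithm. The function names are hypothetical, and empty clusters are assumed not to occur because they would divide by zero.

```python
from typing import Callable, List, Tuple

Edge = Tuple[int, int, float]  # (i, j, w_ij), each undirected edge listed once


def balanced_k_cut(edges: List[Edge], labels: List[int], k: int,
                   balance: Callable[[List[int]], float]) -> float:
    """sum_r cut(C_r, complement of C_r) / balance(C_r) for the partition in labels."""
    members = [[v for v, lab in enumerate(labels) if lab == r] for r in range(k)]
    total = 0.0
    for r in range(k):
        # Edges with exactly one endpoint in cluster r cross its boundary.
        cut = sum(w for i, j, w in edges if (labels[i] == r) != (labels[j] == r))
        total += cut / balance(members[r])
    return total


def ratio_cut(edges: List[Edge], labels: List[int], k: int) -> float:
    # Ratio cut: balance(C) = |C|
    return balanced_k_cut(edges, labels, k, len)


def normalized_cut(edges: List[Edge], labels: List[int], k: int) -> float:
    # Normalized cut: balance(C) = vol(C), the sum of weighted degrees in C
    degree = [0.0] * len(labels)
    for i, j, w in edges:
        degree[i] += w
        degree[j] += w
    return balanced_k_cut(edges, labels, k, lambda c: sum(degree[v] for v in c))
```

On two triangles joined by one unit-weight edge, with each triangle as a cluster, the ratio cut is 1/3 + 1/3 = 2/3 and the normalized cut is 1/7 + 1/7 = 2/7, since each triangle has volume 7.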
An Adaptive Total Variation Algorithm for Computing the Balanced Cut of a Graph
We propose an adaptive version of the total variation algorithm proposed in
[3] for computing the balanced cut of a graph. The algorithm from [3] used a
sequence of inner total variation minimizations to guarantee descent of the
balanced cut energy as well as convergence of the algorithm. In practice the
total variation minimization step is never solved exactly. Instead, an accuracy
parameter is specified and the total variation minimization terminates once
this level of accuracy is reached. The choice of this parameter can vastly
impact both the computational time of the overall algorithm as well as the
accuracy of the result. Moreover, since the total variation minimization step
is not solved exactly, the algorithm is not guaranteed to be monotonic. In the
present work we introduce a new adaptive stopping condition for the total
variation minimization that guarantees monotonicity. This results in an
algorithm that is actually monotonic in practice and is also significantly
faster than previous, non-adaptive algorithms.
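The abstract refers to a balanced cut energy without writing it down. The sketch below assumes the median-based form E(f) = TV(f) / ||f - median(f)||_1. This is a common choice in total variation approaches to the Cheeger cut, but it is an assumption here. The code evaluates E and checks that on the indicator vector of a set S it reduces to the Cheeger cut cut(S, V \ S) / min(|S|, |V \ S|). The adaptive inner stopping rule itself is not reproduced, and cheeger_energy and cheeger_cut are hypothetical names.

```python
import statistics
from typing import List, Set, Tuple

Edge = Tuple[int, int, float]  # (i, j, w_ij), each undirected edge listed once


def cheeger_energy(edges: List[Edge], f: List[float]) -> float:
    """Assumed balanced cut energy: TV(f) / ||f - median(f)||_1 (f must be non-constant)."""
    tv = sum(w * abs(f[i] - f[j]) for i, j, w in edges)
    med = statistics.median(f)  # averages the two middle values when len(f) is even
    balance = sum(abs(x - med) for x in f)
    return tv / balance


def cheeger_cut(edges: List[Edge], subset: Set[int], n: int) -> float:
    """Discrete Cheeger cut: cut(S, complement) / min(|S|, n - |S|)."""
    cut = sum(w for i, j, w in edges if (i in subset) != (j in subset))
    return cut / min(len(subset), n - len(subset))
```

Because TV, the median and the L1 balance all transform linearly, E is unchanged when f is replaced by a*f + b with a nonzero. For example, [3x + 1 for x in f] has the same energy as f.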
Convergence of a Steepest Descent Algorithm for Ratio Cut Clustering
Unsupervised clustering of scattered, noisy and high-dimensional data points
is an important and difficult problem. Tight continuous relaxations of balanced
cut problems have recently been shown to provide excellent clustering results.
In this paper, we present an explicit-implicit gradient flow scheme for the
relaxed ratio cut problem, and prove that the algorithm converges to a critical
point of the energy. We also show the efficiency of the proposed algorithm on
the two moons dataset.
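As an illustration of what a relaxed ratio cut energy can look like, this sketch assumes E(f) = TV(f) / ((1/n) * sum_{i<j} |f_i - f_j|). The denominator is the Lovász extension of |S| |V \ S| / n, so on an indicator vector E equals the ratio cut cut(S) (1/|S| + 1/|V \ S|). This is an assumed form rather than the paper's exact energy, and the explicit-implicit gradient flow is not reproduced. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Edge = Tuple[int, int, float]  # (i, j, w_ij), each undirected edge listed once


def relaxed_ratio_cut(edges: List[Edge], f: List[float]) -> float:
    """Assumed relaxation: TV(f) / ((1/n) * sum_{i<j} |f_i - f_j|), for non-constant f."""
    n = len(f)
    tv = sum(w * abs(f[i] - f[j]) for i, j, w in edges)
    spread = sum(abs(f[i] - f[j]) for i, j in combinations(range(n), 2)) / n
    return tv / spread


def ratio_cut_of_set(edges: List[Edge], subset: Set[int], n: int) -> float:
    """Discrete ratio cut: cut(S, complement) * (1/|S| + 1/(n - |S|))."""
    cut = sum(w for i, j, w in edges if (i in subset) != (j in subset))
    return cut * (1 / len(subset) + 1 / (n - len(subset)))
```

On two triangles joined by one unit-weight edge, the indicator of {0, 1} gives 2 / (8/6) = 1.5 for the relaxed energy and 2 * (1/2 + 1/4) = 1.5 for the discrete ratio cut. This matches the tightness of the relaxation on indicator vectors.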
Multiclass Data Segmentation using Diffuse Interface Methods on Graphs
We present two graph-based algorithms for multiclass segmentation of
high-dimensional data. The algorithms use a diffuse interface model based on
the Ginzburg-Landau functional, related to total variation compressed sensing
and image processing. A multiclass extension is introduced using the Gibbs
simplex, with the functional's double-well potential modified to handle the
multiclass case. The first algorithm minimizes the functional using a convex
splitting numerical scheme. The second algorithm uses a graph adaptation
of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates
between diffusion and thresholding. We demonstrate the performance of both
algorithms experimentally on synthetic data, grayscale and color images, and
several benchmark data sets such as MNIST, COIL and WebKB. We also make use of
fast numerical solvers for finding the eigenvectors and eigenvalues of the
graph Laplacian, and take advantage of the sparsity of the matrix. Experiments
indicate that the results are competitive with or better than the current
state-of-the-art multiclass segmentation algorithms.
Comment: 14 pages
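The diffusion-and-thresholding idea behind the MBO variant can be sketched on a toy graph. This is a simplified illustration under stated assumptions, not the paper's algorithm. Diffusion uses a few explicit Euler steps u <- u - dt L u with the unnormalized graph Laplacian L = D - W, instead of the eigenvector-based solver the abstract mentions. The fidelity term is replaced by re-imposing the known labels after each thresholding step. The names graph_mbo, dt, n_diffusion and n_iter are hypothetical. Each row of u lies on the Gibbs simplex, and thresholding sends it to the nearest simplex vertex, which is the one-hot vector at its largest entry.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (i, j, w_ij), each undirected edge listed once


def one_hot(c: int, k: int) -> List[float]:
    return [1.0 if r == c else 0.0 for r in range(k)]


def graph_mbo(n: int, edges: List[Edge], k: int, known: Dict[int, int],
              dt: float = 0.1, n_diffusion: int = 3, n_iter: int = 20) -> List[int]:
    """Toy multiclass graph MBO: alternate diffusion and thresholding.

    Simplifications (assumptions, not the paper's scheme): explicit Euler
    diffusion, and known labels re-imposed after every threshold step.
    """
    neighbors: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for i, j, w in edges:
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))

    # Labeled nodes start at a simplex vertex, unlabeled ones at its center.
    u = [one_hot(known[i], k) if i in known else [1.0 / k] * k for i in range(n)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Diffusion: (L u)_i = sum_j w_ij (u_i - u_j), applied to every class column.
        for _ in range(n_diffusion):
            u = [[u[i][c] - dt * sum(w * (u[i][c] - u[j][c]) for j, w in neighbors[i])
                  for c in range(k)]
                 for i in range(n)]
        # Thresholding: snap each row to the one-hot vector at its argmax.
        new_labels = [max(range(k), key=row.__getitem__) for row in u]
        for i, c in known.items():
            new_labels[i] = c
        u = [one_hot(c, k) for c in new_labels]
        if new_labels == labels:
            break  # the labeling has stopped changing
        labels = new_labels
    return labels
```

Take two unit-weight triangles joined by one edge, with a single seed label on each. Then graph_mbo(6, edges, 2, {0: 0, 5: 1}) returns [0, 0, 0, 1, 1, 1]. For explicit Euler to stay stable, dt must be below 2 divided by the largest Laplacian eigenvalue.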
Community detection in networks via nonlinear modularity eigenvectors
Revealing a community structure in a network or dataset is a central problem
arising in many scientific areas. The modularity function is an established
measure quantifying the quality of a community, where a community is identified
as a set of nodes having high modularity. In our terminology, a set of nodes with positive
modularity is called a \textit{module}, and a set that maximizes the modularity is thus
called a \textit{leading module}. Finding a leading module in a network is an
important task; however, the dimension of real-world problems makes the
maximization of the modularity infeasible. This creates the need for approximation techniques,
which are typically based on a linear relaxation of the modularity induced by the
spectrum of the modularity matrix. In this work we propose a nonlinear
relaxation which is instead based on the spectrum of a nonlinear modularity
operator. We show that extremal eigenvalues of this operator
provide an exact relaxation of the modularity measure, however at the price
of being more challenging to compute than those of the modularity matrix. We thus extend the
work on nonlinear Laplacians by proposing a computational scheme, named
\textit{generalized RatioDCA}, to compute such extremal eigenvalues. We show
monotonic ascent and convergence of the method. We finally apply the new method
to several synthetic and real-world data sets, showing both the effectiveness of
the model and the performance of the method.
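To make "module" and "leading module" concrete, the following sketch assumes the unnormalized set modularity Q(S) = sum over i, j in S of B_ij, with the modularity matrix B = A - d d^T / vol(V). The paper's normalization may differ, but a positive rescaling does not change whether Q(S) > 0. The exhaustive search shows why exact maximization does not scale: it visits all 2^n - 1 non-empty subsets. It is not the nonlinear relaxation or generalized RatioDCA. All function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

Edge = Tuple[int, int, float]  # (i, j, w_ij), each undirected edge listed once


def modularity_matrix(n: int, edges: List[Edge]) -> List[List[float]]:
    """B_ij = A_ij - d_i * d_j / vol(V) for an undirected weighted graph."""
    adj = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        adj[i][j] += w
        adj[j][i] += w
    deg = [sum(row) for row in adj]
    vol = sum(deg)
    return [[adj[i][j] - deg[i] * deg[j] / vol for j in range(n)] for i in range(n)]


def set_modularity(B: List[List[float]], subset: Iterable[int]) -> float:
    """Q(S) = sum over i, j in S of B_ij; S is a module when Q(S) > 0."""
    nodes = list(subset)
    return sum(B[i][j] for i in nodes for j in nodes)


def leading_module_brute_force(n: int, edges: List[Edge]) -> Tuple[FrozenSet[int], float]:
    """Check every non-empty subset (2^n - 1 of them): only viable for tiny graphs."""
    B = modularity_matrix(n, edges)
    best, best_q = frozenset(), float("-inf")
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            q = set_modularity(B, subset)
            if q > best_q:
                best, best_q = frozenset(subset), q
    return best, best_q
```

On two unit-weight triangles joined by one edge, each triangle is a leading module with Q = 6 - 49/14 = 2.5, while the whole node set has Q = 0.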
Efficient Flow-based Approximation Algorithms for Submodular Hypergraph Partitioning via a Generalized Cut-Matching Game
In the past 20 years, increasing complexity in real-world data has led to
the study of higher-order data models based on partitioning hypergraphs.
However, hypergraph partitioning admits multiple formulations, as hyperedges can
be cut in multiple ways. Building upon a class of hypergraph partitioning
problems introduced by Li & Milenkovic, we study the problem of minimizing
ratio-cut objectives over hypergraphs given by a new class of cut functions,
monotone submodular cut functions (mscf's), which capture hypergraph expansion
and conductance as special cases.
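This is a minimal sketch of the hypergraph expansion special case. It assumes the all-or-nothing rule, in which a hyperedge counts as cut whenever it has nodes on both sides. The ratio cut of a set S is then the weight of cut hyperedges divided by min(|S|, |V \ S|). The brute-force minimizer only illustrates the objective on tiny inputs. It is exponential in n and unrelated to the flow-based cut-matching algorithms described here. The general monotone submodular cut functions are not modeled, and all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple

Hyperedge = Tuple[FrozenSet[int], float]  # (nodes of the hyperedge, weight)


def hyperedge_is_cut(edge: FrozenSet[int], subset: FrozenSet[int]) -> bool:
    """All-or-nothing rule: cut iff the hyperedge has nodes on both sides of S."""
    inside = len(edge & subset)
    return 0 < inside < len(edge)


def expansion_ratio(hyperedges: List[Hyperedge], subset: FrozenSet[int], n: int) -> float:
    """Weight of cut hyperedges divided by min(|S|, n - |S|)."""
    cut = sum(w for e, w in hyperedges if hyperedge_is_cut(e, subset))
    return cut / min(len(subset), n - len(subset))


def min_expansion_brute_force(hyperedges: List[Hyperedge],
                              n: int) -> Tuple[Optional[FrozenSet[int]], float]:
    """Exhaustive search; the ratio is symmetric under complement, so sizes up to n // 2 suffice."""
    best: Optional[FrozenSet[int]] = None
    best_val = float("inf")
    for size in range(1, n // 2 + 1):
        for nodes in combinations(range(n), size):
            s = frozenset(nodes)
            val = expansion_ratio(hyperedges, s, n)
            if val < best_val:
                best, best_val = s, val
    return best, best_val
```

For unit-weight hyperedges {0, 1, 2}, {3, 4, 5} and {2, 3, 4} on six nodes, the minimum expansion is 1/3. It is attained by S = {0, 1, 2}, which cuts only the hyperedge {2, 3, 4}.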
We first define the ratio-cut improvement problem, a family of local
relaxations of the minimum ratio-cut problem. This problem is a natural
extension of the Andersen & Lang cut improvement problem to the hypergraph
setting. We demonstrate the existence of efficient algorithms for approximately
solving this problem. These algorithms run in almost-linear time for the case
of hypergraph expansion and when the hypergraph rank is bounded by a constant.
Next, we provide an efficient -approximation algorithm for finding
the minimum ratio-cut of the hypergraph. We generalize the cut-matching game framework of
Khandekar et al. to allow the cut player to play unbalanced cuts and the
matching player to route approximate single-commodity flows. Using this
framework, we bootstrap our algorithms for the ratio-cut improvement problem to
obtain approximation algorithms for the minimum ratio-cut problem for all mscf's.
This also yields the first almost-linear time -approximation
algorithms for hypergraph expansion and for hypergraphs of constant rank.
Finally, we extend a result of Louis & Makarychev to a broader set of
objective functions by giving a polynomial time -approximation algorithm for the minimum ratio-cut problem based on
rounding -metric embeddings.
Comment: Comments and feedback welcome