Local Guarantees in Graph Cuts and Clustering
Correlation Clustering is an elegant model that captures fundamental graph
cut problems such as Min Cut, Multiway Cut, and Multicut, extensively
studied in combinatorial optimization. Here, we are given a graph with edges
labeled $+$ or $-$ and the goal is to produce a clustering that agrees with the
labels as much as possible: $+$ edges within clusters and $-$ edges across
clusters. The classical approach towards Correlation Clustering (and other
graph cut problems) is to optimize a global objective. We depart from this and
study local objectives: minimizing the maximum number of disagreements for
edges incident on a single node, and the analogous max min agreements
objective. This naturally gives rise to a family of basic min-max graph cut
problems. A prototypical representative is Min Max $s$-$t$ Cut: find an $s$-$t$ cut
minimizing the largest number of cut edges incident on any node. We present the
following results: an $O(\sqrt{n})$-approximation for the problem of
minimizing the maximum total weight of disagreement edges incident on any node
(thus providing the first known approximation for the above family of min-max
graph cut problems), a remarkably simple $7$-approximation for minimizing
local disagreements in complete graphs (improving upon the previous best known
approximation of $48$), and a $1/(2+\varepsilon)$-approximation for
maximizing the minimum total weight of agreement edges incident on any node,
hence improving upon the $1/(4+\varepsilon)$-approximation that follows from
the study of approximate pure Nash equilibria in cut and party affiliation
games.
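The local disagreement objective described in this abstract can be illustrated with a minimal sketch. It only evaluates the objective for a given clustering on an unweighted signed graph; it is not the approximation algorithm from the paper. The function name `max_local_disagreements` and the input layout are assumptions made for illustration.

```python
def max_local_disagreements(edges, cluster_of):
    """Evaluate the min-max (local) disagreement objective.

    edges:      iterable of (u, v, sign) with sign '+' or '-' (hypothetical layout)
    cluster_of: dict mapping every node to its cluster id
    Returns (largest per-node disagreement count, per-node counts).
    """
    disagreements = {node: 0 for node in cluster_of}
    for u, v, sign in edges:
        same_cluster = cluster_of[u] == cluster_of[v]
        # A '+' edge disagrees when it is cut; a '-' edge disagrees when it is not.
        if (sign == "+" and not same_cluster) or (sign == "-" and same_cluster):
            disagreements[u] += 1
            disagreements[v] += 1
    worst = max(disagreements.values(), default=0)
    return worst, disagreements


edges = [("a", "b", "+"), ("b", "c", "+"), ("a", "c", "-"), ("c", "d", "+")]
clustering = {"a": 0, "b": 0, "c": 1, "d": 1}
worst, per_node = max_local_disagreements(edges, clustering)
```

In this toy instance only the `+` edge between `b` and `c` is violated, so the worst node carries a single disagreement. A global objective would count that edge once, while the local objective charges it to both endpoints.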
Balanced k-Means and Min-Cut Clustering
Clustering is an effective data mining technique for generating groups of interest. Among the various clustering approaches, the k-means and min-cut families of algorithms are the most popular because of their simplicity and efficacy. The classical k-means algorithm partitions a set of data points into several subsets by iteratively updating the cluster centers and the data points assigned to them. By contrast, min-cut algorithms construct a weighted undirected graph and partition its vertices into two sets. However, existing clustering algorithms tend to place only a small minority of the data points in one subset, which should be avoided when the target dataset is balanced. To achieve more accurate clustering on balanced datasets, we propose to apply the exclusive lasso to k-means and min-cut to regulate how balanced the clustering results are. By optimizing objective functions built on the exclusive lasso, we can make the clustering result as balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms over state-of-the-art clustering algorithms.
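To see why an exclusive-lasso term can encourage balance, here is a minimal sketch that assumes one common form of the regularizer: the sum over clusters of the squared column L1 norm of a one-hot assignment matrix. The abstract does not give the paper's exact objective, so the helper names `exclusive_lasso` and `one_hot` and this particular form are assumptions.

```python
import numpy as np


def exclusive_lasso(F):
    """Sum over columns (clusters) of the squared column L1 norm (assumed form)."""
    return float(np.sum(np.sum(np.abs(F), axis=0) ** 2))


def one_hot(labels, k):
    """Build an n-by-k cluster indicator matrix from integer labels."""
    F = np.zeros((len(labels), k))
    F[np.arange(len(labels)), labels] = 1.0
    return F


balanced = one_hot(np.array([0, 0, 0, 1, 1, 1]), 2)  # cluster sizes 3 and 3
skewed = one_hot(np.array([0, 0, 0, 0, 0, 1]), 2)    # cluster sizes 5 and 1

penalty_balanced = exclusive_lasso(balanced)  # 3**2 + 3**2
penalty_skewed = exclusive_lasso(skewed)      # 5**2 + 1**2
```

For an indicator matrix the penalty reduces to the sum of squared cluster sizes, which for a fixed number of points is smallest when the sizes are equal. Adding it to a clustering loss therefore penalizes skewed partitions.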
Causal clustering: design of cluster experiments under network interference
This paper studies the design of cluster experiments to estimate the global
treatment effect in the presence of spillovers on a single network. We provide
an econometric framework to choose the clustering that minimizes the worst-case
mean-squared error of the estimated global treatment effect. We show that the
optimal clustering can be approximated as the solution of a novel penalized
min-cut optimization problem computed via off-the-shelf semi-definite
programming algorithms. Our analysis also characterizes easy-to-check
conditions to choose between a cluster or individual-level randomization. We
illustrate the method's properties using unique network data from the universe
of Facebook's users and existing network data from a field experiment.
Local Hypergraph Clustering using Capacity Releasing Diffusion
Local graph clustering is an important machine learning task that aims to
find a well-connected cluster near a set of seed nodes. Recent results have
revealed that incorporating higher order information significantly enhances the
results of graph clustering techniques. The majority of existing research in
this area focuses on spectral graph theory-based techniques. However, an
alternative perspective on local graph clustering arises from using max-flow
and min-cut on the objectives, which offer distinctly different guarantees. For
instance, a new method called capacity releasing diffusion (CRD) was recently
proposed and shown to preserve local structure around the seeds better than
spectral methods. The method was also the first local clustering technique that
is not subject to the quadratic Cheeger inequality by assuming a good cluster
near the seed nodes. In this paper, we propose a local hypergraph clustering
technique called hypergraph CRD (HG-CRD) by extending the CRD process to
cluster based on higher order patterns, encoded as hyperedges of a hypergraph.
Moreover, we theoretically show that HG-CRD gives results about a quantity
called motif conductance, rather than a biased version used in previous
experiments. Experimental results on synthetic datasets and real world graphs
show that HG-CRD enhances the clustering quality.
Comment: 18 pages, 6 figures
Tight Continuous Relaxation of the Balanced $k$-Cut Problem
Spectral Clustering as a relaxation of the normalized/ratio cut has become
one of the standard graph-based clustering methods. Existing methods for the
computation of multiple clusters, corresponding to a balanced $k$-cut of the
graph, are either based on greedy techniques or heuristics which have weak
connection to the original motivation of minimizing the normalized cut. In this
paper we propose a new tight continuous relaxation for any balanced $k$-cut
problem and show that a related recently proposed relaxation is in most cases
loose leading to poor performance in practice. For the optimization of our
tight continuous relaxation we propose a new algorithm for the difficult
sum-of-ratios minimization problem which achieves monotonic descent. Extensive
comparisons show that our method outperforms all existing approaches for ratio
cut and other balanced $k$-cut criteria.
Comment: Long version of paper accepted at NIPS 201
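The balanced cut criteria mentioned in this abstract can be evaluated directly for a given partition. The sketch below assumes the textbook definitions, in which each cluster contributes its cut weight divided by its size (ratio cut) or by its volume (normalized cut). Some authors include an extra factor of 1/2, and the function name `balanced_cut` is hypothetical. It scores a partition only; it does not implement the paper's continuous relaxation.

```python
import numpy as np


def balanced_cut(W, labels, balance="ratio"):
    """Score a k-way partition of a weighted graph.

    W:       symmetric (n, n) weight matrix
    labels:  length-n array of cluster ids
    balance: "ratio" divides by cluster size, "normalized" by cluster volume
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    degrees = W.sum(axis=1)
    total = 0.0
    for c in np.unique(labels):
        inside = labels == c
        # Weight of edges leaving cluster c.
        cut = W[inside][:, ~inside].sum()
        if balance == "ratio":
            denom = inside.sum()
        else:
            denom = degrees[inside].sum()
        total += cut / denom
    return total


# Path graph 0-1-2-3 with a weak middle edge.
W = np.array([
    [0, 2, 0, 0],
    [2, 0, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 0],
])
labels = [0, 0, 1, 1]
rcut = balanced_cut(W, labels, balance="ratio")
ncut = balanced_cut(W, labels, balance="normalized")
```

Both scores are sums of ratios, one per cluster, which is the structure behind the sum-of-ratios minimization the abstract refers to.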
Spectral Clustering with Imbalanced Data
Spectral clustering is sensitive to how graphs are constructed from data,
particularly when proximal and imbalanced clusters are present. We show that
Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced data since they tend to emphasize cut sizes over cut values. We
propose a graph partitioning problem that seeks minimum cut partitions under
minimum size constraints on partitions to deal with imbalanced data. Our
approach parameterizes a family of graphs, by adaptively modulating node
degrees on a fixed node set, to yield a set of parameter dependent cuts
reflecting varying levels of imbalance. The solution to our problem is then
obtained by optimizing over these parameters. We present rigorous limit cut
analysis results to justify our approach. We demonstrate the superiority of our
method through unsupervised and semi-supervised experiments on synthetic and
real data sets.
Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with
arXiv:1302.513
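The partitioning problem stated in this abstract, a minimum cut subject to a minimum size on each side, can be illustrated by brute force on a tiny graph. This is only a sketch of the problem statement for two-way partitions. The paper's approach of parameterizing a family of graphs is not reproduced here, and the function name `min_cut_with_min_size` and the toy graph are assumptions.

```python
from itertools import combinations

import numpy as np


def min_cut_with_min_size(W, min_size):
    """Exhaustively find the lightest 2-way cut whose sides both have
    at least `min_size` nodes. Exponential time: toy graphs only."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    best_value, best_side = float("inf"), None
    for size in range(min_size, n - min_size + 1):
        for side in combinations(range(n), size):
            mask = np.zeros(n, dtype=bool)
            mask[list(side)] = True
            value = W[mask][:, ~mask].sum()
            if value < best_value:
                best_value, best_side = value, side
    return best_value, best_side


# Two triangles joined by a weight-2 edge, plus a pendant node 6.
W = np.zeros((7, 7))
for u, v, w in [(0, 1, 3), (1, 2, 3), (0, 2, 3), (2, 3, 2),
                (3, 4, 3), (3, 5, 3), (4, 5, 3), (5, 6, 1)]:
    W[u, v] = W[v, u] = w

unconstrained = min_cut_with_min_size(W, min_size=1)
constrained = min_cut_with_min_size(W, min_size=2)
```

Without a size constraint the cheapest cut simply isolates the pendant node. Requiring at least two nodes per side recovers the split between the two triangles.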