Optimal Bounds for the $k$-cut Problem
In the $k$-cut problem, we want to find the smallest set of edges whose
deletion breaks a given (multi)graph into $k$ connected components. Algorithms
of Karger & Stein and Thorup showed how to find such a minimum $k$-cut in time
approximately $O(n^{2k})$. The best lower bounds come from conjectures about
the solvability of the $k$-clique problem, and show that solving $k$-cut is
likely to require time $\Omega(n^k)$. Recent results of Gupta, Lee & Li have
given special-purpose algorithms that solve the problem in time $n^{1.98k + O(1)}$, and ones that have better performance for special classes of graphs
(e.g., for small integer weights).
In this work, we resolve the problem for general graphs, by showing that the
Contraction Algorithm of Karger outputs any fixed $k$-cut of weight $\alpha \lambda_k$ with probability $\Omega_k(n^{-\alpha k})$, where
$\lambda_k$ denotes the minimum $k$-cut size. This also gives an extremal bound of
$O_k(n^k)$ on the number of minimum $k$-cuts and an algorithm to compute a
minimum $k$-cut in similar runtime. Both are tight up to lower-order factors,
with the algorithmic lower bound assuming hardness of max-weight $k$-clique.
The first main ingredient in our result is a fine-grained analysis of how the
graph shrinks -- and how the average degree evolves -- in the Karger process.
The second ingredient is an extremal bound on the number of cuts of size less
than $2\lambda_k/k$, using the Sunflower lemma.
Comment: Final version of arXiv:1911.09165 with new and more general proof
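The contraction process named in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' analysis or implementation: the function names `contract_to_k` and `min_k_cut`, the unweighted edge-list input, the union-find bookkeeping, and the fixed number of repeated trials are all assumptions made for illustration.

```python
import random


def contract_to_k(n, edges, k, rng):
    """Contract uniformly random edges until k supernodes remain.

    n: number of vertices, labelled 0..n-1
    edges: list of (u, v) pairs; parallel edges are allowed (multigraph)
    Returns (labels, cut): labels[v] identifies the supernode of v, and
    cut lists the original edges whose endpoints ended in different supernodes.
    """
    parent = list(range(n))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    pool = list(edges)
    while components > k:
        # Drop self-loops of the contracted graph; what remains is exactly
        # the edge multiset of the current contracted multigraph.
        pool = [(u, v) for u, v in pool if find(u) != find(v)]
        if not pool:
            break  # the graph already has at least k connected components
        u, v = rng.choice(pool)  # uniform over the remaining edges
        parent[find(u)] = find(v)
        components -= 1

    labels = [find(v) for v in range(n)]
    cut = [(u, v) for u, v in edges if labels[u] != labels[v]]
    return labels, cut


def min_k_cut(n, edges, k, trials=200, seed=0):
    """Repeat the contraction process and keep the smallest cut found."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        labels, cut = contract_to_k(n, edges, k, rng)
        if best is None or len(cut) < len(best[1]):
            best = (labels, cut)
    return best


if __name__ == "__main__":
    # Two triangles joined by the single bridge edge (2, 3).
    triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    labels, cut = min_k_cut(6, triangles, k=2)
    print(len(cut), cut)
```

A single run returns a fixed minimum cut only with some probability, which is why the sketch repeats the process; the number of trials used here is an arbitrary choice, not a bound taken from the paper.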
Diversifying Top-K Results
Top-k query processing finds a list of k results that have the largest scores
w.r.t. the user-given query, under the assumption that all k results are
independent of each other. In practice, some of the top-k results returned can
be very similar to each other. As a result, some of the top-k results returned
are redundant. In the literature, diversified top-k search has been studied to
return k results that take both score and diversity into consideration. Most
existing solutions on diversified top-k search assume that scores of all the
search results are given, and some works solve the diversity problem only in a
specific setting and can hardly be extended to general cases. In this paper, we
study the diversified top-k search problem. We define a general diversified
top-k search problem that only considers the similarity of the search results
themselves. We propose a framework such that most existing solutions for top-k
query processing can be extended easily to handle diversified top-k search by
simply applying three new functions: a sufficient stop condition sufficient(),
a necessary stop condition necessary(), and an algorithm for diversified top-k
search on the current set of generated results, div-search-current(). We
propose three new algorithms, namely div-astar, div-dp, and div-cut, to solve
the div-search-current() problem. div-astar is an A*-based algorithm. div-dp
decomposes the results into components, which are searched independently
using div-astar and then combined using dynamic programming. div-cut
further decomposes the current set of generated results using cut points and
combines the results using sophisticated operations. We conducted extensive
performance studies using two real datasets, enwiki and reuters. Our div-cut
algorithm finds the optimal solution for the diversified top-k search problem in
seconds even for k as large as 2,000.
Comment: VLDB201
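To make the problem in the abstract above concrete, here is a small Python sketch of one plausible formulation: pick at most k results with maximum total score such that no two chosen results are similar. The formulation, the function name `diversified_top_k`, and the plain branch-and-bound search are assumptions for illustration; this is not div-astar, div-dp, or div-cut.

```python
def diversified_top_k(scores, similar, k):
    """Return (best_total, chosen_indices) over sets of at most k results
    in which no two chosen results are similar.

    scores: list of non-negative numbers, one per result
    similar: set of frozenset({i, j}) pairs that count as "too similar"
    """
    # Visit results in decreasing score order so the bound below is valid.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    best = (0.0, [])

    def search(pos, chosen, total):
        nonlocal best
        if total > best[0]:
            best = (total, list(chosen))
        if pos == len(order) or len(chosen) == k:
            return
        # Optimistic bound: fill every free slot with the next-highest
        # scores, ignoring similarity. Prune if even that cannot win.
        slots = k - len(chosen)
        bound = total + sum(scores[i] for i in order[pos:pos + slots])
        if bound <= best[0]:
            return
        i = order[pos]
        # Branch 1: take result i if it is not similar to anything chosen.
        if all(frozenset((i, j)) not in similar for j in chosen):
            chosen.append(i)
            search(pos + 1, chosen, total + scores[i])
            chosen.pop()
        # Branch 2: skip result i.
        search(pos + 1, chosen, total)

    search(0, [], 0.0)
    return best


if __name__ == "__main__":
    scores = [10, 9, 8, 1]
    similar = {frozenset((0, 1))}  # results 0 and 1 are near-duplicates
    print(diversified_top_k(scores, similar, k=2))
```

In this example the plain top-2 by score would be results 0 and 1, but they are marked as similar, so the search returns results 0 and 2 instead.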
Tight Continuous Relaxation of the Balanced $k$-Cut Problem
Spectral Clustering as a relaxation of the normalized/ratio cut has become
one of the standard graph-based clustering methods. Existing methods for the
computation of multiple clusters, corresponding to a balanced $k$-cut of the
graph, are either based on greedy techniques or heuristics which have weak
connection to the original motivation of minimizing the normalized cut. In this
paper we propose a new tight continuous relaxation for any balanced $k$-cut
problem and show that a related recently proposed relaxation is in most cases
loose, leading to poor performance in practice. For the optimization of our
tight continuous relaxation we propose a new algorithm for the difficult
sum-of-ratios minimization problem which achieves monotonic descent. Extensive
comparisons show that our method outperforms all existing approaches for ratio
cut and other balanced $k$-cut criteria.
Comment: Long version of paper accepted at NIPS 201
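For reference, the discrete objectives that such relaxations target can be evaluated directly on a given partition. The Python sketch below computes the ratio cut and normalized cut of a k-way partition under one common convention (a sum of cut(C, complement) divided by a balancing term, with no factor of 1/2). The function names and this convention are assumptions, and the code is not the paper's relaxation or its optimization algorithm.

```python
def cut_weight(W, cluster):
    """Total weight of edges leaving `cluster` in the symmetric matrix W."""
    inside = set(cluster)
    return sum(W[i][j] for i in inside for j in range(len(W)) if j not in inside)


def balanced_k_cut(W, clusters, balance="ratio"):
    """Sum over clusters of cut(C, complement) / balance(C).

    balance="ratio":      balance(C) = |C|             (ratio cut)
    balance="normalized": balance(C) = sum of degrees  (normalized cut)
    Assumes every cluster is non-empty and has positive balance.
    """
    total = 0.0
    for cluster in clusters:
        if balance == "ratio":
            size = len(cluster)
        elif balance == "normalized":
            size = sum(sum(W[i]) for i in cluster)
        else:
            raise ValueError("balance must be 'ratio' or 'normalized'")
        total += cut_weight(W, cluster) / size
    return total


if __name__ == "__main__":
    # Two tightly connected pairs (0,1) and (2,3) joined by a weak edge (1,2).
    W = [
        [0, 1, 0, 0],
        [1, 0, 0.1, 0],
        [0, 0.1, 0, 1],
        [0, 0, 1, 0],
    ]
    print(balanced_k_cut(W, [[0, 1], [2, 3]]))  # partition along the weak edge
    print(balanced_k_cut(W, [[0, 2], [1, 3]]))  # partition cutting the strong edges
```

The partition that separates the two tight pairs gets the smaller objective value, which is the behaviour a balanced cut criterion is meant to reward.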
Sparsest Cut on Bounded Treewidth Graphs: Algorithms and Hardness Results
We give a 2-approximation algorithm for Non-Uniform Sparsest Cut that runs in
time $n^{O(k)}$, where $k$ is the treewidth of the graph. This improves on the
previous $2^{2^k}$-approximation in time $\mathrm{poly}(n) \cdot 2^{O(k)}$ due to
Chlamtáč et al.
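As a reminder of the objective being approximated, the sketch below evaluates the standard Non-Uniform Sparsest Cut ratio (capacity crossing the cut divided by demand crossing the cut) and finds the optimum by brute force on a tiny instance. The dictionary-based input format and the function names are assumptions for illustration; the exhaustive search is exponential in the number of vertices and is unrelated to the LP-rounding algorithm of the paper.

```python
from itertools import combinations


def sparsity(side, capacities, demands):
    """Capacity crossing the cut (side, rest) divided by demand crossing it.

    capacities, demands: dicts mapping vertex pairs (u, v) to non-negative values.
    Returns infinity when no demand is separated by the cut.
    """
    side = set(side)

    def crosses(u, v):
        return (u in side) != (v in side)

    cap = sum(c for (u, v), c in capacities.items() if crosses(u, v))
    dem = sum(d for (u, v), d in demands.items() if crosses(u, v))
    return cap / dem if dem > 0 else float("inf")


def sparsest_cut_bruteforce(n, capacities, demands):
    """Try every cut of vertices 0..n-1 (exponential time; small n only)."""
    best = (float("inf"), None)
    # Sides of size up to n // 2 cover every cut up to complementation.
    for size in range(1, n // 2 + 1):
        for side in combinations(range(n), size):
            value = sparsity(side, capacities, demands)
            if value < best[0]:
                best = (value, set(side))
    return best


if __name__ == "__main__":
    # Path 0-1-2-3 with a thin middle edge, and two demand pairs.
    capacities = {(0, 1): 3, (1, 2): 1, (2, 3): 3}
    demands = {(0, 3): 1, (0, 1): 1}
    print(sparsest_cut_bruteforce(4, capacities, demands))
```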
To complement this algorithm, we show the following hardness results: If the
Non-Uniform Sparsest Cut problem has a $\rho$-approximation for series-parallel
graphs (where $\rho \geq 1$), then the Max Cut problem has an algorithm with
approximation factor arbitrarily close to $1/\rho$. Hence, even for such
restricted graphs (which have treewidth 2), the Sparsest Cut problem is NP-hard
to approximate better than $17/16 - \epsilon$ for $\epsilon > 0$; assuming the
Unique Games Conjecture the hardness becomes $1/\alpha_{GW} - \epsilon$. For
graphs with large (but constant) treewidth, we show a hardness result of $2 - \epsilon$ assuming the Unique Games Conjecture.
Our algorithm rounds a linear program based on (a subset of) the
Sherali-Adams lift of the standard Sparsest Cut LP. We show that even for
treewidth-2 graphs, the LP has an integrality gap close to 2 even after
polynomially many rounds of Sherali-Adams. Hence our approach cannot be
improved even on such restricted graphs without using a stronger relaxation
- …