Improved Cheeger's Inequality: Analysis of Spectral Partitioning Algorithms through Higher Order Spectral Gap
Let \phi(G) be the minimum conductance of an undirected graph G, and let
0=\lambda_1 <= \lambda_2 <=... <= \lambda_n <= 2 be the eigenvalues of the
normalized Laplacian matrix of G. We prove that for any graph G and any k >= 2,
\phi(G) = O(k) \lambda_2 / \sqrt{\lambda_k}, and this performance guarantee
is achieved by the spectral partitioning algorithm. This improves Cheeger's
inequality, and the bound is optimal up to a constant factor for any k. Our
result shows that the spectral partitioning algorithm is a constant factor
approximation algorithm for finding a sparse cut if \lambda_k is a constant
for some constant k. This provides some theoretical justification to its
empirical performance in image segmentation and clustering problems. We extend
the analysis to other graph partitioning problems, including multi-way
partition, balanced separator, and maximum cut.
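The spectral partitioning algorithm referred to in this abstract is commonly implemented as a sweep cut over the second eigenvector of the normalized Laplacian. The following is a minimal sketch of that standard procedure, not code from the paper. The function names `conductance` and `spectral_partition`, the dense NumPy representation, and the two-triangle example graph are assumptions made for illustration, and the sketch assumes a graph with no isolated vertices.

```python
import numpy as np

def conductance(adj, subset):
    """phi(S) = w(S, V \\ S) / min(vol(S), vol(V \\ S)) for a symmetric weight matrix."""
    n = adj.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[list(subset)] = True
    deg = adj.sum(axis=1)
    cut = adj[mask][:, ~mask].sum()
    denom = min(deg[mask].sum(), deg[~mask].sum())
    return cut / denom if denom > 0 else float("inf")

def spectral_partition(adj):
    """Sweep cut over the second eigenvector of the normalized Laplacian.

    Assumes every vertex has positive degree (hypothetical precondition).
    Returns (best_set, best_phi, eigenvalues in ascending order).
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)
    # Map the second eigenvector back to vertex space and sort vertices by it.
    embedding = d_inv_sqrt * eigvecs[:, 1]
    order = np.argsort(embedding)
    best_set, best_phi = None, float("inf")
    for i in range(1, n):  # every proper prefix of the sorted order
        phi = conductance(adj, order[:i])
        if phi < best_phi:
            best_set, best_phi = sorted(int(v) for v in order[:i]), phi
    return best_set, best_phi, eigvals

# Illustrative graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

best_set, best_phi, eigvals = spectral_partition(adj)
print(best_set, best_phi, eigvals[1])
```

On this example the sweep separates the two triangles, and the resulting conductance can be compared against \lambda_2 and \sqrt{\lambda_k} to see how the stated bound behaves on a small instance.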
Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning
Multilevel partitioning methods that are inspired by principles of
multiscaling are the most powerful practical hypergraph partitioning solvers.
Hypergraph partitioning has many applications in disciplines ranging from
scientific computing to data science. In this paper we introduce the concept of
algebraic distance on hypergraphs and demonstrate its use as an algorithmic
component in the coarsening stage of multilevel hypergraph partitioning
solvers. The algebraic distance is a vertex distance measure that extends
hyperedge weights for capturing the local connectivity of vertices which is
critical for hypergraph coarsening schemes. The practical effectiveness of the
proposed measure and corresponding coarsening scheme is demonstrated through
extensive computational experiments on a diverse set of problems. Finally, we
propose a benchmark of hypergraph partitioning problems to compare the quality
of other solvers.
Partitioning into Expanders
Let G=(V,E) be an undirected graph, and let \lambda_k be the k-th smallest
eigenvalue of the normalized Laplacian matrix of G. There is a basic fact in
algebraic graph theory that \lambda_k > 0 if and only if G has at most k-1
connected components. We prove a robust version of this fact. If \lambda_k > 0,
then for some 1 <= \ell <= k-1, V can be partitioned into \ell sets
P_1,...,P_\ell such that each P_i is a low-conductance set in G and induces a
high-conductance subgraph. In particular, \phi(P_i) = O(\ell^3 \sqrt{\lambda_\ell})
and \phi(G[P_i]) >= \Omega(\lambda_k/k^2).
We make our results algorithmic by designing a simple polynomial time
spectral algorithm to find such partitioning of G with a quadratic loss in the
inside conductance of P_i's. Unlike the recent results on higher order
Cheeger's inequality [LOT12,LRTV12], our algorithmic results do not use higher
order eigenfunctions of G. If there is a sufficiently large gap between
\lambda_k and \lambda_{k+1}, more precisely, if \lambda_{k+1} >= poly(k)
\lambda_k^{1/4}, then our algorithm finds a k-partitioning of V into sets
P_1,...,P_k such that the induced subgraph G[P_i] has a significantly larger
conductance than the conductance of P_i in G. Such a partitioning may represent
the best k-clustering of G. Our algorithm is a simple local search that only
uses the Spectral Partitioning algorithm as a subroutine. We expect to see
further applications of this simple algorithm in clustering applications.
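To make the two quantities in this abstract concrete, the sketch below computes the outside conductance \phi(P_i) of each part in G and the inside conductance \phi(G[P_i]) of the induced subgraph. It is a brute-force check for tiny graphs only, not the paper's local search algorithm; the helper names and the example graph are assumptions made for illustration.

```python
from itertools import combinations

def volume(graph, nodes):
    """Sum of degrees of the given nodes in graph (adjacency-list dict)."""
    return sum(len(graph[v]) for v in nodes)

def conductance(graph, subset):
    """phi(S) = |E(S, V \\ S)| / min(vol(S), vol(V \\ S))."""
    subset = set(subset)
    rest = set(graph) - subset
    cut = sum(1 for u in subset for v in graph[u] if v not in subset)
    denom = min(volume(graph, subset), volume(graph, rest))
    return cut / denom if denom else float("inf")

def induced(graph, part):
    """Subgraph induced by part, as a new adjacency-list dict."""
    part = set(part)
    return {u: [v for v in graph[u] if v in part] for u in part}

def min_conductance(graph):
    """Exhaustive minimum conductance; exponential time, tiny graphs only."""
    nodes = list(graph)
    best = float("inf")
    # Conductance is symmetric in S and its complement, so sizes up to n/2 suffice.
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            best = min(best, conductance(graph, subset))
    return best

# Illustrative graph: two triangles joined by the single edge (2, 3).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
parts = [{0, 1, 2}, {3, 4, 5}]

outside = [conductance(graph, p) for p in parts]
inside = [min_conductance(induced(graph, p)) for p in parts]
print(outside, inside)
```

A partition of the kind the abstract describes is one where every entry of `inside` is much larger than the corresponding entry of `outside`, as happens for the two triangles here.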
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions.
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201