Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning, together with applications and future research directions.
Advanced Multilevel Node Separator Algorithms
A node separator of a graph is a subset S of the nodes such that removing S
and its incident edges divides the graph into two disconnected components of
about equal size. In this work, we introduce novel algorithms to find small
node separators in large graphs. With focus on solution quality, we introduce
novel flow-based local search algorithms which are integrated in a multilevel
framework. In addition, we transfer techniques successfully used in the graph
partitioning field. This includes the usage of edge ratings tailored to our
problem to guide the graph coarsening algorithm as well as highly localized
local search and iterated multilevel cycles to improve solution quality even
further. Experiments indicate that flow-based local search algorithms on their
own in a multilevel framework are already highly competitive in terms of
separator quality. Adding further local search algorithms improves solution
quality even more. Our strongest configuration almost always outperforms
competing systems while on average computing 10% and 62% smaller separators
than Metis and Scotch, respectively.
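As an illustration of the separator definition in the abstract above, the following minimal sketch removes a candidate set S from a small graph and lists the connected components that remain. The function name `components_after_removal` and the adjacency-dictionary layout are assumptions made for this example; it only checks a given separator and is not the multilevel or flow-based algorithm the paper describes.

```python
from collections import deque


def components_after_removal(adj, separator):
    """Return the connected components left after deleting `separator`.

    `adj` maps each node to a list of its neighbours (undirected graph).
    This is a hypothetical helper for illustration only.
    """
    remaining = set(adj) - set(separator)
    seen = set()
    components = []
    for start in sorted(remaining):
        if start in seen:
            continue
        # Breadth-first search restricted to nodes outside the separator.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in remaining and v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components


# Path graph 0 - 1 - 2 - 3 - 4; removing the middle node splits it in two.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
parts = components_after_removal(path, {2})
print(parts)
```

Here S = {2} is a valid separator of size one, and the two remaining sides have equal size.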
Bipartite graph partitioning and data clustering
Many data types arising from data mining applications can be modeled as
bipartite graphs; examples include terms and documents in a text corpus,
customers and purchased items in market basket analysis, and reviewers and
movies in a movie recommender system. In this paper, we propose a new data
clustering method based on partitioning the underlying bipartite graph. The
partition is constructed by minimizing a normalized sum of edge weights between
unmatched pairs of vertices of the bipartite graph. We show that an approximate
solution to the minimization problem can be obtained by computing a partial
singular value decomposition (SVD) of the associated edge weight matrix of the
bipartite graph. We point out the connection of our clustering algorithm to
correspondence analysis used in multivariate analysis. We also briefly discuss
the issue of assigning data objects to multiple clusters. In the experimental
results, we apply our clustering algorithm to the problem of document
clustering to illustrate its effectiveness and efficiency.
Comment: Proceedings of ACM CIKM 2001, the Tenth International Conference on
Information and Knowledge Management, 200
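The SVD-based idea in the abstract above can be sketched as follows. This is a rough illustration rather than the paper's exact formulation: the degree scaling of the weight matrix and the split at zero on the second singular vectors are assumptions borrowed from common spectral co-clustering practice, the name `bipartition_by_svd` is hypothetical, and a full SVD from NumPy stands in for the partial SVD one would use on a large sparse matrix.

```python
import numpy as np


def bipartition_by_svd(weights):
    """Split rows and columns of a bipartite weight matrix into two groups.

    Assumes every row and column has positive total weight.
    """
    w = np.asarray(weights, dtype=float)
    row_deg = w.sum(axis=1)
    col_deg = w.sum(axis=0)
    # Scale each entry by 1 / sqrt(row degree * column degree).
    scaled = w / np.sqrt(np.outer(row_deg, col_deg))
    u, s, vt = np.linalg.svd(scaled, full_matrices=False)
    # The second pair of singular vectors carries the two-way split.
    row_score = u[:, 1] / np.sqrt(row_deg)
    col_score = vt[1, :] / np.sqrt(col_deg)
    return row_score > 0, col_score > 0


# Toy example: rows 0-1 mostly connect to columns 0-1, rows 2-3 to columns 2-3.
weights = [
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
]
row_side, col_side = bipartition_by_svd(weights)
print(row_side, col_side)
```

The sign of a singular vector is arbitrary, so which group is labelled True may differ between runs or libraries; only the grouping itself is meaningful.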
Scheduling Storms and Streams in the Cloud
Motivated by emerging big streaming data processing paradigms (e.g., Twitter
Storm, Streaming MapReduce), we investigate the problem of scheduling graphs
over a large cluster of servers. Each graph is a job, where nodes represent
compute tasks and edges indicate data-flows between these compute tasks. Jobs
(graphs) arrive randomly over time, and upon completion, leave the system. When
a job arrives, the scheduler needs to partition the graph and distribute it
over the servers to satisfy load balancing and cost considerations.
Specifically, neighboring compute tasks in the graph that are mapped to
different servers incur load on the network; thus a mapping of the jobs among
the servers incurs a cost that is proportional to the number of "broken edges".
We propose a low complexity randomized scheduling algorithm that, without
service preemptions, stabilizes the system with graph arrivals/departures; more
importantly, it allows a smooth trade-off between minimizing average
partitioning cost and average queue lengths. Interestingly, to avoid service
preemptions, our approach does not rely on a Gibbs sampler; instead, we show
that the corresponding limiting invariant measure has an interpretation
stemming from a loss system.
Comment: 14 page
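To make the "broken edges" cost in the abstract above concrete, the sketch below counts the edges of one job graph whose endpoints are placed on different servers. The names `broken_edges`, `job_edges`, and `placement` are hypothetical, and the code only evaluates a given placement; it does not implement the randomized scheduling algorithm the paper proposes.

```python
def broken_edges(edges, placement):
    """Count edges whose two tasks are mapped to different servers."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


# A job with four tasks in a chain: a -> b -> c -> d.
job_edges = [("a", "b"), ("b", "c"), ("c", "d")]

# Tasks a and b run on server 0, tasks c and d on server 1.
placement = {"a": 0, "b": 0, "c": 1, "d": 1}

cost = broken_edges(job_edges, placement)
print(cost)
```

Only the edge between b and c crosses servers in this placement, so a cost proportional to the number of broken edges would charge for one edge.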