2,759 research outputs found
Tree-based Coarsening and Partitioning of Complex Networks
Many applications produce massive complex networks whose analysis would
benefit from parallel processing. Parallel algorithms, in turn, often require a
suitable network partition. For solving optimization tasks such as graph
partitioning on large networks, multilevel methods are preferred in practice.
Yet, complex networks pose challenges to established multilevel algorithms, in
particular to their coarsening phase.
One way to specify a (recursive) coarsening of a graph is to rate its edges
and then contract the edges as prioritized by the rating. In this paper we (i)
define weights for the edges of a network that express the edges' importance
for connectivity, (ii) compute a minimum weight spanning tree with
respect to these weights, and (iii) rate the network edges based on the
conductance values of 's fundamental cuts. To this end, we also (iv)
develop the first optimal linear-time algorithm to compute the conductance
values of \emph{all} fundamental cuts of a given spanning tree. We integrate
the new edge rating into a leading multilevel graph partitioner and equip the
latter with a new greedy postprocessing for optimizing the maximum
communication volume (MCV). Experiments on bipartitioning frequently used
benchmark networks show that the postprocessing already reduces MCV by 11.3%.
Our new edge rating further reduces MCV by 10.3% compared to the previously
best rating with the postprocessing in place for both ratings. In total, with a
modest increase in running time, our new approach reduces the MCV of complex
network partitions by 20.4%
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature report results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
Dense subgraph mining with a mixed graph model
In this paper we introduce a graph clustering method based on
dense bipartite subgraph mining. The method applies a mixed
graph model (both standard and bipartite) in a three-phase
algorithm. First a seed mining method is applied to find seeds
of clusters, the second phase consists of refining the seeds,
and in the third phase vertices outside the seeds are clustered.
The method is able to detect overlapping clusters, can handle
outliers and applicable without restrictions on the degrees of
vertices or the size of the clusters. The running time of the
method is polynomial. A theoretical result is introduced on
density bounds of bipartite subgraphs with size and local
density conditions. Test results on artificial datasets and
social interaction graphs are also presented
Massively Parallel Algorithms for Distance Approximation and Spanners
Over the past decade, there has been increasing interest in
distributed/parallel algorithms for processing large-scale graphs. By now, we
have quite fast algorithms -- usually sublogarithmic-time and often
-time, or even faster -- for a number of fundamental graph
problems in the massively parallel computation (MPC) model. This model is a
widely-adopted theoretical abstraction of MapReduce style settings, where a
number of machines communicate in an all-to-all manner to process large-scale
data. Contributing to this line of work on MPC graph algorithms, we present
round MPC algorithms for computing
-spanners in the strongly sublinear regime of local memory. To
the best of our knowledge, these are the first sublogarithmic-time MPC
algorithms for spanner construction. As primary applications of our spanners,
we get two important implications, as follows:
-For the MPC setting, we get an -round algorithm for
approximation of all pairs shortest paths (APSP) in the
near-linear regime of local memory. To the best of our knowledge, this is the
first sublogarithmic-time MPC algorithm for distance approximations.
-Our result above also extends to the Congested Clique model of distributed
computing, with the same round complexity and approximation guarantee. This
gives the first sub-logarithmic algorithm for approximating APSP in weighted
graphs in the Congested Clique model
GBMST: An Efficient Minimum Spanning Tree Clustering Based on Granular-Ball Computing
Most of the existing clustering methods are based on a single granularity of
information, such as the distance and density of each data. This most
fine-grained based approach is usually inefficient and susceptible to noise.
Therefore, we propose a clustering algorithm that combines multi-granularity
Granular-Ball and minimum spanning tree (MST). We construct coarsegrained
granular-balls, and then use granular-balls and MST to implement the clustering
method based on "large-scale priority", which can greatly avoid the influence
of outliers and accelerate the construction process of MST. Experimental
results on several data sets demonstrate the power of the algorithm. All codes
have been released at https://github.com/xjnine/GBMST
- …