PT-Scotch: A tool for efficient parallel graph ordering
The parallel ordering of large graphs is a difficult problem: on the one hand,
minimum degree algorithms do not parallelize well, and on the other hand,
obtaining high-quality orderings with the nested dissection
algorithm requires efficient graph bipartitioning heuristics, the best
sequential implementations of which are also hard to parallelize. This paper
presents a set of algorithms, implemented in the PT-Scotch software package,
which allows one to order large graphs in parallel, yielding orderings whose
quality is only slightly worse than that of state-of-the-art
sequential algorithms. Our implementation uses the classical nested dissection
approach but relies on several novel features to solve the parallel graph
bipartitioning problem. Thanks to these improvements, PT-Scotch produces
consistently better orderings than ParMeTiS on large numbers of processors.
Graphs with many strong orientations
We establish mild conditions under which a possibly irregular, sparse graph
has "many" strong orientations. Given a graph G on n vertices, orient
each edge in either direction with probability 1/2, independently. We show
that if G satisfies a minimum degree condition and a lower bound on its
Cheeger constant, then the
resulting randomly oriented directed graph is strongly connected with high
probability. This Cheeger constant bound can be replaced by an analogous
spectral condition via the Cheeger inequality. Additionally, we provide an
explicit construction to show our minimum degree condition is tight while the
Cheeger constant bound is tight up to a factor.
Comment: 14 pages, 4 figures; revised version includes more background and
minor changes that better clarify the exposition.
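The random-orientation experiment described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's method: it orients each edge with probability 1/2, tests strong connectivity with a forward and a reverse breadth-first search from vertex 0, and estimates the success probability by sampling. The function names, the trial count, and the seed are all assumptions introduced here.

```python
import random
from collections import deque


def random_orientation(edges, rng):
    """Orient each undirected edge (u, v) as u->v or v->u with probability 1/2."""
    return [(u, v) if rng.random() < 0.5 else (v, u) for u, v in edges]


def _reaches_all(n, adj):
    """Return True if a BFS from vertex 0 visits all n vertices."""
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    count = 1
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if not seen[w]:
                seen[w] = True
                count += 1
                queue.append(w)
    return count == n


def is_strongly_connected(n, arcs):
    """A digraph is strongly connected iff vertex 0 reaches every vertex
    in both the graph and its reverse."""
    fwd = [[] for _ in range(n)]
    rev = [[] for _ in range(n)]
    for u, v in arcs:
        fwd[u].append(v)
        rev[v].append(u)
    return _reaches_all(n, fwd) and _reaches_all(n, rev)


def estimate_strong_probability(n, edges, trials=200, seed=0):
    """Monte Carlo estimate (hypothetical helper) of the probability that a
    uniformly random orientation of the given graph is strongly connected."""
    rng = random.Random(seed)
    hits = sum(
        is_strongly_connected(n, random_orientation(edges, rng))
        for _ in range(trials)
    )
    return hits / trials
```

A graph with a bridge, such as a path, can never be oriented strongly, so the estimate for it is exactly zero; dense, well-connected graphs are where the estimate approaches one.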
Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture
Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of
terascale integration. Among emerging killer applications, parallel graph
processing has been a critical technique to analyze connected data. In this
paper, we empirically evaluate various computing platforms including an Intel
Xeon E5 CPU, an NVIDIA GeForce GTX 1070 GPU, and a Xeon Phi 7210 processor
codenamed Knights Landing (KNL) in the domain of parallel graph processing. We
show that the KNL achieves encouraging performance when processing graphs, making
it a promising solution for accelerating multi-threaded graph
applications. We further characterize the impact of KNL architectural
enhancements on the performance of a state-of-the-art graph framework. We have
four key observations: (1) Different graph applications require different
numbers of threads to reach peak performance; for the same application,
different datasets also need different numbers of threads to achieve the best
performance. (2) Only a few graph applications benefit from the high-bandwidth
MCDRAM, while others favor the low-latency DDR4 DRAM. (3) Vector processing units
executing AVX-512 SIMD instructions on KNLs are underutilized when running the
state-of-the-art graph framework. (4) The sub-NUMA cache clustering mode, which offers
the lowest local memory access latency, hurts the performance of graph
benchmarks that lack NUMA awareness. Finally, we suggest future work,
including system auto-tuning tools and graph framework optimizations, to fully
exploit the potential of KNL for parallel graph processing.
Comment: published as L. Jiang, L. Chen and J. Qiu, "Performance
Characterization of Multi-threaded Graph Processing Applications on
Many-Integrated-Core Architecture," 2018 IEEE International Symposium on
Performance Analysis of Systems and Software (ISPASS), Belfast, United
Kingdom, 2018, pp. 199-20
Multiscale approach for the network compression-friendly ordering
We present a fast multiscale approach for the network minimum logarithmic
arrangement problem. This type of arrangement plays an important role in
network compression and fast node/link access operations. The algorithm has
linear complexity and exhibits good scalability, which makes it practical and
attractive for use on large-scale instances. Its effectiveness is
demonstrated on a large set of real-life networks. These networks, together with the
corresponding best-known minimization results, are suggested as an open
benchmark for the research community to evaluate new methods for this problem.
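The abstract does not state the objective being minimized. Assuming the usual unweighted formulation of the minimum logarithmic arrangement problem (the sum over edges of the base-2 logarithm of the distance between the endpoints' positions in the ordering), the cost of a given ordering could be evaluated as sketched below. The function name and the dictionary representation of the ordering are illustrative assumptions, and the sketch only scores an ordering; it is not the multiscale algorithm itself.

```python
import math


def log_arrangement_cost(edges, position):
    """Sum of log2 |position[u] - position[v]| over all edges.

    `position` maps each vertex to a distinct integer slot in the ordering.
    Assumes no self-loops, since log2(0) is undefined.
    """
    return sum(math.log2(abs(position[u] - position[v])) for u, v in edges)


# A path 0-1-2-3 laid out in its natural order has every gap equal to 1,
# so every term is log2(1) = 0.
path = [(0, 1), (1, 2), (2, 3)]
natural = {0: 0, 1: 1, 2: 2, 3: 3}
swapped = {0: 0, 1: 2, 2: 1, 3: 3}  # swapping the middle vertices widens two gaps
```

Intuitively, a smaller cost means neighbouring vertices receive nearby indices, so the index gaps take fewer bits to encode, which is why this kind of ordering matters for compression.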