1,349 research outputs found
A visual Analytics System for Optimizing Communications in Massively Parallel Applications
Current and future supercomputers have tens of thousands of compute nodes interconnected with high-dimensional networks and complex network topologies for improved performance. Application developers are required to write scalable parallel programs in order to achieve high throughput on these machines. Application performance is largely determined by efficient inter-process communication. A common way to analyze and optimize performance is through profiling parallel codes to identify communication bottlenecks. However, understanding gigabytes of profile data is not a trivial task. In this paper, we present a visual analytics system for identifying the scalability bottlenecks and improving the communication efficiency of massively parallel applications. Visualization methods used in this system are designed to comprehend large-scale and varied communication patterns on thousands of nodes in complex networks such as the 5D torus and the dragonfly. We also present efficient rerouting and remapping algorithms that can be coupled with our interactive visual analytics design for performance optimization. We demonstrate the utility of our system with several case studies using three benchmark applications on two leading supercomputers. The mapping suggestion from our system led to 38% improvement in hop-bytes for MiniAMR application on 4,096 MPI processes.This research has been sponsored in part by the U.S. National Science Foundation through grant IIS-1320229, and the U.S. Department of Energy through grants DE-SC0012610 and DE-SC0014917. This research has been funded in part and used resources of the Argonne Leadership Computing Facility at Argonne National Lab- oratory, which is supported by the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-06CH11357. This work was supported in part by the DOE Office of Science, ASCR, under award numbers 57L38, 57L32, 57L11, 57K50, and 508050
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions
Distributed-Memory Breadth-First Search on Massive Graphs
This chapter studies the problem of traversing large graphs using the
breadth-first search order on distributed-memory supercomputers. We consider
both the traditional level-synchronous top-down algorithm as well as the
recently discovered direction optimizing algorithm. We analyze the performance
and scalability trade-offs in using different local data structures such as CSR
and DCSC, enabling in-node multithreading, and graph decompositions such as 1D
and 2D decomposition.Comment: arXiv admin note: text overlap with arXiv:1104.451
PT-Scotch: A tool for efficient parallel graph ordering
The parallel ordering of large graphs is a difficult problem, because on the
one hand minimum degree algorithms do not parallelize well, and on the other
hand the obtainment of high quality orderings with the nested dissection
algorithm requires efficient graph bipartitioning heuristics, the best
sequential implementations of which are also hard to parallelize. This paper
presents a set of algorithms, implemented in the PT-Scotch software package,
which allows one to order large graphs in parallel, yielding orderings the
quality of which is only slightly worse than the one of state-of-the-art
sequential algorithms. Our implementation uses the classical nested dissection
approach but relies on several novel features to solve the parallel graph
bipartitioning problem. Thanks to these improvements, PT-Scotch produces
consistently better orderings than ParMeTiS on large numbers of processors
Parallel Graph Partitioning for Complex Networks
Processing large complex networks like social networks or web graphs has
recently attracted considerable interest. In order to do this in parallel, we
need to partition them into pieces of about equal size. Unfortunately, previous
parallel graph partitioners originally developed for more regular mesh-like
networks do not work well for these networks. This paper addresses this problem
by parallelizing and adapting the label propagation technique originally
developed for graph clustering. By introducing size constraints, label
propagation becomes applicable for both the coarsening and the refinement phase
of multilevel graph partitioning. We obtain very high quality by applying a
highly parallel evolutionary algorithm to the coarsened graph. The resulting
system is both more scalable and achieves higher quality than state-of-the-art
systems like ParMetis or PT-Scotch. For large complex networks the performance
differences are very big. For example, our algorithm can partition a web graph
with 3.3 billion edges in less than sixteen seconds using 512 cores of a high
performance cluster while producing a high quality partition -- none of the
competing systems can handle this graph on our system.Comment: Review article. Parallelization of our previous approach
arXiv:1402.328
SelectionConv: Convolutional Neural Networks for Non-rectilinear Image Data
Convolutional Neural Networks have revolutionized vision applications. There
are image domains and representations, however, that cannot be handled by
standard CNNs (e.g., spherical images, superpixels). Such data are usually
processed using networks and algorithms specialized for each type. In this
work, we show that it may not always be necessary to use specialized neural
networks to operate on such spaces. Instead, we introduce a new structured
graph convolution operator that can copy 2D convolution weights, transferring
the capabilities of already trained traditional CNNs to our new graph network.
This network can then operate on any data that can be represented as a
positional graph. By converting non-rectilinear data to a graph, we can apply
these convolutions on these irregular image domains without requiring training
on large domain-specific datasets. Results of transferring pre-trained image
networks for segmentation, stylization, and depth prediction are demonstrated
for a variety of such data forms.Comment: To be presented at ECCV 202
- …