5 research outputs found
Graphs, Matrices, and the GraphBLAS: Seven Good Reasons
The analysis of graphs has become increasingly important to a wide range of
applications. Graph analysis presents a number of unique challenges in the
areas of (1) software complexity, (2) data complexity, (3) security, (4)
mathematical complexity, (5) theoretical analysis, (6) serial performance, and
(7) parallel performance. Implementing graph algorithms using matrix-based
approaches provides a number of promising solutions to these challenges. The
GraphBLAS standard (istc- bigdata.org/GraphBlas) is being developed to bring
the potential of matrix based graph algorithms to the broadest possible
audience. The GraphBLAS mathematically defines a core set of matrix-based graph
operations that can be used to implement a wide class of graph algorithms in a
wide range of programming environments. This paper provides an introduction to
the GraphBLAS and describes how the GraphBLAS can be used to address many of
the challenges associated with analysis of graphs.Comment: 10 pages; International Conference on Computational Science workshop
on the Applications of Matrix Computational Methods in the Analysis of Modern
Dat
Enabling Massive Deep Neural Networks with the GraphBLAS
Deep Neural Networks (DNNs) have emerged as a core tool for machine learning.
The computations performed during DNN training and inference are dominated by
operations on the weight matrices describing the DNN. As DNNs incorporate more
stages and more nodes per stage, these weight matrices may be required to be
sparse because of memory limitations. The GraphBLAS.org math library standard
was developed to provide high performance manipulation of sparse weight
matrices and input/output vectors. For sufficiently sparse matrices, a sparse
matrix library requires significantly less memory than the corresponding dense
matrix implementation. This paper provides a brief description of the
mathematics underlying the GraphBLAS. In addition, the equations of a typical
DNN are rewritten in a form designed to use the GraphBLAS. An implementation of
the DNN is given using a preliminary GraphBLAS C library. The performance of
the GraphBLAS implementation is measured relative to a standard dense linear
algebra library implementation. For various sizes of DNN weight matrices, it is
shown that the GraphBLAS sparse implementation outperforms a BLAS dense
implementation as the weight matrix becomes sparser.Comment: 10 pages, 7 figures, to appear in the 2017 IEEE High Performance
Extreme Computing (HPEC) conferenc
Distributed Community Detection with the WCC Metric
Community detection has become an extremely active area of research in recent
years, with researchers proposing various new metrics and algorithms to address
the problem. Recently, the Weighted Community Clustering (WCC) metric was
proposed as a novel way to judge the quality of a community partitioning based
on the distribution of triangles in the graph, and was demonstrated to yield
superior results over other commonly used metrics like modularity. The same
authors later presented a parallel algorithm for optimizing WCC on large
graphs. In this paper, we propose a new distributed, vertex-centric algorithm
for community detection using the WCC metric. Results are presented that
demonstrate the algorithm's performance and scalability on up to 32 worker
machines and real graphs of up to 1.8 billion vertices. The algorithm scales
best with the largest graphs, and to our knowledge, it is the first distributed
algorithm for optimizing the WCC metric.Comment: 6 pages, 6 figure
Parallel heuristics for scalable community detection
AbstractCommunity detection has become a fundamental operation in numerous graph-theoretic applications. It is used to reveal natural divisions that exist within real world networks without imposing prior size or cardinality constraints on the set of communities. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is a multi-phase, iterative heuristic for modularity optimization. Originally developed by Blondel et al. (2008), the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method is also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains (e.g., internet, citation, biological). Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing absolute speedups of up to 16Ă— using 32 threads
Scalable multi-threaded community detection in social networks
Abstract—The volume of existing graphstructured data requires improved parallel tools and algorithms. Finding communities, smaller subgraphs densely connected within the subgraph than to the rest of the graph, plays a role both in developing new parallel algorithms as well as opening smaller portions of the data to current analysis tools. We improve performance of our parallel community detection algorithm by 20 % on the massively multithreaded Cray XMT, evaluate its performance on the next-generation Cray XMT2, and extend its reach to Intel-based platforms with OpenMP. To our knowledge, not only is this the first massively parallel community detection algorithm but also the only such algorithm that achieves excellent performance and good parallel scalability across all these platforms. Our implementation analyzes a moderate sized graph with 105 million vertices and 3.3 billion edges in around 500 seconds on a four processor, 80-logical-core Intel-based system and 1100 seconds on a 64-processor Cray XMT2