
    Multi-GPU Graph Analytics

    We present a single-node, multi-GPU programmable graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graphs with billions of edges. By directly reusing the single-GPU implementations, our design requires programmers to specify only a few algorithm-dependent concerns and hides most multi-GPU implementation details. We analyze the theoretical and practical limits to scalability in the context of varying graph primitives and datasets. We describe several optimizations, such as direction-optimizing traversal and a just-enough memory allocation scheme, for better performance and smaller memory consumption. Compared to previous work, we achieve best-of-class performance across operations and datasets, including excellent strong and weak scalability on most primitives as we increase the number of GPUs in the system. Comment: 12 pages. Final version submitted to IPDPS 201

    Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach

    While it is well known and acknowledged that the performance of graph algorithms is heavily dependent on the input data, there has been surprisingly little research to quantify and predict the impact the graph structure has on performance. Parallel graph algorithms, running on many-core systems such as GPUs, are no exception: most research has focused on how to efficiently implement and tune different graph operations on a specific GPU. However, the performance impact of the input graph has only been taken into account indirectly, as a result of the graphs used to benchmark the system. In this work, we present a case study investigating how to use the properties of the input graph to improve the performance of the breadth-first search (BFS) graph traversal. To do so, we first study the performance variation of 15 different BFS implementations across 248 graphs. Using this performance data, we show that significant speedup can be achieved by combining the best implementation for each level of the traversal. To make use of this data-dependent optimization, we must correctly predict the relative performance of algorithms per graph level, and enable dynamic switching to the optimal algorithm for each level at runtime. We use the collected performance data to train a binary decision tree to enable high-accuracy predictions and fast switching. We demonstrate empirically that our decision tree is both fast enough to allow dynamic switching between implementations without noticeable overhead, and accurate enough in its prediction to enable significant BFS speedup. We conclude that our model-driven approach (1) enables BFS to outperform state-of-the-art GPU algorithms, and (2) can be adapted for other BFS variants, other algorithms, or more specific datasets.
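
The per-level switching idea in the abstract above can be illustrated with a minimal CPU-side sketch. It is not the authors' GPU code: it switches between only two textbook BFS step variants (top-down and bottom-up), and `choose_step` is a hypothetical hand-written threshold rule standing in for the trained decision tree. All function names and the threshold value are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[int, List[int]]  # undirected graph as adjacency lists


def top_down_step(graph: Graph, frontier: Set[int],
                  dist: Dict[int, int], level: int) -> Set[int]:
    """Expand every frontier vertex and claim its unvisited neighbours."""
    nxt: Set[int] = set()
    for u in frontier:
        for v in graph[u]:
            if v not in dist:
                dist[v] = level
                nxt.add(v)
    return nxt


def bottom_up_step(graph: Graph, frontier: Set[int],
                   dist: Dict[int, int], level: int) -> Set[int]:
    """Let every unvisited vertex look for a parent in the frontier."""
    nxt = {v for v in graph
           if v not in dist and any(u in frontier for u in graph[v])}
    for v in nxt:
        dist[v] = level
    return nxt


def choose_step(frontier_size: int, unvisited: int) -> Callable:
    # Hypothetical rule (assumption): a trained model would replace this.
    return bottom_up_step if frontier_size * 4 > unvisited else top_down_step


def bfs(graph: Graph, source: int, choose: Callable = choose_step) -> Dict[int, int]:
    """BFS that picks a step implementation afresh at every level."""
    dist = {source: 0}
    frontier = {source}
    level = 0
    while frontier:
        level += 1
        step = choose(len(frontier), len(graph) - len(dist))
        frontier = step(graph, frontier, dist, level)
    return dist
```

Both step functions produce the same distances, so the choice made at each level affects only the amount of work done, not the result.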

    Extracting Multi-objective Multigraph Features for the Shortest Path Cost Prediction: Statistics-based or Learning-based?

    Efficient airport airside ground movement (AAGM) is key to successful operations of urban air mobility. Recent studies have introduced the use of multi-objective multigraphs (MOMGs) as the conceptual prototype to formulate AAGM. Swift calculation of the shortest path costs is crucial for the algorithmic heuristic search on MOMGs. However, previous work chiefly focused on single-objective simple graphs (SOSGs), treated cost enquiries as search problems, and failed to keep computational time and storage complexity low. This paper concentrates on the conceptual prototype MOMG and investigates its node feature extraction, which lays the foundation for efficient prediction of shortest path costs. Two extraction methods are implemented and compared: a statistics-based method that summarises 22 node physical patterns from graph theory principles, and a learning-based method that employs a node embedding technique to encode graph structures into a discriminative vector space. The former method can effectively evaluate the node physical patterns and reveals their individual importance for distance prediction, while the latter provides novel practices for processing multigraphs with node embedding algorithms that can handle only SOSGs. Three regression models are applied to predict the shortest path costs to demonstrate the performance of each. Our experiments on randomly generated benchmark MOMGs show that (i) the statistics-based method underperforms on characterising small distance values due to severe overestimation, (ii) a subset of essential physical patterns can achieve comparable or slightly better prediction accuracy than that based on a complete set of patterns, and (iii) the learning-based method consistently outperforms the statistics-based method, while maintaining a competitive level of computational complexity.


    Algorithms for Analyzing and Mining Real-World Graphs

    This thesis is about algorithms for analyzing large real-world graphs (or networks). Examples include (online) social networks, webgraphs, information networks, biological networks, and scientific collaboration and citation networks. Although these graphs differ in terms of what kind of information the objects and relationships represent, it turns out that the structure of each of these networks is surprisingly similar. For computer scientists, there is an obvious challenge to design efficient algorithms that allow large graphs, with millions of nodes and billions of edges, to be processed and analyzed in a practical setting. Specifically, there is an opportunity to exploit the non-random structure of real-world graphs to efficiently compute or approximate various properties and measures that would be too hard to compute using traditional graph algorithms. Examples include computation of node-to-node distances and extreme distance measures such as the exact diameter and radius of a graph. NWO. Algorithms and the Foundations of Software technolog
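
To make the distance measures named above concrete, here is a small sketch of the traditional baseline: one BFS per node gives every eccentricity, from which the diameter (maximum) and radius (minimum) follow. It also shows the well-known double-sweep heuristic, which yields a lower bound on the diameter with only two BFS runs. The abstract does not describe the thesis's own algorithms, so none of this should be read as them; the function names are illustrative, and the graph is assumed to be connected, undirected, and unweighted.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # connected, undirected, unweighted (assumption)


def bfs_distances(graph: Graph, source: int) -> Dict[int, int]:
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricity(graph: Graph, v: int) -> int:
    """Largest distance from v to any other node."""
    return max(bfs_distances(graph, v).values())


def diameter_and_radius(graph: Graph) -> Tuple[int, int]:
    """Baseline: one BFS per node, so O(n * m) time overall."""
    ecc = [eccentricity(graph, v) for v in graph]
    return max(ecc), min(ecc)


def double_sweep_lower_bound(graph: Graph, start: int) -> int:
    """Two BFS runs: go to a farthest node, then take its eccentricity."""
    first = bfs_distances(graph, start)
    far = max(first, key=first.get)
    return eccentricity(graph, far)
```

On a graph with billions of edges the all-nodes baseline is impractical, which is why bounds that need only a handful of BFS runs are attractive.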

    Subgraph Similarity Search in Large Graphs

    One of the major challenges in applications related to social networks, computational biology, collaboration networks, etc., is to efficiently search for similar patterns in their underlying graphs. These graphs are typically noisy and contain thousands of vertices and millions of edges. In many cases, the graphs are unlabelled and the notion of similarity is not well defined. We study the problem of searching a large target graph for the induced subgraph that is most similar to a given query graph. We assume that the query graph and target graph are undirected and unlabelled. We use graphlet kernels [1] to define graph similarity. Graphlet kernels are known to perform better than other kernels in different applications.
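
As a rough illustration of how a graphlet kernel compares two unlabelled graphs, the sketch below enumerates all 3-node induced subgraphs by brute force, groups them by edge count (0 to 3), and takes the dot product of the normalized histograms. This is one common formulation, restricted to size-3 graphlets for brevity. The paper's exact kernel and search procedure are not given in the abstract, so the function names and the size-3 restriction are assumptions.

```python
from itertools import combinations
from typing import Iterable, List, Sequence, Tuple


def graphlet3_histogram(nodes: Sequence[int],
                        edges: Iterable[Tuple[int, int]]) -> List[float]:
    """Relative frequency of 3-node induced subgraphs with 0, 1, 2, 3 edges."""
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for trio in combinations(nodes, 3):
        # Count how many of the three possible pairs are connected.
        k = sum(frozenset(pair) in edge_set for pair in combinations(trio, 2))
        counts[k] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 4


def graphlet_kernel(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Dot product of two graphlet frequency vectors."""
    return sum(a * b for a, b in zip(h1, h2))


# Usage: a triangle and a 3-node path share no size-3 graphlet type.
triangle = graphlet3_histogram([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
path3 = graphlet3_histogram([0, 1, 2], [(0, 1), (1, 2)])
similarity = graphlet_kernel(triangle, path3)
```

Brute-force enumeration is cubic in the number of nodes, so graphs of the size mentioned above would need sampling or a faster counting scheme.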

    Review of Extreme Multilabel Classification

    Extreme multilabel classification, or XML, is an active area of interest in machine learning. Compared to traditional multilabel classification, the number of labels here is extremely large, hence the name. Classical one-versus-all classification won't scale in this case because of the large number of labels, and the same is true for any other classifier. Embedding labels as well as features into a smaller label space is an essential first step. Other issues include the existence of head and tail labels, where tail labels are labels that occur in a relatively small number of the given samples. The existence of tail labels creates issues during embedding. The area has invited a wide range of approaches, including bit compression motivated by compressed sensing, tree-based embeddings, deep-learning-based latent space embeddings (including ones using attention weights), linear-algebra-based embeddings such as SVD, clustering, and hashing, to name a few. The community has come up with a useful set of metrics to correctly identify predictions for head or tail labels. Comment: 46 pages, 13 figure