277 research outputs found
An Efficient Implementation of a Subgraph Isomorphism Algorithm for GPUs.
The subgraph isomorphism problem is a computational task that applies to a wide range of today's applications, from the understanding of biological networks to the analysis of social networks. Even though different CPU implementations have been proposed to improve the efficiency of such a graph search algorithm, they have been shown to be bounded by the intrinsically sequential nature of the algorithm. More recently, graphics processing units (GPUs) have become widespread platforms that provide massive parallelism at low cost. Nevertheless, parallelizing any efficient and optimized sequential algorithm for subgraph isomorphism on many-core architectures is a very challenging task. This article presents a parallel implementation of the subgraph isomorphism algorithm for GPUs. It implements different strategies to deal with the space complexity of the graph searching algorithm, the potential workload imbalance, and the thread divergence caused by the non-homogeneity of real graphs. The paper presents the results obtained on several graphs of different sizes and characteristics to assess the efficiency of the proposed approach.
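For orientation, the search that this abstract describes as intrinsically sequential can be sketched as plain backtracking. This is a minimal illustration and not the paper's GPU implementation: the function name `find_embeddings`, the adjacency-dictionary representation, and the choice of non-induced matching (every pattern edge must exist in the target) are all assumptions made for this example.

```python
def find_embeddings(pattern, target):
    """Return every injective mapping of pattern vertices onto target
    vertices that preserves all pattern edges (non-induced matching).

    Both graphs are undirected and given as dict: vertex -> set of neighbours.
    """
    order = list(pattern)          # fixed matching order (an assumption)
    results = []

    def extend(mapping):
        if len(mapping) == len(order):
            results.append(dict(mapping))
            return
        u = order[len(mapping)]    # next pattern vertex to match
        for v in target:
            if v in mapping.values():
                continue           # keep the mapping injective
            # every already-matched neighbour of u must map to a neighbour of v
            if all(mapping[w] in target[v] for w in pattern[u] if w in mapping):
                mapping[u] = v
                extend(mapping)
                del mapping[u]     # backtrack

    extend({})
    return results


# Hypothetical example data: a triangle pattern and a triangle with a tail.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
target = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
embeddings = find_embeddings(triangle, target)
```

Each recursive call depends on the partial mapping built so far, which is one way to see why a direct parallelization has to cope with uneven work per branch.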
cuTS: Scaling Subgraph Isomorphism on Distributed Multi-GPU Systems Using Trie Based Data Structure
Subgraph isomorphism is a pattern-matching task widely used in many domains such as chem-informatics, bioinformatics, databases, and social network analysis. It is computationally expensive and is known to be NP-hard. The massive parallelism in GPUs is well suited for solving subgraph isomorphism. However, current GPU implementations fall far short of the achievable performance. Moreover, the enormous memory requirement of current approaches limits the problem size that can be handled. This work analyzes the fundamental challenges associated with processing subgraph isomorphism on GPUs and develops an efficient GPU implementation. We also develop a GPU-friendly trie-based data structure to drastically reduce the intermediate storage space requirement, enabling large benchmarks to be processed. We also develop the first distributed subgraph isomorphism algorithm for GPUs. Our experimental evaluation demonstrates the efficacy of our approach by comparing the execution time and the number of cases that can be handled against state-of-the-art GPU implementations.
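The storage saving from a trie can be illustrated with a small sketch. It assumes one plausible layout, in which each partial match is stored as a single (parent index, vertex) entry per level so that matches sharing a prefix share storage; the class `MatchTrie` and its methods are hypothetical names for this example and are not taken from cuTS.

```python
class MatchTrie:
    """Store partial matches level by level as (parent_index, vertex) pairs.

    A flat table would store every partial match as a full row; here a match
    of depth d costs one entry and reuses its parent's prefix.
    """

    def __init__(self):
        self.levels = []  # levels[d] is a list of (parent_index, vertex)

    def add(self, depth, parent, vertex):
        """Append a node at `depth` whose prefix is node `parent` at depth-1.
        Use parent=-1 for depth 0. Returns the index of the new node."""
        while len(self.levels) <= depth:
            self.levels.append([])
        self.levels[depth].append((parent, vertex))
        return len(self.levels[depth]) - 1

    def path(self, depth, index):
        """Reconstruct the full partial match by walking parent links."""
        out = []
        while depth >= 0:
            parent, vertex = self.levels[depth][index]
            out.append(vertex)
            depth, index = depth - 1, parent
        return out[::-1]


trie = MatchTrie()
root = trie.add(0, -1, "a")
ab = trie.add(1, root, "b")
ac = trie.add(1, root, "c")
abc = trie.add(2, ab, "c")
```

Because every level is a flat array of fixed-size entries, a layout of this kind maps naturally onto GPU memory, although the actual cuTS layout may differ.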
Activity recognition from videos with parallel hypergraph matching on GPUs
In this paper, we propose a method for activity recognition from videos based
on sparse local features and hypergraph matching. We benefit from special
properties of the temporal domain in the data to derive a sequential and fast
graph matching algorithm for GPUs.
Traditionally, graphs and hypergraphs are frequently used to recognize
complex and often non-rigid patterns in computer vision, either through graph
matching or point-set matching with graphs. Most formulations resort to the
minimization of a difficult discrete energy function mixing geometric or
structural terms with data-attached terms involving appearance features.
Traditional methods solve this minimization problem approximately, for instance
with spectral techniques.
In this work, instead of solving the problem approximately, we compute the
exact solution for the optimal assignment in parallel on GPUs. The
graphical structure is simplified and regularized, which allows us to derive an
efficient recursive minimization algorithm. The algorithm distributes
subproblems over the calculation units of a GPU, which solves them in parallel,
allowing the system to run faster than real-time on medium-end GPUs.
Efficient Strategies for Graph Pattern Mining Algorithms on GPUs
Graph Pattern Mining (GPM) is an important, rapidly evolving, and computationally
demanding area. GPM computation relies on subgraph enumeration, which consists
of extracting subgraphs that match a given property from an input graph.
Graphics Processing Units (GPUs) have been an effective platform to accelerate
applications in many areas. However, the irregularity of subgraph enumeration
makes efficient execution on GPUs challenging because of uncoalesced
memory accesses, divergence, and load imbalance. Unfortunately, these aspects
have not been fully addressed in previous work. Thus, this work proposes novel
strategies to design and implement subgraph enumeration efficiently on GPU. We
support a depth-first-style search (DFS-wide) that maximizes memory
performance while providing enough parallelism to be exploited by the GPU,
along with a warp-centric design that minimizes execution divergence and
improves utilization of the computing capabilities. We also propose a low-cost
load balancing layer to avoid idleness and redistribute work among thread warps
in a GPU. Our strategies have been deployed in a system named DuMato, which
provides a simple programming interface to allow efficient implementation of
GPM algorithms. Our evaluation has shown that DuMato is often an order of
magnitude faster than state-of-the-art GPM systems and can mine larger
subgraphs (up to 12 vertices).
Comment: Accepted for publication at the IEEE 34th International Symposium on
Computer Architecture and High Performance Computing (SBAC-PAD'22)
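Subgraph enumeration, as described in the abstract above, can be sketched sequentially as a depth-first extension of vertex sets. This is a deliberately simple illustration under stated assumptions: it removes duplicates with a set rather than a canonical ordering, it is not DuMato's DFS-wide or warp-centric scheme, and the function name `connected_subgraphs` is hypothetical.

```python
def connected_subgraphs(graph, k):
    """Enumerate the vertex sets of all connected subgraphs with k vertices.

    graph: dict mapping vertex -> set of neighbours (undirected).
    Returns a set of frozensets. Duplicates reached through different
    extension orders are discarded by the set itself.
    """
    found = set()

    def grow(current, frontier):
        if len(current) == k:
            found.add(frozenset(current))
            return
        for v in frontier:
            # add v, then allow any neighbour of the enlarged set as next step
            grow(current | {v}, (frontier | graph[v]) - current - {v})

    for root in graph:
        grow({root}, set(graph[root]))
    return found


# Hypothetical example data: a triangle a-b-c with a tail vertex d on c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
subgraphs = connected_subgraphs(graph, 3)

# A property filter on top of enumeration: keep only the triangles.
triangles = {s for s in subgraphs
             if all(u in graph[v] for u in s for v in s if u != v)}
```

The amount of work under each root varies with the local structure of the graph, which gives a small-scale picture of the irregularity and load imbalance that the abstract mentions.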