The Maximum Common Subgraph Problem: A Parallel and Multi-Engine Approach
The maximum common subgraph of two graphs is the largest possible common subgraph,
i.e., the common subgraph with as many vertices as possible. Although this problem is very challenging,
as it has long been proven NP-hard, its countless practical applications still motivate the search
for exact solutions. This work discusses the possibility of extending an existing, very effective
branch-and-bound procedure on parallel multi-core and many-core architectures. We analyze
a parallel multi-core implementation that exploits a divide-and-conquer approach based on a
thread pool, which preserves the original algorithmic efficiency and minimizes
data structure repetitions. We also extend the original algorithm to parallel many-core GPU
architectures, adopting the CUDA programming framework, and we show how to handle the heavy
workload imbalance and the massive data dependency. Then, we suggest new heuristics to reorder
the adjacency matrix, to deal with “dead-ends”, and to randomize the search with automatic restarts.
These heuristics can achieve significant speed-ups on specific instances, even if they may not be competitive with the original strategy on average. Finally, we propose a portfolio approach, which integrates all the different local search algorithms as component tools; such a portfolio, rather than choosing the best tool for a given instance up-front, makes the decision on-line. The proposed approach drastically limits memory bandwidth constraints and avoids other typical portfolio fragility, as the CPU and GPU versions often show complementary efficiency and run on separate platforms. Experimental results support the claims and motivate further research to better exploit GPUs in embedded task-intensive and multi-engine parallel applications.
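To make the problem statement above concrete, here is a minimal sequential sketch of a branch-and-bound search for a maximum common induced subgraph. It is not the procedure the abstract extends, and it has none of the parallel, GPU, or portfolio machinery; the function name `max_common_subgraph`, the adjacency-set input format, and the simple size bound are assumptions made for illustration.

```python
def max_common_subgraph(g, h):
    """Return a largest mapping {g_vertex: h_vertex} that preserves both
    adjacency and non-adjacency (induced, not necessarily connected).

    g and h map each vertex to the set of its neighbours (undirected).
    """
    g_vertices = sorted(g)
    h_vertices = sorted(h)
    best = {}

    def consistent(mapping, u, w):
        # u may be matched to w only if every already matched pair (a, b)
        # agrees: edge u-a in g exactly when edge w-b in h.
        return all((u in g[a]) == (w in h[b]) for a, b in mapping.items())

    def search(i, mapping, used):
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        # Bound: even matching every remaining vertex cannot beat best.
        remaining = min(len(g_vertices) - i, len(h_vertices) - len(used))
        if len(mapping) + remaining <= len(best):
            return
        u = g_vertices[i]
        for w in h_vertices:
            if w not in used and consistent(mapping, u, w):
                mapping[u] = w
                used.add(w)
                search(i + 1, mapping, used)
                del mapping[u]
                used.remove(w)
        search(i + 1, mapping, used)  # branch where u stays unmatched

    search(0, {}, set())
    return best


# Hypothetical inputs: a 4-vertex path and a triangle with a pendant vertex.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
paw = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
mapping = max_common_subgraph(path, paw)
```

Each top-level branch of `search` explores an independent subtree, which is the kind of unit a thread pool could hand out to workers, although sharing the incumbent `best` between workers would need extra care.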
DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives
We present a new parallel algorithm for probabilistic graphical model
optimization. The algorithm relies on data-parallel primitives (DPPs), which
provide portable performance over hardware architecture. We evaluate results on
CPUs and GPUs for an image segmentation problem. Compared to a serial baseline,
we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare
our performance to a reference, OpenMP-based algorithm, and find speedups of up
to 7X (CPU). Comment: LDAV 2018, October 201
Parallelizing Maximal Clique Enumeration on GPUs
We present a GPU solution for exact maximal clique enumeration (MCE) that
performs a search tree traversal following the Bron-Kerbosch algorithm. Prior
works on parallelizing MCE on GPUs perform a breadth-first traversal of the
tree, which has limited scalability because of the explosion in the number of
tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing
depth-first traversal of independent subtrees in parallel. Since MCE suffers
from high load imbalance and memory capacity requirements, we propose a worker
list for dynamic load balancing, as well as partial induced subgraphs and a
compact representation of excluded vertex sets to regulate memory consumption.
Our evaluation shows that our GPU implementation on a single GPU outperforms
the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x
(up to 16.7x), and scales efficiently to multiple GPUs. Our code has been
open-sourced to enable further research on accelerating MCE.
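For reference, the Bron-Kerbosch search tree mentioned in the abstract above can be sketched sequentially as follows. This is the textbook recursion with pivoting, not the GPU implementation the authors describe; the function name `bron_kerbosch` and the adjacency-set input format are assumptions made for illustration.

```python
def bron_kerbosch(adj):
    """Return all maximal cliques of an undirected graph.

    adj maps each vertex to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already explored vertices.
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex with the most neighbours in p, so that
        # fewer candidates need to be branched on.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical example: a triangle 0-1-2 with a tail 2-3-4.
graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3},
}
cliques = bron_kerbosch(graph)
```

Each call to `expand` made from the top level roots an independent subtree. The depth-first parallelization described in the abstract distributes subtrees of this kind across workers, and the excluded set `x` is the structure it stores compactly.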
Exhaustive Search-based Model for Hybrid Sensor Network
A new model for a cluster in a hybrid sensor network with multiple sub-clusters
is proposed. The model is particularly relevant to early warning systems in
large-scale monitoring systems, for example in a nuclear power plant. It
mainly addresses safety-critical systems that require real-time processing
with high accuracy. The mathematical model is based on an extended
conventional search algorithm with certain interactions among the nearest
neighboring sensors. It is argued that the model could realize a highly
accurate decision support system with fewer parameters. A case of a
one-dimensional interaction function is discussed, and a simple algorithm for the
model is also given. Comment: 6 pages, Proceedings of the International Conference on Intelligent &
Advanced Systems 2012 pp. 557-56
Efficient Strategies for Graph Pattern Mining Algorithms on GPUs
Graph Pattern Mining (GPM) is an important, rapidly evolving, and computationally
demanding area. GPM computation relies on subgraph enumeration, which consists
of extracting subgraphs that match a given property from an input graph.
Graphics Processing Units (GPUs) have been an effective platform to accelerate
applications in many areas. However, the irregularity of subgraph enumeration
makes efficient execution on GPUs challenging because of uncoalesced
memory accesses, divergence, and load imbalance. Unfortunately, these aspects
have not been fully addressed in previous work. Thus, this work proposes novel
strategies to design and implement subgraph enumeration efficiently on GPU. We
support a depth-first-style search (DFS-wide) that maximizes memory
performance while providing enough parallelism to be exploited by the GPU,
along with a warp-centric design that minimizes execution divergence and
improves utilization of the computing capabilities. We also propose a low-cost
load balancing layer to avoid idleness and redistribute work among thread warps
in a GPU. Our strategies have been deployed in a system named DuMato, which
provides a simple programming interface to allow efficient implementation of
GPM algorithms. Our evaluation has shown that DuMato is often an order of
magnitude faster than state-of-the-art GPM systems and can mine larger
subgraphs (up to 12 vertices). Comment: Accepted for publication in the IEEE 34th International Symposium on
Computer Architecture and High Performance Computing (SBAC-PAD'22)
Quadratic Assignment Problem (QAP) on GPU Through a Master-Slave PGA
This document describes the implementation of a Master–Slave Parallel Genetic Algorithm (PGA) on Graphics Processing Units (GPUs) to find optimal or near-optimal solutions to particular instances of the Quadratic Assignment Problem (QAP). The efficiency of the algorithm is tested on a set of problems from the QAPLIB standard library.
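As a rough illustration of the master-slave split described above, the sketch below has a master hand a population of candidate permutations to a pool of workers that only evaluate the QAP cost. A CPU thread pool stands in for the GPU here, and the genetic operators (selection, crossover, mutation) are omitted; the names `qap_cost` and `evaluate_population` and the tiny 3x3 instance are assumptions made for illustration, not taken from the paper or from QAPLIB.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import permutations


def qap_cost(perm, flow, dist):
    """QAP objective: facility i is placed at location perm[i]."""
    n = len(perm)
    return sum(
        flow[i][j] * dist[perm[i]][perm[j]]
        for i in range(n)
        for j in range(n)
    )


def evaluate_population(population, flow, dist, workers=4):
    """Master side: farm out fitness evaluation to worker threads
    (a stand-in for the GPU slaves) and collect costs in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: qap_cost(p, flow, dist), population))


# Hypothetical 3-facility instance with symmetric flows and distances.
flow = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]

# For such a small instance the "population" can be every permutation.
population = list(permutations(range(3)))
costs = evaluate_population(population, flow, dist)
best_cost, best_perm = min(zip(costs, population))
```

In a real genetic algorithm the master would loop over generations, applying selection, crossover, and mutation between calls to `evaluate_population`; only the fitness evaluation is offloaded in the master-slave scheme.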