
    The Maximum Common Subgraph Problem: A Parallel and Multi-Engine Approach

    The maximum common subgraph of two graphs is the largest possible common subgraph, i.e., the common subgraph with as many vertices as possible. Although the problem is very challenging and has long been proven NP-hard, its countless practical applications still motivate the search for exact solutions. This work discusses how to extend an existing, very effective branch-and-bound procedure to parallel multi-core and many-core architectures. We analyze a parallel multi-core implementation that exploits a divide-and-conquer approach based on a thread pool, which does not deteriorate the original algorithmic efficiency and minimizes data-structure replication. We also extend the original algorithm to many-core GPU architectures using the CUDA programming framework, and we show how to handle the heavy workload imbalance and the massive data dependencies. We then suggest new heuristics to reorder the adjacency matrix, to deal with “dead-ends”, and to randomize the search with automatic restarts. These heuristics can achieve significant speed-ups on specific instances, even if they may not be competitive with the original strategy on average. Finally, we propose a portfolio approach that integrates all the different local search algorithms as component tools; rather than choosing the best tool for a given instance up-front, the portfolio takes this decision on-line. The proposed approach drastically limits memory bandwidth constraints and avoids typical portfolio fragilities, as the CPU and GPU versions often show complementary efficiency and run on separate platforms. Experimental results support the claims and motivate further research to better exploit GPUs in embedded, task-intensive, and multi-engine parallel applications.
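
    To make the multi-core scheme concrete, the sketch below shows a divide-and-conquer branch-and-bound skeleton in the same spirit: top-level subproblems run as independent tasks (std::async standing in for the paper's thread pool) and share only an atomic incumbent bound. It is an illustration on a stand-in problem (maximum independent set on a tiny graph), not the authors' maximum-common-subgraph code.

```cpp
// Minimal sketch of the divide-and-conquer idea: top-level branch-and-bound
// subproblems run as independent tasks and share only an atomic incumbent.
// The search itself is a stand-in (maximum independent set on a tiny graph),
// not the paper's maximum-common-subgraph procedure.
#include <atomic>
#include <bit>
#include <cstdint>
#include <future>
#include <iostream>
#include <vector>

using Mask = std::uint32_t;                // vertex subset as a bitmask (graphs with <= 32 vertices)

struct Graph { int n; std::vector<Mask> adj; };

std::atomic<int> incumbent{0};             // best solution size found so far, shared by all workers

// Classic recursive branch and bound: 'chosen' vertices so far, 'cand' still selectable.
void solve(const Graph& g, Mask cand, int chosen) {
    if (chosen + std::popcount(cand) <= incumbent.load()) return;   // bound: cannot beat incumbent
    if (cand == 0) {                                                // leaf: try to improve incumbent
        int cur = incumbent.load();
        while (chosen > cur && !incumbent.compare_exchange_weak(cur, chosen)) {}
        return;
    }
    int v = std::countr_zero(cand);                                 // branch on the lowest candidate
    solve(g, cand & ~(Mask(1) << v) & ~g.adj[v], chosen + 1);       // include v (drop its neighbours)
    solve(g, cand & ~(Mask(1) << v), chosen);                       // exclude v
}

int main() {
    // Tiny example: the 6-cycle 0-1-2-3-4-5-0 (largest independent set has 3 vertices).
    Graph g{6, {0b100010, 0b000101, 0b001010, 0b010100, 0b101000, 0b010001}};
    Mask all = (Mask(1) << g.n) - 1;

    // Divide and conquer: subproblem v fixes v as the smallest-index chosen vertex,
    // so each task explores a disjoint part of the search tree.
    std::vector<std::future<void>> tasks;
    for (int v = 0; v < g.n; ++v) {
        Mask cand = all & ~((Mask(1) << (v + 1)) - 1) & ~g.adj[v];
        tasks.push_back(std::async(std::launch::async, solve, std::cref(g), cand, 1));
    }
    for (auto& t : tasks) t.get();
    std::cout << "largest independent set found: " << incumbent.load() << "\n";
}
```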

    DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives

    We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance across hardware architectures. We evaluate results on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance to a reference OpenMP-based algorithm and find speedups of up to 7X (CPU). (Comment: LDAV 2018, October 2018)
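
    As a rough illustration of the data-parallel-primitive style (independent of the paper's actual DPP library), the same map/reduce structure can be written with C++17 parallel algorithms, where each step is a primitive the runtime may schedule across many cores; the threshold labelling rule below is made up for the example.

```cpp
// Toy illustration of the data-parallel-primitive style using C++17 parallel
// algorithms: the whole computation is a composition of "map" (transform) and
// "reduce" primitives, with no explicit threads. This is not the paper's
// DPP-PMRF code; the per-pixel labelling rule is invented for the example.
#include <algorithm>
#include <cstdint>
#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<std::uint8_t> image(1 << 20);          // fake 1-megapixel grayscale image
    for (std::size_t i = 0; i < image.size(); ++i)
        image[i] = static_cast<std::uint8_t>((i * 37) % 256);

    // Map primitive: independent per-pixel labelling (here a simple threshold).
    std::vector<int> label(image.size());
    std::transform(std::execution::par_unseq, image.begin(), image.end(), label.begin(),
                   [](std::uint8_t p) { return p > 127 ? 1 : 0; });

    // Reduce primitive: aggregate the per-pixel results into one value.
    long foreground = std::reduce(std::execution::par_unseq, label.begin(), label.end(), 0L);
    std::cout << "foreground pixels: " << foreground << "\n";
}
```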

    Parallelizing Maximal Clique Enumeration on GPUs

    We present a GPU solution for exact maximal clique enumeration (MCE) that performs a search tree traversal following the Bron-Kerbosch algorithm. Prior works on parallelizing MCE on GPUs perform a breadth-first traversal of the tree, which has limited scalability because of the explosion in the number of tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing depth-first traversal of independent subtrees in parallel. Since MCE suffers from high load imbalance and memory capacity requirements, we propose a worker list for dynamic load balancing, as well as partial induced subgraphs and a compact representation of excluded vertex sets to regulate memory consumption. Our evaluation shows that our GPU implementation on a single GPU outperforms the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x (up to 16.7x), and scales efficiently to multiple GPUs. Our code has been open-sourced to enable further research on accelerating MCE.
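
    For context, the search tree referred to above is the one produced by the Bron-Kerbosch recursion; a compact serial reference (with pivoting, using plain std::set for readability rather than the bitsets and partial induced subgraphs of the GPU version) looks roughly like this:

```cpp
// Serial reference for maximal clique enumeration: Bron-Kerbosch with pivoting.
// This is the textbook recursion whose search tree the GPU work above traverses
// depth-first; it is only a reference point, not the paper's implementation.
#include <algorithm>
#include <iostream>
#include <set>
#include <vector>

using Set = std::set<int>;
std::vector<Set> adj;                          // adjacency sets of the input graph

void bron_kerbosch(Set R, Set P, Set X) {
    if (P.empty() && X.empty()) {              // R is a maximal clique: report it
        for (int v : R) std::cout << v << ' ';
        std::cout << '\n';
        return;
    }
    // Pivot: a vertex of P or X with the most neighbours in P, to prune branches.
    int pivot = -1, best = -1;
    for (const Set* s : {&P, &X})
        for (int u : *s) {
            int d = static_cast<int>(std::count_if(P.begin(), P.end(),
                        [&](int w) { return adj[u].count(w) > 0; }));
            if (d > best) { best = d; pivot = u; }
        }
    Set branch;                                // branch only on vertices of P not adjacent to the pivot
    for (int v : P) if (!adj[pivot].count(v)) branch.insert(v);
    for (int v : branch) {
        Set R2 = R; R2.insert(v);
        Set P2, X2;
        for (int w : P) if (adj[v].count(w)) P2.insert(w);
        for (int w : X) if (adj[v].count(w)) X2.insert(w);
        bron_kerbosch(R2, P2, X2);             // explore cliques containing R plus v
        P.erase(v);                            // then exclude v from further branches
        X.insert(v);
    }
}

int main() {
    // Tiny example: triangle 0-1-2 plus a pendant edge 2-3.
    adj = {{1, 2}, {0, 2}, {0, 1, 3}, {2}};
    Set P;
    for (int v = 0; v < static_cast<int>(adj.size()); ++v) P.insert(v);
    bron_kerbosch({}, P, {});                  // prints each maximal clique on one line
}
```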

    Exhaustive Search-based Model for Hybrid Sensor Network

    A new model for a cluster of hybrid sensor networks with multiple sub-clusters is proposed. The model is particularly relevant to early warning systems in large-scale monitoring systems such as those of a nuclear power plant. It mainly addresses safety-critical systems that require real-time processing with high accuracy. The mathematical model extends a conventional search algorithm with certain interactions among the nearest neighboring sensors. It is argued that the model can realize a highly accurate decision support system with fewer parameters. A case with a one-dimensional interaction function is discussed, and a simple algorithm for the model is also given. (Comment: 6 pages, Proceedings of the International Conference on Intelligent & Advanced Systems 2012, pp. 557-56)

    Efficient Strategies for Graph Pattern Mining Algorithms on GPUs

    Graph Pattern Mining (GPM) is an important, rapidly evolving, and computationally demanding area. GPM relies on subgraph enumeration, which consists of extracting from an input graph the subgraphs that match a given property. Graphics Processing Units (GPUs) have been an effective platform for accelerating applications in many areas. However, the irregularity of subgraph enumeration makes efficient GPU execution challenging due to uncoalesced memory accesses, divergence, and load imbalance, aspects that have not been fully addressed in previous work. Thus, this work proposes novel strategies to design and implement subgraph enumeration efficiently on GPUs. We support a depth-first traversal style (DFS-wide) that maximizes memory performance while providing enough parallelism to be exploited by the GPU, along with a warp-centric design that minimizes execution divergence and improves utilization of the computing capabilities. We also propose a low-cost load balancing layer that avoids idleness and redistributes work among thread warps on a GPU. Our strategies have been deployed in a system named DuMato, which provides a simple programming interface to allow efficient implementation of GPM algorithms. Our evaluation shows that DuMato is often an order of magnitude faster than state-of-the-art GPM systems and can mine larger subgraphs (up to 12 vertices). (Comment: Accepted for publication at the IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'22))
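
    As a point of reference for the traversal being parallelized, the core subgraph-enumeration kernel can be written as the classic depth-first ESU recursion sketched below (serial and illustrative only; DuMato's DFS-wide, warp-centric, and load-balancing machinery described in the abstract is not shown):

```cpp
// Serial illustration of the core GPM kernel: depth-first enumeration of all
// connected induced subgraphs with k vertices (the classic ESU recursion).
// DuMato parallelizes this kind of traversal on the GPU; none of its DFS-wide,
// warp-centric, or load-balancing machinery appears in this toy version.
#include <algorithm>
#include <iostream>
#include <vector>

const int k = 3;                                    // subgraph (pattern) size
std::vector<std::vector<int>> adj;                  // adjacency lists
long long n_found = 0;

bool adjacent_to_sub(int u, const std::vector<int>& sub) {
    for (int s : sub)
        if (std::find(adj[s].begin(), adj[s].end(), u) != adj[s].end()) return true;
    return false;
}

void extend(std::vector<int>& sub, std::vector<int> ext, int root) {
    if (static_cast<int>(sub.size()) == k) { ++n_found; return; }
    while (!ext.empty()) {
        int w = ext.back(); ext.pop_back();
        std::vector<int> ext2 = ext;                // extension set for the child call
        for (int u : adj[w])                        // add w's exclusive neighbours (ESU rule)
            if (u > root &&
                std::find(sub.begin(), sub.end(), u) == sub.end() &&
                !adjacent_to_sub(u, sub) &&
                std::find(ext2.begin(), ext2.end(), u) == ext2.end())
                ext2.push_back(u);
        sub.push_back(w);
        extend(sub, ext2, root);
        sub.pop_back();
    }
}

int main() {
    // Tiny example: a 4-cycle 0-1-2-3-0 with one chord 0-2.
    adj = {{1, 3, 2}, {0, 2}, {1, 3, 0}, {2, 0}};
    for (int v = 0; v < static_cast<int>(adj.size()); ++v) {
        std::vector<int> sub{v}, ext;
        for (int u : adj[v]) if (u > v) ext.push_back(u);
        extend(sub, ext, v);                        // enumerate subgraphs rooted at v
    }
    std::cout << "connected " << k << "-vertex subgraphs: " << n_found << "\n";   // prints 4 here
}
```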

    The Quadratic Assignment Problem (QAP) on GPU via a Master-Slave Parallel Genetic Algorithm (PGA)

    This document describes the implementation of a Master-Slave Parallel Genetic Algorithm (PGA) on Graphics Processing Units (GPUs) to find optimal or near-optimal solutions to particular instances of the Quadratic Assignment Problem (QAP). The efficiency of the algorithm is tested on a set of problems from the QAPLIB standard library.
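
    A minimal sketch of the master-slave organization is shown below: the master thread runs selection and mutation while the costly QAP fitness evaluations are farmed out to worker tasks each generation. The instance is random, the operators are deliberately simple (binary tournament plus swap mutation, no crossover), and nothing here reproduces the paper's GPU implementation or QAPLIB results.

```cpp
// Minimal sketch of a master-slave parallel GA for the QAP: the master thread
// evolves the population; fitness evaluations run concurrently in worker tasks.
// Random toy instance and deliberately simple operators -- an illustration of
// the parallel structure only, not the paper's GPU algorithm.
#include <algorithm>
#include <future>
#include <iostream>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

using Perm = std::vector<int>;
constexpr int N = 8;                                       // facilities / locations
std::vector<std::vector<int>> flow(N, std::vector<int>(N)), dist(N, std::vector<int>(N));

long qap_cost(const Perm& p) {                             // QAP objective: sum of flow[i][j] * dist[p[i]][p[j]]
    long c = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) c += static_cast<long>(flow[i][j]) * dist[p[i]][p[j]];
    return c;
}

int main() {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> val(0, 9), loc(0, N - 1);
    for (auto& row : flow) for (auto& x : row) x = val(rng);
    for (auto& row : dist) for (auto& x : row) x = val(rng);

    const int pop_size = 32, generations = 100;
    std::uniform_int_distribution<int> pick(0, pop_size - 1);
    std::vector<Perm> pop(pop_size, Perm(N));
    for (auto& p : pop) { std::iota(p.begin(), p.end(), 0); std::shuffle(p.begin(), p.end(), rng); }

    long best = std::numeric_limits<long>::max();
    for (int gen = 0; gen < generations; ++gen) {
        // Slaves: evaluate every individual's fitness concurrently.
        std::vector<std::future<long>> fit;
        for (const auto& p : pop)
            fit.push_back(std::async(std::launch::async, qap_cost, std::cref(p)));
        std::vector<long> cost(pop_size);
        for (int i = 0; i < pop_size; ++i) { cost[i] = fit[i].get(); best = std::min(best, cost[i]); }

        // Master: binary tournament selection plus swap mutation builds the next generation.
        std::vector<Perm> next;
        next.reserve(pop_size);
        for (int i = 0; i < pop_size; ++i) {
            int a = pick(rng), b = pick(rng);
            Perm child = cost[a] < cost[b] ? pop[a] : pop[b];
            std::swap(child[loc(rng)], child[loc(rng)]);
            next.push_back(std::move(child));
        }
        pop = std::move(next);
    }
    std::cout << "best QAP cost found: " << best << "\n";
}
```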