16 research outputs found

    Dynamic Configuration of CUDA Runtime Variables for CDP-based Divide-and-Conquer Algorithms

    Get PDF
    International audienceCUDA Dynamic Parallelism (CDP) is an extension of the GPGPU programming model proposed to better address irregular applications and recursive patterns of computation. However, processing memory demanding problems by using CDP is not straightforward, because of its particular memory organization. This work presents an algorithm to deal with such an issue. It dynamically calculates and configures the CDP runtime variables and the GPU heap on the basis of an analysis of the partial backtracking tree. The proposed algorithm was implemented for solving permutation combinatorial problems and experimented on two test-cases: N-Queens and the Asymmetric Travelling Salesman Problem. The proposed algorithm allows different CDP-based backtracking from the literature to solve memory demanding problems, adaptively with respect to the number of recursive kernel generations and the presence of dynamic allocations on GPU

    Building and Combining Matching Algorithms

    Get PDF
    International audienceThe concept of matching is ubiquitous in declarative programming and in automated reasoning. For instance, it is a key mechanism to run rule-based programs and to simplify clauses generated by theorem provers. A matching problem can be seen as a particular conjunction of equations where each equation has a ground side. We give an overview of techniques that can be applied to build and combine matching algorithms. First, we survey mutation-based techniques as a way to build a generic matching algorithm for a large class of equational theories. Second, combination techniques are introduced to get combined matching algorithms for disjoint unions of theories. Then we show how these combination algorithms can be extended to handle non-disjoint unions of theories sharing only constructors. These extensions are possible if an appropriate notion of normal form is computable

    A GPU-Based Backtracking Algorithm for Permutation Combinatorial Problems

    No full text
    International audienceThis work presents a GPU-based backtracking algorithm for permutation combinatorial problems based on the Integer-Vector-Matrix (IVM) data structure. IVM is a data structure dedicated to permutation combinatorial optimization problems. In this algorithm, the load balancing is performed without intervention of the CPU, inside a work stealing phase invoked after each node expansion phase. The proposed work stealing approach uses a virtual n-dimensional hypercube topology and a triggering mechanism to reduce the overhead incurred by dynamic load balancing. We have implemented this new algorithm for solving instances of the Asymmetric Travelling Salesman Problem by implicit enumeration, a scenario where the cost of node evaluation is low, compared to the overall search procedure. Experimental results show that the dynamically load balanced IVM-algorithm reaches speed-ups up to 17X over a serial implementation using a bitset-data structure and up to 2X over its GPU counterpart

    11

    Get PDF
    Linguistic primitives for replica-aware coordination offer suitable solutions to the challenging problems of data distribution and locality in large-scale high-performance computing. The data replication mechanisms that had previously been designed to extend Klaim with replicated tuples are now used to experiment with X10, a parallel programming language primarily targeting clusters of multi-core processors linked in a large-scale system via high-performance networks. Our approach aims at allowing the programmer to specify and coordinate the replication of shared data items by taking into account the desired consistency properties. The programmer can hence exploit such flexible mechanisms to adapt data distribution and locality to the needs of the application, in order to improve performance in terms of concurrency and data access. We investigate issues related to replica consistency and provide a performance analysis, which includes scenarios where replica-based specifications and relaxed consistency provide significant performance gains
    corecore