215 research outputs found

    An Incremental Parallel PGAS-based Tree Search Algorithm

    In this work, we show that the Chapel high-productivity language is suitable for the design and implementation of all aspects involved in the conception of parallel tree search algorithms for solving combinatorial problems. Initially, it is possible to hand-optimize the data structures involved in the search process in a way equivalent to C. As a consequence, the single-threaded search in Chapel is on average only 7% slower than its counterpart written in C. Whereas programming a multicore tree search in Chapel is equivalent to C-OpenMP in terms of performance and programmability, its productivity-aware features for distributed programming stand out. It is possible to incrementally conceive a distributed tree search algorithm starting from its multicore counterpart by adding a few lines of code. The distributed implementation performs load balancing among different computer nodes and also exploits all CPU cores of the system. Chapel presents an interesting trade-off between programmability and performance despite the high level of its features. The distributed tree search in Chapel is on average 16% slower and reaches up to 80% of the scalability achieved by its C-MPI+OpenMP counterpart.

    A GPU-accelerated Branch-and-Bound Algorithm for the Flow-Shop Scheduling Problem

    Branch-and-Bound (B&B) algorithms are time-intensive tree-based exploration methods for solving combinatorial optimization problems to optimality. In this paper, we investigate the use of GPU computing as a major complementary way to speed up those methods. The focus is put on the bounding mechanism of B&B algorithms, which is the most time-consuming part of their exploration process. We propose a parallel B&B algorithm based on a GPU-accelerated bounding model. The proposed approach concentrates on optimizing data access management to further improve the performance of the bounding mechanism, which uses large and intermediate data sets that do not completely fit in GPU memory. Extensive experiments have been carried out on well-known FSP benchmarks using an Nvidia Tesla C2050 GPU card. We compared the obtained performance to a single-threaded and a multithreaded CPU-based execution. Speedups of up to 100x are achieved for large problem instances.

    An Adaptative Multi-GPU based Branch-and-Bound. A Case Study: the Flow-Shop Scheduling Problem

    Solving Combinatorial Optimization Problems (COPs) exactly using a Branch-and-Bound (B&B) algorithm requires a huge amount of computational resources. Therefore, we recently investigated designing B&B algorithms on top of graphics processing units (GPUs) using a parallel bounding model. The proposed model parallelizes the evaluation of the lower bounds on pools of sub-problems. The results demonstrated that the size of the evaluated pool has a significant impact on the performance of B&B and that it depends strongly on the problem instance being solved. In this paper, we design an adaptive parallel B&B algorithm for solving permutation-based combinatorial optimization problems such as FSP (Flow-shop Scheduling Problem) on GPU accelerators. To do so, we propose a dynamic heuristic for parameter auto-tuning at runtime. Another challenge of this work is to exploit larger degrees of parallelism by using the combined computational power of multiple GPU devices. The approach has been applied to the permutation flow-shop problem. Extensive experiments have been carried out on well-known FSP benchmarks using an Nvidia Tesla S1070 Computing System equipped with two Tesla T10 GPUs. Compared to a CPU-based execution, speedups of up to 105x are achieved for large problem instances. Comment: 14th IEEE International Conference on High Performance Computing and Communications, HPCC 2012 (2012)

    Adaptive Dynamic Load Balancing in Heterogenous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search

    The emergence of new hybrid and heterogeneous multi-GPU multi-CPU large-scale platforms offers new opportunities and poses new challenges when solving difficult optimization problems. This paper targets irregular tree search algorithms in which the workload is unpredictable. We propose an adaptive distributed approach that distributes the load dynamically at runtime while taking into account the computing abilities of both GPUs and CPUs. Using Branch-and-Bound and Flowshop as a case study, we deployed our approach using up to 20 GPUs jointly with up to 128 CPUs. Through extensive experiments in different system configurations, we report near-optimal speedups, thus providing new insights into how to take full advantage of both GPU and CPU power in modern computing platforms.

    Reducing Thread Divergence in GPU-based B&B Applied to the Flow-shop problem

    In this paper, we propose a pioneering work on designing and programming B&B algorithms on GPU. To the best of our knowledge, no contribution has yet been proposed to address this challenge. We focus on the parallel evaluation of the bounds for the Flow-shop scheduling problem. To deal with thread divergence caused by the bounding operation, we investigate two software-based approaches called thread data reordering and branch refactoring. Experiments show that parallel evaluation of bounds speeds up execution by up to 54.5 times compared to a CPU version.

    B&B@Grid : une approche efficace pour la gridification d'un algorithme Branch and Bound

    Solving large combinatorial optimization problems exactly, such as scheduling problems, is a real challenge for computational grids. The solution algorithms must be rethought to take into account the characteristics of such environments, notably their large scale, the heterogeneity and dynamic availability of their resources, and their multiple administrative domains. In this article, we propose a new approach, called B&B@Grid, for porting exact Branch-and-Bound methods to computational grids. The approach encodes work units (subproblems) as intervals, which minimizes the communication cost induced by load balancing, fault tolerance, and termination detection. This approach, which far outperforms the best approaches known in the literature in terms of communication and checkpointing cost, made it possible to solve to optimality, on the national Grid'5000 grid, a standard instance of the Flow-Shop problem that had remained unsolved for some fifteen years. The Flow-Shop is one of the most studied scheduling problems.
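
    The idea of encoding subproblems as intervals can be illustrated with a minimal sketch. It assumes, purely for illustration, that the search tree enumerates permutations of n jobs in lexicographic order and that leaves are numbered 0 to n!-1; the function names are hypothetical and are not taken from B&B@Grid.

```python
from math import factorial


def subtree_interval(prefix, n):
    """Return the half-open interval [begin, end) of leaf numbers covered by
    the subtree whose root fixes the first len(prefix) jobs of a permutation
    of n jobs (assumed lexicographic numbering, not the paper's exact scheme)."""
    remaining = list(range(n))
    begin = 0
    for depth, job in enumerate(prefix):
        idx = remaining.index(job)  # rank of this job among the unplaced jobs
        begin += idx * factorial(n - depth - 1)
        remaining.pop(idx)
    end = begin + factorial(n - len(prefix))
    return begin, end


def split_interval(begin, end):
    """Split a work unit in two halves, e.g. to hand one half to an idle worker."""
    mid = (begin + end) // 2
    return (begin, mid), (mid, end)


if __name__ == "__main__":
    print(subtree_interval([1], 4))
    print(split_interval(*subtree_interval([1], 4)))
```

    Under these assumptions a work unit is just two integers, so exchanging or checkpointing it costs the same regardless of how many tree nodes it covers.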

    A Multi-start Local Search Scheduler for an Energy-aware Cloud Manager

    The field of cloud computing uses different management techniques for data center virtualization, such as OpenNebula. However, the computers composing the cloud infrastructure use a significant and growing portion of the world's energy, specifically when dealing with virtualization for high performance computing (HPC). Therefore, energy-aware computing is crucial for large-scale systems that consume a considerable amount of energy. In this paper, we present work that aims to deal with energy consumption within a realistic cloud infrastructure using OpenNebula as a software management solution. Our scheduler is based on a multi-start local search heuristic that helps to find the best schedule by dispatching arriving virtual machines (VMs) according to the minimum energy consumption.
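
    A generic multi-start local search for VM placement might look like the sketch below. The linear power model (idle power plus a load-proportional term per active host), the single-VM move neighborhood, and all names and values are assumptions made for illustration; they are not the scheduler described in the paper and do not use any OpenNebula API.

```python
import random


def energy(assign, vm_load, idle, per_unit):
    """Hypothetical power model: each host that runs at least one VM costs
    its idle power plus a term proportional to its total load."""
    total = 0.0
    for host in set(assign):
        load = sum(l for l, a in zip(vm_load, assign) if a == host)
        total += idle[host] + per_unit[host] * load
    return total


def feasible(assign, vm_load, capacity):
    """Check that no host is loaded beyond its capacity."""
    used = [0] * len(capacity)
    for load, host in zip(vm_load, assign):
        used[host] += load
    return all(u <= c for u, c in zip(used, capacity))


def local_search(assign, vm_load, capacity, idle, per_unit):
    """Move one VM at a time to another host while that strictly lowers energy."""
    assign = list(assign)
    best = energy(assign, vm_load, idle, per_unit)
    improved = True
    while improved:
        improved = False
        for vm in range(len(assign)):
            for host in range(len(capacity)):
                if host == assign[vm]:
                    continue
                cand = assign[:vm] + [host] + assign[vm + 1:]
                if not feasible(cand, vm_load, capacity):
                    continue
                e = energy(cand, vm_load, idle, per_unit)
                if e < best:
                    assign, best, improved = cand, e, True
    return assign, best


def multi_start(vm_load, capacity, idle, per_unit, starts=20, seed=0):
    """Run local search from several random feasible placements, keep the best."""
    rng = random.Random(seed)
    best_assign, best_e = None, float("inf")
    for _ in range(starts):
        start = [rng.randrange(len(capacity)) for _ in vm_load]
        if not feasible(start, vm_load, capacity):
            continue  # skip infeasible random starts
        assign, e = local_search(start, vm_load, capacity, idle, per_unit)
        if e < best_e:
            best_assign, best_e = assign, e
    return best_assign, best_e
```

    Restarting from several random placements reduces the chance that the search stops in a poor local optimum of the energy function.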

    A Pareto-based GA for Scheduling HPC Applications on Distributed Cloud Infrastructures

    Reducing energy consumption is an increasingly important issue in cloud computing, more specifically when dealing with High Performance Computing (HPC). Minimizing energy consumption can significantly reduce energy bills and thus increase the provider's profit. In addition, reducing energy use decreases greenhouse gas emissions. Therefore, much research is being carried out to develop new methods that consume less energy. In this paper, we present a multi-objective genetic algorithm (MO-GA) that optimizes the energy consumption, CO2 emissions and the generated profit of a geographically distributed cloud computing infrastructure. We also propose a greedy heuristic that aims to maximize the number of scheduled applications in order to compare it with the MO-GA. The two approaches were evaluated using realistic workload traces from Feitelson's Parallel Workload Archive (PWA). The results show that MO-GA outperforms the greedy heuristic by a significant margin in terms of energy consumption and CO2 emissions. In addition, MO-GA is also shown to be slightly better in terms of profit while scheduling more applications.
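
    The Pareto-based comparison underlying a multi-objective GA can be sketched as follows. This is a generic dominance filter, not the paper's MO-GA; it assumes all objectives are minimized, so profit is represented by its negative, and the sample values are made up for illustration.

```python
def dominates(a, b):
    """True if solution a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the solutions that no other solution dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical schedules as (energy, CO2, -profit) tuples.
    schedules = [(10, 5, -3), (8, 6, -3), (12, 7, -2), (8, 6, -4)]
    print(pareto_front(schedules))
```

    A Pareto-based GA would typically use such a filter (or a ranking built on it) during selection, so that trade-offs between energy, emissions and profit are preserved rather than collapsed into a single score.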