Search CORE

538 research outputs found

A GPU-accelerated Branch-and-Bound Algorithm for the Flow-Shop Scheduling Problem

Author: Chakroun Imen
Mohand Mezmaz
Nouredine Melab
Tuyttens Daniel
Publication venue
Publication date: 01/01/2012
Field of study

Branch-and-Bound (B&B) algorithms are time intensive tree-based exploration methods for solving to optimality combinatorial optimization problems. In this paper, we investigate the use of GPU computing as a major complementary way to speed up those methods. The focus is put on the bounding mechanism of B&B algorithms, which is the most time consuming part of their exploration process. We propose a parallel B&B algorithm based on a GPU-accelerated bounding model. The proposed approach concentrate on optimizing data access management to further improve the performance of the bounding mechanism which uses large and intermediate data sets that do not completely fit in GPU memory. Extensive experiments of the contribution have been carried out on well known FSP benchmarks using an Nvidia Tesla C2050 GPU card. We compared the obtained performances to a single and a multithreaded CPU-based execution. Accelerations up to x100 are achieved for large problem instances

arXiv.org e-Print Archive

HAL - Lille 3

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm

Author: Bendjoudi Ahcène
Chakroun Imen
Melab Nouredine
Mezmaz Mohand
Publication venue: 'Wiley'
Publication date: 01/01/2012
Field of study

International audienceIn this paper, we address the design and implementation of GPU-accelerated Branch-and-Bound algorithms (B&B) for solving Flow-shop scheduling optimization problems (FSP). Such applications are CPU-time consuming and highly irregular. On the other hand, GPUs are massively multi-threaded accelerators using the SIMD model at execution. A major issue which arises when executing on GPU a B&B applied to FSP is thread or branch divergence. Such divergence is caused by the lower bound function of FSP which contains many irregular loops and conditional instructions. Our challenge is therefore to revisit the design and implementation of B&B applied to FSP dealing with thread divergence. Extensive experiments of the proposed approach have been carried out on well-known FSP benchmarks using an Nvidia Tesla C2050 GPU card. Compared to a CPU-based execution, accelerations up to ×77.46 are achieved for large problem instances

HAL - Lille 3

INRIA a CCSD electronic archive server

Hal-Diderot

An Adaptative Multi-GPU based Branch-and-Bound. A Case Study: the Flow-Shop Scheduling Problem

Author: Chakroun Imen
Melab Nouredine
Publication venue
Publication date: 21/06/2012
Field of study

Solving exactly Combinatorial Optimization Problems (COPs) using a Branch-and-Bound (B&B) algorithm requires a huge amount of computational resources. Therefore, we recently investigated designing B&B algorithms on top of graphics processing units (GPUs) using a parallel bounding model. The proposed model assumes parallelizing the evaluation of the lower bounds on pools of sub-problems. The results demonstrated that the size of the evaluated pool has a significant impact on the performance of B&B and that it depends strongly on the problem instance being solved. In this paper, we design an adaptative parallel B&B algorithm for solving permutation-based combinatorial optimization problems such as FSP (Flow-shop Scheduling Problem) on GPU accelerators. To do so, we propose a dynamic heuristic for parameter auto-tuning at runtime. Another challenge of this work is to exploit larger degrees of parallelism by using the combined computational power of multiple GPU devices. The approach has been applied to the permutation flow-shop problem. Extensive experiments have been carried out on well-known FSP benchmarks using an Nvidia Tesla S1070 Computing System equipped with two Tesla T10 GPUs. Compared to a CPU-based execution, accelerations up to 105 are achieved for large problem instances.Comment: 14th IEEE International Conference on High Performance Computing and Communications, HPCC 2012 (2012

arXiv.org e-Print Archive

HAL - Lille 3

INRIA a CCSD electronic archive server

Many-core Branch-and-Bound for GPU accelerators and MIC coprocessors

Author: B Lageweg
E. L. Lawler
I. Chakroun
I. Chakroun
N. Melab
Raphaël Bolze
S Johnson
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 02/06/2019
Field of study

International audienceCoprocessors are increasingly becoming key building blocks of High Performance Computing platforms. These many-core energy-efficient devices boost the performance of traditional processors. On the other hand, Branch-and-Bound (B&B) algorithms are tree-based exact methods for solving to optimality combinatorial optimization problems (COPs). Solving large COPs results in the generation of a very large pool of subproblems and the evaluation of their associated lower bounds. Generating and evaluating those subproblems on coprocessors raises several issues including processor-coprocessor data transfer optimization, vectorization, thread divergence, and so on. In this paper, we investigate the offload-based parallel design and implementation of B&B algorithms for coprocessors addressing these issues. Two major many-core architectures are considered and compared: Nvidia GPU and Intel MIC. The proposed approaches have been experimented using the Flow-Shop scheduling problem and two hardware configurations equivalent in terms of energy consumption: Nvidia Tesla K40 and Intel Xeon Phi 5110P. The reported results show that the GPU-accelerated approach outperforms the MIC offload-based one even in its vectorized version. Moreover, vectorization improves the efficiency of the MIC offload-based approach with a factor of two

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Parallel Branch-and-Bound in Multi-core Multi-CPU Multi-GPU Heterogeneous Environments

Author: Derbel Bilel
Vu Trong-Tuan
Publication venue: 'Elsevier BV'
Publication date: 20/03/2016
Field of study

International audienceWe investigate the design of parallel B&B in large scale heterogeneous compute environments where processing units can be composed of a mixture of multiple shared memory cores, multiple distributed CPUs and multiple GPUs devices. We describe two approaches addressing the critical issue of how to map B&B workload with the different levels of parallelism exposed by the target compute platform. We also contribute a throughout large scale experimental study which allows us to derive a comprehensive and fair analysis of the proposed approaches under different system configurations using up to 16 GPUs and up to 512 CPU-cores. Our results shed more light on the main challenges one has to face when tackling B&B algorithms while describing efficient techniques to address them. In particular, we are able to obtain linear speed-ups at moderate scales where adaptive load balancing among the heterogeneous compute resources is shown to have a significant impact on performance. At the largest scales, intra-node parallelism and hybrid decentralized load balancing is shown to have a crucial importance in order to alleviate locking issues among shared memory threads and to scale the distributed resources while optimizing communication costs and minimizing idle time

HAL - Lille 3

INRIA a CCSD electronic archive server

Solving large permutation flow-shop scheduling problems on GPU-accelerated supercomputers

Author: Gmys Jan
Publication venue: HAL CCSD
Publication date: 17/12/2020
Field of study

Makespan minimization in permutation flow-shop scheduling is a well-known hard combinatorial optimization problem. Among the 120 standard benchmark instances proposed by E. Taillard in 1993, 23 have remained unsolved for almost three decades. In this paper, we present our attempts to solve these instances to optimality using parallel Branch-and-Bound tree search on the GPU-accelerated Jean Zay supercomputer. We report the exact solution of 11 previously unsolved problem instances and improved upper bounds for 8 instances. The solution of these problems requires both algorithmic improvements and leveraging the computing power of peta-scale high-performance computing platforms. The challenge consists in efficiently performing parallel depth-first traversal of a highly irregular, fine-grained search tree on distributed systems composed of hundreds of massively parallel accelerator devices and multi-core processors. We present and discuss the design and implementation of our permutation-based B&B and experimentally evaluate its parallel performance on up to 384 V100 GPUs (2 million CUDA cores) and 3840 CPU cores. The optimality proof for the largest solved instance requires about 64 CPU-years of computation-using 256 GPUs and over 4 million parallel search agents, the traversal of the search tree is completed in 13 hours, exploring 339 Tera-nodes

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Adaptive Dynamic Load Balancing in Heterogenous Multiple GPUs-CPUs Distributed Setting: Case Study of B&B Tree Search

Author: Derbel Bilel
Melab Nouredine
Vu Trong-Tuan
Publication venue: Lecture Notes in Computer Science
Publication date: 07/01/2013
Field of study

International audienceThe emergence of new hybrid and heterogenous multi-GPU multi-CPU large scale platforms offers new opportunities and pauses new challenges when solving difficult optimization problems. This paper targets irregular tree search algorithms in which workload is unpredictable. We propose an adaptive distributed approach allowing to distribute the load dynamically at runtime while taking into account the computing abilities of either GPUs or CPUs. Using Branch-and-Bound and Flowshop as a case study, we deployed our approach using up to 20 GPUs jointly to up to 128 CPUs. Through extensive experiments in different system configurations, we report near optimal speedups, thus providing new insights into how to take full advantage of both GPUs and CPUs power in modern computing platforms

HAL - Lille 3

INRIA a CCSD electronic archive server

Recent Advances on GPU Computing in Operations Research

Author: Didier El Baz
Vincent Boyer
Publication venue
Publication date: 24/04/2020
Field of study

In the last decade, Graphics Processing Units (GPUs) have gained an increasing popularity as accelerators for High Performance Computing (HPC) applications. Recent GPUs are not only powerful graphics engines but also highly threaded parallel computing processors that can achieve sustainable speedup as compared with CPUs. In this context, researchers try to exploit the capability of this architecture to solve difficult problems in many domains in science and engineering. In this article, we present recent advances on GPU Computing in Operations Research. We focus in particular on Integer Programming and Linear Programming

CiteSeerX

Recent Advances on GPU Computing in Operations Research

Author: Didier El Baz
Vincent Boyer
Publication venue
Publication date: 24/04/2020
Field of study

Abstract-In the last decade, Graphics Processing Units (GPUs) have gained an increasing popularity as accelerators for High Performance Computing (HPC) applications. Recent GPUs are not only powerful graphics engines but also highly threaded parallel computing processors that can achieve sustainable speedup as compared with CPUs. In this context, researchers try to exploit the capability of this architecture to solve difficult problems in many domains in science and engineering. In this article, we present recent advances on GPU Computing in Operations Research. We focus in particular on Integer Programming and Linear Programming

CiteSeerX