    A scheduling theory framework for GPU tasks efficient execution

    Concurrent execution of tasks on GPUs can reduce the computation time of a workload by overlapping data transfer and execution commands. However, it is difficult to implement an efficient run-time scheduler that minimizes the workload makespan, as many execution orderings must be evaluated. In this paper, we employ scheduling theory to build a model that takes into account the device capabilities, workload characteristics, constraints and objective functions. In our model, GPU task scheduling is reformulated as a flow shop scheduling problem, which allows us to apply and compare well-known methods already developed in the operations research field. In addition, we develop a new heuristic, specifically focused on executing GPU commands, that achieves better scheduling results than previous techniques. Finally, a comprehensive evaluation, showing the suitability and robustness of this new approach, is conducted on three different NVIDIA architectures (Kepler, Maxwell and Pascal). Funding: Project TIN2016-0920R, Universidad de Málaga (Campus de Excelencia Internacional Andalucía Tech) and the NVIDIA Corporation donation program.
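
    As a rough illustration of the flow shop reformulation, the sketch below assumes that each GPU task passes through three stages in a fixed order (host-to-device copy, kernel execution, device-to-host copy) and that each stage handles one task at a time. The three-stage model, the function name `makespan` and the durations are assumptions made for this example, not the model or the heuristic from the paper.

```python
from itertools import permutations


def makespan(order, times):
    """Makespan of a permutation flow shop schedule.

    times[j][s] is the duration of job j on stage s. Every job visits the
    stages in the same order and each stage processes one job at a time.
    """
    n_stages = len(times[0])
    free = [0] * n_stages  # free[s]: time at which stage s becomes available
    for j in order:
        prev = 0  # completion time of job j on the previous stage
        for s in range(n_stages):
            start = max(free[s], prev)
            prev = start + times[j][s]
            free[s] = prev
    return free[-1]


# Hypothetical durations (ms): (host-to-device copy, kernel, device-to-host copy)
tasks = [(4, 2, 1), (1, 5, 2), (3, 3, 3)]

# Exhaustive search is only feasible for a handful of tasks (n! orderings).
best = min(permutations(range(len(tasks))), key=lambda p: makespan(p, tasks))
print(best, makespan(best, tasks))
```

    With these made-up durations the submission order changes the makespan noticeably, which is why a run-time scheduler needs a heuristic rather than enumeration once the number of tasks grows.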

    A simulation-based approach for solving the flowshop problem

    A simulation-based algorithm for the Permutation Flowshop Sequencing Problem (PFSP) is presented. The algorithm uses Monte Carlo Simulation and a discrete version of the triangular distribution to incorporate a randomness criterion in the classical Nawaz, Enscore, and Ham (NEH) heuristic and starts an iterative process in order to obtain a set of alternative solutions to the PFSP. Thus, a random but biased lo We can then consider several properties per solution other than the makespan, such as balanced idle times among machines, number of completed jobs at a given target time, etc. This allows the decision-maker to consider multiple solution characteristics apart from those defined by the aprioristic objective function. Therefore, our methodology provides flexibility during the sequence selection process, which may help to improve the scheduling process. Several tests have been performed to discuss the effectiveness of this approach. The results obtained so far are promising enough to encourage further developments and improvements on the algorithm and its applications in real-life scenarios. In particular, Multi-Agent Simulation is proposed as a promising technique to be explored in future work.
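
    A minimal sketch of how a triangular bias can be added to NEH is given below. It assumes that the randomness enters when choosing which job to insert next from the NEH-sorted list, and it discretizes `random.triangular` by truncation; the function names (`randomized_neh`, `multi_start`) and these design details are illustrative assumptions, not the authors' exact procedure.

```python
import random


def makespan(order, times):
    """Makespan of a permutation flow shop; times[j][m] is job j on machine m."""
    free = [0] * len(times[0])
    for j in order:
        prev = 0
        for m in range(len(free)):
            prev = max(free[m], prev) + times[j][m]
            free[m] = prev
    return free[-1]


def best_insertion(partial, job, times):
    """NEH step: try the job at every position and keep the best sequence."""
    candidates = [partial[:i] + [job] + partial[i:] for i in range(len(partial) + 1)]
    return min(candidates, key=lambda seq: makespan(seq, times))


def randomized_neh(times, rng):
    # Classical NEH priority list: jobs by decreasing total processing time.
    remaining = sorted(range(len(times)), key=lambda j: -sum(times[j]))
    sequence = []
    while remaining:
        # Triangular draw with its mode at 0, truncated to an index: jobs at
        # the head of the list are the most likely picks, but not the only ones.
        idx = min(int(rng.triangular(0, len(remaining), 0)), len(remaining) - 1)
        sequence = best_insertion(sequence, remaining.pop(idx), times)
    return sequence


def multi_start(times, iterations=200, seed=0):
    """Run the biased-random construction repeatedly; return solutions by makespan."""
    rng = random.Random(seed)
    solutions = [randomized_neh(times, rng) for _ in range(iterations)]
    return sorted(solutions, key=lambda seq: makespan(seq, times))
```

    Because `multi_start` returns the whole sorted list rather than a single sequence, other properties of each alternative (for instance idle time per machine) can be inspected afterwards, in the spirit of the selection process described in the abstract.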

    A Framework for Scheduling Problems

    Scheduling problems form an important subclass of combinatorial optimisation problems with many applications in manufacturing and logistics. Most of these problems are NP-complete (in their decision form) and NP-hard (in their optimisation form), so research on solving them concentrates on the design of efficient heuristic algorithms. Two main categories of these algorithms exist: deterministic algorithms and evolutionary metaheuristics. The deterministic algorithms comprise local improvement techniques, such as the k-opt algorithm, which try to improve an existing feasible solution, and constructive heuristics, such as NEH, which build a solution from scratch, adding one job at a time. Evolutionary metaheuristics have prospered in the past decades, owing to their efficiency and flexibility. Drawing inspiration from the theory of natural evolution or from swarm behaviour, the most popular of these algorithms in practice include Genetic Algorithms, Differential Evolution and Particle Swarm Optimisation, amongst others. However, even though these heuristics in most cases provide a close-to-optimal solution in reasonable execution time, this time is still impractically long for many applications. Therefore, much effort has been dedicated to accelerating these algorithms. Since hardware development has turned away from increasing clock speed towards parallel processing units, as power consumption and heat dissipation reach technological limits, this effort goes into parallelising the existing algorithms to exploit the computing power of multi-core or many-core platforms. This is the goal of the first part of the thesis: accelerating two of the deterministic algorithms, NEH and 2-opt, with interesting results.
Another approach has been taken in the second part, whose core premise is to explore the influence of stochasticity on the performance of an evolutionary algorithm, namely the relatively recent and promising Discrete Artificial Bee Colony (DABC) algorithm. The pseudo-random number generator has been replaced with different types of dissipative chaotic maps, some of which improve the algorithm significantly. It has been shown that population-based evolutionary algorithms often form complex networks when viewed in terms of the information exchanged between individual solutions as the population develops. The final part of this thesis puts this observation into practice by embedding a self-adaptive mechanism based on complex network analysis into the ABC algorithm, an evolutionary algorithm for continuous optimisation problems that is also the basis of the aforementioned DABC algorithm. The effectiveness of some of the developed versions is demonstrated on standard continuous optimisation test functions, and the possibility of extending this modification to combinatorial optimisation problems is discussed in the conclusion.

    GPU-based Approaches for Multiobjective Local Search Algorithms. A Case Study: the Flowshop Scheduling Problem

    Multiobjective local search algorithms are efficient methods for solving complex problems in science and industry. Even though these heuristics significantly reduce the computational time needed to explore the solution search space, this cost remains exorbitant when very large problem instances are to be solved. As a result, GPU computing has recently emerged as an efficient way to accelerate the search process. This paper presents a new methodology for designing and implementing GPU-based multiobjective local search algorithms efficiently. The experimental results show that the approach is promising, especially for large problem instances.

    Innovative hybrid MOEA/D variants for solving multi-objective combinatorial optimization problems

    Advisor: Aurora Trinidad Ramirez Pozo. Co-advisor: Roberto Santana. Doctoral thesis, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 16/12/2016. Includes references: leaves 103-116.
Abstract: Several real-world problems can be stated as combinatorial optimization problems. Very often, they are characterized by a large number of variables and the presence of multiple conflicting objectives to be optimized at the same time. Such problems are usually hard to solve optimally, and solving them has been considered a challenge for a long time. Metaheuristic algorithms aim at finding an acceptable approximation to the optimal solution in a reasonable computational time. Research on metaheuristics remains an attractive area and receives growing attention. One of the trends in this scenario is hybrid approaches, in which different methods and concepts are combined with the aim of proposing more efficient approaches. In this thesis, we propose hybrid metaheuristic algorithms for solving multi-objective combinatorial optimization problems. Our proposals are based on (i) the multi-objective evolutionary algorithm based on decomposition (MOEA/D framework), (ii) the bio-inspired metaheuristic ant colony optimization, and (iii) the probabilistic models from estimation of distribution algorithms. Our algorithms are considered MOEA/D variants. In these variants, besides the traditional genetic operators, we can instantiate different models as the variation step (reproduction). Moreover, we include some design modifications in the frameworks to control convergence and diversity during the search (evolution). We have addressed some important problems from the literature, e.g., multi-objective unconstrained binary quadratic programming, the multi-objective permutation flowshop scheduling problem, and problems characterized by deception. As a result, we show that our proposed frameworks are able to solve these problems efficiently, outperforming the state-of-the-art approaches in most of the cases considered. We show that hybridizing the MOEA/D guidelines with other metaheuristic components and concepts is a powerful strategy for solving multi-objective combinatorial optimization problems. Keywords: meta-heuristics, multi-objective optimization, combinatorial problems, MOEA/D, ant colony optimization, estimation of distribution algorithms, unconstrained binary quadratic programming, permutation flowshop scheduling problem, hybrid approaches.

    Solving large permutation flow-shop scheduling problems on GPU-accelerated supercomputers

    Makespan minimization in permutation flow-shop scheduling is a well-known hard combinatorial optimization problem. Among the 120 standard benchmark instances proposed by E. Taillard in 1993, 23 have remained unsolved for almost three decades. In this paper, we present our attempts to solve these instances to optimality using parallel Branch-and-Bound tree search on the GPU-accelerated Jean Zay supercomputer. We report the exact solution of 11 previously unsolved problem instances and improved upper bounds for 8 instances. The solution of these problems requires both algorithmic improvements and leveraging the computing power of peta-scale high-performance computing platforms. The challenge consists in efficiently performing parallel depth-first traversal of a highly irregular, fine-grained search tree on distributed systems composed of hundreds of massively parallel accelerator devices and multi-core processors. We present and discuss the design and implementation of our permutation-based B&B and experimentally evaluate its parallel performance on up to 384 V100 GPUs (2 million CUDA cores) and 3840 CPU cores. The optimality proof for the largest solved instance requires about 64 CPU-years of computation; using 256 GPUs and over 4 million parallel search agents, the traversal of the search tree is completed in 13 hours, exploring 339 Tera-nodes.
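
    To make the depth-first, permutation-based traversal concrete, here is a small sequential sketch. It branches by appending one unscheduled job to a prefix and prunes with a simple one-machine lower bound; the bound, the function names and the data layout are assumptions chosen for brevity and are not the bounding scheme or the parallel design of the paper.

```python
def advance(front, job, times):
    """Machine-free times after appending `job` to a partial schedule."""
    new, prev = [], 0
    for m, free in enumerate(front):
        prev = max(free, prev) + times[job][m]
        new.append(prev)
    return new


def lower_bound(front, remaining, times):
    """One-machine bound: machine m must still process every remaining job,
    and the last of them needs at least the shortest tail on later machines."""
    if not remaining:
        return front[-1]
    bound = 0
    for m in range(len(front)):
        load = sum(times[j][m] for j in remaining)
        tail = min(sum(times[j][m + 1:]) for j in remaining)
        bound = max(bound, front[m] + load + tail)
    return bound


def branch_and_bound(times):
    """Depth-first B&B over job prefixes; returns best makespan, order, node count."""
    best = {"cmax": float("inf"), "order": None, "nodes": 0}

    def dfs(prefix, front, remaining):
        best["nodes"] += 1
        if not remaining:
            if front[-1] < best["cmax"]:
                best["cmax"], best["order"] = front[-1], list(prefix)
            return
        for job in sorted(remaining):
            child = advance(front, job, times)
            rest = remaining - {job}
            # Explore the child only if it can still beat the incumbent.
            if lower_bound(child, rest, times) < best["cmax"]:
                dfs(prefix + [job], child, rest)

    dfs([], [0] * len(times[0]), frozenset(range(len(times))))
    return best
```

    The number of visited nodes depends strongly on how early a good incumbent is found, which hints at why the tree is described as highly irregular and why load balancing matters in a parallel setting.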

    Massively multi-task optimization on heterogeneous compute clusters: application to permutation problems

    Branch-and-Bound (B&B) is a frequently used tree-search exploratory method for the exact resolution of combinatorial optimization problems (COPs). However, in practice, only small problem instances can be solved on a sequential computer, as B&B often generates a huge number of subproblems to be evaluated. In order to solve large COPs, we revisit the design and implementation of massively parallel B&B on top of large heterogeneous clusters, integrating multi-core CPUs, many-core processors and GPUs. For the efficient storage and management of subproblems, an original data structure (IVM) dedicated to permutation problems is used. Because of the highly irregular and unpredictable shape of the B&B tree, dynamic load balancing between parallel exploration processes is one of the main issues addressed in this thesis. Based on a compact encoding of the search space in the form of intervals, work stealing strategies for multi-core and GPU are proposed, as well as hierarchical approaches for load balancing in distributed memory multi-CPU/multi-GPU systems. Three permutation problems, the Flowshop Scheduling Problem (FSP), the Quadratic Assignment Problem (QAP) and the n-Queens puzzle problem, are used as test cases. The resolution, in 9 hours, of an FSP instance with an estimated sequential execution time of 22 years demonstrates the scalability of the proposed algorithms on a cluster composed of 36 GPUs.

    Parallel Branch-and-Bound in Multi-core Multi-CPU Multi-GPU Heterogeneous Environments

    We investigate the design of parallel B&B in large-scale heterogeneous compute environments where processing units can be composed of a mixture of multiple shared-memory cores, multiple distributed CPUs and multiple GPU devices. We describe two approaches addressing the critical issue of how to map the B&B workload onto the different levels of parallelism exposed by the target compute platform. We also contribute a thorough large-scale experimental study which allows us to derive a comprehensive and fair analysis of the proposed approaches under different system configurations using up to 16 GPUs and up to 512 CPU cores. Our results shed more light on the main challenges one has to face when tackling B&B algorithms while describing efficient techniques to address them. In particular, we are able to obtain linear speed-ups at moderate scales, where adaptive load balancing among the heterogeneous compute resources is shown to have a significant impact on performance. At the largest scales, intra-node parallelism and hybrid decentralized load balancing are shown to be crucial in order to alleviate locking issues among shared-memory threads and to scale the distributed resources while optimizing communication costs and minimizing idle time.

    A simheuristic for bi-objective stochastic permutation flow shop scheduling problem

    This paper addresses the stochastic permutation flow shop problem (SPFSP) in which the stochastic parameters are the processing times. This allows the modeling of setups and machine breakdowns. A multi-objective greedy randomized adaptive search procedure (GRASP) coupled with Monte Carlo simulation is proposed to obtain the expected makespan and expected tardiness. To manage the bi-objective function, a sequential combined method is considered in the construction phase of the meta-heuristic. Moreover, the local search combines 2-optimal interchanges with a Pareto Archived Evolution Strategy (PAES) to obtain the Pareto front. Some Taillard benchmark instances of the deterministic permutation flow shop problem were adapted in order to include the variation in processing times. Accordingly, two coefficients of variation (CVs) were tested: one that depends on the expected processing time, defined as twice the expected processing time of a job, and one fixed at 0.25. The computational results on benchmark instances show that the variable CV provided lower values of the expected makespan and tardiness, while the constant CV presented higher expected measures. The computational results present insights for further analysis of the behavior of stochastic scheduling problems, for a better approach to real-life scenarios in industrial and service systems.
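
    The simulation step can be sketched as follows: for a fixed job sequence, processing times are sampled repeatedly and the resulting makespan and total tardiness are averaged. The lognormal distribution, the due dates and all function names are assumptions introduced for this example; the abstract only states that processing times are stochastic and that a fixed CV of 0.25 was among those tested.

```python
import math
import random


def completion_times(order, times):
    """Completion time of each job on the last machine of a permutation flow shop."""
    free = [0.0] * len(times[0])
    done = {}
    for j in order:
        prev = 0.0
        for m in range(len(free)):
            prev = max(free[m], prev) + times[j][m]
            free[m] = prev
        done[j] = prev
    return done


def sample_times(mean_times, cv, rng):
    """Draw lognormal processing times with the given means and coefficient of variation."""
    sigma2 = math.log(1.0 + cv * cv)
    sigma = math.sqrt(sigma2)
    return [
        [rng.lognormvariate(math.log(mean) - sigma2 / 2.0, sigma) for mean in row]
        for row in mean_times
    ]


def estimate(order, mean_times, due, cv=0.25, runs=2000, seed=0):
    """Monte Carlo estimates of (expected makespan, expected total tardiness)."""
    rng = random.Random(seed)
    total_cmax = total_tardiness = 0.0
    for _ in range(runs):
        done = completion_times(order, sample_times(mean_times, cv, rng))
        total_cmax += max(done.values())
        total_tardiness += sum(max(0.0, done[j] - due[j]) for j in order)
    return total_cmax / runs, total_tardiness / runs
```

    A call such as `estimate([1, 2, 0], [(4, 2, 1), (1, 5, 2), (3, 3, 3)], due=[12, 8, 10])` returns the two averages for one candidate sequence; a metaheuristic would call it for every sequence it needs to compare.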

    An Improved Multiobjective PSO for the Scheduling Problem of Panel Block Construction

    Uncertainty is common in ship construction. However, few studies have focused on scheduling problems under uncertainty in shipbuilding. This paper formulates the scheduling problem of panel block construction as a multiobjective fuzzy flow shop scheduling problem (FSSP) with a fuzzy processing time, a fuzzy due date, and the just-in-time (JIT) concept. An improved multiobjective particle swarm optimization called MOPSO-M is developed to solve the scheduling problem. MOPSO-M utilizes a ranked-order-value rule to convert the continuous position of particles into the discrete permutations of jobs, and an available mapping is employed to obtain the precedence-based permutation of the jobs. In addition, to improve the performance of MOPSO-M, archive maintenance is combined with global best position selection, and mutation and a velocity constriction mechanism are introduced into the algorithm. The feasibility and effectiveness of MOPSO-M are assessed in comparison with general MOPSO and nondominated sorting genetic algorithm-II (NSGA-II).
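
    One common reading of the ranked-order-value rule is sketched below: the smallest position value receives rank 1, the next smallest rank 2, and so on, and the vector of ranks is read as the job permutation. The function name and the tie-breaking by lower index are assumptions, and the paper's own variant and its precedence mapping are not reproduced here.

```python
def ranked_order_value(position):
    """Map a continuous particle position to a permutation of job ids 1..n."""
    # Dimensions sorted by position value; `sorted` is stable, so ties go to
    # the lower dimension index.
    by_value = sorted(range(len(position)), key=lambda k: position[k])
    ranks = [0] * len(position)
    for rank, k in enumerate(by_value, start=1):
        ranks[k] = rank
    return ranks


# Hypothetical particle position with six dimensions (one per job).
print(ranked_order_value([0.06, 2.99, 1.86, 3.73, 2.13, 0.67]))
```

    Because only the relative order of the position values matters, the usual continuous velocity and position updates of PSO can be kept unchanged while every particle still decodes to a valid permutation.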