Local search performance guarantees for restricted related parallel machine scheduling

Author: Recalde Diego
Rutten Cyriel
Schuurman Petra
Vredeveld Tjark

We consider the problem of minimizing the makespan on restricted related parallel machines. In restricted machine scheduling each job is only allowed to be scheduled on a subset of machines. We study the worst-case behavior of local search algorithms. In particular, we analyze the quality of local optima with respect to the jump, swap, push and lexicographical jump neighborhoods.

Keywords: operations research and management science
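As a rough illustration of jump-style local search on restricted related machines, the sketch below repeatedly moves a single job to another allowed machine. The move rule (a job jumps when the target machine's load after the move is strictly below the job's current machine load), the function names, and the data layout are assumptions made for this example; the paper's exact neighborhood definitions may differ.

```python
from typing import List


def machine_loads(assign: List[int], p: List[float], s: List[float]) -> List[float]:
    """Load of each machine: sum of p[j] / s[i] over the jobs j assigned to machine i."""
    loads = [0.0] * len(s)
    for j, i in enumerate(assign):
        loads[i] += p[j] / s[i]
    return loads


def jump_local_search(p: List[float], s: List[float],
                      allowed: List[List[int]], assign: List[int]) -> List[int]:
    """Apply improving jumps until none is left (simplified rule, see lead-in)."""
    assign = list(assign)
    improved = True
    while improved:
        improved = False
        loads = machine_loads(assign, p, s)
        for j, cur in enumerate(assign):
            for i in allowed[j]:  # only machines job j may run on
                if i != cur and loads[i] + p[j] / s[i] < loads[cur] - 1e-12:
                    assign[j] = i  # perform the jump, then rescan from scratch
                    improved = True
                    break
            if improved:
                break
    return assign


# Hypothetical instance: 3 jobs, 2 machines with speeds 1 and 2;
# job 2 is restricted to machine 0.
p = [4.0, 3.0, 2.0]
s = [1.0, 2.0]
allowed = [[0, 1], [0, 1], [0]]
result = jump_local_search(p, s, allowed, [0, 0, 0])
makespan = max(machine_loads(result, p, s))
```

On this small instance the search stops after two jumps; in general a local optimum of this kind need not be a global optimum, which is the gap the worst-case analysis quantifies.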

Research Papers in Economics

Local search performance guarantees for restricted related parallel machine scheduling

Author: Recalde D.
Rutten C.
Schuurman P.
Vredeveld T.
Publication venue: 'University of Maastricht'
Publication date: 01/01/2009

Maastricht University Research Portal

Algorithms for Hierarchical and Semi-Partitioned Parallel Scheduling

Author: Bonifaci Vincenzo
Dangelo Gianlorenzo
Marchetti-Spaccamela Alberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

We propose a model for scheduling jobs in a parallel machine setting that takes into account the cost of migrations by assuming that the processing time of a job may depend on the specific set of machines among which the job is migrated. For the makespan minimization objective, the model generalizes classical scheduling problems such as unrelated parallel machine scheduling, as well as novel ones such as semi-partitioned and clustered scheduling. In the case of a hierarchical family of machines, we derive a compact integer linear programming formulation of the problem and leverage its fractional relaxation to obtain a polynomial-time 2-approximation algorithm. Extensions that incorporate memory capacity constraints are also discussed.

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Non-clairvoyant Scheduling Games

Author: Cohen Johanne
Durr Christoph
Thang Nguyen Kim
Publication date: 01/01/2011

In a scheduling game, each player owns a job and chooses a machine to execute it. While the social cost is the maximal load over all machines (makespan), the cost (disutility) of each player is the completion time of its own job. In the game, players may follow selfish strategies to optimize their cost and therefore their behaviors do not necessarily lead the game to an equilibrium. Even when there is an equilibrium, its makespan might be much larger than the social optimum, and this inefficiency is measured by the price of anarchy -- the worst ratio between the makespan of an equilibrium and the optimum. Coordination mechanisms aim to reduce the price of anarchy by designing scheduling policies that specify how jobs assigned to the same machine are to be scheduled. Typically these policies define the schedule according to the processing times as announced by the jobs. One could wonder if there are policies that do not require this knowledge, and still provide a good price of anarchy. This would allow the processing times to remain private information and avoid the problem of truthfulness. In this paper we study these so-called non-clairvoyant policies. In particular, we study the RANDOM policy that schedules the jobs in a random order without preemption, and the EQUI policy that schedules the jobs in parallel using time-multiplexing, assigning each job an equal fraction of CPU time.
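To make the EQUI policy concrete, here is a minimal sketch of the completion times it produces on a single machine. It assumes a unit-speed machine and that all jobs are present at time 0; the function name is hypothetical. Under equal time-sharing, jobs finish in order of processing time, so the job with the i-th smallest processing time completes once all shorter jobs are done and the remaining jobs have each received its full processing time.

```python
from typing import List


def equi_completion_times(p: List[float]) -> List[float]:
    """Completion time of each job under equal time-sharing on one unit-speed machine."""
    n = len(p)
    order = sorted(range(n), key=lambda j: p[j])  # shortest job finishes first
    completion = [0.0] * n
    finished_work = 0.0  # total processing time of jobs that already completed
    for rank, j in enumerate(order):
        # When job j finishes, each of the (n - rank) jobs still running
        # has received exactly p[j] units of service.
        completion[j] = finished_work + (n - rank) * p[j]
        finished_work += p[j]
    return completion


times = equi_completion_times([3.0, 1.0, 2.0])
```

A player's cost in the game is its own entry of this list, while the machine's load is the largest entry, which equals the sum of the processing times.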

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

Data Structures for Task-based Priority Scheduling

Author: Cederman Daniel
Träff Jesper Larsson
Tsigas Philippas
Versaci Francesco
Wimmer Martin
Publication date: 09/12/2013

Many task-parallel applications can benefit from attempting to execute tasks in a specific order, as for instance indicated by priorities associated with the tasks. We present three lock-free data structures for priority scheduling with different trade-offs on scalability and ordering guarantees. First we propose a basic extension to work-stealing that provides good scalability, but cannot provide any guarantees for task-ordering in-between threads. Next, we present a centralized priority data structure based on k-fifo queues, which provides strong (but still relaxed with regard to a sequential specification) guarantees. The parameter k allows to dynamically configure the trade-off between scalability and the required ordering guarantee. Third, and finally, we combine both data structures into a hybrid, k-priority data structure, which provides scalability similar to the work-stealing based approach for larger k, while giving strong ordering guarantees for smaller k. We argue for using the hybrid data structure as the best compromise for generic, priority-based task-scheduling. We analyze the behavior and trade-offs of our data structures in the context of a simple parallelization of Dijkstra's single-source shortest path algorithm. Our theoretical analysis and simulations show that both the centralized and the hybrid k-priority based data structures can give strong guarantees on the useful work performed by the parallel Dijkstra algorithm. We support our results with experimental evidence on an 80-core Intel Xeon system.

arXiv.org e-Print Archive

Crossref

Chalmers Research

Towards a High-Level Implementation of Execution Primitives for Unrestricted, Independent And-Parallelism

Author: Carro Liñares Manuel
Casas Amadeo
Hermenegildo Manuel V.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2007

Most efficient implementations of parallel logic programming rely on complex low-level machinery which is arguably difficult to implement and modify. We explore an alternative approach aimed at taming that complexity by raising core parts of the implementation to the source language level for the particular case of and-parallelism. We handle a significant portion of the parallel implementation at the Prolog level with the help of a comparatively small number of concurrency-related primitives which take care of lower-level tasks such as locking, thread management, stack set management, etc. The approach does not eliminate altogether modifications to the abstract machine, but it does greatly simplify them and it also facilitates experimenting with different alternatives. We show how this approach allows implementing both restricted and unrestricted (i.e., non fork-join) parallelism. Preliminary experiments show that the performance sacrificed is reasonable, although granularity control is required in some cases. Also, we observe that the availability of unrestricted parallelism contributes to better observed speedups.

CiteSeerX

Archivo Digital UPM

Towards high-level execution primitives for and-parallelism: preliminary results

Author: Carro Liñares Manuel
Casas Amadeo
Hermenegildo Manuel V.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2007

Most implementations of parallel logic programming rely on complex low-level machinery which is arguably difficult to implement and modify. We explore an alternative approach aimed at taming that complexity by raising core parts of the implementation to the source language level for the particular case of and-parallelism. Therefore, we handle a significant portion of the parallel implementation mechanism at the Prolog level with the help of a comparatively small number of concurrency-related primitives which take care of lower-level tasks such as locking, thread management, stack set management, etc. The approach does not eliminate altogether modifications to the abstract machine, but it does greatly simplify them and it also facilitates experimenting with different alternatives. We show how this approach allows implementing both restricted and unrestricted (i.e., non fork-join) parallelism. Preliminary experiments show that the amount of performance sacrificed is reasonable, although granularity control is required in some cases. Also, we observe that the availability of unrestricted parallelism contributes to better observed speedups.

CiteSeerX

Archivo Digital UPM

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

Author: Cong Jason
Wei Peng
Yu Cody Hao
Zhang Peng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 29/07/2018

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs, whose programming is conventionally an RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve FPGA programmability, they still leave programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels.

arXiv.org e-Print Archive

Crossref

Scipedia