Co-Scheduling Algorithms for High-Throughput Workload Execution

Author: Aupy Guillaume
Benoit Anne
Raghavan Padma
Robert Yves
Shantharam Manu
Publication date: 29/04/2013

This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several applications concurrently. We partition the original application set into a series of packs, which are executed one by one. A pack comprises several applications, each of them with an assigned number of processors, with the constraint that the total number of processors assigned within a pack does not exceed the maximum number of available processors. The objective is to determine a partition into packs, and an assignment of processors to applications, that minimize the sum of the execution times of the packs. We thoroughly study the complexity of this optimization problem, and propose several heuristics that exhibit very good performance on a variety of workloads, whose application execution times model profiles of parallel scientific codes. We show that co-scheduling leads to faster workload completion time and to faster response times on average (hence increasing system throughput and saving energy), for significant benefits over traditional scheduling from both the user and system perspectives.

arXiv.org e-Print Archive

HAL-ENS-LYON

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot
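
For the co-scheduling abstract above, a minimal sketch of the objective it describes: a pack is assumed to take as long as its slowest application, and packs run back to back, so the workload time is the sum of pack times. The Amdahl's-law speedup profile, the work values, and the helper names (`amdahl_profile`, `pack_time`, `total_time`) are illustrative assumptions, not the paper's model or heuristics.

```python
from typing import Callable, Dict, List

ExecTime = Callable[[str, int], float]  # exec_time(app, q): run time of app on q processors


def amdahl_profile(work: Dict[str, float], seq: float) -> ExecTime:
    """Hypothetical speedup profile: Amdahl's law with sequential fraction `seq`."""
    return lambda app, q: work[app] * (seq + (1.0 - seq) / q)


def pack_time(pack: Dict[str, int], exec_time: ExecTime, p_max: int) -> float:
    """All applications of a pack start together; the pack ends with the slowest one."""
    if sum(pack.values()) > p_max or min(pack.values()) < 1:
        raise ValueError("invalid processor assignment for this pack")
    return max(exec_time(app, q) for app, q in pack.items())


def total_time(packs: List[Dict[str, int]], exec_time: ExecTime, p_max: int) -> float:
    """Packs run one after another, so the objective is the sum of pack times."""
    return sum(pack_time(pack, exec_time, p_max) for pack in packs)


t = amdahl_profile({"A": 8.0, "B": 4.0, "C": 6.0}, seq=0.2)
one_by_one = [{"A": 4}, {"B": 4}, {"C": 4}]  # each application alone on all 4 processors
co_scheduled = [{"A": 2, "B": 1, "C": 1}]    # one pack sharing the 4 processors
print(round(total_time(one_by_one, t, p_max=4), 3))    # 7.2
print(round(total_time(co_scheduled, t, p_max=4), 3))  # 6.0
```

Under this assumed profile, the single pack finishes sooner because each application runs in a more efficient region of its speedup curve. Choosing the packs and processor counts is the hard part that the paper's heuristics address.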

The scheduling of sparse matrix-vector multiplication on a massively parallel DAP computer

Author: Andersen J
Mitra G
Parkinson D
Publication venue: Brunel University
Publication date: 01/01/1991

An efficient data structure is presented which supports general unstructured sparse matrix-vector multiplications on a Distributed Array of Processors (DAP). This approach seeks to reduce the inter-processor data movements and organises the operations in batches of massively parallel steps by a heuristic scheduling procedure performed on the host computer. The resulting data structure is of particular relevance to iterative schemes for solving linear systems. Performance results for matrices taken from well-known Linear Programming (LP) test problems are presented and analysed.

Brunel University Research Archive
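
The DAP abstract mentions organising the multiplications into batches of massively parallel steps. The sketch below is a hypothetical illustration of that idea, not the paper's data structure or its host-side heuristic: it groups nonzeros stored in coordinate (COO) form so that no step updates the same output element twice, and it ignores the inter-processor data movement the paper also aims to reduce. The names `batch_by_row` and `spmv` are made up for this example.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Entry = Tuple[int, int, float]  # (row, column, value) of one nonzero, COO form


def batch_by_row(entries: List[Entry]) -> List[List[Entry]]:
    """Greedy batching: the k-th nonzero of every row goes into step k.

    No step holds two updates to the same y[row], so all multiply-adds in a
    step could in principle run at once, one per processing element.
    """
    placed: DefaultDict[int, int] = defaultdict(int)  # nonzeros already placed per row
    steps: List[List[Entry]] = []
    for row, col, val in entries:
        k = placed[row]
        placed[row] += 1
        if k == len(steps):
            steps.append([])
        steps[k].append((row, col, val))
    return steps


def spmv(n_rows: int, steps: List[List[Entry]], x: List[float]) -> List[float]:
    """Compute y = A x one batch at a time."""
    y = [0.0] * n_rows
    for step in steps:               # steps execute one after another
        for row, col, val in step:   # conflict-free within a step
            y[row] += val * x[col]
    return y


# A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]] as COO entries
entries = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
steps = batch_by_row(entries)
print(len(steps), spmv(3, steps, [1.0, 1.0, 1.0]))  # 2 [3.0, 3.0, 9.0]
```

With this greedy rule the number of steps equals the largest number of nonzeros in any row; a real schedule would also have to account for where each x[col] resides on the processor array.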

Survey on Combinatorial Register Allocation and Instruction Scheduling

Author: Lozano Roberto Castañeda
Schulte Christian
Publication date: 01/01/2018

Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. In the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible, at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques that are most applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization.

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line
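
To make "enumeration" concrete, here is a toy, hypothetical model of optimal register allocation solved by brute force: variables that are live at the same time (edges of an interference graph) may not share a register, and the goal is to minimize the total cost of spilled variables. The cost model and the function name `optimal_allocation` are assumptions for illustration; the surveyed approaches use much richer models and far smarter search.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Allocation = Dict[str, Optional[int]]  # register index, or None if spilled to memory


def optimal_allocation(
    variables: List[str],
    interference: List[Tuple[str, str]],
    spill_cost: Dict[str, float],
    k: int,
) -> Tuple[float, Allocation]:
    """Exhaustively enumerate assignments to k registers (toy model).

    Returns the minimum total spill cost and one allocation achieving it.
    The search visits (k + 1) ** len(variables) candidates.
    """
    choices: List[Optional[int]] = list(range(k)) + [None]
    best_cost = float("inf")
    best: Allocation = {}
    for assignment in product(choices, repeat=len(variables)):
        alloc = dict(zip(variables, assignment))
        # Reject candidates where two interfering variables share a register.
        if any(alloc[u] is not None and alloc[u] == alloc[v] for u, v in interference):
            continue
        cost = sum(spill_cost[v] for v in variables if alloc[v] is None)
        if cost < best_cost:
            best_cost, best = cost, alloc
    return best_cost, best


# a, b, c are all live at once (a triangle), so with k = 2 one must spill.
cost, alloc = optimal_allocation(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")],
    {"a": 5, "b": 1, "c": 3, "d": 2},
    k=2,
)
print(cost, alloc["b"])  # 1 None
```

The result is optimal only with respect to this small model, which mirrors the abstract's point that combinatorial approaches are optimal "according to a model"; the exponential candidate count illustrates the compilation-time cost it mentions.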

Energy-Aware Lease Scheduling in Virtualized Data Centers

Author: A. Beloglazov
D.G. Feitelson
L.A. Barroso
R. Buyya
R. Panigrahy
S. Albers
X. Fan
Publication date: 28/10/2014

Energy efficiency has become an important criterion for scheduling algorithms in virtualized data centers. One of the challenges for energy-efficient scheduling algorithms, however, is the trade-off between minimizing energy consumption and satisfying quality of service (e.g., performance, or resource availability on time for reservation requests). We consider resource needs in the context of the virtualized data centers of a private cloud system, which provides resource leases in the form of virtual machines (VMs) for user applications. In this paper, we propose heuristics for scheduling VMs that address this challenge. In the performance evaluation, simulation results show that our proposed algorithms significantly reduce total energy consumption compared with an existing First-Come-First-Serve (FCFS) scheduling algorithm, while fulfilling the same performance requirements. We also discuss the additional energy savings obtained when migration policies are used together with these algorithms. Comment: 10 pages, 2 figures, Proceedings of the Fifth International Conference on High Performance Scientific Computing, March 5-9, 2012, Hanoi, Vietnam.

arXiv.org e-Print Archive

Crossref

SLO-aware Colocation of Data Center Tasks Based on Instantaneous Processor Requirements

Author: Boutin Eric
Goel Ashish
Wang Meng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/09/2017

In a cloud data center, a single physical machine simultaneously executes dozens of highly heterogeneous tasks. Such colocation results in more efficient utilization of machines, but, when tasks' requirements exceed available resources, some of the tasks might be throttled down or preempted. We analyze version 2.1 of the Google cluster trace, which shows short-term (1-second) task CPU usage. Contrary to the assumptions made by many theoretical studies, we demonstrate that the empirical distributions do not follow any single distribution. However, high percentiles of the total processor usage (summed over at least 10 tasks) can be reasonably estimated by the Gaussian distribution. We use this result for a probabilistic fit test, called the Gaussian Percentile Approximation (GPA), for standard bin-packing algorithms. To check whether a new task will fit into a machine, GPA checks whether the resulting distribution's percentile corresponding to the requested service level objective (SLO) is still below the machine's capacity. In our simulation experiments, GPA resulted in colocations exceeding the machines' capacity with a frequency similar to the requested SLO. Comment: Author's version of a paper published in ACM SoCC'1

arXiv.org e-Print Archive

Crossref
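
A minimal sketch of the GPA fit test as the abstract describes it, assuming each task's CPU usage is summarized by a mean and a variance and that tasks are independent, so the means and variances of colocated tasks simply add. The percentile comes from the standard library's `statistics.NormalDist`. How the paper estimates per-task statistics from the trace is not stated in the abstract, so the `(mean, variance)` inputs and the name `gpa_fits` are assumptions.

```python
from statistics import NormalDist
from typing import List, Tuple

Usage = Tuple[float, float]  # (mean, variance) of one task's CPU usage, in cores


def gpa_fits(running: List[Usage], new_task: Usage, capacity: float, slo: float) -> bool:
    """Gaussian Percentile Approximation fit test (sketch).

    Approximates the machine's total usage by a normal distribution and accepts
    the new task only if the `slo` percentile stays within `capacity`.
    """
    tasks = running + [new_task]
    mean = sum(m for m, _ in tasks)
    # Assumption: usages are independent, so the variance of the sum is the sum of variances.
    std = sum(v for _, v in tasks) ** 0.5
    z = NormalDist().inv_cdf(slo)  # standard-normal quantile, about 2.33 for slo = 0.99
    return mean + z * std <= capacity


running = [(0.5, 0.04)] * 10  # ten hypothetical tasks already on the machine
print(gpa_fits(running, (0.5, 0.04), capacity=8.0, slo=0.99))  # True
print(gpa_fits(running, (0.5, 0.04), capacity=7.0, slo=0.99))  # False
```

Here the 99th percentile of the 11-task total is about 5.5 + 2.33 × 0.66 ≈ 7.04 cores. Since the abstract reports that the Gaussian estimate is reasonable only for usage summed over at least 10 tasks, a test like this suits machines that already host many tasks.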

Packing Sporadic Real-Time Tasks on Identical Multiprocessor Systems

Author: Bansal Nikhil
Chakraborty Samarjit
Chen Jian-Jia
von der Brüggen Georg
Publication date: 01/01/2018

In real-time systems, in addition to functional correctness, recurrent tasks must fulfill timing constraints to ensure the correct behavior of the system. Partitioned scheduling, in which tasks are statically assigned to processors while ensuring that all timing constraints are met, is widely used in real-time systems. The decision version of the problem, which is to check whether the deadline constraints of tasks can be satisfied on a given number of identical processors, is known to be $\mathcal{NP}$-complete in the strong sense. Several studies on this problem are based on approximations involving resource augmentation, i.e., speeding up individual processors. This paper studies another type of resource augmentation, namely allocating additional processors, a topic that has not been explored until recently. We provide polynomial-time algorithms and analysis, in which the approximation factors depend on the input instances. Specifically, the factors are related to the maximum ratio of the period to the relative deadline of a task in the given task set. We also show that these algorithms unfortunately cannot achieve a constant approximation factor in general. Furthermore, we prove that the problem does not admit any asymptotic polynomial-time approximation scheme (APTAS) unless $\mathcal{P} = \mathcal{NP}$ when the task set has constrained deadlines, i.e., the relative deadline of a task is no more than the period of the task. Comment: Accepted and to appear in ISAAC 2018, Yi-Lan, Taiwan.

arXiv.org e-Print Archive

Repository TU/e

CWI's Institutional Repository

Pure OAI Repository
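
The abstract does not spell out the paper's algorithms. As a hedged illustration of partitioned scheduling with processor augmentation, the sketch below assigns tasks first-fit in decreasing order of density $C/\min(D, T)$ and opens a new processor whenever no existing one passes the admission test. That test, the sufficient uniprocessor EDF condition $\sum C/\min(D, T) \le 1$, is a standard textbook choice assumed here rather than the paper's analysis; the task tuples and function names are hypothetical.

```python
from typing import List, Tuple

Task = Tuple[float, float, float]  # (worst-case execution time C, relative deadline D, period T)


def density(task: Task) -> float:
    c, d, t = task
    return c / min(d, t)


def first_fit_decreasing(tasks: List[Task]) -> List[List[Task]]:
    """Partition tasks onto identical processors (sketch).

    Admission test (assumed): EDF on one processor meets all deadlines if the
    total density is at most 1. The test is sufficient, not necessary, so this
    may open more processors than an exact test would.
    """
    processors: List[List[Task]] = []
    loads: List[float] = []  # total density per processor
    for task in sorted(tasks, key=density, reverse=True):
        dens = density(task)
        for i, load in enumerate(loads):
            if load + dens <= 1.0 + 1e-12:  # small tolerance for float rounding
                processors[i].append(task)
                loads[i] += dens
                break
        else:
            # No existing processor admits the task: allocate an additional one.
            processors.append([task])
            loads.append(dens)
    return processors


tasks = [(1, 2, 4), (1, 2, 2), (3, 4, 8), (1, 4, 4)]  # densities 0.5, 0.5, 0.75, 0.25
print(len(first_fit_decreasing(tasks)))  # 2
```

Because density uses the shorter of $D$ and $T$, this test becomes more pessimistic as deadlines shrink relative to periods, which offers some intuition for why the ratio of period to relative deadline appears in the abstract's approximation factors.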

Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers

Author: Blech Richard A.

The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1-, 2-, and 3-processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared memory parallel processors provided that the synchronization outlines are reproduced on the target system.

University of Michigan Library Repository

NASA Technical Reports Server
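
The report's PASCAL solvers are not reproduced here. As a hedged, sequential sketch of the underlying method, block Gaussian elimination of a block tridiagonal system (the block form of the Thomas algorithm) can be written in Python with NumPy as follows. It does not show the report's partitioning of work across 1, 2, or 3 processors, and the function name and the `lower`/`diag`/`upper` block layout are assumptions.

```python
import numpy as np


def block_tridiagonal_solve(lower, diag, upper, rhs):
    """Solve a block tridiagonal system by block Gaussian elimination.

    Block row i reads
        lower[i-1] @ x[i-1] + diag[i] @ x[i] + upper[i] @ x[i+1] = rhs[i],
    where `lower` and `upper` each hold n-1 square blocks. No pivoting is
    done, so every reduced diagonal block must be nonsingular (true, e.g.,
    for block diagonally dominant systems).
    """
    n = len(diag)
    c = [None] * n  # eliminated super-diagonal blocks
    d = [None] * n  # eliminated right-hand sides
    for i in range(n):
        if i == 0:
            pivot, r = diag[0], rhs[0]
        else:
            # Forward elimination: remove x[i-1] from block row i.
            pivot = diag[i] - lower[i - 1] @ c[i - 1]
            r = rhs[i] - lower[i - 1] @ d[i - 1]
        if i < n - 1:
            c[i] = np.linalg.solve(pivot, upper[i])
        d[i] = np.linalg.solve(pivot, r)
    # Back substitution from the last block row upward.
    x = [None] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] @ x[i + 1]
    return x
```

Each forward step depends on the previous one, which is why parallel versions such as the report's need an explicit partitioning of the elimination and synchronization between processors.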