Co-Scheduling Algorithms for High-Throughput Workload Execution
This paper investigates co-scheduling algorithms for processing a set of
parallel applications. Instead of executing each application one by one, using
a maximum degree of parallelism for each of them, we aim at scheduling several
applications concurrently. We partition the original application set into a
series of packs, which are executed one by one. A pack comprises several
applications, each of them with an assigned number of processors, with the
constraint that the total number of processors assigned within a pack does not
exceed the maximum number of available processors. The objective is to
determine a partition into packs, and an assignment of processors to
applications, that minimize the sum of the execution times of the packs. We
thoroughly study the complexity of this optimization problem, and propose
several heuristics that exhibit very good performance on a variety of
workloads, whose application execution times model profiles of parallel
scientific codes. We show that co-scheduling leads to faster workload
completion time and to faster response times on average (hence increasing
system throughput and saving energy), for significant benefits over traditional
scheduling from both the user and system perspectives.
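A minimal sketch of the objective, under assumptions not taken from the paper: each application's execution time follows Amdahl's law with a hypothetical sequential fraction, and the processors inside a pack are handed out by a simple greedy rule (each spare processor goes to the application that currently finishes last). This is not one of the paper's heuristics; it only shows how the cost of a given partition into packs is computed. All function names are illustrative.

```python
from typing import Callable, Dict, List

TimeFn = Callable[[int], float]


def amdahl_time(work: float, seq_frac: float, procs: int) -> float:
    """Hypothetical speedup model (Amdahl's law): time on `procs` processors."""
    return work * (seq_frac + (1.0 - seq_frac) / procs)


def assign_processors(pack: List[str], times: Dict[str, TimeFn], p: int) -> Dict[str, int]:
    """Greedy sketch: one processor per application, then each spare processor
    goes to the application that currently finishes last."""
    if len(pack) > p:
        raise ValueError("a pack cannot hold more applications than processors")
    alloc = {app: 1 for app in pack}
    for _ in range(p - len(pack)):
        slowest = max(pack, key=lambda a: times[a](alloc[a]))
        alloc[slowest] += 1
    return alloc


def pack_time(alloc: Dict[str, int], times: Dict[str, TimeFn]) -> float:
    # A pack ends when its slowest application ends.
    return max(times[a](n) for a, n in alloc.items())


def total_time(packs: List[List[str]], times: Dict[str, TimeFn], p: int) -> float:
    # Packs run one after another, so the objective is the sum of pack times.
    return sum(pack_time(assign_processors(pack, times, p), times) for pack in packs)


times = {
    "A": lambda j: amdahl_time(8.0, 0.5, j),
    "B": lambda j: amdahl_time(8.0, 0.5, j),
}
print(total_time([["A", "B"]], times, p=4))    # 6.0: one pack, two processors each
print(total_time([["A"], ["B"]], times, p=4))  # 10.0: one at a time, four processors each
```

With a large sequential fraction, the extra processors buy little, so running both applications side by side in one pack beats giving each the whole machine in turn.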
The scheduling of sparse matrix-vector multiplication on a massively parallel DAP computer
An efficient data structure is presented which supports general unstructured sparse matrix-vector multiplications on a Distributed Array of Processors (DAP). This approach seeks to reduce the inter-processor data movements and organises the operations in batches of massively parallel steps by a heuristic scheduling procedure performed on the host computer.
The resulting data structure is of particular relevance to iterative schemes for solving linear systems. Performance results for matrices taken from well known Linear Programming (LP) test problems are presented and analysed.
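The abstract does not describe the DAP data structure itself. As a rough illustration of the batching idea only, the sketch below assumes a hypothetical greedy rule: nonzeros stored as (row, col, value) triples are grouped so that no two entries in a batch update the same output row, which makes each batch a conflict-free parallel step. The names schedule_batches and spmv_batched are invented for this example and do not represent the paper's heuristic.

```python
from typing import List, Set, Tuple

Entry = Tuple[int, int, float]  # (row, col, value) of one nonzero


def schedule_batches(entries: List[Entry]) -> List[List[Entry]]:
    """Greedy: put each nonzero in the first batch that does not already write its row."""
    batches: List[List[Entry]] = []
    rows_in_batch: List[Set[int]] = []
    for row, col, val in entries:
        for batch, rows in zip(batches, rows_in_batch):
            if row not in rows:
                batch.append((row, col, val))
                rows.add(row)
                break
        else:
            batches.append([(row, col, val)])
            rows_in_batch.append({row})
    return batches


def spmv_batched(batches: List[List[Entry]], x: List[float], n_rows: int) -> List[float]:
    y = [0.0] * n_rows
    for batch in batches:
        # Every row in a batch is distinct, so these updates could run in parallel.
        for row, col, val in batch:
            y[row] += val * x[col]
    return y


# Matrix [[1, 2, 0], [0, 3, 0], [4, 0, 5]] in coordinate form.
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
batches = schedule_batches(entries)
print(len(batches))                                 # 2
print(spmv_batched(batches, [1.0, 1.0, 1.0], 3))    # [3.0, 3.0, 9.0]
```

With this rule the number of batches equals the largest number of nonzeros in any row, which is one simple way to bound the number of parallel steps.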
Survey on Combinatorial Register Allocation and Instruction Scheduling
Register allocation (mapping variables to processor registers or memory) and
instruction scheduling (reordering instructions to increase instruction-level
parallelism) are essential tasks for generating efficient assembly code in a
compiler. In the last three decades, combinatorial optimization has emerged as
an alternative to traditional, heuristic algorithms for these two tasks.
Combinatorial optimization approaches can deliver optimal solutions according
to a model, can precisely capture trade-offs between conflicting decisions, and
are more flexible at the expense of increased compilation time.
This paper provides an exhaustive literature review and a classification of
combinatorial optimization approaches to register allocation and instruction
scheduling, with a focus on the techniques that are most applied in this
context: integer programming, constraint programming, partitioned Boolean
quadratic programming, and enumeration. Researchers in compilers and
combinatorial optimization can benefit from identifying developments, trends,
and challenges in the area; compiler practitioners may discern opportunities
and grasp the potential benefit of applying combinatorial optimization.
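Of the four techniques the survey names, enumeration is the easiest to show in a few lines. The sketch below is a toy model under assumed simplifications: an interference graph given as variable pairs, k registers, a per-variable spill cost, and no live-range splitting or interaction with instruction scheduling. It tries every register-or-memory assignment and keeps the cheapest one that respects interference, which makes "optimal according to a model" concrete and also shows why compilation time grows quickly ((k+1)^n candidates). enumerate_allocation is a hypothetical name.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Alloc = Dict[str, Optional[int]]  # register index, or None when spilled to memory


def enumerate_allocation(
    variables: List[str],
    interference: List[Tuple[str, str]],
    spill_cost: Dict[str, float],
    k: int,
) -> Tuple[float, Alloc]:
    """Try every mapping of variables to k registers or memory; keep the cheapest valid one."""
    choices: List[Optional[int]] = list(range(k)) + [None]
    best_cost: float = float("inf")
    best: Alloc = {}
    for assignment in product(choices, repeat=len(variables)):
        alloc = dict(zip(variables, assignment))
        # Two interfering (simultaneously live) variables may not share a register.
        if any(alloc[u] is not None and alloc[u] == alloc[v] for u, v in interference):
            continue
        cost = sum(spill_cost[v] for v in variables if alloc[v] is None)
        if cost < best_cost:
            best_cost, best = cost, alloc
    return best_cost, best


# Three mutually interfering variables but only two registers: one must be spilled.
cost, alloc = enumerate_allocation(
    ["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")], {"a": 5, "b": 1, "c": 3}, k=2
)
print(cost, alloc)  # 1 {'a': 0, 'b': None, 'c': 1}
```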
Energy-Aware Lease Scheduling in Virtualized Data Centers
Energy efficiency has become an important measurement of scheduling
algorithms in virtualized data centers. One of the challenges of
energy-efficient scheduling algorithms, however, is the trade-off between
minimizing energy consumption and satisfying quality of service (e.g.
performance, resource availability on time for reservation requests). We
consider resource needs in the context of virtualized data centers of a private
cloud system, which provides resource leases in terms of virtual machines (VMs)
for user applications. In this paper, we propose heuristics for scheduling VMs
that address this challenge. In our performance evaluation, simulation results
show that the proposed algorithms significantly reduce total energy consumption
compared with an existing First-Come-First-Serve (FCFS) scheduling algorithm,
while meeting the same performance requirements. We also discuss the additional
energy savings obtained when migration policies are combined with these
algorithms.
Comment: 10 pages, 2 figures, Proceedings of the Fifth International
Conference on High Performance Scientific Computing, March 5-9, 2012, Hanoi,
Vietnam
SLO-aware Colocation of Data Center Tasks Based on Instantaneous Processor Requirements
In a cloud data center, a single physical machine simultaneously executes
dozens of highly heterogeneous tasks. Such colocation results in more efficient
utilization of machines, but, when tasks' requirements exceed available
resources, some of the tasks might be throttled down or preempted. We analyze
version 2.1 of the Google cluster trace that shows short-term (1 second) task
CPU usage. Contrary to the assumptions made by many theoretical studies, we
demonstrate that the empirical distributions do not follow any single
distribution. However, high percentiles of the total processor usage (summed
over at least 10 tasks) can be reasonably estimated by the Gaussian
distribution. We use this result for a probabilistic fit test, called the
Gaussian Percentile Approximation (GPA), for standard bin-packing algorithms.
To check whether a new task will fit into a machine, GPA checks whether the
resulting distribution's percentile corresponding to the requested service
level objective (SLO) is still below the machine's capacity. In our simulation
experiments, GPA resulted in colocations exceeding the machines' capacity with
a frequency similar to the requested SLO.
Comment: Author's version of a paper published in ACM SoCC'1
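The GPA fit test can be sketched with Python's statistics.NormalDist. The sketch assumes each task is summarized by the mean and variance of its CPU usage and that tasks are independent, so the Gaussian approximation of the total uses the summed mean and summed variance; the paper's exact estimators may differ. gpa_fits and first_fit_gpa are hypothetical names, and first-fit is only one of the standard bin-packing algorithms the test could plug into.

```python
import math
from statistics import NormalDist
from typing import List, Tuple

Task = Tuple[float, float]  # (mean CPU usage, variance of CPU usage)


def gpa_fits(tasks: List[Task], new_task: Task, capacity: float, slo: float) -> bool:
    """Does the SLO percentile of the approximated total usage stay within capacity?"""
    candidate = tasks + [new_task]
    mu = sum(m for m, _ in candidate)
    var = sum(v for _, v in candidate)  # assumes independent tasks
    if var == 0.0:
        return mu <= capacity  # degenerate case: usage is deterministic
    return NormalDist(mu, math.sqrt(var)).inv_cdf(slo) <= capacity


def first_fit_gpa(machines: List[List[Task]], new_task: Task, capacity: float, slo: float) -> int:
    """Place the task on the first machine that passes GPA; otherwise open a new machine."""
    for i, tasks in enumerate(machines):
        if gpa_fits(tasks, new_task, capacity, slo):
            tasks.append(new_task)
            return i
    machines.append([new_task])
    return len(machines) - 1


machine = [(0.5, 0.01)] * 10  # ten tasks already running
print(gpa_fits(machine, (0.5, 0.01), capacity=8.0, slo=0.99))  # True
print(gpa_fits(machine, (0.5, 0.01), capacity=6.0, slo=0.99))  # False (99th pct is about 6.27)
```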
Packing Sporadic Real-Time Tasks on Identical Multiprocessor Systems
In real-time systems, in addition to functional correctness, recurrent
tasks must fulfill timing constraints to ensure the correct behavior of the
system. Partitioned scheduling, in which tasks are statically assigned to
processors while ensuring that all timing constraints are met, is widely used in
real-time systems. The decision version of the problem, which is to check
whether the deadline constraints of tasks can be satisfied on a given number of
identical processors, is known to be NP-complete in the strong sense.
Several studies on this problem are based on approximations involving resource
augmentation, i.e., speeding up individual processors. This paper studies
another type of resource augmentation by allocating additional processors, a
topic that has not been explored until recently. We provide polynomial-time
algorithms and analysis, in which the approximation factors are dependent upon
the input instances. Specifically, the factors are related to the maximum ratio
of the period to the relative deadline of a task in the given task set. We also
show that these algorithms unfortunately cannot achieve a constant
approximation factor for general cases. Furthermore, we prove that the problem
does not admit any asymptotic polynomial-time approximation scheme (APTAS)
unless P = NP when the task set has constrained deadlines, i.e.,
the relative deadline of a task is no more than the period of the task.
Comment: Accepted and to appear in ISAAC 2018, Yi-Lan, Taiwan
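The abstract does not spell out the algorithms. A common baseline for partitioned scheduling is first-fit packing with a per-processor schedulability test; the sketch below swaps in the simple density condition (the sum of C/min(D, T) on a processor is at most 1), which is sufficient but not necessary for EDF on one processor and is not the paper's analysis. Task and first_fit_density are hypothetical names. Opening extra processors when a task fits nowhere is the kind of additional-processor augmentation the abstract studies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    wcet: float      # C: worst-case execution time
    deadline: float  # D: relative deadline
    period: float    # T: minimum inter-arrival time

    @property
    def density(self) -> float:
        return self.wcet / min(self.deadline, self.period)


def first_fit_density(tasks: List[Task]) -> List[List[Task]]:
    """Assign tasks, in decreasing density, to the first processor whose total
    density stays at most 1; open a new processor when none fits."""
    processors: List[List[Task]] = []
    loads: List[float] = []
    for task in sorted(tasks, key=lambda t: t.density, reverse=True):
        if task.density > 1.0:
            raise ValueError("task cannot meet its deadline even on a dedicated processor")
        for i, load in enumerate(loads):
            if load + task.density <= 1.0:
                processors[i].append(task)
                loads[i] += task.density
                break
        else:
            processors.append([task])
            loads.append(task.density)
    return processors


tasks = [Task(3, 5, 10), Task(5, 10, 10), Task(3, 10, 12), Task(2, 10, 10)]
print(len(first_fit_density(tasks)))  # 2 (densities 0.6 + 0.3 and 0.5 + 0.2)
```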
Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers
The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1, 2 and 3 processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared memory parallel processors provided that the synchronization routines are reproduced on the target system.
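For reference, the sequential form of Gaussian elimination on a block tridiagonal system (often called the block Thomas algorithm) can be sketched as follows. This assumes NumPy and uses no pivoting, so it relies on the blocks being well conditioned (for example, diagonally dominant); it does not show the paper's multiprocessor partitioning, only the elimination that such a partitioning would split up. block_thomas is a hypothetical name.

```python
import numpy as np


def block_thomas(A, B, C, d):
    """Solve a block tridiagonal system by block Gaussian elimination (no pivoting).

    A[i]: sub-diagonal block of block row i (A[0] is ignored)
    B[i]: diagonal block of block row i
    C[i]: super-diagonal block of block row i (C[-1] is ignored)
    d[i]: right-hand-side block of block row i
    """
    n = len(B)
    c_prime = [None] * n
    d_prime = [None] * n
    # Forward elimination: remove each sub-diagonal block using the row above.
    for i in range(n):
        if i == 0:
            m, rhs = B[0], d[0]
        else:
            m = B[i] - A[i] @ c_prime[i - 1]
            rhs = d[i] - A[i] @ d_prime[i - 1]
        if i < n - 1:
            c_prime[i] = np.linalg.solve(m, C[i])
        d_prime[i] = np.linalg.solve(m, rhs)
    # Back substitution from the last block row upward.
    x = [None] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] @ x[i + 1]
    return x


# Usage: compare against a dense solve on a small, diagonally dominant example.
rng = np.random.default_rng(0)
n, b = 4, 2
A = [rng.random((b, b)) for _ in range(n)]
B = [rng.random((b, b)) + 6.0 * np.eye(b) for _ in range(n)]
C = [rng.random((b, b)) for _ in range(n)]
d = [rng.random(b) for _ in range(n)]

full = np.zeros((n * b, n * b))
for i in range(n):
    rows = slice(i * b, (i + 1) * b)
    full[rows, rows] = B[i]
    if i > 0:
        full[rows, (i - 1) * b : i * b] = A[i]
    if i < n - 1:
        full[rows, (i + 1) * b : (i + 2) * b] = C[i]

x = np.concatenate(block_thomas(A, B, C, d))
print(np.allclose(full @ x, np.concatenate(d)))  # True
```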