24,871 research outputs found
Co-Scheduling Algorithms for High-Throughput Workload Execution
This paper investigates co-scheduling algorithms for processing a set of
parallel applications. Instead of executing each application one by one, using
a maximum degree of parallelism for each of them, we aim at scheduling several
applications concurrently. We partition the original application set into a
series of packs, which are executed one by one. A pack comprises several
applications, each of them with an assigned number of processors, with the
constraint that the total number of processors assigned within a pack does not
exceed the maximum number of available processors. The objective is to
determine a partition into packs, and an assignment of processors to
applications, that minimize the sum of the execution times of the packs. We
thoroughly study the complexity of this optimization problem, and propose
several heuristics that exhibit very good performance on a variety of
workloads, whose application execution times model profiles of parallel
scientific codes. We show that co-scheduling leads to to faster workload
completion time and to faster response times on average (hence increasing
system throughput and saving energy), for significant benefits over traditional
scheduling from both the user and system perspectives
3E: Energy-Efficient Elastic Scheduling for Independent Tasks in Heterogeneous Computing Systems
Reducing energy consumption is a major design constraint for modern heterogeneous computing systems to minimize electricity cost, improve system reliability and protect environment. Conventional energy-efficient scheduling strategies developed on these systems do not sufficiently exploit the system elasticity and adaptability for maximum energy savings, and do not simultaneously take account of user expected finish time. In this paper, we develop a novel scheduling strategy named energy-efficient elastic (3E) scheduling for aperiodic, independent and non-real-time tasks with user expected finish times on DVFS-enabled heterogeneous computing systems. The 3E strategy adjusts processors’ supply voltages and frequencies according to the system workload, and makes trade-offs between energy consumption and user expected finish times. Compared with other energy-efficient strategies, 3E significantly improves the scheduling quality and effectively enhances the system elasticity
Reclaiming the energy of a schedule: models and algorithms
We consider a task graph to be executed on a set of processors. We assume
that the mapping is given, say by an ordered list of tasks to execute on each
processor, and we aim at optimizing the energy consumption while enforcing a
prescribed bound on the execution time. While it is not possible to change the
allocation of a task, it is possible to change its speed. Rather than using a
local approach such as backfilling, we consider the problem as a whole and
study the impact of several speed variation models on its complexity. For
continuous speeds, we give a closed-form formula for trees and series-parallel
graphs, and we cast the problem into a geometric programming problem for
general directed acyclic graphs. We show that the classical dynamic voltage and
frequency scaling (DVFS) model with discrete modes leads to a NP-complete
problem, even if the modes are regularly distributed (an important particular
case in practice, which we analyze as the incremental model). On the contrary,
the VDD-hopping model leads to a polynomial solution. Finally, we provide an
approximation algorithm for the incremental model, which we extend for the
general DVFS model.Comment: A two-page extended abstract of this work appeared as a short
presentation in SPAA'2011, while the long version has been accepted for
publication in "Concurrency and Computation: Practice and Experience
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of just mobile devices now
reaching nearly equal to the population of earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need of power management and provide a classification of the techniques on
several important parameters to highlight their similarities and differences.
This paper is intended to help the researchers and application-developers in
gaining insights into the working of power management techniques and designing
even more efficient high-performance embedded systems of tomorrow
Task scheduling techniques for asymmetric multi-core systems
As performance and energy efficiency have become the main challenges for next-generation high-performance computing, asymmetric multi-core architectures can provide solutions to tackle these issues. Parallel programming models need to be able to suit the needs of such systems and keep on increasing the application’s portability and efficiency. This paper proposes two task scheduling approaches that target asymmetric systems. These dynamic scheduling policies reduce total execution time either by detecting the longest or the critical path of the dynamic task dependency graph of the application, or by finding the earliest executor of a task. They use dynamic scheduling and information discoverable during execution, fact that makes them implementable and functional without the need of off-line profiling. In our evaluation we compare these scheduling approaches with two existing state-of the art heterogeneous schedulers and we track their improvement over a FIFO baseline scheduler. We show that the heterogeneous schedulers improve the baseline by up to 1.45 in a real 8-core asymmetric system and up to 2.1 in a simulated 32-core asymmetric chip.This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by Generalitat de
Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the
European HiPEAC Network of Excellence. The Mont-Blanc project receives funding from the EU’s Seventh Framework Programme (FP7/2007-2013) under grant agreement
no 610402 and from the EU’s H2020 Framework Programme (H2020/2014-2020) under grant agreement no 671697. M.
MoretĂł has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas
is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie
Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243).Peer ReviewedPostprint (author's final draft
A Framework to Analyze the Performance of Load Balancing Schemes for Ensembles of Stochastic Simulations
Ensembles of simulations are employed to estimate the statistics of possible future states of a system, and are widely used in important applications such as climate change and biological modeling. Ensembles of runs can naturally be executed in parallel. However, when the CPU times of individual simulations vary considerably, a simple strategy of assigning an equal number of tasks per processor can lead to serious work imbalances and low parallel efficiency. This paper presents a new probabilistic framework to analyze the performance of dynamic load balancing algorithms for ensembles of simulations where many tasks are mapped onto each processor, and where the individual compute times vary considerably among tasks. Four load balancing strategies are discussed: most-dividing, all-redistribution, random-polling, and neighbor-redistribution. Simulation results with a stochastic budding yeast cell cycle model is consistent with the theoretical analysis. It is especially significant that there is a provable global decrease in load imbalance for the local rebalancing algorithms due to scalability concerns for the global rebalancing algorithms. The overall simulation time is reduced by up to 25%, and the total processor idle time by 85%
- …