Search CORE

34,795 research outputs found

Reclaiming the energy of a schedule: models and algorithms

Author: Aupy Guillaume
Benoit Anne
Dufossé Fanny
Robert Yves
Publication venue: 'Wiley'
Publication date: 01/04/2011
Field of study

We consider a task graph to be executed on a set of processors. We assume that the mapping is given, say by an ordered list of tasks to execute on each processor, and we aim at optimizing the energy consumption while enforcing a prescribed bound on the execution time. While it is not possible to change the allocation of a task, it is possible to change its speed. Rather than using a local approach such as backfilling, we consider the problem as a whole and study the impact of several speed variation models on its complexity. For continuous speeds, we give a closed-form formula for trees and series-parallel graphs, and we cast the problem into a geometric programming problem for general directed acyclic graphs. We show that the classical dynamic voltage and frequency scaling (DVFS) model with discrete modes leads to a NP-complete problem, even if the modes are regularly distributed (an important particular case in practice, which we analyze as the incremental model). On the contrary, the VDD-hopping model leads to a polynomial solution. Finally, we provide an approximation algorithm for the incremental model, which we extend for the general DVFS model.Comment: A two-page extended abstract of this work appeared as a short presentation in SPAA'2011, while the long version has been accepted for publication in "Concurrency and Computation: Practice and Experience

arXiv.org e-Print Archive

HAL-ENS-LYON

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Recommended from our members

Computer-aided programming for multiprocessing systems

Author: Gajski Daniel D.
Wu Min-You
Publication venue: eScholarship, University of California
Publication date: 30/06/1988
Field of study

As both the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. This report discusses parallel models of computation and tools for computer-aided programming (CAP). Program development tools are necessary since programmers are not able to develop complex parallel programs efficiently. In particular, a CAP tool, named Hypertool, is described here. It performs scheduling and handles the communication primitive insertion automatically so that many errors are eliminated. It also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs. Experiments have shown that up to a 300% performance improvement can be achieved by computer-aided programming

eScholarship - University of California

Task scheduling techniques for asymmetric multi-core systems

Author: Ayguadé Parra Eduard
Badia Sala Rosa Maria
Casas Marc
Chronaki Kallia
Labarta Mancho Jesús José
Moreto Planas Miquel
Rico Alejandro
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

As performance and energy efficiency have become the main challenges for next-generation high-performance computing, asymmetric multi-core architectures can provide solutions to tackle these issues. Parallel programming models need to be able to suit the needs of such systems and keep on increasing the application’s portability and efficiency. This paper proposes two task scheduling approaches that target asymmetric systems. These dynamic scheduling policies reduce total execution time either by detecting the longest or the critical path of the dynamic task dependency graph of the application, or by finding the earliest executor of a task. They use dynamic scheduling and information discoverable during execution, fact that makes them implementable and functional without the need of off-line profiling. In our evaluation we compare these scheduling approaches with two existing state-of the art heterogeneous schedulers and we track their improvement over a FIFO baseline scheduler. We show that the heterogeneous schedulers improve the baseline by up to 1.45 in a real 8-core asymmetric system and up to 2.1 in a simulated 32-core asymmetric chip.This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the European HiPEAC Network of Excellence. The Mont-Blanc project receives funding from the EU’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no 610402 and from the EU’s H2020 Framework Programme (H2020/2014-2020) under grant agreement no 671697. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243).Peer ReviewedPostprint (author's final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

GraphLab: A New Framework for Parallel Machine Learning

Author: Bickson Danny
Gonzalez Joseph
Guestrin Carlos
Hellerstein Joseph M.
Kyrola Aapo
Low Yucheng
Publication venue
Publication date: 01/01/2010
Field of study

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large scale real-world problems

arXiv.org e-Print Archive

CiteSeerX

Learning Scheduling Algorithms for Data Processing Clusters

Author: Abadi Martín
Addanki Ravichandra
Dai Hanjun
Finn Chelsea
Ghodsi Ali
Gog Ionel
Grandl Robert
Greensmith Evan
Hindman Benjamin
Kingma Diederik P
Mao Hongzi
Mao Hongzi
Marcus Ryan
Mirhoseini Azalia
Mirhoseini Azalia
Pinto Lerrel
Schulman John
Spark Apache
Sutton S.
Weaver Lex
Zaharia Matei
Publication venue
Publication date: 21/08/2019
Field of study

Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly-efficient policies automatically. Decima uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves the average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to 2x improvement during periods of high cluster load

arXiv.org e-Print Archive

Crossref

DSpace@MIT