Search CORE

7,027 research outputs found

3E: Energy-Efficient Elastic Scheduling for Independent Tasks in Heterogeneous Computing Systems

Author: Ge Rong
He Chuan
Sun Jinguang
Zhu Xiaomin
Publication venue: e-Publications@Marquette
Publication date: 01/02/2013
Field of study

Reducing energy consumption is a major design constraint for modern heterogeneous computing systems to minimize electricity cost, improve system reliability and protect environment. Conventional energy-efficient scheduling strategies developed on these systems do not sufficiently exploit the system elasticity and adaptability for maximum energy savings, and do not simultaneously take account of user expected finish time. In this paper, we develop a novel scheduling strategy named energy-efficient elastic (3E) scheduling for aperiodic, independent and non-real-time tasks with user expected finish times on DVFS-enabled heterogeneous computing systems. The 3E strategy adjusts processors’ supply voltages and frequencies according to the system workload, and makes trade-offs between energy consumption and user expected finish times. Compared with other energy-efficient strategies, 3E significantly improves the scheduling quality and effectively enhances the system elasticity

epublications@Marquette

Energy-Efficient Multiprocessor Scheduling for Flow Time and Makespan

Author: Agrawal
Albers
Albers
Andrew
Andrew
Bansal
Bansal
Becchetti
Bender
Blumofe
Borodin
Boyd
Brecht
Brecht
Brooks
Chan
Chan
Chan
Chan
Chen
Deng
Edmonds
Edmonds
Edmonds
Fox
Greiner
Grunwald
Hardy
He
Herbert
Hongyang Sun
Im
Irani
Jaffe
Kalyanasundaram
Kim
Kim
Lam
Lam
Mudge
Pruhs
Pruhs
Robert
Rui Fan
Shmoys
Sun
Sun
Sun
Trick
Weiser
Wen-Jing Hsu
Yao
Yuxiong He
Zhang
Zhao
Publication venue: 'Elsevier BV'
Publication date: 19/01/2014
Field of study

We consider energy-efficient scheduling on multiprocessors, where the speed of each processor can be individually scaled, and a processor consumes power

s^{\alpha}

when running at speed

s

, for

\alpha>1

. A scheduling algorithm needs to decide at any time both processor allocations and processor speeds for a set of parallel jobs with time-varying parallelism. The objective is to minimize the sum of the total energy consumption and certain performance metric, which in this paper includes total flow time and makespan. For both objectives, we present instantaneous parallelism clairvoyant (IP-clairvoyant) algorithms that are aware of the instantaneous parallelism of the jobs at any time but not their future characteristics, such as remaining parallelism and work. For total flow time plus energy, we present an

O(1)

-competitive algorithm, which significantly improves upon the best known non-clairvoyant algorithm and is the first constant competitive result on multiprocessor speed scaling for parallel jobs. In the case of makespan plus energy, which is considered for the first time in the literature, we present an

O(\ln^{1-1/\alpha}P)

-competitive algorithm, where

P

is the total number of processors. We show that this algorithm is asymptotically optimal by providing a matching lower bound. In addition, we also study non-clairvoyant scheduling for total flow time plus energy, and present an algorithm that achieves

O(\ln P)

-competitive for jobs with arbitrary release time and

O(\ln^{1/\alpha}P)

-competitive for jobs with identical release time. Finally, we prove an

\Omega(\ln^{1/\alpha}P)

lower bound on the competitive ratio of any non-clairvoyant algorithm, matching the upper bound of our algorithm for jobs with identical release time

arXiv.org e-Print Archive

Crossref

Climbing depth-bounded adjacent discrepancy search for solving hybrid flow shop scheduling problems with multiprocessor tasks

Author: A. Ben Hmida
A. Ben Hmida
A. Jouglet
A. Sprecher
C. Oğuz
C. Oğuz
F.S. Şerifoğlu
F.S. Şerifoğlu
G. Brooks
J. Chen
J.E. Kelley Jr
M. Fischetti
S. Bertel
Z. Kiziltan
Publication venue
Publication date: 01/01/2011
Field of study

This paper considers multiprocessor task scheduling in a multistage hybrid flow-shop environment. The problem even in its simplest form is NP-hard in the strong sense. The great deal of interest for this problem, besides its theoretical complexity, is animated by needs of various manufacturing and computing systems. We propose a new approach based on limited discrepancy search to solve the problem. Our method is tested with reference to a proposed lower bound as well as the best-known solutions in literature. Computational results show that the developed approach is efficient in particular for large-size problems

arXiv.org e-Print Archive

CiteSeerX

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

HAL-INSA Toulouse

Optimization of Discrete-parameter Multiprocessor Systems using a Novel Ergodic Interpolation Technique

Author: Desai Madhav P.
Karanjkar Neha V.
Publication venue
Publication date: 14/07/2015
Field of study

Modern multi-core systems have a large number of design parameters, most of which are discrete-valued, and this number is likely to keep increasing as chip complexity rises. Further, the accurate evaluation of a potential design choice is computationally expensive because it requires detailed cycle-accurate system simulation. If the discrete parameter space can be embedded into a larger continuous parameter space, then continuous space techniques can, in principle, be applied to the system optimization problem. Such continuous space techniques often scale well with the number of parameters. We propose a novel technique for embedding the discrete parameter space into an extended continuous space so that continuous space techniques can be applied to the embedded problem using cycle accurate simulation for evaluating the objective function. This embedding is implemented using simulation-based ergodic interpolation, which, unlike spatial interpolation, produces the interpolated value within a single simulation run irrespective of the number of parameters. We have implemented this interpolation scheme in a cycle-based system simulator. In a characterization study, we observe that the interpolated performance curves are continuous, piece-wise smooth, and have low statistical error. We use the ergodic interpolation-based approach to solve a large multi-core design optimization problem with 31 design parameters. Our results indicate that continuous space optimization using ergodic interpolation-based embedding can be a viable approach for large multi-core design optimization problems.Comment: A short version of this paper will be published in the proceedings of IEEE MASCOTS 2015 conferenc

arXiv.org e-Print Archive

CiteSeerX

Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision

Author: Anderson
Block
Dekker
Felix Höfling
Flenner
Frenkel
Götze
Hansen
Harvey
Knuth
Knuth
Kob
Kob
Kob
Lippert
Liu
Mosayebi
Peter H. Colberg
Plimpton
Preis
Rapaport
Sagan
Stone
van Meel
Voelz
Weeks
Xu
Yang
Zagha
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011
Field of study

Modern graphics processing units (GPUs) provide impressive computing resources, which can be accessed conveniently through the CUDA programming interface. We describe how GPUs can be used to considerably speed up molecular dynamics (MD) simulations for system sizes ranging up to about 1 million particles. Particular emphasis is put on the numerical long-time stability in terms of energy and momentum conservation, and caveats on limited floating-point precision are issued. Strict energy conservation over 10^8 MD steps is obtained by double-single emulation of the floating-point arithmetic in accuracy-critical parts of the algorithm. For the slow dynamics of a supercooled binary Lennard-Jones mixture, we demonstrate that the use of single-floating point precision may result in quantitatively and even physically wrong results. For simulations of a Lennard-Jones fluid, the described implementation shows speedup factors of up to 80 compared to a serial implementation for the CPU, and a single GPU was found to compare with a parallelised MD simulation using 64 distributed cores.Comment: 12 pages, 7 figures, to appear in Comp. Phys. Comm., HALMD package licensed under the GPL, see http://research.colberg.org/projects/halm

arXiv.org e-Print Archive

Institute of Transport Research:Publications

Crossref

Cycle-accurate evaluation of reconfigurable photonic networks-on-chip

Author: Artundo Iñigo
Debaes Christof
Heirman Wim
Thienpont Hugo
Van Campenhout Jan
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/2010
Field of study

There is little doubt that the most important limiting factors of the performance of next-generation Chip Multiprocessors (CMPs) will be the power efficiency and the available communication speed between cores. Photonic Networks-on-Chip (NoCs) have been suggested as a viable route to relieve the off- and on-chip interconnection bottleneck. Low-loss integrated optical waveguides can transport very high-speed data signals over longer distances as compared to on-chip electrical signaling. In addition, with the development of silicon microrings, photonic switches can be integrated to route signals in a data-transparent way. Although several photonic NoC proposals exist, their use is often limited to the communication of large data messages due to a relatively long set-up time of the photonic channels. In this work, we evaluate a reconfigurable photonic NoC in which the topology is adapted automatically (on a microsecond scale) to the evolving traffic situation by use of silicon microrings. To evaluate this system's performance, the proposed architecture has been implemented in a detailed full-system cycle-accurate simulator which is capable of generating realistic workloads and traffic patterns. In addition, a model was developed to estimate the power consumption of the full interconnection network which was compared with other photonic and electrical NoC solutions. We find that our proposed network architecture significantly lowers the average memory access latency (35% reduction) while only generating a modest increase in power consumption (20%), compared to a conventional concentrated mesh electrical signaling approach. When comparing our solution to high-speed circuit-switched photonic NoCs, long photonic channel set-up times can be tolerated which makes our approach directly applicable to current shared-memory CMPs

Crossref

Ghent University Academic Bibliography