
    Speed-scaling with no Preemptions

    We revisit the non-preemptive speed-scaling problem, in which a set of jobs has to be executed on a single processor or a set of parallel speed-scalable processors between their release dates and deadlines so that the energy consumption is minimized. We adopt the speed-scaling mechanism first introduced in [Yao et al., FOCS 1995], according to which the power dissipated is a convex function of the processor's speed. Intuitively, the higher the speed of a processor, the higher the energy consumption. For the single-processor case, we improve the best known approximation algorithm by providing a $(1+\epsilon)^{\alpha}\tilde{B}_{\alpha}$-approximation algorithm, where $\tilde{B}_{\alpha}$ is a generalization of the Bell number. For the multiprocessor case, we present an approximation algorithm of ratio $\tilde{B}_{\alpha}((1+\epsilon)(1+\frac{w_{\max}}{w_{\min}}))^{\alpha}$, improving the best known result by a factor of $(\frac{5}{2})^{\alpha-1}(\frac{w_{\max}}{w_{\min}})^{\alpha}$. Notice that our result holds for the fully heterogeneous environment, while the previously known result holds only in the more restricted case of parallel processors with identical power functions.
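    As a minimal illustration of the speed-scaling cost model from [Yao et al., FOCS 1995] (not the paper's approximation algorithm), the sketch below computes the energy of running a single job non-preemptively at the lowest feasible constant speed; the function name and the constant-speed simplification are assumptions for illustration only.

```python
# Sketch of the convex power model: running at speed s for t time units costs
# s**alpha * t energy. For one job filling its whole window, the slowest
# feasible constant speed minimizes energy.

def nonpreemptive_energy(work: float, release: float, deadline: float, alpha: float = 3.0) -> float:
    """Energy to run one job non-preemptively across its entire window."""
    window = deadline - release
    speed = work / window           # slowest speed that still meets the deadline
    return speed ** alpha * window  # power * time

# Example: 10 units of work in a window of length 5 -> speed 2, energy 2**3 * 5 = 40.
print(nonpreemptive_energy(work=10.0, release=0.0, deadline=5.0))
```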

    Parallel Algorithm and Dynamic Exponent for Diffusion-limited Aggregation

    A parallel algorithm for "diffusion-limited aggregation" (DLA) is described and analyzed from the perspective of computational complexity. The dynamic exponent $z$ of the algorithm is defined with respect to the probabilistic parallel random-access machine (PRAM) model of parallel computation according to $T \sim L^{z}$, where $L$ is the cluster size, $T$ is the running time, and the algorithm uses a number of processors polynomial in $L$. It is argued that $z = D - D_2/2$, where $D$ is the fractal dimension and $D_2$ is the second generalized dimension. Simulations of DLA are carried out to measure $D_2$ and to test scaling assumptions employed in the complexity analysis of the parallel algorithm. It is plausible that the parallel algorithm attains the minimum possible value of the dynamic exponent, in which case $z$ characterizes the intrinsic history dependence of DLA. Comment: 24 pages RevTeX and 2 figures. A major improvement to the algorithm and a smaller dynamic exponent in this version.
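    The dynamic exponent in the scaling form $T \sim L^{z}$ can be read off as the slope of a log-log fit of running time against cluster size. The sketch below is an illustrative estimator, not code from the paper; the data in the example are synthetic.

```python
# Estimate a dynamic exponent z from measured running times T(L), assuming T ~ L**z.
import numpy as np

def estimate_dynamic_exponent(sizes, times):
    """Fit log(T) = z*log(L) + c by least squares and return the slope z."""
    z, _intercept = np.polyfit(np.log(sizes), np.log(times), 1)
    return z

# Synthetic example with z = 1.5 (values made up for illustration).
L = np.array([100, 200, 400, 800])
T = 0.01 * L ** 1.5
print(estimate_dynamic_exponent(L, T))  # ~1.5
```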

    Energy-Efficient Multiprocessor Scheduling for Flow Time and Makespan

    We consider energy-efficient scheduling on multiprocessors, where the speed of each processor can be individually scaled, and a processor consumes power $s^{\alpha}$ when running at speed $s$, for $\alpha>1$. A scheduling algorithm needs to decide at any time both processor allocations and processor speeds for a set of parallel jobs with time-varying parallelism. The objective is to minimize the sum of the total energy consumption and a certain performance metric, which in this paper is either total flow time or makespan. For both objectives, we present instantaneous-parallelism clairvoyant (IP-clairvoyant) algorithms that are aware of the instantaneous parallelism of the jobs at any time but not their future characteristics, such as remaining parallelism and work. For total flow time plus energy, we present an $O(1)$-competitive algorithm, which significantly improves upon the best known non-clairvoyant algorithm and is the first constant competitive result on multiprocessor speed scaling for parallel jobs. In the case of makespan plus energy, which is considered for the first time in the literature, we present an $O(\ln^{1-1/\alpha}P)$-competitive algorithm, where $P$ is the total number of processors. We show that this algorithm is asymptotically optimal by providing a matching lower bound. In addition, we also study non-clairvoyant scheduling for total flow time plus energy, and present an algorithm that is $O(\ln P)$-competitive for jobs with arbitrary release times and $O(\ln^{1/\alpha}P)$-competitive for jobs with identical release times. Finally, we prove an $\Omega(\ln^{1/\alpha}P)$ lower bound on the competitive ratio of any non-clairvoyant algorithm, matching the upper bound of our algorithm for jobs with identical release times.
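    To make the objective concrete, the sketch below evaluates "total flow time plus energy" for a fixed schedule of jobs each run at a constant speed; it is a hypothetical helper under simplifying assumptions (sequential jobs, constant speeds), not the paper's algorithm.

```python
# Flow time of a job = completion time - release time.
# Energy of running work w at constant speed s = s**alpha * (w / s) = w * s**(alpha - 1).

def flow_time_plus_energy(jobs, alpha: float = 2.0) -> float:
    """jobs: list of (release, work, speed, start) tuples, each on its own processor."""
    total = 0.0
    for release, work, speed, start in jobs:
        duration = work / speed
        completion = start + duration
        flow_time = completion - release
        energy = work * speed ** (alpha - 1)
        total += flow_time + energy
    return total

# Two jobs released at time 0, each with 4 units of work run at speed 2 -> 20.0.
print(flow_time_plus_energy([(0.0, 4.0, 2.0, 0.0), (0.0, 4.0, 2.0, 0.0)]))
```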

    A Parallel Adaptive P3M code with Hierarchical Particle Reordering

    We discuss the design and implementation of HYDRA_OMP, a parallel implementation of the Smoothed Particle Hydrodynamics-Adaptive P3M (SPH-AP3M) code HYDRA. The code is designed primarily for conducting cosmological hydrodynamic simulations and is written in Fortran77+OpenMP. A number of optimizations for RISC processors and SMP-NUMA architectures have been implemented, the most important being hierarchical reordering of particles within chaining cells, which greatly improves data locality, thereby removing the cache misses typically associated with linked lists. Parallel scaling is good, with a minimum parallel scaling of 73% achieved on 32 nodes for a variety of modern SMP architectures. We give performance data in terms of the number of particle updates per second, which is a more useful performance metric than raw MFlops. A basic version of the code will be made available to the community in the near future. Comment: 34 pages, 12 figures, accepted for publication in Computer Physics Communications.
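    The core idea behind reordering particles by chaining cell is that particles in the same cell end up contiguous in memory, so neighbour sweeps become sequential reads instead of pointer-chasing through linked lists. The toy sketch below illustrates that idea in Python; it is a simplified assumption-laden illustration, not HYDRA_OMP's Fortran implementation.

```python
# Sort particles by a flattened 3D chaining-cell index so that each cell's
# particles are stored contiguously (cache-friendly traversal).
import numpy as np

def reorder_by_cell(positions: np.ndarray, cell_size: float) -> np.ndarray:
    """Return particle positions sorted by chaining-cell index."""
    cells = np.floor(positions / cell_size).astype(np.int64)        # (N, 3) integer cell coords
    n = cells.max(axis=0) + 1                                       # grid extent per axis
    flat = (cells[:, 0] * n[1] + cells[:, 1]) * n[2] + cells[:, 2]  # flatten to a 1D cell index
    order = np.argsort(flat, kind="stable")                         # group particles cell by cell
    return positions[order]

# Example: 1000 random particles in a unit box, chaining cells of side 0.1.
pos = np.random.rand(1000, 3)
print(reorder_by_cell(pos, 0.1)[:3])
```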