A Fully Polynomial-Time Approximation Scheme for Speed Scaling with Sleep State
We study classical deadline-based preemptive scheduling of tasks in a
computing environment equipped with both dynamic speed scaling and sleep state
capabilities: Each task is specified by a release time, a deadline and a
processing volume, and has to be scheduled on a single, speed-scalable
processor that is supplied with a sleep state. In the sleep state, the
processor consumes no energy, but a constant wake-up cost is required to
transition back to the active state. In contrast to speed scaling alone, the
addition of a sleep state sometimes makes it beneficial to accelerate the
processing of tasks so that the processor can remain in the sleep state for
longer periods and incur further energy savings. The goal is to output
a feasible schedule that minimizes the energy consumption. Since the
introduction of the problem by Irani et al. [16], its exact computational
complexity has been repeatedly posed as an open question (see e.g. [2,8,15]).
The currently best known upper and lower bounds are a 4/3-approximation
algorithm and NP-hardness due to [2] and [2,17], respectively. We close the
aforementioned gap between the upper and lower bound on the computational
complexity of speed scaling with sleep state by presenting a fully
polynomial-time approximation scheme for the problem. The scheme is based on a
transformation to a non-preemptive variant of the problem, and a discretization
that exploits a carefully defined lexicographical ordering among schedules.
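The trade-off the abstract describes (running faster than necessary so the processor can sleep longer) can be made concrete with a minimal numeric sketch. All parameters below are illustrative assumptions, not from the paper: a cube-law power function with a base power term, zero power in the sleep state, and a fixed wake-up energy.

```python
# Illustrative energy model for speed scaling with a sleep state.
# Assumptions (not from the paper): power at speed s is s**ALPHA + BETA
# (dynamic plus base power), the sleep state draws zero power, and waking
# up costs a fixed WAKEUP energy. A single task of volume V must be
# finished within a window of length T.

ALPHA = 3.0    # cube-law exponent for dynamic power (assumption)
BETA = 0.5     # base power while the processor is awake (assumption)
WAKEUP = 1.0   # energy cost of one sleep-to-active transition (assumption)

def energy(speed, volume, window, sleep):
    """Energy of running `volume` work at constant `speed` inside `window`.

    If `sleep` is True, the processor sleeps through the remaining idle
    time and pays the wake-up cost once; otherwise it idles awake,
    paying the base power BETA for the whole idle interval.
    """
    busy = volume / speed
    assert busy <= window, "speed too low to meet the deadline"
    active_energy = busy * (speed ** ALPHA + BETA)
    idle = window - busy
    if sleep:
        return active_energy + WAKEUP
    return active_energy + idle * BETA

V, T = 4.0, 20.0
slowest = V / T                       # just meets the deadline, never idle
e_slow = energy(slowest, V, T, sleep=False)
# Running faster than necessary frees up time for the sleep state:
e_fast = min(energy(s / 10, V, T, sleep=True) for s in range(2, 41))
```

With these numbers, accelerating and sleeping beats running at the slowest feasible speed (`e_fast < e_slow`), which is exactly the effect that makes the problem harder than speed scaling alone.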
Energy-Efficient Scheduling for Homogeneous Multiprocessor Systems
We present a number of novel algorithms, based on mathematical optimization
formulations, in order to solve a homogeneous multiprocessor scheduling
problem, while minimizing the total energy consumption. In particular, for a
system with a discrete speed set, we propose solving a tractable linear
program. Our formulations are based on a fluid model and a global scheduling
scheme, i.e. tasks are allowed to migrate between processors. The new methods
are compared with three global energy/feasibility optimal workload allocation
formulations. Simulation results illustrate that our methods achieve both
feasibility and energy optimality and outperform existing methods for
constrained deadline tasksets. Specifically, for some simulated tasksets, our
algorithms achieve up to an 80% energy saving compared to an algorithm without
a frequency scaling scheme and up to a 70% saving compared to a constant
frequency scaling scheme. Another benefit is that our
algorithms can solve the scheduling problem in one step instead of using a
recursive scheme. Moreover, our formulations can solve a more general class of
scheduling problems, i.e. any periodic real-time taskset with arbitrary
deadlines. Lastly, our algorithms can be applied to both online and offline
scheduling schemes.
Stochastic Analysis of Power-Aware Scheduling
Energy consumption in a computer system can be reduced by dynamic speed scaling, which adapts the processing speed to the current load. This paper studies the optimal way to adjust speed to balance mean response time and mean energy consumption, when jobs arrive as a Poisson process and processor sharing scheduling is used. Both bounds and asymptotics for the optimal speeds are provided. Interestingly, a simple scheme that halts when the system is idle and uses a static rate while the system is busy provides nearly the same performance as the optimal dynamic speed scaling. However, dynamic speed scaling which allocates a higher speed when more jobs are present significantly improves robustness to bursty traffic and mis-estimation of workload parameters.
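The "static rate while busy, halt when idle" scheme can be sketched numerically under a standard M/M/1 processor-sharing model. Everything below is an illustrative assumption (cube-law power, unit mean job size, a linear trade-off weight), not the paper's analysis.

```python
# Choosing the static "busy" speed in a halt-when-idle scheme, assuming
# M/M/1 processor sharing: jobs arrive at rate LAM with mean size 1, and
# a processor at speed s serves at rate s, drawing power s**ALPHA while
# busy and zero while halted. All parameters are illustrative.

LAM = 0.5     # Poisson arrival rate (assumption)
ALPHA = 3.0   # cube-law power exponent (assumption)
BETA = 1.0    # weight trading energy against response time (assumption)

def cost(s):
    """Mean response time plus weighted mean energy per job."""
    if s <= LAM:
        return float("inf")             # unstable: the queue grows forever
    mean_response = 1.0 / (s - LAM)     # M/M/1-PS mean response time
    # The busy fraction is LAM/s, so energy per unit time is
    # (LAM/s) * s**ALPHA, i.e. energy per job is s**(ALPHA-1).
    energy_per_job = s ** (ALPHA - 1.0)
    return mean_response + BETA * energy_per_job

# Grid search for the best static speed.
grid = [0.01 * k for k in range(60, 500)]
best_speed = min(grid, key=cost)
```

The optimizer lands strictly above the bare stability threshold `LAM`: the static speed must over-provision somewhat to keep the mean response time in check, which is the balance the paper quantifies.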
Hierarchical Parallelisation of Functional Renormalisation Group Calculations -- hp-fRG
The functional renormalisation group (fRG) has evolved into a versatile tool
in condensed matter theory for studying important aspects of correlated
electron systems. Practical applications of the method often involve a high
numerical effort, motivating the question in how far High Performance Computing
(HPC) can leverage the approach. In this work we report on a multi-level
parallelisation of the underlying computational machinery and show that this
can speed up the code by several orders of magnitude. This in turn can extend
the applicability of the method to otherwise inaccessible cases. We exploit
three levels of parallelisation: Distributed computing by means of Message
Passing (MPI), shared-memory computing using OpenMP, and vectorisation by means
of SIMD units (single instruction, multiple data). Results are provided for two
distinct HPC platforms, namely the IBM-based
BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster.
We discuss how certain issues and obstacles were overcome in the course of
adapting the code. Most importantly, we conclude that this vast improvement can
actually be accomplished by introducing only moderate changes to the code, such
that this strategy may serve as a guideline for other researchers to likewise
improve the efficiency of their codes.
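The three-level decomposition described above can be sketched structurally in a few lines of standard-library Python: chunks of the outer frequency loop stand in for MPI ranks, and a thread pool stands in for OpenMP threads. This is an analogy for illustration only, not the paper's C++/fRG code, and the per-point function is a placeholder.

```python
# Structural sketch of multi-level parallelisation: partition the outer
# loop into chunks ("MPI ranks"), sum each chunk on a thread pool
# ("OpenMP threads"), then combine the partial results ("Allreduce").
from concurrent.futures import ThreadPoolExecutor

def rhs(i):
    # Placeholder for the expensive per-frequency flow-equation term.
    return 1.0 / (1 + i * i)

def worker(chunk):
    # Shared-memory level: each thread sums its slice of the grid.
    return sum(rhs(i) for i in chunk)

N = 10_000
ranks = 4                                # pretend MPI ranks (assumption)
chunks = [range(r, N, ranks) for r in range(ranks)]

with ThreadPoolExecutor(max_workers=ranks) as pool:
    partial = list(pool.map(worker, chunks))

total = sum(partial)                     # reduction across "ranks"
serial = sum(rhs(i) for i in range(N))   # reference serial result
```

The key property, as in the paper, is that the decomposition leaves the result unchanged while each level can be scaled independently; the third level (SIMD vectorisation) has no stdlib analogue and is omitted here.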