Proceedings Work-In-Progress Session of the 20th Euromicro Conference on Real-Time Systems by Caccamo, Marco
 
 
 
Proceedings 
Work-In-Progress Session 
of the 20th Euromicro Conference on 
Real-Time Systems 
 
 
 
 
 
July 2 – 4, 2008 
Prague, Czech Republic 
 
 
 
Organized by the 
Euromicro Technical Committee on Real-Time Systems 
 
 
Edited by Marco Caccamo 
 
 
 
© Copyright 2008 by the authors 
 
Message from the WIP Chair 
 
 
Dear Colleagues: 
 
Welcome to Prague and to the Work-In-Progress (WIP) session of the 20th Euromicro Conference on Real-Time Systems (ECRTS’08). I am pleased to present 17 excellent WIP papers that describe innovative research contributions in the broad field of real-time and embedded systems. The 17 accepted papers were selected from 23 submissions. These proceedings are also published as a Technical Report of the University of Illinois at Urbana-Champaign, Department of Computer Science (UIUCDCS-R-2008-2972).
 
The purpose of the ECRTS WIP session is to provide researchers in academia and industry with an opportunity to discuss their research ideas and to gather feedback from the real-time community at large. Special thanks go to the General Chair, Zdenek Hanzalek, and to the Real-Time Technical Committee Chair, Gerhard Fohler. Special thanks also go to the Work-In-Progress Program Committee members, Xue Liu, Chang-Gun Lee, Lucia Lo Bello, Sebastian Fischmeister, Enrico Bini, Sathish Gopalakrishnan, Shelby Funk, and Nathan W. Fisher, for their hard work in reviewing papers.
 
 
 
 
Marco Caccamo 
Work-in-Progress Chair 
20th Euromicro Conference on Real-Time Systems (ECRTS’08) 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Table of Contents 
 
Towards exploiting the preservation strategy of sporadic servers
R.J. Bril, P.J.L. Cuijpers

Utilizing Solid State Disk for Aggressive Processor Power Saving Techniques in Real-Time Applications
S.W. Chung, J.S. Lee, D.Y. Kim

Dynamic Configuration of Web Server Clusters with QoS Control
L. Bertini, J.C.B. Leite, D. Mossé

A Service-Oriented Programming Model for Real Time WSANS
E. Cañete, M. Díaz, L. Llopis, B. Rubio

Relaxing Event Densities by Lower Bounds on Event Streams
S. Kollmann, K. Albers, F. Slomka

Towards a Practical WCET Analysis Approach Based on Testing
T. Lundqvist, P. Sandin

CAMA: Cache-Aware Memory Allocation for WCET Analysis
J. Herter, J. Reineke, R. Wilhelm

On the complexity of optimal priority assignment for periodic tasks upon identical processors
L. Cucu

A Unified HW/SW Operating System for Partially Runtime Reconfigurable FPGA based Computer Systems
Q. Deng, Y. Zhang, N. Guan, Z. Gu

Energy-Aware Task Partitioning and Processing Unit Allocation for Periodic Real-Time Tasks on Systems with Heterogeneous Processing Units
J. Chen, A. Schranzhofer, L. Thiele

Using Fixed Priority Scheduling with Deferred Preemption to Exploit Fluctuating Network Bandwidth
M. Holenderski, R.J. Bril, J.J. Lukkien

Estimation of Self-Healing Timing Characteristics for Real-Time Systems under Transient Faults
S. Frenkel

Proportional Cache-fair Scheduling for Multi-core Systems
A. Suksompong, D. Isovic

Competitive Reward-Based Scheduling for Real-Time Tasks
N. Fisher, D. Grosu

Real-Time Triangulation Based on Measurements from Mobile ADS-B Aircraft
D. Uhlig, N. Kiyavash, N. Neogi

Towards Server-based Switched Ethernet for Real-Time Communications
R. Marau, L. Almeida, P. Pedreiras, T. Nolte

ITEM - Implementation of Integrated TDMA and E-ASAP Module
I. Singh, J. Trdlicka, Z. Hanzalek
 
Towards exploiting the preservation strategy of sporadic servers
Reinder J. Bril and Pieter J.L. Cuijpers
Technische Universiteit Eindhoven (TU/e), Department of Mathematics and Computer Science,
Den Dolech 2, 5600 AZ Eindhoven, The Netherlands
r.j.bril@tue.nl, p.j.l.cuijpers@tue.nl
Abstract
Worst-case response time analysis of hard real-time tasks under hierarchical fixed priority pre-emptive scheduling has been addressed in a number of papers. By means of an example, we show that the existing analysis can be improved when a sporadic server is applied at highest priority and that server is used exclusively for hard real-time tasks. Improving the analysis is not straightforward, however, because the worst-case response time of a task is not necessarily attained by the first job released at a critical instant. Moreover, our example illustrates that the provision of the server's capacity may be fragmented. The paper includes a brief investigation of best-case response times and response jitter for the example.
1. Introduction
Today, fixed-priority pre-emptive scheduling (FPPS) is a de facto standard in industry for scheduling systems with real-time constraints. A major shortcoming of FPPS, however, is that temporary or permanent faults occurring in one application can hamper the execution of other applications. To resolve this shortcoming, the notion of resource reservation [8] has been proposed. Resource reservation provides isolation between applications, effectively protecting an application against other, malfunctioning applications.
In a basic setting of a real-time system, we consider a set of independent applications, where each application consists of a set of periodically released, hard real-time tasks that are executed on a shared resource. We assume two-level hierarchical scheduling, where a global scheduler determines which application should be provided the resource and a local scheduler determines which of the chosen application's tasks should execute. Although each application could have a dedicated scheduler, we assume FPPS for every application. For temporal protection, each application is associated with a dedicated reservation. We assume a periodic resource model [11] for reservations. Conceivable implementations include FPPS for global scheduling using a specific type of server, such as the periodic server [5], the deferrable server [13], or the sporadic server [12].
Worst-case response time analysis of real-time tasks under hierarchical FPPS using deferrable servers and sporadic servers to implement reservations has been addressed in [1, 5, 6, 10], where the analysis presented in [5] improves on the earlier work. In [2, 4], we have shown that the analysis in [5] can be improved for a deferrable server when that server is used exclusively for hard real-time tasks. Essentially, the absence of soft real-time tasks allows the preservation strategy of the deferrable server to be exploited. In this paper, we show that the analysis in [5] can also be improved for a sporadic server in the absence of soft real-time tasks. Improving the existing analysis is not straightforward, however, because the worst-case response time of a task is not necessarily attained by the first job released at a critical instant. Moreover, the provision of the server's capacity may be fragmented, potentially giving rise to high context-switch costs. For illustration purposes, we consider a specific class of subsystems S and an example subsystem S ∈ S. The paper includes a brief investigation of best-case response times and response jitter.
This paper is organized as follows. In Section 2, we briefly recapitulate existing analytical results for our class of subsystems S and introduce our example subsystem S ∈ S, which clearly illustrates the potential for improvement. We investigate response times and response jitter for our example in Section 3. In Section 4, we discuss the differences between our new results and the existing approaches. We conclude the paper in Section 5.
2. A recapitulation of existing analysis
In this section, we briefly recapitulate existing analysis. We start with a description of a scheduling model and then present our example subsystem S. Next, we recapitulate the analysis for a periodic resource model [11], a periodic server [5], and a deferrable server [2], which we illustrate by means of S. We conclude this section with an overview.
2.1. A scheduling model
We assume FPPS for global scheduling, and consider a class of subsystems S consisting of an application with a single, periodic hard real-time task τ and an associated server σ at highest priority. The server σ is characterized by a replenishment period Tσ and a capacity Cσ, where 0 < Cσ ≤ Tσ. Without loss of generality, we assume that σ is replenished for the first time at time ϕσ = 0. The task τ is characterized by a period Tτ, a computation time Cτ, and a relative deadline Dτ, where 0 < Cτ ≤ Dτ ≤ Tτ. We assume that τ is released for the first time at time ϕτ ≥ ϕσ, i.e. at or after the first replenishment of σ. The worst-case response time WRτ of the task τ is the longest possible time from its arrival to its completion. The utilization of τ is given by Uτ = Cτ/Tτ and the utilization of σ by Uσ = Cσ/Tσ. A necessary schedulability condition for S is given by [4]

$$U^\tau \le U^\sigma \le 1. \qquad (1)$$
2.2. An example subsystem
For illustration purposes, we use an example subsystem S ∈ S with characteristics as described in Table 1. Note that τ is an unbound task [5], because its period Tτ is not an integral multiple of the period Tσ of the server. In this section, we are interested in the minimum capacity Cσmin for the various types of servers, where Cσmin = min{Cσ | WRτ ≤ Dτ}. Given (1), and because Uτ = Cτ/Tτ = 2/5, we have Cσmin ≥ Uτ · Tσ = 1.2.

      T = D   C
σ     3       Cσ
τ     5       2

Table 1. Characteristics of subsystem S.
2.3. Analysis for periodic resource model
Based on [11], we merely postulate the following lemma. Without further elaboration, we mention that similar lemmas can be postulated for the analysis of S based on a deferrable server in [10] and on the abstract server model in [6] (and therefore also on the sporadic and deferrable server).
Lemma 1 Assuming a periodic resource model for S, the worst-case response time WRτ of task τ is given by

$$WR^\tau = C^\tau + \left( \left\lceil \frac{C^\tau}{C^\sigma} \right\rceil + 1 \right) (T^\sigma - C^\sigma). \qquad (2)$$

Given (2), we derive for our example S that the minimum capacity for a periodic resource model is given by Cσmin = 2. For this capacity, we find WRτ = 4.
2.4. Analysis for a periodic server
Strictly speaking, our class of subsystems S does not satisfy the model described in [5], because that article assumes that every set of tasks associated with a server contains at least one soft real-time task. Fortunately, a periodic server provides its resources irrespective of demand. As a result, the soft real-time tasks of a task set do not hamper the execution of the hard real-time tasks with which they share a periodic server. The analysis presented in [5] therefore applies equally well to the class S in general and to the example S in particular. For an unbound task, we derive from [5] that WRτ is given by

$$WR^\tau = C^\tau + \left\lceil \frac{C^\tau}{C^\sigma} \right\rceil (T^\sigma - C^\sigma). \qquad (3)$$

Without further elaboration, we mention that (3) also holds for the analysis of S based on a deferrable server in [1] and on a sporadic server in [1, 10]. Given (3), we derive that Cσmin = 1.5, giving rise to WRτ = 5.
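The minimum capacities derived from (2) and (3) follow from a short case analysis, sketched below under the assumption that only those two formulas are used (the function names are illustrative and do not come from [5] or [11]). For each candidate value n of ⌈Cτ/Cσ⌉, the capacity must satisfy both Cσ ≥ Cτ/n and Cτ + (n + k)(Tσ − Cσ) ≤ Dτ, where k = 1 for (2) and k = 0 for (3); the smallest such bound over n gives Cσmin. Exact rational arithmetic avoids rounding at the boundary.

```python
from fractions import Fraction
from math import ceil


def wr_periodic_resource(c_task, t_srv, c_srv):
    """Eq. (2): WR of tau under the periodic resource model."""
    return c_task + (ceil(c_task / c_srv) + 1) * (t_srv - c_srv)


def wr_periodic_server(c_task, t_srv, c_srv):
    """Eq. (3): WR of an unbound task tau under a highest-priority periodic server."""
    return c_task + ceil(c_task / c_srv) * (t_srv - c_srv)


def min_capacity(c_task, d_task, t_srv, extra_gaps, n_max=1000):
    """Smallest c_srv in (0, t_srv] with
    c_task + (ceil(c_task / c_srv) + extra_gaps) * (t_srv - c_srv) <= d_task,
    searching candidate ceilings n = 1..n_max.

    extra_gaps = 1 corresponds to eq. (2), extra_gaps = 0 to eq. (3).
    For a ceiling n the capacity must satisfy c_srv >= c_task / n and
    c_srv >= t_srv - (d_task - c_task) / (n + extra_gaps).
    """
    slack = d_task - c_task
    best = None
    for n in range(1, n_max + 1):
        candidate = max(c_task / n, t_srv - slack / (n + extra_gaps))
        if candidate <= t_srv and (best is None or candidate < best):
            best = candidate
    return best


# Example subsystem S from Table 1: T_sigma = 3, T_tau = D_tau = 5, C_tau = 2.
C_TAU, D_TAU, T_SIGMA = Fraction(2), Fraction(5), Fraction(3)

c_prm = min_capacity(C_TAU, D_TAU, T_SIGMA, extra_gaps=1)
c_ps = min_capacity(C_TAU, D_TAU, T_SIGMA, extra_gaps=0)
print(c_prm, wr_periodic_resource(C_TAU, T_SIGMA, c_prm))  # 2 4
print(c_ps, wr_periodic_server(C_TAU, T_SIGMA, c_ps))      # 3/2 5
```

Both values agree with the capacities and response times stated in Sections 2.3 and 2.4.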
2.5. Analysis for a deferrable server
The following theorem for S is proven in [4].
Theorem 1 Consider a highest-priority deferrable server σ with period Tσ and capacity Cσ. Furthermore, assume that the server is associated with a periodic task τ with period Tτ, worst-case computation time Cτ, and deadline Dτ = Tτ, where the first release of τ takes place at or after the first replenishment of σ. The deadline Dτ is met when the respective utilizations satisfy the following inequality

$$U^\tau \le U^\sigma \le 1. \qquad (4)$$

Note that (4) is a necessary and sufficient (i.e. exact) schedulability condition for both the task and the server.
According to Theorem 1, S is schedulable using a deferrable server for a capacity Cσmin = Uτ · Tσ = 1.2. As illustrated in [2], the worst-case response time WRτ of task τ for Cσ = 1.2 is equal to 4.4.
2.6. Overview
Table 2 gives an overview of the minimum capacities Cσmin and minimum server utilizations Uσmin that guarantee schedulability of S for existing analytical approaches and different types of servers. The table includes the worst-case response time WRτ of τ for these approaches. The analysis for a sporadic server is the topic of the next section.
3. Analysis for a sporadic server
We will now explore the example in more detail by considering the worst-case response time, best-case response time, and response jitter of task τ of S for a sporadic server with a capacity Cσ = 1.2.

approach (including server or model)   Cσmin   Uσmin   WRτ
periodic resource model [11]           2.0     2/3     4.0
abstract server model [6]              2.0     2/3     4.0
deferrable server [6, 10]              2.0     2/3     4.0
sporadic server [6]                    2.0     2/3     4.0
----------------------------------------------------------
periodic server [5]                    1.5     1/2     5.0
deferrable server [1]                  1.5     1/2     5.0
sporadic server [1, 10]                1.5     1/2     5.0
----------------------------------------------------------
deferrable server [2, 4]               1.2     2/5     4.4
sporadic server (this paper)           1.2     2/5     4.4
Table 2. A comparison of approaches for S.

[Timeline, 0 to 20: executions of τ and remaining capacity of σ (initially 1.2), with the start-up phase and lcm(Tσ, Tτ) marked; response times of successive jobs 3.8, 4.4, 4.4, 4.4, 4.4]
Figure 1. Timeline for S with a simultaneous release of task τ and sporadic server σ, including a graph with the remaining capacity of σ. The numbers at the top right corner of the boxes denote the response times of the respective releases.
Figure 1 shows a timeline with the available processing time of sporadic server σ and the executions of task τ with a first release of τ at ϕτ = 0. From this figure, we conclude that S is schedulable under a sporadic server for ϕτ = 0. Moreover, we derive that the worst-case response time WRτ(ϕτ) and best-case response time BRτ(ϕτ) of τ for ϕτ = 0 are given by WRτ(0) = 4.4 and BRτ(0) = 3.8, respectively. Because the processing time of a sporadic server is never lost, both WRτ(ϕτ) and BRτ(ϕτ) are independent of the first release ϕτ of the task, hence WRτ(ϕτ) = 4.4 and BRτ(ϕτ) = 3.8. This has the following consequences. Firstly, both a critical instant [7] and an optimal (or favorable) instant [3, 9] occur for every value of ϕτ. Secondly, the response jitter RJτ of task τ is constant, i.e.

$$RJ^\tau = \sup_{\phi^\tau} \left( WR^\tau(\phi^\tau) - BR^\tau(\phi^\tau) \right) = WR^\tau - BR^\tau = 4.4 - 3.8 = 0.6.$$

When we ignore the initial start-up phase, the response time of task τ is constant and equal to 4.4. In that case, the response jitter becomes zero.
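The response times in Figure 1 can be reproduced with a small event-driven simulation. The sketch below assumes a simplified form of the sporadic server replenishment rule of [12]: whenever σ becomes active (τ has pending work and σ has capacity), a replenishment is scheduled Tσ later for exactly the capacity consumed during that active interval. Only τ is served, at highest priority; the function name `simulate` is illustrative. Exact fractions keep the 0.4 fragments exact.

```python
import heapq
from fractions import Fraction as Fr


def simulate(t_srv, c_srv, t_task, c_task, horizon, phase=0):
    """Simulate task tau served by a highest-priority sporadic server.

    Returns (responses, chunks): the response time of every job that completes
    before the loop passes `horizon`, and the (start, end) intervals in which
    the server executes tau. Simplified replenishment rule (assumption).
    """
    t_srv, c_srv, t_task, c_task = map(Fr, (t_srv, c_srv, t_task, c_task))
    t, cap = Fr(0), c_srv                 # sigma replenished at phi_sigma = 0
    next_release = Fr(phase)
    pending = []                          # FIFO of [release time, remaining work]
    replenishments = []                   # min-heap of (time, amount)
    active_start, consumed = None, Fr(0)  # current active interval of sigma
    responses, chunks = [], []
    while t < horizon:
        while replenishments and replenishments[0][0] <= t:
            cap += heapq.heappop(replenishments)[1]
        while next_release <= t:
            pending.append([next_release, c_task])
            next_release += t_task
        busy = bool(pending) and cap > 0
        if busy and active_start is None:
            active_start = t              # sigma becomes active
        elif not busy and active_start is not None:
            # Active interval ends: replenish the consumed amount T_sigma later.
            heapq.heappush(replenishments, (active_start + t_srv, consumed))
            active_start, consumed = None, Fr(0)
        events = [next_release]
        if replenishments:
            events.append(replenishments[0][0])
        if busy:
            events.append(t + min(cap, pending[0][1]))
        t_next = min(events)
        if busy:
            run = t_next - t
            cap -= run
            consumed += run
            pending[0][1] -= run
            if chunks and chunks[-1][1] == t:
                chunks[-1] = (chunks[-1][0], t_next)   # contiguous execution
            else:
                chunks.append((t, t_next))
            if pending[0][1] == 0:
                release, _ = pending.pop(0)
                responses.append(t_next - release)
        t = t_next
    return responses, chunks


responses, chunks = simulate(3, Fr(6, 5), 5, 2, horizon=20)
print([float(r) for r in responses])                # [3.8, 4.4, 4.4, 4.4]
wr, br = max(responses), min(responses)
print(float(wr), float(br), float(wr - br))         # 4.4 3.8 0.6
print({e - s for s, e in chunks if s >= 10})        # {Fraction(2, 5)}
```

The first job finishes after 3.8 and every later job after 4.4, so the start-up phase alone produces the jitter of 0.6; after start-up the server executes in fragments of 0.4.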
From WRτ = 4.4, we conclude that S is schedulable under a sporadic server with a capacity of Cσ = 1.2. Moreover, we conclude that capacity suspension of σ is a prerequisite for schedulability of S with a capacity of Cσ = 1.2, and S is therefore not schedulable with a periodic server with that capacity. Hence, the worst-case response time analysis presented in [5] can be improved when a sporadic server is used exclusively for hard real-time tasks.
We conclude this section with three observations. Firstly, the worst-case response time is not attained by the first job of task τ. Secondly, the execution of any job of τ after the first job in Figure 1 depends on the execution of a previous job of τ. Thirdly, the provision of the capacity of the server becomes fragmented into pieces of size 0.4, which is equal to the greatest common divisor of the computation time Cτ of task τ and the capacity Cσ of server σ, i.e. gcd(Cτ, Cσ) = gcd(2.0, 1.2) = 0.4.
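The greatest common divisor of two non-integer values such as 2.0 and 1.2 is meant in the rational sense: the largest g for which both values are integer multiples of g. A minimal sketch (the function name is illustrative) brings both fractions onto a common denominator and takes the integer gcd of the numerators:

```python
from fractions import Fraction
from math import gcd


def rational_gcd(a, b):
    """Largest rational g such that a / g and b / g are both integers."""
    a, b = Fraction(a), Fraction(b)
    # Common denominator: least common multiple of the two denominators.
    denom = a.denominator * b.denominator // gcd(a.denominator, b.denominator)
    num = gcd(a.numerator * (denom // a.denominator),
              b.numerator * (denom // b.denominator))
    return Fraction(num, denom)


print(rational_gcd(Fraction(2), Fraction(6, 5)))  # 2/5
```

This only reproduces the arithmetic of the example (Cτ = 2, Cσ = 1.2); it does not establish that the fragment size equals this gcd in general.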
4. Discussion
In Table 2, we partitioned the various approaches for S into three main categories based on the minimum capacity Cσmin of the periodic resource or server and the worst-case response time WRτ of task τ. Notably, both the deferrable server and the sporadic server appear in all three categories. The differences between these results originate from the differences in the assumptions made for the three categories. We briefly consider the assumptions for the three categories and conclude the section with a remark.
4.1. Assumptions of approaches
For the first category, no assumptions are made about either the characteristics or the priorities of other servers. As a result, the specific preservation strategy of a server cannot be exploited.
For [1, 5] of the second category, the fact that S has highest priority is taken into account; without potential interference from higher-priority servers, the minimum capacity of σ can therefore be reduced significantly, i.e. by 25% (from 2.0 to 1.5). These existing approaches do not, however, exploit the preservation strategy of a server to improve the results for a specific server. We observe that because every task set in [5] is assumed to contain at least one (unspecified) soft real-time task, the preservation strategy of a server cannot be exploited. We consider the sporadic server approach of [10] in Section 4.2.
Similar to the second category, the third category also takes into account the fact that S has highest priority. Moreover, the approaches in this category exploit the specific preservation strategy of both the deferrable server and the sporadic server. As a result, the minimum capacity of σ can be reduced again, in this case by an additional 20% (from 1.5 to 1.2).
4.2. Concluding remark
Considering the approaches of [10] in Table 2, the deferrable server falls into the first category and the sporadic server falls into the second category. This is surprising, because [10] does not make any assumptions about the characteristics or the priorities of other servers. Hence, one would expect the results for the sporadic server approach in [10] to fall into the first category as well.
In [5], it is claimed that periodic servers dominate both deferrable servers and sporadic servers when a task set contains at least one soft real-time task. In [2, 4] and this paper, we have shown by means of an example that, in the absence of soft real-time tasks, periodic servers no longer dominate deferrable and sporadic servers.
5. Conclusion
In this paper, we considered response times and response jitter of hard real-time tasks under hierarchical FPPS (H-FPPS) using sporadic servers. We showed by means of an example that existing worst-case response time analysis can be improved when a sporadic server is used at highest priority and that server is used exclusively for hard real-time tasks. For our example, the utilization of the server can be reduced significantly when a sporadic server is used rather than a periodic server, or rather than assuming a general periodic resource model. Given these initial results, a sporadic server at highest priority can be an attractive alternative for resource-constrained systems with stringent timing requirements for a specific application when no appropriate period can be selected for its associated server. Unfortunately, improving the existing analysis is not straightforward, because, due to a start-up phenomenon, the worst-case response time of a task is not necessarily attained by the first job released at a critical instant. Moreover, the provision of the capacity of the server may be fragmented, as illustrated by the example, potentially giving rise to high context-switch costs.
Using the same example, we briefly investigated best-case response times and response jitter. Unlike existing best-case response time analyses for FPPS [3, 9], we did not assume infinite repetitions towards both ends of the time axis. As a result, the best-case response time of a task is determined by a start-up phase. When the start-up phase can be ignored, the best-case response time becomes equal to the worst-case response time, and the resulting response jitter therefore becomes zero.
Improved response time analysis of H-FPPS using sporadic servers is a topic of future work.
References
[1] L. Almeida and P. Pedreiras. Scheduling with temporal partitions: response-time analysis and server design. In Proc. 4th ACM International Conference on Embedded Software (EMSOFT), pp. 95–103, September 2004.
[2] R. Bril and P. Cuijpers. Towards exploiting the preservation strategy of deferrable servers. In Proc. WiP session of the 14th IEEE RTAS, pp. 13–16, April 2008.
[3] R. Bril, E. Steffens, and W. Verhaegh. Best-case response times and jitter analysis of real-time tasks. Journal of Scheduling, 7(2):133–147, March 2004.
[4] P. Cuijpers and R. Bril. Towards budgeting in real-time calculus: deferrable servers. In Proc. 5th International Conference on Formal Modelling and Analysis of Timed Systems (FORMATS), LNCS-4763, pp. 98–113, October 2007.
[5] R. Davis and A. Burns. Hierarchical fixed priority pre-emptive scheduling. In Proc. 26th IEEE RTSS, pp. 389–398, December 2005.
[6] G. Lipari and E. Bini. Resource partitioning among real-time applications. In Proc. 15th ECRTS, pp. 151–158, July 2003.
[7] C. Liu and J. Layland. Scheduling algorithms for multiprogramming in a real-time environment. Journal of the ACM, 20(1):46–61, January 1973.
[8] R. Rajkumar, K. Juvva, A. Molano, and S. Oikawa. Resource kernels: A resource-centric approach to real-time and multimedia systems. In Proc. SPIE, Vol. 3310, Conference on Multimedia Computing and Networking (MMCN), pp. 150–164, January 1998.
[9] O. Redell and M. Sanfridson. Exact best-case response time analysis of fixed priority scheduled tasks. In Proc. 14th ECRTS, pp. 165–172, June 2002.
[10] S. Saewong, R. Rajkumar, J. Lehoczky, and M. Klein. Analysis of hierarchical fixed-priority scheduling. In Proc. 14th ECRTS, pp. 152–160, June 2002.
[11] I. Shin and I. Lee. Periodic resource model for compositional real-time guarantees. In Proc. 24th IEEE RTSS, pp. 2–13, December 2003.
[12] B. Sprunt, L. Sha, and J. Lehoczky. Aperiodic task scheduling for hard real-time systems. Real-Time Systems, 1(1):27–60, June 1989.
[13] J. Strosnider, J. Lehoczky, and L. Sha. The deferrable server algorithm for enhanced aperiodic responsiveness in hard real-time environments. IEEE Transactions on Computers, 44(1):73–91, January 1995.
Utilizing Solid State Disk for Aggressive Processor 
Power Saving Techniques in Real-Time 
Applications 
 
Sung Woo Chung, Jong Sung Lee, and Do Yeun Kim 
Division of Computer and Communication Engineering 
Korea University 
Seoul 136-713, KOREA. 
swchung@korea.ac.kr 
 
 
 
Abstract: The SSD (Solid State Disk) was originally developed for high performance, low power, low heat, and low noise. It is therefore expected to replace the HD (Hard Disk) in the near future. In this paper, we show that the SSD is also beneficial for reducing processor power consumption in real-time applications such as video playback. Since the access time to the SSD and its variation are much shorter than those of the HD, the processor can scale down voltage and frequency more aggressively without sacrificing quality of service, resulting in much lower processor power consumption.
I. INTRODUCTION 
The SSD (Solid State Disk) based on NAND flash memory outperforms the HD (Hard Disk) by more than 150% for read operations [1]. When several programs execute simultaneously, random accesses are inevitable. However, the HD takes substantial time to serve random accesses, since it needs substantial seek time to move the head to the right place. The SSD, on the other hand, handles random accesses well and performs much better than the HD. In addition to its lower read latency, the SSD has significantly lower power consumption, heat generation, weight, and noise than the HD. For this reason, the SSD has come into the spotlight as a next-generation storage device. As the price of NAND flash memory falls, the HD is expected to be replaced by the SSD [2].
Mobile devices, which rarely use a cooling fan because of noise and in which low power is crucial for battery lifetime, must adopt the SSD, which consumes less power. Additionally, the power consumption of the other components of a mobile device, not only of its storage, should be minimized. Among these components, it is crucial to reduce the power consumption of the processor, which accounts for a substantial portion of total system power consumption. High power density leads to excessive heat, which causes severe reliability problems in the processor [6]. Thus, it is important to manage the power consumption of the processor; DVFS (Dynamic Voltage and Frequency Scaling) is one of the most commonly used power management techniques [7]. Since power is proportional to the frequency and to the square of the voltage, reducing frequency and voltage lowers the power consumption of the processor. The more time the processor has to complete its work, the further it can scale down and the more power it saves. A shorter storage access time can give the processor more time without deteriorating quality of service.
There have been many studies on real-time storage systems, such as [3][4][5]. However, the focus of this paper is not a real-time storage system but a real-time processor supported by fast storage (the SSD). In real-time applications such as MPEG playback, storage access time should be predictable as well as short in order to use aggressive DVFS in the processor [7]. Note that frequently switching voltage and frequency up and down incurs power overhead. Unfortunately, the access time to the HD varies significantly depending on the distance between the head and the requested data. Even worse, it is almost impossible to predict the variations of the access time. For low-quality video, a shorter storage access time does not seem to be crucial, since buffering, which fetches the requested data from the HD in advance, can hide the storage access time. (Note that aggressive buffering is sometimes risky, since the buffered data may never be used, which wastes power.) However, when a user wants to fast-forward at 2x or 3x speed, there is not enough time for buffering. In this case, adopting DVFS deteriorates quality of service. In the future, as higher-quality video becomes popular, storage, whose performance improves slowly, will become the bottleneck, since higher-performance processors such as multi-core processors can keep up with higher-quality video. Thus, there will be no time for buffering from the storage.
Until now, the known benefits of the SSD have been better read performance, low power, low heat, and low noise. In this paper, we show that using the SSD is also beneficial for processor power reduction, since the reduced and predictable access time of the SSD enables the processor to scale down frequency and voltage more aggressively without hurting quality of service.
 
II. MOTIVATION 
The SSD has two advantages over the HD from the perspective of real-time behavior: 1) the read access time is shorter and 2) the variation among read access times is smaller. To measure the access time and its variation, we captured the DMA (Direct Memory Access) read time from the storage. We modified Linux 2.6.18 running on an Intel Core 2 CPU 6400 (2.13 GHz) with 2 GB DRAM. For the HD, we used a Western Digital 320 GB hard disk with a throughput of 63.1 MB/s. For the SSD, we used a 16 GB Mtron SSD with a throughput of 92.6 MB/s.
To examine the DMA read time, we read a 500 MB file from the storage. During the read operation, we randomly sampled fifty DMA read times. The sector size (DMA block size) of the HD and the SSD is 200 B and 256 B, respectively, as set by the manufacturers. Even though the sector size of the SSD is larger, the DMA time of the SSD is much shorter than that of the HD. As shown in Figure 1, the DMA time of the HD is at least 1800 µs. In contrast, the DMA time of the SSD is less than 1000 µs, as shown in Figure 2. In addition, the DMA time of the HD varies much more. Even after excluding one spike (5000 µs) in Figure 1, the DMA time is still either around 1870 µs or around 2790 µs, a difference of about 920 µs. For the SSD, the variation is only 20 µs in the worst case. From this experiment, we found that the SSD gives the processor more time, which enables more aggressive power saving techniques.
Figure 3 shows the execution timeline without any power saving technique. Note that both the access time and its variation are larger for the HD than for the SSD. Since the variation is not predictable at run time, the maximum variation must be taken into account so as not to hurt quality of service. Thus, when DVFS is adopted, as shown in Figure 4, the SSD leaves the processor much more time.
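To make the role of the worst-case access time concrete, here is a minimal sketch under a simplified model that is an assumption, not the authors' method: each frame first waits for one storage access and then needs a fixed number of CPU cycles before its deadline, and the processor offers a discrete set of frequency levels. Because the variation cannot be predicted, the budget uses the largest observed access time. The cycle count, deadline, frequency levels, and SSD samples below are hypothetical; only the HD values of about 1870 µs and 2790 µs come from the measurements described above.

```python
def min_frequency_mhz(cpu_cycles, deadline_us, access_samples_us):
    """Lowest frequency (MHz, i.e. cycles per microsecond) that finishes the
    CPU work before the deadline if the storage access takes the worst
    observed time (hypothetical model)."""
    slack_us = deadline_us - max(access_samples_us)
    if slack_us <= 0:
        raise ValueError("storage access alone can exceed the deadline")
    return cpu_cycles / slack_us


def pick_level(required_mhz, levels_mhz):
    """Smallest available frequency level that is at least the required one."""
    feasible = [f for f in levels_mhz if f >= required_mhz]
    return min(feasible) if feasible else None


# Hypothetical workload: 4 million cycles per frame, 5000 us per frame.
LEVELS = [800, 1000, 1200, 1600, 2000]                 # hypothetical DVFS levels (MHz)
hd = min_frequency_mhz(4_000_000, 5000, [1870, 2790])  # HD access times (us)
ssd = min_frequency_mhz(4_000_000, 5000, [980, 1000])  # hypothetical SSD samples (us)
print(pick_level(hd, LEVELS), pick_level(ssd, LEVELS))  # 2000 1000
```

Using the maximum rather than the mean access time is what turns the HD's large variation into a higher required frequency.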
 
$$P \propto f V^2 \qquad (1)$$
(P: power, f: frequency, V: voltage)
 
According to (1), power is proportional to the frequency and to the square of the voltage. Generally, frequency is approximately proportional to voltage. Thus, power is approximately proportional to the cube of the frequency. Considering the experimental results in Figure 1 and Figure 2, the processor with the SSD has more time to handle each block. Hence, the processor can scale down its frequency and voltage, which drastically reduces processor power consumption.
 
III. EVALUATIONS 
First, we played high-quality video (HD DVD: High-Definition Digital Versatile Disc, resolution 1280×528) to show that there is not enough time for buffering even on a high-performance notebook (detailed specifications are given in Section II). If the hard disk accesses were not timing-critical (in other words, if buffering gave the processor enough time), we could adopt aggressive power saving techniques without faster storage. However, P2P file sharing causes additional accesses to the storage. In our experiment, we therefore played the video while copying another file from the CD-ROM to the storage device in order to mimic P2P traffic.
[Plot: elapsed DMA time (µs), 0 to 6000, for 50 sampled DMA reads]
Figure 1. DMA time for hard disk

[Plot: elapsed DMA time (µs), 960 to 1010, for 50 sampled DMA reads]
Figure 2. DMA time for solid state disk

[Diagram, panels (a) HD (Hard Disk) and (b) SSD (Solid State Disk): storage access time, its variation, and CPU time relative to the deadline]
Figure 3. Execution time without power saving technique

[Diagram, panels (a) HD (Hard Disk) and (b) SSD (Solid State Disk): storage access time, its variation, and CPU time relative to the deadline]
Figure 4. Execution time with a power saving technique

[Figure 5: normalized processor power consumption (0 to 1) versus (reduced storage access time)/(CPU time), from 0% to 200%]

In the case of the HD, the loss of video quality is noticeable to users, and the processor utilization is only 30~45%. Low processor utilization implies that the processor waited a long time for data from the HD.
When the SSD is used instead of the HD, we notice no pauses. In this case, the processor utilization is 80~100%, which means that the processor is busy processing the video. From this experiment, we found that the HD is the performance bottleneck, so that there is not enough time for buffering. Moreover, processor speed will clearly continue to grow. Thus, the processor will have more time to spare in the future, even though it currently does not have enough time for DVFS.
To confirm the effect of access time variation, we forcibly decreased the frequency from 2 GHz to 800 MHz. As expected, playback from the HD suffered many pauses, whereas playback from the SSD showed no pauses that we could perceive. However, the video played more slowly in both cases.
In this paper, we present analytical results on power consumption. Since power is approximately proportional to the cube of frequency, giving the processor more time dramatically reduces power consumption. Let $n$ be (the reduced storage access time in the case of the SSD) divided by (the CPU time in the case of the HD). The processor can then stretch its work over $(1+n)$ times the original CPU time, so its frequency can be lowered to $1/(n+1)$ of the original, and the processor power can be reduced to $1/(n+1)^3$ of the processor power consumption with the HD, as shown in Figure 5. For example, when the reduced storage access time divided by the CPU time is 10%, we can reduce the processor frequency to $1/(1+0.1)$ of the original frequency, which results in 75.1% ($=1/(1+0.1)^3$) of the processor power consumption with the HD. When the ratio is 100%, the processor power consumption is reduced to 12.5% ($=1/(1+1)^3$). Note that we have not yet measured the actual ratio of the reduced storage access time to the CPU time.
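The analytical estimate above can be reproduced in a few lines of Python. This is a sketch under the same assumptions as the text (power proportional to frequency cubed, and the freed storage time used to lower the frequency to 1/(1+n)); the function name is ours, and the values it prints are analytical estimates, not measurements.

```python
def normalized_power(n: float) -> float:
    """Processor power with the SSD, relative to the power with the HD.

    n = (reduced storage access time with the SSD) / (CPU time with the HD).
    The CPU work can be stretched over (1 + n) times its original duration,
    so the frequency drops to 1/(1 + n); with power ~ f^3 this gives 1/(1 + n)^3.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 / (1.0 + n) ** 3


if __name__ == "__main__":
    for ratio in (0.1, 0.5, 1.0, 2.0):
        print(f"ratio {ratio:.0%}: {normalized_power(ratio):.1%} of the HD processor power")
```

Because the exponent is cubic, even a modest ratio gives a large saving, which is the shape of the curve in Figure 5.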
 
IV. CONCLUSION 
As the price of flash memory continues to fall, the SSD (Solid State Disk), which has superior read performance, is expected to rapidly replace the HD (Hard Disk). In addition to performance, the SSD has other advantages: low power, low heat, and low noise. These features make the SSD more suitable than the HD for mobile devices.
However, to our knowledge, there has been no study on the effects of the SSD on the power consumption of other computer components. In this paper, we present another benefit of the SSD: reducing processor power consumption. Since the HD has a long access time and severe access time variation, the processor does not have enough time for efficient power saving techniques. The SSD, on the other hand, has a shorter access time with small variation (around 20 µs), which enables the processor to aggressively scale down its frequency and voltage. Consequently, a system with an SSD can reduce processor power consumption. In this paper, we present experimental results that show the potential for processor power reduction, although we have not yet presented measured power results; this is our future work.
ACKNOWLEDGEMENT 
This work was supported by the Second Brain Korea 21 
Project and Korea Science and Engineering Foundation
(KOSEF) grant funded by the Korea government (MOST)
(No. R01-2007-000-20750-0). 
 
REFERENCES 
[1] SAMSUNG NAND Flash-based Solid State Drive. Data sheet. Samsung Electronics, July 2007.
[2] J. Janukowicz and D. Reinsel. Evaluating the SSD Total Cost of Ownership. White paper. International Data Corporation (IDC), Oct. 2007.
[3] Z. Dimitrijevic and R. Rangaswami. Quality of service support for real-time storage systems. Proceedings of IPSI conference, 2003.
[4] K. Kim, J. Hwang, S. Lim, J. Cho, and K. Park. A realtime disk scheduler for multimedia integrated server considering the disk internal scheduler. Proceedings of the International Parallel and Distributed Processing Symposium, April 2003.
[5] R. Abbott and H. Garcia-Molina. Scheduling I/O requests with deadlines: A performance evaluation. Proceedings of the IEEE Real-Time Systems Symposium (RTSS ’90), December 1990.
[6] V. Narayanan and Y. Xie. Reliability Concerns in Embedded System Designs. IEEE Computer, vol. 39, no. 1, pp. 118-120, Jan. 2006.
[7] K. Choi, K. Dantu, W.-C. Cheng, and M. Pedram. Frame-based dynamic voltage and frequency scaling for a MPEG decoder. Proceedings of the 2002 IEEE/ACM International Conference on Computer-Aided Design, 2002.
 
Figure 5. Normalized processor power consumption 
 
Dynamic Configuration of Web Server Clusters with QoS Control ∗
Luciano Bertini, Julius C.B. Leite
{lbertini, julius}@ic.uff.br
Daniel Mossé
mosse@cs.pitt.edu
Abstract
Turning off servers and tuning the speed of each server are essential for sizing data centers appropriately and serving requests in an energy-efficient way. We model the problem of selecting which servers will be on, and at what speeds, as a mixed integer programming (MIP) problem; we also show how to combine such solutions with control theory. As a proof of concept, we implemented this dynamic configuration scheme in a real web server cluster with soft real-time requirements and QoS control, which provides both energy efficiency and a good user experience. On this cluster, we show the good performance of our scheme, compare centralized and distributed approaches to QoS control, and compare two schemes for choosing server speeds.
1. Introduction
Energy consumption is a real concern in these times of global warming and related environmental threats. In many parts of the world, initiatives for deploying green data centers have already appeared. For example, a new green data center in Germany, to be finished in 2008, will save 25% of energy, corresponding to 16,000 MWh per year, and will emit up to 11,000 fewer tons of carbon dioxide annually than conventional data centers of the same size [1].
The energy cost of a web server cluster can be reduced by sizing the system appropriately. For example, in a 5-machine cluster with a very low workload, a power reduction of close to 77% is possible if only one correctly configured server is used. In this paper we study how to resize a web server cluster and set the speeds of the servers that are turned on. The goal is to enable the server cluster to configure itself efficiently according to the load, while providing the same user experience as an oversized, high-performance server. To achieve this goal, we consider the cluster as a soft real-time system in which the QoS requirement must be met statistically. That is, our guarantees are based on average execution times rather than worst-case execution times.
∗This research is partially supported by the Brazilian Government through Capes-PVE and CNPq, by the State of Rio de Janeiro Research Foundation (FAPERJ), and by the US federal research agency NSF under grants ANI 03-25353 and CNS-0524634.
We combine two technologies: QoS control by means of feedback control theory, and operations research. The former controls the fraction of deadlines met (QoS) by measuring how late requests finish execution (tardiness), which allows fine-grained control of the QoS; the latter is used to achieve the optimal dynamic configuration of the web cluster, that is, to decide which nodes are on and which are off. We model the problem of assigning speeds to servers, including zero speed (server off), as a mixed integer programming (MIP) problem and solve it using traditional linear programming techniques.
2. Optimization Problems
Our web server model consists of a front-end and a cluster of N servers capable of Dynamic Voltage Scaling (DVS). The front-end is a normal web server acting as a reverse proxy and serving as a gateway to the actual web servers that process the requests. We want to find the best configuration for the cluster of N nodes. We compare two possible models for the problem: the traditional DVS scheme and the switched DVS scheme. The former allows only one speed per server, so the solution chooses the lowest discrete frequency at or above the exact theoretical frequency needed for a given workload (if speeds were continuous). It is easier to implement and incurs less overhead. In the latter, switched DVS, which is based on [6], the solution may switch between the two discrete values adjacent to the exact theoretical frequency. The advantage is that this is convenient for building a feedback controller with the CPU frequencies as the actuator, because it emulates a CPU with continuous frequencies (the output of the controller can be fed directly to the DVS module).
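To make the difference between the two schemes concrete, here is a minimal Python sketch of how each might map a required (continuous) frequency onto a processor's discrete levels. The frequency table and function names are hypothetical, and the switched scheme is shown as a time split between the two adjacent levels, which is our reading of the approach based on [6].

```python
import bisect

# Hypothetical DVS levels of one server, in MHz (ascending).
LEVELS = [800, 1000, 1200, 1400, 1600, 1800, 2000]


def traditional_dvs(levels, f_needed):
    """Single discrete speed: the lowest level at or above f_needed."""
    idx = bisect.bisect_left(levels, f_needed)
    if idx == len(levels):
        raise ValueError("required frequency exceeds the highest level")
    return levels[idx]


def switched_dvs(levels, f_needed):
    """Time split between the two levels adjacent to f_needed.

    Returns (f_low, f_high, alpha, beta) with alpha + beta = 1 and
    alpha * f_low + beta * f_high = f_needed, so the average speed equals
    the continuous value a controller asked for.
    """
    if not levels[0] <= f_needed <= levels[-1]:
        raise ValueError("required frequency outside the available range")
    idx = bisect.bisect_left(levels, f_needed)
    if levels[idx] == f_needed:
        return f_needed, f_needed, 1.0, 0.0  # exact level: no switching needed
    f_low, f_high = levels[idx - 1], levels[idx]
    beta = (f_needed - f_low) / (f_high - f_low)
    return f_low, f_high, 1.0 - beta, beta


if __name__ == "__main__":
    print(traditional_dvs(LEVELS, 1100))  # 1200
    print(switched_dvs(LEVELS, 1100))     # (1000, 1200, 0.5, 0.5)
```

The traditional choice overshoots to 1200 MHz, while the switched choice averages exactly 1100 MHz, which is what makes it usable as a continuous actuator.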
2.1. Switched DVS
A server $i$ with $S_i$ frequencies has $S_i - 1$ intervals between frequencies. For any interval $s$, there are two endpoint frequencies, $F_i^s$ and $F_i^{s+1}$. For each server $i$, we need to find the interval $s$ in which a linear combination of $F_i^s$ and $F_i^{s+1}$ yields the optimal frequency $f_i$; the combination is also allowed to be zero, which represents the server being turned off. We denote the power when busy at frequencies $F_i^s$ and $F_i^{s+1}$ as $P_{busy}^{i,s}$ and $P_{busy}^{i,s+1}$, respectively, and similarly the power when idle as $P_{idle}^{i,s}$ and $P_{idle}^{i,s+1}$. In the same way, $H_i^s$ and $H_i^{s+1}$ are the maximum load that server $i$ can handle at each frequency endpoint. The number of requests per second that the cluster has to process is denoted by $H_{base}$. The problem is modeled as follows:
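Minimize:
$$\sum_{i=1}^{N} \sum_{s=1}^{S_i-1} \left\{ \alpha_i^s P_{busy}^{i,s} + \beta_i^s P_{busy}^{i,s+1} \right\} \qquad (1)$$
subject to:
$$\sum_{i=1}^{N} \sum_{s=1}^{S_i-1} \left\{ \alpha_i^s H_i^s + \beta_i^s H_i^{s+1} \right\} \ge H_{base} \qquad (2)$$
$$\alpha_i^s + \beta_i^s - y_i^s = 0, \quad \forall i \in \{1,\dots,N\},\ \forall s \in \{1,\dots,S_i-1\} \qquad (3)$$
$$\sum_{s=1}^{S_i-1} y_i^s \le 1, \quad \forall i \in \{1,\dots,N\} \qquad (4)$$
$$y_i^s \in \{0,1\}, \quad \forall i \in \{1,\dots,N\},\ \forall s \in \{1,\dots,S_i-1\} \qquad (5)$$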
This is a piecewise optimization problem, because the objective function is a sum of several discontinuous line segments. The main variables, $\alpha_i^s$ and $\beta_i^s$, indicate how much of the endpoint frequencies $F_i^s$ and $F_i^{s+1}$ of interval $s$ at server $i$ are combined to obtain the desired frequency $f_i$ for that server. This combination model for piecewise problems first appeared in [5]. After solving the problem and obtaining $\alpha_i^s$ and $\beta_i^s$, the frequency $f_i$ for each server $i$ is given by:
$$f_i = \sum_{s=1}^{S_i-1} \left( \alpha_i^s F_i^s + \beta_i^s F_i^{s+1} \right)$$
With this modeling, we can solve the problem using traditional linear programming techniques, and the solution also includes the reconfiguration of servers (on/off). The problem defined above is a MIP problem because $y$ is an integer variable. This variable identifies the selected frequency interval and allows the solver to search any combination, including turning a server off, which is represented by $y = 0$. When $y_i^s = 0$, constraint (3) requires $\alpha_i^s + \beta_i^s = 0$, so the frequency contributed by that segment of the given server is zero. Restriction (4) ensures that at most one $y_i^s$ is 1 for each server $i$. If $y_i^s = 0$ for all $s$, then server $i$ is turned off.
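For a toy cluster, problem (1)-(5) can be checked by exhaustive enumeration. The Python sketch below is not the authors' solver: it enumerates every choice of the binary variables $y_i^s$ and, for each choice, solves the remaining linear problem in closed form as a fractional knapsack. It assumes $\alpha_i^s, \beta_i^s \ge 0$ and that both busy power and maximum load increase with frequency; the function name and all numbers are hypothetical.

```python
from itertools import product


def solve_switched_dvs(freqs, p_busy, h_max, h_base):
    """Exhaustive sketch of MIP (1)-(5) for a very small cluster.

    freqs[i], p_busy[i], h_max[i] hold, for server i and each level s
    (ascending), F_i^s, P_busy^{i,s} and H_i^s. Returns
    (total_power, [f_1, ..., f_N]) or None if h_base cannot be served.
    """
    def saving_rate(item):
        # Power saved per unit of capacity given up when moving weight
        # from F_i^{s+1} (beta) to F_i^s (alpha).
        i, s = item
        return (p_busy[i][s + 1] - p_busy[i][s]) / (h_max[i][s + 1] - h_max[i][s])

    n = len(freqs)
    best = None
    # None means y_i^s = 0 for every s (server off); s means y_i^s = 1.
    options = [[None] + list(range(len(freqs[i]) - 1)) for i in range(n)]
    for combo in product(*options):
        active = [(i, s) for i, s in enumerate(combo) if s is not None]
        # Start each active server at its upper endpoint (alpha = 0, beta = 1).
        power = sum(p_busy[i][s + 1] for i, s in active)
        slack = sum(h_max[i][s + 1] for i, s in active) - h_base
        if slack < 0:
            continue  # constraint (2) cannot hold for this choice of y
        # With y fixed, the rest is a fractional knapsack, solved greedily.
        alpha = {}
        for i, s in sorted(active, key=saving_rate, reverse=True):
            d_h = h_max[i][s + 1] - h_max[i][s]
            a = max(0.0, min(1.0, slack / d_h))
            alpha[i] = a
            power -= a * (p_busy[i][s + 1] - p_busy[i][s])
            slack -= a * d_h
        if best is None or power < best[0]:
            f = [0.0] * n
            for i, s in active:
                f[i] = alpha[i] * freqs[i][s] + (1 - alpha[i]) * freqs[i][s + 1]
            best = (power, f)
    return best


if __name__ == "__main__":
    # Two identical hypothetical servers with levels 1.0 and 2.0 GHz.
    freqs = [[1.0, 2.0], [1.0, 2.0]]
    p_busy = [[60.0, 100.0], [60.0, 100.0]]   # watts
    h_max = [[50.0, 100.0], [50.0, 100.0]]    # requests per second
    print(solve_switched_dvs(freqs, p_busy, h_max, h_base=60.0))
```

For 60 req/s the sketch turns one server off and runs the other at an average of 1.2 GHz. Enumeration grows with the product of the $S_i$ over all servers, so this is only a checker for toy instances; the paper solves the MIP with linear programming techniques.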
2.2. Traditional DVS
For brevity, we do not show the problem formulation with discrete frequencies here; we only show the results. The interested reader can find it in the technical report [4]. Basically, the difference in the formulation is that instead of combining two frequencies, we combine idle time with busy time, so that the variables are the utilizations of the servers. Moreover, we have to model the load distribution mechanism. An extra restriction forces the utilization of every server to be the same, because the load distribution used in the implemented web cluster sends each server a fraction of the total work proportional to its performance, measured in terms of its current frequency setting. This is guaranteed by our modification of the Apache load balancer module, which makes it dynamic: we dynamically assign weights proportional to the measured performance that each server periodically reports back to the front-end.
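The effect of the proportional weights can be checked with a small calculation: if each server's share of requests is proportional to its capacity, every server ends up with the same utilization. The sketch assumes that the performance each server reports is proportional to the request rate it can sustain at its current frequency; server names and numbers are hypothetical, and this is not the code of the modified Apache module.

```python
def balancer_weights(reported_perf):
    """Turn per-server performance reports into request-dispatch fractions."""
    total = sum(reported_perf.values())
    return {server: perf / total for server, perf in reported_perf.items()}


def utilizations(arrival_rate, weights, capacity):
    """Utilization of each server when it receives weights[server] of the load.

    capacity[server] is the request rate the server sustains at its current speed.
    """
    return {s: arrival_rate * w / capacity[s] for s, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical cluster: performance reports proportional to current speed.
    capacity = {"web1": 100.0, "web2": 50.0, "web3": 50.0}   # req/s
    weights = balancer_weights(capacity)
    print(weights)
    print(utilizations(120.0, weights, capacity))  # every server at 0.6
```

Each utilization equals the arrival rate divided by the total capacity, which is why a single utilization variable suffices in the traditional DVS formulation.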
3. Experimental Evaluation
We compare our implementation with a baseline implementation from [7] and show comparative results for two approaches to QoS control. In all experiments, for the sake of comparison, the deadline and average execution time of a request are the same as in [7], 200 ms and 24.5 ms, respectively. The workload is a ramp of dynamic requests, starting from zero load and increasing until it reaches the full load of the system.
3.1. Baseline Comparison
The work presented in [7] uses a real-time utilization to define the DVS policy, and turns machines on/off following a predefined sequence. That work was compared with, and improved on, the work published in [8]. It uses a real-time utilization $U = \sum_i C_i / D_i$, computed from the deadline $D_i$, and a recent utilization $U_{recent}$, given by the number of requests times the average execution time $C_i$, divided by a recent time period. The frequency used is then $\max\left(U, U_{recent}/0.8\right) \times f_{max}$ [7]. We determined that the DVS overhead is not due to changing the frequency but to the scheduling of the DVS task. If a period of 10 ms is used (as in the original work), it results in 300 context switches per second (cs/s) and 70 W of power when idle; if the period is increased to 50 ms, we get 100 cs/s and 68 W. Here we focus only on comparing our method with [7]; details of the effect of the period can be found in [4].
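The baseline frequency rule of [7], as summarized above, can be written as a short function. Capping the result at f_max and the parameter names are our assumptions; times are in seconds, f_max = 2000 MHz is hypothetical, and the example reuses the 24.5 ms execution time and 200 ms deadline from the experiments.

```python
def baseline_frequency(exec_times, deadlines, recent_count, avg_exec, window, f_max):
    """Frequency chosen by the baseline DVS rule of [7], as summarized above.

    U        = sum_i C_i / D_i over the requests currently in the system
    U_recent = recent_count * avg_exec / window
    f        = max(U, U_recent / 0.8) * f_max, capped at f_max (our assumption)
    """
    u = sum(c / d for c, d in zip(exec_times, deadlines))
    u_recent = recent_count * avg_exec / window
    return min(1.0, max(u, u_recent / 0.8)) * f_max


if __name__ == "__main__":
    # Four queued requests (C = 24.5 ms, D = 200 ms) and 20 requests in the last second.
    f = baseline_frequency([0.0245] * 4, [0.2] * 4, recent_count=20,
                           avg_exec=0.0245, window=1.0, f_max=2000.0)
    print(f"{f:.0f} MHz")
```

Here both utilizations are 0.49, but the 0.8 divisor inflates the recent term to 0.6125, so the rule deliberately keeps headroom above the observed load.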
Figure 1.a compares three configurations: no power management (the power consumption of a regular cluster of machines without DVS or on/off), only on/off (VOVO), and on/off plus DVS with QoS control (our approach). For 5 machines and very low load, power decreases from 390 W to 90 W, a reduction of up to 77%. Compared with using only VOVO, the combination of globally optimized DVS and VOVO reduces power by about 50 W, as can be seen after every configuration change in Figure 1.a.
Figure 1.b compares our method with the on/off and DVS method presented in [7], in which machines are turned on in a fixed order, from the most to the least power-efficient, and DVS is done locally at each server using the real-time utilization method described above, and thus without global optimization. Furthermore, this DVS scheme cannot control the QoS in a fine-grained manner, and most of the time it yields 100% QoS (see Figure 1.c).
The two up-and-down power steps around 50 req/s in Figures 1.a and 1.b are caused by a configuration change in our scheme, in which one server has to be turned off for another server to be turned on, as determined by the solution of the MIP optimization. Whether this switch is worthwhile depends on an analysis of the workload trend. The peak power reduction that can be observed in Figure 1.b is about 15%. This is a good improvement, considering that the baseline is already a power-managed system.
Figure 1.c shows the QoS and tardiness for both schemes. The reference line at 95% QoS is the target for both cases. Interestingly, when a machine is turned on (e.g., at time t = 1000 s in this figure), the two schemes show opposite QoS behavior. Our QoS controller goes to 100% QoS because its output is too high for the new configuration. This transient effect disappears as soon as the controller finds a new output that satisfies the QoS in the new configuration. In the baseline case, the QoS decreases, because the method's QoS awareness relies on turning a node on just before the old configuration can no longer sustain a QoS of 95%, accounting for the load prediction given by the max load increase parameter. In this experiment, the Rusu 2006 scheme [7] usually shows lower tardiness than our method because it is overprovisioned most of the time.
3.2. Discrete Versus Continuous Frequency
To compare the two DVS assignment policies described by the MIP problems in Sections 2.1 and 2.2, we generated a ramp workload for both cases. The first advantage of the continuous case over the discrete case is that it is better suited to a feedback controller that relies on the continuity of the actuator to compute its output. The lack of a continuous actuator in the discrete case manifests itself as instability. Because the frequencies are discrete, when the tardiness (see also Figure 2.a) reaches a value that makes the controller output increase, the frequency sometimes increases one step above the needed frequency, making the system respond too aggressively. The tardiness then drops, which starts an instability cycle. The top part of Figure 2.a also shows how the aggregate frequency varies in both cases. The frequency shown is the sum of the frequencies of all servers, that is, the capacity of the cluster in cycles per second.
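The quantization effect described above can be reproduced with a toy model. The sketch below is not the paper's controller or workload: it assumes a hypothetical plant in which tardiness equals load divided by aggregate frequency, a plain integral controller, and made-up frequency levels. With a continuous actuator the frequency settles at the value that meets the tardiness target; with the traditional round-up-to-the-next-level actuator it keeps jumping between the two levels around that value.

```python
def quantize_up(u, levels):
    """Lowest discrete level >= u (the traditional DVS choice)."""
    for level in levels:
        if level >= u:
            return level
    return levels[-1]


def simulate(discrete, steps=200, load=1.3, target=0.5, gain=1.0):
    """Integral control of a toy plant where tardiness = load / frequency."""
    levels = [1.0, 2.0, 3.0, 4.0]   # hypothetical aggregate frequency levels
    u = levels[0]                   # controller output (requested frequency)
    history = []
    for _ in range(steps):
        f = quantize_up(u, levels) if discrete else u
        tardiness = load / f
        # Too much tardiness -> request more speed, and vice versa.
        u += gain * (tardiness - target)
        u = min(max(u, levels[0]), levels[-1])  # keep output in actuator range
        history.append(f)
    return history


if __name__ == "__main__":
    print("continuous:", [round(f, 3) for f in simulate(False)[-5:]])
    print("discrete:  ", simulate(True)[-5:])
```

The required frequency here is 2.6, which no single level provides, so the discrete run never settles; this mirrors the instability cycle reported for Figure 2.a.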
4. QoS Control Evaluation
The QoS controller from [2] is a PIDF controller with a parametrization that eases the tuning process. The PIDF controller responds well in a stochastic system thanks to the filter added to the derivative term of a PID controller. The QoS is controlled indirectly by controlling the tardiness of web requests, defined as the ratio of response time to deadline. The rationale for measuring and controlling QoS through tardiness is described in [3]. A statistical inference relates the average tardiness to the desired QoS value, defined as the fraction of deadlines met. The controller output is a normalized performance factor that is used as an input to the MIP solver to define the load demand $H_{base}$.
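The exact parametrization of the PIDF controller in [2] is not reproduced here. As a generic illustration, the sketch below is a textbook discrete PID controller whose derivative term passes through a first-order low-pass filter with time constant tf, which is what damps the reaction to noisy measurements such as tardiness. Gains, sample time, the tardiness reference, and the sign convention of the error are hypothetical.

```python
class PIDF:
    """Discrete PID controller with a first-order filter on the derivative term.

    The derivative follows tf * dD/dt + D = kd * de/dt, discretized with
    backward Euler; the integral uses rectangular integration.
    """

    def __init__(self, kp, ki, kd, tf, ts):
        self.kp, self.ki, self.kd, self.tf, self.ts = kp, ki, kd, tf, ts
        self.integral = 0.0
        self.deriv = 0.0
        self.prev_error = None

    def update(self, error):
        if self.prev_error is None:
            self.prev_error = error  # avoid a derivative kick on the first sample
        self.integral += self.ki * self.ts * error
        self.deriv = (self.tf * self.deriv + self.kd * (error - self.prev_error)) / (self.tf + self.ts)
        self.prev_error = error
        return self.kp * error + self.integral + self.deriv


if __name__ == "__main__":
    ctrl = PIDF(kp=0.5, ki=0.2, kd=0.1, tf=0.5, ts=1.0)   # hypothetical gains
    for tardiness in (0.9, 0.8, 0.7):
        # 0.6 is a hypothetical tardiness reference; a positive error asks for more speed.
        print(ctrl.update(tardiness - 0.6))
```

A larger tf smooths the derivative more strongly at the price of a slower reaction to genuine load changes.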
We implemented two approaches for QoS control: a centralized SISO (Single Input Single Output) controller and a distributed SIMO (Single Input Multiple Output) controller. The SISO controller takes a single tardiness measurement and computes a single output, which is used as an input to the MIP optimization. Its disadvantage is the added complexity of precomputing offline tables to optimize the controller output. We compare it with a distributed SIMO control architecture that gains simplicity at the cost of optimality. With N independent controllers, a comparable QoS is achieved, and the implementation is simpler because the MIP optimization is not needed for the controller, only for defining the points at which servers are turned on/off. When the system stabilizes, however, it operates at a suboptimal point (i.e., with higher power consumption). In our experiments, both configurations achieved a QoS within the specification, with larger fluctuations in the SISO model. Besides being more power-efficient, the centralized scheme with optimization achieved a slightly better QoS, as can be seen on the right y-axis of Figures 2.b and 2.c.
Figures 2.b and 2.c also show the resulting utilization and frequency for the two cases, SIMO and SISO. These plots show that variations in frequency and utilization, such as the one at time t = 1000 s, are smaller in the SIMO scheme than in the SISO scheme. This is due to how the controller is initialized. In the SIMO case, when a new server is turned on, its controller output is initialized to zero and then increases until control is achieved. In the SISO case, when a new server is turned on, the initial value (because there is a single output) is the value at which the controller was operating before the new server came in. This overprovisions the system, and the QoS goes to 100% until the system stabilizes again. The latter (SISO) case is more conservative, while the former (SIMO) case can result in an underprovisioned system, as happened just after t = 500 s in Figure 2.b, where a short QoS drop to 80% can be seen. The more conservative behavior is preferred, and the SIMO case can easily be modified to mimic it.
[Figure 1 plots: (a) Power (W) vs. Load (req/s) for No power management, Only VOVO, and VOVO and DVS; (b) Power (W) vs. Load (req/s) for Rusu 2006 and QoS control; (c) QoS and Tardiness vs. Time (s) for Rusu 2006 and QoS control]
Figure 1. Comparisons. (a) No power management, only on-off, and on-off and DVS. (b) and (c) Comparing our method with [7] for power, QoS, and tardiness

[Figure 2 plots: (a) Total Freq. (10^7 Hz) and Tardiness vs. Time (s) for the Continuous and Discrete cases; (b) and (c) Total Freq. (10^7 Hz) and QoS (%, with the 95% target) vs. Time (s), and Utilization vs. Time (s)]
Figure 2. (a) Comparison of the continuous and discrete frequency optimizations. (b) and (c) Utilization, and frequency vs. QoS, for the SIMO and SISO controllers, respectively

5. Conclusion
In this paper we present an optimal solution to the dynamic configuration of server clusters, obtained by using linear programming to find the server speeds and the combination of servers to be turned on or off. By comparing two approaches, one using discrete frequencies and another using pseudo-continuous frequencies, we concluded that the continuous approach is better, without incurring additional overhead. It allows the use of a continuous actuator mechanism, which is necessary for applying feedback control theory. On the other hand, the discrete frequency method proved unsuitable as an actuator mechanism, because it does not provide proportionality between the control output and the actuation in the system, generating instabilities from the control perspective.
When comparing the distributed control against the centralized control with optimization, we achieved better QoS with lower power consumption in the latter case. The advantage of the distributed approach is its simpler implementation and better scalability, because fewer tables are needed. However, scalability is not an issue in practice: for up to 30 nodes the optimization can be done online, and beyond this cluster size other system components, such as the front-end and the network bandwidth, become the bottleneck.
References
[1] Citi data center wins environmental award, December 2007. www.citigroup.com/citigroup/press/2007/071207a.htm.
[2] L. Bertini, J. C. B. Leite, and D. Mossé. SISO PIDF controller in an energy-efficient multi-tier web server cluster for e-commerce. In 2nd IEEE Intl. Workshop on Feedback Control Impl. and Design in Computing Systems and Networks, pages 33–38, Munich, Germany, May 2007.
[3] L. Bertini, J. C. B. Leite, and D. Mossé. Statistical QoS guarantee and energy-efficiency in web server clusters. In 19th Euromicro Conference on Real-Time Systems, pages 83–92, Pisa, Italy, July 2007.
[4] L. Bertini, J. C. B. Leite, and D. Mossé. Optimal dynamic configuration in web server clusters. Technical Report RT-1/08, Instituto de Computação, UFF, January 2008.
[5] G. B. Dantzig. On the significance of solving linear programming problems with some integer variables. Econometrica, 28(1):30–44, January 1960.
[6] T. Ishihara and H. Yasuura. Voltage scheduling problem for dynamically variable voltage processors. In Intl. Symp. on Low Power Electronics and Design, pages 197–202, Monterey, California, United States, 1998.
[7] C. Rusu, A. Ferreira, C. Scordino, A. Watson, R. Melhem, and D. Mossé. Energy-efficient real-time heterogeneous server clusters. In IEEE Real-Time and Embedded Technology and Applications Symp., pages 418–428, San Jose, CA, USA, April 2006.
[8] V. Sharma, A. Thomas, T. F. Abdelzaher, K. Skadron, and Z. Lu. Power-aware QoS management in web servers. In 24th IEEE Real-Time Systems Symposium, pages 63–72, December 2003.
 A Service-Oriented Programming Model for Real-Time WSANs 
 
 
E. Cañete, M. Díaz, L. Llopis, B. Rubio 
{ecc,mdr,luisll,tolo}@lcc.uma.es 
Dept. of Languages and Computing Science 
University of Málaga, Spain 
 
 
 
Abstract – The increasing complexity of wireless sensor and actor network (WSAN) applications, together with the technological evolution of these systems, has allowed higher-level programming paradigms to be proposed for these networks. In this paper, a service-based programming model for real-time WSANs is proposed. It allows us to specify services and their interactions in order to build applications. The model syntax is platform-independent, which facilitates its use by automatic tools in the implementation phase. Since real-time behavior is important in WSANs, the model allows real-time requirements to be specified as service priorities, periods of service execution, or deadlines.
 
I. INTRODUCTION 
*The combination of recent technological advances in 
electronics, nanotechnology, wireless communications, 
computing, networking, and robotics has enabled the 
development of Wireless Sensor Networks (WSNs), a new 
form of distributed computing where sensors (tiny, low-cost 
and low-power nodes, colloquially referred to as "motes") 
deployed in the environment communicate wirelessly to gather 
and report information about physical phenomena [1]. WSNs 
have been successfully used for a variety of applications, such 
as environmental monitoring, object and event detection, 
military surveillance and precision agriculture [2][3]. 
 
One variation of WSNs that is attracting growing interest among researchers and practitioners is the so-called Wireless Sensor and Actor Networks (WSANs) [4]. In this case, the devices deployed in the environment act not only as sensors able to sense environmental data, but also as actors able to react to and affect the environment. For example, in the case of a fire, sensors relay the exact origin and intensity of the blaze to water sprinkler actors so that the fire can be extinguished before it becomes uncontrollable. Actors are resource-rich nodes equipped with better processing capabilities, higher transmission power and longer battery life than sensors. Moreover, the number of sensor nodes deployed in a target area may be in the order of hundreds or thousands, whereas such a dense deployment is usually not necessary for actor nodes due to their higher capabilities. In some applications, integrated sensor/actor nodes, especially robots, may replace actor nodes.

* This work is supported by the EU funded project FP6 IST-5-033563 and the Spanish project TIC-03085.
 
Recent progress in software technology for sensors, including the Java [5] and .NET [6] platforms, allows us to propose high-level programming models for developing applications. In this paper we propose a service-oriented programming model to facilitate application development. It provides a higher level of abstraction, interoperability and reusability [7]; that is, users can specify services without knowing which WSAN executes them, and services from different providers can interoperate. The programming model is platform-independent, so automatic tools will be able to generate code for the different platforms. Additionally, the real-time requirements of the system, such as priorities, periods or deadlines, can be specified. The model defines the mote/actor template (node), which publishes the set of services accessible in the network. The access points to the node services are the ports, which define the commands and events that other nodes may require. Commands can be synchronous or asynchronous, and events are always asynchronous. The service template includes both the provided and the required ports necessary to carry out the service execution. Additionally, the model defines the group concept to build node subsystems with common characteristics.
 
Recently, some work based on the service-oriented paradigm has been carried out. In [7], the challenges of service-oriented sensor-actuator networks are presented. Oasis [8] is a programming framework that provides abstractions for service-oriented sensor networks, but it does not deal with real-time issues. In [9], the service-oriented architecture is applied to real-world scenarios such as WSAN home security systems, but it does not take into account timing requirements that may be important for these systems. Atlas [10] is a service-oriented sensor and actor platform that enables programmable pervasive services.
 
This paper is organized as follows. Section II presents the service-oriented programming model, Section III presents a brief example, and Section IV presents some conclusions and future work.
 II. THE SERVICE ORIENTED MODEL 
This section details the main characteristics of the model. A syntax has been defined to specify applications that meet the model characteristics. It allows developers to specify applications in a platform-independent way and then map this specification to the desired platform. Figure 1 shows the development methodology: the application requirements (functional and non-functional), together with the service model semantics, build a service-based application. This application includes the application behavior and an underlying layer (scheduler) that enforces the model semantics. For example, regarding real-time issues, the scheduler must ensure that the most critical event or command is executed first, taking into account the priorities assigned in the model. It also implements the periodic actions defined. Finally, automatic tools map the service-based application specification to the target platform, building a middleware between the application and the operating system (TinyOS, Java Virtual Machine or .NET).
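To illustrate the scheduler's role, here is a minimal sketch of a dispatcher that always runs the highest-priority ready action and re-arms periodic actions. It is hypothetical, not the code the authors' tools generate: it assumes non-preemptive actions that each take one time unit, integer release times and periods, and that a larger number means a higher priority.

```python
import heapq


class PriorityScheduler:
    """Non-preemptive dispatcher: one unit-time action per tick, highest priority first."""

    def __init__(self):
        self._pending = []  # (release_time, seq, name, priority, period)
        self._ready = []    # (-priority, seq, name)
        self._seq = 0       # unique tie-breaker: FIFO among equal priorities

    def submit(self, name, priority, release=0, period=None):
        if period is not None and period <= 0:
            raise ValueError("period must be positive")
        heapq.heappush(self._pending, (release, self._seq, name, priority, period))
        self._seq += 1

    def run(self, ticks):
        """Return the name executed at each tick (None when idle)."""
        trace = []
        for now in range(ticks):
            # Release everything due by `now`; periodic actions are re-armed.
            while self._pending and self._pending[0][0] <= now:
                release, seq, name, prio, period = heapq.heappop(self._pending)
                heapq.heappush(self._ready, (-prio, seq, name))
                if period is not None:
                    self.submit(name, prio, release + period, period)
            if self._ready:
                trace.append(heapq.heappop(self._ready)[2])
            else:
                trace.append(None)
        return trace


if __name__ == "__main__":
    sched = PriorityScheduler()
    sched.submit("log", priority=1)
    sched.submit("alarm", priority=5)
    sched.submit("sample", priority=3, period=2)   # periodic action
    print(sched.run(4))
```

A real target would map these actions onto TinyOS tasks or platform threads; the heap ordering only captures the "most critical first" rule and the periodic re-release.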
[Fig. 1 diagram: the application and the service model produce a service-based application (application plus scheduler), which is mapped to middleware and applications on TinyOS (nesC), the Java Virtual Machine (Java), or the .NET Micro Framework (.NET)]
 
Fig. 1 – The development methodology 
Fig. 2 shows a graphical example of a service-composition scheme. Optional ports are drawn in a dark color, and the ports and the service implementation are connected by dash-dotted lines. As the figure shows, the service provided by node A requires the ports offered by nodes B and C, while the port offered by node D is optional. The service implementation depends on the node characteristics and the supported technology.
A. Node Definition 
As mentioned previously, a node can be an actor or a sensor. To define a node, we use a template for the mote/actor being modeled that specifies which services it publishes and in which groups.
 
 
Mote template MoteType1(i: Id, loc: Location){ 
 Id = i; 
 Location = loc; 
 Publish Service1, Service2 in group GroupA(loc); 
 Publish Service1, Service2 in group GroupB(loc); 
} 

The code above shows the definition of a mote called MoteType1 with two parameters: an identifier (Id) that distinguishes the motes/actors within a network and a location (Location) that indicates where the mote/actor is placed. Finally, two services (Service1 and Service2) are published in two different groups (GroupA and GroupB).
To create a mote/actor based on this template, we use the Create primitive:

Create mote MoteType1(1, "Room A"); 

This code creates a mote whose identity is 1, placed in Room A.

[Fig. 2 diagram: the service of node A uses ports provided by nodes B and C and, optionally, a port provided by node D; legend: Node, Service, Service Implementation, Port]
Fig. 2 – Service-composition scheme 
B. Group Definition 
The group concept adds a new level of abstraction to join nodes with common restrictions. The model allows group constraints such as Cardinality, DeviceType and MinBatteryLevelRequired to be defined:
· Cardinality: We can specify how many sensors and actors must belong to a group. 
· DeviceType: It defines what type of element (actors, sensors, or both) belongs to the group. 
· MinBatteryLevelRequired: It indicates the minimum battery level that a sensor or actor must have to belong to a group. 

Group MyGroup(loc :Location){ 
 Location = loc; 
 Cardinality = (mote, 3-6) (actor, 1-1); 
 DeviceType = "actor" or "mote"; 
 MinBatteryLevelRequired = 10.0; 
} 

The previous code shows the specification of a group where the parameter Location is mandatory. In addition, this group must have between three and six motes and exactly one actor, and each sensor or actor trying to join the group must have a minimum battery level of ten. The model defines the Create Group template to create the group:

This code will create a group of type MyGroup placed in Room A.
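The constraints of MyGroup can be read as admission rules. The sketch below is one hypothetical interpretation, not the model's defined semantics: a node may join if its type is allowed, its battery level is at least MinBatteryLevelRequired, and the Cardinality upper bound for its type is not exceeded, while the lower bounds are checked separately to tell whether the group is complete. Class and method names are ours.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    kind: str        # "mote" or "actor"
    battery: float   # same units as MinBatteryLevelRequired


@dataclass
class Group:
    """Admission rules mirroring the MyGroup example."""
    cardinality: dict = field(default_factory=lambda: {"mote": (3, 6), "actor": (1, 1)})
    device_types: tuple = ("actor", "mote")
    min_battery: float = 10.0
    members: list = field(default_factory=list)

    def _count(self, kind):
        return sum(1 for m in self.members if m.kind == kind)

    def join(self, node):
        """Admit node if its type, battery level and the upper bound allow it."""
        if node.kind not in self.device_types or node.battery < self.min_battery:
            return False
        _, upper = self.cardinality.get(node.kind, (0, float("inf")))
        if self._count(node.kind) >= upper:
            return False
        self.members.append(node)
        return True

    def is_complete(self):
        """True once every Cardinality lower bound is met as well."""
        return all(lo <= self._count(kind) <= hi
                   for kind, (lo, hi) in self.cardinality.items())
```

Separating admission from completeness lets nodes join one by one while the group is still below its minimum size.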
 
C. Port Declaration 
Ports are the access points to node (mote or actor) services, as well as the mechanism used to compose the services offered to the system by the nodes. Within a port definition, developers can define commands (synchronous and asynchronous) and events, which are asynchronous communication mechanisms. Ports are global definitions for all services, and therefore no implementation details are associated with them. In this way, different services offering a specific port can be implemented in different ways: different execution platforms, programming languages, algorithms, etc. A simple port is defined below:
 
 
MyPort defines a synchronous command named GetValue. 
It takes one output parameter, an asynchronous command 
named SetValue, which passes one input parameter, and an 
event called High.  
D. Service Definition 
A service definition is composed of: firstly, a service 
description followed by the ports that will take part in the 
service. The port classification is divided into three types: 
provided ports, required ports and optional ports: 
· Provided ports: Ports that the service offers to other 
services. The rest of the nodes can query this service 
through the provided ports. 
· Required ports: Ports that a service needs in order to 
be executed. Other nodes in the network must offer 
these services. 
· Optional ports: These ports are associated with non 
crucial services, e.g. ports used to monitor the system. 
In the example below MyService provides Myport and 
requires OtherNodePort. 
 
 
E. Including Real Time Constraints 
Previous sections details how to build services based on 
applications but the model allows the developer the possibility 
of defining real-time restrictions for each command and event 
defined in the ports. These restrictions are specified by the 
keywords Priority, Deadline and Period. 
· Priority allows us to define the priority of the 
commands and events defined in the service. It allows 
us to establish an order, attending to the most critical 
service first. 
· The Deadline is established at the request of the 
service. It will allow the runtime system to analyze 
wether the requested service has been executed meeting 
the deadlines. 
·     The Period is defined in the events declared in the 
services but the model allows us to define a periodic 
request of the event. The period of the service 
establishes the minimum execution period. The run 
time system must analyze if the reception period of the 
event requested can be carried out.  
 
The code below represents a service definition with timing 
constraints; firstly, we write a brief description, then, we 
define that the service provides a port (MyPort) and only one 
port is required (OtherNodePort). In this example, the service 
has no optional ports. 
 
The GetValue command will be executed with a priority of 
5 and the High event has an execution period of one-second 
and a priority of 10. The priority concept will allow us to solve 
the simultaneous execution of GetValue or the High event. 
 
From the point of view of the service request, a GetValue 
command is required but it must be executed before 1000 ms, 
SetValue does not have timing requirements and the High 
event is requested within a period of 2000 ms. As we can see 
in the service definition, it can be executed with a minimum 
period of 1000 ms so the run time system can be designed to 
adapt the event reception to the timing constraints required. 
III. AN EXAMPLE 
In this section a brief example is presented to illustrate the use 
of different model templates. The example constitutes a real 
time WSAN to monitor and control fire in a building. We 
require four node types: the FireDetect node is an actor that 
is able to detect a possible fire and then act accordingly. The 
TemperatureSensor and SmokeSensor nodes provide 
temperature and smoke level measurements respectively. The 
monitor node (not specified in the example) receives data 
Create Group MyGroup(“Room A”); 
 
Service MyService2 { 
 Description = “Example service2”; 
 Requires MyPort with constraints { 
  GetValue deadline 1000; 
  SetValue; 
  High Period 2000;   
 } 
} 
} 
Service MyService { 
 Description = “Example service”; 
 Provides MyPort with constraints { 
  GetValue Priority 5; 
  SetValue; 
  High Period 1000, Priority 10;   
 } 
 Requires OtherNodePort; 
} 
 
Port MyPort { 
 CommandSync GetValue(out value :int); 
 CommandAsync SetValue(in value: int); 
 Event High; 
} 
Service MyService { 
 Description = “Example service”; 
 Provides MyPort; 
 Requires OtherNodePort; 
} 
 
 about the actions carried out by the fire detector and updates 
the temperature and smoke threshold values used to trigger the 
corresponding events. In the proposed system four ports exist, 
one for each service, although a service can usually provide 
more than one. The smoke port (PSmoke) offers a command to 
obtain the smoke reading and raises an event when this reading 
is higher than a predetermined threshold. The temperature 
port (PTemperature) is defined in the same way. The system is 
composed of only one fire detector, but there could be one or 
more temperature/smoke sensors per room and any number of 
monitors. In order to provide the fire detection service, at least 
one temperature sensor and one smoke sensor must be available. 
Therefore, the temperature and smoke ports are required, while 
the monitoring ports are optional. 
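The resolution rule just described (the fire detection service can run only when every required port is offered by some available node, while optional ports never block it) can be sketched in Python. This is a minimal sketch under our own assumptions, not part of the model: ServiceSpec and missing_required_ports are hypothetical names, and PMonitor is a hypothetical port name, since the example does not name its monitoring port. A real runtime would also have to consider node location and group membership.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ServiceSpec:
    """Hypothetical in-memory form of a Service definition."""
    name: str
    provides: Set[str] = field(default_factory=set)
    requires: Set[str] = field(default_factory=set)
    optional: Set[str] = field(default_factory=set)


def missing_required_ports(service: ServiceSpec,
                           available: List[ServiceSpec]) -> Set[str]:
    """Return the required ports of `service` that no available service provides.

    Optional ports are deliberately ignored: their absence must not
    prevent the service from running.
    """
    offered: Set[str] = set().union(*(s.provides for s in available))
    return service.requires - offered


s_smoke = ServiceSpec("SSmoke", provides={"PSmoke"})
s_temp = ServiceSpec("STemperature", provides={"PTemperature"})
# "PMonitor" is a hypothetical port name used only for illustration.
s_fire = ServiceSpec("SFireDetection",
                     requires={"PTemperature", "PSmoke"},
                     optional={"PMonitor"})

print(missing_required_ports(s_fire, [s_smoke, s_temp]))  # set()
print(missing_required_ports(s_fire, [s_smoke]))          # {'PTemperature'}
```

An empty result means that the service can be composed from the nodes that are currently available.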
 
 
 
Following the service specifications, the priorities determine the 
execution order of simultaneous invocations of service commands 
or events executed on the same node. In the example, event 
SmokeHigh (priority 8) is the highest priority action. Additionally, 
the fire detection service requires the smoke data (SmokeData) 
with a period of 1000 ms, which SSmoke can deliver because it 
declares a minimum period of 500 ms for SmokeData. Finally, the 
fire detection service calls command GetTemperature, whose 
execution must complete within 2000 ms. 
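The following Python sketch shows how a runtime might apply these constraints. It orders simultaneous invocations on one node by priority, assuming (as the SmokeHigh example suggests) that a larger value means a higher priority. It accepts a requested period only if that period is no shorter than the provider's minimum execution period (subsection E), and it checks a deadline. Offer, Request, period_ok, deadline_met and dispatch_order are hypothetical names. The middleware that will execute the model is future work, so this is an illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Offer:
    """Constraints a providing service attaches to a command/event."""
    priority: Optional[int] = None
    period_ms: Optional[int] = None      # minimum execution period


@dataclass
class Request:
    """Constraints a requiring service attaches to a command/event."""
    period_ms: Optional[int] = None      # requested reception period
    deadline_ms: Optional[int] = None


def period_ok(offer: Offer, request: Request) -> bool:
    # A request is feasible if it asks for the event no more often than
    # the provider's minimum execution period allows.
    if offer.period_ms is None or request.period_ms is None:
        return True
    return request.period_ms >= offer.period_ms


def deadline_met(response_ms: float, request: Request) -> bool:
    return request.deadline_ms is None or response_ms <= request.deadline_ms


def dispatch_order(pending: List[str], offers: Dict[str, Offer]) -> List[str]:
    # Larger priority value first; members without a priority go last
    # (an assumption, the model does not say).
    def key(member: str) -> int:
        prio = offers[member].priority
        return -1 if prio is None else prio
    return sorted(pending, key=key, reverse=True)


s_smoke = {"GetSmoke": Offer(priority=6),
           "SmokeHigh": Offer(priority=8),
           "SmokeData": Offer(priority=7, period_ms=500)}

print(dispatch_order(["GetSmoke", "SmokeData", "SmokeHigh"], s_smoke))
# ['SmokeHigh', 'SmokeData', 'GetSmoke']
print(period_ok(s_smoke["SmokeData"], Request(period_ms=1000)))  # True
print(deadline_met(1500, Request(deadline_ms=2000)))             # True
```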
 
The service specifications define the service architecture, but it is 
also necessary to define the actors and sensors that will provide the 
services. This is done by means of the Create templates. 
On the actor side, the application creates the actor that 
provides the service: an actor FireDetect is created with 
identifier 1 and location "Building1". 
 
 
On the sensor side, the application creates two services 
(through the SmokeSensor and TemperatureSensor templates) if 
the sensor is able to obtain both smoke and temperature data. 
 
 
IV. CONCLUSIONS AND FUTURE WORK 
In this paper we have presented a novel service-based 
programming model for real-time WSANs. It allows us to 
define the services and their interactions in order to build 
applications in a platform-independent way. Real-time 
constraints such as priorities, periodic actions or deadlines 
can be specified by means of the model. As future work, a 
middleware to execute the model semantics will be developed. 
A prototype targeting the .NET and Java platforms is under 
development. 
REFERENCES 
[1] Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E., Wireless 
Sensor Networks: A Survey. Computer Networks Journal, 38, 4 (2002), 
pp. 393-422. 
[2] Sensor Networks Applications. Special Issue of IEEE Computer, 37, 8 
(2004), pp. 50-78. 
[3] Wireless Sensor Networks. Special Issue of Communications of the 
ACM, 47, 6 (2004). 
[4] Akyildiz, I.F., Kasimoglu, I.H., Wireless Sensor and Actor Networks: 
Research Challenges. Ad Hoc Networks J., 2, 4 (2004), pp. 351-367. 
[5] http://www.sunspotworld.com/ 
[6] http://www.xbow.com/Products/productdetails.aspx?sid=253 
[7] A. Rezgui, M. Eltoweissy. Service-oriented sensor-actuator networks: 
Promises, challenges, and the road ahead. Computer Communications 
30 (2007), pp. 2627-2648. 
[8] M. Kushwaha et al. OASiS: A Programming Framework for 
Service-Oriented Sensor Networks. Communication Systems Software and 
Middleware, 2007, pp. 1-8. 
[9] J. Prinsloo et al. A Service Oriented Architecture for Wireless Sensor 
and Actor Network Applications. SAICSIT 2006, pp. 145-154. 
[10] J. King et al. Atlas: a service oriented sensor platform: hardware and 
middleware to enable programmable pervasive spaces. Proceedings of the 
31st IEEE Conference on Local Computer Networks, 2006, pp. 630-638. 
Mote template SmokeSensor (id: Id, l: Location) { 
… 
Publish SSmoke; 
} 
Mote template TemperatureSensor (id: Id, l: Location) { 
… 
Publish STemperature; 
} 
Actor template FireDetect (id: Id, l: Location) { 
Publish SFireDetection; 
} 
Port PSmoke { 
CommandSync GetSmoke(out d: float; out id: ID); 
Event SmokeHigh; 
Event SmokeData; 
} 
Port PTemperature { 
CommandSync GetTemperature(out d: float; out id: ID); 
Event TempHigh; 
} 

Service STemperature { 
 Description = "Temp. Data"; 
 Provides PTemperature with constraints { 
  GetTemperature priority 4; 
  TempHigh priority 5; 
 } 
} 
Service SSmoke { 
 Description = "Smoke Data"; 
 Provides PSmoke with constraints { 
  GetSmoke priority 6; 
  SmokeHigh priority 8; 
  SmokeData priority 7, period 500; 
 } 
} 
Service SFireDetection { 
 Description = "Detect fire"; 
 Requires PTemperature with constraints { 
  GetTemperature deadline 2000; 
  TempHigh period 2000; 
 } 
 Requires PSmoke with constraints { 
  SmokeData period 1000; 
  SmokeHigh; 
 } 
} 
Application { 
 … 
 Create sensor SmokeSensor(1, "Building1"); 
 Create sensor TemperatureSensor(1, "Building1"); 
 … 
} 

Application { 
 … 
 Create actor FireDetect(1, "Building1"); 
 … 
} 
 
Relaxing Event Densities by Lower Bounds on
Event Streams
Steffen Kollmann, Karsten Albers and Frank Slomka
Embedded Systems / Real-Time Systems
University Ulm
{first name.second name}@uni-ulm.de
Abstract—The regular execution of tasks, e.g. sensor data
acquisition, is common in embedded systems. Hence it is possible
to determine not only the minimal distance at which events can
occur but also the maximal distance between events. In this paper
we show, for fixed-priority systems, how a lower bound on the
stimulation can be used to improve the minimal distance
between events. Taking this into account relaxes the worst-case
response times of tasks in distributed real-time systems and leads
to a more accurate schedulability analysis.
I. INTRODUCTION
Some embedded systems must interact with their environment
at specific time intervals; sensor data acquisition and control
loops are examples of such cases. For such systems it is possible
to determine not only a maximum density of stimulation but
also a minimum density.
Figure 1 shows that considering minimal distances can improve
the density of outgoing events. The CPU executes the tasks τx
and τy, and we assume that τx is executed strictly periodically.
The right part of the figure shows the improved distance between
two events of τy. In the upper part, the minimal stimulation is
not considered, as is usual in real-time analysis; in the lower part
it is considered, which leads to a relaxed outgoing event stream.
[Fig. 1: a CPU executing τx (priority high) and τy (priority low) with event streams Θ1 to Θ4; execution timelines of τx and τy with the minimal stimulation "not considered" and "considered", with the improvement marked]
Fig. 1. Improvement of the event density by considering minimal event
streams
The rest of the paper is organized as follows. Section II gives
a short overview of related work in this area. The model is
explained in Section III. Section IV explains the calculation of
event streams in distributed systems when minimal stimulation
is considered. An example is given in Section V, and Section VI
concludes the paper.
II. RELATED WORK
Some models consider lower bounds of event sequences that
can occur in a system. Such models are for example the real-
time calculus [7] or the periodic task model with jitter [5].
However, only very few contributions make use of these bounds.
The transaction model [6], for example, does not use the lower
bounds.
The real-time calculus provides no means to consider the
best case execution time of the task under analysis when
calculating the upper request curve. Only the lower service curve
includes the minimal occurrence of the higher priority tasks.
This shortcoming stems from the fact that the real-time calculus
cannot distinguish between best case and worst case execution
times.
Redell, for example, shows in [4] how a best case response
time can be calculated when lower bounds of the stimulation
are considered. In this paper we also exploit the lower bound
of the stimulation in order to improve the maximum density of
events in a system. For this, we adapt Redell's approach to the
event stream model [2] and extend it in order to improve the
calculation of event streams in distributed systems.
III. MODEL
In this section we introduce our models. We differentiate
between the task model and the model for the stimulation.
A. Task Model
Let Γ = {τ1, ..., τn} be the set of tasks on one resource. A
task is a tuple τ = (c, b, d, ρ, Θ̄, Θ̲), where c is the worst case
execution time, b the best case execution time, d the deadline,
ρ the scheduling priority (the lower the number, the higher
the priority), Θ̄ the maximum stimulation (maximum density of
events) and Θ̲ the minimum stimulation (minimum density of
events). Let τij be the j-th job (execution) of task τi.
In our model we assume that a task can only generate
an event at the end of its execution to notify other tasks.
Furthermore, we assume fixed-priority scheduling.
B. Maximum Event Streams
Event streams were first defined in [2]. The purpose was
to give a generalized description of every kind of stimulus.
The basic idea is to define an event function E(I, Θ) that
calculates, for every interval I, the maximum number of
events occurring within I. In the following, when speaking
of intervals we mean the length of the interval. The event
function needs a properly described underlying model from
which this information can easily be extracted. The idea is to
note, for each number of events, the minimum interval that can
include this number of events. We therefore obtain an interval
for one event (which is infinitely small and therefore considered
to be zero), for two events, and so on. The result is a
non-decreasing sequence of intervals: the minimum interval for
n events cannot be smaller than the minimum interval for n−1
events, since the former also includes n−1 events. This sequence
of intervals shows a periodic behaviour and is called an event
stream. Each of the single intervals is called an event stream
element.
Definition 1: A maximum event stream is a set of event
stream elements Θ̄ = {θ1, θ2, ..., θn}, where each event stream
element θ = (p, a) consists of an offset interval a and a period
p. The maximum event stream satisfies the characteristic
E(I1 + I2, Θ̄) ≤ E(I1, Θ̄) + E(I2, Θ̄).
The characteristic of the maximum event stream is called
sub-additivity. This means that the maximum number of events
of an interval cannot exceed the cumulated maximum number
of events of its subintervals.
Each event stream element describes a set of intervals of the
sequence: for an event stream element θ, the intervals
a_θ + k·p_θ with k ∈ ℕ are part of the sequence. An event
stream models a given sequence if all the elements, and only
the elements, of the sequence can be generated using the event
stream elements. It is therefore possible to calculate, for each
interval, the maximum number of events that can occur within
this interval:
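Event Stream Function:
$$E(I,\Theta) = \sum_{\theta \in \Theta} E(I,\theta), \qquad E(I,\theta) = \begin{cases} 0 & I < a_\theta \\ \left\lfloor \dfrac{I - a_\theta}{p_\theta} \right\rfloor + 1 & I \ge a_\theta \wedge p_\theta < \infty \\ 1 & I \ge a_\theta \wedge p_\theta = \infty \end{cases} \tag{1}$$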
As the inverse function, we define the following function, which
gives for a number of events n the minimum interval in which
these events can occur:
Request Time Function:
$$RT(n,\Theta) = \min\{\, I \mid E(I,\Theta) = n \,\} \tag{2}$$
With an infinite (∞) period it is possible to model irregular
behaviour. A detailed definition of the concept and the
mathematical foundation can be found in [1].
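Equations (1) and (2) can be evaluated directly. The Python sketch below represents an event stream as a list of (p, a) pairs, with math.inf standing for an infinite period. It computes RT(n) as the n-th smallest interval a + k·p, using a heap. One assumption departs from Eq. (2): the sketch returns the smallest I with E(I, Θ) ≥ n rather than = n. Where E jumps by more than one, as for Θ̄3 at I = 0, no interval satisfies the equality exactly. The numbers p = 10, j = 3 and t = 4 are chosen only for illustration.

```python
import heapq
import math
from typing import List, Tuple

INF = math.inf
Stream = List[Tuple[float, float]]  # event stream elements as (period p, offset a)


def event_function(interval: float, stream: Stream) -> int:
    """E(I, Theta) of Eq. (1): maximum number of events in an interval of length I."""
    total = 0
    for p, a in stream:
        if interval < a:
            continue                        # element contributes no event yet
        if p == INF:
            total += 1                      # aperiodic element: exactly one event
        else:
            total += math.floor((interval - a) / p) + 1
    return total


def request_time(n: int, stream: Stream) -> float:
    """Smallest I with E(I, Theta) >= n, i.e. the n-th smallest value a + k*p."""
    heap = [(a, p) for p, a in stream]
    heapq.heapify(heap)
    for _ in range(n - 1):
        if not heap:
            raise ValueError("stream never produces n events")
        a, p = heapq.heappop(heap)
        if p != INF:
            heapq.heappush(heap, (a + p, p))  # next interval of this element
    if not heap:
        raise ValueError("stream never produces n events")
    return heap[0][0]


# Figure 2 examples with p = 10, j = 3, t = 4 (illustrative values).
theta2 = [(INF, 0), (10, 7)]                    # {(inf, 0), (p, p - j)}
theta3 = [(10, 0), (10, 0), (10, 0), (10, 4)]   # {(p, 0), (p, 0), (p, 0), (p, t)}
print([request_time(n, theta2) for n in range(1, 5)])     # [0, 7, 17, 27]
print(event_function(0, theta3), request_time(4, theta3))  # 3 4
```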
Fig. 2. This figure shows three different event sequences
Figure 2 shows some examples of event streams. The first one,
Θ̄1 = {(p, 0)}, is a strictly periodic stimulus with period p. The
second example, Θ̄2 = {(∞, 0), (p, p−j)}, is a periodic stimulus
whose single events can jitter within a jitter interval of size j.
In the third example, Θ̄3 = {(p, 0), (p, 0), (p, 0), (p, t)}, three
events occur at the same time and the fourth occurs after a time
t; this pattern is repeated with a period of p. Event streams can
describe all these examples in an easy and intuitive way.
C. Minimal Event Streams
Analogously, we define minimal event streams, which describe
for every interval I the minimum stimulation in such an
interval.
Definition 2: A minimum event stream is a set of event
stream elements Θ̲ = {θ1, θ2, ..., θn}, where each event stream
element θ = (p, a) consists of an offset interval a and a period
p. The minimum event stream satisfies the characteristic
E(I1 + I2, Θ̲) ≥ E(I1, Θ̲) + E(I2, Θ̲).
This characteristic of the minimum event stream is called
super-additivity. It means that the minimum number of events
in an interval is at least the cumulated minimum number of
events of its subintervals.
Furthermore, the following lemma applies to minimal event
streams.
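Lemma 1: For a minimum event stream, aperiodic events
occurring independently of the remaining event stream can be
ignored.
Proof: Let us assume that an aperiodic element (∞, a) exists
in a minimal event stream Θ̲ and that it is the last aperiodic
element:
$$\exists \underline{\Theta},\ \exists \theta = (\infty, a) : \big(\theta \in \underline{\Theta} \wedge \neg\exists \theta' \in \underline{\Theta} : a_\theta < a_{\theta'} \wedge p_{\theta'} = \infty\big)$$
$$\Rightarrow \exists I_1, I_2 \in \mathbb{R} : \big(I_1 > a_\theta \wedge I_2 = I_1 + I_1\big)$$
$$\Rightarrow E(I_1) > E(I_2) - E(I_1)$$
$$\Leftrightarrow E(I_1 + I_1) < E(I_1) + E(I_1)$$
This contradicts the super-additivity required by Definition 2.
Since there is no last aperiodic event in a minimal event stream,
it follows that no aperiodic event exists in a minimal event stream.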
The examples in Figure 2 can be described by the following
minimal event streams: Θ̲1 = {(p, p)} for the first, Θ̲2 = {(p, p+j)}
for the second, and Θ̲3 = {(p, p−t), (p, p), (p, p), (p, p)} for the
third.
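Definitions 1 and 2 and Lemma 1 can be explored numerically. The Python sketch below searches integer intervals I1, I2 up to a chosen horizon for a counterexample to sub-additivity (maximal streams) or super-additivity (minimal streams). A finite grid can only find counterexamples; it cannot prove the property. The streams use the illustrative value p = 10. The last call adds a hypothetical aperiodic element (∞, 5) to a minimal stream and, in line with Lemma 1, finds a violation.

```python
import math
from itertools import product
from typing import List, Optional, Tuple

INF = math.inf
Stream = List[Tuple[float, float]]  # (period p, offset a)


def event_function(interval: float, stream: Stream) -> int:
    """E(I, Theta) of Eq. (1)."""
    total = 0
    for p, a in stream:
        if interval >= a:
            total += 1 if p == INF else math.floor((interval - a) / p) + 1
    return total


def additivity_violation(stream: Stream, horizon: int,
                         minimal: bool) -> Optional[Tuple[int, int]]:
    """Search integer I1, I2 <= horizon for a counterexample.

    minimal=True checks super-additivity (Definition 2),
    minimal=False checks sub-additivity (Definition 1).
    """
    for i1, i2 in product(range(horizon + 1), repeat=2):
        whole = event_function(i1 + i2, stream)
        parts = event_function(i1, stream) + event_function(i2, stream)
        if (minimal and whole < parts) or (not minimal and whole > parts):
            return i1, i2
    return None


print(additivity_violation([(10, 0)], 40, minimal=False))            # None
print(additivity_violation([(10, 10)], 40, minimal=True))            # None
print(additivity_violation([(10, 10), (INF, 5)], 40, minimal=True))  # (5, 10)
```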
IV. IMPROVED DENSITY BY MINIMAL EVENT STREAMS
We have introduced a task model and a model for the
stimulation. With these models we will show how it is possible
to determine the stimulation density in the whole system. For
this we have to determine when the worst case occurs.
Lemma 2: A number of outgoing events occur in the
maximum density when the first event is delayed as much as
possible and all further events occur as early as possible.
Proof: We assume that two outgoing events e1 and e2
exist that have a higher density than the events fulfilling the
assumption. If e1 and e2 are closer together than in the
assumption, then either e1 arrives later than allowed by the
assumption or e2 arrives earlier than allowed by the assumption.
This is a contradiction, because we already assume the maximum
and the minimum value, respectively, for both arrival times. So
there must be two other events later in the outgoing event
stream that have a shorter distance to each other. Assume that
two such events occur closer together than in the assumption,
with the first event delayed as much as possible and the second
arriving as early as possible; this would mean that the
corresponding incoming events also have a shorter distance
to each other than the first two incoming events. But this
contradicts the event stream definition. The proof for any
other number of events is analogous.
For the calculation we need the worst case response time
which determines the maximum delay of an event. Since we
have minimal event streams it is also possible to determine a
best case response time. So we first define the methodology
in order to determine these two response times.
A. Worst Case Response Time
The most common way to perform a real-time analysis is
a response time analysis as introduced by Lehoczky [3]. The
real-time analysis is successful if the condition
∀τ ∈ Γ: WCRT_k(τ) ≤ d_τ holds. In order to calculate the
worst case response time we have adapted the approach from
[3], where HP denotes the set of tasks with higher priority than τ:
$$WCRT_k(\tau) = \min\Big\{\, I \;\Big|\; I = k \cdot c_\tau + \sum_{\tau' \in HP} E(I, \overline{\Theta}_{\tau'}) \cdot c_{\tau'} \Big\} \tag{3}$$
The equation is similar to the common definition of the worst
case response time; only the calculation of the influence
of higher priority events has been changed. The amount of
execution produced by higher priority tasks is calculated
as the event function multiplied by the worst case execution
time. By means of a fixed-point iteration, the worst case
response time can be calculated for every k.
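The fixed-point iteration for Eq. (3) can be sketched in Python. Starting from I = k·c_τ, the sketch repeatedly evaluates the right-hand side until it stops changing, which yields the smallest fixed point. The cut-off `limit` is our own assumption to stop diverging iterations. For illustration the sketch uses the CPU1 parameters of Table I (Section V), where τ1 and τ2 have c = 4 and Θ̄ = {(12, 0)}. For τ3 (c = 14, d = 50) it yields 46, so τ3 meets its deadline.

```python
import math
from typing import List, Tuple

INF = math.inf
Stream = List[Tuple[float, float]]  # (period p, offset a)


def event_function(interval: float, stream: Stream) -> int:
    """E(I, Theta) of Eq. (1)."""
    total = 0
    for p, a in stream:
        if interval >= a:
            total += 1 if p == INF else math.floor((interval - a) / p) + 1
    return total


def wcrt(k: int, c: float, hp: List[Tuple[float, Stream]],
         limit: float = 10**6) -> float:
    """Smallest fixed point of Eq. (3).

    hp holds (c', maximal event stream) for every higher priority task.
    Returns math.inf if the iteration exceeds `limit` (an assumed cut-off).
    """
    interval = k * c
    while interval <= limit:
        nxt = k * c + sum(event_function(interval, s) * c_hp for c_hp, s in hp)
        if nxt == interval:
            return interval
        interval = nxt
    return INF


theta_max = [(12, 0)]                                      # Theta_A = Theta_B
print(wcrt(1, 4, []))                                      # tau1: 4
print(wcrt(1, 4, [(4, theta_max)]))                        # tau2: 8
print(wcrt(1, 14, [(4, theta_max), (4, theta_max)]))       # tau3: 46
```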
B. Best Case Response Time
In addition to the worst case response time, a best case
response time can be determined, since minimal event streams
are available. For this we have adapted the best case response
time from Redell [4]:
$$BCRT(\tau) = \min\Big\{\, I \;\Big|\; I = b_\tau + \sum_{\tau' \in HP} E(I, \underline{\Theta}_{\tau'}) \cdot b_{\tau'} \Big\} \tag{4}$$
The equation adds to the best case execution time of task τ
the best case execution times of the higher priority tasks. How
many execution times are added depends on the minimal event
streams of the higher priority tasks. As with the worst case
response time, the best case response time can be found by
a fixed-point iteration. For a detailed description see [4].
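A Python sketch of Eq. (4) as printed follows. It computes the smallest fixed point, iterating upward from b_τ with the minimal streams of the higher priority tasks. The iteration scheme of [4] itself is described there, and this sketch does not claim to reproduce it. The cut-off `limit` is an assumption. With the Table I values for τ3 (b = 13) and τ1, τ2 (b = 4, Θ̲ = {(12, 12)}), the result is 21.

```python
import math
from typing import List, Tuple

INF = math.inf
Stream = List[Tuple[float, float]]  # (period p, offset a)


def event_function(interval: float, stream: Stream) -> int:
    """E(I, Theta) of Eq. (1)."""
    total = 0
    for p, a in stream:
        if interval >= a:
            total += 1 if p == INF else math.floor((interval - a) / p) + 1
    return total


def bcrt(b: float, hp: List[Tuple[float, Stream]],
         limit: float = 10**6) -> float:
    """Smallest fixed point of Eq. (4): I = b + sum E(I, min stream) * b'.

    hp holds (b', minimal event stream) for every higher priority task.
    Returns math.inf if `limit` (an assumed cut-off) is exceeded.
    """
    interval = b
    while interval <= limit:
        nxt = b + sum(event_function(interval, s) * b_hp for b_hp, s in hp)
        if nxt == interval:
            return interval
        interval = nxt
    return INF


theta_min = [(12, 12)]                              # minimal stream of tau1, tau2
print(bcrt(13, [(4, theta_min), (4, theta_min)]))   # tau3: 21
```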
C. Calculation of Outgoing Event Streams
To calculate the density of the outgoing events, we define an
interval function, which gives for a number of events n the
minimum interval in which they can occur. RET(n, τ) denotes
the request end time of the n-th event:
$$I(n,\tau) = \begin{cases} 0 & n = 1 \\ RET(n,\tau) - RET(1,\tau) & n > 1 \end{cases} \tag{5}$$
$$RET(n,\tau) = \begin{cases} WCRT_1(\tau) & n = 1 \\ BC\big(\tau, RT(n,\tau), RET(n-1,\tau), WCRT_1(\tau), BCRT(\tau), \Gamma^{\tau}_{HP}\big) & n > 1 \end{cases} \tag{6}$$
According to the event stream definition, one event always
occurs in an interval of length zero. Hence, equation 5
distinguishes two cases. The first case describes the interval
for one event, which is always zero according to the event
stream definition. All other numbers of events are covered by
the second case via the request end times.
1  BC(τ, RT_n, RET_{n−1}, WCRT_τ, BCRT_τ, Γ_HP) {
2    C_START = max(RT_n, RET_{n−1});
3    RET_new = C_START + BCRT_τ;
4    while (true) {
5      B = b_τ;
6      for (∀τ′ ∈ Γ_HP) {
7        ΔI = RET_new − (WCRT_τ − c_τ′);
8        B = B + E(ΔI, Θ̲_τ′) · b_τ′;
9      }
10     ΔJ = RET_new − WCRT_τ;
11     if (B > ΔJ) {
12       RET_new = WCRT_τ + B;
13     }
14     else {
15       return RET_new;
16 } } }
Fig. 3. Calculation of the improved Best Case Response Time
To explain the calculation for more than one event, we use
Figure 4.
[Fig. 4: four timeline panels, 1) to 4), each showing τx and τy ordered by priority, with job executions c_x,1, c_x,2, c_x,3, b_x,4, b_x,5, c_y,1, b_y,2 and the quantities WCRT, ΔI and ΔJ]
Fig. 4. Improvement of the event density considering minimal event streams
According to Lemma 2, the first event is delayed as much
as possible. This delay is given by the worst case response
time of the first job, so the first calculation is the WCRT of
instance one, as in Figure 4 (part 1); this is the case n = 1 in
equation 6.
The following events must occur as early as possible. This
happens when the task runs with its best case execution time
and each job starts as early as possible. For this calculation we
use the algorithm depicted in Figure 3. In line 2 we determine
when the calculation can start: at the later of the request time
of the n-th event, RT_n, and the request end time of the previous
event, RET_{n−1}. From this point in time we add the best case
response time determined by Redell's approach (line 3), as shown
in Figure 4 (part 2).
The next step is to determine whether further preemptions
by higher priority tasks can occur. For every higher priority
task we therefore determine an interval (ΔI) from the last
possible stimulation of the task within the worst case response
time up to the end of the best case response time (see Figure 4,
part 3). This is done in line 7.
Line 8 adds the minimum execution demand of one higher
priority task within this interval to B, which starts at b_τ
(line 5). If the execution demand B of all tasks is greater than
the interval ΔJ computed in line 10, the best case response time
is relaxed to WCRT_τ + B (line 12; see Figure 4, part 4).
Otherwise the best case response time is not changed and is
returned (line 15). This step is repeated until the best case
response time no longer changes; the while-loop is thus
equivalent to a fixed-point iteration.
Once the request end time of the n-th event has been determined,
the minimal interval for n events is the request end time of the
n-th event minus the request end time of the first event (see
equation 5).
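Equations (5) and (6) and the algorithm of Fig. 3 can be combined into a short Python sketch; comments map the code to the numbered pseudocode lines. The `improved=False` switch is our own addition: it keeps only lines 2 and 3 and skips the while-loop. The inputs come from Table I for τ3 (b = 13, stimulated by {(30, 0)}, so RT(n) = (n − 1)·30) and its higher priority tasks τ1 and τ2. The values WCRT_1 = 46 and BCRT = 21 are our own evaluations of Eqs. (3) and (4). With these inputs, the sketch returns 0, 29, 50, 71, 95, the ΘF1 column of Table II. Without the loop it returns 0, 21, 42, 65, 95, which matches the ΘF2 column. The while-loop assumes convergence, as the pseudocode does.

```python
import math
from typing import Callable, List, Tuple

INF = math.inf
Stream = List[Tuple[float, float]]       # (period p, offset a)
HpTask = Tuple[float, float, Stream]     # (c', b', minimal event stream)


def event_function(interval: float, stream: Stream) -> int:
    """E(I, Theta) of Eq. (1)."""
    total = 0
    for p, a in stream:
        if interval >= a:
            total += 1 if p == INF else math.floor((interval - a) / p) + 1
    return total


def bc(rt_n: float, ret_prev: float, wcrt: float, bcrt: float,
       b_task: float, hp: List[HpTask]) -> float:
    """Fig. 3: request end time of the n-th event (n > 1)."""
    ret_new = max(rt_n, ret_prev) + bcrt             # lines 2-3
    while True:                                      # line 4
        demand = b_task                              # line 5
        for c_hp, b_hp, min_stream in hp:            # lines 6-9
            delta_i = ret_new - (wcrt - c_hp)
            demand += event_function(delta_i, min_stream) * b_hp
        delta_j = ret_new - wcrt                     # line 10
        if demand > delta_j:                         # lines 11-13
            ret_new = wcrt + demand
        else:                                        # lines 14-15
            return ret_new


def interval_function(n_max: int, rt: Callable[[int], float], wcrt: float,
                      bcrt: float, b_task: float, hp: List[HpTask],
                      improved: bool = True) -> List[float]:
    """Eqs. (5) and (6): I(n, tau) for n = 1..n_max."""
    ret = [wcrt]                                     # RET(1, tau) = WCRT_1(tau)
    for n in range(2, n_max + 1):
        if improved:
            ret.append(bc(rt(n), ret[-1], wcrt, bcrt, b_task, hp))
        else:                                        # lines 2-3 only
            ret.append(max(rt(n), ret[-1]) + bcrt)
    return [r - ret[0] for r in ret]


def rt(n: int) -> float:
    """RT(n) for the maximal stream {(30, 0)} of tau3."""
    return (n - 1) * 30


hp = [(4, 4, [(12, 12)]), (4, 4, [(12, 12)])]       # tau1, tau2 from Table I
print(interval_function(5, rt, 46, 21, 13, hp))                  # [0, 29, 50, 71, 95]
print(interval_function(5, rt, 46, 21, 13, hp, improved=False))  # [0, 21, 42, 65, 95]
```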
V. EXAMPLE
To show the significance of our approach, we use a short
example to demonstrate the improvement of the event density
in a distributed system. Figure 5 shows this example. We
calculate the density of events for ΘF and show its impact
on the response time of task τ4.
[Fig. 5: CPU1 executes τ1 (priority high), τ2 (priority middle) and τ3 (priority low); CPU2 executes τ6 (priority high), τ4 (priority middle) and τ5 (priority low); the tasks are connected by event streams ΘA to ΘH]
Fig. 5. Example of a distributed system
Table I lists the properties of the distributed
system.
CPU1   τ1    τ2    τ3
c      4     4     14
b      4     4     13
d      40    50    50
ρ      1     2     3
Θ̄      Θ̄A    Θ̄B    Θ̄C
Θ̲      Θ̲A    Θ̲B    Θ̲C

CPU2   τ4    τ5    τ6
c      31    2     9
b      15    1     5
d      55    60    40
ρ      2     3     1
Θ̄      Θ̄D    Θ̄E    Θ̄F
Θ̲      Θ̲D    Θ̲E    Θ̲F
TABLE I
PARAMETERS OF THE DISTRIBUTED SYSTEM WHICH IS DEPICTED IN
FIGURE 5
The maximum event streams are Θ̄A = {(12, 0)}, Θ̄B =
{(12, 0)}, Θ̄C = {(30, 0)} and Θ̄D = {(70, 0)}.
The minimum event streams are Θ̲A = {(12, 12)}, Θ̲B =
{(12, 12)}, Θ̲C = {(30, 30)} and Θ̲D = {(70, 70)}.
To show the improvement of the approach, we have calculated
the minimal intervals of the first five events of ΘF. Table II
compares the intervals obtained with the new approach, with
Redell's approach and without either approach, and Table III
shows the three resulting event streams of ΘF.
n   ΘF1   ΘF2   Impr.     ΘF1   ΘF3   Impr.
1   0     0     0%        0     0     0%
2   29    21    27.58%    29    13    55.17%
3   50    42    16%       50    27    46%
4   71    65    8.45%     71    57    19.71%
5   95    95    0%        95    87    8.42%
TABLE II
IMPROVEMENT OF THE APPROACH ON THE EVENT STREAMS.
ΘF1 GIVES THE INTERVALS WITH THE NEW APPROACH, ΘF2 THE
INTERVALS WITH REDELL'S APPROACH AND ΘF3 THE INTERVALS WITHOUT
ANY APPROACH. THE IMPROVEMENT IS GIVEN IN % RELATIVE TO ΘF1.
ΘF1 = {(∞,0),(∞,29),(∞,50),(∞,71),(30,95)}
ΘF2 = {(∞,0),(∞,21),(∞,42),(30,65)}
ΘF3 = {(∞,0),(∞,13),(30,27)}
TABLE III
RESULTS OF THE EVENT STREAMS WITH THE DIFFERENT APPROACHES.
With these event streams we can calculate the worst case
response time of task τ4. Without either approach it is 67 t.u.
(time units), with Redell's approach 58 t.u., and with the new
approach 49 t.u. This is an improvement of 15.51% over Redell's
approach and of 26.86% over the analysis without either approach.
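The three response times of τ4 can be reproduced with Eq. (3). By Table I, τ4 has c = 31, and on CPU2 only τ6 (c = 9, stimulated by ΘF) has a higher priority. The Python sketch below plugs in the three ΘF streams of Table III. It is our own re-computation, not the authors' tool. Comparing the results with τ4's deadline d = 55 from Table I shows that only the 49 t.u. bound satisfies WCRT ≤ d.

```python
import math

INF = math.inf


def event_function(interval, stream):
    """E(I, Theta) of Eq. (1); stream = list of (period p, offset a)."""
    total = 0
    for p, a in stream:
        if interval >= a:
            total += 1 if p == INF else math.floor((interval - a) / p) + 1
    return total


def wcrt(k, c, hp, limit=10**6):
    """Smallest fixed point of Eq. (3); hp = [(c', maximal stream), ...]."""
    interval = k * c
    while interval <= limit:
        nxt = k * c + sum(event_function(interval, s) * c_hp for c_hp, s in hp)
        if nxt == interval:
            return interval
        interval = nxt
    return INF


# Event streams of Theta_F from Table III.
theta_f = {
    "F1": [(INF, 0), (INF, 29), (INF, 50), (INF, 71), (30, 95)],  # new approach
    "F2": [(INF, 0), (INF, 21), (INF, 42), (30, 65)],             # Redell
    "F3": [(INF, 0), (INF, 13), (30, 27)],                        # no approach
}
for name, stream in theta_f.items():
    print(name, wcrt(1, 31, [(9, stream)]))   # F1 49, F2 58, F3 67
```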
VI. CONCLUSION
In this paper we have shown how to use lower bounds
of stimulation in order to improve the real-time analysis of
distributed systems. We have shown how the approach of
Redell [4] can be adapted and extended in order to improve
the calculation of event sequences. Furthermore we have
shown that this leads directly to more realistic response times
in the system. In the future we would like to develop an
efficient approach to calculate the maximum and minimum
event streams in the systems. Additionally, we will extend the
introduced approach further so we obtain tighter bounds in the
analysis. Another direction is to determine the actual occurrence
of the last events of the higher priority tasks during the worst
case execution time. This would lead to a greater interval ∆I .
An extension to dynamic scheduling is also an aim.
REFERENCES
[1] Karsten Albers and Frank Slomka. An event stream driven approximation
for the analysis of real-time systems. In ECRTS '04: Proceedings of the
16th Euromicro Conference on Real-Time Systems, pages 187–195. IEEE,
July 2004.
[2] Klaus Gresser. An event model for deadline verification of hard real-time
systems. In Proceedings of the 5th Euromicro Workshop on Real-Time
Systems, 1993.
[3] John P. Lehoczky. Fixed priority scheduling of periodic task sets with
arbitrary deadlines. In Proceedings of the 11th IEEE Real-Time Systems
Symposium, pages 201–209, December 1990.
[4] Ola Redell and Martin Sanfridson. Exact best-case response time analysis
of fixed priority scheduled tasks. In ECRTS ’02: Proceedings of the 14th
Euromicro Conference on Real-Time Systems, page 165, Washington, DC,
USA, 2002. IEEE Computer Society.
[5] Kai Richter. Compositional Scheduling Analysis Using Standard Event
Models - The SymTA/S Approach. PhD thesis, University of Braun-
schweig, 2005.
[6] Ken Tindell. Adding time-offsets to schedulability analysis. Technical
report, University of York, Computer Science Dept, YCS-94-221, 1994.
[7] Ernesto Wandeler. Modular Performance Analysis and Interface-Based
Design for Embedded Real-Time Systems. PhD thesis, ETH Zurich,
September 2006.
Towards a Practical WCET Analysis Approach
Based on Testing
Thomas Lundqvist
Dept. of Computer Science and Engineering
Chalmers University of Technology
SE-412 96 Göteborg, Sweden
thomas.lundqvist@chalmers.se
Patrik Sandin
Saab Space AB
SE-405 15 Göteborg, Sweden
patrik.sandin@space.se
Abstract—Analyzing the worst-case execution time, the WCET,
of a program or task is an important activity when constructing
hard real-time systems. Traditional techniques like testing and
measurements now face problems due to the introduction of cache
memories. This paper presents a new approach that enhances the
traditional testing methodology with different analysis methods.
The safety of the estimate for each individual program path is
guaranteed by the use of a safety margin. Furthermore, the approach provides help
for the tester to find the critical paths to measure. The approach
is demonstrated for a processor containing an instruction cache.
The results indicate that this promises to be a simple and practical
approach that still can result in low overestimation of the WCET.
I. INTRODUCTION
The determination of the maximum or worst-case execution
time, WCET, of a program or task is an important prerequisite
when verifying response times in hard real-time systems [1].
Traditionally, the WCET of tasks has been estimated by the
use of measurement and testing techniques. By running a
program with different inputs while measuring the execution
time, the WCET can be estimated. A testing methodology
cannot guarantee a safe estimate, i.e., the actual WCET can
be underestimated. Nevertheless, measurements can work well
in practice since manual inspection of a program often reveals
the test cases needed to provoke the worst-case behaviour.
This testing methodology, traditionally used in industry,
is now facing problems. Increasingly, microprocessors with
cache memories are being introduced in hard real-time systems
in order to increase performance by reducing the average
memory access latency. One example is the LEON processor
core [2], [3], which in its later versions contains both an
instruction and a data cache memory. Cache memories introduce
new timing dependencies between previously unrelated parts
of the program. These dependences are often nonintuitive and
make it harder to rely on manual inspection to derive worst-
case test cases.
From a testing point of view, cache memories introduce
two new sources of uncertainties. The first uncertainty is that
the execution time of a single program path can vary even
when testing with the same input data. The reason is that the
number of cache misses depends on the initial state of the
cache. This initial state can be hard to control and observe,
yet being able to control and observe it is fundamental to
creating reliable tests. The second uncertainty is that the
execution of certain program paths can trigger conflict misses
that lead to a large, nonintuitive increase in the execution time.
These program paths might not appear to be interesting from a
manual inspection point of view but may still be the paths that
cause the longest execution time due to the extra conflict misses.
This work has been supported by the NRFP (Swedish National Space
Board) project 53/07.
To restore faith in the testing methodology, we are working
on a WCET estimation approach that complements testing
with analysis. Our approach is to attack the two sources of
cache-related uncertainties by using different analysis meth-
ods:
• Single path estimates should be made safe by adding
a safety margin to the measured WCET. Ideally, the
tightness of the margin (the overestimation) should be
controllable by being able to use a range of analysis
methods of different complexity. Then, when a large
margin might be acceptable, a simple analysis would
suffice.
• The tester should be assisted in finding untested dangerous
program paths. An analysis should give warnings
or information about possible cache conflicts to help the
tester cover nonintuitive but important program paths.
By eliminating the two sources of uncertainty using relatively
simple analysis methods, we hope to restore the same
level of confidence in our WCET estimates as we had before
cache memories were introduced.
Previous research in WCET analysis [1] has produced a
rich variety of analysis approaches. Our approach shares many
ideas with other measurement-based approaches [4]–[8]. An
important difference, however, is that other methods strive for
automation in path analysis or test input generation using static
analysis methods. Our approach instead relies on the tester
to assure program path coverage. This, we believe, will
result in a simpler and more useful overall approach. Another
difference is that many other methods [4], [6], [8] do not
include a safety margin that guarantees a safe timing analysis,
which can lead to unsafe estimates. One notable exception
is [5], where the authors avoid the timing analysis uncertainty by
carefully controlling the hardware. In [7], the authors propose an
approach similar to ours: complementing measurement with
analysis to establish that major uncertainties are covered.
However, their method relies on more complex probabilistic
calculations. We simply use the measured execution time plus
a safety margin to obtain a WCET estimate.
The goal of our approach is to create a range of analysis
methods for handling both instruction and data caches. Since
this is work in progress, the purpose of this paper is to
present the basic ideas. To illustrate these ideas, a simple
processor architecture containing an instruction cache will be
used. In the next section (Section II), we introduce a small
example program to illustrate the two sources of uncertainty
when trying to measure the WCET during testing. Then, in
Sections III and IV, we explain how our approach can help to
restore confidence in the measured WCET estimates.
II. TESTING AND CACHE MEMORIES
To illustrate the problem with testing, we now focus only
on instruction caching and use the example program in Fig. 1.
This program consists of a function a(), which calls three
other functions: b(), c(), and big(). The inputs to a()
are the boolean variables x and special and the boolean
vector v[]. These input variables control which program path
is executed and thereby which instructions get
fetched via the instruction cache.
For our example we assume that the program is run on
an idealized processor with pipelined instruction execution
and a direct-mapped 4 KiB instruction cache, and no data
cache. Each machine instruction executes with a constant,
fixed latency in the pipeline. A cache miss stalls the execution
by a fixed cache miss penalty of 5 clock cycles. All data
defining the instruction cache can be found in Fig. 2. This
figure also shows how the linker has placed the functions in
memory and how big each function is in terms of memory
(cache) blocks. An important observation for our example is
that the two functions b() and c() map to the same location
in the instruction cache. This is a potential source of cache
conflict misses as we will see later.
We will now look at what happens when we measure the
execution time of this program by testing different input data.
Table I shows how the real execution time varies depending on
the input we use. For example, test case 1 makes the program
call function b() before the loop is entered as well as inside
each loop iteration. The real execution time varies between
4140 clock cycles and 4210 clock cycles due to cold misses in
the instruction cache. If the cache already contains the needed
memory blocks at the beginning of the execution, we get 4140
clock cycles; if the cache is empty, we get 4210 clock cycles.
The difference of 70 cycles corresponds to 14 cold misses (the
10 blocks of a() and the 4 blocks of b()) at 5 cycles each.
Table I also illustrates how a tester might proceed when
trying to estimate the WCET of function a(). We assume
that during and between measurements we have no control
over the cache content. Our measurements will therefore end
up anywhere in the range given by the real execution time.
For the WCET estimation, the loop is the natural starting point
since programs often spend most of their time in loops. The
first two test cases, 1 and 2, cover the two different alternatives
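#include <stdbool.h>

#define N 100  /* number of loop iterations (N = 100 in Table I) */

void b(void);
void c(void);
void big(void);

void a(bool x, const bool v[], bool special)
{
    if (x)
        b();
    else
        c();
    if (!special) {
        for (int i = 0; i < N; i++) {
            if (v[i])
                b();
            else
                c();
        }
    } else {
        big();
    }
}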
Fig. 1. The example C program. The function a() is shown. This function
calls three other functions: b(), c(), and big(). The boolean input
variables x, v[], and special control the execution path.
Instruction cache: size 4 KiB, block size 16 B, direct mapped, cache miss penalty 5 cycles.
[Diagram: cache blocks tag[0] to tag[255] at cache addresses 0x000 to 0xff0; a() is placed at cache address 0x000, b() and c() both at 0x200, and big() at 0x800.]
Function   Address           Cache address   # of blocks
a()        0x0000-0x009f     0x000           10
b()        0x0200-0x023f     0x200           4
c()        0x1200-0x124f     0x200           5
big()      0x1800-0x1bff     0x800           64
Fig. 2. Instruction cache configuration and placement of functions. Each
block in the cache has a tag identifying the memory block currently cached.
The functions map to different locations in the cache depending on their
memory address. The blocks from b() and c() might conflict in the cache.
inside the loop: calling b() or calling c(). The tester would
find that calling c() takes longer and might then continue
with the additional test cases 3 and 4 to cover the
other obvious program paths. The final estimate of the WCET
would become 5200 clock cycles (test case 3).
In this testing example, the WCET was underestimated for
two reasons. First, we did not know how much the real execution
time varies for the program paths we measured; even if we
run the same test case multiple times, we cannot know whether we
TABLE I
TEST CASES USED WHEN ESTIMATING THE WCET OF THE EXAMPLE
PROGRAM IN FIG. 1. THE NUMBER OF LOOP ITERATIONS IS N = 100.
Test case   x       v[]                      special   Real execution time (cycles)   Measurement (cycles)
1           true    true ∀i                  false     4140–4210                      4154
2           true    false ∀i                 false     5160–5235                      5177
3           false   false ∀i                 false     5150–5225                      5200
4           false   false ∀i                 true      650–1045                       655
(5)         false   alternating true–false   false     6650–6725                      –
have covered the whole range. For example, among the four
paths tested, the estimated WCET should have been 5235, not
5200. The other reason for underestimating the WCET is that
we failed to test the dangerous program path represented by
test case 5 in Table I. This test case causes b() and c() to
be called alternately, so that 4 extra conflict misses
occur in each iteration (b() and c() share four cache blocks).
Testing this path would have given an estimate closer to the
real WCET of 6725 clock cycles.
III. SAFETY MARGIN METHODS FOR INSTRUCTION
CACHES
In the previous section we found that the WCET was
underestimated for two reasons. First, the execution time
of a single path could vary due to the undefined initial cache
content. Second, a critical program path
was not tested. We now show how our approach can
handle these problems. This section looks at
methods to handle the varying execution times; the next
section (Section IV) presents methods to help with the
second problem, finding critical untested paths.
To obtain a safe WCET estimate for a single program path
despite the variations possible due to the undefined initial
cache state, we add a safety margin to our measurements. This
safety margin should ideally be as small as possible to reduce
the overestimation. Still, we want to have a range of methods
available so that less complex methods can be used when
some overestimation can be tolerated. This section presents
three such methods: the constant bound method, and the
dynamic and static cache footprint methods. We demonstrate
these methods using the example program and the direct-
mapped example system from Section II. However, the same
approach also handles set-associative instruction caches with
LRU (Least Recently Used) replacement.
To calculate a safety margin, we need to be able to
reason about how the initial cache state can influence the future
execution time. An important requirement on the processor
architecture is that the effect of a change in the initial cache
state has a constant, fixed penalty on the future execution time.
Another way of expressing this is that a change in the initial
timing state in the system has a bounded timing effect [9]. In
our example system this requirement is fulfilled. For example,
we know that invalidating a cache block can add a penalty of
at most 5 clock cycles to the future execution time. This
makes it possible to calculate a safety margin based on the
number of initially undefined cache blocks as:
margin = B × P
where B is an upper bound on the number of undefined
initial cache blocks and P is the cache miss penalty. The
worst-case assumption here is that during measurements, the
cache might contain useful blocks so that fewer cold misses
occur compared to the possible run-time behaviour. The safety
margin compensates for this risk.
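The margin rule is plain arithmetic; a minimal C sketch of it follows. The function names safety_margin and wcet_estimate are hypothetical, cycle counts are assumed to fit in 32 bits, and overflow is not checked.

```c
#include <stdint.h>

/* margin = B * P: B bounds the number of initially undefined cache
 * blocks, P is the cache miss penalty in clock cycles. */
uint32_t safety_margin(uint32_t undefined_blocks, uint32_t miss_penalty)
{
    return undefined_blocks * miss_penalty;
}

/* Safe single-path estimate: the measured time plus the safety margin.
 * Any single measurement of the path can be used as the base. */
uint32_t wcet_estimate(uint32_t measured_cycles,
                       uint32_t undefined_blocks,
                       uint32_t miss_penalty)
{
    return measured_cycles + safety_margin(undefined_blocks, miss_penalty);
}
```

For instance, safety_margin(256, 5) is 1280 cycles, and wcet_estimate(6650, 15, 5) is 6725 cycles, the real WCET of test case 5 in Table I.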
To calculate the margin we need to find an upper bound
on the number of undefined cache blocks B. We now present
three different methods to estimate this upper bound.
A. The constant bound method
The constant bound method is the simplest
approach. Here, we simply assume that all cache blocks in
the cache are undefined and potentially used by the program:
B = total number of cache blocks
For our example in Section II we get B = 256 and a margin
of 256 × 5 = 1280 clock cycles.
If the overestimation can be tolerated, this method has
important advantages. The margin calculated is program in-
dependent. Thus, no analysis of the program is needed. Also,
for programs larger than the cache size, this method gives the
best estimate.
B. Dynamic cache footprint method
The next method, the dynamic cache footprint method, has
the potential of reducing the overestimation by limiting B to
the actual number of blocks touched when executing a certain
program path. This method relies on collecting instruction
fetch trace data during testing and requires hardware or
simulator support. Mapping trace data to memory locations
then reveals how many cache blocks are touched for a
certain program path:
B(p) = touched blocks for path p
This results in the lowest possible overestimation, since
it calculates an individual safety margin for each measured
program path (test case). For example, for test case 5 in our
previous example we would find that the number of touched
blocks is B(5) = 15 (the 10 blocks of a() plus the 5 cache
blocks used by b() and c()), and the margin would become
15 × 5 = 75 clock cycles, exactly covering the real variation
in execution time.
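A minimal sketch of the footprint count for the direct-mapped example cache of Fig. 2 follows. It assumes the trace is available as a plain array of instruction fetch addresses; the trace format and the name dynamic_footprint are hypothetical. Each address is mapped to its cache block index, and distinct indices are counted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16u   /* bytes per cache block (Fig. 2) */
#define NUM_BLOCKS 256u  /* 4 KiB direct-mapped cache (Fig. 2) */

/* B(p): number of distinct cache blocks touched by a fetch trace. */
unsigned dynamic_footprint(const uint32_t *fetch_addrs, size_t n)
{
    bool touched[NUM_BLOCKS] = { false };
    unsigned count = 0;
    for (size_t k = 0; k < n; k++) {
        /* Direct mapped: cache index = memory block number mod #blocks. */
        uint32_t idx = (fetch_addrs[k] / BLOCK_SIZE) % NUM_BLOCKS;
        if (!touched[idx]) {
            touched[idx] = true;
            count++;
        }
    }
    return count;
}
```

A trace that covers all of a(), b(), and c(), as test case 5 does, yields 15, because the blocks of b() and c() land on the same five cache blocks.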
C. Static cache footprint method
The static cache footprint method simplifies the dynamic
version by relying on information from the linker instead.
Knowing the placement and size of functions in memory,
the total number of blocks occupied by the program can be
calculated:
B = total program footprint in cache
This will typically produce a bound B larger than that of the
dynamic method. For our example, B = 79 (10 + 5 + 64 distinct
cache blocks for a(), b()/c(), and big()), and the
margin is 79 × 5 = 395 clock cycles. One advantage is that
the same margin can be used for all measured paths.
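The same counting can be driven by linker-map address ranges instead of a trace. The sketch below assumes inclusive start and end byte addresses per function; the struct and function names are hypothetical. Counting distinct cache blocks rather than summing function sizes matters here: the sizes in Fig. 2 add up to 83 blocks, but b() and c() overlap in the cache, which leaves 79.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16u
#define NUM_BLOCKS 256u

/* One linker-map entry: inclusive byte address range of a function. */
struct func_range {
    const char *name;
    uint32_t start;
    uint32_t end;
};

/* Static footprint: distinct cache blocks covered by all functions. */
unsigned static_footprint(const struct func_range *funcs, size_t n)
{
    bool used[NUM_BLOCKS] = { false };
    unsigned count = 0;
    for (size_t f = 0; f < n; f++) {
        for (uint32_t blk = funcs[f].start / BLOCK_SIZE;
             blk <= funcs[f].end / BLOCK_SIZE; blk++) {
            uint32_t idx = blk % NUM_BLOCKS;  /* direct-mapped index */
            if (!used[idx]) {
                used[idx] = true;
                count++;
            }
        }
    }
    return count;
}
```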
D. Discussion
The possible WCET overestimation resulting from the different
safety margin methods is listed in Table II. The overestimation
varies since the measured execution times vary.
Because any measurement plus the margin is a safe bound, we
could do multiple measurements and pick the lowest
one to reduce the overestimation. For our example program,
the dynamic cache footprint method gives the lowest overestimation.
However, if we increase the number of loop iterations
to N = 1000, all methods result in low overestimation.
TABLE II
THE WCET OVERESTIMATION BY THE DIFFERENT SAFETY MARGIN
METHODS FOR TWO DIFFERENT VALUES OF N, THE NUMBER OF LOOP
ITERATIONS IN THE EXAMPLE PROGRAM.
Method                    Margin (cycles)   Overestimation (N = 100)   Overestimation (N = 1000)
Constant bound            1280              17.9%–19.0%                1.8%–1.9%
Dynamic cache footprint   75                0.0%–1.1%                  0.0%–0.1%
Static cache footprint    395               4.8%–5.9%                  0.5%–0.6%
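The N = 100 column of Table II can be reproduced under one assumption that the text does not state explicitly: the percentages are relative to the real WCET of the worst path, test case 5 (6725 cycles), with the measurement anywhere in 6650–6725 cycles. The hypothetical helper below computes that ratio.

```c
/* Relative overestimation of a safe estimate (measurement + margin)
 * with respect to the real WCET of the same path. */
double overestimation(unsigned measured, unsigned margin, unsigned real_wcet)
{
    double estimate = (double)measured + (double)margin;
    return (estimate - (double)real_wcet) / (double)real_wcet;
}
```

With the constant bound margin of 1280 cycles, measurements of 6650 and 6725 cycles give about 17.9% and 19.0%; with the dynamic margin of 75 cycles, a measurement of 6650 cycles gives exactly 0%.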
The safety-margin methods can also be used in combination
with hardware control strategies. For example, by adopting
some of the techniques mentioned in [5], like cache flushing
or locking, the execution time variation can be completely or
partly eliminated. This would also reduce the safety margin,
thus allowing for a trade-off between hardware control, analysis
complexity, and overestimation.
IV. TEST COVERAGE WARNING METHODS FOR
INSTRUCTION CACHES
The safety margin methods only guarantee a safe WCET
estimate for individual program paths. For path analysis, we
rely on the tester to provide sufficient coverage. This can be
easy if the program only contains a single path. However,
given multiple paths, we want to give the tester information
or warnings about possible conflict misses in the program to
help the tester cover dangerous paths.
The approach taken is to rely on the linker to provide
placement and size information about all functions in the
program. The program memory regions can then simply be
mapped to cache regions to identify conflicting areas. For
instance, in our example program, such an analysis would
quickly reveal that 4 memory blocks in b() and c() map to
conflicting regions.
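A minimal sketch of such a conflict check for the direct-mapped cache of Fig. 2 follows, assuming the linker map is available as inclusive address ranges per function (all names are hypothetical). Two distinct memory blocks conflict when they map to the same cache block index.

```c
#include <stdint.h>

#define BLOCK_SIZE 16u
#define NUM_BLOCKS 256u

struct func_range {
    const char *name;
    uint32_t start;  /* first byte address (inclusive) */
    uint32_t end;    /* last byte address (inclusive) */
};

/* Number of memory blocks of f that share a cache block index with a
 * different memory block of g, i.e. potential conflict-miss locations. */
unsigned conflicting_blocks(const struct func_range *f,
                            const struct func_range *g)
{
    unsigned conflicts = 0;
    for (uint32_t bf = f->start / BLOCK_SIZE; bf <= f->end / BLOCK_SIZE; bf++) {
        for (uint32_t bg = g->start / BLOCK_SIZE; bg <= g->end / BLOCK_SIZE; bg++) {
            if (bf != bg && bf % NUM_BLOCKS == bg % NUM_BLOCKS) {
                conflicts++;
                break;  /* count each block of f at most once */
            }
        }
    }
    return conflicts;
}
```

For the example, b() against c() reports 4 conflicting blocks, while a() against big() reports none.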
Given information about conflicting regions in the code, the
tester proceeds to assess the risk:
• If a path exists through the program that passes through
conflicting regions in a repeatable and alternating way,
that path should be tested. Alternating between two
different regions could trigger conflict misses and if these
regions are inside a loop it could cause a large increase
in execution time.
• When inspecting different paths, the worst-case path
found so far should be prioritized to see if it conflicts
with some other region. Conflicts with the worst-case
path have a great potential of causing an increase in the
WCET.
Following these steps, a tester should have no difficulty
finding test case 5 in our previous example.
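To see why the alternating path of test case 5 is dangerous, the b()/c() calls of the loop can be replayed on a model of the direct-mapped cache of Fig. 2. The sketch below is a simplification: it ignores a() itself and all timing except misses, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16u
#define NUM_BLOCKS 256u

struct func_range { uint32_t start, end; };  /* inclusive byte addresses */

/* Direct-mapped cache: one tag (memory block number) per cache index. */
struct icache {
    uint32_t tag[NUM_BLOCKS];
    bool valid[NUM_BLOCKS];
};

/* Fetch every block of f once and return the number of misses. */
static unsigned run_function(struct icache *ic, const struct func_range *f)
{
    unsigned misses = 0;
    for (uint32_t blk = f->start / BLOCK_SIZE; blk <= f->end / BLOCK_SIZE; blk++) {
        uint32_t idx = blk % NUM_BLOCKS;
        if (!ic->valid[idx] || ic->tag[idx] != blk) {
            ic->valid[idx] = true;  /* load block, evicting the old one */
            ic->tag[idx] = blk;
            misses++;
        }
    }
    return misses;
}

/* Total misses for a sequence of calls, starting from an empty cache. */
unsigned count_misses(const struct func_range *const calls[], size_t n)
{
    struct icache ic = { { 0 }, { false } };
    unsigned misses = 0;
    for (size_t k = 0; k < n; k++)
        misses += run_function(&ic, calls[k]);
    return misses;
}
```

Four calls of b() miss only on the first call (4 misses), while the alternating sequence b, c, b, c misses 4 + 5 + 4 + 4 = 17 times: after the first two calls, every call reloads the four shared cache blocks, which matches the four extra conflict misses per iteration of test case 5.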
Given the information about conflicts, there is also another
important option. Since conflict misses are also unwanted
when striving for good average performance, it can be of
interest to control the linking phase in order to avoid conflicts.
This can be difficult for a direct-mapped cache, but it can be
an important option for set-associative caches.
V. DISCUSSION AND FUTURE WORK
Our approach is to let the tester be responsible for finding
critical paths and thereby obtaining safe WCET estimates.
This simplifies the analysis needed and should work well for
programs with few program paths to test. However, further
studies of more realistic programs and benchmarks are needed
to assess the general applicability of this approach.
In the previous sections we have demonstrated our approach
for direct-mapped instruction caches. However, this study is
part of an ongoing project that aims to develop a full set of
methods to handle WCET analysis for the LEON processor
core [2], [3]. Thus, we need to handle both instruction and data
caches, which are also set-associative with LRU replacement.
Apart from WCET analysis, we should also be able to
estimate the effect of caches when doing a response-time
analysis of tasks that use preemptive scheduling.
Another important goal is to be able to support regression
testing. Here, we believe a successful approach will be to sup-
port the tester with information about the impact of program
changes by highlighting differences in margin calculations and
conflicting program regions between test runs.
VI. CONCLUSION
In this paper we have presented ideas for a new approach
to WCET analysis. Based on testing and on measuring
critical program paths, we add a safety margin to the
measured execution times to obtain safe WCET estimates. The
methods presented are fairly simple but can still lead to low
overestimation of the WCET. Further studies are needed to
confirm the general applicability of the approach.
REFERENCES
[1] R. Wilhelm, J. Engblom, A. Ermedahl, N. Holsti, S. Thesing, D. Whalley,
G. Bernat, C. Ferdinand, R. Heckmann, T. Mitra, F. Mueller, I. Puaut,
P. Puschner, J. Staschulat, and P. Stenström, “The worst-case execution
time problem—overview of methods and survey of tools,” ACM Trans-
actions on Embedded Computing Systems (TECS), vol. 7, no. 3, Apr.
2008.
[2] ESA Microelectronics, “LEON2-FT IP core,”
http://www.esa.int/TEC/Microelectronics.
[3] Gaisler Research, “LEON processor cores,” http://www.gaisler.com.
[4] I. Wenzel, R. Kirner, B. Rieder, and P. Puschner, “Measurement-based
worst-case execution time analysis,” in Third IEEE Workshop on Software
Technologies for Future Embedded and Ubiquitous Systems, 2005. SEUS
2005, 2005, pp. 7–10.
[5] J.-F. Deverge and I. Puaut, “Safe measurement-based WCET estimation,”
in 5th Intl. Workshop on Worst-Case Execution Time (WCET) Analysis,
R. Wilhelm, Ed., Dagstuhl, Germany, 2007.
[6] G. Bernat, A. Colin, and S. M. Petters, “WCET analysis of probabilistic
hard real-time systems,” in Proceedings of the 23rd IEEE Real-Time
Systems Symposium (RTSS’02), 2002, p. 279.
[7] S. M. Petters, P. Zadarnowski, and G. Heiser, “Measurements or static
analysis or both?” in 7th Intl. Workshop on Worst-Case Execution Time
(WCET) Analysis, C. Rochange, Ed., Schloss Dagstuhl, Germany, 2007.
[8] M. Lindgren, H. Hansson, and H. Thane, “Using measurements to derive
the worst-case execution time,” in Proceedings of the Seventh Interna-
tional Conference on Real-Time Systems and Applications (RTCSA’00),
2000, p. 15.
[9] T. Lundqvist, “A WCET analysis method for pipelined microprocessors with
cache memories,” Ph.D. dissertation, Chalmers University of Technology,
Göteborg, Sweden, June 2002.
CAMA: Cache-Aware Memory Allocation
for WCET Analysis
Jörg Herter
Department of Computer Science
Saarland University, Germany
Email: jherter@cs.uni-sb.de
Jan Reineke
Department of Computer Science
Saarland University, Germany
Email: reineke@cs.uni-sb.de
Reinhard Wilhelm
Department of Computer Science
Saarland University, Germany
Email: wilhelm@cs.uni-sb.de
Abstract—Current WCET analyses do not support dy-
namic memory allocation. This is mainly due to the un-
predictability of cache performance when standard memory
allocators are used. We present a novel dynamic memory
allocator that makes cache performance predictable and
(de)allocates memory in constant time. It thereby enables
WCET analysis in the presence of dynamic memory alloca-
tion.
I. INTRODUCTION
At present, static worst-case execution time (WCET)
analyses exist exclusively for programs with static memory
allocation. However, supporting dynamic memory alloca-
tion as well would be desirable for a number of reasons:
• It often saves memory space, e.g., by immediately
reusing the newly available space when
converting one data structure into another.
• It is sometimes more natural to use, i.e., it gives a
clearer program structure.
Why is dynamic memory allocation not supported by
current WCET analyses? In order to give safe and rea-
sonably precise estimations of the WCET, analyses have
to derive bounds on the cache performance. They have
to be able to statically classify most memory accesses
producing cache hits during program execution as such.
This is impossible for standard malloc implementations,
since they do not provide information about the addresses
of allocated memory. In particular, a cache analysis does
not know which cache sets allocated memory will map
to. To obtain guarantees on the cache performance, an
analysis would need to know which blocks of data compete
within the cache, i.e. which blocks may evict each other
from the cache. In addition, memory allocators cause cache
pollution themselves. While maintaining and traversing
their internal data structures, they influence the cache
contents in an unpredictable way. Another somewhat less
severe problem with standard malloc is that its execution
time cannot be easily bounded.
This work is supported by the German Research Council (DFG)
as part of the Transregional Collaborative Research Center “Automatic
Verification and Analysis of Complex Systems” (SFB/TR 14 AVACS)
and the German-Israeli Foundation (GIF) in the “Encasa” project.
Our novel dynamic memory allocator alleviates these prob-
lems:
• It allocates and deallocates memory in constant time.
• It causes only a small, constant amount of cache
pollution, completely predictable in the sense that one
can statically determine which cache sets are affected.
• Allocation to cache sets can be controlled by an
additional parameter.
We describe the main ideas of the new memory allocator
and possible challenges in its implementation. Then we
discuss how its properties can be exploited to obtain safe
and precise WCET estimations.
A. Previous and Related Work
1) Constant time allocators: Dynamic memory alloca-
tors with bounded worst-case execution times have been
investigated for many years. The binary buddy system is a
long-known allocation algorithm whose WCET can be cal-
culated. However, it suffers from a relatively high internal
fragmentation of about 28% [9]. Apart from simple segregated
lists, which produce very high fragmentation, Ogasawara
proposed the first constant-time allocation algorithm ([7];
as cited by Wilson et al. [13]). However, its fragmentation
is still high compared to other existing allocators. TLSF
[5], a dynamic memory allocator for real-time systems,
achieves constant run times while producing tolerable frag-
mentation. For real-life programs, fragmentation produced
by TLSF is similar to that caused by Doug Lea’s memory
allocator [4], currently considered to be the best general
allocator available [6]. Our allocator was greatly influenced
by TLSF. In fact, it can be regarded as a cache-conscious
modification of TLSF.
2) Cache-conscious/cache-aware allocators: Chilimbi
et al. proposed a cache-conscious memory allocator
(ccmalloc) in order to improve program execution
times [2]. Unlike malloc, Chilimbi’s ccmalloc
takes an additional argument: a pointer to an existing
data structure or object that is likely to be accessed
contemporaneously with the
element to be allocated. ccmalloc achieves its goal by
trying to allocate the newly requested storage next to the
Fig. 1. Possible mapping from list elements to cache sets of a 4-way
set-associative cache with 4 cache sets. [Diagram axes: sets, lines.]
one pointed to by its second argument. As a result, newly
allocated storage is often located in the same cache set as
the referenced one.
II. CACHES, CACHE ANALYSIS, AND
DYNAMIC MEMORY ALLOCATION
Caches are used to bridge the increasing gap between
processor speeds and memory access times. A cache is a
small, fast memory that stores a subset of the main memory
contents. It is located at or near the processor. Due to
the principle of locality, most memory accesses can be
serviced by the cache, although it is much smaller than
main memory, thereby drastically improving the average
latency of memory accesses. In order to give safe and rea-
sonably precise estimations of the WCET, cache analyses
[3] have to derive tight bounds on the cache performance.
In modern processors, turning off the cache easily causes
a thirty-fold increase in execution time [8]. Conservatively
classifying each memory access as a cache miss is thus
not an option. To classify memory accesses as hits, the
cache analysis needs to know the mapping of program
data to cache sets. Otherwise, it does not know which
memory blocks compete for cache lines. See the example
of a linked list in Figure 1. In the example, the elements of
the dynamically-allocated linked list map to the cache sets
very unevenly. Five of the six list elements map to different
blocks in memory, but all to one of the four cache sets.
During the first traversal of the list, one of the list elements is already
evicted, although the list is much smaller than the
cache. If LRU replacement were employed,
a second traversal of the list would result
in only one cache hit. With a standard malloc, a cache
analysis would not even be able to guarantee this single
hit: it has no knowledge of the mapping to cache sets.
Furthermore, all knowledge of the cache analysis about
previous cache contents would be lost while traversing the
list.
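The list example of Fig. 1 can be replayed on a small LRU model of the 4-way, 4-set cache. The line size (32 bytes) and the node addresses are assumptions, chosen so that five nodes fall into set 0 and the sixth into set 1; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_SETS  4u
#define NUM_WAYS  4u
#define LINE_SIZE 32u  /* assumed line size in bytes */

/* Per set, lines[s][0] is the most recently used memory block. */
struct lru_cache {
    uint32_t lines[NUM_SETS][NUM_WAYS];
    unsigned used[NUM_SETS];
};

static unsigned set_of(uint32_t addr) { return (addr / LINE_SIZE) % NUM_SETS; }

/* Access addr under LRU replacement; return 1 on a hit, 0 on a miss. */
static int access_line(struct lru_cache *c, uint32_t addr)
{
    uint32_t block = addr / LINE_SIZE;
    unsigned s = set_of(addr);
    unsigned pos = c->used[s];  /* stays at used[s] if block is absent */
    for (unsigned w = 0; w < c->used[s]; w++)
        if (c->lines[s][w] == block) { pos = w; break; }
    int hit = pos < c->used[s];
    if (!hit) {
        if (c->used[s] < NUM_WAYS)
            c->used[s]++;        /* fill an empty way */
        pos = c->used[s] - 1;    /* otherwise overwrite the LRU way */
    }
    for (unsigned w = pos; w > 0; w--)  /* move block to MRU position */
        c->lines[s][w] = c->lines[s][w - 1];
    c->lines[s][0] = block;
    return hit;
}

/* Traverse the list twice from a cold cache; return second-pass hits. */
unsigned second_pass_hits(const uint32_t *node_addrs, size_t n)
{
    struct lru_cache c = { { { 0 } }, { 0 } };
    unsigned hits = 0;
    for (size_t k = 0; k < n; k++)
        access_line(&c, node_addrs[k]);
    for (size_t k = 0; k < n; k++)
        hits += (unsigned)access_line(&c, node_addrs[k]);
    return hits;
}
```

With the skewed placement, the five nodes in set 0 keep evicting each other and only the node in set 1 hits. Placing the six nodes in consecutive lines instead (at most two nodes per set) makes all six accesses of the second traversal hit.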
We propose to extend the malloc-routine by an addi-
tional parameter that constrains allocation to a specific
cache set. malloc(size, set) shall then return memory at
an address that maps to the cache set set. With this
routine, it is easy to allocate memory relative
to a given address in the cache. A second routine
malloc(size, pointer, rel_distance) shall return memory
at an address rel_distance cache sets away from pointer.
Using the latter, one can construct a list, allocating con-
secutive list nodes to consecutive cache sets and thereby
evenly distributing the list in the cache in a predictable
manner. Another possibility would be to allocate all list
nodes in the same cache set. This scheme minimizes cache
damage to other structures as only one cache set is affected,
independently of the size of the list.
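A toy illustration of the two proposed routines follows; it is not the allocator described in Section III. Memory is handed out from a statically aligned arena, one cache line per request, with a bump counter per cache set and no deallocation. C has no overloading, so the routines get the hypothetical names malloc_in_set and malloc_rel; the cache geometry is assumed, and the sketch assumes the cache is indexed by the addresses the program sees (no virtual-to-physical remapping).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SETS      4u   /* assumed cache geometry */
#define LINE_SIZE     32u
#define LINES_PER_SET 64u  /* arena capacity per cache set (assumed) */

/* Aligned to NUM_SETS * LINE_SIZE, so arena line k maps to set k % NUM_SETS. */
static _Alignas(NUM_SETS * LINE_SIZE)
    unsigned char arena[NUM_SETS * LINES_PER_SET * LINE_SIZE];
static unsigned next_line[NUM_SETS];  /* per-set bump counters */

unsigned cache_set_of(const void *p)
{
    return (unsigned)(((uintptr_t)p / LINE_SIZE) % NUM_SETS);
}

/* Toy malloc(size, set): constant time, at most one line, never freed. */
void *malloc_in_set(size_t size, unsigned set)
{
    assert(size <= LINE_SIZE && set < NUM_SETS);
    if (next_line[set] == LINES_PER_SET)
        return NULL;  /* this set's share of the arena is exhausted */
    size_t line = (size_t)next_line[set]++ * NUM_SETS + set;
    return &arena[line * LINE_SIZE];
}

/* Toy malloc(size, pointer, rel_distance). */
void *malloc_rel(size_t size, const void *pointer, unsigned rel_distance)
{
    unsigned target = (cache_set_of(pointer) + rel_distance) % NUM_SETS;
    return malloc_in_set(size, target);
}
```

Allocating each new list node with rel_distance = 1 from its predecessor spreads the list over consecutive sets; rel_distance = 0 keeps the whole list in one set.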
This approach breaks with the philosophy of caches being
transparent to the programmer. However, this is unavoid-
able if one wants to obtain static guarantees on the cache
behavior in the presence of malloc. Furthermore, we may
automatically generate the additional parameters. Only the
two schemes – distributing a data structure evenly in the
cache and allocating its elements to the same cache set
– seem reasonable for preserving cache predictability. Once
the scheme has been chosen, we merely need to
associate calls to malloc with data structures. The required
information can be obtained from a shape analysis [11].
III. CACHE-AWARE MEMORY ALLOCATION
A memory allocator manages free and in-use blocks of
memory. Allocators must satisfy two conflicting demands:
they should have fast response times to (de)allocation
requests and minimize fragmentation, i.e. the amount of
free memory not usable to satisfy requests1. A survey on
existing dynamic memory allocators can be found in [13].
For a memory allocator used in real-time systems, the
demand for constant response times arises. We also need
our allocation algorithm to be able to allocate memory
blocks mapped to a given cache set.
In general, allocators strive to minimize fragmentation
by applying some placement choice, i.e., by deciding where
to allocate new blocks in order to keep fragmentation
low. Splitting techniques to satisfy requests for smaller
blocks and coalescing techniques to serve larger blocks
supplement the set of main techniques utilized by alloca-
tors. We follow Wilson et al. by viewing allocators as a
mechanism that implements a placement policy, motivated
by a strategy for minimizing fragmentation [13]. The over-
all strategy determines acceptable, implementable policies
for placing blocks in memory. These policies are then
implemented by a set of algorithms and data structures,
the mechanism.
We propose the following strategy: Separately manage
regions of free blocks mapped to the same cache set.
Within those memory regions select a suitable free block
1Traditionally, fragmentation is classed as external and internal fragmentation.
Internal fragmentation is due only to the allocation algorithm
itself. It arises when larger blocks are served than requested; the wasted
memory then lies inside the block. External fragmentation is
due to an inability to serve a large contiguous block, although enough
small non-contiguous free blocks are available. External fragmentation
is caused by properties of the allocation algorithm and the sequence of
allocation/deallocation requests.
whose size may be slightly larger than the requested size
if that allows for finding such a block in constant time.
The policies “manage free blocks mapped to the same
cache set in several disjoint size classes” and “always
select the most/least recently freed block from the smallest
size class large enough to satisfy the request” (LIFO/FIFO
good-fit2) meet the proposed strategy. It is reasonable to
believe that the ordering of the free lists has a significant
impact on the overall fragmentation. Whether we settle for
a LIFO or FIFO good-fit policy, resulting in a most recently
freed and least recently freed ordering, respectively, will
be determined by a series of experiments using real-life
programs3. These policies can be implemented as follows.
Logically, we may think of the available memory as a
partition of n disjoint regions where n is the number of
cache sets. Each region is mapped to a distinct cache set.
We can further partition those regions into memory blocks
of distinct size classes. That is, for each cache set, we
obtain a set of size classes consisting of memory blocks
whose sizes are within that size class. The free memory
blocks of a single size class can be managed and organized
in a simple linked list (free list). We can further store
pointers to the heads of all such free lists in a consecutive
memory area, for example an array. This way, we reduce
the problem of finding a suitable free block satisfying an
allocation request to computing the index within that array
where the address of an appropriate free list is stored.
When a suitable free list is found, we simply return the first
element of that list. If a memory block is to be deallocated,
we compute the index of an appropriate free list into which
to reinsert this block. We may then either add the free block
as first or last element of that list, resulting in either a most
or a least recently freed ordering on the list.
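The free-list mechanism can be sketched as follows, simplified to a flattened two-dimensional array indexed by cache set and a single size-class index; the constants and names are hypothetical. Deallocation inserts at the head for a LIFO (most recently freed) ordering or at the tail for a FIFO (least recently freed) ordering, and allocation always takes the first element, so every operation runs in constant time.

```c
#include <stddef.h>

#define NUM_SETS    4u  /* assumed number of cache sets */
#define NUM_CLASSES 8u  /* assumed number of size classes per set */

/* A free block stores the link to the next free block in its own memory. */
struct free_block {
    struct free_block *next;
};

/* One list per (cache set, size class); the tail pointer makes FIFO O(1). */
struct free_list {
    struct free_block *head;
    struct free_block *tail;
};

static struct free_list lists[NUM_SETS * NUM_CLASSES];

static struct free_list *list_for(unsigned set, unsigned cls)
{
    return &lists[set * NUM_CLASSES + cls];  /* flattened 2-D array */
}

/* Deallocation: LIFO inserts at the head, FIFO appends at the tail. */
void release_block(struct free_block *b, unsigned set, unsigned cls, int fifo)
{
    struct free_list *l = list_for(set, cls);
    b->next = NULL;
    if (l->head == NULL) {
        l->head = l->tail = b;
    } else if (fifo) {
        l->tail->next = b;
        l->tail = b;
    } else {
        b->next = l->head;
        l->head = b;
    }
}

/* Allocation: return the first block of the selected list, or NULL. */
struct free_block *take_block(unsigned set, unsigned cls)
{
    struct free_list *l = list_for(set, cls);
    struct free_block *b = l->head;
    if (b != NULL) {
        l->head = b->next;
        if (l->head == NULL)
            l->tail = NULL;
    }
    return b;
}
```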
We can think of the array storing all addresses of free lists
as a three-dimensional construction as depicted in Figure 2.
The first layer encodes to which cache set the memory
block shall map. In the second layer, neighboring partitions
of memory blocks constitute size classes a power of two
apart. That is, at index i on the second layer, a third layer
2Given a linked list of free memory blocks, a first fit algorithm would
select the first block, starting from the head of the list, that is large enough
to satisfy the allocation request. A best fit would select a smallest free
block of the list large enough to satisfy the request, a worst fit would
select a largest free block (given that this block is large enough to satisfy
the request). A good fit selects the best (i.e. smallest) block chosen from
some subset of fitting blocks. Hence, good fits in general are a tradeoff
between best and first fit. They avoid an exhaustive search of the whole
free list but might not select an optimal block.
3Similar experiments conducted by Weinstock [12] and more recently
by Wilson et al. [14] showed that first fit with an address-ordered
list produces significantly less fragmentation than LIFO-ordered first
fit. Although a FIFO ordering for first fit has not been considered
as thoroughly, there exist results suggesting that FIFO produces less
fragmentation than LIFO, possibly as little as address-ordered first fit [14].
We believe that similar results will be obtained for our FIFO and LIFO
good-fit policies.
Fig. 2 layout: 1st layer (cache set): 1, 2, . . . , k, . . . , n − 1, n; 2nd layer: 1, . . . , i, . . . , I; 3rd layer: 0, . . . , j, . . . , $2^L - 1$. Entry (k, i, j) is the free list containing all memory blocks mapped to cache set k whose sizes are in $[2^i + 2^{i-L} \cdot j,\ s_{j+1,i})$, where
$s_{j+1,i} = 2^{i+1} - 1$ if $j = 2^L - 1$, and $s_{j+1,i} = 2^i + 2^{i-L} \cdot (j + 1)$ otherwise.
Fig. 2. Logical view on the partitioning of the memory.
containing all free lists for blocks of sizes in [2i, 2i+1−1]
mapped to the same cache set is referenced. Hence, the
second layer constitutes a simple segregated list for size
classes of powers of two. To diminish the fragmentation
that such a single segregated list would cause, we add
a third layer in which size classes increase linearly. If
all layers are organized in a single array by flattening
their hierarchical structure, the corresponding index of the
desired free list within that array can be computed in
constant time by a mapping functionM : N3 7→ N.
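The following Python sketch shows one possible way to compute such a flattened index in constant time. It assumes 0-based cache sets, power-of-two classes i_min, …, i_min + I − 1, and block sizes of at least 2^L bytes. The function names and the row-major layout (cache set, then power-of-two class, then linear subclass) are illustrative choices, not necessarily the mapping M used by the authors.

```python
from typing import Tuple


def size_class(size: int, L: int) -> Tuple[int, int]:
    """Return (i, j) with 2**i <= size < 2**(i+1) and j the linear subclass,
    i.e. size lies in [2**i + 2**(i-L)*j, 2**i + 2**(i-L)*(j+1))."""
    i = size.bit_length() - 1  # floor(log2(size))
    if i < L:
        raise ValueError("sizes below 2**L are not handled in this sketch")
    j = (size - (1 << i)) >> (i - L)  # one of the 2**L linear subclasses
    return i, j


def cache_set(addr: int, line_size: int, n_sets: int) -> int:
    # Conventional set-index computation for a set-associative cache.
    return (addr // line_size) % n_sets


def free_list_index(k: int, size: int, i_min: int, n_classes: int, L: int) -> int:
    """Flattened index of the free list for cache set k (0-based) and a block of
    `size` bytes; n_classes plays the role of I in Fig. 2."""
    i, j = size_class(size, L)
    if not i_min <= i < i_min + n_classes:
        raise ValueError("size outside the configured size classes")
    return (k * n_classes + (i - i_min)) * (1 << L) + j
```

For example, with L = 2 a 100-byte block falls into class i = 6 and subclass j = 2, i.e. the range [96, 112). In this sketch the largest subclass j = 2^L − 1 also receives the size 2^{i+1} − 1.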
A request to deallocate a memory block will determine
an appropriate free list for the given block and append
it either to the head or the tail of this list, depending
on our ordering strategy. While response times will be
exceptionally good, fragmentation might be a problem.
We reduce fragmentation by a pre-analysis of the program
code that determines a safe approximation of the
sequence of requests, both for allocation and deallocation,
presented to the allocator during program execution. This
information can be used to design the second and third
layers of the allocator such that fragmentation is minimized.
Hence, we first analyze the memory allocation behavior
of the program, adjust the allocator accordingly, and then
estimate the WCET. We will further investigate how well
allocation behavior of real-life programs can be statically
analyzed. This may result in an automatic selection of the
best allocator for a program during compile time.
Some questions remain open, most of them implementation-specific: How can requests for blocks larger than one cache line be handled efficiently, what is a good initial partitioning into size classes, and how much of a problem is fragmentation in real-life programs? We are currently evaluating the following approach regarding requests for blocks larger than a single cache line. Requests for large blocks are in general very rare [13]. We may therefore allocate blocks destined to hold a large atomic object in a non-cached area of memory without significantly increasing execution times. Large records (structs) can usually be split into smaller records that each fit into a cache line.
Fig. 3. Shape of a linked list obtained from a shape analysis and its mapping to cache sets of a 4-way set-associative cache with 4 cache sets. [Figure axes: Sets, Lines.]
Fig. 4. Effect of traversing the linked list of Fig. 3 on static cache knowledge, in three panels: Before, After list traversal, and After further accesses; in each panel the lines of every cache set are ordered from most-recently-used to least-recently-used. Boxes shaded in dark gray indicate information about cache contents other than about list elements. Boxes shaded in light gray indicate information about list elements in the cache.
IV. WCET ANALYSIS
How does a cache analysis exploit the properties of
our new memory allocator? The main idea is as follows.
Suppose we have information about the shape of the
dynamically-allocated data structures, including the relative
distances of objects in the cache. Such information can
be obtained from a shape analysis. Consider, for example,
a linked list as shown in Figure 3. If all six objects
organized in that list are mapped to cache sets in such
a way that neighboring elements are mapped to sets of
relative distance 1, then traversing the list affects at most
$\lceil 6/n \rceil$ cache lines per cache set, where n denotes the number
of cache sets. The number of cache lines per cache set
affected by a traversal of a data structure can be used to
(a) bound the information loss caused by that traversal and
(b) infer hits for a second traversal. Figure 4 depicts this
for our list example. In the example, we assume least-
recently-used (LRU) replacement. Cache lines are sorted
from most- to least-recently-used. Upon a cache miss, the
least-recently-used element is evicted. Dark-gray-shaded
boxes represent knowledge of a must-cache analysis [3].
For instance, a dark-gray-shaded box in the third line of
a set indicates that the analysis “knows” that a certain
memory block is in line 1, 2 or 3. Thus, dark-gray-shaded
boxes represent upper bounds on positions in the LRU-
stack. Traversing the list evicts at most 2 lines from each
cache set. The analysis can thus safely infer that the two
most-recently-used elements of each cache set are still
contained in the cache (Figure 4, center) after list traversal.
The upper bound on the position in the LRU-stack is
increased by two. After other memory accesses, if the list
is traversed again later in the program, it is sometimes also
possible to safely predict cache hits for this traversal.
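The per-set bound and its effect on must-cache information can be sketched in Python. This is a simplified model, assuming LRU sets, list elements placed in consecutive cache sets, and must-cache knowledge represented as an upper bound on each block's age (0 = most recently used). The function names are hypothetical, and the sketch only ages the other blocks; it does not record the list elements themselves.

```python
from typing import Dict


def lines_touched_per_set(num_objects: int, n_sets: int) -> int:
    """Upper bound on the lines per cache set touched by one traversal when
    consecutive list elements map to consecutive cache sets (wrapping around)."""
    # Each set receives floor(m/n) or ceil(m/n) of the m elements.
    return -(-num_objects // n_sets)  # integer ceiling of m/n


def age_must_cache(must: Dict[str, int], touched: int, assoc: int) -> Dict[str, int]:
    """Update must-cache knowledge of one LRU set after at most `touched`
    accesses to other blocks mapped to that set.

    `must` maps a block to an upper bound on its LRU age. Each access to another
    block increases an age by at most one, so every bound grows by `touched`;
    blocks whose bound reaches `assoc` may have been evicted and are dropped."""
    return {b: age + touched for b, age in must.items() if age + touched < assoc}
```

For the example of Figs. 3 and 4 (six list elements, 4 sets, 4 ways), each set receives at most 2 list elements, so blocks known to be among the two most recently used lines of a set are still guaranteed to be cached after the traversal.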
V. SUMMARY AND CONCLUSIONS
Our work is aimed at developing a static program
analysis for determining WCET bounds for programs
performing dynamic memory allocation. To enable such an
analysis, we propose to replace the program's memory allocator
by a predictable, cache-aware allocator and
use this allocator to guide memory allocation with respect
to the cache set mapping. Constant execution times are
achieved by relying on segregated lists, which is common
practice with real-time allocators [1], [5]. We combine a
shape analysis with a WCET analysis to obtain WCET
bounds for the analyzed programs. The shape analysis
is necessary to compute heap shapes that contain infor-
mation about the data structures arising during program
execution. This information relates individual parts of the
data structures to cache sets, allowing for a cache hit/miss
classification of accesses to components of data structures
as well as bounding the loss of information about contents
of cache sets when loading structures into the cache.
REFERENCES
[1] D. F. Bacon, P. Cheng, and V. T. Rajan, “A Real-Time Garbage Collector with Low Overhead and Consistent Utilization,” ACM SIGPLAN Notices, 2003.
[2] T. M. Chilimbi, M. D. Hill, and J. R. Larus, “Making Pointer-Based Data Structures Cache Conscious,” Computer, 33(12):67–75, 2000.
[3] C. Ferdinand and R. Wilhelm, “Efficient and Precise Cache Behavior Prediction for Real-Time Systems,” Real-Time Systems, 17(2–3):131–181, 1999.
[4] D. Lea, “A Memory Allocator,” Unix/Mail, 6/96, 1996.
[5] M. Masmano, I. Ripoll, A. Crespo, and J. Real, “TLSF: A New Dynamic Memory Allocator for Real-Time Systems,” ECRTS, IEEE Computer Society, 2004.
[6] M. Masmano, I. Ripoll, A. Crespo, J. Real, and A. J. Wellings, “Implementation of a Constant-Time Dynamic Storage Allocator,” Software: Practice and Experience, 2008.
[7] T. Ogasawara, “An Algorithm with Constant Execution Time for Dynamic Storage Allocation,” 2nd Int. Workshop on Real-Time Computing Systems and Applications, 1995.
[8] M. Langenbach, S. Thesing, and R. Heckmann, “Pipeline Modeling for Timing Analysis,” Proceedings of the Static Analysis Symposium (SAS), volume 2477, 2002.
[9] J. L. Peterson and T. A. Norman, “Buddy Systems,” Communications of the ACM, 20(6):421–431, 1977.
[10] M. Rezaei and K. M. Kavi, “Intelligent Memory Manager: Reducing Cache Pollution Due to Memory Management Functions,” Journal of Systems Architecture, 2006.
[11] M. Sagiv, T. Reps, and R. Wilhelm, “Parametric Shape Analysis via 3-valued Logic,” ACM Transactions on Programming Languages and Systems, 24(3):217–298, May 2002.
[12] C. B. Weinstock, “Dynamic Storage Allocation Techniques,” PhD thesis, 1976.
[13] P. R. Wilson, M. S. Johnstone, M. Neely, and D. Boles, “Dynamic Storage Allocation: A Survey and Critical Review,” International Workshop on Memory Management, 1995.
[14] P. R. Wilson, M. S. Johnstone, M. Neely, and D. Boles, “Memory Allocation Policies Reconsidered,” technical report, 1995.
On the complexity of optimal priority assignment for periodic
tasks upon identical processors
Liliana Cucu, LORIA-INPL
615 rue du Jardin Botanique
Villers-les-Nancy, France
liliana.cucu@loria.fr
Abstract
In this paper we study global fixed-priority scheduling of periodic task systems upon identical multiprocessor platforms. Based on existing feasibility tests for periodic task systems upon identical multiprocessor platforms, we show (using a dummy priority assignment algorithm) that optimal priority assignment for these systems exists. Then we provide an algorithm based on RM-US[m/(3m−2)] that has lower complexity. Finally, we conjecture that, contrary to the general opinion, (pseudo-)polynomial optimal priority assignment algorithms for periodic task systems upon identical processors might exist.
1 Introduction
Real-time systems are generally embedded and interact with their environment. Requests in real-time environments are often of a recurring nature. Such systems are typically modeled as finite collections of simple, highly repetitive tasks. When the different instances of those tasks are generated in a very predictable manner, we deal with periodic tasks. A periodic task τi releases a job every Ti time units (its period), the first job being released at time Oi (the task offset).
The real-time performance of periodic tasks on uniprocessors has been extensively studied since the seminal paper of Liu and Layland [7], which introduced a model of periodic systems. The literature on scheduling algorithms and feasibility tests for uniprocessor scheduling is vast. In contrast, for multiprocessor parallel machines the problem of meeting timing constraints is a relatively new research area.
In this work we deal with global scheduling, meaning that task migration is allowed (i.e., different jobs of an individual task may execute upon different processors) as well as job migration (an individual job that is preempted may resume execution upon a processor different from the one upon which it had been executing prior to preemption).
We also consider identical processors, i.e., all processors have the same computing power for all tasks.
The scheduling algorithm determines which job(s) should be executed at each time instant. When each task keeps the same priority during its entire lifetime, we have a fixed-priority scheduling algorithm. If there is at least one fixed-priority schedule satisfying all constraints of the system, then we say that a feasible priority assignment exists. A fixed-priority scheduling algorithm is optimal if it provides a feasible priority assignment whenever one exists.
Related research. The problem of scheduling
periodic task systems on several processors was
originally studied in [6]. Recent studies provide a better understanding of that scheduling problem and propose first solutions; e.g., [3] presents a categorization of real-time multiprocessor scheduling problems.
Initial results indicate that real-time multiprocessor scheduling problems are typically not solved by applying straightforward extensions of techniques used for solving similar uniprocessor problems, because of scheduling anomalies [5]. Rate Monotonic (RM), the main fixed-priority algorithm in the uniprocessor case, is no longer optimal in the multiprocessor case, and different versions of RM were proposed for the multiprocessor case [1]. Particular anomalies for fixed-priority algorithms were also underlined in [2]; e.g., the priority assignment given by Audsley for the uniprocessor case is no longer optimal in the multiprocessor case. Moreover, to the best of our knowledge, the literature does not provide any optimal priority assignment algorithm for periodic task systems scheduled using preemption upon identical processors. This paper is a first step to fill this gap by using existing feasibility tests for periodic task systems upon identical processors [4].
Contribution of this paper. In this paper we study global fixed-priority scheduling of periodic task systems upon identical multiprocessor platforms. First we propose a dummy algorithm of complexity O(n!) that is based on existing feasibility tests for periodic task systems upon identical multiprocessor platforms. Thus, we show that optimal priority assignment for these systems exists. Then we provide an algorithm based on RM-US[m/(3m − 2)] with lower complexity. Finally, we conjecture that, contrary to the general opinion, (pseudo-)polynomial optimal priority assignment algorithms might exist.
Organization of the paper. The paper is organized as follows. Section 2 introduces the model and the notations necessary to understand the paper. Section 3 provides the main contribution of this paper, and we conclude in Section 4.
2 Model and notations [4]
We consider the scheduling of periodic task systems. A system τ is composed of n periodic tasks τ1, τ2, . . . , τn, each task being characterized by a period Ti, a relative deadline Di, an execution requirement Ci and an offset Oi. Such a periodic task generates an infinite sequence of jobs, with the k-th job arriving at time instant Oi + (k − 1)Ti (k = 1, 2, . . .), having an execution requirement of Ci units, and a deadline at time instant Oi + (k − 1)Ti + Di.
We will distinguish between implicit deadline sys-
tems where Di = Ti,∀i; constrained deadline sys-
tems where Di ≤ Ti,∀i and arbitrary deadline
systems where there is no relation between the
deadlines and the periods.
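The task model can be made concrete with a small Python sketch. `PeriodicTask`, `job` and `deadline_class` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PeriodicTask:
    O: int  # offset (release time of the first job)
    C: int  # execution requirement
    T: int  # period
    D: int  # relative deadline

    def job(self, k: int) -> Tuple[int, int]:
        """Release time and absolute deadline of the k-th job (k = 1, 2, ...)."""
        release = self.O + (k - 1) * self.T
        return release, release + self.D


def deadline_class(tasks: Sequence[PeriodicTask]) -> str:
    """Classify a task system; implicit systems are checked first because
    they are also constrained."""
    if all(t.D == t.T for t in tasks):
        return "implicit"
    if all(t.D <= t.T for t in tasks):
        return "constrained"
    return "arbitrary"
```

For instance, a task with O = 2, T = 5 and D = 4 releases its third job at time 12 with an absolute deadline of 16.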
In this paper we consider a discrete model, i.e., the characteristics of the tasks and time are integers. Moreover, we consider that task parallelism is forbidden: a task cannot be scheduled at the same instant on different processors.
All scheduling algorithms considered in this paper are deterministic and work-conserving, as defined below.
Definition 1 (Deterministic algorithm). A
scheduling algorithm is said to be deterministic if it generates a unique schedule for any given set of jobs.
Definition 2 (Work-conserving algorithm). A
work-conserving algorithm is one that never idles a processor while there is at least one active task.
By default, all fixed-priority schedulers for which we provide results in Section 3 are assumed to be deterministic and work-conserving.
3 Priority assignment
In this section we prove that an optimal priority assignment algorithm for periodic systems (be they constrained, implicit or arbitrary deadline task systems) exists, in the sense that if there is at least one feasible priority assignment, then the algorithm will find it. We prove this property by proposing in Section 3.1 a dummy algorithm (of n! complexity) which considers all possible sequences of priority assignment and tests the feasibility of the task system for each of them. The feasibility issue is solved using existing feasibility tests given in [4]. These multiprocessor tests have pseudo-polynomial complexity, no worse than that of the uniprocessor tests.
Finally, in Section 3.2, we improve the complexity of the dummy algorithm by using a branch-and-bound algorithm. This algorithm is based on the algorithm RM-US[m/(3m − 2)] given in [1]. Moreover, we argue that the worst-case behaviour of this algorithm is probably a rare event and that large-deviation approaches could be used to establish its complexity.
3.1 Optimal priority assignment
We consider a task system τ = {τ1, τ2, · · · , τn} of
n periodic tasks with τi = (Oi,Ci,Ti,Di). Task sys-
tem τ can be an implicit, constrained or arbitrary
deadline task system.
We define a working variable W ∈ {1, 2, · · · , n}^n such that the i-th element of W is equal to j ∈ {1, 2, · · · , n} if and only if task τi has priority j. We consider that all tasks have different priorities; thus the i1-th and the i2-th elements of W are different if i1 ≠ i2. For instance, for a task system τ = {τ1, τ3, τ2} ordered from the highest priority task to the lowest priority task, τ1 has priority 1, τ3 priority 2 and τ2 priority 3, so we have W = (1, 3, 2).
Algorithm 1 Optimal priority assignment algorithm
for periodic tasks upon identical parallel machines
Require: Task system τ and m identical processors
Ensure: Priority assignment if it exists
1: W := (1, 2, · · · , n);
2: ntestedconfig := 1;
3: varBoolean := false;
4: while ntestedconfig ≤ n! and varBoolean = false do
5: if Feasibility Test returns true for W then
6: varBoolean := true;
7: else
8: ntestedconfig := ntestedconfig + 1;
9: increase W; {next permutation of (1, 2, · · · , n)}
10: end if
11: end while
12: if varBoolean = false then
13: There is no feasible priority assignment;
14: else
15: W is a feasible priority assignment;
16: end if
In Algorithm 1, line 5, we use the feasibility test
given in [4].
Theorem 1. If a periodic task system τ is feasi-
ble under fixed-priority scheduling on m identical
processors, then Algorithm 1 will find a feasible
priority assignment.
Proof. For any periodic task system there are n! possible sequences of priority assignment. Given the loop condition on ntestedconfig imposed in Algorithm 1, line 4, the algorithm tests all possible sequences unless it finds a feasible priority assignment. Thus, Algorithm 1 stops either when it finds a feasible priority assignment, or when it has visited all possible priority assignments and none of them is feasible. In both cases, Algorithm 1 finds a feasible priority assignment if one exists. □
Corollary 2. For any periodic system τ that is feasible under fixed-priority scheduling on m identical processors, a feasible priority assignment can be found in O(n! · S), where S is the complexity of the feasibility test of a periodic task system under fixed-priority scheduling.
Proof. The proof follows from Theorem 1, which shows that Algorithm 1 is an optimal priority assignment algorithm that needs at most n! feasibility tests to decide. □
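Algorithm 1 can be sketched in Python with `itertools.permutations`, which enumerates all n! priority vectors W. The feasibility test of [4] is not reproduced here. As a stand-in, the sketch simulates global preemptive fixed-priority scheduling in unit time steps over one hyperperiod, which is only adequate under the assumptions stated in the code (synchronous tasks, i.e. all offsets zero, and constrained deadlines). All names are hypothetical.

```python
from itertools import permutations
from math import lcm  # Python 3.9+
from typing import Callable, NamedTuple, Optional, Sequence, Tuple


class Task(NamedTuple):
    C: int  # execution requirement
    T: int  # period
    D: int  # relative deadline (assumed D <= T)


def fp_schedulable(tasks: Sequence[Task], W: Sequence[int], m: int) -> bool:
    """Stand-in test: simulate global preemptive fixed-priority scheduling of
    synchronous, constrained-deadline tasks on m identical processors.
    W[i] is the priority of task i (1 = highest), as in the text. If no deadline
    is missed in [0, H), no work is pending at H and the schedule repeats."""
    assert all(t.D <= t.T for t in tasks), "sketch limited to constrained deadlines"
    H = lcm(*(t.T for t in tasks))
    order = sorted(range(len(tasks)), key=lambda i: W[i])  # highest priority first
    remaining = [0] * len(tasks)
    deadline = [0] * len(tasks)
    for t in range(H):
        for i, task in enumerate(tasks):
            if t % task.T == 0:  # a new job of task i is released
                remaining[i], deadline[i] = task.C, t + task.D
        # The m highest-priority active jobs run for one time unit. Each task has
        # at most one active job, so no task runs on two processors at once.
        running = [i for i in order if remaining[i] > 0][:m]
        for i in running:
            remaining[i] -= 1
        if any(r > 0 and d <= t + 1 for r, d in zip(remaining, deadline)):
            return False  # some job missed its deadline
    return True


def optimal_priority_assignment(
    tasks: Sequence[Task],
    m: int,
    test: Callable[[Sequence[Task], Sequence[int], int], bool] = fp_schedulable,
) -> Optional[Tuple[int, ...]]:
    """Algorithm 1: try all n! priority vectors W until the test accepts one."""
    for W in permutations(range(1, len(tasks) + 1)):
        if test(tasks, W, m):
            return W
    return None  # no feasible priority assignment
```

For instance, with m = 2 and tasks (C, T, D) = (1, 4, 4), (1, 4, 4), (5, 5, 5), the rate-monotonic order W = (1, 2, 3) makes the third task miss its deadline at time 5, whereas the search returns W = (1, 3, 2), which gives the third task an intermediate priority.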
3.2 Another priority assignment algorithm for implicit deadline tasks
In this section we improve the optimal priority assignment algorithm by proposing an algorithm of lower complexity, which applies only to implicit deadline tasks.
The main idea of this algorithm comes from the observation given in [2] that "even if we could use schedulability tests that are necessary and sufficient, it is no longer possible to find an optimal priority assignment by using the test for lowest priority viability approach". This observation is based on the fact that exchanging the priorities between higher priority tasks can turn a schedulable system into an unschedulable one. Thus, in the multiprocessor case, we cannot build a feasible priority assignment starting from the lowest priority tasks up to the highest ones. Therefore, one should perhaps proceed from the highest priority tasks to the lowest ones. Algorithm 2 exploits this idea by assigning the highest priorities first. Since the schedulability of higher priority tasks is not affected by lower priority tasks, we can test at each new step i the feasibility of the i tasks to which priorities have already been assigned. Moreover, we exploit the feasibility result obtained for RM-US[m/(3m − 2)] for assigning the highest priorities.
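For reference, the priority rule of RM-US[m/(3m − 2)] from [1] can be sketched in Python: tasks whose utilization exceeds m/(3m − 2) receive the highest priorities, and the remaining tasks are ordered rate-monotonically. [1] shows that implicit-deadline periodic systems with total utilization at most m²/(3m − 2) are schedulable under this rule on m identical processors. Tie-breaking by task index and the function names are assumptions of this sketch.

```python
from fractions import Fraction
from typing import Sequence, Tuple


def rm_us_priorities(tasks: Sequence[Tuple[int, int]], m: int) -> Tuple[int, ...]:
    """tasks: (C, T) pairs of implicit-deadline periodic tasks.
    Returns W with W[i] = priority of task i (1 = highest)."""
    threshold = Fraction(m, 3 * m - 2)
    util = [Fraction(C, T) for C, T in tasks]
    # Heavy tasks get the highest priorities (kept in index order here).
    heavy = [i for i in range(len(tasks)) if util[i] > threshold]
    # Light tasks follow in rate-monotonic order (shorter period = higher priority);
    # sorted() is stable, so equal periods keep index order.
    light = sorted((i for i in range(len(tasks)) if util[i] <= threshold),
                   key=lambda i: tasks[i][1])
    W = [0] * len(tasks)
    for prio, i in enumerate(heavy + light, start=1):
        W[i] = prio
    return tuple(W)


def rm_us_utilization_bound(m: int) -> Fraction:
    """Total utilization up to which schedulability on m processors is guaranteed by [1]."""
    return Fraction(m * m, 3 * m - 2)
```

With m = 2 the threshold is 1/2, so among the tasks (1, 4), (1, 4) and (5, 5) only the last one is heavy and receives priority 1, giving W = (2, 3, 1). Its total utilization of 3/2 exceeds the bound m²/(3m − 2) = 1, so the utilization test alone cannot certify this system, which is where an exact feasibility test becomes necessary.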
Algorithm 2 Another priority assignment
Require: Task system τ and m identical processors
Ensure: Priority assignment for the tasks if it exists
1: Choose a subset τ0 such that $m_0 = \min_{m_0 = 1, \dots, m} \{ U(\tau_0) \le m_0^2/(3m_0 - 2) \}$;
2: assign priorities for tasks belonging to τ0 according to RM-US[m/(3m − 2)];
3: n0 := n − card(τ0);
4: i0 := n0;
5: while n0 ≠ 0 do
6: assign priority n0 to task ;
7: if FeasibilityTest returns true then
8: n0 := n0 − 1;
9: i0 := i0 + 1;
10: else
11: i0 := i0 + 1;
12: end if
13: end while
Theorem 3. Algorithm 2 is an optimal priority assignment algorithm for periodic tasks on m identical processors.
Proof. Algorithm 2 is obviously optimal. □
Discussion on worst-case behaviour. We conjecture that Algorithm 2 behaves well in average situations and that worst-case situations are rare events. To conclude on the average complexity of Algorithm 2, we need to use rare-event theory, since it is difficult to find representative task systems (large enough and random enough). If we can say how much the worst-case complexity deviates from the average complexity, this would indicate that the following conjecture is true:
Conjecture 4. There is an optimal priority as-
signment algorithm that has pseudo-polynomial
complexity.
4 Conclusions and future works
In this paper we prove that an optimal priority assignment algorithm for periodic tasks upon identical processors exists. The proposed dummy algorithm has O(n!) complexity. We improve this complexity by giving a second algorithm, based on the RM-US[m/(3m−2)] algorithm. A first possible extension concerns harmonic task systems and can be obtained by replacing the latter algorithm with the RM-US[m/(2m − 1)] algorithm.
In order to conclude on the complexity of Algorithm 2, we are currently evaluating its performance on large sets of tasks. We then intend to apply rare-event theory to establish its average complexity. If we can apply this theory, we will obtain the proof that (pseudo-)polynomial optimal priority assignment algorithms for periodic tasks upon identical processors exist.
References
[1] A, B., B, S.,  J, J.
Static-priority scheduling on multiprocessors.
Proceedings of the 22nd IEEE Real-Time
Systems Symposium (2001), 193–202.
[2] A, B.,  J, J. Some insights
on fixed-priority preemptive non-partitioned
multiprocessor scheduling. Proceedings of
the WIP session of IEEE Real-Time Systems
Symposium (RTSS’00) (2000), 53 – 56.
[3] C, J., F, S., H, P., S,
A., A, J.,  B, S. A catego-
rization of real-time multiprocessor schedul-
ing problems and algorithms. Handbook of
Scheduling (2005).
[4] C, L.,  G, J. Feasibility intervals
for multiprocessor fixed-priority scheduling of
arbitrary deadline periodic systems. Proceed-
ings of the 10th Design, Automation and Test
in Europe (DATE’07) (2007).
[5] D, S.,  L, C. On a real-time schedul-
ing problem. Operations Research(26)
(1978), 127–140.
[6] L, C. Scheduling algorithms for multiproces-
sors in a hard real-time environment. JPL
Space Programs Summary 37-60(II) (1969),
28–31.
[7] L, C.,  L, J. Scheduling algorithms
for multiprogramming in a hard-real-time en-
vironment. Journal of the ACM 20, 1 (1973),
46–61.
A Unified HW/SW Operating System for Partially Runtime Reconfigurable FPGA
based Computer Systems ∗
Qingxu Deng1, Yi Zhang1, Nan Guan1 and Zonghua Gu2
1 Northeastern University, Shenyang, China
2 Hong Kong University of Science and Technology, Hong Kong, China
Abstract
Partially Runtime-Reconfigurable (PRTR) FPGAs allow
hardware tasks to be placed and removed dynamically at run-
time. We present an OS for hybrid computing systems consist-
ing of both CPUs and PRTR FPGAs. The OS is based on Linux,
and provides unified interfaces for both HW and SW processes
to ease the design of such hybrid systems. The scheduler of HW processes is implemented in hardware, to alleviate the performance penalty of time-consuming HW task scheduling algorithms.
1 Introduction
A Partially Runtime-Reconfigurable (PRTR) FPGA (re-
ferred to as FPGA for short), such as the Virtex family FP-
GAs from Xilinx [1], is composed of a rectangular grid of Con-
figurable Logic Blocks (CLBs) and the interconnects between
them. An FPGA allows part of its area to be reconfigured while the remainder continues to operate without interruption, and is regarded as a 2D continuous processing area that can hold many HW tasks. In other words, HW tasks can be allocated and deallocated dynamically at runtime.
Traditionally, designing HW/SW hybrid systems is a very tough task. The HW and SW parts were developed separately and later pieced together. Since standard interfaces and services have not yet been established [9] [18], designers are forced to literally build systems from scratch. A unified interface for both HW and SW can provide a clean separation of system design and implementation, which is the basis of system design standardization.
Furthermore, a HW/SW interface that is familiar and easy to understand will greatly facilitate the transition from systems based on supercomputers or computer clusters to HW/SW hybrid platforms [11].
∗This work is partially supported by the National High Technology Re-
search and Development Program of China (863 Program) under Grant No.
2007AA01Z181 and the National Natural Science Foundation of China under
Grant No. 60773220
In this paper we present our on-going work on a Linux-based
OS for hybrid systems consisting of both CPUs and FPGAs.
The OS provides unified interfaces for both HW processes and
SW processes. Since HW scheduling algorithms are usually very time-consuming, we migrate this part of the OS into hardware in order to reduce the runtime overhead and improve the real-time performance of the system.
The remainder of the article is organized as follows. We introduce the related work in Section 2 and then present our OS prototype design in Section 3. Finally, future work is discussed in Section 4.
2 Related Work
The concept of an OS for reconfigurable computing systems was first proposed by Brebner et al. [3]. Wigley et al. [17] discussed several issues in OSs for reconfigurable computing systems, including HW task downloading, FPGA area management, HW task scheduling, storage management and protection, I/O communications, HW task communication and fragmentation metrics.
Walder and Platzner described an OS prototype for reconfigurable computing systems in [15] [12]. They discussed the on-line scheduling of HW tasks and implemented a HW task scheduler and placer in their OS prototype. Their work is a good paradigm of a runtime system for HW multitasking. However, they did not consider the unified management of HW tasks and SW tasks.
Rissa and Niittylahti [10] introduced a HW/SW hybrid system where FPGAs on a PCI board are connected to general-purpose computers via the PCI bus. Wiangtong presented the UltraSONIC system, based on a similar architecture, in [16], where they mainly focused on HW/SW co-design and introduced a HW/SW co-design environment DAG.
Kwok-Hay et al. [11] introduced a Linux-based system, BORPH, which provides unified interfaces for both SW and HW at the OS kernel level. BORPH provides HW processes with Unix-standard access interfaces and BOF (an ELF-based file format) in order to unify the operations of both SW and HW tasks. BORPH is implemented on the BEE2 module hardware platform. Every BEE2 module contains five FPGAs (one control FPGA and four user FPGAs). They defined a bus architecture for inter-FPGA communication. The BORPH system did not consider partial reconfiguration or the HW task scheduling problem.
Figure 1. Hardware Platform [Figure: blocks labeled PPC, IP Core, ICAP and Reconfigurable Fabric.]
Figure 2. Hardware Platform [Figure: block diagram with PPC, PLB, PLB-OPB bridge, OPB, DDR RAM, IP cores, I/O device, BRAM and ICAP.]
Agron et al. [2] proposed a CPU/FPGA hybrid system, Hthread, on CSoC (Configurable SoC), where some run-time system components like the Thread Manager, Scheduler, Mutex Manager, and a new CPU Bypass Interrupt Scheduler (CBIS) are migrated into the reconfigurable fabric of an FPGA. Migrating these services into hardware brings significant performance benefits to software threads through more efficient invocation and processing mechanisms, and helps eliminate the hidden overhead of context switch times associated with entering and exiting the RTOS. The Hthread system is not based on Linux but is built around its own APIs, which are compatible with the POSIX thread standard.
3 System Design
3.1 Overview
The system is implemented on a Virtex-II Pro XC2VP30 FPGA, which contains a PowerPC 405 hard core and partially runtime-reconfigurable fabric, as shown in Fig. 2. Multiple HW tasks can execute simultaneously on the reconfigurable fabric and can be allocated and deallocated dynamically at runtime without interrupting other HW tasks. The off-chip DDR memory is used as the system main memory, and on-chip BRAMs (Block RAM) are used by IPCores as their own storage resource. IPCores are connected to the PLB (Processor Local Bus) or OPB (On-chip Peripheral Bus), depending on their communication bandwidth demand. The PowerPC hard core accesses the ICAP (Internal Configuration Access Port) via the high-speed PLB to configure the reconfigurable fabric.
Figure 3. Hardware Platform [Figure: blocks labeled CPU, SW Process, HW Process, Dedicated Program, IP Core, Interface Register and Reconfigurable Fabric.]
Our goal is to develop an OS in order to:
• Provide a unified process model for both SW tasks and
HW tasks;
• Enable on-line placement and scheduling of HW processes
at OS level.
We choose Linux 2.6 as the foundation of our OS prototype.
In contrast to Linux 2.4, Linux 2.6 supports preemption in kernel mode, which benefits the on-line management of HW processes (discussed in Section 3.3).
3.2 The HW Process Model
Both the HW tasks and SW tasks are implemented as pro-
cesses in our system. A HW process consists of two parts: the
IPCore (hardware part) and the Dedicated Program (software
part), as shown in Fig. 3. IPCores take charge of the computa-
tion work, while the Dedicated Program encapsulates the com-
munication operations between the IPCore and the system. The
Dedicated Programs are instantiated from the same template and access IPCores through the same device driver, so HW process designers do not need to write any software; they only need to implement their IPCore conforming to the predefined interface standard.
3.2.1 Communication
The Dedicated Program accesses the IPCore via specific registers in the IPCore, called the Interface Register.
The Interface Register consists of two parts: (1) State Regis-
ters, which show the current state of the IPCore and (2) Data
Registers, which store the communication data.
The passive communication of the HW process is quite sim-
ple. When some process P wants to send data to a HW process
H , the procedure is:
1. P sends data to H’s Dedicated Program;
2. H’s Dedicated Program writes data to the Data Registers of the IPCore.
When some process P wants to get data from the HW process
H , the procedure is:
1. P sends a message to H’s Dedicated Program to denote
which data are required;
2. H’s Dedicated Program reads data from the assigned Data
Registers of the IPCore.
3. H’s Dedicated Program sends data to P .
The active communication of the HW process is a little more complicated. To enable IPCores to initiate communications, we bind a unique interrupt source to each HW process. When the HW process H wants to send data to some other process P, the procedure is:
1. H’s IPCore updates the State Registers.
2. H’s IPCore generates an interrupt request.
3. H’s Dedicated Program handles this interrupt: it reads the State Registers and gets data from the IPCore.
4. H’s Dedicated Program sends these data to P.
When the HW process H wants to get data from some other process P, the procedure is:
1. H’s IPCore updates the State Registers.
2. H’s IPCore generates an interrupt request.
3. H’s Dedicated Program handles this interrupt: it reads the State Registers and gets data from P.
4. H’s Dedicated Program sends the data to the IPCore.
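The passive path and the sending direction of the active path can be mimicked in a purely software Python model, which may help to see which side drives each step. No real hardware, device driver or interrupt controller is involved; `InterfaceRegister`, `DedicatedProgram`, `IPCore` and the State Register fields `dest` and `index` are hypothetical, and only the case where H sends data to P is modeled for the active path.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class InterfaceRegister:
    state: Dict[str, int] = field(default_factory=dict)        # State Registers
    data: List[int] = field(default_factory=lambda: [0] * 4)   # Data Registers


class DedicatedProgram:
    """Software half of a HW process H (hypothetical model)."""

    def __init__(self, regs: InterfaceRegister):
        self.regs = regs
        self.outbox: List[Tuple[int, int]] = []  # (destination process, value)

    # Passive communication: another process P drives the exchange.
    def write(self, index: int, value: int) -> None:
        self.regs.data[index] = value  # write into the IPCore's Data Registers

    def read(self, index: int) -> int:
        return self.regs.data[index]   # read from the assigned Data Register

    # Active communication: handler bound to H's unique interrupt source.
    def on_interrupt(self) -> None:
        dest = self.regs.state["dest"]    # State Registers describe the request
        index = self.regs.state["index"]
        self.outbox.append((dest, self.regs.data[index]))  # forward data to P


class IPCore:
    """Hardware half of H, reduced to the steps visible to the Dedicated Program."""

    def __init__(self, regs: InterfaceRegister, irq_handler: Callable[[], None]):
        self.regs = regs
        self.irq_handler = irq_handler

    def send(self, dest: int, index: int, value: int) -> None:
        self.regs.data[index] = value
        self.regs.state.update(dest=dest, index=index)  # 1. update State Registers
        self.irq_handler()                               # 2. raise the interrupt
```

Because every HW process shares this Dedicated Program logic, only the register contents and the interrupt number differ between HW processes in such a model.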
The Interface Registers are mapped into the system address space, and Dedicated Programs access them by directly reading/writing the corresponding addresses. The address range of each IPCore is 4 KB, which equals the size of a page. This allows a future extension that maps the IPCore’s internal BRAM into the system address space, in order to facilitate data-stream style communication.
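Because each IPCore window is exactly one 4 KB page, windows of different IPCores never share a page and can be mapped with page granularity. A short Python sketch of the address arithmetic, assuming a hypothetical page-aligned base address `IPCORE_BASE` and consecutively numbered IPCore slots:

```python
PAGE_SIZE = 4096            # one 4 KB window per IPCore, as stated in the text
IPCORE_BASE = 0x8000_0000   # hypothetical, page-aligned base of the IPCore region


def ipcore_window(slot: int) -> range:
    """Addresses of the Interface Registers of the IPCore in `slot` (hypothetical numbering)."""
    start = IPCORE_BASE + slot * PAGE_SIZE
    return range(start, start + PAGE_SIZE)


def page_number(addr: int) -> int:
    return addr // PAGE_SIZE
```

Since every window starts on a page boundary and spans exactly one page, a Dedicated Program could map only its own IPCore's page into its address space (for example with `mmap` on Linux) without exposing the registers of other IPCores.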
Since all HW processes share the same Dedicated Program template, the communication operations supported by the Dedicated Program should be simple and application-independent. Currently only three simple communication mechanisms are supported for HW processes: (1) pipeline, (2) signal and (3) message.
[Figure 4. Hardware Platform. HELF file layout: HELF Header (IPCore info., interrupt no., etc.); ELF File Section (ELF file of the Dedicated Program); Hardware Section (bitstream of the IPCore).]
3.2.2 File Format
We design a new file format, HELF, for HW processes by extending the ELF file format. A HELF file consists of three parts, as shown in Fig. 4:
• HELF Header: the HW process's basic information, such as the width/height of the IPCore, the interrupt source number, etc.
• ELF Section: the ELF file of the Dedicated Program.
• Hardware Section: the bitstream of the IPCore.
Since the Dedicated Programs of all HW processes are exactly the same, the ELF Section can be compiled into executable code in advance and linked directly into each HW process's HELF file.
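The three-part HELF layout can be illustrated with Python's `struct` module. The header fields, their order and widths, and the `HELF` magic number are hypothetical; the text only names the IPCore width/height, interrupt number, WCET and deadline as header contents. The section sizes are stored in the header here so that the ELF and Hardware Sections can be located.

```python
import struct

# Hypothetical HELF header layout (little-endian, no padding):
# magic, IPCore width, IPCore height, interrupt no., WCET, deadline,
# size of the ELF Section, size of the Hardware Section.
HELF_HEADER = struct.Struct("<4sHHIIIII")
HELF_MAGIC = b"HELF"


def pack_helf(width, height, irq, wcet_us, deadline_us, elf_bytes, bitstream):
    header = HELF_HEADER.pack(HELF_MAGIC, width, height, irq, wcet_us,
                              deadline_us, len(elf_bytes), len(bitstream))
    # HELF Header, then ELF Section, then Hardware Section (Fig. 4 order).
    return header + elf_bytes + bitstream


def unpack_helf(blob):
    (magic, width, height, irq, wcet_us, deadline_us,
     elf_size, bit_size) = HELF_HEADER.unpack_from(blob, 0)
    if magic != HELF_MAGIC:
        raise ValueError("not a HELF file")
    elf_start = HELF_HEADER.size
    bit_start = elf_start + elf_size
    info = dict(width=width, height=height, irq=irq,
                wcet_us=wcet_us, deadline_us=deadline_us)
    return info, blob[elf_start:bit_start], blob[bit_start:bit_start + bit_size]


shared_elf = b"\x7fELF" + bytes(60)      # placeholder for the common Dedicated Program
blob = pack_helf(8, 16, 5, 2000, 10000, shared_elf, b"\xaa" * 64)
info, elf, bits = unpack_helf(blob)
print(info["irq"], len(bits))            # 5 64
```

Because the Dedicated Program is identical for all HW processes, the same `shared_elf` bytes can be embedded unchanged in every HELF file.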
3.3 On-line Scheduling of HW Process
The on-line scheduling¹ of HW tasks on a PRTR FPGA is much more complicated than SW scheduling. SW tasks share computing resources only in the time dimension, while HW tasks share computing resources in both the time and the space dimensions. On-line HW task scheduling algorithms are therefore usually very time-consuming (see Table 1).
If the HW task scheduling algorithm is implemented in software, its execution time can be quite long and will heavily degrade the real-time performance of the system. We therefore implemented the HW task scheduling algorithm in hardware.
We add functions to the original "exec()" to recognize HELF files and extract the information of the HW process from the HELF Header, e.g., its width, height, WCET and deadline. This information is sent to the hardware-implemented scheduler as the input of the scheduling algorithm.
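The extended "exec()" path can be sketched as a dispatch on the file's magic number. In this Python sketch the `HELF` magic, the header layout and the `SchedulerInput` record are hypothetical, and the hardware scheduler's input is modelled as a plain list; only the ELF magic `\x7fELF` is the standard one.

```python
import struct
from dataclasses import dataclass

ELF_MAGIC = b"\x7fELF"
HELF_MAGIC = b"HELF"                     # hypothetical HELF magic number
# Hypothetical header layout: magic, width, height, interrupt no., WCET,
# deadline, ELF Section size, Hardware Section size.
HELF_HEADER = struct.Struct("<4sHHIIIII")


@dataclass
class SchedulerInput:
    """What exec() forwards to the hardware scheduler (hypothetical)."""
    width: int
    height: int
    wcet_us: int
    deadline_us: int


def exec_dispatch(image, hw_scheduler_inbox):
    """Sketch of the extended exec(): ELF images follow the normal SW path,
    HELF images have their scheduling parameters sent to the HW scheduler."""
    magic = image[:4]
    if magic == ELF_MAGIC:
        return "sw"
    if magic != HELF_MAGIC:
        raise ValueError("unknown executable format")
    _, width, height, _irq, wcet, deadline, _, _ = HELF_HEADER.unpack_from(image)
    hw_scheduler_inbox.append(SchedulerInput(width, height, wcet, deadline))
    return "hw"


inbox = []
helf = HELF_HEADER.pack(HELF_MAGIC, 8, 16, 5, 2000, 10000, 0, 0)
print(exec_dispatch(helf, inbox), inbox[0].deadline_us)   # hw 10000
print(exec_dispatch(ELF_MAGIC + bytes(12), inbox))        # sw
```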
We have two choices for "exec()" after sending the information to the hardware scheduler:
1. "exec()" yields immediately, and the OS scheduler selects other processes to execute. When the hardware scheduler produces the result, it sends an interrupt signal to "exec()" to finish the scheduling operations.
¹ Including task placement.
Table 1. The complexity of on-line HW task scheduling algorithms in the literature.
Authors          References   Free Area Management Method   Complexity
Handa et al.     [8] [7]      Maximal Empty Rectangles      O(N²·W·H)
Cui et al.       [4] [5]      Maximal Empty Rectangles      O(N²·W·H)
Tabero et al.    [13] [14]    Vertex List                   O(N²)
Deng et al.      [6]          Reject Region                 O(N·(W+H))
2. "exec()" does not yield execution, but waits for the result of the hardware scheduler and then continues to execute. Since Linux 2.6 is preemptible in kernel mode, "exec()" can still be preempted if there are more urgent processes.
Due to the high computational power of hardware, the scheduling algorithm executes very fast, so in most cases the scheduling decision is obtained almost immediately. We therefore choose the second method in our system, which is also much easier to implement.
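The chosen second option amounts to "exec()" polling for the hardware scheduler's decision instead of yielding. The toy Python model below assumes that the decision becomes available a fixed number of clock cycles after submission (`latency_cycles`, a hypothetical parameter) and simulates the hardware by advancing one cycle per poll; it does not model kernel preemption, and the decision format is invented for illustration.

```python
class HwSchedulerModel:
    """Toy stand-in for the hardware scheduler: the decision becomes
    available `latency_cycles` clock cycles after a request is submitted."""

    def __init__(self, latency_cycles=20):
        self.latency_cycles = latency_cycles
        self._remaining = None
        self._request = None

    def submit(self, request):
        self._request = request
        self._remaining = self.latency_cycles

    def tick(self):
        # One hardware clock cycle elapses.
        if self._remaining:
            self._remaining -= 1

    @property
    def done(self):
        return self._remaining == 0

    def result(self):
        # Hypothetical decision format: accept and place at column 0.
        return {"accepted": True, "column": 0, "request": self._request}


def exec_wait_for_decision(sched, request, max_polls=10_000):
    """Option 2: exec() keeps the CPU and polls until the decision is ready."""
    sched.submit(request)
    for polls in range(1, max_polls + 1):
        sched.tick()                  # real hardware advances on its own
        if sched.done:
            return sched.result(), polls
    raise TimeoutError("no decision from the hardware scheduler")


decision, polls = exec_wait_for_decision(HwSchedulerModel(20), {"width": 8, "height": 16})
print(decision["accepted"], polls)   # True 20
```

With a fast hardware scheduler the loop ends after only a few iterations, which is why the synchronous design costs little; `max_polls` merely guards against a scheduler that never answers.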
4 Conclusion and Future Work
In this paper, we have reported the current progress of the project on a Unified HW/SW Operating System for Partially Runtime Reconfigurable FPGA based Computer Systems. In the next step, we will provide multiple templates for Dedicated Programs in order to support more complicated communication mechanisms for HW processes, such as semaphores and mutexes. We also plan to design the HW process interface for data-stream style applications by mapping the IPCore's internal BRAM into the system address space. Experiments with real applications will be conducted to evaluate the performance of our system.
References
[1] Xilinx website. Available: http://www.xilinx.com.
[2] Jason Agron, Wesley Peck, Erik Anderson, David Andrews, Ed Komp, Ron Sass, Fabrice Baijot, and Jim Stevens. Run-time services for hybrid CPU/FPGA systems on chip. In RTSS, 2006.
[3] G. Brebner. A virtual hardware operating system for the Xilinx XC6200. In The 6th International Workshop on Field-Programmable Logic and Applications (FPL), 1996.
[4] J. Cui, Q. Deng, X. He, and Z. Gu. An efficient algorithm for online management of 2D area of partially reconfigurable FPGAs. In DATE, 2007.
[5] J. Cui, Q. Deng, X. He, and Z. Gu. An efficient algorithm for online soft real-time task placement on reconfigurable hardware devices. In ISORC, pp. 321–328, 2007.
[6] Qingxu Deng, Fanxin Kong, Nan Guan, and Yi Wang. Online placement of real-time tasks on 2D partially run-time reconfigurable FPGAs. Technical Report, Northeastern University, China, 2008.
[7] M. Handa and R. Vemuri. Area fragmentation in reconfigurable operating systems. In ERSA, pp. 77–83, 2004.
[8] M. Handa and R. Vemuri. An efficient algorithm for finding empty space for online FPGA placement. In DAC, pp. 960–965, 2004.
[9] A. A. Jerraya and W. Wolf. Hardware/software interface co-design for embedded systems. 2005.
[10] T. Rissa and J. Niittylahti. A hybrid prototyping platform for dynamically reconfigurable designs. In The International Conference on Field-Programmable Logic and Applications (FPL), 2000.
[11] Hayden Kwok-Hay So, Artem Tkachenko, and Robert Brodersen. A unified hardware/software runtime environment for FPGA-based reconfigurable computers using BORPH. In CODES, 2006.
[12] C. Steiger, H. Walder, and M. Platzner. Operating systems for reconfigurable embedded platforms: online scheduling of real-time tasks. IEEE Transactions on Computers, vol. 53, no. 11, pp. 1393–1407, 2004.
[13] J. Tabero, J. Septien, H. Mecha, and D. Mozos. A low fragmentation heuristic for task placement in 2D RTR HW management. In FPL, pp. 241–250, 2004.
[14] J. Tabero, J. Septien, H. Mecha, and D. Mozos. Task placement heuristic based on 3D-adjacency and look-ahead in reconfigurable systems. In ASPDAC, pp. 396–401, 2006.
[15] H. Walder and M. Platzner. Reconfigurable hardware operating systems: From design concepts to realizations. In ERSA, 2003.
[16] T. Wiangtong, P. Y. K. Cheung, and W. Luk. A unified codesign run-time environment for the UltraSONIC reconfigurable computer. In FPL, 2003.
[17] Grant Wigley and David Kearney. Research issues in operating systems for reconfigurable computing. In ERSA, 2002.
[18] T.-Y. Yen and W. Wolf. Communication synthesis for distributed embedded systems. 1995.
Energy-Aware Task Partitioning and Processing Unit Allocation for Periodic
Real-Time Tasks on Systems with Heterogeneous Processing Units
Jian-Jia Chen, Andreas Schranzhofer, and Lothar Thiele
Computer Engineering and Networks Laboratory (TIK)
Swiss Federal Institute of Technology (ETH) Zurich, Switzerland
Email: {jchen, schranzhofer, thiele}@tik.ee.ethz.ch
Abstract
Adopting multiple processing units to enhance the computing capability or reduce the power consumption has been widely accepted in the design of embedded systems. Such configurations impose challenges on energy efficiency in hardware and software implementations. This work targets energy-efficient task partitioning and processing unit allocation for periodic real-time tasks on a platform with a library of applicable processing unit types. Each processing unit type has its own power consumption characteristics for remaining active and for executing jobs. We show that there does not exist any polynomial-time approximation algorithm with a constant approximation factor unless P = NP. The heuristic algorithms proposed in this work first decide how to assign tasks onto processing unit types to minimize the energy consumption, and then allocate processing units to fit the requested demands. Experimental results show that the proposed algorithms are effective in minimizing the overall energy consumption.
1 Introduction
In the past decade, energy-efficient and low-power designs have become important issues in a wide range of computer systems. Energy efficiency is useful not only for mobile devices, to extend their operating duration, but also for server systems, to reduce power bills. Dynamic power consumption due to switching activities and static power consumption due to leakage current are the two major sources of power consumption of a processing unit.
As multiprocessor system-on-chip (MPSoC) platforms composed of multiple heterogeneous processors (processing units) have been widely adopted, a designer can take advantage of the particular properties of each processing unit to increase the flexibility of the system. The hardware platform may not need to allow all processing units to execute in parallel. This introduces interesting new options in the embedded systems design process. For example, a field-programmable gate array (FPGA) might be adopted to provide the flexibility to execute tasks/jobs in hardware for acceleration. Helper devices might also be used to offload special functions from the processor, such as discrete cosine transform (DCT) or fast Fourier transform (FFT) functions.
Moreover, due to the dramatic increase in power density, multiprocessor platforms and platforms with co-processing units have become more and more popular in architecture designs. For example, chip makers such as Intel and AMD are releasing multi-core chips to improve system performance instead of increasing operating frequencies. Such configurations in embedded systems have triggered research in hardware/software co-design to improve system performance with energy-efficiency considerations.
Power-aware and energy-efficient scheduling for multiprocessor systems has been widely explored in recent years in both academia and industry, especially for real-time systems. However, only a few results address energy efficiency for systems with heterogeneous processing units (processors). For the minimization of dynamic energy consumption in dynamic voltage scaling (DVS) systems with negligible leakage power consumption, heuristic algorithms and approximation algorithms have been proposed, e.g., [2–4, 6, 8]. Unfortunately, in nanometer manufacturing, leakage current contributes significantly to the power consumption of the system, and static power consumption is comparable to dynamic power dissipation [5]. By applying a dormant (sleep) mode and DVS to reduce the energy consumption in homogeneous multiprocessor systems, Xu et al. [7] and Chen et al. [1] propose polynomial-time algorithms that derive task mappings which try to execute tasks at a critical execution frequency.
We explore energy-efficient task partitioning and processing unit allocation for periodic real-time tasks based on a given library of processing units. By considering both static (leakage) and dynamic power consumption, the objective of this research is to minimize the overall energy consumption of the system. To simplify the presentation, we only present the results for processing units that have a single execution mode. The extension to processing units with multiple execution modes, e.g., DVS systems with discrete supply voltages/speeds, is straightforward. The studied problem is shown to be so hard that there does not exist any polynomial-time approximation algorithm with a constant approximation factor unless P = NP. We model the problem as an integer linear programming problem and, based on the relaxation of the integrality constraints, provide polynomial-time algorithms to decide the assignment of tasks onto processing unit types. After the assignment of tasks onto processing unit types is derived, we allocate a proper number of processing units of each type to fit the processing requirement. Experimental results show the effectiveness of the proposed algorithms.
2 System Models
This section presents the problem definition, models of
processing unit (abbreviated as PU) types in power consump-
tion and execution, and the task model.
Models of Processing Units The power consumption func-
tion P on a PU type has two parts Pd and Ps, where Pd is
the dynamic power consumption dissipated for task/job ex-
ecution and Ps is the static power consumption to maintain
the activeness of the PU. For notational brevity, for a PU type
Mj , the static (resp. dynamic) power consumption is denoted
by Ps,j (resp. Pd,j). The results in this work can be easily
extended to systems with multiple dynamic power consump-
tion modes in a PU type, i.e., by applying dynamic voltage
scaling (DVS). Due to space limitation, we will only present
the results for systems without DVS capability.
The set of $m$ available PU types is denoted by $\mathbf{M}$. When a PU of type $M_j$ executes a job with power characteristic $h$, the power consumption is assumed to be $P_{s,j} + hP_{d,j}$. On the other hand, when the PU is idle, its power consumption is $P_{s,j}$. In other words, once a PU is allocated for executing tasks, it cannot be turned off while idle, and hence it always consumes the static power needed to keep it active.
Task Model This work explores the scheduling of periodic real-time tasks that are independent in execution. A periodic task $\tau_i$ releases an infinite sequence of jobs and is characterized by its period $p_i$; its relative deadline is equal to its period. Execution times are worst-case execution times (WCETs): the WCET of task $\tau_i$ on PU type $M_j$ is $c_{i,j}$. Let $\mathbf{T}$ be the input task set of $n$ periodic real-time tasks. Executing one job of task $\tau_i$ on PU type $M_j$ consumes energy $(h_{i,j}P_{d,j} + P_{s,j})c_{i,j}$, in which $h_{i,j}$ is the power characteristic of task $\tau_i$ on PU type $M_j$. Note that if $c_{i,j} > p_i$, then $c_{i,j}$ is set to $\infty$, since the task cannot complete in time on PU type $M_j$.
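The per-job energy formula and the convention for infeasible PU types translate directly into code. This is a minimal Python sketch; the function name and the numbers in the example are hypothetical, and with power in mW and time in ms the result is in µJ.

```python
import math


def job_energy(c, p, h, p_dyn, p_static):
    """Energy of one job of task tau_i on PU type M_j: (h*Pd + Ps) * c.
    If the WCET exceeds the period, the task cannot meet its deadline on
    this PU type, so c is treated as infinite (the model's convention)."""
    if c > p:
        c = math.inf
    return (h * p_dyn + p_static) * c


# c = 2 ms, p = 10 ms, h = 1.5, Pd = 4 mW, Ps = 2 mW -> (6 + 2) * 2
print(job_energy(2, 10, 1.5, 4, 2))    # 16.0
```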
The earliest-deadline-first (EDF) policy is an optimal uniprocessor scheduling policy for independent real-time tasks. A set of such tasks, with relative deadlines equal to their periods, is schedulable by EDF if and only if its total utilization is no more than 100%, where the utilization of a task is its execution time divided by its period. For the rest of this work, we focus on systems that apply EDF scheduling on each PU.
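For implicit-deadline independent tasks on one PU, the EDF test reduces to a utilization sum. This Python sketch uses `fractions.Fraction` so that a utilization of exactly 1 is not misjudged by floating-point rounding; the function names and task parameters are hypothetical.

```python
from fractions import Fraction


def utilization(tasks):
    """tasks: iterable of (wcet, period) pairs assigned to one PU."""
    return sum(Fraction(c, p) for c, p in tasks)


def edf_schedulable(tasks):
    # Independent tasks, deadline = period, one PU:
    # EDF meets all deadlines iff the total utilization is at most 1.
    return utilization(tasks) <= 1


print(edf_schedulable([(2, 5), (3, 10), (3, 10)]))  # True  (0.4 + 0.3 + 0.3 = 1)
print(edf_schedulable([(2, 5), (3, 10), (4, 10)]))  # False (1.1)
```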
The hyper-period of $\mathbf{T}$, denoted by $L$, is the minimum positive number $L$ such that $L/p_i$ is an integer for every task $\tau_i$ in $\mathbf{T}$. For example, $L$ is the least common multiple (LCM) of the periods of the tasks in $\mathbf{T}$ when all periods are integers. This work focuses on the minimization of the overall energy consumption in the hyper-period $L$. An equivalent measure is the average power consumption. Suppose that $\mathbf{T}_j$ is the set of tasks assigned to a PU of type $M_j$. The average power consumption of the PU is
$$\Big(\sum_{\tau_i\in\mathbf{T}_j}\frac{c_{i,j}h_{i,j}}{p_i}\Big)P_{d,j} + P_{s,j}.$$
As a result, the average power consumption multiplied by the hyper-period (if it exists) is the energy consumption of the PU. For the rest of this work, we assume that the hyper-period of $\mathbf{T}$ exists; the results still hold for the minimization of the average power consumption when it does not.
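The equivalence between average power and energy in the hyper-period can be checked numerically. The Python sketch below (hypothetical function names and task parameters, integer inputs for exact arithmetic, Python 3.9+ for multi-argument `math.lcm`) computes the energy once from the average-power formula and once by summing job energies plus the static energy of idle time.

```python
from fractions import Fraction
from math import lcm


def hyper_period(periods):
    """LCM of integer periods."""
    return lcm(*periods)


def average_power(tasks, p_dyn, p_static):
    """tasks: (c_ij, p_i, h_ij) integer triples assigned to one PU of type M_j.
    Returns (sum of c_ij * h_ij / p_i) * Pd_j + Ps_j."""
    return sum(Fraction(c * h, p) for c, p, h in tasks) * p_dyn + p_static


def energy_by_jobs(tasks, p_dyn, p_static):
    """Energy over one hyper-period, counted job by job plus idle time."""
    L = hyper_period([p for _, p, _ in tasks])
    energy = busy = 0
    for c, p, h in tasks:
        jobs = L // p                                # jobs of this task in L
        energy += jobs * (h * p_dyn + p_static) * c  # energy while executing
        busy += jobs * c
    return energy + (L - busy) * p_static            # an idle PU still draws Ps


tasks = [(2, 10, 1), (3, 15, 2)]     # (c_ij, p_i, h_ij), hypothetical values
L = hyper_period([p for _, p, _ in tasks])
print(L, average_power(tasks, 4, 2) * L, energy_by_jobs(tasks, 4, 2))  # 30 132 132
```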
Problem Definition This work explores the Minimization of Energy consumption of periodic real-time tasks on platforms with HEterogeneous PUs (abbreviated as the MEHEPU problem). The objective is to partition the input task set $\mathbf{T}$ of $n$ tasks into disjoint subsets such that the energy consumption in the hyper-period is minimized, where all tasks in one subset are executed on one allocated processing unit of some type in $\mathbf{M}$ ($m$ types) without violating their timing constraints.
3 Hardness Analysis
It is not difficult to see that the MEHEPU problem is NP-hard in the strong sense even when there is only one PU type, since the bin packing problem is a special case of it. Furthermore, if we are restricted to allocating at most $\ell_j$ PUs of type $M_j$, deriving a feasible task partition and PU allocation is NP-complete. In other words, when $\ell_j \neq \infty$ for all PU types $M_j$, there is no polynomial-time algorithm for deriving a feasible solution unless P = NP. An algorithm is said to have an approximation factor $\beta$ if the objective value of its solution is at most $\beta$ times the optimal objective value for any input instance. The following theorem shows the hardness of deriving approximation algorithms.
Theorem 1 Unless P = NP, there does not exist any polynomial-time approximation algorithm with a constant approximation factor for the MEHEPU problem.
Proof. This theorem is proved by an L-reduction from the set cover problem, which does not admit any polynomial-time approximation algorithm with a constant approximation factor unless P = NP. Given a universe $E = \{e_1, e_2, \dots, e_n\}$ of $n$ elements, a collection $\mathbf{S} = \{S_1, S_2, \dots, S_m\}$ of subsets of $E$, and a cost $C_j > 0$ for each subset $S_j$, the set cover problem is to choose a minimum-cost sub-collection of $\mathbf{S}$ that covers all elements of $E$.
The L-reduction is done as follows. For each subset $S_j$, we create a PU type $M_j$ with static power consumption $P_{s,j} = C_j$. The dynamic power consumption $P_{d,j}$ of each PU type $M_j$ is a constant $O$. For each element $e_i$ in $E$, we create a task $\tau_i$ with a constant period $p$. If $e_i$ is in $S_j$, let $c_{i,j}$ be $p/n$; otherwise, $c_{i,j}$ is set to $\infty$.
For an optimal solution of the set cover problem with cost $C^*$, there is a feasible solution of the reduced MEHEPU instance with energy consumption $p(C^* + O)$ in the hyper-period $p$. Conversely, from a solution of the reduced MEHEPU instance with energy consumption $p(C^* + O)$ in the hyper-period, we obtain a solution of the set cover problem with cost $C^*$. As a result, if there were a polynomial-time $\beta$-approximation algorithm for the MEHEPU problem, the set cover problem would admit a polynomial-time $\beta$-approximation algorithm, which is a contradiction.
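The construction in the proof can be made concrete. This Python sketch builds the reduced MEHEPU instance from a small set cover instance and evaluates the energy of the solution that allocates one PU per chosen subset. It assumes h_ij = 1, which the proof leaves implicit; the function names and instance data are hypothetical.

```python
import math
from fractions import Fraction


def reduce_set_cover(n, subsets, costs, period, O):
    """Build the MEHEPU instance from the proof of Theorem 1.
    subsets[j] is a set of element indices in range(n); costs[j] = C_j."""
    wcet = [[Fraction(period, n) if i in s else math.inf for s in subsets]
            for i in range(n)]
    return {"n": n, "period": period, "wcet": wcet,
            "static": list(costs), "dyn": O}


def cover_energy(inst, cover):
    """Energy in the hyper-period p when one PU of each type in `cover` is
    allocated; each task runs on the first chosen type that can execute it.
    Assumes h_ij = 1."""
    p = inst["period"]
    util = {j: Fraction(0) for j in cover}
    for i in range(inst["n"]):
        j = next((j for j in cover if inst["wcet"][i][j] != math.inf), None)
        if j is None:
            raise ValueError(f"element {i} is not covered")
        util[j] += inst["wcet"][i][j] / p
    assert all(u <= 1 for u in util.values())    # each PU is EDF-schedulable
    # Average power of each PU is u * Pd + Ps; multiply by p for energy.
    return p * sum(u * inst["dyn"] + inst["static"][j] for j, u in util.items())


inst = reduce_set_cover(4, [{0, 1}, {1, 2, 3}, {3}], [3, 5, 1], period=12, O=2)
print(cover_energy(inst, [0, 1]), 12 * (8 + 2))   # 120 120
```

For the minimum cover {S_0, S_1} with cost C* = 8 the energy is p(C* + O) = 12 · 10 = 120; adding S_2 (cost 1) raises it to 132 = 12 · (9 + 2), matching the p(C + O) relation used in the proof.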
4 Proposed Algorithms
Suppose that the number of allocated units of PU type $M_j$ is $K_j$. For each task $\tau_i$ in $\mathbf{T}$, a binary variable $z_{i,j,k}$ is set to 1 if $\tau_i$ is assigned to execute on the $k$-th allocated unit of type $M_j$; otherwise, $z_{i,j,k} = 0$. The set $\mathbf{T}_{j,k}$ of tasks assigned onto the $k$-th allocated unit of type $M_j$ is schedulable by EDF if $\sum_{\tau_i\in\mathbf{T}_{j,k}} c_{i,j}/p_i \le 1$. The MEHEPU problem is formulated as an integer linear programming problem as follows:
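$$
\begin{aligned}
\text{minimize}\quad & L\Big(\sum_{M_j\in\mathbf{M}} K_j P_{s,j} + \sum_{M_j\in\mathbf{M}}\sum_{\tau_i\in\mathbf{T}}\sum_{k=1}^{n}\frac{h_{i,j}c_{i,j}}{p_i}\, z_{i,j,k} P_{d,j}\Big)\\
\text{subject to}\quad & \sum_{M_j\in\mathbf{M}}\sum_{k=1}^{K_j} z_{i,j,k} = 1, && \forall \tau_i\in\mathbf{T},\\
& \sum_{M_j\in\mathbf{M}}\sum_{k=K_j+1}^{n} z_{i,j,k} = 0, && \forall \tau_i\in\mathbf{T},\\
& \sum_{\tau_i\in\mathbf{T}} \frac{c_{i,j}}{p_i}\, z_{i,j,k} \le 1, && \forall M_j\in\mathbf{M},\ k=1,\dots,K_j,\\
& z_{i,j,k}\in\{0,1\}, && \forall \tau_i\in\mathbf{T},\ \forall M_j\in\mathbf{M},\ k=1,\dots,K_j,\\
& K_j\in\{0,1,2,\dots,n\}
\end{aligned}
\tag{1}
$$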
where the first and second constraints require that each task $\tau_i$ executes on exactly one allocated unit, and the third constraint requires that the total utilization of the tasks executing on one allocated PU be no more than one (because of EDF scheduling).
However, deriving an optimal solution of Equation (1) is still an NP-hard problem, so we relax its constraints. The first relaxation is on the objective function, to reduce the number of variables in the program. For each task $\tau_i$ in $\mathbf{T}$, a binary variable $y_{i,j}$ is set to 1 if $\tau_i$ is assigned to execute on a unit of type $M_j$; otherwise, $y_{i,j} = 0$. As a result, the number $\big\lceil \sum_{\tau_i\in\mathbf{T}} \frac{c_{i,j}}{p_i}\, y_{i,j} \big\rceil$ is a lower bound on the number of units of type $M_j$ required. Equation (1) can thus be relaxed into the following integer linear programming problem:
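$$
\begin{aligned}
\text{minimize}\quad & L\Big(\sum_{M_j\in\mathbf{M}} \Big\lceil \sum_{\tau_i\in\mathbf{T}} \frac{c_{i,j}}{p_i}\, y_{i,j} \Big\rceil P_{s,j} + \sum_{M_j\in\mathbf{M}}\sum_{\tau_i\in\mathbf{T}} \frac{h_{i,j}c_{i,j}}{p_i}\, y_{i,j} P_{d,j}\Big)\\
\text{subject to}\quad & \sum_{M_j\in\mathbf{M}} y_{i,j} = 1, && \forall \tau_i\in\mathbf{T},\\
& y_{i,j}\in\{0,1\}, && \forall \tau_i\in\mathbf{T},\ \forall M_j\in\mathbf{M}
\end{aligned}
\tag{2}
$$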
For any feasible solution of Equation (2), each task is assigned to exactly one PU type. Let $\mathbf{T}_j = \{\tau_i \in \mathbf{T} \mid y_{i,j} = 1\}$ be the set of tasks assigned to type $M_j$ in a solution of Equation (2). To allocate units of type $M_j$ to execute the tasks in $\mathbf{T}_j$, we apply algorithms for the traditional bin packing problem, such as the first-fit, last-fit, worst-fit, and best-fit strategies.
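The four fitting strategies differ only in which already allocated PU receives the next task. The Python sketch below treats each task's utilization c_ij / p_i as an item and each PU of type M_j as a bin of capacity 1 (the EDF bound); tasks are processed in the given order, and the function name `pack` is hypothetical.

```python
from fractions import Fraction


def pack(utils, strategy="best"):
    """Allocate PUs of one type for the tasks in T_j.
    utils: utilizations c_ij / p_i (each <= 1). Returns the load of each PU."""
    bins = []
    for u in utils:
        fits = [k for k, load in enumerate(bins) if load + u <= 1]
        if not fits:
            bins.append(u)                            # open a new PU of this type
            continue
        if strategy == "first":
            k = fits[0]                               # earliest PU that fits
        elif strategy == "last":
            k = fits[-1]                              # latest PU that fits
        elif strategy == "best":
            k = max(fits, key=lambda b: bins[b])      # least remaining capacity
        elif strategy == "worst":
            k = min(fits, key=lambda b: bins[b])      # most remaining capacity
        else:
            raise ValueError(f"unknown strategy {strategy!r}")
        bins[k] += u
    return bins


utils = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 2), Fraction(1, 6)]
for s in ("first", "last", "best", "worst"):
    print(s, len(pack(utils, s)))                     # each uses 2 PUs here
```

Each task scans the already opened PUs once, which gives the O(n²) cost per PU type; in this small example all strategies need two PUs but leave the spare capacity distributed differently.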
Unfortunately, deriving an optimal solution for Equation (2) is still NP-hard. We therefore relax the integrality constraint of Equation (2), as well as the ceiling function in its objective function, as follows:
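$$
\begin{aligned}
\text{minimize}\quad & L\Big(\sum_{M_j\in\mathbf{M}}\sum_{\tau_i\in\mathbf{T}} \frac{c_{i,j}}{p_i}\, y_{i,j} P_{s,j} + \sum_{M_j\in\mathbf{M}}\sum_{\tau_i\in\mathbf{T}} \frac{h_{i,j}c_{i,j}}{p_i}\, y_{i,j} P_{d,j}\Big)\\
\text{subject to}\quad & \sum_{M_j\in\mathbf{M}} y_{i,j} = 1, && \forall \tau_i\in\mathbf{T},\\
& y_{i,j} \ge 0, && \forall \tau_i\in\mathbf{T},\ \forall M_j\in\mathbf{M}
\end{aligned}
\tag{3}
$$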
By extreme point theory, it is not difficult to see that there exists an optimal solution for Equation (3) that maps each task $\tau_i$ in $\mathbf{T}$ to the PU type with the smallest energy consumption $(h_{i,j}P_{d,j} + P_{s,j})\,L\,c_{i,j}/p_i$ in the hyper-period: Equation (3) decomposes into one independent linear program per task, whose optimum puts all weight on the cheapest type. Deriving an optimal solution for Equation (3) can be done in $O(nm)$ time. Applying first-fit, last-fit, best-fit, or worst-fit to allocate PUs for one PU type can be done in $O(n^2)$ time. As a result, the overall time complexity of the algorithm is $O(mn^2)$.
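Putting the pieces together, the heuristic first solves relaxation (3) by choosing, for each task, the PU type with the cheapest energy share, and then packs each type's tasks onto PUs. This Python sketch uses best-fit for the packing and reports average power (energy per hyper-period divided by L); the data layout, function names and example values are hypothetical. Because the relaxation's optimum is a lower bound, the ratio of the two results is the normalized energy.

```python
import math
from fractions import Fraction


def map_to_types(tasks, pu_types):
    """Optimal solution of relaxation (3): each task goes to the PU type
    minimizing (h_ij * Pd_j + Ps_j) * c_ij / p_i.
    tasks: dicts {"p": p_i, "c": [c_ij], "h": [h_ij]} with integer c and p;
    pu_types: [(Pd_j, Ps_j)]. Returns the mapping and the relaxation's
    optimum, a lower bound on the average power."""
    mapping, bound = [], 0
    for t in tasks:
        cost = [(t["h"][j] * pd + ps) * Fraction(t["c"][j], t["p"])
                if t["c"][j] <= t["p"] else math.inf   # c_ij > p_i: infeasible
                for j, (pd, ps) in enumerate(pu_types)]
        best = min(range(len(pu_types)), key=cost.__getitem__)
        mapping.append(best)
        bound += cost[best]
    return mapping, bound


def best_fit_power(tasks, pu_types, mapping):
    """Allocate PUs per type with best-fit; return the platform's average power."""
    power = 0
    for j, (pd, ps) in enumerate(pu_types):
        loads = []                                    # utilization of each allocated PU
        for t, mj in zip(tasks, mapping):
            if mj != j:
                continue
            u = Fraction(t["c"][j], t["p"])
            power += u * t["h"][j] * pd               # dynamic part of this task
            fits = [k for k, load in enumerate(loads) if load + u <= 1]
            if fits:
                loads[max(fits, key=lambda k: loads[k])] += u   # tightest fit
            else:
                loads.append(u)                       # open a new PU of type j
        power += len(loads) * ps                      # every allocated PU draws Ps
    return power


pu_types = [(4, 1), (1, 5)]                           # (Pd_j, Ps_j)
tasks = [{"p": 10, "c": [2, 4], "h": [1, 1]},
         {"p": 20, "c": [10, 4], "h": [1, 1]},
         {"p": 5, "c": [3, 5], "h": [1, 1]},
         {"p": 10, "c": [6, 9], "h": [1, 1]}]
mapping, bound = map_to_types(tasks, pu_types)
power = best_fit_power(tasks, pu_types, mapping)
print(mapping, bound, power)                          # [0, 1, 0, 0] 41/5 64/5
```

The gap comes from rounding the fractional PU usage of each type (7/5 PUs of type 0, 1/5 of a PU of type 1) up to whole PUs, giving a normalized energy of 64/41 ≈ 1.56 in this example.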
5 Experiments
Setups This section provides evaluations of the proposed algorithms. The period of task $\tau_i$ is a random variable in [1, 100] ms. The execution time $c_{i,j}$ of jobs of task $\tau_i$ on PU type $M_j$ is a random variable uniformly distributed in the range $[0, 1]\cdot p_i$, and $h_{i,j}$ is a random variable in the range [1, 2]. For each PU type $M_j$, the static and dynamic power consumption are both random variables uniformly distributed in [1, 10] mW. We run two configurations of experiments, varying the number of PU types and the number of tasks. In the first configuration, the number of PU types ranges from 2 to 10, while the number of tasks is 100. In the second configuration, the number of PU types is 8, while the number of tasks ranges from 80 to 120.
[Figure 1. Experimental results. (a) Varying number of PU types, n=100 (x: number of PU types; y: normalized energy). (b) Varying number of tasks, m=8 (x: number of tasks; y: normalized energy). Curves: First-Fit, Last-Fit, Worst-Fit, Best-Fit.]
The normalized energy is adopted as the performance metric in the experiments. The normalized energy consumption of an algorithm for an input instance is the ratio of the allocation cost of the solution derived by the algorithm to that of the optimal solution of Equation (3). Since Equation (3) is a relaxation, its optimum is a lower bound, so the normalized energy is at least 1, and an algorithm with lower normalized energy consumption performs better. Each data point is an average of 256 experiments.
Results Figure 1(a) presents the experimental results for the first configuration, varying the number of PU types from 2 to 10 for 100 tasks. As shown in Figure 1(a), the normalized energy increases as the number of PU types increases. This is because Equation (3) underestimates the static energy consumption: as the lower bound becomes more underestimated, the normalized energy consumption increases. Among the fitting algorithms, the best-fit strategy performs best, and its advantage is larger when there are fewer PU types. Figure 1(b) shows the experimental results for the second configuration, varying the number of tasks from 80 to 120 for 8 PU types. As shown in Figure 1(b), the normalized energy generally decreases as the number of tasks increases. Again, the best-fit strategy is the best among all the evaluated algorithms.
6 Conclusion
This work explores energy-efficient task partitioning and processing unit allocation for periodic real-time tasks based on a given library of processing units. By considering both static (leakage) and dynamic power consumption, we propose heuristic algorithms based on the relaxation of an integer linear program. We first find the best mapping of tasks onto processing unit types for the relaxed problem, and then apply bin-packing algorithms to allocate processing units. For future research, we would like to explore whether the relaxation can yield approximation algorithms or other approaches with worst-case guarantees.
References
[1] J.-J. Chen, H.-R. Hsu, and T.-W. Kuo. Leakage-aware energy-efficient
scheduling of real-time tasks in multiprocessor systems. In IEEE Real-
time and Embedded Technology and Applications Symposium, pages
408–417, 2006.
[2] H.-R. Hsu, J.-J. Chen, and T.-W. Kuo. Multiprocessor synthesis for peri-
odic hard real-time tasks under a given energy constraint. In ACM/IEEE
Conference of Design, Automation, and Test in Europe (DATE), pages
1061–1066, 2006.
[3] T.-Y. Huang, Y.-C. Tsai, and E. T.-H. Chu. A near-optimal solution for the heterogeneous multi-processor single-level voltage setup problem. In 21st International Parallel and Distributed Processing Symposium (IPDPS), pages 1–10, 2007.
[4] C.-M. Hung, J.-J. Chen, and T.-W. Kuo. Energy-efficient real-time task
scheduling for a DVS system with a non-DVS processing element. In
the 27th IEEE Real-Time Systems Symposium (RTSS), pages 303–312,
2006.
[5] R. Jejurikar, C. Pereira, and R. Gupta. Leakage aware dynamic voltage
scaling for real-time embedded systems. In Proceedings of the Design
Automation Conference, pages 275–280, 2004.
[6] M. T. Schmitz, B. M. Al-Hashimi, and P. Eles. Energy-efficient mapping and scheduling for DVS enabled distributed embedded systems. In Proceedings of the 2002 Design, Automation and Test in Europe Conference and Exhibition (DATE'02). IEEE, 2002.
[7] R. Xu, D. Zhu, C. Rusu, R. Melhem, and D. Mossé. Energy-efficient policies for embedded clusters. In ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), pages 1–10, 2005.
[8] Y. Yu and V. K. Prasanna. Power-aware resource allocation for independent tasks in heterogeneous real-time systems. In Proceedings of the Ninth International Conference on Parallel and Distributed Systems (ICPADS'02). IEEE, 2002.
Using Fixed Priority Scheduling with Deferred
Preemption to Exploit Fluctuating Network
Bandwidth
Mike Holenderski, Reinder J. Bril and Johan J. Lukkien
Technische Universiteit Eindhoven (TU/e)
Den Dolech 2, 5600 AZ Eindhoven, The Netherlands
{m.holenderski, r.j.bril, j.j.lukkien}@tue.nl
Abstract—Fixed Priority Scheduling with Deferred Preemption
(FPDS) offers a balance between Fixed Priority Non-preemptive
Scheduling (FPNS) and Fixed Priority Preemptive Scheduling
(FPPS), by allowing preemptions only at specified preemption
points. It provides finer-grained preemptions than FPNS, improving the schedulability of higher priority tasks, and coarser-grained preemptions than FPPS, reducing the switching overhead incurred during arbitrary preemptions. In this paper we investigate the
extent of improvement of FPDS with respect to FPPS and qualify
the costs of switching multiple resources under FPPS and FPDS,
and the cost of a preemption point. It forms a starting point for
our research into employing FPDS in an industrial case study,
to improve an existing multimedia processing system from the
surveillance domain. We focus on extending FPDS with optional
preemption points, to guarantee resource provisions to tasks
in spite of fluctuating resource availability, in the context of
reservation-based multi-resource sharing.
I. INTRODUCTION
On the two sides of the Fixed Priority Scheduling spectrum
we have Fixed Priority Non-preemptive Scheduling (FPNS)
and Fixed Priority Preemptive Scheduling (FPPS) [8]. While
FPNS favors the lower priority tasks, by postponing preemp-
tion by higher priority tasks until a lower priority running
task completes, FPPS focuses on the schedulability of higher
priority tasks. However, by allowing preemptions at arbitrary
moments in time, FPPS ignores the cost of such preemptions.
This overhead may become especially significant when tasks
share multiple resources, e.g. cache, local or main memory.
Fixed Priority Scheduling with Deferred Preemption (FPDS)
[4], [5], [7], [6], [3] finds a middle ground between FPNS and
FPPS:
• It aims at reducing the cost of arbitrary preemptions in
FPPS, by allowing them only at times convenient for the
system (referred to as preemption points), e.g. at times
where the context switch overhead due to preemption will
be smallest. If FPDS is used as a guarding mechanism
for critical sections, then there is also no need for
access protocols to the shared resources (other than the
processor), reducing the system overheads.
• It improves on FPNS by allowing shorter non-preemptive
subjobs and thus improves the schedulability of higher
priority tasks.
This work has been supported in part by the Information Technology for
European Advancement (ITEA2), via the CANTATA project.
FPDS is a generalization of FPPS and FPNS, where FPPS
can be modeled by FPDS with arbitrarily short subjobs (ig-
noring context switch and scheduling overheads), and FPNS
by FPDS with tasks consisting of a single subjob.
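The effect of subjob granularity on higher priority tasks can be illustrated through the worst-case blocking they suffer from lower priority tasks. The Python sketch below computes a simplified upper bound (single processor, context switch and scheduling overheads ignored, as in the generalization above); the function name and the subjob lengths are hypothetical.

```python
def worst_case_blocking(lower_priority_tasks, policy):
    """Upper bound on the time a newly released task can be blocked by
    lower-priority tasks on one processor. Each task is a list of subjob
    WCETs; context-switch and scheduling overheads are ignored."""
    if policy == "FPPS":
        return 0                                    # preempted at any moment
    if policy == "FPNS":
        # A whole lower-priority job may have to finish first.
        return max((sum(t) for t in lower_priority_tasks), default=0)
    if policy == "FPDS":
        # Only the running subjob has to reach its preemption point.
        return max((max(t) for t in lower_priority_tasks), default=0)
    raise ValueError(f"unknown policy {policy!r}")


video_subjobs = [[4, 4, 4]]    # hypothetical: one lower-priority task, 3 subjobs
for policy in ("FPNS", "FPDS", "FPPS"):
    print(policy, worst_case_blocking(video_subjobs, policy))   # 12, 4, 0
```

Passing a task as a single subjob, e.g. `[[12]]`, makes the FPDS bound coincide with the FPNS one, mirroring the statement that FPNS is FPDS with tasks consisting of a single subjob.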
We distinguish two kinds of resources: preemptable and mu-
tually exclusive. When a preemptable resource (e.g. processor)
is preempted, we can store and reload its state, incurring some
bounded overhead. If we were to preempt a mutually exclusive
resource (e.g. access to a shared memory location) then its
integrity could be corrupted.
FPDS can be used to reduce the cost of context switches of preemptable resources, and to provide a simple access protocol to mutually exclusive resources. It promises a simpler implementation of critical sections than the intricate priority inheritance protocols used in FPPS, as [11], [14] reveal incorrect implementations of these protocols in existing real-time operating systems. In FPDS, the subjobs simply execute non-preemptively.
We are interested in employing FPDS in an industrial case study,
to improve an existing multimedia processing system from the
surveillance domain. We focus on using FPDS to guarantee
resource provisions to tasks in spite of fluctuating resource
availability, in the context of reservation-based multi-resource
sharing.
A. A surveillance system
There are two main tasks in the system: a video task τv and
a network task τn. These tasks run on a platform containing
two processors with cache and two local memories (LM and
IRAM), communicating with the main memory M via DMA
transfers over a shared system bus, as shown in Figure 1.
A camera monitoring a scene places the captured video
frames in the main memory M. The video task τv loads the raw
frames from the main memory M to local memory LM, does
some video content analysis on them, encodes them and stores
the result back to the main memory M. The main processor is
responsible for scheduling and may offload some operations
of τv and τn to the co-processor.
The network task τn loads the encoded frames from the
main memory M to local memory IRAM, wraps them into
packets and sends them over the network via the EMAC
interface.
Fig. 1. Architecture of a surveillance system
Both tasks are periodic, with their period determined by the
video stream rate.
Currently the processor is scheduled according to FPNS, with τn having higher priority than τv, because FPPS is not feasible due to its large context switch overheads. These overheads are especially pronounced in the streaming multimedia domain, where tasks are computation and data intensive, and where tasks are dependent (producer-consumer relation). However, due to fluctuating network bandwidth and a resource-greedy video task, the network task cannot make optimal use of the processor: it may be scheduled when the network is congested, or the video task may hold the processor when the network is available.
We would like to use FPDS in combination with reservations
to guarantee resources to the video task, while optimizing
the use of the available network, by providing finer grained
preemption points.
B. Reservations and FPDS
We can guarantee the processor to the video task in spite
of fluctuating network bandwidth in the current surveillance
system by introducing reservations [9]. We implement the
reservations with two servers: a deferrable server for the
network task, allowing the exploitation of its preservation
strategy [2], and a periodic server for the video task.
We use FPDS to schedule the reservations globally, with
the network reservation having higher priority than the video
reservation. We use FPDS to schedule each reservation locally.
Selecting finer granularity of the preemption points in τv
will allow τn to resume more frequently and transmit when
the network is available. The deferrable server allows τn to
postpone its transmission and release the processor to τv , in
case the network is congested. Setting the preemption points
in τn to packet boundaries can prevent aborting an ongoing
transmission and avoid the cost of later retransmission.
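To make the preservation strategy that motivates the deferrable server for τn concrete, the following minimal Python sketch models only its budget accounting. The class and method names, the integer time model and the instantaneous charging of budget are illustrative assumptions; a real server consumes its budget while the task executes and competes with the video reservation under FPDS.

```python
class DeferrableServer:
    """Budget accounting of a deferrable server (simplified sketch).

    The budget is refilled to `capacity` at the start of every period.
    If the served task does not use its budget (e.g. because the network
    is congested), the remainder is preserved until the end of the current
    period instead of being discarded. Unused budget is not carried over
    into the next period.
    """

    def __init__(self, capacity, period):
        self.capacity = capacity
        self.period = period
        self.budget = capacity
        self.period_start = 0

    def _replenish(self, now):
        # Refill at the first request that falls into a new server period.
        start = (now // self.period) * self.period
        if start > self.period_start:
            self.budget = self.capacity
            self.period_start = start

    def request(self, now, demand):
        """Ask for `demand` units of processor time at time `now`.

        `now` must be non-decreasing across calls. Returns the amount
        granted from the remaining budget.
        """
        self._replenish(now)
        granted = min(demand, self.budget)
        self.budget -= granted
        return granted


# The network task defers during t = 0..6 (congestion) and transmits at t = 7.
server = DeferrableServer(capacity=2, period=10)
print(server.request(7, 3))   # 2: budget preserved since t = 0
print(server.request(8, 1))   # 0: budget exhausted in this period
print(server.request(12, 3))  # 2: refilled at t = 10
```

A polling-style server would instead lose the budget at t = 0 when τn has nothing to send, which is exactly what the deferrable server avoids.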
C. Granularity of the preemption points
There is a trade-off between the granularity of the
preemption points and the overall schedulability of the system.
On the one hand, if preemption points are spaced arbitrarily
close, the cost of invoking the scheduler at each of them, the
more frequent context switches and the need to execute expensive
locking mechanisms on shared resources may lower the
useful utilization in the system [7]. On the other hand, a coarse
granularity of preemption points increases the blocking of
higher priority tasks and thus worsens their schedulability.
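The trade-off can be illustrated with a deliberately simple cost model, sketched below in Python. It assumes a lower priority task split into equal-length non-preemptive subjobs and a constant cost per preemption point; both are simplifying assumptions, and the function name is hypothetical.

```python
def granularity_tradeoff(wcet, num_subjobs, pp_cost):
    """Trade-off for a lower priority task split into equal subjobs (sketch).

    wcet:        computation time of the task without preemption points
    num_subjobs: number of non-preemptive subjobs
    pp_cost:     assumed constant overhead of one preemption point

    Returns (blocking, overhead):
      blocking - worst-case time a higher priority task may have to wait for
                 the running subjob to reach its preemption point
      overhead - total extra execution time added by the preemption points
    """
    blocking = wcet / num_subjobs
    overhead = (num_subjobs - 1) * pp_cost
    return blocking, overhead


for k in (1, 2, 5, 10, 50):
    b, o = granularity_tradeoff(wcet=20.0, num_subjobs=k, pp_cost=0.1)
    print(f"subjobs={k:3d}  blocking={b:6.2f}  overhead={o:5.2f}")
```

Finer granularity shrinks the blocking of higher priority tasks, while the accumulated preemption point overhead grows linearly with the number of points.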
FPDS is also referred to as co-operative scheduling [4],
because some control over preemption is moved from the
scheduler towards the application. The real-time problem shifts
from scheduling towards finding the right granularity of
preemption points.
D. Optional preemption points
The additional cost incurred by each preemption point may
force an overly coarse granularity of preemption points,
and thus discourage the use of FPDS in certain applications.
We propose the concept of optional preemption
points, which are easy to implement and incur little overhead
compared to traditional preemption points.
The optional preemption points aim at reducing the over-
head associated with traditional preemption points through
a tighter co-operation between the scheduler and the tasks.
Traditionally, when a job arrives at a preemption point, it
does a system call to the scheduler indicating it is ready to be
preempted. Such a system call has two drawbacks:
1) The system call itself is expensive, especially when the
preempted job is running in a different memory space
than the scheduler.
2) The preempted job always has to assume that it will be
preempted. If the job knew that it would not be preempted
at the next preemption point, it could choose a different
execution path, e.g. initiate a long memory transfer from
the main memory to the shared local memory, which it
would not do if it were to be preempted.
[7] reduce the system call overhead of point 1 by having
the kernel set a flag in a shared structure (we will refer to it as
the preemption flag) when a task with a higher priority than
the running task arrives. The running task checks during its next
preemption point whether it needs to be preempted by reading
the shared flag.
[13] presents a hardware-supported solution to avoid the
system call of point 1. When a new task arrives, the scheduler
places it in the ready queue and loads dedicated registers with
the address of the next preemption point. A special hardware
component compares the addresses of the subsequently executed
instructions with the address of the next preemption point.
When the addresses match, the corresponding switch
routine is executed.
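The shared-flag mechanism of [7] can be sketched as a single-threaded Python simulation. The structure layout, the function names and the convention that a larger number means a higher priority are assumptions made for illustration only; the point is that a preemption point becomes a plain memory read, and the task traps into the kernel only when it actually has to yield.

```python
from dataclasses import dataclass


@dataclass
class SharedSchedState:
    """Memory shared by the kernel and the task (hypothetical layout)."""
    preempt_pending: bool = False


def kernel_on_arrival(state, arriving_prio, running_prio):
    # Called from the arrival interrupt; a larger number = higher priority.
    if arriving_prio > running_prio:
        state.preempt_pending = True


def run_task(num_subjobs, arrivals, running_prio=1):
    """Run a task of `num_subjobs` subjobs with a preemption point after each.

    `arrivals` maps a subjob index to the priority of a task arriving while
    that subjob executes. Returns (index of the preemption point at which the
    task yields, or None; number of system calls made).
    """
    state = SharedSchedState()
    syscalls = 0
    for i in range(num_subjobs):
        # ... subjob i executes non-preemptively here ...
        if i in arrivals:
            kernel_on_arrival(state, arrivals[i], running_prio)
        # Preemption point i: reading the flag replaces the system call.
        if state.preempt_pending:
            syscalls += 1  # only now trap into the kernel to be preempted
            return i, syscalls
    return None, syscalls


print(run_task(5, {2: 3}))  # (2, 1): higher priority arrival during subjob 2
print(run_task(5, {1: 0}))  # (None, 0): lower priority arrival, no system call
```

With traditional preemption points, the task would instead make a system call at every preemption point it reaches.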
E. Outline of this paper
We would like to extend FPDS to multiple resources and
apply it in reservation-based scheduling, focusing on refining
the granularity of preemption points to exploit the available
network bandwidth.
In this paper we would like to qualify the extent of improvement
of FPDS over FPPS. In Section II we qualify the context
switch overhead when multiple resources have to be switched,
and see how this cost influences the granularity of preemption
points. In Section III we qualify the cost of a preemption point
itself, as opposed to regular trap/interrupt mechanisms. We
conclude with future directions in Section IV.
II. COST QUALIFICATION OF MULTIPLE RESOURCES
SWITCHING UNDER FPPS AND FPDS
The main idea behind FPDS is that the cost of context
switches can be reduced relative to FPPS, if tasks are pre-
empted at convenient times for the system, referred to as
preemption points. For example, it is inefficient to preempt
a task just after its program code and data were loaded into
the cache, because they are likely to be flushed from
the cache before the preempted task resumes.
The literature on FPDS considers a single preemptable
resource (the processor) and focuses on how FPDS can help
reduce the cost of switching the processor [4], [7]. We would
like to extend FPDS to multiple resources, taking into account
the different switching costs of different resources. Depending
on the implementation and the underlying architecture, some
of the following factors may contribute to the context switch
overhead:
1) Registers: During a context switch the register file is
stored (in local or main memory), sometimes with hardware
support. Traditionally all registers are stored and reloaded. [13]
show how, at runtime, the preemption is deferred to the next
preemption point, where the kernel invokes a custom context
switch routine (one per preemption point), which saves and
restores only the affected registers of the preempting and
preempted task. These routines are generated at compile time and
execute at the task's privilege level, to ensure memory protection.
2) Cache: If the program for the preempting task is not
in the cache, it is first loaded from the main memory. Also
any cached data (especially in data intensive multimedia
applications) is likely to be flushed during the switch or the
execution of the new task.
3) Local memory: Similarly, if the preempting and the
preempted task share the same locations in the local
memory, the contents of the local memory are stored to and
reloaded from the main memory.
4) Main memory: Usually, memory performs best when
the addresses are accessed in a consecutive manner. If the
main memory needs to be accessed by the scheduler to load
its program code or data during a preemption point, then the
sequence of memory accesses in the preempted task can be
interrupted. In our surveillance application example in Section
I-A, the memory access is a bottleneck.
5) DMA: In systems supporting virtual memory, memory
locations are grouped into blocks and blocks are grouped
into pages, with virtual pages being mapped to physical pages.
When DMA is also supported, there is a question of which
addresses it should use: virtual or physical.
In case DMA transfers are set up using virtual addresses,
the processor has to provide the mapping from virtual to
physical addresses for the affected memory locations. In case
of physical addresses, a DMA transfer which spans across
pages will have to be chopped up into smaller pieces, because
the physical pages may not be contiguous. In both cases, there
is a setup overhead for a DMA transfer. Also, the operating
system must not remap the pages involved in the transfer [10].
6) Deep pipelines: Switching a pipelined stream of opera-
tions on a VLIW like processor may incur large overhead if
the instruction pipeline has to be flushed. In our surveillance
example, the overhead can be particularly large when a pipeline
of video-stream processing operations of τv, offloaded to the
co-processor, is interrupted by the packet encoding operations
of τn.
7) Network: There are two approaches to switching the
network: wait for the pending packet(s) to be sent, or interrupt
the current transmission. In the first case, the context switch
overhead will include the time necessary to complete the
transfer of the pending packets. In the second case, the
switching overhead is moved to retransmitting the aborted
packets at the time the preempted task resumes.
The network resource behaves similarly to a hard disk, as it
has a large latency relative to the processor. Just like the disk
can be unavailable due to the seek time, the network availability
can fluctuate due to congestion in the network. However,
since network congestion is much less predictable than seek time,
so is the availability of the network resource.
The surveillance system currently employs a busy-waiting
approach, with the network task (once scheduled) waiting for
the network to accept all the packets before it releases the
processor to the video task.
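For the DMA factor (item 5), the chopping of a transfer that uses physical addresses can be sketched in Python as follows. The 4096-byte page size and the dictionary standing in for the page table are assumptions; real DMA engines and operating systems expose this through their own descriptor formats and kernel interfaces.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def split_dma_transfer(vaddr, length, page_table, page_size=PAGE_SIZE):
    """Turn a virtually contiguous buffer into physically addressed DMA chunks.

    page_table maps virtual page numbers to physical page numbers (a stand-in
    for the OS page table). Consecutive virtual pages need not be physically
    contiguous, so the transfer is cut at every page boundary and each piece
    becomes a separate DMA request. The OS must keep these pages mapped until
    the transfer completes.
    """
    chunks = []
    addr, remaining = vaddr, length
    while remaining > 0:
        vpn, offset = divmod(addr, page_size)
        size = min(remaining, page_size - offset)  # stop at the page boundary
        ppn = page_table[vpn]                      # KeyError if not mapped
        chunks.append((ppn * page_size + offset, size))
        addr += size
        remaining -= size
    return chunks


# A 200-byte buffer starting 96 bytes before the end of virtual page 0.
print(split_dma_transfer(4000, 200, {0: 7, 1: 3}))
# [(32672, 96), (12288, 104)]
```

Each extra chunk is an extra DMA setup, which is part of the setup overhead mentioned above.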
Under FPPS, depending on the architecture, all of these
overheads may be incurred upon a context switch. Traditional
FPDS can be used to reduce one particular overhead. For
example, if saving and reloading the data in the local memory
is the main bottleneck on a particular architecture, then
FPDS can be used to prevent arbitrary preemptions within
instruction sequences which operate on a shared location in
the local memory.
In our surveillance example the two bottlenecks are the main
memory and network access.
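The two ways of switching the network resource (item 7) can be compared with a small Python cost model. It assumes a constant link rate, that only the packet currently on the wire is aborted, and that the remaining packets have to be sent in either case; the function name and parameters are hypothetical.

```python
def network_switch_costs(packet_sizes, sent_of_first, bandwidth):
    """Compare the two ways of switching the network (simplified sketch).

    packet_sizes:  sizes in bytes of the pending packets; the first one is
                   currently being transmitted
    sent_of_first: bytes of the first packet already on the wire
    bandwidth:     assumed constant link rate in bytes per time unit

    Returns (wait_delay, abort_waste):
      wait_delay  - time until all pending packets have been sent, i.e. the
                    delay imposed on the preempting task
      abort_waste - transmission time lost when the in-flight packet is
                    aborted and must be retransmitted after the task resumes
    """
    wait_delay = (sum(packet_sizes) - sent_of_first) / bandwidth
    abort_waste = sent_of_first / bandwidth
    return wait_delay, abort_waste


print(network_switch_costs([1500, 1500], 500, 1000.0))  # (2.5, 0.5): mid-packet
print(network_switch_costs([1500, 1500], 0, 1000.0))    # (3.0, 0.0): at a packet boundary
```

At a packet boundary, switching wastes no transmission, which is why the preemption points of τn are placed there.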
III. COST QUALIFICATION OF A PREEMPTION POINT
A preemption point in itself incurs overhead, which can be
modeled as shown in Figure 2.
Fig. 2. Breakdown of a preemption point, when (a) a higher priority task is
ready, (b) no higher priority task is ready
1) While τ2 is executing, task τ1 arrives. A corresponding
interrupt is dispatched (timer or external, depending on
task τ1) and handled by the kernel. The scheduler in
the kernel places τ1 in the ready queue and stores in a
register that a higher priority task has arrived. The kernel
returns and τ2 is allowed to continue.
2) When τ2 reaches a preemption point, it makes a system
call to the kernel. Here the task can execute code specific
to the preemption point, e.g. register-specific context
switching functions at the kernel for saving and reloading only
the affected registers [13], or make a system call to
lock any mutually exclusive resources in case it will
be preempted [7].
3) The kernel checks whether a higher priority task is
pending.
4) The kernel stores the state of τ2. This can involve
switching any of the resources listed in Section II.
5) The kernel loads the state of τ1. This can involve
switching any of the resources listed in Section II.
6) After task τ1 executes, the kernel saves its state, in case
τ1 needs some of its state for its next invocation.
7) The kernel loads the state of task τ2.
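The breakdown above can be turned into a simple accounting function, sketched below in Python. The cost names and values are hypothetical, and the same per-resource save and load costs are assumed for τ1 and τ2; the sketch only shows how case (b) pays steps 2-3, while case (a) additionally pays step 1 and the two save/load pairs of steps 4-7.

```python
def preemption_point_cost(costs, higher_priority_ready):
    """Total overhead attributed to one preemption point, following steps 1-7.

    costs: dict with hypothetical per-step costs:
      "arrival_irq"  - step 1, handling the arrival interrupt
      "syscall"      - step 2, entering the kernel at the preemption point
      "check"        - step 3, checking for a pending higher priority task
      "save", "load" - per-resource costs of saving/loading a task's state
                       (steps 4-7), e.g. {"registers": 1, "local_memory": 10}
    The same save/load costs are assumed for both tasks.
    """
    total = costs["syscall"] + costs["check"]        # steps 2-3: always paid
    if higher_priority_ready:
        save = sum(costs["save"].values())
        load = sum(costs["load"].values())
        total += costs["arrival_irq"]                # step 1
        total += 2 * (save + load)                   # steps 4-7
    return total


costs = {
    "arrival_irq": 2.0, "syscall": 1.0, "check": 0.5,
    "save": {"registers": 1.0, "local_memory": 10.0},
    "load": {"registers": 1.0, "local_memory": 10.0},
}
print(preemption_point_cost(costs, higher_priority_ready=False))  # 1.5, case (b)
print(preemption_point_cost(costs, higher_priority_ready=True))   # 47.5, case (a)
```

Listing the switched resources separately makes it easy to see which of the resources from Section II dominate the cost of case (a).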
IV. FUTURE DIRECTIONS
A. Quantify costs of preemption points and context switching
[7] quantifies the preemption point overhead for a processor.
We would like to quantify the preemption point and switching
costs in a multi-resource setting, based on our multimedia
streaming surveillance application, to gain insight into the
shortcomings of current solutions and serve as a reference
point for our future work.
B. Extract preemption points from code
In order for FPDS to be accepted by system designers,
it has to be easy to use. Rather than manually inserting
preemption points inside the code, we would like the compiler
to extract optimal preemption points, where the overheads are
minimized.
[13] presents a solution where the compiler identifies pre-
emption points (they refer to them as switch points) in the code
with a minimal number of general purpose live registers. We
would like to also include the switching overheads of other
preemptive resources, besides the processor.
C. Reduce preemption point cost
We would like to investigate how optional preemption points
can reduce the overhead of FPDS, compared to the traditional
preemption points.
[13] shows how to reduce the switching overhead by saving
and restoring only the affected registers. We would like to
extend this approach to other resources.
D. Account for race conditions due to optional preemption
points
In Section I-D point 2 we identified the potential of optional
preemption points allowing a task to check the preemption
flag before a preemption point and select an execution path
depending on whether it will be preempted or not during
its next preemption point. However, this gives rise to a race
condition.
Let tp be the next preemption point when the running task
τr can be preempted. For τr to adapt its execution path, it may
need to read the preemption flag at time tr < tp. The longer
the difference between tr and tp, the longer the time when
the flag cannot be changed by the scheduler, or the time when
τr will not take it into account. We would like to investigate
how to account for the race condition in the model of the
computation time of subjobs and the response time analysis.
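The race window can be made explicit with a small classification function, sketched in Python under a simple timing model: the task reads the flag at tr, commits to an execution path, and reaches the preemption point at tp. The function name and the three outcome labels are assumptions for illustration.

```python
def optional_pp_outcome(t_read, t_pp, arrival):
    """Classify an optional preemption point under a simple timing model.

    The running task reads the preemption flag at t_read < t_pp to pick its
    execution path; a higher priority task arriving at time `arrival` (None if
    there is no arrival) sets the flag.
      "prepared" - flag already set at t_read: the task takes the path suited
                   to being preempted at t_pp
      "race"     - arrival in (t_read, t_pp]: the flag changes after the task
                   has committed to a path that assumes no preemption
      "none"     - no arrival before t_pp
    """
    if not t_read < t_pp:
        raise ValueError("the flag must be read before the preemption point")
    if arrival is None or arrival > t_pp:
        return "none"
    if arrival <= t_read:
        return "prepared"
    return "race"


print(optional_pp_outcome(t_read=8, t_pp=10, arrival=9))  # race
print(optional_pp_outcome(t_read=8, t_pp=10, arrival=5))  # prepared
```

Arrivals in the interval (tr, tp] fall into the race window, whose length tp - tr is the quantity that would have to be accounted for in the response time analysis.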
E. Avoid preemptions when tasks share multiple resources
In systems which exploit multiple resources the cost of
preemptions is not straightforward to determine, as it may depend on
the resources currently used by the running task and those
requested by the preempting task, leading to different costs
at different preemption points. We would like to investigate
finding appropriate preemption points, which take into account
the switching overheads of multiple resources.
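One possible way to take multiple resources into account is sketched in Python below. It assumes, purely for illustration, that a preemption only has to switch the resources that are in use by the running task and also requested by the preempting task, each with a fixed cost; the resource names and cost values are hypothetical.

```python
def switch_cost(in_use, requested, resource_costs):
    """Cost of preempting at a point where the running task holds `in_use`
    and the preempting task requests `requested` (both sets of resource
    names). Assumption: only resources needed by both tasks are switched.
    """
    return sum(resource_costs[r] for r in in_use & requested)


def cheapest_preemption_point(points, requested, resource_costs):
    """Index of the candidate preemption point with the lowest switch cost.

    points: list where points[i] is the set of resources in use at point i.
    """
    return min(range(len(points)),
               key=lambda i: switch_cost(points[i], requested, resource_costs))


resource_costs = {"cache": 5, "local_memory": 20, "dma": 3, "network": 50}
points = [{"cache", "local_memory"}, {"cache"}, {"network", "dma"}]
print(cheapest_preemption_point(points, {"local_memory", "network"}, resource_costs))  # 1
```

Under a different cost model (e.g. one where every resource in use must always be saved), the same selection loop applies with a different `switch_cost`.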
ACKNOWLEDGMENTS
We would like to thank Rick Koeleman from VDG Security
for the insightful discussions about their platform architecture.
REFERENCES
[1] T. P. Baker, “Stack-based scheduling for realtime processes,” Real-Time
Syst., vol. 3, no. 1, pp. 67–99, 1991.
[2] R. J. Bril and P. J. L. Cuijpers, “Towards exploiting the preservation strategy
of deferrable servers,” to appear in Proceedings of the Real-Time and
Embedded Technology and Applications Symposium (RTAS 08), 2008.
[3] R. J. Bril, J. J. Lukkien, and W. F. J. Verhaegh, “Worst-case response
time analysis of real-time tasks under fixed-priority scheduling with
deferred preemption revisited,” in Proceedings of the 19th Euromicro
Conference on Real-Time Systems (ECRTS 07), pp. 269–279, 2007.
[4] A. Burns, “Preemptive priority based scheduling: An appropriate en-
gineering approach,” in Advances in Real-Time Systems, S. Son, Ed.
Prentice-Hall, pp. 225–248, 1994.
[5] A. Burns, M. Nicholson, K. Tindell, and N. Zhang, “Allocating and
scheduling hard real-time tasks on a parallel processing platform,”
University of York, UK, Tech. Rep. YCS-94-238, 1994.
[6] A. Burns, “Defining new non-preemptive dispatching and locking
policies for Ada,” Reliable Software Technologies, Ada-Europe 2001,
pp. 328–336, 2001.
[7] R. Gopalakrishnan and G. M. Parulkar, “Bringing real-time scheduling
theory and practice closer for multimedia computing,” SIGMETRICS
Perform. Eval. Rev., vol. 24, no. 1, pp. 1–12, 1996.
[8] C. L. Liu and J. W. Layland, “Scheduling algorithms for multiprogram-
ming in a hard-real-time environment,” J. ACM, vol. 20, no. 1, pp. 46–61,
1973.
[9] C. Mercer, R. Rajkumar, and J. Zelenka, “Temporal protection in real-
time operating systems,” in Proceedings of the 11th IEEE Workshop on
Real-Time Operating Systems and Software (RTOSS ’94), pp. 79–83,
1994.
[10] D. A. Patterson and J. Hennessy, Computer Organization and Design.
San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2004.
[11] D. Polock and D. Zöbel, “Conformance testing of priority inheritance
protocols,” in Proceedings of the Seventh International Conference on
Real-Time Systems and Applications (RTCSA’00). Washington, DC,
USA: IEEE Computer Society, p. 404, 2000.
[12] L. Sha, R. Rajkumar, and J. P. Lehoczky, “Priority inheritance proto-
cols: An approach to real-time synchronization,” IEEE Trans. Comput.,
vol. 39, no. 9, pp. 1175–1185, 1990.
[13] X. Zhou and P. Petrov, “Rapid and low-cost context-switch through em-
bedded processor customization for real-time and control applications,”
in DAC ’06: Proceedings of the 43rd annual conference on Design
automation, pp. 352–357, 2006.
[14] D. Zöbel, D. Polock, and A. van Arkel, “Testing for the conformance
of real-time protocols implemented by operating systems,” Electronic
Notes in Theoretical Computer Science, vol. 133, pp. 315–332, 2005.
Estimation of Self-Healing Timing 
Characteristics for Real-Time Systems 
under Transient Faults       
S. Frenkel 
Institute of Informatics Problems Russian Acad. of Sc. (RAS), Moscow, Russia, 
119333,  Vavilova 44, kor.2, Moscow, Russia 
Slf-ipiran@mtu-net.ru 
Abstract-Self-healing is a very important aspect of
highly reliable system design. Since real-time systems
have timing constraints, the timing characteristics of
self-healing should be analyzed during the design
process. The probability distribution function of the
time to self-healing is needed for a fair prediction of
real-time system reliability.
   This paper is an attempt to consider possible
ways of estimating the time to self-healing under
transient faults, using a coupling of Markov chains
model for the reliability analysis of real-time systems
with some fault-tolerant properties, modeled by the
well-known Finite State Machine (FSM) formalism.
We consider some possibilities of using these models
as a tool for the design of highly reliable systems.
Keywords: fault-tolerant computer, self-healing 
I. INTRODUCTION
Designers of nanotechnology-based real-time
systems for safety-critical applications need to take
into account that these systems are affected by radiation-induced
faults, first of all Single Event Upsets (SEUs) [1],
which can alter the values stored in their latches.
Thus, computing systems for safety-critical
applications must be fault-tolerant to be able to
continue functioning properly despite these transient
failures of their hardware or software [1,2].
Self-healing is a very important
aspect of highly reliable system design. Self-healing
enables the system to continue functioning correctly
in the event of a failure, by detecting the errors and
recovering from them. For example, the concept of a
partially monotonic FSM, where the transitions are
computed by partially monotonic Boolean functions, is
used to provide self-healing properties [1,2,3]. In
particular, in a self-checking digital circuit
design, different properties of the logical functions
may provide self-healing properties of the circuit [3].
The architecture that supports the self-healing property of
the FSM is the well-known self-checking architecture [2],
which uses a self-checking checker on the outputs.
     Since, in general, both input data and faults appear
randomly, both the latency period and the time before the
system heals are random variables. A natural
mathematical model of these phenomena is a random
transition from one state to another. The semantics of
these transitions may be expressed in terms of finite
state machines. This formalism may be used for various
applications, from hardware circuits to program
implementations of distributed algorithms, where
interacting computation processes may be expressed as
FSMs [4].
Since, in general, the fault-tolerance capability of
real-time systems is measured by the probability of finishing
the task correctly within its deadline in the presence of
a fault, designers need to predict whether the time to
healing is within this deadline.
     This paper describes a probabilistic approach to the
analysis of self-healing properties of real-time
fault-tolerant systems under some transient faults. The
system model that allows us to compare correct
behavior with erroneous behavior is a coupling of Markov
chains [5]. The goal of this modeling is to estimate
the time to return to correct behavior after entering
an erroneous one. We will discuss the influence of
this time on the reliability of real-time systems.
Because of very limited space we will not consider
any specific mechanisms of these phenomena in this
paper. We will just explain how the previously
suggested models of the self-stabilization
phenomenon could be used in Markov models
of self-healing [6]. This is an attempt to consider
possible ways of such modeling, using a generalized
view on some probabilistic approaches to the analysis of
fault-tolerant computer systems based on the
self-healing paradigm.
II. SOME CONCEPTS OF FAULT-TOLERANT
DESIGN  
Let us outline some principal definitions of the 
fault-tolerant systems research area. 
A fault is a physical cause of incorrect behavior, e.g. a
defect in a memory cell. The most popular fault models in
the area of digital system testing are the so-called
stuck-at-zero fault (a variable x has the constant logical
value zero, designated as x≡0) and the stuck-at-one fault
(correspondingly, x≡1) [2].
The faults may be both permanent (that remain in 
existence indefinitely if no corrective action is taken), 
and transient ones (that appear and disappear quickly). 
An error is an undesired state or condition in a
component of a target system, understood as a
discrepancy between an observed or measured value or
condition and a specified, theoretically correct value or
condition. An error is a consequence of a fault. Faults
may or may not cause one or more errors. Errors
induce failures. A failure occurs when a system is no
longer able to satisfy its specification, e.g. an incorrect
word is formed on its output. Correspondingly, we
should distinguish between the manifestation latency of
faults and that of errors.
     The aim of a fault-tolerant design is to avoid the
manifestation of the fault/error at the output of the designed
system in order to prevent failure behaviour.
Note that, in accordance with [3], when a transient
fault occurs, a system may transit from a fault-free
behavior ("mode") either to the erroneous mode or to a
mode where the faulty behavior is "silent", that is,
an inner state is incorrect while the output stays
correct. If the system is able to return to the
fault-free mode after functioning in the silent mode,
this is, obviously, self-healing. In other words, in
this case the future of the system (whether it will "recover"
or "die") depends on its behavior during several clocks
after moving to the silent mode. This number of clocks
looks like a promising and motivating parameter for
the self-healing characterization.
     A transient fault induced by an SEU may or may
not be latched by a storage cell, but if the fault
occurs, the correct operation of the corrupted module
can be restored and the current state of the circuit can
be reset. This can be achieved, for example, either due to the
monotonic properties of the Boolean functions describing
the transitions of the FSM representing the given system
[3] or thanks to reconfiguration [1]. Recall that a
system of logical functions ψ is partially monotonic in
the x' variables if for any pair of Boolean k-tuples A, B
the condition ψ(A) ≤ ψ(B) is satisfied whenever A ≤ B [3].
Let us illustrate this property using an example from [3].
Table 1 represents a partially monotonic FSM in
cube form, where both the symbol "-" and the empty
positions in the 3-bit input vector X denote "don't care".
Variables as and am are state variables (as is the previous
state, am is the next one), and Y is the 7-bit output vector.
           Table 1. An example of self-healing of an FSM
                  
     Let the FSM start from the state 1000, and let the
next state 011, to which the FSM should transit under
input 111, be changed to 1100 due to a fault. As
we can see from this table, the output 1001010
remains unchanged; therefore, no alarm or
correction mechanism is started up, and the
FSM continues its normal work until the next clock. Let
the fault disappear by the next clock. If the next
input vector is also 111, then the transition provides
the state 0110 with the same output 1001010, and the
next input 101 returns the FSM to the normal state
0010 with the correct output 1001010, which means
“self-healing”.
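Since Table 1 is not reproduced here, the sketch below only illustrates the monotonicity condition itself on a toy function. It reads "partially monotonic in the x' variables" as monotonic (non-decreasing) in those variables while the other inputs are held fixed; this reading, the function name and the toy ψ are assumptions. Checking single-bit raises suffices, because any A ≤ B that differ only in the x' variables are connected by a chain of such raises.

```python
from itertools import product


def is_partially_monotonic(psi, k, var_idx):
    """Brute-force check that psi is monotonic (non-decreasing) in the
    variables listed in var_idx, with the remaining variables held fixed.

    psi maps a k-tuple of 0/1 values to a tuple of 0/1 outputs (a system of
    Boolean functions); outputs are compared componentwise.
    """
    for a in product((0, 1), repeat=k):
        for i in var_idx:
            if a[i] == 0:
                b = a[:i] + (1,) + a[i + 1:]  # A <= B, differing only in x_i
                if any(pa > pb for pa, pb in zip(psi(a), psi(b))):
                    return False
    return True


def psi(x):
    # Toy system psi(x0, x1) = (x0 AND NOT x1): monotonic in x0, not in x1.
    return (x[0] & (1 - x[1]),)


print(is_partially_monotonic(psi, 2, [0]))  # True
print(is_partially_monotonic(psi, 2, [1]))  # False
```

The check enumerates all 2^k inputs, so it is only practical for small FSM transition functions.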
Another example of a self-healing system, based
on reconfiguration of the system under errors
provoked by a fault, has been considered in
[1,14]. The functioning of this reconfiguration
architecture can also be described as an FSM [14].
Under these considerations, we will consider that a
self-healing system can recover from some transient
faults within a finite time, provided that no further
faults occur before the system has healed. On the
other hand, systems that are not self-healing might
function in incorrect states forever, even if no
further faults occur.
III. FSM-BASED FAULT MANIFESTATION 
MODELS 
As mentioned above, in order to define the
time of an observable event after a fault manifestation,
we need to compare the behavior of both the fault-free and
the faulty FSMs. The finite state machine (FSM) is a very useful
and popular model of the behavior of various computer
system components at rather high levels of
system design, and the FSM transitions under
independent random inputs form a Markov chain.
Let the FSM be a Mealy machine, with the state set
{a1,…,an}, the input set xt = {x1(t),…,xm(t)} and the
output set yt = {y1(t),…,yn(t)}. Functions δ and λ are
multiple-output Boolean functions which are relations
between the (input state, present state) pairs and the
next states (δ), and between the (input state, present
state) pairs and the output states (λ). Let the input
words of the FSM be a randomly generated test input
vector sequence. Obviously, for the example of the
FSM mentioned above (Table 1), the probability that the
self-healing property is fulfilled depends on the
distribution of input vectors, and the self-healing
properties are determined by the properties of this
distribution and by the transition functions of the FSM.
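How an FSM under independent random inputs yields a Markov chain can be sketched in Python: each input vector x, drawn with probability p(x), moves state s to δ(s, x), so the transition probabilities are sums of input probabilities. Only δ enters the chain; λ determines the outputs. The function names and the toy 3-state FSM are hypothetical.

```python
def fsm_transition_matrix(states, inputs, delta, input_prob):
    """Markov chain of an FSM driven by independent random inputs.

    P[i][j] is the probability that the FSM moves from states[i] to
    states[j] in one clock, where the input vector is drawn independently
    at each clock with probabilities input_prob[x].
    """
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    P = [[0.0] * n for _ in range(n)]
    for s in states:
        for x in inputs:
            P[index[s]][index[delta(s, x)]] += input_prob[x]
    return P


def delta(s, x):
    # Hypothetical 3-state FSM: input 1 resets to state 0, input 0 advances.
    return 0 if x == 1 else (s + 1) % 3


P = fsm_transition_matrix([0, 1, 2], [0, 1], delta, {0: 0.5, 1: 0.5})
print(P)  # [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [1.0, 0.0, 0.0]]
```

Changing the input distribution changes P, which is how the input statistics influence the self-healing properties.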
     In general, there are two possible models to
compare the behavior of the fault-free and faulty FSMs,
defined either on the product of the transition spaces of both
FSMs or on pairs of states of the Markov chains
corresponding to random transitions of both FSMs
[7,8], that is, a coupling of the corresponding Markov
chains with given initial state distributions.
  
IV. COUPLING OF MARKOV CHAINS AS A
MODEL OF FSM HEALING
In a narrower sense, coupling is a method used
for analyzing the rate of convergence to equilibrium in
Markov chain Monte Carlo experiments [9]. From the
point of view of fault-tolerant systems timing
characterization, it is important that this method deals
with estimating the time at which two faithful copies of a
stochastic process coalesce. In the practice of
Monte Carlo experiments, couplings are useful
because we can often compare distributions by
constructing a coupling and comparing the random
variables, which provides a very useful way
to get upper bounds on the distance between the
distributions of the experiment trajectories.
     Formally, a coupling of Markov chains is a process
(Xt, Yt) with the property that both (Xt) and (Yt) are
Markov chains with transition matrix P, but the two
chains may have different starting
distributions. This notion suits the
following model of FSM behavior in the presence of a
transient fault. Let us assume that the effect of a
transient fault (error) on the FSM behavior can be
modeled as a change of the initial state X0 of the
corresponding Markov chain to an initial state Y0, that
is, in the presence of the transient fault (acting, for
example, during one clock), the state transitions under
switching of input signals proceed as if the initial state
of the FSM were Y0. Let such a change lead to a change
of the trajectory of the FSM transitions (under some
random input vectors) from Xt to Yt (Figure 1). The reason
why this approach suits the problem of self-healing
time estimation is the fact that any coupling of
Markov chains can be modified so that the two chains
stay together at all times after their first simultaneous
visit to a single state, that is, so that if Xs = Ys then
Xt = Yt for t > s. It would be attractive to use the
framework of the coupling to define and estimate the
time between the instant corresponding to the beginning
of an erroneous behavior and the moment of healing. Thus,
in contrast with the classical works concerning
coupling of Markov chains [9], which explore ways of
constructing a coupling, we are interested in finding
and using the coupling properties of a given pair of
Markov chains describing fault-free and faulty FSMs,
rather than constructing a coupling.
 Figure 1. FSM healing relative to a transient error.
     Earlier, the coupling time notion has been used in
the fault-tolerant computing area [6] for deriving an
upper bound on the expected time of reaching the legal
configuration L ("hitting time") starting from a
"worst" configuration. More precisely, that paper deals
mostly with the problem of the convergence rate (the
time) for the chain to be ε-close to its stationary
distribution. We will explain how the recently suggested
coupling models of the self-stabilization
phenomenon could be used for the analysis of
digital/computing system self-healing in the presence of
transient faults, when the system is modeled as an FSM
under random inputs.
Using the coupling notion, we may define the time of
healing as follows. Given a coupling (Xt, Yt), the
(expected) coupling time is

$$T = \max_{x \in S,\; y \in S} E(T_{x,y}), \qquad (1)$$

where $T_{x,y} = \min\{t : X_t = Y_t \mid X_0 = x, Y_0 = y\}$, S is the
state space of the Markov chain corresponding to the given
FSM transition table, and E denotes the expected value.
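Definition (1) can be estimated by simulation for one particular coupling: driving the fault-free and the faulty copy of the FSM with the same random input sequence. This choice of coupling, the function names and the toy FSM are assumptions for illustration. Because δ is deterministic, two copies that meet stay together afterwards, which is the property used above.

```python
import random


def coupling_time(delta, x0, y0, inputs, weights, rng, max_steps=10_000):
    """Clocks until the fault-free copy (from x0) and the faulty copy (from
    y0) first occupy the same state, both driven by the SAME random inputs.
    Returns None if they have not met after max_steps clocks."""
    x, y = x0, y0
    for t in range(max_steps + 1):
        if x == y:
            return t
        inp = rng.choices(inputs, weights=weights)[0]
        x, y = delta(x, inp), delta(y, inp)
    return None


def expected_coupling_time(delta, states, inputs, weights, trials=1000, seed=0):
    """Monte Carlo estimate of T = max over (x, y) of E(T_xy), as in (1)."""
    rng = random.Random(seed)
    worst = 0.0
    for x0 in states:
        for y0 in states:
            samples = [coupling_time(delta, x0, y0, inputs, weights, rng)
                       for _ in range(trials)]
            if None in samples:
                return float("inf")  # some pair never met within max_steps
            worst = max(worst, sum(samples) / trials)
    return worst


def delta(s, x):
    # Hypothetical FSM: input 1 resets to state 0, input 0 advances the state.
    return 0 if x == 1 else (s + 1) % 3


# Roughly 2: for distinct states, each clock resets both copies with prob. 0.5.
print(expected_coupling_time(delta, [0, 1, 2], [0, 1], [0.5, 0.5]))
```

Other couplings of the same pair of chains can have different coupling times; the same-input coupling is simply the natural one for two copies of one FSM.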
       Let the initial distribution of the Markov chain for the
fault-free FSM be $P_{X_0}$, whereas $P_{Y_0}$ corresponds to the
faulty FSM, and let P be the transition matrix of the Markov
chain modeling the FSM under the corresponding random input
sequence. Then the coupling inequality [5] states that the
distance between the state distributions reached after k steps
by the chain started in the initial state Y0 and by the chain
started in the initial state X0 is bounded by the probability
that the two chains have not yet met:

$$\sup_{A \subseteq S} \left| P_{X_0} P^k(A) - P_{Y_0} P^k(A) \right| \le \Pr(T_{X_0,Y_0} > k), \qquad (2)$$

where $T_{X_0,Y_0}$ is the coupling time of the two chains.
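Correspondingly, the probability that the self-healing
property is fulfilled within k steps is bounded by

$$\Pr(T_{X_0,Y_0} \le k) \le 1 - \sup_{A \subseteq S} \left| P_{X_0} P^k(A) - P_{Y_0} P^k(A) \right|. \qquad (3)$$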
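The left-hand side of (2), and hence the bound in (3), can be computed exactly from P by propagating both initial distributions for k steps. The Python sketch below uses the fact that, on a finite state space, the supremum over sets A equals half the L1 distance between the two distributions (the total variation distance). The function names and the 3-state transition matrix are hypothetical.

```python
def step(dist, P):
    """One clock of the chain: row vector dist times transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def tv_distance(p, q):
    # On a finite state space, sup over sets A of |p(A) - q(A)| is half
    # the L1 distance between p and q.
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def self_healing_bounds(P, p_x0, p_y0, k):
    """Return (d_k, 1 - d_k): d_k is the left-hand side of (2), so
    Pr(T > k) >= d_k and, by (3), Pr(T <= k) <= 1 - d_k."""
    p, q = p_x0, p_y0
    for _ in range(k):
        p, q = step(p, P), step(q, P)
    d = tv_distance(p, q)
    return d, 1.0 - d


# Hypothetical 3-state chain: from every state, with probability 0.5 the
# input resets it to state 0; otherwise it advances to the next state.
P = [[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [1.0, 0.0, 0.0]]
for k in range(4):
    print(k, self_healing_bounds(P, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], k))
```

Unlike the Monte Carlo estimate of a specific coupling, this quantity depends only on P and the two initial distributions, so it bounds every coupling of the fault-free and faulty chains.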
If R(t, F) is the probability that the system has
not failed in the continuous time interval [0, t], t > 0, in
the presence of faults F (e.g., SEUs), that is, the
reliability of the target system, then the influence of
self-healing on the system reliability can be represented
as:

$$R(t,F) = P_0^F(t) + P_h^F(t),$$

where $P_0^F(t)$ is the probability that the system
functions on this interval without failures, either because
the effect of F is masked or because F is absent
during this interval, and $P_h^F(t)$ is the probability that the
error has healed before the end of the interval. If for a
real-time system we take this interval to be the deadline
mentioned above, then designers would be interested in
estimating the gap between this interval and the healing
interval.
     Some examples of using these models for specific
applications will be shown in the presentation of this
paper.
V. CONCLUSION, DISCUSSION AND FUTURE  
WORK 
       In this paper we consider a way of analyzing real-time
system reliability with respect to transient
faults, taking into account some self-healing properties.
We analyze these phenomena in terms of a coupling of
Markov chains describing faulty and fault-free FSMs
under independent random binary input signals. The
dependability gain due to self-healing is proportional to the
probability that self-healing will prevent the failure [10].
Since we deal with FSM models, it is possible to use
these probabilities in the reliability analysis of digital and
computer systems at the FSM level of modeling, that
is, at rather early design stages. These models can serve as
the basis of a tool for fault-tolerant systems design, dealing
with such fault tolerance aspects as fault detection
latency (for permanent faults) [8] and self-healing in the
presence of some transient faults (Soft Upset Errors
[1,11], in particular). These models can be used in
analyzing the reliability of a target system in hierarchical
system design. There are already some preliminary results
of using this approach for self-healing probability
estimation in a self-checking design [3].
        At present we consider coupling with respect to only one of the possible convergence measures; it would be interesting to consider other measures as well [12]. It would also be useful to integrate into the reliability model some structural characteristics, such as the effect of delay on the SEU sensitivity of logic-circuit nodes [13]. Note that when we use any of the known approaches to reliability estimation, we often mark a system as erroneous even though it could recover successfully after a number of clock cycles. This may lead to very pessimistic estimates and an unjustified growth of design cost. One very important area of research is the development of adequate Markov models that, in contrast to existing models, allow self-healing probabilistic characteristics to be expressed in terms of the characteristics of the components of an FSM composition (network). Such a model should allow the designer to calculate the transition probability matrix for transitions due to a single step of the FSM, calculate the transition probability matrix for transitions due to fault steps, and then obtain a transition probability matrix representing the Markov chain of the product of the fault-free and faulty FSMs. For example, suppose a designer deals with a network of FSMs interacting under random inputs, operating such that only one pair of FSMs interacts at any time. One of them is "driving" and the other is "driven", and the status of each FSM (from "driving" to "driven" and vice versa) may change at random moments. In general, the process x(t), defined over a set of pairs $\{(i_l, j_m)\}$, where $i_l$ and $j_m$ are the states of components l and m when they interact as driving and driven units under random inputs, cannot be described as a Markov chain. To describe such a system by a Markov chain, its state vector can be extended by a coordinate $n_{l,m}$, the random number of clock periods elapsed since the last change of the status of the given pair of components. The time between the instant when a fault corrupting the system behavior occurs and the moment the fault manifests itself at the system outputs can be represented as the time to hit an absorbing state of the Markov chain, corresponding to the moment when the fault effect becomes observable. This time can characterize both fault detection latency and self-healing ability. Analogously, the ability to express the coupling inequality mentioned above for the time to healing in the presence of transient faults, for the case of a number of interacting FSMs (e.g., as described in Section 4, say by condition (2)), would be very helpful for designers of fault-tolerant computer systems. We consider these problems a very important area of future research.
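The "time to hit an absorbing state" described above can be tabulated with the following Python sketch. It computes Pr(T ≤ k), the probability that the chain has reached the absorbing state within k steps. In this reading, Pr(T ≤ k) is the distribution of the fault's manifestation latency. The three-state matrix is a hypothetical example: states 0 and 1 mean the fault is latent, and state 2 means its effect is observable. The function name `hit_probability` is ours, and the sketch assumes the target state really is absorbing (P[a][a] = 1).

```python
from fractions import Fraction

def hit_probability(P, start, absorbing, k):
    """Pr(T <= k), where T is the first time the chain started in `start`
    reaches `absorbing`. Requires P[absorbing][absorbing] == 1, so the
    mass accumulated in the absorbing state never leaves it."""
    n = len(P)
    dist = [Fraction(int(s == start)) for s in range(n)]
    for _ in range(k):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist[absorbing]

# Hypothetical chain: states 0 and 1 are transient (fault latent),
# state 2 is absorbing (fault effect observable at the outputs).
P = [[Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
     [Fraction(0), Fraction(1, 2), Fraction(1, 2)],
     [Fraction(0), Fraction(0), Fraction(1)]]
print(hit_probability(P, 0, 2, 2))  # 1/2
```

Iterating the distribution avoids matrix inversion. If only the mean latency were needed, the fundamental matrix (I − Q)⁻¹ of the transient block would give it directly.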
References 
 [1]  M. G. Gericota et al., “A self-healing real-time 
system based on run-time self-reconfiguration”, Proc. 
of the 10th IEEE Conf. on Emerging Technologies and 
Factory Automation, pp. 19-22, 2005. 
[2] P. Lala, Self-checking and Fault-Tolerant Digital 
Design, Morgan Kaufmann Publishers, 2000.  
[3] I. Levin, A. Matrosova, S. Ostanin, Survivable Self-checking Sequential Circuits, Proceedings of DFT’01, p. 395, 2001.
[4] A. Dhama, O. Theel, and T. Warns, Reliability and Availability Analysis of Self-Stabilizing Systems, LNCS vol. 2280, Stabilization, Safety, and Security of Distributed Systems, pp. 244-261, 2006.
[5] J. S. Rosenthal, Faithful Couplings of Markov Chains: Now Equals Forever, Advances in Applied Mathematics, vol. 18, pp. 372-381, 1997.
[6] L. Fribourg, S. Messika, C. Picaronny, Coupling and self-stabilization, Distrib. Comput., DOI 10.1007/s00446-005-0142-7, Special issue: DISC 04, 2005.
[7] J. Shedletsky, E. McCluskey, The Error Latency of Fault in a Sequential Digital Circuit, IEEE Transactions on Computers, vol. 25, no. 6, pp. 655-659, 1976.
[8] S. Frenkel, A. Pechinkin, V. Chaplygin, I. Levin, A Mathematical Tool for Support of Fault-Tolerant Embedded Systems Design, ERCIM/DECOS Dependable Smart Systems: Research, Industrial Applications, Standardization, Certification and Education. Workshop on "Dependable Embedded Systems", Lübeck, Germany, 2007.
[9] A. Sinclair, Convergence rates for Monte Carlo experiments, In Numerical Methods for Polymeric Systems, IMA Volumes in Mathematics & Its Applications, pp. 1–18, 1997.
[10] M. Hawthorne, D. Perry, Architectural Styles for Adaptable Self-Healing Dependable Systems, Proceedings of ICSE’05, St. Louis, Missouri, USA, May 15–21, 2005.
[11] R. Baumann, Soft Errors in Advanced Computer Systems, IEEE Design and Test, May-June, pp. 258-266, 2005.
[12] M. Huber, Exact sampling and approximate counting techniques, Proceedings of the 30th Annual ACM Symp. on Theory of Computing (STOC’98), pp. 31–40, 1998.
[13] R. Thara, S. Bhanja, A Stimulus-free Probabilistic Model for Single-Event-Upset Sensitivity, 12th NASA Symposium on VLSI Design, Coeur d’Alene, Idaho, USA, Oct. 4-5, 2005.
[14] P.K. Lala and B. K. Kumar, An Architecture for 
Self-Healing Digital Systems, Journal of Electronic 
Testing: Theory and Applications, 19, pp. 523–535, 
2003. 
Proportional Cache-fair Scheduling for Multi-core Systems
Apicha Suksompong and Damir Isovic
Mälardalen University, Sweden
Abstract
In this paper we present a cache-aware real-time scheduling algorithm for multi-core platforms. It uses the standard timing constraints of the tasks together with hardware counter statistics to yield unique task priorities. Scheduling according to the timing-constraint criterion enforces timely completion of the tasks. Cache-fair thread scheduling ensures that tasks to which shared caches are unequally allocated are compensated by being assigned more or less processor time.
1 Introduction
Multi-core architectures, in which several processor cores are integrated on a single chip, are becoming mainstream as a way to achieve higher processor performance. On such platforms, signals between processing units travel only short distances and do not have to go off-chip, because multiple processor cores are combined on a single die. The cores also share some electronic components and require relatively little power. Hence, a multi-core processor can fit into a small silicon package.
Despite their high performance, multi-core processors require existing software to be adapted before it can fully exploit the resources in the system. Conventional schedulers do not take the characteristics of multi-core processors into account. Resources shared among processor cores, e.g., caches, are often occupied unequally, depending on the demands of the running threads. As a consequence, the performance of a thread depends on the co-runner threads that execute concurrently on cores sharing the same cache.
When scheduling tasks on multi-cores, we believe that both the nature of the architecture and run-time information from the hardware should be considered jointly. In this paper, we present a novel scheduling method that extends the state-of-the-art methodology for MPSoC, in the sense that the scheduling software requires detailed information from the hardware layer, such as run-time statistics. The proposed scheduler allocates shared system resources fairly to co-scheduled tasks within applications. More specifically, our method assigns tasks to cores based on the ratio between the remaining execution time and the remaining time to the deadline of each task, together with its fair cache miss rate. If the priority assignment between tasks cannot be resolved by comparing their timing attributes, the fair cache miss rate is used to yield unique priorities.
2 Related work, Motivation and Approach
Several algorithms for scheduling tasks on multiprocessors have been proposed; see, e.g., [8, 1, 6, 5, 3, 10, 11, 2]. Proportionate fair (Pfair) scheduling algorithms have been proven to optimally schedule periodic tasks on multiprocessors in polynomial time [4]. Pfair schedulers are well suited to real-time systems because they let tasks execute at constant rates. There are three variants of Pfair scheduling algorithms: PF, PD, and PD2 [4, 5, 3]. All of them use the same primary rule to prioritize subtasks, the earliest-pseudo-deadline-first (EPDF) policy, but they differ in the secondary rules used to break ties. After the eligible subtasks are ordered by priority, the scheduler selects as many of the highest-priority subtasks as there are idle processors and executes them in that slot. Prior research indicates that the preemption and migration costs incurred by Pfair scheduling algorithms are compensated by their improved schedulability.
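The EPDF rule shared by PF, PD, and PD2 can be sketched in Python as follows. The sketch uses the standard Pfair window of the k-th subtask of a task with weight wt = e/p: pseudo-release ⌊(k−1)/wt⌋ and pseudo-deadline ⌈k/wt⌉. Ties are broken by task index, which is neither the PD nor the PD2 tie-breaking rule. The task set and the function names are hypothetical.

```python
import math
from fractions import Fraction

def pseudo_window(e, p, k):
    """Pseudo-release and pseudo-deadline of the k-th subtask (k >= 1) of a
    Pfair task with weight wt = e/p: floor((k-1)/wt) and ceil(k/wt)."""
    wt = Fraction(e, p)
    return math.floor((k - 1) / wt), math.ceil(k / wt)

def epdf_schedule(tasks, m, horizon):
    """Slot-by-slot EPDF on m processors. tasks is a list of (e, p).
    Returns, for each slot, the sorted indices of the tasks that run."""
    done = [0] * len(tasks)  # subtasks completed so far, per task
    schedule = []
    for t in range(horizon):
        eligible = []
        for i, (e, p) in enumerate(tasks):
            release, deadline = pseudo_window(e, p, done[i] + 1)
            if release <= t:  # next subtask is eligible in this slot
                eligible.append((deadline, i))
        eligible.sort()  # earliest pseudo-deadline first; ties by task index
        chosen = [i for _, i in eligible[:m]]
        for i in chosen:
            done[i] += 1
        schedule.append(sorted(chosen))
    return schedule

print(epdf_schedule([(1, 2), (1, 2), (2, 3)], m=2, horizon=6))
# [[0, 1], [2], [0, 2], [1, 2], [0, 1], [2]]
```

Over the six slots, each task receives exactly e/p of the slots (3, 3, and 4). Slots 1 and 5 leave a processor idle: plain Pfair windows do not let a subtask run before its pseudo-release.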
A detailed study of fairness in cache sharing between threads in a chip multiprocessor architecture is presented in [9]. It proposes static and dynamic L2 cache partitioning algorithms that optimize fairness, and it studies the relationship between fairness and throughput in detail. Cache-fair thread scheduling, proposed in [8], addresses the problems caused by co-running threads; its prime objective is to adjust time quanta to compensate for unfairly shared caches.
An important difference between multi-core designs and other multiprocessor designs is the presence of multiple levels of shared caches. Migrating a scheduled task to another processor core is relatively cheap, because the context it needs is often already present in the shared caches. On the other hand, tasks compete for the shared caches, and cache thrashing may occur. For these reasons, scheduling approaches tailored to multi-core processors are desirable, since existing approaches provide only partial solutions. Some of them place tasks on processors so as to meet timing constraints but overlook the actual amount of work each task completes.
In real-time systems on multi-core processors, we need scheduling algorithms that not only meet task deadlines efficiently but also let tasks make even progress when they are allocated equal processor time, so as to benefit from the shared resources in the system. The scheduling policy that we propose requires detailed information from the hardware layer, such as run-time statistics, in order to make good scheduling decisions. Furthermore, the scheduler has to allocate shared system resources fairly to co-scheduled tasks within applications.
3 New Scheduling Algorithm
We propose a novel real-time scheduling algorithm for
multi-core systems that takes into account task deadlines
and execution time together with hardware counter statistics
and shared caches when making scheduling decisions. Our
method primarily uses the timing constraints of the tasks to
decide their priorities, but if this cannot yield unique priori-
ties, it uses the cache-fair miss rate as a second-level priority
assignment policy.
3.1 Fair L2 cache miss rate
A fair L2 cache miss rate is the miss rate a thread would experience if the shared caches were allocated equally among co-runners. The standard unit of a fair miss rate is misses per cycle (MPC). It has been shown in [8] that co-runners obtain roughly equal shares of the caches if they have similar L2 cache miss rates (hereafter simply miss rates). Hence, when co-runners have the same miss rate, they share the caches equally, and that common value is their fair miss rate. To estimate the fair miss rate of a specific thread Ti, we run Ti on a dual-core processor together with different co-runners and measure their miss rates. We then use the relationship between the measured miss rates to estimate Ti’s fair miss rate.
The relationship between co-runners’ miss rates is a linear equation [8, 7], given by:

$$\mathit{MissRate}(T_i) = a \times \sum_{i=1}^{n} \mathit{MissRate}(C_i) + b \qquad (1)$$

where n represents the number of co-runners, Ci represents the i-th co-runner, and a and b are the coefficients of the linear equation. We use the equation above to derive the miss rate of task Ti.
When all co-runners have the same miss rate, that rate is the fair miss rate, i.e., the condition is:

$$\mathit{FairMissRate}(T_i) = \mathit{MissRate}(T_i) = \mathit{MissRate}(C_i)$$

Hence, equation 1 becomes:

$$\mathit{FairMissRate}(T_i) = a \times n \times \mathit{FairMissRate}(T_i) + b$$

Solving this equation for Ti’s fair miss rate gives:

$$\mathit{FairMissRate}(T_i) = \frac{b}{1 - a \times n} \qquad (2)$$
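Equations 1 and 2 can be applied end to end with the following Python sketch. It fits a and b by ordinary least squares, which is our assumption since the paper does not name the fitting method, and then evaluates the closed form of equation 2. The five data points are hypothetical (in misses per 10 000 cycles), and n = 1 because each measurement is taken on a dual-core processor with a single co-runner. The function names are ours.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a * xs + b (equation 1, with
    xs[k] = sum of the co-runners' miss rates in data point k)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def fair_miss_rate(a, b, n_corunners):
    """Equation (2): FairMissRate(Ti) = b / (1 - a * n)."""
    denom = 1 - a * n_corunners
    if denom <= 0:
        # Sketch-level sanity check (our assumption): no meaningful fixed point.
        raise ValueError("1 - a*n must be positive")
    return b / denom

# Five hypothetical data points (misses per 10 000 cycles), one co-runner.
xs = [Fraction(v) for v in (2, 4, 6, 8, 10)]
ys = [Fraction(1, 4) * x + 3 for x in xs]
a, b = fit_line(xs, ys)
print(a, b, fair_miss_rate(a, b, n_corunners=1))  # 1/4 3 4
```

The result is a fixed point of equation 1: with a co-runner also at 4, the thread's predicted miss rate is 1/4 × 4 + 3 = 4.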
3.2 Priority assignment
We propose a two-level priority assignment based on the timing constraints and the fair miss rates of the tasks.
Primary assignment - Let (ei, di, pi) denote the execution requirement, the relative deadline, and the period of a periodic task Ti. Since deadlines and execution times are the decisive factors in scheduling decisions, tasks with short deadlines and long execution times should obtain high priorities. Consequently, in our method the priority of a task Ti, denoted prio(Ti), is proportional to the remaining execution time, e′i ≤ ei, and inversely proportional to the remaining time until the deadline, d′i ≤ di, i.e.:

$$\mathit{prio}(T_i) \propto \frac{e'_i}{d'_i} \quad \text{or} \quad \mathit{prio}(T_i) = k \, \frac{e'_i}{d'_i} \qquad (3)$$

where k is a positive proportionality constant.
The value of e′i is obtained by subtracting the execution time that the current job has received so far from the execution requirement ei of the period. The value of d′i is obtained by subtracting the current time t from the absolute deadline Di of the current job. Both are shown in the following expressions, where S(Ti, u) is the schedule of task Ti, whose value equals 1 if Ti is scheduled in slot u and 0 otherwise:

$$e'_i = e_i - \sum_{u=u_0}^{t-1} S(T_i, u), \qquad u_0 = \left\lfloor \frac{t}{p_i} \right\rfloor p_i \qquad (4)$$

$$d'_i = D_i - t \qquad (5)$$
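Equations 3 to 5 are evaluated for a single task in the Python sketch below, with k = 1. It assumes that the absolute deadline of the current job is its release time u0 plus the relative deadline, and that t lies before that deadline. The slot history {0, 2, 4} for task A of Section 4 is an assumption; it is consistent with the value e′/d′ = 2/5 at time 5 stated there. The function names are hypothetical.

```python
from fractions import Fraction

def remaining_execution(e, p, executed_slots, t):
    """Equation (4): e' = e - sum_{u=u0}^{t-1} S(T, u), u0 = floor(t/p) * p.
    executed_slots holds the slots u in which the task was scheduled (S = 1)."""
    u0 = (t // p) * p
    return e - sum(1 for u in range(u0, t) if u in executed_slots)

def time_to_deadline(abs_deadline, t):
    """Equation (5): d' = D - t."""
    return abs_deadline - t

def primary_priority(e, p, rel_deadline, executed_slots, t, k=1):
    """Equation (3) with k = 1 by default. Assumes the current job was
    released at u0 and that t is before its absolute deadline u0 + d."""
    u0 = (t // p) * p
    e_rem = remaining_execution(e, p, executed_slots, t)
    d_rem = time_to_deadline(u0 + rel_deadline, t)
    return k * Fraction(e_rem, d_rem)

# Task A of Section 4 (e = 5, d = 10, p = 11), assumed to have run in slots 0, 2, 4.
print(primary_priority(5, 11, 10, {0, 2, 4}, t=5))  # 2/5
```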
Note that the values of the fraction in equation 3 calculated for different tasks can be equal. Consequently, we need a mechanism to assign unique priorities based on a second-level policy, which we describe next.
Secondary assignment - Scheduling decisions rely not only on the deadlines and execution times of the tasks but also on hardware counter statistics. To reduce cache contention, tasks that share a lot of cache with high-priority tasks at time t should obtain low priorities, so that they are not selected to execute at time t. Cache contention is reflected in the L2 cache miss rates of co-scheduled tasks.
The secondary algorithm is applied when the values of the fraction in equation 3 are equal. The fair miss rate model of cache-fair thread scheduling is leveraged to break ties: the lower the fair miss rate of a task, the higher its priority under the secondary algorithm. In other words, a task Ti has a higher priority than a task Tj at time t if one of the following conditions holds:
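$$\text{(C1)} \quad \frac{e'_i}{d'_i} > \frac{e'_j}{d'_j}$$

$$\text{(C2)} \quad \frac{e'_i}{d'_i} = \frac{e'_j}{d'_j} \;\wedge\; \mathit{FairMissRate}(T_i) < \mathit{FairMissRate}(T_j)$$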
The intuition behind condition (C1) is that the higher the value of e′i/d′i, the more critical task Ti is with respect to a deadline miss. For example, a value of 1 means that e′i equals d′i, so task Ti requires e′i units of processor time immediately and continuously. Hence, Ti will be assigned the highest priority.
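Conditions (C1) and (C2) can be written as a pairwise test or, equivalently, as a sort key, as in the Python sketch below. The function names are hypothetical. `Fraction` is used so that the equality test in (C2) is an exact rational comparison. The sample data are four of the tasks of Section 4 at time 0.

```python
from fractions import Fraction

def higher_priority(ratio_i, fmr_i, ratio_j, fmr_j):
    """True if task i outranks task j: (C1) larger e'/d', or
    (C2) equal e'/d' and a lower fair miss rate."""
    if ratio_i > ratio_j:                        # (C1)
        return True
    return ratio_i == ratio_j and fmr_i < fmr_j  # (C2)

def sort_key(ratio, fmr):
    """Equivalent key for sorted(): descending e'/d', then ascending fair miss rate."""
    return (-ratio, fmr)

# (name, e'/d', fair miss rate) for four tasks of Section 4 at time 0
ready = [("B", Fraction(1, 2), 16), ("A", Fraction(5, 10), 8),
         ("D", Fraction(7, 10), 20), ("E", Fraction(2, 3), 9)]
print([name for name, r, f in sorted(ready, key=lambda x: sort_key(x[1], x[2]))])
# ['D', 'E', 'A', 'B']
```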
Under the primary priority assignment algorithm, the priorities of the tasks change at every slot, depending on the current values of e′ and d′. Tasks may migrate at any time, and priorities are attached to each subdivided (quantum-length) part of a task. This policy is similar to quantum-based scheduling. It is also work-conserving: in every slot the highest-priority ready tasks are dispatched, so no core is left idle while a released task still has remaining execution.
The secondary algorithm defines a tie-breaking condition that gives certain tasks higher priorities when the primary algorithm assigns them the same priority. The tie-breaking is based on miss rates. However, the raw miss rates provided by hardware counters cannot be compared directly, even when they differ, because they depend on the competition between whichever tasks happened to co-run. They must therefore be normalized: instead of the raw values, we compare a derived quantity that depends on all tasks in the system, namely FairMissRate(Ti), which is calculated for each task Ti. A low FairMissRate(Ti) implies that Ti will not severely harm its co-runners: Ti and its co-runners will compete for, and obtain, shared cache more satisfactorily than with a task whose fair miss rate is high. This is why a low FairMissRate(Ti) is preferred under condition (C2) and deserves a high priority.
3.3 Algorithm description
Assume a multi-core system with m processor cores and n running tasks. At compile time, all the tasks are placed on the processor cores and temporarily scheduled by global EDF, or by any other available scheduling algorithm that allows the miss rates of each task to be measured. After the system has executed for twice the hyperperiod of the task set, i.e., 2h quanta (where h is the least common multiple of all task periods), the miss-rate statistics are recorded as one data point for the n tasks. The 2h quanta ensure that every task Ti consumes more than ei units of processor time. In other words, Ti passes through every state (executing, ready, and waiting), which makes its cache access pattern more stable. We then record further data points, without resetting the hardware counters, until five data points have been acquired; five miss-rate samples are believed to be adequate for the assessment. Each data point is plotted in n graphs showing the relationship between each task and its probable co-runners.
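The profiling schedule described above (a window of 2h slots, five samples, no counter reset) can be sketched in Python as follows. It assumes that the samples are taken at the end of consecutive 2h windows. The function name `profiling_plan` is hypothetical, and `math.lcm` with several arguments requires Python 3.9 or later.

```python
from math import lcm  # Python 3.9+

def profiling_plan(periods, data_points=5):
    """Hyperperiod h, profiling window 2h, and the slots at which the
    miss-rate counters are sampled (counters are not reset in between)."""
    h = lcm(*periods)
    window = 2 * h
    samples = [window * (k + 1) for k in range(data_points)]
    return h, window, samples

print(profiling_plan([11, 2, 7, 12, 5]))
# (4620, 9240, [9240, 18480, 27720, 36960, 46200])
```

For the periods of the example in Section 4, this reproduces the hyperperiod of 4620 and the recording interval of 9240 slots stated there.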
The next step is to estimate these relationships: the coefficients a and b of equation 1 are fitted to the data points, and the fair miss rate of each task is then obtained from equation 2. This process is completed before online scheduling starts.
At run time, tasks released at time t are stored in a ready queue. At each time slot, the ratio e′/d′ is calculated for every task in the queue to set priorities, and the highest-priority tasks under condition (C1) are chosen to execute, one per available core. When a job finishes, the associated task is removed from the queue.
If some tasks have the same ratio, they are prioritized under condition (C2), after which scheduling continues with the primary algorithm.
4 Example
Consider the task set described in the table of Figure 1. Besides the traditional timing attributes, the fair miss rate of each task is also given in the table. The hyperperiod equals 4620, so the miss-rate statistics are recorded every 9240 slots for five data points. The utilization of the system is U ≈ 2.2, which means that the system requires at least three processor cores. Figure 1 shows the beginning of a schedule of these tasks on three processor cores according to our method. The upward arrows denote the activation times of the tasks and the downward arrows denote their completion times. The values of e′/d′ are depicted next to each task in the figure. For example, at time 5 task A needs 2 more time units of execution and its deadline occurs in 5 time units, giving e′/d′ = 2/5.
At time 0, all five tasks are ready to execute, and three cores are available. Task D has the largest e′/d′ ratio, 7/10 = 0.7, hence it is assigned to a processor core first. Task E has the second largest ratio, 2/3 ≈ 0.67, and it is scheduled next. Tasks A and B have the same value of e′/d′, i.e., 0.5, and only one processor core is left, so we use the tie-breaking condition (C2) to determine which one is scheduled first. Task A has a lower fair miss rate than B, hence it is scheduled. Tasks B and C (the latter with the lowest ratio) must wait.

Task   c    D    p    Fair miss rate (misses per 10 000 cycles)
A      5   10   11     8
B      1    2    2    16
C      2    5    7    14
D      7   10   12    20
E      2    3    5     9
(c: execution requirement, D: relative deadline, p: period)

Figure 1. Example (table: task set; chart: beginning of the schedule of tasks A to E on three processor cores over time 0 to 10, with the e′/d′ value of each task annotated at each slot)
At time 1, task B has the largest ratio: its remaining execution time is 1 while it has only one tick left until its deadline, giving the ratio 1/1 = 1. The next task to be scheduled is task D, with the next largest ratio, 6/9 ≈ 0.67. Here too, two tasks have the same ratio: both C and E have the ratio 0.5. According to condition (C2), the last core is given to task E, since its fair miss rate (9) is lower than that of C (14). Likewise, condition (C2) is used to break ties at times 4, 5, and 6.
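The walkthrough above can be reproduced with a slot-by-slot Python sketch of the method on three cores, using the task set of Figure 1. Several details are our assumptions rather than the paper's. Jobs are released at multiples of their period. A job that reaches its deadline unfinished is simply dropped. Equal fair miss rates would fall back to ordering by task name. The sketch checks only slots 0 and 1, which the text describes explicitly.

```python
from fractions import Fraction

# Figure 1 task set: name -> (e, D, p, fair miss rate per 10 000 cycles)
TASKS = {
    "A": (5, 10, 11, 8),
    "B": (1, 2, 2, 16),
    "C": (2, 5, 7, 14),
    "D": (7, 10, 12, 20),
    "E": (2, 3, 5, 9),
}

def simulate(tasks, m, horizon):
    """Return the sorted names of the tasks run in each slot 0..horizon-1."""
    executed = {name: 0 for name in tasks}  # units received by current job
    slots = []
    for t in range(horizon):
        ready = []
        for name, (e, D, p, fmr) in tasks.items():
            if t % p == 0:
                executed[name] = 0            # a new job is released
            e_rem = e - executed[name]        # e', equation (4)
            d_rem = (t // p) * p + D - t      # d', equation (5)
            if e_rem > 0 and d_rem > 0:       # assumption: late jobs are dropped
                # (C1) larger e'/d' first, (C2) lower fair miss rate, then name
                ready.append((-Fraction(e_rem, d_rem), fmr, name))
        ready.sort()
        chosen = [name for _, _, name in ready[:m]]
        for name in chosen:
            executed[name] += 1
        slots.append(sorted(chosen))
    return slots

for t, running in enumerate(simulate(TASKS, m=3, horizon=2)):
    print(t, running)
# 0 ['A', 'D', 'E']
# 1 ['B', 'D', 'E']
```

Only a single ready list is rebuilt per slot, which matches the single-ready-queue structure of the algorithm description.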
5 Conclusions
In this paper we present a real-time scheduling algorithm for multi-core architectures that uses hardware counter statistics when making scheduling decisions. It assigns tasks to cores based on two criteria: the ratio between the remaining execution time and the remaining time to the deadline of each task, and its fair cache miss rate. If the priority assignment between tasks cannot be resolved by comparing their timing attributes, the fair cache miss rate is used to yield unique priorities.
Our approach has several advantages. Scheduling based on the ratio e′/d′ is a work-conserving algorithm that allocates processor time efficiently. It is also a quantum-based approach that schedules quantum-length subtasks and avoids the bin-packing problem. Furthermore, the implementation needs only a single ready queue, so no operations for merging queues or managing release queues are required. Finally, its secondary algorithm is intended to reduce cache contention, so the schedule is likely to increase the amount of work completed. The cost of estimating fair miss rates does not affect the system at run time, because the estimation happens at compile time.
We are currently working on the formal and experimental evaluation of the proposed algorithm, i.e., on the existence of feasibility conditions for the proposed scheduling algorithm and on experimental results assessing its impact on cache performance. We are also looking into the impact of the L1 cache.
References
[1] J. H. Anderson, J. M. Calandrino, and U. C. Devi. Real-time scheduling on multicore platforms. In Proceedings of the 12th IEEE RTAS ’06. Washington, DC, USA, 2006.
[2] J. H. Anderson, P. Holman, and A. Srinivasan. Fair scheduling of real-time tasks on multiprocessors. In Handbook of Scheduling: Algorithms, Models, and Performance Analysis. Chapman and Hall, Florida, 2004.
[3] J. H. Anderson and A. Srinivasan. Early-release fair scheduling. In Proc. of the 12th IEEE ECRTS ’00. Sweden, 2000.
[4] S. K. Baruah, N. K. Cohen, C. G. Plaxton, and D. A. Varvel. Proportionate progress: a notion of fairness in resource allocation. In Proc. of the 25th ACM Symposium on Theory of Computing. ACM Press, 1993.
[5] S. K. Baruah, J. E. Gehrke, and C. G. Plaxton. Fast scheduling of periodic tasks on multiple resources. Technical report, University of Texas at Austin, Austin, TX, USA, 1995.
[6] J. M. Calandrino, J. H. Anderson, and D. P. Baumberger. A hybrid real-time scheduling approach for large-scale multicore platforms. In Proc. of the 19th ECRTS. Italy, 2007.
[7] J. Carpenter, S. Funk, P. Holman, A. Srinivasan, J. H. Anderson, and S. Baruah. A categorization of real-time multiprocessor scheduling problems and algorithms. In Handbook of Scheduling: Algorithms, Models, and Performance Analysis. Chapman and Hall, Boca Raton, Florida, 2004.
[8] A. Fedorova, M. Seltzer, and M. D. Smith. Cache-fair thread scheduling for multicore processors. Technical report, Cambridge, MA, USA, October 2006.
[9] S. Kim, D. Chandra, and Y. Solihin. Fair cache sharing and
partitioning in a chip multiprocessor architecture. In Proc.
of the 13th PACT, 2004.
[10] S. Ramamurthy and M. Moir. Static-priority periodic
scheduling on multiprocessors. In Proc. of the 21st IEEE
RTSS. Orlando, USA, 2000.
[11] A. Srinivasan and J. H. Anderson. Optimal rate-based
scheduling on multiprocessors. In Proc. of the 34th ACM
Symposium on Theory of Computing, 2002.
Competitive Reward-Based Scheduling for Real-Time Tasks
Nathan Fisher and Daniel Grosu
Wayne State University
Department of Computer Science
Detroit, MI USA
{fishern, dgrosu}@cs.wayne.edu
Abstract—We introduce the notion of reward-based scheduling
of periodic tasks in a competitive environment. In traditional
reward-based scheduling, each task specifies both a mandatory
and optional execution requirement and an associated reward
function describing the value that the system obtains from
executing some portion of the task. The overall system goal is
to maximize the system welfare over all tasks (i.e., the sum of
each individual task’s reward) while ensuring that the mandatory
portions of all tasks complete by their deadlines. Competitive
reward-based scheduling allows each task to be owned by a
separate agent whose objective is to maximize its own reward. In
this setting, an agent may have an incentive to lie about its
execution requirement or reward function to obtain a larger share
of the processor execution. In this report, we give an example
that shows that the standard allocation strategy may result in a
suboptimal system welfare. We suggest a mechanism for resource
allocation in the competitive setting. We discuss open questions
and future research in competitive reward-based scheduling.
I. INTRODUCTION
Traditionally, the model for real-time systems has assumed
that system design, development, and execution occur in a
strictly-controlled environment and that the system designers
know and formally analyze the temporal constraints of all
system processes at design time. However, there has been a
perceptible recent shift in many domains away from tradi-
tional, strictly-controlled development environments to more
“open” systems. In an open system, the different subsystems
that share a processing platform may be developed and con-
trolled by independent, potentially “selfish”, competing agents
that seek to optimize their own objective functions (e.g.,
competing real-time applications co-executing on a shared host
machine in system virtualization/colocation centers). In these
settings, a system owner that controls the processing platform
must determine an efficient allocation of resources among the
competing subsystems. However, without an effective mecha-
nism for determining how to subdivide the system resources,
an efficient and fair allocation of shared resources among
competing subsystems will not be achieved.
As motivation, consider the setting of a shared compu-
tational grid or service-oriented environment. The cost of
acquiring and maintaining the hardware to support large-
scale computation represents a significant financial burden
for an organization. An organization may potentially decrease
this cost by employing the computational services of another
organization that has contributed resources to a shared pool
(e.g., a grid). For computation that requires a specified level of
service, a contract between the organization and the system-
hosting company specifies the quality-of-service (QoS). The
aforementioned scenario is competitive because other
agents may be vying for access to the same set of resources.
Recently, researchers have analyzed resource allocation of
grid-based and service-oriented computing environments in a
competitive setting [5], [1]. However, scant attention has been
paid to computation that requires real-time guarantees.
Other than the work of Porter [9], we are unaware of any work
that focuses on competitive sharing of computing resources
under real-time constraints.
In an effort to extend a model of competition to real-time
systems, we introduce the concept of competitive reward-
based scheduling. (Non-competitive) reward-based scheduling
was introduced [2] as a means of determining processor
allocation in an overloaded real-time system. Each task in
the system specifies both a mandatory and optional execution
requirement. The system must determine an allocation of
processing time such that each task entirely completes its
mandatory portion of execution and efficiently allocates the
remaining processing time to the optional execution of the
system’s tasks. To facilitate the system’s decision in allocating
processor time, each task also specifies a function that quan-
tifies the “reward” the system obtains from executing a given
portion of the task’s optional execution. This approach gen-
eralizes previous approaches for dealing with execution under
processor overload (e.g., the imprecise computation [7] and
IRIS (Increased Reward with Increased Service) [4] models).
In a competitive environment, each task may be owned
and controlled by a different agent. Standard game-theoretic
analysis generally assumes that each agent has an associated
utility function which describes the agent’s preferences (over all
possible outcomes) and that each agent behaves to maximize
its own utility, with respect to the behavior of all other
agents. In competitive reward-based scheduling, each agent
is responsible for reporting to the system its mandatory and
optional execution requirements and its reward (i.e., utility)
function. The system will then determine, based on these
reported values, the proportion of the processor allocated to
each agent’s task. However, if an agent believes they may
increase their reward, there may be an incentive to misreport
the execution requirements and reward function. In this re-
port, we give an example for two agents to show that the
reward-based allocation algorithm (based on one proposed by
Aydin [2]) may result in a suboptimal total system welfare
(the sum of the individual agents’ utilities). We propose
an algorithm based on techniques in algorithmic mechanism
design that we conjecture will induce each agent to truthfully
report its execution requirements and reward function. We also
briefly discuss open questions and ideas for future research in
competitive reward-based scheduling.
The organization of this paper is as follows. Section II
provides background on both reward-based scheduling and
algorithmic mechanism design and presents our model for
competitive reward-based scheduling. Section III briefly describes
prior research in algorithmic mechanism design. Section IV gives
an example showing that the optimal algorithms for non-competitive
reward-based scheduling are suboptimal for competitive
reward-based scheduling. Section V suggests
a mechanism that we conjecture induces truthful behavior by
the agents. Section VI concludes and gives ideas for future
research.
II. BASIC CONCEPTS
A. Background
Before formally introducing competitive reward-based
scheduling, we first review concepts from (non-competitive)
reward-based scheduling and algorithmic mechanism design.
§Reward-Based Scheduling. We now briefly summarize the
work of Aydin et al. [2] which serves as the foundation of
our model. In reward-based scheduling, there is a set of N periodic tasks in the task system $\tau \stackrel{\text{def}}{=} \{\tau_1, \tau_2, \ldots, \tau_N\}$. Each periodic task $\tau_i$ is characterized by a four-tuple $(e_i^{(m)}, e_i^{(o)}, p_i, r_i(\cdot))$. A task $\tau_i$ releases a job every $p_i$ time units, starting at system start-up time (assumed to be zero). The absolute deadline of each job coincides with the next job arrival of the same task. Between a job’s arrival and its absolute deadline, it must complete at least $e_i^{(m)}$ units of mandatory execution requirement for the task to execute correctly. A job may execute (up to its deadline) for at most an additional $e_i^{(o)}$ units of optional execution. If the j-th job of task $\tau_i$ (denoted $J_{ij}$) completes its mandatory execution and executes its optional portion for $x_{ij} \ge 0$ units, it obtains a reward of $r_i(x_{ij})$. It is assumed that $r_i$ is a concave, non-decreasing function over $\mathbb{R}^+$ and that $r_i(0)$ is finite.
The goal of the system is to complete each job's mandatory
execution by its deadline and to maximize the total reward over
all tasks in the system. For any schedule of task system $\tau$ over
its hyperperiod¹ $P$, let $x_{ij}$ be the amount of optional execution
completed by job $J_{ij}$. The total reward is
$$\sum_{i=1}^{N} \sum_{j=1}^{P/p_i} r_i(x_{ij}) \qquad (1)$$
assuming that each job completes its mandatory execution by
its deadline (otherwise, the total reward is assumed to be
zero). Thus, the problem is to find the optimal values of $x_{ij}$
for all $i = 1, \dots, N$ and $j = 1, \dots, P/p_i$.

¹A hyperperiod is the least common multiple of the task periods
$p_1, p_2, \dots, p_N$.
Under the assumption that the mandatory execution requirements of the task system are feasible on a single preemptive uniprocessor with earliest-deadline-first (EDF) scheduling,
Aydin et al. [2] show the following desirable property of
the above maximization problem: there exists an optimal
assignment of the $x_{ij}$ values that maximizes Equation 1 in which
each job of any task $\tau_i$ executes the same amount of optional
execution (i.e., $x_{i1} = x_{i2} = \dots = x_{i,P/p_i}$). We will use $\alpha_i$
to denote such an optimal optional execution duration for
$\tau_i$. Aydin et al. show that optimal values for the $\alpha_i$ may be
obtained by solving the following optimization problem (a linear
program when every $r_i$ is linear).

MAX-REW-NC. Maximize $\sum_{i=1}^{N} r_i(\alpha_i)$, subject to the following constraints:
$$
\begin{aligned}
&\sum_{i=1}^{N} \frac{e_i^{(m)} + \alpha_i}{p_i} \le 1 && \text{(2a)}\\
&\alpha_i \le e_i^{(o)} \quad (i = 1, 2, \dots, N) && \text{(2b)}\\
&0 \le \alpha_i \quad (i = 1, 2, \dots, N) && \text{(2c)}
\end{aligned}
\qquad (2)
$$
§Algorithmic Mechanism Design. In recent years, many
research challenges in distributed computer systems have been
successfully addressed using game-theoretic analysis techniques. In the abstract, a game consists of a set N of
agents (or players); for each agent $i \in N$, there is a set of
strategies $\Phi_i$, a private agent type $\theta_i$ from a set of types $\Theta_i$, and
a utility function $u_i : \Theta_i \times \Phi_1 \times \Phi_2 \times \dots \times \Phi_N \to \mathbb{R}$
(note: we abuse the notation N to represent the set of
agents, the set of tasks, and the number of agents/tasks; the reason
for this will be clear in the next subsection). In its simplest
form, called a "strategic form", a game requires that each agent
$i \in N$ choose a single action simultaneously with all other
agents in the game²; the set of actions chosen is represented
by the vector $\phi = (\phi_1, \phi_2, \dots, \phi_N) \in \Phi_1 \times \Phi_2 \times \dots \times \Phi_N$.
In game theory, it is typically assumed that each agent is
"selfish" and seeks to maximize the value of $u_i$ by
selecting the maximizing $\phi_i \in \Phi_i$ with respect to the strategies
chosen by every other agent. An equilibrium of a game is an
action profile $\phi = (\phi_1, \phi_2, \dots, \phi_N)$ in which no agent has an incentive
to deviate; i.e., for any player $i \in N$, if all other agents $j \in N$
(where $j \ne i$) choose their equilibrium actions $\phi_j$, then the action that
maximizes $u_i$ with respect to the fixed $\phi_j$'s is $\phi_i$.
At the intersection of game theory and computer science,
the subarea of algorithmic mechanism design (AMD) seeks
to design mechanisms (i.e., games) that induce competing,
selfish agents to play certain strategies. Payments, to and
from the mechanism, provide incentives for the agents to
choose particular strategies. In settings where each agent's set of possible
strategies corresponds to its set of possible types, a
widely studied property of mechanism design is incentive
compatibility. A mechanism is incentive-compatible if, for each
agent $i \in N$, the strategy $\phi_i$ that maximizes $u_i$ (irrespective
of all other agents' strategies) corresponds to agent $i$'s
private type $\theta_i$; in other words, an incentive-compatible mechanism
induces each agent to truthfully reveal its type.

²Strategic form is used for ease of presentation; other, more sophisticated
game models exist that account for alternating moves and other agent
behavior.
B. Competitive Reward-Based Scheduling Model
There is a clear connection between the AMD setting and
reward-based scheduling. We can consider each task $\tau_i$ to be
owned by an independent agent and the system to be owned
by the mechanism designer. In this setting, each agent has a
private type $\theta_i = (e_i^{(m)}, e_i^{(o)}, p_i, r_i(\cdot))$. We will denote the type
reported by each agent to the mechanism by $\hat\theta_i$. (Note that
$\hat\theta_i \ne \theta_i$ is allowed.)
One difficulty that may arise in an agent reporting its
type is that, in general, there is no succinct way to report a
function r̂i. To sidestep this challenge and as a starting point
for our research, we will make the simplifying assumption
for this report that each agent’s reward function is a linear
function. Future research will ideally remove this restriction.
An agent will report two "bid" values to the mechanism to
reveal its reward function: $\hat b_i^{(m)}$ and $\hat b_i^{(o)}$. Intuitively, $\hat b_i^{(m)}$
represents the value of $\hat r_i(0)$, i.e., the amount that agent $i$
will "pay" for its mandatory execution requirement with
no optional execution. $\hat b_i^{(o)}$ represents the value of $\hat r_i(\hat e_i^{(o)})$,
i.e., the amount agent $i$ is willing to "pay" for the entire
optional execution. The expanded reported type for agent $i$
is $\hat\theta_i = (\hat e_i^{(m)}, \hat e_i^{(o)}, \hat p_i, \hat b_i^{(m)}, \hat b_i^{(o)})$. The
reported function can be derived from this five-tuple as
$$\hat r_i(\alpha_i) = \left(\frac{\hat b_i^{(o)} - \hat b_i^{(m)}}{\hat e_i^{(o)}}\right)\alpha_i + \hat b_i^{(m)},$$
using the assumption that $\hat r_i$ is linear; if $\hat e_i^{(o)}$ is zero, the first term of the
function is taken to be zero. The interpretation of this function is that if
agent $i$ executes $\alpha_i$ ($\le \hat e_i^{(o)}$) units of optional execution, then agent $i$ is
willing to pay $\hat r_i(\alpha_i)$. Agent $i$'s strategy space is, thus, the set
of possible reported types (i.e., $\Phi_i = \Theta_i = (\mathbb{R}_{\ge 0})^5$).
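The mapping from the two bids to a reported reward function can be written as a small helper. This is only a sketch assuming plain floats; the name `reported_reward` is ours, not the authors', and the zero-slope convention for $\hat e_i^{(o)} = 0$ follows the text.

```python
def reported_reward(alpha: float, e_o_hat: float, b_m_hat: float, b_o_hat: float) -> float:
    """Linear reported reward r_hat_i(alpha) built from the two bids.

    r_hat_i(0) = b_m_hat and r_hat_i(e_o_hat) = b_o_hat; when e_o_hat is zero
    the slope term is dropped, so the reward is just b_m_hat.
    """
    if not 0.0 <= alpha <= e_o_hat:
        raise ValueError("alpha must lie in [0, e_o_hat]")
    if e_o_hat == 0.0:
        return b_m_hat
    slope = (b_o_hat - b_m_hat) / e_o_hat  # reward per unit of optional execution
    return slope * alpha + b_m_hat
```

For instance, bids $\hat b^{(m)} = 1$ and $\hat b^{(o)} = 1.5$ with $\hat e^{(o)} = 0.5$ give `reported_reward(0.25, 0.5, 1.0, 1.5) == 1.25`, halfway between the two bids.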
A fundamental difference between competitive and non-
competitive reward-based scheduling is in dealing with the
case of processor overload (i.e., total utilization exceeding one)
with respect to just the mandatory portions of execution. In
non-competitive reward-based scheduling as described in [2],
the task system's mandatory execution requirements must be
feasible in order for the system to be correct; if they are not feasible,
then no task is allocated the processor. In competitive reward-based scheduling, we allow the reported
mandatory execution requirements to exceed the processor
capacity. The mechanism will determine which tasks maximize
the total system welfare while ensuring that the processor is
not overloaded. Any task that causes overload and whose
inclusion on the processor does not maximize total system welfare
will be "rejected" from the task system. The remaining set
of “admitted” tasks will be guaranteed to execute for their
mandatory execution requirements.
The following mixed-integer linear program determines the set of admitted tasks and the optimal assignment of
optional execution times (i.e., the $\alpha_i$ values) that maximizes
total system welfare with respect to the reported agent types.
An additional integer variable $y_i$ is introduced to determine
whether task $\tau_i$ will be admitted or rejected.

MAX-REW-C$(\hat\theta_1, \hat\theta_2, \dots, \hat\theta_N)$. Maximize
$$\sum_{i=1}^{N} \left[\left(\frac{\hat b_i^{(o)} - \hat b_i^{(m)}}{\hat e_i^{(o)}}\right)\alpha_i + y_i\,\hat b_i^{(m)}\right],$$
subject to the following constraints and the restriction
that the variables $y_i$, $1 \le i \le N$, take on $\{0, 1\}$
values only:
$$
\begin{aligned}
&\sum_{i=1}^{N} \frac{y_i\,\hat e_i^{(m)} + \alpha_i}{\hat p_i} \le 1 && \text{(3a)}\\
&\alpha_i \le y_i \cdot \hat e_i^{(o)} \quad (i = 1, 2, \dots, N) && \text{(3b)}\\
&0 \le \alpha_i \quad (i = 1, 2, \dots, N) && \text{(3c)}
\end{aligned}
\qquad (3)
$$
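For small $N$, MAX-REW-C can be solved exactly by enumerating the admission vector $y \in \{0,1\}^N$: once $y$ is fixed, the remaining problem over the $\alpha_i$ is a fractional knapsack (each unit of $\alpha_i$ consumes $1/\hat p_i$ of processor utilization and earns the slope of $\hat r_i$), so filling greedily by slope times $\hat p_i$ is optimal. This is an illustrative brute-force sketch under that observation, not the authors' solver; the names `solve_max_rew_c` and `slope` are hypothetical, and the $2^N$ enumeration is only practical for a handful of agents.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Reported type (e_m_hat, e_o_hat, p_hat, b_m_hat, b_o_hat), as in the text.
ReportedType = Tuple[float, float, float, float, float]


def slope(t: ReportedType) -> float:
    """Reward per unit of optional execution; 0 when e_o_hat is 0."""
    _, e_o, _, b_m, b_o = t
    return 0.0 if e_o == 0.0 else (b_o - b_m) / e_o


def solve_max_rew_c(types: Sequence[ReportedType]) -> Tuple[float, List[int], List[float]]:
    """Exhaustive solution of MAX-REW-C; only sensible for small N (2^N cases)."""
    n = len(types)
    best = (0.0, [0] * n, [0.0] * n)
    for y in product((0, 1), repeat=n):
        admitted = [i for i in range(n) if y[i] == 1]
        util = sum(types[i][0] / types[i][2] for i in admitted)
        if util > 1.0 + 1e-12:
            continue  # admitted mandatory work alone violates (3a)
        value = sum(types[i][3] for i in admitted)  # sum of y_i * b_m_hat
        alpha = [0.0] * n                           # (3b): alpha_i = 0 if rejected
        capacity = 1.0 - util
        # Fractional knapsack over optional work: alpha_i uses alpha_i / p_i of
        # the processor and earns slope_i * alpha_i, so fill by slope_i * p_i.
        gainers = [i for i in admitted if slope(types[i]) > 0.0]
        for i in sorted(gainers, key=lambda i: slope(types[i]) * types[i][2], reverse=True):
            alpha[i] = max(0.0, min(types[i][1], capacity * types[i][2]))
            capacity -= alpha[i] / types[i][2]
            value += slope(types[i]) * alpha[i]
        if value > best[0]:
            best = (value, list(y), alpha)
    return best
```

With the parameters of the two-agent example in Section IV instantiated as $\varepsilon = 0.1$, $\delta = 0.5$, $\gamma = 0.01$, truthful reports `[(1.0, 0.1, 2.0, 1.0, 1.1), (1.0, 0.0, 2.0, 0.5, 0.5)]` admit both agents (`y == [1, 1]`, value 1.5), whereas agent 1's misreport `(1.1, 0.0, 2.0, 0.51, 0.51)` leads to `y == [1, 0]`.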
If in the solution to MAX-REW-C yi is set to one, then
agent i’s task τi will be admitted into the system; otherwise,
the agent's task will be rejected. The utility of agent $i$ is equal
to
$$u_i(\theta_i, \hat\theta_1, \hat\theta_2, \dots, \hat\theta_N) =
\begin{cases}
r_i(\alpha_i), & \text{if } y_i = 1;\\
0, & \text{otherwise,}
\end{cases} \qquad (4)$$
where $y_i$ and $\alpha_i$ are the values assigned in the optimal
solution of MAX-REW-C$(\hat\theta_1, \hat\theta_2, \dots, \hat\theta_N)$. We may interpret
the above utility function to mean that agent $i$ gets utility equal to
the reward from the received allocation if admitted; otherwise,
if rejected, agent $i$ receives a utility of zero.
III. PREVIOUS WORK
Recently, the need for efficient protocols that deal with the
autonomy and self-interest of resource owners has increased.
This has motivated the use of mechanism design theory as a
foundation for designing new mechanisms for resource management in open distributed computing systems. In their seminal paper, Nisan and Ronen [8] introduced algorithmic mechanism design (AMD), which bridged computational tractability
with incentive compatibility. In their work, they used the
celebrated Vickrey-Clarke-Groves (VCG) mechanism [3], [6],
[10] to design a strategyproof mechanism for scheduling on
unrelated machines. A large body of subsequent research on
AMD has followed this initial work by Nisan and Ronen.
However, for systems with hard real-time constraints, the only
work that we are aware of is by Porter [9] which develops
incentive-compatible mechanisms for single one-time jobs
with values, but not for recurring task systems.
IV. INEFFICIENCY OF STANDARD
RESOURCE-ALLOCATION MECHANISM
Unfortunately, we will shortly see (via an example) that
allowing users to report their types θˆ1, θˆ2, . . . , θˆN and then
simply applying the ILP of MAX-REW-C over the reported
types to determine the allocations of the processor does not
always result in the optimal allocation that maximizes total
system welfare. In fact, there are cases where one of the
agents has an incentive to lie about its type. Below is a two-agent example.
Example 1: Consider a two-agent system $N \stackrel{\text{def}}{=} \{1, 2\}$. For
agent 1, let $\theta_1 = (e_1^{(m)}, e_1^{(o)}, p_1, r_1(\cdot)) = (1, \varepsilon, 2, r_1(\cdot))$ and
$r_1(x) = x + 1$ for all $0 \le x \le \varepsilon$. For agent 2, let
$\theta_2 = (e_2^{(m)}, e_2^{(o)}, p_2, r_2(\cdot)) = (1, 0, 2, r_2(\cdot))$ and $r_2(x) = \delta$ for all
$x \ge 0$, where $0 < \delta < 1$.
Observe that if each agent truthfully reported $\hat\theta_i = \theta_i$, then the
optimal solution to MAX-REW-C$(\hat\theta_1, \hat\theta_2)$ would set $\alpha_1 =
\alpha_2 = 0$ and $y_1 = y_2 = 1$. That is, both agents 1 and 2 could
execute their mandatory portions of execution completely and
none of their optional execution. The total welfare obtained
from this solution is $1 + \delta$.
However, agent 1 has an incentive to lie about some of
its type parameters. For example, agent 1 could report
$\hat\theta_1 = (\hat e_1^{(m)}, \hat e_1^{(o)}, \hat p_1, \hat b_1^{(m)}, \hat b_1^{(o)}) = (1 + \varepsilon, 0, 2, \delta + \gamma, \delta + \gamma)$, where $\gamma$ is
some arbitrarily small positive parameter. If agent 2 truthfully
reported its type (i.e., $\hat\theta_2 = (1, 0, 2, \delta, \delta)$), then agent 1 would be
admitted to the system and agent 2 would not. In this case, the
total system welfare is $1 + \varepsilon$. Furthermore, it may be shown
that there exist equilibria where the total welfare is $1 + \varepsilon$.
For this example, the efficiency loss is captured by the ratio of achieved
welfare to optimal obtainable welfare, $(1 + \varepsilon)/(1 + \delta)$. As
$\varepsilon \to 0$ and $\delta \to 1$, the ratio approaches 50%.
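The numbers in Example 1 can be checked step by step. The derivation below only restates the example's quantities; we write $W^{*}$ for the welfare of the truthful allocation and $W$ for the welfare after the misreport, and we assume $\varepsilon < \delta$ (consistent with the limit $\varepsilon \to 0$, $\delta \to 1$) so that admitting both agents is indeed optimal under truthful reports.

$$
\begin{aligned}
\text{Truthful reports:}\quad & \frac{e_1^{(m)}}{p_1} + \frac{e_2^{(m)}}{p_2} = \frac{1}{2} + \frac{1}{2} = 1
\;\Rightarrow\; y_1 = y_2 = 1,\ \alpha_1 = \alpha_2 = 0,\ W^{*} = r_1(0) + r_2(0) = 1 + \delta,\\
\text{Agent 1 misreports:}\quad & \frac{1 + \varepsilon}{2} + \frac{1}{2} = 1 + \frac{\varepsilon}{2} > 1
\;\Rightarrow\; \text{at most one task is admitted; } \delta + \gamma > \delta \Rightarrow y_1 = 1,\ y_2 = 0,\\
& \text{agent 1 receives } 1 + \varepsilon \text{ units per period} \;\Rightarrow\; W = r_1(\varepsilon) = 1 + \varepsilon,\\
\text{Ratio:}\quad & \frac{W}{W^{*}} = \frac{1 + \varepsilon}{1 + \delta} \;\longrightarrow\; \frac{1}{2}
\quad \text{as } \varepsilon \to 0,\ \delta \to 1.
\end{aligned}
$$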
V. INCENTIVE-COMPATIBLE REWARD-BASED
ALLOCATION MECHANISM
The example of the previous section makes clear the need
for mechanisms that induce truthful behavior from the participating agents. To induce truthful behavior, however,
the mechanism must introduce some payment function
$\pi_i$ for each agent $i \in N$, paid to the mechanism. Significant
research in AMD for computer science problems focuses on
designing mechanisms using payments with the following two
properties: individual rationality and incentive compatibility.
We briefly define (in words) the two properties in the context
of competitive reward-based scheduling. Future research will
introduce more formal notation.
Definition 1 (Individually-Rational Mechanism): A mecha-
nism is individually rational if, for every agent, the payment
required by the mechanism from the agent does not exceed
the agent’s utility.
The above property is useful for ensuring that each agent has
an incentive to participate in the mechanism.
Definition 2 (Incentive-Compatible Mechanism): A mech-
anism is incentive compatible if, for every agent, the best
strategy for an agent (regardless of any other agent’s strategy)
is to truthfully report θi.
A well-known approach for ensuring the individual rationality of the mechanism is the Clarke pivot rule [3].
In words, the rule sets each agent $i$'s payment equal to
the marginal decrease in the total welfare of the agents other than
agent $i$ caused by agent $i$'s participation in the mechanism.
Formally,
$$\pi_i(\hat\theta_1, \hat\theta_2, \dots, \hat\theta_N) = \text{MAX-REW-C}(\hat\theta_{-i}) - \left(\text{MAX-REW-C}(\hat\theta_1, \hat\theta_2, \dots, \hat\theta_N) - y_i \cdot \hat r_i(\alpha_i)\right),$$
where MAX-REW-C(·) denotes the optimal objective value of the program and $\hat\theta_{-i}$ denotes
all the reported types except agent $i$'s.
Theorem 1: The mechanism of using MAX-REW-C to
allocate each agent's execution time and using the Clarke pivot
rule to calculate the payments is individually rational.
A proof of the above theorem is straightforward and will
be shown in a future paper. We are currently attempting
to evaluate whether the proposed mechanism is incentive
compatible.
VI. CONCLUSIONS AND FUTURE WORK
As motivated in the introduction, real-time systems are
becoming increasingly open. Therefore, designers of such
open systems need to consider the effect of competition by
different entities and agents in these systems. In this short
report, we have introduced the concept of competitive reward-
based scheduling. We believe this model is a first step toward
the design of mechanisms for determining resource allocation
of recurring real-time tasks owned by competing agents.
For the competitive reward-based scheduling model, we
have given an example to show that the straightforward
allocation mechanism (as given by MAX-REW-C) may result
in a suboptimal allocation, with respect to total system welfare.
Using the Clarke pivot rule, our mechanism is individually
rational (Theorem 1). Currently, we are attempting to prove
(or disprove) that it is also incentive compatible. Beyond
determining whether the mechanism proposed in Section V
is incentive compatible, there are several open questions.
The Clarke pivot rule we have used requires solving the
MAX-REW-C ILP once more for each of the N reduced profiles $\hat\theta_{-i}$
(each over N − 1 agents): are there polynomial-time algorithms
or approximation algorithms for determining the
payments? Also, can we reduce the number of parameters in
each agent's reported type? Multi-dimensional strategy spaces
are potentially problematic for game-theoretic analysis.
REFERENCES
[1] AUYOUNG, A., RIT, L., WIENER, S., AND WILKES, J. Service
contracts and aggregate utility functions. In 15th IEEE International
Symposium on High Performance Distributed Computing (Paris, France,
June 2006), IEEE Computer Society.
[2] AYDIN, H., MELHEM, R., MOSSE, D., AND MEJIA-ALVAREZ, P.
Optimal reward-based scheduling of periodic real-time tasks. IEEE
Transactions on Computers 50, 2 (February 2001), 111–130.
[3] CLARKE, E. Multipart pricing of public goods. Public Choice 8 (1971),
17–33.
[4] DEY, J., KUROSE, J., TOWSLEY, D., KRISHNA, C., AND GIRKAR, M.
Efficient on-line processor scheduling for a class of IRIS real-time tasks.
In Proceedings of the 13th ACM SIGMETRICS Conference (1993).
[5] GROSU, D. Agora: An architecture for strategyproof computing in grids.
In Proc. of the 3rd International Symposium on Parallel and Distributed
Computing, IEEE Computer Society Press (July 2004), pp. 217–224.
[6] GROVES, T. Incentives in teams. Econometrica 41, 4 (1973), 617–631.
[7] LIU, J., LIN, K., SHIH, W., YU, A., CHUNG, J., AND ZHAO, W.
Algorithms for scheduling imprecise computations. IEEE Computer
(May 1991), 58–68.
[8] NISAN, N., AND RONEN, A. Algorithmic mechanism design. Games
and Economic Behavior 35, 1/2 (April 2001), 166–196.
[9] PORTER, R. Mechanism design for online real-time scheduling. In EC
’04: Proceedings of the 5th ACM Conference on Electronic Commerce
(New York, NY, USA, 2004), ACM Press, pp. 61–70.
[10] VICKREY, W. Counterspeculation, auctions, and competitive sealed
tenders. Journal of Finance 16, 1 (March 1961), 8–37.
Real-Time Triangulation Based on Measurements
from Mobile ADS–B Aircraft
Daniel Uhlig, Negar Kiyavash and Natasha Neogi
Coordinated Science Laboratory
University of Illinois Urbana-Champaign
Email: duhlig2@uiuc.edu, kiyavash@iti.uiuc.edu, neogi@uiuc.edu
Abstract—We explore adaptive real-time localization in a
dynamic environment with many moving agents that are aware of their
location (a.k.a. beacons) and receivers that find their position
using the information regularly broadcast by the beacons.
Specifically, we use the information from the emerging Automatic Dependent Surveillance Broadcast (ADS–B) system for
aircraft communications to discover the position of receivers.
We propose an attack-resistant position discovery algorithm
that enables efficient and secure positioning in the presence of
both faults and malicious behavior by using the redundancy of
multiple beacons.
I. INTRODUCTION
Air traffic is divided into sectors for air traffic controllers to
manage, with each controller typically handling
20–30 aircraft. In busy airspace the sectors are small, while in
areas with little overflight the sectors can be quite large. Air
traffic networks employ Automatic Dependent Surveillance
Broadcast (ADS–B) for communication. As air traffic spreads
out, the number of aircraft within ADS–B range of a location
can become small. As the number of
ADS–B aircraft decreases, the opportunity for a few malicious
nodes to affect the system becomes more pronounced.
Location information based on the position of sensor nodes
is useful in networks with varying node capabilities. Networks
must handle faulty nodes; depending on the level of oversight
in the network, differing incorrect behaviors must be handled.
GPS has seemingly solved the positioning problem for many
users, but shortcomings still exist [1]. A variety of methods
relying on alternative broadcast data have been proposed to
overcome these shortcomings. One method employs existing
radio signals, such as radio or TV, to triangulate within cities
[2]. Triangulation based on existing signals must apply more
robust metrics to filter out faulty or malicious data. ADS–B
information is available anywhere there is air traffic, including
most large urban areas, because large international airports
generate significant air traffic and therefore ADS–B signals. Beyond
ADS–B, the ideas in this paper can be applied to numerous
other applications, such as undersea buoy navigation systems [3].
II. ADS-B
The ADS–B protocol facilitates datalink communication
between enabled aircraft [4]. It broadcasts automatically at
regular (1 Hz) intervals without pilot input, and the signal is
received by everyone within range. The goal of ADS–B is to
ease airspace congestion by increasing the awareness of air
traffic. ADS–B broadcasts have a 90% probability of
reception at 150 km, decreasing to 0% at 190 km [5].
The ADS–B broadcast contains a number of different parameters that are updated for each new packet. The parameters
included are still being finalized, but position, velocity, and
time (based on GPS) form the most basic layer [6]. When
many aircraft are broadcasting ADS–B information within
range, they form a sensor network of beacons that can be
used by receivers. By combining time stamp and location
information, the receivers can triangulate their own location
within the sensor network of aircraft.
III. PROBLEM STATEMENT
As ADS-B becomes more prevalent in highly instrumented
aircraft, at any given location the data will be available from
numerous sources within range of a receiver. The instrumented
aircraft form a grid of moving sensors continually broadcasting
position information. Using this network and triangulation
methods, the receivers can discover their position.
Existing triangulation methods [7] are implemented to resolve GPS position. They take the location of each broadcast
beacon (satellite) along with a time stamp to calculate the travel
time between the beacon and the receiver. The travel time is
proportional to the distance between the two nodes. Using
multiple beacons, the receiver can find its position.
Unlike GPS, the beacons in the ADS-B network are un-
reliable black boxes. Each node operates with its own goals
(delivering cargo or passengers to an airport) and the informa-
tion being broadcast is not guaranteed to be reliable, but the
beacons operate under FAA regulations. The regulations and
oversight do not guarantee data correctness, but they do limit
malicious actions. Issues with the data can stem from either faulty
sensor nodes or malicious agents. Faults can develop in a beacon
at any time because of incorrect GPS measurements or other
instrumentation problems. The malicious agents can modify
anything within the message (time stamp, position, velocity)
to deceive a receiver node. The order that the messages are
received, as well as the unique identifier that is associated with
a beacon, cannot be altered.
This work focuses on real-time triangulation for networks
of aircraft (agents) with varying capabilities across the nodes.
Since the degree of manoeuvering, the size of airspace sectors,
and the number of malicious/faulty measurements change dynamically at run time, the frequency of real-time triangulation
is adapted to the required accuracy of the
estimated location.
We plan to use an FPGA-based real-time reconfigurable
platform like [8] to implement our protocol. In particular,
by developing both a HW and a SW version of the triangulation
algorithm, we can achieve high location accuracy when
the HW version is running, without overloading the CPU, while
low-frequency triangulation can be performed in software,
freeing FPGA area for more demanding tasks. The HW/SW
real-time reconfigurable platform described in [8] will ensure
that the real-time constraints are always met at run time, even
during mode changes of agents (i.e., when the triangulation
task changes its frequency and/or migrates from software to
hardware or vice versa).
IV. UNRELIABLE AGENTS
If we consider a malicious agent as possessing the ability
to intelligently and consistently misrepresent itself to
other agents, it becomes difficult to determine the position of
a given aircraft. Consider three ADS–B enabled aircraft, two
of which are conforming and one which is not, and a single
aircraft P attempting to triangulate its position with respect
to the three aircraft. Assume, without loss of generality, that
the two conforming aircraft, A and B, broadcast
their signals simultaneously (containing position, velocity, and
unaltered time stamp), and that these signals are received by aircraft
P at times $t_A$ and $t_B$. The onboard computer of P converts
the time difference in receiving the signals into a distance
difference:
$$|t_A - t_B|\, v_c = |\overrightarrow{PA}| - |\overrightarrow{PB}| \qquad (1)$$
where $v_c$ is the speed of light. For $t_A > t_B$, this locates the
aircraft P on one branch of the hyperbola defined by
$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1 \qquad (2)$$
where
$$|\overrightarrow{PA}| - |\overrightarrow{PB}| = \pm 2a \qquad (3)$$
$$\left|\frac{\overrightarrow{AB}}{2}\right|^2 = a^2 + b^2 \qquad (4)$$
and the x-axis lies along the line AB, with the origin taken
at its midpoint.
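The geometry of Equations (1) to (4) can be checked numerically. The sketch below assumes the local frame described in the text (foci A and B at $(-c, 0)$ and $(c, 0)$ with $c = |AB|/2$) and uses hypothetical helper names; it converts a time difference into a range difference, derives $a$ and $b$, and tests whether a point satisfies the hyperbola equation $x^2/a^2 - y^2/b^2 = 1$.

```python
import math
from typing import Tuple

V_C = 299_792_458.0  # speed of light in m/s (v_c in Eq. (1))


def range_difference(t_a: float, t_b: float) -> float:
    """Eq. (1): |PA| - |PB| = (t_A - t_B) * v_c for simultaneous broadcasts."""
    return (t_a - t_b) * V_C


def hyperbola_from_tdoa(c: float, range_diff: float) -> Tuple[float, float]:
    """Return (a, b) for foci at (-c, 0) and (c, 0) given |PA| - |PB| = +/-2a.

    Uses c^2 = a^2 + b^2 from Eq. (4); requires 0 < |range_diff| < 2c.
    """
    a = abs(range_diff) / 2.0
    if not 0.0 < a < c:
        raise ValueError("range difference must satisfy 0 < |diff| < |AB|")
    return a, math.sqrt(c * c - a * a)


def on_hyperbola(x: float, y: float, a: float, b: float, tol: float = 1e-9) -> bool:
    """Check Eq. (2): x^2/a^2 - y^2/b^2 = 1 in the local frame."""
    return abs(x * x / (a * a) - y * y / (b * b) - 1.0) < tol
```

For A = (−50 km, 0), B = (50 km, 0) and P = (30 km, 40 km), the range difference is about 44.7 km, giving $a^2 = 5 \times 10^8$ and $b^2 = 2 \times 10^9$ (in m²); P then satisfies $x^2/a^2 - y^2/b^2 = 1.8 - 0.8 = 1$.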
Now, the addition of the input from the third aircraft C is
used to fix the position of the aircraft P on the branch of the
hyperbola given by (2).
However, for any two broadcasting aircraft, there are only
two points at which the receiving aircraft can be: $P$ and
$\tilde{P}$, where $\tilde{P}$ is the reflection of $P$ on the hyperbolic branch
about the x-axis. If C is faulty and generates randomly
incorrect broadcasts, this will become apparent after two successive broadcasts,
as P can store its possible position pairs
$\{P, \tilde{P}\}_{(A,B)}$ with respect to the broadcasting aircraft (A, B).
Given that P knows its own velocity $\vec{v}$ and receives successive
broadcasts from the pair (A, B), P can then compare its
projected positions $\{P + \vec{v}\,\Delta t,\ \tilde{P} + \vec{v}\,\Delta t\}$ to the elements of
the position pair $\{P', \tilde{P}'\}_{(A,B)}$. Thus, to estimate its position,
instead of processing broadcasts from a large number of
agents, P could start with the smaller subset of its nearest three
neighbors and determine whether or not a consistent position
is achieved by processing the broadcasts of all pairings over
two successive broadcast intervals. If C does not create its
initial and successive broadcasts such that they are consistent with
the velocity of P, as well as with the projected position and velocity
of one of the two other aircraft (say B), it will be identified
as faulty in the second round of broadcasts, and P will have
the correct estimate of its position.

Fig. 1. Triangulation with ADS-B
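The two-round check described above (projecting both candidates $\{P, \tilde P\}$ by $\vec v\,\Delta t$ and comparing them with the next pair $\{P', \tilde P'\}$) can be sketched as follows. The function name `consistent_candidate` and the matching tolerance `tol` are assumptions made for illustration; the paper does not specify a tolerance.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def consistent_candidate(prev_pair: Sequence[Point], next_pair: Sequence[Point],
                         velocity: Point, dt: float, tol: float) -> Optional[Point]:
    """Return the new candidate (from {P', P~'}) that matches a projected
    earlier candidate (P + v*dt or P~ + v*dt) within tol, or None if neither does."""
    for px, py in prev_pair:
        projected = (px + velocity[0] * dt, py + velocity[1] * dt)
        for cand in next_pair:
            if math.dist(projected, cand) <= tol:
                return cand
    return None  # no consistent pairing: the broadcasts used are suspect
```

With previous candidates (1000, 500) and (1000, −500), a velocity of (250, 0) m/s and $\Delta t = 1$ s, a new pair containing (1250, 500) yields that point, while a pair with no point near either projection yields `None`.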
For C to be consistent, C must be able to adjust the
time at which its broadcast is received by P such that the
parameterized trajectory given by one of $\{P(t), \tilde{P}(t)\}_{(C,B)}$
is consistent with the velocity of P. This would require that
C either alter its broadcast rate (or time stamp), or that there
exist a flight trajectory for C such that $|\overrightarrow{CP}(t)| = |\overrightarrow{CP}_{(C,B)}|$
for all successive broadcast times. This second condition
requires that C be able to fly an extremely precise route
with respect to P and B; considering the oversight of the
ADS–B network, such behavior would be quickly detected.
Furthermore, if an additional aircraft D is considered, even if
it is faulty, C will be revealed as faulty after two successive broadcasts,
unless C and D are colluding. Thus, for a
malicious agent to go undetected, it must either
successively alter its broadcast rate (or time stamp),
or be consistent with respect to all but one of the
other conforming aircraft and collude with all other non-conforming aircraft. Hence, only a small number of
aircraft need to be sampled over two subsequent broadcasts in
order to identify maliciousness.
V. OUTLIER DETECTION
Several popular position estimation algorithms that do not
use GPS-like infrastructure are presented in [10], [11]. Li et al.
propose the use of robust outlier detection statistical models to
achieve robust position estimation [12]. They propose a prob-
abilistic approximation to the least median of squares (LMS)
approach [13] to circumvent computational complexity. Liu et
al. presented a greedy algorithm to filter out the attacker’s
data on the basis of a consistent minimum mean square
error (MMSE) criterion between received measurements from
multiple beacons [14].
As shown in [15], our approach removes the anomalies
with superior accuracy and in a shorter runtime than both the greedy algorithm of [14]
and LMS. For independent attackers,
the performance of all three algorithms is comparable, with
our algorithm having a slight edge. However, when attackers
collude, the new approach clearly dominates both the greedy
algorithm of [14] and LMS. In the context of sensor networks,
randomized consensus has been applied to distributed object
tracking [16] and time synchronization [17].
A. Attack Model
In our attack model, malicious aircraft modify the position
measurements (time stamp) without any restrictions. However,
transmissions are authenticated, so an attacker cannot impersonate other aircraft. Thus, data from a malicious aircraft
enters the overall data set available to the receiver aircraft
only once.
Note that the beacons and receivers are not stationary, as the
aircraft travel at cruise speeds of approximately 250 m/s. As
the aircraft move, each radio signal propagates at the speed of
light ($3 \times 10^8$ m/s). At maximum range (150 km), the radio
signal takes approximately 500 microseconds to travel to the
receiver aircraft. During the signal's travel time, an aircraft
moves about 0.125 meters, well below the precision of the navigation
solution being broadcast over ADS–B. In the worst case (two
aircraft closing towards each other), the distance between the
aircraft would shrink by a mere 25 centimeters. Therefore,
the aircraft can be assumed to be stationary during the
signal's travel time.
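The bound can be reproduced from the round figures used in the paragraph (150 km, $3 \times 10^8$ m/s, 250 m/s); the closing case simply doubles the relative speed:

$$
\begin{aligned}
t_{\text{prop}} &= \frac{1.5 \times 10^{5}\ \text{m}}{3 \times 10^{8}\ \text{m/s}} = 5 \times 10^{-4}\ \text{s} = 500\ \mu\text{s},\\
\Delta x &= 250\ \text{m/s} \times 5 \times 10^{-4}\ \text{s} = 0.125\ \text{m},\\
\Delta x_{\text{closing}} &= 2 \times 250\ \text{m/s} \times 5 \times 10^{-4}\ \text{s} = 0.25\ \text{m}.
\end{aligned}
$$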
Even though position estimation occurs in three dimensions, altitude estimation is aided by onboard
instrumentation (e.g., pressure sensors), which decouples it from
the planar positioning task. Hence, in the rest of this paper we
consider triangulation only in two dimensions; the same
algorithm can be extended to three dimensions with slight
modification.
B. Position Estimation in Presence of Attacks
In the presence of attackers (outliers), we seek to construct
good estimates of the unknown position. A good estimate is one that
is consistent with the benign measurements while it differs from
the corrupted measurements according to a given criterion. In
our proposed framework, this criterion is a consistency metric
δ. The metric δ is selected by the user and is driven by the
nature of the attack, i.e., the statistical properties of the attack.
Formally stated, the position problem is:
Instance. A node $s_0$ with unknown coordinates $(x_0, y_0)$;
a set L of position information tuples $\{(x_n, y_n, d_n)\}$ corresponding to ADS–B aircraft $\{s_n\}$, where $(x_n, y_n)$ are the
coordinates of the n-th aircraft $s_n$ and $d_n$ is the measured
distance from $s_n$ to $s_0$, for $n = 1, \dots, N$; a consistency metric
$\delta(s_n, s_0)$; and a consistency threshold t.
Problem. Find an estimate of the coordinates of $s_0$, denoted
$\hat s_0 = (\hat x_0, \hat y_0)$, such that it is δ-consistent with at least t
points in the set L.
A measurement $(x_n, y_n)$ is δ-consistent with the estimate
$\hat s_0$ if and only if $\delta(s_n, \hat s_0)$ lies within a given confidence interval
CI. We call the set of points that are δ-consistent with the estimate
$\hat s_0 = (\hat x_0, \hat y_0)$ the consensus set of $\hat s_0$. The parameter t is the required size
of the consensus set. We choose the metric δ as the Euclidean
distance.
VI. ATTACK-RESISTANT RANDOMIZED POSITION
ESTIMATION ALGORITHM
Unlike the previous approaches to attack-resistant position
estimation [13], [12], [14], which use as much data as possible
to estimate the unknown coordinates, our approach starts
by picking a small (but sufficient) subset of the data and
subsequently augments it with consistent data. The proposed
framework arbitrarily selects an initial subset and employs
a randomized algorithm to determine the set of consistent
measurements.
Algorithm 1. Randomized Consistent Position Estimation
Input: set L, δ-consistency interval CI , threshold t,
maximum number of iterations imax.
1. Initialize i=1;
2. While (i ≤ imax) {
3. Randomly draw a subset Si of size 3 from L;
4. Use Si to estimate the position sˆ0;
5. Calculate K, the number of δ-consistent points with
respect to the estimate sˆ0 in L\Si ;
6. If (K > t) {
7. Form new estimate sˆ0 from K consistent points;
8. Terminate the program;}
9. Increment i; }
10. Terminate program either by announcing failure or
output the largest consistent estimate;
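A compact Python rendering of Algorithm 1 follows, with the step numbers noted in comments. It is a sketch under several assumptions: the MMSE step of Savvides et al. [10] is replaced by a standard linearized least-squares multilateration, δ is taken as the range residual $d_n - \|(x_n, y_n) - \hat s_0\|$ with the confidence interval CI simplified to $|\delta| \le$ `tol`, and the refit in Step 7 uses the consensus set together with the sample $S_i$. All function names are hypothetical.

```python
import math
import random
from typing import Optional, Sequence, Tuple

Beacon = Tuple[float, float, float]  # (x_n, y_n, d_n): beacon position and measured range
Point = Tuple[float, float]


def multilaterate(beacons: Sequence[Beacon]) -> Optional[Point]:
    """Linearized least-squares fix (stand-in for the MMSE step).

    Subtracting the last range equation from the others cancels the quadratic
    terms, leaving a linear system in (x, y) solved via its 2x2 normal
    equations. Returns None for degenerate (near-collinear) geometry.
    """
    xk, yk, dk = beacons[-1]
    s11 = s12 = s22 = r1 = r2 = 0.0
    for xn, yn, dn in beacons[:-1]:
        a1, a2 = 2.0 * (xk - xn), 2.0 * (yk - yn)
        rhs = dn * dn - dk * dk - xn * xn + xk * xk - yn * yn + yk * yk
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        r1 += a1 * rhs
        r2 += a2 * rhs
    det = s11 * s22 - s12 * s12
    if det <= 1e-9 * s11 * s22:
        return None
    return ((s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det)


def residual(b: Beacon, est: Point) -> float:
    """delta(s_n, s0_hat): measured range minus the range implied by the estimate."""
    return b[2] - math.hypot(b[0] - est[0], b[1] - est[1])


def randomized_position(L: Sequence[Beacon], tol: float, t: int, i_max: int,
                        rng: random.Random) -> Optional[Point]:
    """Randomized consistent position estimation (sketch of Algorithm 1)."""
    best_est: Optional[Point] = None
    best_size = -1
    for _ in range(i_max):                                   # Steps 1-2 and 9
        idx = rng.sample(range(len(L)), 3)                   # Step 3
        sample = [L[j] for j in idx]
        est = multilaterate(sample)                          # Step 4
        if est is None:
            continue
        consensus = [L[j] for j in range(len(L))             # Step 5: L \ S_i
                     if j not in idx and abs(residual(L[j], est)) <= tol]
        if len(consensus) > t:                               # Step 6
            return multilaterate(consensus + sample)         # Steps 7-8
        if len(consensus) > best_size:                       # remember for Step 10
            best_size, best_est = len(consensus), multilaterate(consensus + sample)
    return best_est                                          # Step 10 (None = failure)
```

As a usage example, ten beacons scattered over a 200 km square with exact ranges, two of which are inflated by 5 km, are handled with `tol=1.0`, `t=4` and `i_max=200`: any outlier-free sample yields five consistent remaining beacons, and the refit recovers the true position.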
In light of the new algorithm’s minimalist methodology,
the approach first estimates the position of the unknown
node sˆ0 from some randomly selected subset of 3 nodes,
Si (Steps 3 − 4). To find this position, we use the MMSE
approximation algorithm described by Savvides et al. [10].
Next, the algorithm verifies whether this estimate $\hat s_0$ is consistent
with enough data points, or equivalently whether the size of the
consensus set (parameter K) is large enough, i.e., larger
than the given threshold t (Steps 5–6). As mentioned earlier
in Section V-B, consistency is computed with respect to the
measure δ. Two methods for determining the threshold t, which
determines the size of the consensus set, are presented in Section
VI-C.
Ideally, one would like to test all $\binom{N}{3}$ possible subsets of size
3 and select the one with the largest consensus set.
However, for large L this is prohibitively expensive for real-time use.
Instead, we perform at most imax trials, where imax
is a predetermined total number of trials. In Section
VI-A we demonstrate how to choose imax such that the
algorithm finds a consensus set with high probability.
Once a consensus set has been identified, the algorithm uses all points in it to form the final estimate of ŝ0 and then terminates (Steps 7–8). We use an MMSE procedure both for computing the initial estimate ŝ0 from the subset Si and for the final estimate derived from the consensus set. If the algorithm performs imax iterations without finding a consensus set larger than t, it either declares a failure or outputs the MMSE estimate obtained from the largest consensus set found (Step 10).
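The following minimal Python sketch of Algorithm 1 illustrates the steps above. It is not the authors' implementation. The MMSE step of Savvides et al. [10] is replaced here by an ordinary linearized least-squares multilateration, obtained by subtracting the first range equation from the others. The names `lls_position`, `delta` and `consistent_position` are hypothetical. Each reference is assumed to be a tuple (xn, yn, dn), and `ci` is the half-width of the δ-consistency interval (e.g., 1.96σ).

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Reference = Tuple[float, float, float]  # (x_n, y_n, d_n): reference position and measured range


def lls_position(refs: Sequence[Reference]) -> Optional[Point]:
    """Linearized least-squares multilateration (a stand-in for the MMSE step).

    Subtracting the first range equation from the others gives A p = b in
    p = (x, y); the 2x2 normal equations are solved in closed form.
    Returns None when the references are collinear (position undetermined).
    """
    x1, y1, d1 = refs[0]
    ata = [[0.0, 0.0], [0.0, 0.0]]
    atb = [0.0, 0.0]
    for xn, yn, dn in refs[1:]:
        row = (2.0 * (xn - x1), 2.0 * (yn - y1))
        rhs = d1**2 - dn**2 + xn**2 - x1**2 + yn**2 - y1**2
        for r in range(2):
            atb[r] += row[r] * rhs
            for c in range(2):
                ata[r][c] += row[r] * row[c]
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    if abs(det) < 1e-9:
        return None
    x = (atb[0] * ata[1][1] - ata[0][1] * atb[1]) / det
    y = (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det
    return (x, y)


def delta(ref: Reference, p: Point) -> float:
    """delta(s_n, s_0) = d_n minus the distance from p to reference n."""
    xn, yn, dn = ref
    return dn - math.hypot(p[0] - xn, p[1] - yn)


def consistent_position(refs: List[Reference], ci: float, t: int, i_max: int,
                        rng: random.Random) -> Tuple[Optional[Point], bool]:
    """Sketch of Algorithm 1: returns (estimate, True) on success,
    otherwise (estimate from the largest consensus set found, False)."""
    best_est: Optional[Point] = None
    best_set: List[Reference] = []
    for _ in range(i_max):                                    # Steps 2 and 9
        idx = rng.sample(range(len(refs)), 3)                 # Step 3
        est = lls_position([refs[k] for k in idx])           # Step 4
        if est is None:
            continue
        consensus = [r for k, r in enumerate(refs)            # Step 5: points of L \ S_i
                     if k not in idx and abs(delta(r, est)) <= ci]
        if best_est is None or len(consensus) > len(best_set):
            best_est, best_set = est, consensus
        if len(consensus) > t:                                # Step 6
            return (lls_position(consensus) or est), True    # Steps 7-8
    if len(best_set) >= 3:                                    # Step 10: refit largest set
        best_est = lls_position(best_set) or best_est
    return best_est, False
```

As a hypothetical usage example, take eight benign references with exact ranges to (3, 4) and two colluding references that each add 30 units to their range. An outlier-free, non-collinear triple is drawn after a few trials, and the refit on its consensus set recovers (3, 4).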
A. Choice of Total Number of Iterations
The number of iterations imax of Algorithm 1 depends on the fraction of outliers. Intuitively, the algorithm must, on average, draw E[i] random subsets of the data set L before it finds a good (outlier-free) subset of size 3. Let Na denote the number of outliers in the data set L, and let q be the probability that a randomly drawn data point is consistent with the model. The expected number of trials is then $E[i] = 1/q^3$.
One way of computing imax is to exceed E[i] by two standard deviations. It is shown in [18] that the standard deviation of the number of trials i is approximately equal to E[i]. Therefore, we can choose imax ≈ 3E[i]. Another approach for choosing imax is to ensure that the probability of missing a good subset is below a threshold η. This implies that the total number of iterations imax must satisfy $(1-q^3)^{i_{max}} = \eta$, or equivalently,
$$i_{max} = \frac{\ln \eta}{\ln(1-q^3)} \qquad (5)$$
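Let I denote the number of inliers, so that ρ = 1 − Na/N is the fraction of inliers and q = I/N = ρ. The exact probability that a randomly drawn subset of size 3 contains only inliers is $\binom{I}{3}/\binom{N}{3} = \prod_{j=0}^{2} \frac{I-j}{N-j}$, which for a large data set is approximately $q^3 = \rho^3$. Hence $E[i] \approx \rho^{-3}$. Substituting for q in (5) gives the number of iterations imax in terms of the fraction of outliers Na/N in the data set L: $i_{max} = \frac{\ln \eta}{\ln(1-\rho^3)}$.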
Note that if an estimate of the number of attackers Na is available, (5) gives the maximum number of subsets Algorithm 1 must try before it stops searching for outlier-free subsets of size 3.
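The short Python sketch below evaluates the two ways of choosing imax. The function names are illustrative only. `p_good` stands for the probability that a drawn triple is outlier-free. It is either the large-data-set approximation ρ³ or the exact hypergeometric product. Rounding imax up to an integer is an assumption of this sketch.

```python
import math


def expected_trials(q: float) -> float:
    """E[i] = 1 / q**3, with q the probability that a single point is an inlier."""
    return 1.0 / q**3


def exact_good_triple_probability(n: int, n_a: int) -> float:
    """C(I, 3) / C(N, 3) = prod_{j=0}^{2} (I - j) / (N - j), with I = N - N_a inliers."""
    inliers = n - n_a
    p = 1.0
    for j in range(3):
        p *= (inliers - j) / (n - j)
    return p


def iterations_for_miss_probability(p_good: float, eta: float) -> int:
    """Smallest integer i_max with (1 - p_good)**i_max <= eta, cf. Eq. (5)."""
    return math.ceil(math.log(eta) / math.log(1.0 - p_good))
```

Consider ρ = 0.8 (20% outliers). Here E[i] ≈ 1.95, so the 3E[i] rule gives about 6 trials. For η = 0.01, Eq. (5) with ρ³ gives imax = 7. For a small set with N = 10 and Na = 2, the exact triple probability is 7/15 instead of 0.512, and the same η then requires 8 trials.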
B. Determining δ-Consistency Interval
Our assumption is that non-malicious (benign) distance measurement errors are i.i.d. Gaussian random variables distributed according to $\mathcal{N}(0, \sigma^2)$. The consistency metric δ measures the difference between the measured distance dn and the Euclidean distance from the real position of s0, (x0, y0), to the position reference (xn, yn), i.e., $\delta(s_n, s_0) = d_n - \sqrt{(x_0 - x_n)^2 + (y_0 - y_n)^2}$.
The assumption of a Normal distribution for the errors is based on s0's real position (x0, y0). We apply the same distribution to approximate the distribution of δ when it is evaluated at the estimated position coordinates ŝ0 = (x̂0, ŷ0). For example, under the assumption of Normality, the 95% confidence interval (CI) for a value δ′n drawn from the Normal distribution $\mathcal{N}(0, \sigma^2)$ is [−1.96σ, 1.96σ]. In other words, given a realization δ′n of the random variable δn, the confidence interval CI determines whether δ′n is consistent, at the 95% confidence level, with being drawn from the same distribution as δn. We refer to this confidence interval as the δ-consistency interval CI, which is an input to the Position Estimation Algorithm 1. Note that the error variance σ² usually depends on the distance measurement technique (e.g., RSSI, TDoA) and on the environment in which the aircraft fly. Therefore, the error variance can be estimated from a set of offline measurements.
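The following Python sketch shows the consistency test described above. It assumes zero-mean offline ranging errors, so σ is estimated as their root mean square. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Z_95 = 1.96  # two-sided 95% quantile of the standard Normal distribution


def delta(d_n: float, ref: Tuple[float, float], est: Tuple[float, float]) -> float:
    """delta(s_n, s_0) = d_n - Euclidean distance between the estimate and reference n."""
    return d_n - math.hypot(est[0] - ref[0], est[1] - ref[1])


def in_consistency_interval(delta_value: float, sigma: float, z: float = Z_95) -> bool:
    """True if delta lies in the delta-consistency interval CI = [-z*sigma, z*sigma]."""
    return -z * sigma <= delta_value <= z * sigma


def rms_sigma(offline_errors: Sequence[float]) -> float:
    """Estimate sigma from offline ranging errors, assuming they are zero-mean."""
    return math.sqrt(sum(e * e for e in offline_errors) / len(offline_errors))
```

Suppose a reference at (0, 0) reports 5.5 for an estimate at (3, 4). The residual δ is then 0.5, which lies inside CI for σ = 1. A report of 8.0 gives δ = 3.0, which is rejected for σ = 1 but accepted for σ = 2, because the interval then extends to ±3.92.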
C. Consensus with Respect to t
The randomized paradigm presented in Algorithm 1 assumes that t, the estimate of the size of the consensus set, is given. If the number of attackers Na is known a priori, one could set the size of the consensus set equal to N − Na. However, in many practical scenarios one would need to estimate t. In [15], we offer two methods for estimating t. One is to employ the threshold selection strategy proposed by Liu et al. [14]. The second is based on a dynamic search to determine the number of outliers Na.
REFERENCES
[1] Y. Cui and S. S. Ge, “Autonomous Vehicle Positioning With GPS
in Urban Canyon Environments,” IEEE Transactions on Robotics and
Automation, vol. 19, pp. 15–28, 2003.
[2] J. Dittmer, “Solving the GPS Urban Canyon Problem,” Frost & Sullivan
Market Insight, 2005.
[3] A. P. A. Alcocer, P. Oliveira, “Underwater Acoustic Positioning Systems
Based on Buoys with GPS,” Proceedings of the Eighth European
Conference on Underwater Acoustics, 2006.
[4] G. Loveness and R. Barhydt, “ADS-B and AOP Performance within a
Multi-Aircraft Simulation for Distributed Air-Ground Traffic Manage-
ment.”
[5] J. Scardina, “Overview of FAA ADS-B Link Decision,” 2002.
[6] E. Valovage, “Enhanced ADS-B Research,” IEEE A&E Systems Maga-
zine, pp. 35–39, 2006.
[7] J. L. Awange and E. W. Grafarend, “Algebraic Solution of GPS Pseudo
Ranging,” GPS Solutions, vol. 5, pp. 20–32, 2002.
[8] R. Pellizzoni and M. Caccamo, “Real-time management of hardware and
software tasks for FPGA-based embedded systems,” IEEE Transactions on
Computers (TC), vol. 56, no. 12, pp. 1666–1680, Dec. 2007.
[9] N. Priyantha, A. Chakraborty, and H. Balakrishnan, “The Cricket
Location-Support System,” in Proceedings of ACM International Con-
ference on Mobile Computing and Networking (MobiCom), 2000, pp.
32–43.
[10] A. Savvides, C. Han, and M. Srivastava, “Dynamic fine-grained localization in ad-hoc networks of sensors,” in Proceedings of ACM International Conference on Mobile Computing and Networking (MobiCom), 2001, pp. 166–179.
[11] D. Niculescu and B. Nath, “Ad hoc positioning system (APS) using
AoA,” in Proceedings of IEEE Conference on Computer Communica-
tions (INFOCOM), 2003, pp. 1734 – 1743.
[12] Z. Li, W. Trappe, Y. Zhang, and B. Nath, “Robust statistical methods for securing wireless localization in sensor networks,” in Proceedings of the International Symposium on Information Processing in Sensor Networks (IPSN), 2005, pp. 91–98.
[13] P. Rousseeuw and A. Leroy, Robust Regression & Outlier Detection.
New York, NY: John Wiley & Sons, 1987.
[14] D. Liu, P. Ning, and W. Du, “Attack-resistant location estimation
in sensor networks,” in Proceedings of International Symposium on
Information Processing in Sensor Networks (IPSN), 2005, pp. 99– 106.
[15] N. Kiyavash and F. Koushanfar, “Anti-Collusion Position Estimation in
Wireless Sensor Networks,” in IEEE International Conference on Mobile
Adhoc and Sensor Systems (MASS), Pisa, Italy, 2007, pp. 1–9.
[16] T. Roosta, M. Meingast, and S. Sastry, “Distributed reputation system
for tracking applications in sensor networks,” in International Workshop
on Advances in Sensor Networks (IWASN), 2006.
[17] M. Manzo, T. Roosta, and S. Sastry, “Time synchronization attacks in
sensor networks,” in Proceedings of the ACM workshop on Security of
ad hoc and sensor networks (SASN), 2005, pp. 107–116.
[18] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm
for model fitting with applications to image analysis and automated
cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395,
1981.
Towards Server-based Switched Ethernet for Real-Time Communications∗
Ricardo Marau, Luís Almeida, Paulo Pedreiras
DETI / IEETA
Universidade de Aveiro, PORTUGAL
email: {marau,lda,pbrp}@ua.pt
Thomas Nolte
MRTC
Mälardalen University, SWEDEN
email: thomas.nolte@mdh.se
Abstract
This paper presents work-in-progress on server-based
Switched Ethernet (SE) for real-time communications. It
combines the recent FTT-SE protocol [7] with concepts from the
Server-CAN protocol [9] to allow handling aperiodic mes-
sage streams with arbitrary arrival patterns while support-
ing the derivation of timeliness guarantees. The presented
approach enables an efficient implementation of arbitrary
server schedulers in the switch, as well as their hierarchi-
cal composition. Moreover, the presented approach is very
suitable for open systems as servers can easily be added,
changed and removed during runtime of the switch. Cur-
rently, several server-based policies are being implemented,
which will allow carrying out comparisons among different
policies as well as verifying the capability of the protocol
for on-line integrated management of the servers.
1 Introduction
Ethernet has established itself as one of the most important networking technologies for systems with extra-functional requirements on timing, and since the introduction of Switched Ethernet (SE) a number of approaches guaranteeing real-time communications have been proposed. This paper presents a new approach for SE systems that (1) are free of queue overflows, (2) support advanced traffic scheduling policies and (3) can provide real-time guarantees even in the presence of aperiodic communication and/or faults in the time domain (e.g., babbling idiots).
Over the years, real-time messages with different arrival patterns, such as periodic and aperiodic, have commonly been handled in different ways. Instead, this paper proposes an approach where all types of messages are scheduled the same way, i.e., there is no distinction between periodic and aperiodic messages. The core of the approach is a Master/Slave architecture that enforces full control over streams of messages, regardless of their arrival patterns.
∗This work was partially supported by the European Commission through grant ArtistDesign ICT-NoE-214373, the Portuguese Government through grant FCT - SFRH/BD/25261/2005 and the Swedish Foundation for Strategic Research (SSF), via the research programme PROGRESS.
Centralized scheduling enables a flexible treatment of any kind of message, a property especially suited for systems whose communication requirements change dynamically. Compared to non-centralized approaches, the Master can easily implement admission control as well as advanced Quality-of-Service (QoS) management mechanisms. This flexibility makes the approach particularly suitable for open environments where servers might join, leave and be modified during runtime.
The paper starts by overviewing related work on real-time Ethernet as well as some server policies for CPU and network scheduling. Then, it proposes using a Master/Slave protocol over SE to manage all servers locally in the Master, facilitating their on-line creation, deletion, adaptation and composition. The paper advocates that such centralized management of the servers provides the required support for open distributed real-time systems as well as for dynamic QoS management. Moreover, this approach allows any CPU-oriented server-based scheduling policy to be implemented on a network, possibly with hierarchical composition, increasing the flexibility of the system. The signaling required for the Slaves to notify the Master about their requests is also discussed. Efficient signaling is provided by the FTT-SE protocol, which takes advantage of the full-duplex feature of common Ethernet switches. It is then possible to use FTT-SE to handle general message streams with any arrival pattern and still provide timing guarantees.
The paper is organized as follows: Section 2 presents
related work; Section 3 revisits the basics of FTT-SE; Sec-
tion 4 presents the core of our proposal including prelim-
inary experiments; and finally, Section 5 summarises the
paper and addresses the current work-in-progress.
2 Related work
2.1 Real-time communications with SE
This section briefly reviews some of the most relevant
techniques to enforce a real-time behaviour on Switched
Ethernet (SE); a deeper discussion can be found in [10].
One approach consists of enhancing the switch with traffic control and scheduling capabilities. For example, Hoang et al. [5] propose the inclusion of EDF traffic scheduling and on-line admission control inside a switch. The EtheReal protocol [16] presents a similar architecture, also based on a modified Ethernet switch. PROFINET Isochronous Real Time (IRT), a new PROFINET real-time profile [12], employs a distributed cyclic time-slotting scheme encompassing a deterministic time-triggered phase and an asynchronous phase for non-real-time traffic. Another class of techniques consists of using a traffic shaper in each node to limit the burstiness and amount of the load submitted to the network and to prevent memory overflows, e.g., as proposed in [6]. Examples of master/slave techniques include the EtherCAT protocol [12], which uses specialized switches and an open-ring topology. Another example is the ETHERNET Powerlink protocol (EPL) [1], where a master node explicitly triggers each transaction according to a table schedule. Finally, standard switched Ethernet infrastructures, relying on plain COTS switches, network interface cards (NIC) and IP stacks, can also be used, e.g., as in Ethernet/IP [2]. Avoiding overloads and achieving timely behaviour in this case requires careful analysis by the system designer, since there are no run-time mechanisms to enforce it. The use of traffic shapers [6] is the approach most closely related to the one proposed in this paper. Shapers are, in effect, servers that bound the maximum amount of traffic that a node can send to the network. In this paper we propose a centralized management of the servers, which efficiently supports their prompt reconfiguration, as needed in dynamic systems and particularly for dynamic QoS management.
2.2 Server-based CPU scheduling
In the real-time scheduling literature, many types of server-based schedulers have been presented for Fixed Priority Systems (FPS) and Dynamic Priority Systems (DPS). These schedulers are characterised partly by the mechanism for assigning deadlines, and partly by a set of parameters used to configure the servers, e.g., bandwidth, period and capacity. The Polling Server (PS) [13] is one of the simplest FPS servers. A PS allocates a share of a resource to the users of the server, and this share is defined by the server period and capacity. The Deferrable Server (DS) [15] is more responsive than the PS. In general, the DS gives better response times than the PS, but it has a lower schedulability bound. By changing the way capacity is replenished, the Sporadic Server (SS) [13] provides a server-based scheduler for FPS systems that allows high schedulability without sacrificing much responsiveness.
Examples of Earliest Deadline First (EDF) based DPS servers include the Dynamic Sporadic Server (DSS) [14]. The Total Bandwidth Server (TBS) [14] is very simple to implement and provides faster response times than the SS. The TBS ensures that the server never uses more bandwidth than allocated to it, while still providing fast response times to its users (under the assumption that the users do not consume more capacity than they have specified). When the users' desired usage is unknown, the Constant Bandwidth Server (CBS) [3] can be used, guaranteeing that the server users will never use more than the server capacity.
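The Python sketch below illustrates why the TBS is simple to implement. It shows the deadline-assignment rule from [14]: the k-th request, arriving at r_k with cost C_k, receives the deadline d_k = max(r_k, d_{k-1}) + C_k/U_s, where U_s is the server bandwidth and d_0 = 0. The requests are then scheduled by EDF. Reading C_k as a message transmission time, as in the network setting of this paper, is an assumption of the sketch, and the function name is illustrative.

```python
from typing import Iterable, List, Tuple


def tbs_deadlines(requests: Iterable[Tuple[float, float]], u_s: float) -> List[float]:
    """Total Bandwidth Server deadlines for (arrival r_k, cost C_k) pairs in arrival order."""
    d_prev = 0.0
    deadlines = []
    for r_k, c_k in requests:
        d_k = max(r_k, d_prev) + c_k / u_s  # never exceeds bandwidth u_s on average
        deadlines.append(d_k)
        d_prev = d_k
    return deadlines
```

With U_s = 0.25 and requests (0, 1), (1, 1) and (10, 2), the deadlines are 4, 8 and 18. The second request inherits the previous deadline, and this is how the TBS bounds the bandwidth the server consumes.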
2.3 Server-based traffic scheduling
In the network domain, probably for historical reasons, servers are given different names. For example, a common server used in networking is the leaky bucket. This is a specific kind of a general server category called traffic shapers, whose purpose is to limit the amount of traffic that a node can submit to the network within a given time window, bounding the node's burstiness. These servers use a technique similar to those described in the previous section, based on a capacity that is eventually replenished. Many different replenishment policies are possible, the most common being periodic replenishment as with the PS or DS. However, it is hard to categorize these network servers in the same way as the CPU servers referred to in the previous section, because networks seldom use clearly fixed or dynamic priority traffic management schemes. For example, there is a large variety of Medium Access Control (MAC) protocols, some of them mixing different schemes such as round-robin scheduling with fixed priorities, first-come-first-served, first-come-first-served with multiple priority queues, etc.
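The leaky bucket mentioned above can be illustrated with its common token-bucket formulation. The Python sketch below is such an illustration, not a shaper from any cited protocol. Tokens accumulate at `rate` per time unit up to a burst limit `depth`, and a frame of `size` bytes may be sent only if enough tokens are available. Over any window of length Δ, a node can therefore send at most depth + rate·Δ bytes. The class and parameter names are hypothetical.

```python
class TokenBucket:
    """Token-bucket traffic shaper: bounds traffic to depth + rate * window."""

    def __init__(self, rate: float, depth: float):
        self.rate = rate      # tokens (bytes) replenished per time unit
        self.depth = depth    # bucket size: maximum burst in bytes
        self.tokens = depth   # start with a full bucket
        self.last = 0.0       # time of the last update

    def allow(self, now: float, size: float) -> bool:
        """Replenish for the elapsed time, then admit the frame if tokens suffice."""
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

With rate = 100 and depth = 300, two 200-byte frames at time 0 are shaped so that only the first passes. The second must wait until at least 200 tokens have accumulated again.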
In this paper we advocate that, using an adequate proto-
col, such as one based on the FTT paradigm [4, 7, 11], it
is possible to control the traffic in a way that allows imple-
menting any of the CPU-oriented server-based scheduling
techniques, which are probably better studied from the tim-
ing behaviour point-of-view.
3 FTT-SE brief overview
FTT-SE [11] is an RT communication protocol that exploits the FTT and the Master/Multi-slave paradigms and the advantages brought by Ethernet micro-segmentation, such as parallel forwarding and absence of collisions. The Master token, called the Trigger Message (TM), is transmitted periodically, creating Elementary Cycles (EC) and polling the Slaves to transmit messages in two consecutive time windows designated synchronous and asynchronous. The former is associated with the transmission of the synchronous (periodic) traffic, which is triggered by the Master node. The asynchronous (aperiodic) traffic is triggered by the nodes and transmitted in the asynchronous window, and it is further divided into RT and NRT traffic. The RT class has a minimum inter-transmission time (Tmit) associated with it, which is enforced by the Master, i.e., consecutive polls are at least that interval apart. This traffic is also polled on a per-message basis. The NRT traffic is handled when there is no RT traffic pending, and it can be handled on a per-message, per-node or per-type basis. In order to handle aperiodic requests, the FTT-SE protocol uses an efficient signaling mechanism, explained in [8], that exploits the full-duplex feature of most common switches. All asynchronous messages (AM) generated in a node during one EC are intercepted and encoded in a signaling message that is sent to the Master during the guarding and turnaround windows (Fig. 1). This mechanism ensures that the signaling messages do not interfere with the operation of the protocol, i.e., with the TM or the polled messages.
Figure 1. Full-duplex signaling mechanism
The Master receives the requests and inserts them in the appropriate scheduling queues. Then, the scheduler decides in which EC such traffic will be polled (Fig. 2). The signaling latency (Lsig) can take between one and two ECs, while the polling latency (Lpol) always takes less than one EC. The scheduling latency (Lsch), not shown in Fig. 2, varies according to the scheduling policy used, the attributes of the specific message being scheduled, and the current load.
Figure 2. Handling aperiodic traffic in FTT-SE
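The stated bounds can be combined into a simple envelope for the latency of an aperiodic message. This assumes, as Fig. 2 suggests, that the latency L from request generation to transmission is the sum of the three components. The derivation below is a sketch under that assumption, not a published analysis.

```latex
L = L_{sig} + L_{sch} + L_{pol}, \qquad
1\,\mathrm{EC} \le L_{sig} \le 2\,\mathrm{EC}, \qquad
0 \le L_{pol} < 1\,\mathrm{EC}
\;\Longrightarrow\;
1\,\mathrm{EC} + L_{sch} \;\le\; L \;<\; 3\,\mathrm{EC} + L_{sch}
```

With the 1 ms EC used in Section 4.1, the signaling and polling therefore add less than 3 ms on top of whatever scheduling latency the chosen server policy introduces.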
4 A proposal for a Server-SE protocol
This paper proposes a protocol that is capable of handling general message streams in SE with any arrival pattern and still providing timing guarantees by making use of server-scheduling techniques. For this purpose, we propose using FTT-SE configured without a synchronous window, i.e., with asynchronous traffic only. Then, a set of servers is implemented in the Master to handle the requests generated by the nodes, which arrive at the Master by means of the FTT-SE asynchronous signaling mechanism.
At run-time, all nodes must negotiate with the Master the creation of adequate servers to handle specific types of traffic. This negotiation is carried out before the actual communication, also using signaling messages, in which the node states its communication requirements. The Master answers in the following TM, in which it piggybacks the appropriate information, e.g., whether the server was actually created or not. Typically, the communication requirements would be expressed in terms of admissible ranges corresponding to different levels of admissible QoS. These ranges can be used by the Master to dynamically manage the QoS of the servers already running, in order to try to accommodate the new request, if needed.
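The negotiation above leaves the Master's decision rule open, so the Python sketch below is purely illustrative. It assumes each request lists its admissible bandwidth fractions (QoS levels, best first). The Master admits a new server only if all servers still fit within a hypothetical bandwidth bound `u_max` at their lowest levels, and then hands out spare bandwidth greedily in server order. The names `ServerRequest`, `negotiate` and `u_max` are invented for this sketch, and the paper does not prescribe this policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ServerRequest:
    name: str
    levels: List[float]  # admissible bandwidth fractions, best QoS first


def negotiate(servers: Dict[str, ServerRequest], new: ServerRequest,
              u_max: float) -> Optional[Dict[str, float]]:
    """Return the granted bandwidth per server (including `new`), or None if rejected."""
    candidates = dict(servers)
    candidates[new.name] = new
    # Admission test: even at their lowest admissible QoS, all servers must fit.
    if sum(min(s.levels) for s in candidates.values()) > u_max:
        return None
    granted = {n: min(s.levels) for n, s in candidates.items()}
    # Greedily raise each server to the best level the remaining bandwidth allows.
    for n, s in candidates.items():
        spare = u_max - sum(granted.values())
        for level in sorted(s.levels, reverse=True):
            if level - granted[n] <= spare + 1e-12:
                granted[n] = max(granted[n], level)
                break
    return granted
```

Because of the greedy loop, servers that were created earlier get upgraded first. A real Master might instead use a weighted or optimization-based QoS redistribution.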
This mechanism is not transparent to the nodes. Thus, for legacy applications, a small program to carry out this negotiation must be executed before the respective application. Alternatively, it is possible to automatically create a background server per node once it is connected, treating its traffic as broadcast, dividing the available bandwidth among all nodes, and scheduling its traffic on a per-node basis. This background server does not provide minimum QoS guarantees, since it uses the bandwidth left available by other, higher-priority servers, but it grants any node some immediate communication capabilities without the need for specific server negotiations. Specific servers can then be created for specific traffic later on, if needed.
Figure 3. Internals of the Server-SE Master
4.1 Preliminary experiments
In order to validate the feasibility of the proposed approach, we configured an FTT-SE system with an EC of 1 ms over a 100 Mbps switch. Two slaves were used. We then created 3 aperiodic streams with the following properties. AM1 and AM2 were sent by Slave 1, with 2 kB and a minimum inter-arrival time (mit) of 6 and 10 ECs, respectively. AM3 was sent by Slave 2, with 1 kB and a mit of 5 ECs. The first two messages were automatically fragmented into 2 packets per instance. Three sporadic servers were set up, one for each message, each with capacity for only one instance and a period equal to its mit. Moreover, all three sporadic servers were executed within a polling server with a capacity of 240 µs per EC. This server does not allow scheduling more than 2 packets of these messages per EC. The message requests were generated by a program in the Slaves that ran every EC and produced the messages with a given probability, so that the average rate was slightly below the capacity of the sporadic servers. Figure 4 shows the results for message AM3, with time measured in number of ECs. The top histogram shows the inter-arrival times of the requests (signaling messages arriving at the Master). The lower histogram shows the inter-arrival times of message AM3 as scheduled by the Master (and transmitted by the respective Slave). The measurements were actually carried out within the Master, as the difference between consecutive request arrivals and the difference between consecutive scheduled transmissions. It is clear that, despite the frequent bursts of consecutive requests, the transmissions respect the mit corresponding to the respective sporadic server period. The histograms of the other two messages show similar behavior.
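The hierarchical setup of this experiment can be sketched in a few lines of Python. It is a simplified model, not the Master's code. Each sporadic server counts capacity in whole message instances and returns consumed capacity one period after it was used. The enclosing polling server grants a fixed packet budget per EC, and signaling latency is ignored. `packets_per_ec` assumes 38 bytes of per-frame overhead (header, FCS, preamble/SFD and inter-frame gap) and takes a 1 kB fragment as a 1000-byte payload, both assumptions of the sketch. All names are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Tuple


def packets_per_ec(budget_us: float, payload_bytes: int, link_mbps: float = 100.0,
                   overhead_bytes: int = 38) -> int:
    """Whole frames that fit in the polling-server budget of one EC."""
    frame_us = (payload_bytes + overhead_bytes) * 8 / link_mbps  # bits / (Mbit/s) = us
    return math.floor(budget_us / frame_us)


class SporadicServer:
    """Simplified sporadic server whose capacity is counted in message instances."""

    def __init__(self, name: str, period: int, packets: int, capacity: int = 1):
        self.name, self.period, self.packets = name, period, packets
        self.capacity = capacity
        self.replenish: deque = deque()  # ECs at which consumed capacity returns
        self.queue: deque = deque()      # arrival ECs of pending requests

    def refill(self, ec: int) -> None:
        while self.replenish and self.replenish[0] <= ec:
            self.replenish.popleft()
            self.capacity += 1


def schedule(servers: List[SporadicServer], arrivals: Dict[Tuple[str, int], int],
             n_ecs: int, budget_packets: int) -> Dict[str, List[int]]:
    """Master-side scheduling, one EC at a time; returns the ECs in which each stream is polled."""
    polled: Dict[str, List[int]] = {s.name: [] for s in servers}
    for ec in range(n_ecs):
        budget = budget_packets  # polling server: fixed packet budget per EC
        for s in servers:        # list order = fixed priority among the sporadic servers
            s.queue.extend([ec] * arrivals.get((s.name, ec), 0))
            s.refill(ec)
            if s.queue and s.capacity > 0 and s.packets <= budget:
                s.queue.popleft()
                s.capacity -= 1
                s.replenish.append(ec + s.period)
                budget -= s.packets
                polled[s.name].append(ec)
    return polled
```

Under these assumptions, a 1000-byte fragment takes about 83 µs at 100 Mbps, so the 240 µs budget admits 2 packets per EC, consistent with the text. A burst of four AM3 requests in consecutive ECs is then polled at ECs 0, 5, 10 and 15, so transmissions respect the mit.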
Figure 4. Histogram of inter-arrival times of message AM3
5 Summary and work-in-progress
This paper laid the ground towards a new real-time com-
munications protocol for Switched Ethernet (SE) that is ca-
pable of handling aperiodic message streams with arbitrary
arrival patterns while still providing real-time guarantees.
This grants the protocol a high level of flexibility as well as high robustness, since it ensures that no overloads occur within the switch and provides an adequate level of temporal isolation among the streams. Moreover, the protocol supports the dynamic creation, deletion and adaptation of the servers that handle the message streams, which can be done expeditiously, given their centralization in a Master node.
We are currently testing the traffic isolation capabili-
ties of the proposed protocol and evaluating different server
policies as well as developing the corresponding timing
analysis.
References
[1] Ethernet Powerlink protocol home page. http://www.ethernet-powerlink.org/.
[2] Ethernet/IP (Industrial Protocol) specifications. http://www.odva.org.
[3] L. Abeni and G. Buttazzo. Integrating multimedia applications
in hard real-time systems. IEEE Real-Time Systems Symposium
(RTSS’98), pages 4–13, Madrid, Spain, December 1998. IEEE Com-
puter Society.
[4] L. Almeida, P. Pedreiras, and J. A. Fonseca. The FTT-CAN protocol:
Why and how. IEEE Trans. Industrial Electronics, 49(6):1189–1201,
December 2002.
[5] H. Hoang, M. Jonsson, U. Hagstrom, and A. Kallerdahl. Switched
real-time Ethernet with earliest deadline first scheduling: protocols
and traffic handling. IEEE Parallel & Distributed Processing Sym-
posium (IPDPS’02), pages 94–99, Fort Lauderdale, FL, USA, April
2002. IEEE Computer Society.
[6] J. Loeser and H. Haertig. Using Switched Ethernet for Hard Real-
Time Communication. Parallel Computing in Electrical Engineering,
Int. Conf. on (PARELEC’04), pages 349–353, Dresden, Germany,
Sept. 2004.
[7] R. Marau, L. Almeida, and P. Pedreiras. Enhancing real-time commu-
nication over COTS Ethernet switches. WFCS’06: IEEE Workshop
on Factory Communication Systems, pages 295–302, 27 June 2006.
[8] R. Marau, P. Pedreiras, and L. Almeida. Asynchronous Traffic Sig-
naling over Master-Slave Switched Ethernet Protocols. RTN07: 6th
Int. Workshop on Real-Time Networks, 3 July 2007.
[9] T. Nolte. Share-Driven Scheduling of Embedded Networks. PhD thesis, Department of Computer Science and Electronics, Mälardalen University, Sweden, May 2006.
[10] P. Pedreiras and L. Almeida. Approaches to enforce real-time be-
haviour in ethernet. In R. Zurawski, ed., The Industrial Communica-
tion Technology Handbook. CRC Press, Taylor & Francis, 2005.
[11] P. Pedreiras, P. Gai, L. Almeida, and G. C. Buttazzo. FTT-Ethernet:
a flexible real-time communication protocol that supports dynamic
QoS management on Ethernet-based systems. IEEE Trans. Industrial
Informatics, 1(3):162–172, Aug. 2005.
[12] Real-Time PROFINET IRT. http://us.profibus.com/profinet/07.
[13] B. Sprunt, L. Sha, and J. P. Lehoczky. Aperiodic task scheduling for
hard real-time systems. Real-Time Systems, 1(1):27–60, June 1989.
[14] M. Spuri and G. C. Buttazzo. Efficient aperiodic service under
earliest deadline scheduling. IEEE Real-Time Systems Symposium
(RTSS’94), pages 2–11, San Juan, Puerto Rico, Dec 1994.
[15] J. K. Strosnider, J. P. Lehoczky, and L. Sha. The deferrable server
algorithm for enhanced aperiodic responsiveness in hard real-time en-
vironments. IEEE Trans. Computers, 44(1):73–91, January 1995.
[16] S. Varadarajan and T. Chiueh. EtheReal: A Host-Transparent Real-
Time Fast Ethernet Switch. 6th Int Conference on Network Protocols,
pages 12–21, Oct. 1998.
ITEM - Implementation of Integrated TDMA and E-ASAP Module
Inderjit Singh
CTU Faculty of Electrical Engineering
Department of Control Engineering
Karlovo nám. 13, Prague 2, Czech Republic
inderjit.arora@gmail.com
Jiří Trdlička
CTU Faculty of Electrical Engineering
Department of Control Engineering
Karlovo nám. 13, Prague 2, Czech Republic
trdlij1@fel.cvut.cz
Zdeněk Hanzálek
CTU Faculty of Electrical Engineering
Department of Control Engineering
Karlovo nám. 13, Prague 2, Czech Republic
hanzalek@fel.cvut.cz
Abstract
We present a new implementation of a TDMA communication protocol for wireless sensor networks. ITEM (Integrated TDMA E-ASAP Module) is a TinyOS 1.x module primarily intended for the TelosB platform. It provides a TDMA (Time Division Multiple Access) communication protocol integrated with the slot assignment protocol E-ASAP (Extended - Adaptive Slot Assignment Protocol). The ITEM module handles collision-free multi-hop communication, the hidden-node problem and changes in the network structure. To achieve good data throughput, E-ASAP sets the TDMA period individually for each node depending on the actual network structure. The current version of the ITEM module is implemented for the TelosB and TmoteSky platforms.
1. Introduction
Wireless Sensor Networks (WSN) are a rapidly developing technology. There are several different platforms and operating systems. One of the most popular operating systems today is TinyOS [1], for which many applications and modules have been implemented. However, to our knowledge, there is no TDMA implementation in TinyOS suitable for the TelosB platform [2].
Whether TDMA communication is better than CSMA depends on the particular application. However, TDMA eliminates data collisions during communication, thereby decreasing the communication energy consumption and keeping the data throughput constant, independent of the medium load. At the same time, collision-free communication ensures a constant one-hop communication delay between two nodes, which is very important for real-time communication. The real-time constraint was our main motivation for implementing the TDMA module. Moreover, since the TDMA timing is known, the nodes of the WSN can switch to a sleep mode and save much more energy.
An important part of the TDMA mechanism is the slot assignment protocol, which assigns slots to the nodes so that no node interferes with the communication of other nodes. For this task, we have adopted E-ASAP (Extended - Adaptive Slot Assignment Protocol), published in [6]. E-ASAP is an on-line distributed protocol that handles the hidden-node problem and sets the TDMA period individually for each node to achieve better data throughput. The protocol is able to dynamically adapt the slot assignment to the actual network structure. This dynamic behavior of E-ASAP is especially important, because the structure of the WSN changes in many applications.
Many works focus on TDMA protocols for sensor networks (e.g., see [9]). However, we were able to find only a few implementations with source code for TinyOS. Lin Gu has implemented the TDMA protocol PRIME in TinyOS 1.x for the Mica2 platform, and the implementation can be found in contrib/prime of TinyOS 1.x [3]. In [7], the authors propose a Unified Power Management Architecture (UPMA) for flexibly integrating different radio power management protocols into a complete wireless sensor network system. UPMA has been added to contrib/wustl/upma of TinyOS 2.x [4]. The open implementation of IEEE 802.15.4 with GTS allocation [8] should also be mentioned.
The paper is organized as follows. Section 2 introduces the basic ideas of the E-ASAP protocol. Section 3 presents the structure and functions of ITEM. An example of ITEM behavior is given in Section 4, and Section 5 concludes the work.
Figure 1: Example of a WSN (nodes A–J; legend: frame length 8, frame length 4, neighbors).
2. E-ASAP
In this section, we briefly introduce E-ASAP (Extended - Adaptive Slot Assignment Protocol). For a more detailed description, see [6].
The TDMA protocol divides the usage of the communication channel into several non-overlapping slots, which are repeated with some period. The number of slots in one period is called the “frame length”. E-ASAP sets the frame length of a new node (a node joining the network) and assigns a free slot to this node. If there is no free slot, E-ASAP doubles the frame length of the nodes concerned.
2.1. Frame Format
In E-ASAP, the frame length is set to a power of two, and the minimum frame length is fixed to four slots. Setting the frame length to a power of two avoids packet collisions between nodes with different frame lengths. The first slot (slot zero) is reserved for a new node to transmit a request for doubling the frame length of its neighbor nodes, if needed. An example of a TDMA frame is shown in Figure 3.
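The power-of-two rule above can be checked with a short Python sketch. It assumes that a node with slot s and frame length F transmits in every global slot t with t mod F = s. Two such patterns meet exactly when the slots agree modulo gcd of the two frame lengths, and for powers of two that gcd is simply the smaller frame length. For other lengths the shortcut fails: slot 1 of a 4-slot frame and slot 3 of a 6-slot frame both fall on global slot 9. The function names are hypothetical. The final assertions in the test rebuild the schedule of node H from the assignments listed in Figure 2.

```python
import math
from typing import Dict, List, Set, Tuple

Assignment = Tuple[int, int]  # (slot, frame length)


def slots_overlap(a: Assignment, b: Assignment) -> bool:
    """Power-of-two shortcut: patterns meet iff slots agree modulo the smaller frame length."""
    m = min(a[1], b[1])
    return a[0] % m == b[0] % m


def overlap_bruteforce(a: Assignment, b: Assignment) -> bool:
    """Check every global slot within one hyperperiod (valid for any frame lengths)."""
    horizon = a[1] * b[1] // math.gcd(a[1], b[1])
    return any(t % a[1] == a[0] and t % b[1] == b[0] for t in range(horizon))


def schedule_view(nodes: Dict[str, Assignment], frame: int) -> List[Set[str]]:
    """Nodes occupying each slot of a frame of the given length (one node's view)."""
    view: List[Set[str]] = [set() for _ in range(frame)]
    for name, (slot, length) in nodes.items():
        for t in range(frame):
            if t % length == slot:
                view[t].add(name)
    return view
```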
2.2. Data Format
Each node maintains information about the frame length and slot assignment of itself, its neighbors and its hidden nodes (nodes that cannot communicate with it directly but share a common neighbor with it). An example of the information held by node H in Figure 1 is shown in Figure 2. The slot is the number of the assigned slot, and the frame length is the number of slots in the TDMA period of the node. An example of the TDMA schedule from the point of view of node H is shown in Figure 3. Slot zero is free and reserved for a new node. The letters in the slots denote the nodes that can send data in a given slot. Two nodes are assigned to slot one. This is possible because transmissions from node I and from node C do not interfere with each other.
2.3. Packet Format
E-ASAP uses two types of packets: a data packet (DAT) and an information packet (INF).
Own (Slot(s) / Frame length): 2 / 8
Neighbors (Node: Slot(s) / Frame length): D: 6 / 8, G: 4 / 8, I: 1 / 4, J: 3 / 4
Hidden nodes (Node: Slot(s) / Frame length): C: 1 / 8, F: 5 / 8
Figure 2: Information held in node H in Figure 1.
H J G DF + II + C
(0) (1) (2) (3) (4) (5) (6) (7)
JJ
Figure 3: TDMA schedule in node H view in Figure 1.
The DAT packet contains the frame length of the sender, the current slot number, the maximum frame length among the sender and its neighbors, and the transmitted data. The INF packet contains information about the network structure in the neighborhood of the node, namely the frame lengths and the assigned slots of the sender and its neighbors. For example, in Figure 1, the INF packet of node H contains the first and the second table of Figure 2 (Own, Neighbors).
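The packet contents listed above could be laid out as C structures as follows. Only the list of fields comes from the text; the field widths, the array bounds INF_MAX_NEIGHBORS and DAT_MAX_PAYLOAD, and all names are hypothetical, and the actual ITEM message format is documented in [5].

```c
#include <assert.h>
#include <stdint.h>

#define INF_MAX_NEIGHBORS 8   /* hypothetical capacity of one INF packet */
#define DAT_MAX_PAYLOAD   16  /* hypothetical payload size in bytes */

/* Slot assignment of one node, as carried in an INF packet. */
typedef struct {
    uint16_t node_id;
    uint8_t  slot;
    uint8_t  frame_len;
} slot_info_t;

/* INF: assignment of the sender and of its neighbors
   (the "Own" and "Neighbors" tables of Figure 2). */
typedef struct {
    slot_info_t own;
    uint8_t     n_neighbors;
    slot_info_t neighbors[INF_MAX_NEIGHBORS];
} inf_packet_t;

/* DAT: sender's frame length, current slot number, maximum frame length
   among the sender and its neighbors, and the transmitted data. */
typedef struct {
    uint8_t frame_len;
    uint8_t current_slot;
    uint8_t max_frame_len;
    uint8_t payload_len;
    uint8_t payload[DAT_MAX_PAYLOAD];
} dat_packet_t;

/* Maximum frame length among the sender and its neighbors. */
static uint8_t max_frame_len(const inf_packet_t *inf)
{
    assert(inf->n_neighbors <= INF_MAX_NEIGHBORS);
    uint8_t m = inf->own.frame_len;
    for (uint8_t i = 0; i < inf->n_neighbors; i++)
        if (inf->neighbors[i].frame_len > m)
            m = inf->neighbors[i].frame_len;
    return m;
}

/* Fills the header fields of a DAT packet from the node's own tables. */
static void fill_dat_header(dat_packet_t *dat, const inf_packet_t *info,
                            uint8_t current_slot)
{
    dat->frame_len     = info->own.frame_len;
    dat->current_slot  = current_slot;
    dat->max_frame_len = max_frame_len(info);
    dat->payload_len   = 0;  /* data are appended separately */
}
```

For node H of Figure 2, the maximum frame length carried in its DAT packets would be 8, even though its neighbors I and J use frame length 4.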
2.4. Slot Assignment
A newly joined node collects the INF packets transmitted by its neighbors to obtain the slot assignment information. After some period, the new node sets its frame length to four slots (the minimum frame length) and finds a free slot using the following procedures:
1. Getting an unassigned slot
If slot zero is not assigned to any neighbor and there is an unassigned slot in the TDMA period other than slot zero, the new node assigns one of these free slots to itself.
2. Doubling the frame length
If no slot is available at the current frame length, the new node doubles its frame length and tries again to find an unassigned slot. This procedure is repeated until the new node finds an unassigned slot or the maximum allowed frame length is reached.
After selecting a slot, the new node sends an INF packet to its neighbors. The INF packet contains information about the new node itself and about its neighbors.
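The two procedures above can be sketched as a search over candidate slots with a doubling frame length. In this simplified C sketch, the "known" assignments are those collected from the neighbors' INF packets (which also cover the hidden nodes), a slot counts as unassigned when it collides with none of them under the power-of-two rule, and MAX_FRAME = 64 is a hypothetical limit. The request to neighbors via slot zero to double their frames is not modelled; names are illustrative, not ITEM's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_FRAME 4   /* minimum frame length (Section 2.1) */
#define MAX_FRAME 64  /* hypothetical maximum allowed frame length */

typedef struct {
    unsigned slot;   /* assigned slot number */
    unsigned frame;  /* frame length (power of two) */
} assignment_t;

static bool is_pow2(unsigned x) { return x != 0 && (x & (x - 1)) == 0; }

/* Two power-of-two schedules share a slot iff their slot numbers agree
   modulo the shorter frame length. */
static bool collides(unsigned s1, unsigned f1, unsigned s2, unsigned f2)
{
    assert(is_pow2(f1) && is_pow2(f2));
    unsigned m = f1 < f2 ? f1 : f2;
    return s1 % m == s2 % m;
}

/* Step 1: look for an unassigned slot (slot 0 stays reserved).
   Step 2: if none exists, double the frame length and try again. */
static bool assign_slot(const assignment_t *known, size_t n,
                        assignment_t *out)
{
    for (unsigned f = MIN_FRAME; f <= MAX_FRAME; f *= 2) {
        for (unsigned s = 1; s < f; s++) {
            bool free_slot = true;
            for (size_t i = 0; i < n && free_slot; i++)
                free_slot = !collides(s, f, known[i].slot, known[i].frame);
            if (free_slot) {
                out->slot = s;
                out->frame = f;
                return true;
            }
        }
    }
    return false;  /* maximum frame length reached without a free slot */
}
```

With the assignments of nodes 0, 2 and 3 from the sixth row of Table 1 (1/4, 3/4 and 2/4), the sketch finds no free slot in a frame of four, doubles the frame to 8 and returns slot 4, the value listed for node 1 in that row.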
2.5. Releasing Slot Assignment
When a node leaves the network, it simply stops sending DAT and INF packets. Its neighbors detect the departure after a period in which no DAT or INF packet has been received from it. Each neighbor then updates its assignment information and sends it in an INF packet. If possible, each node also decreases its frame length.
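The release procedure can be sketched in C as follows. Departure detection uses a per-entry last_heard timestamp and a timeout; both are hypothetical parameters, since the text gives no value. Because the text only says the frame length is decreased "if possible", the shrink rule here is our own assumption: a node halves its frame length while the halved frame still has room for slot zero, its own slot and one slot per known node, and its folded slot (slot mod half) is non-zero and collides with no known node. For the actual E-ASAP behavior, see [6].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_FRAME 4

/* Table entry for a neighbor or hidden node (hypothetical layout). */
typedef struct {
    unsigned slot;
    unsigned frame;
    uint32_t last_heard;  /* local time of the last DAT/INF from this node */
    bool     valid;       /* false once the node is considered gone */
} entry_t;

static bool collides(unsigned s1, unsigned f1, unsigned s2, unsigned f2)
{
    assert(f1 != 0 && (f1 & (f1 - 1)) == 0);
    assert(f2 != 0 && (f2 & (f2 - 1)) == 0);
    unsigned m = f1 < f2 ? f1 : f2;
    return s1 % m == s2 % m;
}

/* Invalidates entries not heard from for more than 'timeout' ticks
   (unsigned subtraction also handles clock wrap-around). */
static size_t purge_departed(entry_t *tab, size_t n,
                             uint32_t now, uint32_t timeout)
{
    size_t removed = 0;
    for (size_t i = 0; i < n; i++) {
        if (tab[i].valid && (uint32_t)(now - tab[i].last_heard) > timeout) {
            tab[i].valid = false;
            removed++;
        }
    }
    return removed;
}

/* Hypothetical shrink rule: halve the frame while (1) slot 0, the own slot
   and one slot per known node fit into the halved frame, and (2) the folded
   own slot is non-zero and collides with no known node. */
static void try_shrink(unsigned *slot, unsigned *frame,
                       const entry_t *tab, size_t n)
{
    size_t active = 0;
    for (size_t i = 0; i < n; i++)
        if (tab[i].valid)
            active++;

    while (*frame > MIN_FRAME) {
        unsigned half = *frame / 2;
        unsigned s = *slot % half;
        bool ok = (active + 2 <= half) && s != 0;
        for (size_t i = 0; i < n && ok; i++)
            if (tab[i].valid && collides(s, half, tab[i].slot, tab[i].frame))
                ok = false;
        if (!ok)
            break;
        *slot = s;
        *frame = half;
    }
}
```

In the fully connected network of Table 1, this rule keeps node 0 (slot 1, frame length 8) at frame length 8 while nodes 2, 3 and 4 are present, since five slots would be needed, and halves it to 1/4 once node 3 is detected as gone, which agrees with the last rows of Table 1.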
Figure 4: ITEM structure (modules Core, E-ASAP, TDMA, Data, TimeSync, Comm and WatchDog; ITEM Interface).
3. Implementation
ITEM is designed to be flexible and easy to modify. In this section, we present the structure of ITEM and the functions of its modules. The full documentation, with functionality diagrams, user interfaces and TinyOS documentation, can be found in [5].
3.1. Structure
ITEM is divided into seven modules (see Figure 4). Each module is implemented as an independent component, except for the Core module, which interconnects all the other modules and makes them cooperate. This structure makes ITEM easy to modify and improve.
A brief description of the modules follows:
E-ASAP The E-ASAP module implements the E-ASAP protocol [6]. It maintains the network information about the node, its neighbors and its hidden nodes, and updates this information according to changes in the network. The E-ASAP module handles slot assignment, slot release and frame length changes (see Section 2).
TDMA The Time Division Multiple Access module manages the timing of the TDMA period. It holds the information about the slot duration and the TDMA period, signals the beginning of each new slot, and allows the TDMA parameters to be changed.
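The timing role of the TDMA module can be illustrated by deriving the current slot from the synchronized local time. In this sketch the slot duration SLOT_TICKS is a hypothetical value (the text does not state one), frames are assumed to start at multiples of the frame length on a common time base, and the returned ticks_left is the delay after which a "new slot" event would be signaled. Names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOT_TICKS 1024u  /* hypothetical slot duration in clock ticks */

typedef struct {
    uint32_t slot;        /* current slot index within the frame */
    uint32_t ticks_left;  /* ticks until the next slot boundary */
} slot_pos_t;

/* Maps the synchronized local time onto a frame of 'frame' slots. */
static slot_pos_t slot_position(uint64_t now, uint32_t frame)
{
    assert(frame != 0);
    uint64_t global_slot = now / SLOT_TICKS;
    slot_pos_t p;
    p.slot = (uint32_t)(global_slot % frame);
    /* a timer armed with this delay would signal the start of a new slot */
    p.ticks_left = (uint32_t)(SLOT_TICKS - now % SLOT_TICKS);
    return p;
}

/* True if the node owning 'own_slot' may transmit at time 'now'. */
static bool may_transmit(uint64_t now, uint32_t own_slot, uint32_t frame)
{
    return slot_position(now, frame).slot == own_slot;
}
```

With frame length 4, a node owning slot 1 transmits in global slots 1, 5, 9, and so on; with frame length 8, global slot 5 maps to slot 5, so the same node would not transmit there.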
Comm The Comm module is responsible for all communication with other nodes via radio and with the host computer via USB. Its functionality is basic: it can be requested to transmit a message, and when a message is received, it signals an event to all other modules.
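The receive path of the Comm module is a one-to-many event: every module is notified of each received message. In nesC this fan-out is normally expressed by wiring one event to several components; the C sketch below emulates it with a table of function pointers. The message layout, MAX_LISTENERS and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LISTENERS 8  /* hypothetical: one entry per ITEM module */

/* A received radio or USB message (hypothetical layout). */
typedef struct {
    uint16_t       source;  /* sender node ID */
    uint8_t        type;    /* e.g. DAT or INF */
    uint8_t        len;
    const uint8_t *data;
} rx_msg_t;

typedef void (*rx_handler_t)(const rx_msg_t *msg, void *ctx);

typedef struct {
    rx_handler_t handler[MAX_LISTENERS];
    void        *ctx[MAX_LISTENERS];
    size_t       count;
} comm_t;

/* Registers a module's receive handler; returns -1 if the table is full. */
static int comm_subscribe(comm_t *c, rx_handler_t h, void *ctx)
{
    assert(h != NULL);
    if (c->count == MAX_LISTENERS)
        return -1;
    c->handler[c->count] = h;
    c->ctx[c->count] = ctx;
    c->count++;
    return 0;
}

/* Called on every reception: the event is signaled to all modules. */
static void comm_deliver(const comm_t *c, const rx_msg_t *msg)
{
    for (size_t i = 0; i < c->count; i++)
        c->handler[i](msg, c->ctx[i]);
}

/* Example listener: counts received messages in the int behind ctx. */
static void count_rx(const rx_msg_t *msg, void *ctx)
{
    (void)msg;
    ++*(int *)ctx;
}
```

Each module registers once at start-up, for example the E-ASAP module to parse INF packets and the TimeSync module to read the sender's time stamp.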
Figure 5: Network structure for the test of ITEM dynamics (without hidden nodes).

TimeSync The TimeSync module handles time synchronization between the nodes. To synchronize time in the network, the module uses a simple algorithm:

$$t_{\mathrm{Local}} = \frac{t_{\mathrm{remote}} + t_{\mathrm{local}}}{2} \qquad (1)$$

where $t_{\mathrm{Local}}$ denotes the new local time of the node, $t_{\mathrm{remote}}$ denotes the time received from a neighbor node, and $t_{\mathrm{local}}$ denotes the old local time of the node. The current local time of the node is sent with each DAT and INF packet, so the time update is performed on every message reception. The same algorithm has been used, e.g., in [10] for synchronizing a hexagonal sensor network. In our experiments (TelosB platform, TinyOS 1.x), the worst-case error of this algorithm (the time difference between two neighboring nodes) was 10 ms. The error is caused by the fact that the algorithm does not take the message propagation delay into account. We intend to improve the time synchronization algorithm in future work; however, it is an open question whether much better accuracy can be achieved, given the non-real-time behavior of TinyOS.
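Equation (1) translates directly into code. The sketch assumes signed 64-bit tick counts and an idealized exchange without propagation delay or clock drift; it computes the mean as t_local + (t_remote - t_local) / 2, which equals (1) up to integer rounding. In this idealized setting, one round trip between two neighbors (A updates from B, then B from A) reduces their offset by a factor of four. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Equation (1): the new local time is the mean of the received remote time
   and the old local time (times in signed 64-bit ticks, an assumption). */
static int64_t sync_time(int64_t t_local, int64_t t_remote)
{
    int64_t t_new = t_local + (t_remote - t_local) / 2;
    /* the corrected time always lies between the two inputs */
    assert((t_local <= t_new && t_new <= t_remote) ||
           (t_remote <= t_new && t_new <= t_local));
    return t_new;
}

/* Idealized exchange between neighbors A and B (no delay, no drift): each
   round, A applies Eq. (1) with B's time, then B with A's new time.
   Returns B's remaining offset from A. */
static int64_t offset_after(int64_t a, int64_t b, unsigned rounds)
{
    for (unsigned i = 0; i < rounds; i++) {
        a = sync_time(a, b);
        b = sync_time(b, a);
    }
    return b - a;
}
```

Starting from an offset of 64 ticks, offset_after returns 16 after one round and 4 after two. The propagation delay ignored in this model is, according to the measurements above, what limits the accuracy in practice.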
Data The Data module has two queues, one holding received data and one holding data to be transmitted. If a queue is full, new data are discarded until a queue element becomes available. The module provides functions to manipulate the data in the queues.
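The drop-new policy of the Data module can be sketched as a fixed-size ring buffer per direction: a push into a full queue fails and the new item is discarded, while the queued items stay untouched. The capacity QUEUE_LEN, the one-byte element type and the names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LEN 4  /* hypothetical queue capacity */

typedef struct {
    uint8_t buf[QUEUE_LEN];  /* one byte per element keeps the sketch short */
    size_t  head;            /* index of the oldest element */
    size_t  count;           /* number of queued elements */
} queue_t;

/* Enqueues 'item'; if the queue is full, the new item is discarded. */
static bool queue_push(queue_t *q, uint8_t item)
{
    if (q->count == QUEUE_LEN)
        return false;
    q->buf[(q->head + q->count) % QUEUE_LEN] = item;
    q->count++;
    return true;
}

/* Dequeues the oldest item; returns false if the queue is empty. */
static bool queue_pop(queue_t *q, uint8_t *item)
{
    assert(item != NULL);
    if (q->count == 0)
        return false;
    *item = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count--;
    return true;
}

/* One queue for reception and one for transmission. */
typedef struct {
    queue_t rx;
    queue_t tx;
} data_module_t;
```

After four pushes, a fifth push returns false and its data are lost; once one element has been popped, a new push succeeds again.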
Watchdog Like any watchdog, its main function is to reset the system if it locks up due to an error. Note that in this implementation the watchdog works only on the MSP430 architecture.
Core The Core module's function is to combine all the other modules in ITEM and make them work together appropriately.
4. Experiments
We present two experiments focused on slot assignment and frame length behavior. Nodes are added to and removed from the network while the slot assignment and the frame length are monitored. We first present an experiment in a network where each node can communicate with all the other nodes (no hidden nodes), and then the same experiment in a network with hidden nodes.
4.1. Network without Hidden Nodes
The network structure for the first experiment is shown in Figure 5. The experiment was initiated by node 0. The progress of the slot assignment and frame length is listed in Table 1, which shows how the frame length changes as nodes join and leave. All nodes in the network always have the same frame length, because all nodes can communicate with each other and therefore interfere with each other.

Table 1: ITEM dynamics in a network without hidden nodes.

                Node ID (slot / frame length)
Action          0     1     2     3     4     5
Started 0       1/4   -     -     -     -     -
Added 1         1/4   2/4   -     -     -     -
Added 2         1/4   2/4   3/4   -     -     -
Removed 1       1/4   -     3/4   -     -     -
Added 3         1/4   -     3/4   2/4   -     -
Added 1         1/8   4/8   3/8   2/8   -     -
Added 5         1/8   4/8   3/8   2/8   -     5/8
Removed 3       1/8   4/8   3/8   -     -     5/8
Added 4         1/8   4/8   3/8   -     2/8   5/8
Added 3         1/8   4/8   3/8   6/8   2/8   5/8
Removed 1       1/8   -     3/8   6/8   2/8   5/8
Removed 5       1/8   -     3/8   6/8   2/8   -
Removed 3       1/4   -     3/4   -     2/4   -
Removed 4       1/4   -     3/4   -     -     -

Figure 6: Network structure for the test of ITEM dynamics (with hidden nodes).
4.2. Network with Hidden Nodes
The network structure for the second experiment is shown in Figure 6. Node 2 acts as a gateway between two parts of the network: the two parts cannot communicate directly, but the nodes of both parts are neighbors of node 2. The experiment was initiated by node 0, and the dynamics of the network are shown in Table 2.

The fourth row of Table 2 shows an example of different frame lengths coexisting in the network: node 4 did not need to increase its frame length until node 3 was added. The expression 3(7)/4 means that node 4 has been assigned slot 3 in a frame of length 4; to nodes with frame length 8, node 4 therefore appears to occupy slots 3 and 7.
5. Conclusion
We have introduced ITEM, a new implementation of a TDMA communication protocol for TinyOS 1.x and the TelosB platform. The module uses E-ASAP to assign TDMA slots to nodes and provides autonomous control of the slot assignment in the network. The protocol adapts the slot assignment to changes in the network structure (nodes appearing and disappearing).
Table 2: ITEM dynamics in a network with hidden nodes.

                Node ID (slot / frame length)
Action          0     1     2     3     4        5
Started 0       1/4   -     -     -     -        -
Added 2         1/4   -     2/4   -     -        -
Added 4         1/4   -     2/4   -     3/4      -
Added 1         1/8   4/8   2/8   -     3(7)/4   -
Added 3         1/8   4/8   2/8   5/8   3/8      -
Added 5         1/8   4/8   2/8   5/8   3/8      6/8
Removed 4       1/8   4/8   2/8   5/8   -        6/8
Removed 1       1/8   -     2/8   5/8   -        6/8
Removed 3       1/4   -     2/8   -     -        6/8
Removed 5       1/4   -     2/4   -     -        -
Moreover, the protocol adapts the frame length (TDMA period) to obtain more efficient data throughput.
We are currently working on a new version of ITEM. The new version will be implemented in TinyOS 2.x and will improve the interface and some functions to make them more user-friendly. We intend to use the ITEM module in our next sensor network applications, especially to ensure real-time behavior of the network and to put the nodes into sleep mode.
Acknowledgement: This work was supported by
the European Commission under project FRESCOR IST
034026.
References
[1] [Online]: http://www.tinyos.net/.
[2] [Online]: http://www.xbow.com/Products//productdetails.aspx?sid=252.
[3] [Online]: http://tinyos.cvs.sourceforge.net/tinyos/tinyos-1.x/contrib/prime.
[4] [Online]: http://tinyos.cvs.sourceforge.net/tinyos/tinyos-2.x-contrib/wustl/upma/.
[5] [Online]: http://rtime.felk.cvut.cz/∼trdlij1/doku.php?id=ITEM.
[6] A. Kanzaki, T. Hara, and S. Nishio. An adaptive TDMA slot assignment protocol in ad hoc sensor networks. Proceedings of the 2005 ACM Symposium on Applied Computing, pages 1160-1165, 2005.
[7] K. Klues, G. Xing, and C. Lu. Towards a unified radio power management architecture for wireless sensor networks. In WWSNA, 2007.
[8] A. Koubaa, M. Alves, and E. Tovar. GTS allocation analysis in IEEE 802.15.4 for real-time wireless sensor networks. The 14th International Workshop on Parallel and Distributed Real-Time Systems (WPDRTS'06), Rhodes, Greece, April 2006.
[9] A. Rowe, R. Mangharam, and R. Rajkumar. RT-Link: A time-synchronized link protocol for energy-constrained multi-hop wireless networks. Third IEEE International Conference on Sensors, Mesh and Ad Hoc Communications and Networks (IEEE SECON), 2006.
[10] K. Shashi Prabh and T. Abdelzaher. On scheduling and real-time capacity of hexagonal wireless sensor networks. 19th Euromicro Conference on Real-Time Systems (ECRTS), pages 136-145, 2007.
