98-3 Scratchpad Memory Aware Task Scheduling with Minimum Number of Preemptions on a Single Processor by Qing Wan, Hui Wu & Jingling Xue
Scratchpad Memory Aware Task Scheduling with Minimum Number of 
Preemptions on a Single Processor 
Qing Wan  Hui Wu  Jingling Xue 
School of Computer Science and Engineering 
University of New South Wales 
Sydney, NSW  2052, Australia 
e-mail:  {qingwan, huiw, jingling}@cse.unsw.edu.au 
Abstract- We  propose  a  unified  approach  to  the  problem  of 
scheduling  a  set of  tasks with  individual  release  times,  deadlines 
and precedence constraints,  and allocating  the data of each task 
to  the  SPM  (Scratchpad  Memory)  on  a  single  processor  system. 
Our approach consists of a task scheduling algorithm and an SPM 
allocation  algorithm.  The  former  constructs  a  feasible  schedule 
incrementally, aiming to minimize the number of preemptions in 
the feasible schedule.  The latter allocates a portion of the SPM to 
each task in an efficient way  by employing a novel data structure, 
namely, the preemption graph.  We have evaluated our approach 
and a previous approach by using six task sets.  The results show 
that our approach achieves a WCRT (Worst-Case Response Time) 
reduction of up to 20.31% over the previous approach. 
I. INTRODUCTION 
In a typical embedded system, there are multiple concurrent 
tasks.  Tasks  may  be  subject  to  release  times,  deadlines,  and 
precedence constraints.  The release time and the deadline of a 
task specify its earliest start time and the latest completion time 
in a feasible schedule.  The precedence constraints specify the 
data and control dependencies between tasks. In hard real-time 
embedded systems, it is essential to find a feasible schedule for 
all the tasks at the design stage. 
The problems of scheduling a set of tasks with various constraints 
have been extensively studied [8, 15, 16]. Most scheduling 
problems are NP-complete.  On a single processor, if tasks 
are  preemptible,  the  EDF  (earliest  deadline  first)  strategy  is 
guaranteed  to  find  a  feasible  schedule  for  a  set  of  tasks  with 
individual release times,  deadlines and precedence constraints 
whenever one exists [8]. However, if tasks are not preemptible, 
the  problem  of finding a  schedule  with minimum  lateness for 
a  set  of  independent  tasks  with  individual  release  times  and 
deadlines on a single processor is NP-complete [8]. 
Scratchpad  memory  is  the on-chip  SRAM  managed  by  the 
compiler.  It  is  an  attractive  alternative  to  cache  in  embedded 
systems due to its three major advantages. Firstly, it consumes 
less energy than cache.  Secondly, it is easier to compute the 
WCET (Worst-Case Execution Time) of a task because the access 
time of each variable or instruction is known at compile 
time.  Thirdly, the compiler can usually hide data hazards in 
modern RISC processors without any hardware support as the 
latency of each data access to SPM is known at compile time. 
However, SPM also introduces additional challenges.  One ma-
jor challenge is that the task scheduling problem and the SPM 
allocation problem are mutually  dependent.  On  one hand,  the 
WCET  of  a  task  is  dependent  on  the  size  of  the  SPM  allo­
cated  to it.  On  the  other  hand,  the size  of the SPM allocated 
to each task is dependent on  whether this task is preempted in 
the schedule or not.  If a task Ti is preempted by  another task 
Tj, then Ti and Tj cannot use the same section of SPM to store 
data assuming there is no dynamic SPM reallocation. 
In  this  paper,  we  study  the  problem  of  scheduling  a  set  of 
tasks  with  individual  release  times,  deadlines  and precedence 
constraints on a single processor of an embedded system where 
SPM is used to replace data cache, and the problem of allocat­
ing the SPM to each task. We assume that the target embedded 
system is a hard  real-time  system  where  the deadline  of  each 
task must be met.  We make the following major contributions. 
1.  We propose a novel unified approach to the task scheduling 
problem and the SPM allocation problem.  The 
unified  approach  consists  of  a  task  scheduling  algorithm 
and an SPM allocation algorithm. The task scheduling al­
gorithm  aims  at  minimizing  the  number  of  preemptions 
in  a  feasible  schedule  for  the task  set.  The  SPM  alloca­
tion algorithm employs a novel data structure, namely, the 
preemption graph, to efficiently allocate SPM to tasks. 
2.  We have evaluated our approach and the one proposed 
by Suhendra et al. [19] by using six task sets with 
tasks selected from three benchmark suites, Powerstone 
[13], Mälardalen WCET Benchmarks [7] and SNU real-time 
benchmarks [14], and from an open-source UAV (Unmanned 
Aerial Vehicle) control application, PapaBench [5]. 
Across all the task sets, our approach achieves a maximum 
WCRT reduction of 20.31% over their approach. 
The rest of this paper is organized as follows.  Section II describes 
the system model and key definitions. Section III shows 
how to determine the maximum SPM size for each task.  Section 
IV describes our unified approach to task scheduling and 
SPM allocation.  Section V presents our experimental results. 
Section VI describes related work, and Section VII concludes 
the paper. 
978-1-4673-3030-5/13/$31.00 ©2013 IEEE 
II.  SYSTEM MODEL AND DEFINITIONS 
The target hard real-time embedded system uses a single processor 
where an SPM is used to replace data cache.  The size 
of the SPM is m bytes.  The SPM occupies a contiguous section 
of the processor's memory space.  The start address of the 
SPM is 0.  The SPM is only used to store the local (stack) data 
of tasks.  The problem of allocating the global data, heap data 
and code of a task set to SPM will be studied in future work. 
There is a set V = {T1, T2, ..., Tn} of n tasks to be executed 
on the processor.  Each task is preemptible by any other task. 
However, our unified algorithm for task scheduling and SPM 
allocation preempts a task only if it is necessary.  Tasks have 
individual release times, deadlines and precedence constraints. 
The precedence constraints are represented by a DAG (directed 
acyclic graph) G = (V, E), where V = {T1, T2, ..., Tn} is 
the set of tasks, and E = {(Ti, Tj) : Tj can be executed only after 
Ti finishes} is a set of precedence constraints between tasks. 
Each task Ti has the following attributes: 
1.  Pre-assigned release time R(Ti), 
2.  Pre-assigned deadline D(Ti), 
3.  The maximum size Si (Si ≤ m) of the SPM space needed 
by Ti, and 
4.  The worst-case execution time wcet(Ti, x) when an SPM 
with size of x bytes is allocated to Ti. 
Given  a  schedule for  a  set  of  tasks  with  individual release 
times, deadlines and precedence constraints, a task Ti is ready 
at  time t if all its predecessors have been completed by t and 
t is greater than or equal to the release time of Ti.  The ready 
time of Ti is the earliest time at which Ti is ready. 
EDF  is  a  classical  scheduling  strategy.  There  are  two 
versions,  preemptive  EDF  (pEDF)  and  non-preemptive  EDF 
(npEDF).  The npEDF schedules a task  only  when the current 
running  task  is  completed.  The  pEDF  performs  scheduling 
whenever a new task is ready.  Both pEDF and npEDF schedule 
a ready task with the smallest deadline. 
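The selection rule shared by both EDF variants can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the authors' implementation: the `Task` record, the helper name `pick_edf`, and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    release: float
    deadline: float

def pick_edf(tasks, now, finished=frozenset(), preds=None):
    """Return the ready task with the smallest deadline, or None if none is ready.

    A task is ready at `now` if it is released and all its predecessors finished.
    """
    preds = preds or {}
    ready = [
        t for t in tasks
        if t.name not in finished
        and t.release <= now
        and all(p in finished for p in preds.get(t.name, ()))
    ]
    return min(ready, key=lambda t: t.deadline, default=None)

# Hypothetical task set: C has the smallest deadline but is released at time 2.
tasks = [Task("A", 0, 10), Task("B", 0, 6), Task("C", 2, 4)]
first = pick_edf(tasks, now=0)   # C is not released yet
later = pick_edf(tasks, now=2)   # C is now ready
```

Under npEDF, such a selection would be made only when the running task completes; under pEDF, it would be repeated whenever a new task becomes ready.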
Definition 1  Given a schedule σ and a task Ti, the live range 
of Ti, denoted as L(Ti), is a time interval [S(Ti), F(Ti)], 
where S(Ti) and F(Ti) are the start time and the finish time, 
respectively, of Ti in σ. 
Given two tasks, they can share a section of SPM iff their live 
ranges do not overlap. 
Definition 2  Given a schedule σ for a set of tasks, the interference 
graph of σ is an undirected graph G(σ) = (V, E), where 
V = {T1, T2, ..., Tn} is the set of tasks, and E = {(Ti, Tj) : 
Ti, Tj ∈ V and L(Ti) ∩ L(Tj) ≠ ∅}. 
Definition 3  Given a schedule, a task Ti is said to preempt a 
task Tj iff one of the following conditions holds: 1) Ti preempts 
Tj directly. 2) Ti is scheduled immediately after the completion 
of another task Ts, and Ts preempts Tj in the schedule. 
Notice that our definition of preemption is a generalization of 
the traditional one. 
Definition 4  Given a schedule σ for a set of tasks, the preemption 
graph of σ is a directed graph G = (V, E), where 
V = {T1, T2, ..., Tn} is the set of tasks, and E = {(Ti, Tj) : 
Ti, Tj ∈ V and Tj preempts Ti in σ}. 
It is  easy  to  see  that  the  preemption  graph  of  any  schedule 
constructed  by  using  an  EDF  scheduler  is  a  forest.  The  pre­
emption graph is a key  data structure of our unified algorithm 
for task scheduling and SPM allocation.  We can  easily  prove 
that for each path in a preemption graph, the live ranges of any 
two tasks on the path overlap. 
Definition 5  Given a set of tasks with precedence constraints, 
individual  release  times  and  deadlines,  the  edge-consistent 
deadline of a task Ti, denoted D'(Ti), is recursively defined as 
follows.  D'(Ti) = min{D(Ti), min{D'(Tj) - wcet(Tj, Sj) : 
Tj is an immediate successor of Ti in the precedence graph}}. 
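The recursion in Definition 5 can be evaluated by memoized traversal of the precedence DAG. The sketch below is a minimal Python illustration; the function name, the dictionary-based inputs, and the three-task example are our own hypothetical choices, and `wcet[t]` stands for wcet(Tt, St).

```python
from functools import lru_cache

def edge_consistent_deadlines(deadline, wcet, succs):
    """Compute D'(Ti) = min(D(Ti), min over immediate successors Tj of D'(Tj) - wcet(Tj, Sj)).

    deadline: task -> D(Ti); wcet: task -> wcet(Ti, Si);
    succs: task -> list of immediate successors in the precedence DAG.
    """
    @lru_cache(maxsize=None)
    def d_prime(t):
        bounds = [d_prime(s) - wcet[s] for s in succs.get(t, ())]
        return min([deadline[t]] + bounds)

    return {t: d_prime(t) for t in deadline}

# Hypothetical example: A must finish before both B and C can start.
deadline = {"A": 20, "B": 12, "C": 15}
wcet = {"A": 3, "B": 4, "C": 5}
succs = {"A": ["B", "C"]}
result = edge_consistent_deadlines(deadline, wcet, succs)
```

In this example A's own deadline of 20 is tightened to 8, because B must start no later than 12 - 4 = 8 to meet its deadline.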
III. DETERMINING THE SPM SIZES OF INDIVIDUAL TASKS 
Our  unified  algorithm  for task scheduling and  SPM alloca­
tion needs to know the maximum SPM size Si of each task Ti. 
The impact of SPM on the WCET of each task may vary.  For 
some  tasks,  SPM  may  drastically  reduce  their  WCETs.  For 
some other tasks, SPM may not be very effective. Therefore, it 
is very important to determine the maximum SPM size of each 
task in a fair manner. 
The  approaches  proposed  in  [20, 21] assume that the  maxi­
mum  SPM  size  of  each  task  is  known  without  proposing  any 
approach to determining  the maximum SPM size of each task 
in  a  fair  manner.  The  approaches  proposed  in  [18, 19]  use  a 
heuristic based on ILP (Integer Linear Programming).  Since 
the tasks in [18, 19] do not have individual deadlines, their ILP 
based heuristics are not applicable to our task model. 
Next,  we propose a new approach for determining the maxi­
mum size of the SPM for each task based on our previous work 
on allocating variables of a single task to SPM [23].  For each 
variable vi, we define a benefit vector benefit(vi) as follows. 
$$\mathit{benefit}(v_i) = \frac{l - l'(v_i)}{\mathit{size}(v_i)} \qquad (1)$$
where l is the vector of the lengths, in non-increasing order, of 
the k longest paths of the task immediately before allocating vi 
to the SPM, l'(vi) is the vector of the lengths, in non-increasing 
order, of the k longest paths of the task immediately after allocating 
vi to the SPM, and size(vi) is the size of vi. Intuitively, 
the benefit vector of a variable vi is the normalized contribution 
of vi to the k longest path lengths of the task.  To compare any 
two benefit vectors, we use lexicographical ordering. 
In our definition of the benefit vector, k is a parameter.  On one 
hand, the larger the value of k, the more accurate the benefit vector. 
On the other hand, the larger the value of k, the higher 
the time complexity of computing a benefit vector. 
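A small Python sketch can make the comparison concrete. It assumes, following the description above, that the benefit vector is the per-byte reduction of the k longest path lengths; the function name `benefit`, the variables x and y, and all path lengths are hypothetical. Python tuples already compare lexicographically, which matches the ordering used here.

```python
def benefit(before, after, size):
    """Per-byte reduction of the k longest path lengths.

    before/after: lengths of the k longest paths, each in non-increasing order,
    immediately before and after placing the variable in the SPM.
    """
    return tuple((b - a) / size for b, a in zip(before, after))

before = (100, 90)                     # k = 2, hypothetical path lengths
b_x = benefit(before, (92, 90), 4)     # variable x, 4 bytes
b_y = benefit(before, (96, 80), 2)     # variable y, 2 bytes

# Lexicographic comparison: the first components tie, so the second decides.
best = max([("x", b_x), ("y", b_y)], key=lambda pair: pair[1])
```

With k = 1 the two variables would be indistinguishable, which illustrates why a larger k gives a more accurate ranking at a higher cost.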
In order to determine the maximum SPM size for each task 
in a fair manner, we introduce a threshold benefit vector αmin 
for all the tasks.  For each task, we select a variable as an SPM 
resident only if its benefit vector is greater than αmin.  The 
threshold benefit vector is a parameter of our approach, and 
its value needs to be tuned for the best performance of a 
given task set. 
We determine the maximum SPM size Si of each task Ti as 
follows:  Keep selecting a variable of Ti on the longest path 
with the maximum benefit vector being greater than αmin, and 
allocating it to the SPM of the task until no variable can be selected. 
For more details on selecting a most beneficial variable 
and allocating it to SPM, we refer to [23]. 
IV. UNIFIED TASK SCHEDULING AND SPM ALLOCATION 
Given a  set S of tasks with individual  release times, dead­
lines, and precedence constraints, our objective is to find a fea­
sible  schedule  for S on  a  single  processor  with  an  SPM  with 
a  size  of m bytes  to  store  local  data  of  the  tasks.  A  feasible 
schedule is the one satisfying all the constraints. 
Our unified approach to task scheduling and SPM allocation 
consists of two major parts:  the task scheduling algorithm and 
the SPM allocation algorithm.  The task scheduling algorithm 
aims at minimizing the number of preemptions when finding a 
feasible schedule for the task set.  By default, it uses the npEDF 
scheduling.  It uses the pEDF scheduling only  if a task misses 
its  deadline  under the npEDF  scheduling.  Initially,  no  task is 
preempted.  Therefore, the whole SPM is allocated to each task 
Ti.  When  a  task  currently  scheduled  meets  its  deadline,  the 
task  scheduling  algorithm  calls the SPM  allocation  algorithm 
to  allocate  SPM  to  the  task  and  each  of  the  predecessors  of 
the task in the preemption graph.  During the execution of our 
unified approach, if a task is preempted, the SPM size of each 
predecessor of the task may decrease. 
Our unified approach uses the following variables: 
•  D(Ti): the deadline of Ti, 
•  wcet(Ti):  the  current  worst-case  execution  time  of  task 
Ti, 
•  accu_time(Ti):  the  accumulated  execution  time  of  task 
Ti, 
•  preempted(Ti):  a  Boolean  variable,  denoting  if task Ti 
has been preempted before, 
•  miss(Ti):  a  Boolean  variable,  denoting  if  task  Ti  has 
missed its deadline before, and 
•  start: storing successive scheduling points.  A scheduling 
point is a time point at which a task is scheduled. 
Our unified approach works as follows: 
1.  Compute  the  edge-consistent  deadlines  for  all  the  tasks, 
and initialize the relevant data structures. 
2.  If  a  task Ti meets  its  deadline  under  the npEDF,  do  the 
following: 
(a)  If Ti is  not  in  the  preemption  graph,  add Ti to  the 
preemption graph. 
(b)  If Ti has not been preempted before,  do the follow­
ing: 
i.  Find the task Tj that is most recently preempted 
and has not finished. 
ii.  If Tj exists, add the directed edge (Tj, Ti) to 
the preemption graph, and call our incremental 
SPM allocator to allocate SPM to Ti and each 
of its predecessors in the preemption graph. 
3.  If a task Ti misses its deadline under the npEDF scheduling, 
let Tj be the task scheduled at the release time of Ti. The 
following two cases are distinguished: 
(a)  The deadline of Tj is not larger than that of Ti. In this case, 
no feasible schedule exists. 
(b)  The deadline of Tj is larger than that of Ti.  In this 
case, do the following: 
i.  Preempt Tj at the ready time of Ti. 
ii.  Find the set C of all the tasks scheduled after Tj 
in the current schedule. 
iii.  Remove  all  the  edges  incident  to  the  tasks  in 
C in the current schedule from the preemption 
graph. 
iv.  Undo the current schedule for C, and continue 
to schedule all the unscheduled tasks, including 
the tasks in C. 
4.  Repeat steps 2 and 3 until all the tasks have been  sched­
uled or a task cannot meet its deadline. 
The details of our unified algorithm for task scheduling and 
SPM allocation are shown in Algorithms 1 and  2. 
Algorithm 1:  Our unified approach to task scheduling and 
SPM allocation 
Input:  A set S of tasks with individual release times, 
deadlines and precedence constraints, an SPM with 
a size of m bytes, the maximum SPM size Si 
needed by each task Ti, and a processor P 
Output:  A feasible schedule and an SPM allocation 
scheme for S 
Compute the edge-consistent deadline for each task; 
foreach task Ti ∈ S do 
    preempted(Ti) = false; 
    Compute wcet(Ti, Si); 
    wcet(Ti) = wcet(Ti, Si); 
    accu_time(Ti) = 0; 
    D(Ti) = D'(Ti); 
Create an empty preemption graph G; 
start = the earliest release time of all the tasks in S; 
Scheduler_Allocator(S, start); 
The  SPM  allocation algorithm  works  incrementally  based 
on  the current partial  SPM allocation scheme.  It  is  called  by 
the task  scheduling algorithm  whenever a  new  task Ti is suc­
cessfully  scheduled.  When  being  called,  it  starts  with Ti and 
works  toward  the  source  (root)  task  along  the  path  from  Ti 
to  the  root  in  the  preemption  graph.  For each  task Tj visited 
in  the  preemption  graph,  our  SPM  allocation  algorithm  tries 
to  allocate  Sj  bytes  to  it.  If  Sj  bytes  is  not  available,  it  al­
locates  the  remaining  free  SPM  space  to Tj  considering  the 
interference  constraints.  Once  a  task Tk cannot  be  allocated 
Sk bytes,  all its predecessors in the preemption graph  will not 
be allocated any SPM space.  For each task Ti, we introduce 
four variables, start_addr(Ti), end_addr(Ti), spm_size(Ti), 
and wcet(Ti), where start_addr(Ti) and end_addr(Ti) are 
the start address and the end address of Ti, respectively, in the 
SPM, spm_size(Ti) is the size of the SPM allocated to Ti, and 
wcet(Ti) is the WCET of Ti. For a leaf task, its start address is 
0.  For a non-leaf task that can be allocated to SPM, its start address 
is one plus the maximum end address of all its children. 
Our SPM allocation algorithm is shown in Algorithm 3. 
Algorithm 2: Scheduler_Allocator(S, start) 
Input:  A set S of tasks, the earliest release time start of 
all the tasks 
Output: Task schedule and SPM allocation results 
while S ≠ ∅ do 
    Find a task Ti in S that is ready at time start and has 
    the earliest deadline among all the ready tasks; 
    if start + wcet(Ti) - accu_time(Ti) ≤ D(Ti) then 
        Schedule Ti at time start; 
        start = start + wcet(Ti) - accu_time(Ti); 
        accu_time(Ti) = wcet(Ti); 
        S = S - {Ti}; 
        if Ti is not in G then 
            Add Ti to G; 
        if preempted(Ti) = false then 
            Let Tj be the most recently preempted task 
            that has not finished by time start; 
            if Tj exists then 
                Add (Tj, Ti) to G; 
                // Re-allocate SPM to Ti and all its predecessors 
                Incr_SPM_Allocator(Ti); 
    else 
        if the ready time of Ti = start || miss(Ti) = true then 
            return No feasible schedule; 
        else 
            start = the release time of Ti; 
            Let Tj be the task executing at time start in 
            the current schedule; 
            if D(Tj) ≤ D(Ti) then 
                return No feasible schedule; 
            else 
                miss(Ti) = true; 
                preempted(Tj) = true; 
                C = {Tj} ∪ {Tk : Tk is scheduled after Tj 
                in the current schedule}; 
                Preempt Tj at time start; 
                Undo the partial schedule for all the tasks 
                scheduled after time start; 
                foreach task Tk ∈ C do 
                    Recalculate accu_time(Tk); 
                    Remove Tk and all its incident edges 
                    from G; 
                    S = S ∪ {Tk}; 
                Scheduler_Allocator(S, start); 
                break; 
return; 
Algorithm 3: Incr_SPM_Allocator(Ti) 
Input:  A preemption graph G, an allocation scheme for all 
the tasks in the preemption graph, and a new task 
Ti 
Output:  A new SPM allocation scheme for all the tasks in 
G 
start_addr(Ti) = 0; 
end_addr(Ti) = min{m - 1, Si - 1}; 
Tj = the parent of Ti in G; 
while Tj ≠ null do 
    temp = max{end_addr(Ts) : Ts is a child of Tj in G}; 
    if temp ≥ m - 1 then 
        // No SPM space for Tj 
        start_addr(Tj) = m; 
        end_addr(Tj) = m; 
        spm_size(Tj) = 0; 
        wcet(Tj) = wcet(Tj, 0); 
    else 
        start_addr(Tj) = temp + 1; 
        end_addr(Tj) = min{m - 1, temp + Sj}; 
        spm_size(Tj) = end_addr(Tj) - start_addr(Tj) + 1; 
        if spm_size(Tj) < Sj then 
            // Not enough SPM space for Tj 
            Compute wcet(Tj, spm_size(Tj)); 
            wcet(Tj) = wcet(Tj, spm_size(Tj)); 
    Tj = the parent task of Tj in G; 
Next,  we  use  an  example  to  explain  how  our  unified  ap­
proach to task scheduling and SPM allocation works.  We also 
use it to compare our SPM allocation algorithm with the graph 
coloring based SPM allocation technique proposed in [19]. 
There is a set of 10 independent tasks to be executed on a 
single processor where an SPM of 2K bytes is used to store 
local data of the tasks.  The task attributes are shown in Figure 1a, 
where the SPM size is the size of the SPM space needed 
by each task, and the WCET of each task is its WCET when its 
SPM size requirement is satisfied.  For example, if T1 is allocated 
648 bytes of SPM space, its WCET is 4.5. 
A feasible schedule found by our task scheduling algorithm 
is shown in Figure 1b, and the preemption graph of this schedule 
is constructed as in Figure 1c.  Based on the preemption 
graph, the SPM allocation scheme computed by our SPM allocation 
algorithm is shown in Figure 1e.  As we can see, all the 
tasks are allocated to the SPM. 
Notice that by our SPM allocation algorithm, a task may not 
be fully allocated to the SPM.  In this example, if we change 
the SPM size requirement of T1 to 1K bytes, our algorithm 
will allocate only 648 bytes of SPM space to T1. 
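The incremental allocation can be replayed on this example with a short Python sketch. It follows the structure of Algorithm 3 but is only an illustration: the function names are ours, and the parent relation `PARENT` (each task mapped to the task it preempts) is an assumption inferred from the example rather than taken verbatim from Figure 1c. The SPM sizes are those of Figure 1a.

```python
def incr_spm_allocator(task, parent, size, m, alloc):
    """Place `task` at address 0, then re-place each ancestor above its children."""
    alloc[task] = (0, min(m - 1, size[task] - 1))
    tj = parent.get(task)
    while tj is not None:
        # Highest end address among the children of tj allocated so far.
        temp = max(end for c, (_, end) in alloc.items() if parent.get(c) == tj)
        if temp >= m - 1:
            alloc[tj] = (m, m)  # no SPM space left for tj
        else:
            alloc[tj] = (temp + 1, min(m - 1, temp + size[tj]))
        tj = parent.get(tj)

def spm_size(alloc, task, m):
    """Number of SPM bytes held by `task` (0 if it got no space)."""
    start, end = alloc[task]
    return 0 if start >= m else end - start + 1

def allocate(order, parent, size, m):
    """Call the incremental allocator once per task, in scheduling order."""
    alloc = {}
    for t in order:
        incr_spm_allocator(t, parent, size, m, alloc)
    return alloc

M = 2048
SIZE = {"T1": 648, "T2": 600, "T3": 800, "T4": 300, "T5": 700,
        "T6": 400, "T7": 800, "T8": 1000, "T9": 1048, "T10": 1200}
# Assumed preemption graph: child -> parent (the preempted task); roots are absent.
PARENT = {"T2": "T1", "T3": "T2", "T4": "T2", "T5": "T2",
          "T6": "T1", "T7": "T1", "T9": "T8"}
ORDER = ["T1", "T2", "T3", "T4", "T5", "T6", "T7", "T8", "T9", "T10"]

alloc = allocate(ORDER, PARENT, SIZE, M)
bigger = allocate(ORDER, PARENT, dict(SIZE, T1=1024), M)
```

Under these assumptions T1 is placed above T2 and receives exactly the 648 bytes that remain, and raising its requirement to 1K bytes in `bigger` leaves its allocation unchanged at 648 bytes, in line with the observation above.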
Consider the final schedule shown in Figure 1b.  For simplicity, 
we ignore the start times and finish times of all the tasks, 
and only consider their execution order.  Next, we will show 
how the graph coloring based SPM allocation technique proposed 
in [19] works.  The interference graph of all the tasks 
in the final schedule is shown in Figure 1d.  After applying the 
coloring algorithm, we have three colors.  T1, T8 and T10 are 
assigned color 1, T2, T6, T7 and T9 are assigned color 2, and 
T3, T4 and T5 are assigned color 3.  The SPM is partitioned 
into three disjoint sections for color 1, color 2 and color 3, respectively, 
and all the tasks with the same color share a section 
of the SPM, as shown in Figure 1f.  As we can see, in order 
to place all the tasks in the SPM, the size of the SPM must be 
at least 3048 bytes, in contrast to the SPM size of 2K bytes 
needed by our SPM allocation algorithm. As a result, their approach 
cannot find a feasible schedule given an SPM size of 
2K bytes. 
Task          T1    T2    T3    T4    T5    T6    T7    T8    T9    T10 
Release time  0     1     1     1     6     12    12    16.5  18.5  18.5 
Deadline      17    12    5     8     9     14    15    22    20    23 
WCET          4.5   3.5   1.5   2.5   1.5   1     2     3.5   1     2 
SPM Size      648   600   800   300   700   400   800   1000  1048  1200 
(a) Task attributes 
[Graphical panels: (b) Task schedule, (c) Preemption graph, (d) Interference graph, (e) SPM allocation results based on the preemption graph, (f) SPM allocation results based on graph coloring] 
Fig. 1.:  An example comparing preemption graph based and graph coloring based SPM allocation 
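The 3048-byte figure can be checked with a few lines of Python. The sketch assumes, as described above, that each color receives a private SPM section as large as its biggest task; the function name is ours, the sizes are those of Figure 1a, and the color classes are the ones listed in the text.

```python
SIZE = {"T1": 648, "T2": 600, "T3": 800, "T4": 300, "T5": 700,
        "T6": 400, "T7": 800, "T8": 1000, "T9": 1048, "T10": 1200}
COLORS = {
    1: ["T1", "T8", "T10"],
    2: ["T2", "T6", "T7", "T9"],
    3: ["T3", "T4", "T5"],
}

def coloring_spm_demand(colors, size):
    """Total SPM needed when every color owns a disjoint section
    sized for the largest task of that color."""
    return sum(max(size[t] for t in group) for group in colors.values())

demand = coloring_spm_demand(COLORS, SIZE)  # 1200 + 1048 + 800
```

The three sections need 1200, 1048 and 800 bytes, so the total exceeds the 2048 bytes available in the example.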
V.  EXPERIMENTAL  RESULTS 
A. Experiment Setup 
In order to evaluate our unified approach, we created six task 
sets as shown in Table I.  We selected 20 applications from 
three benchmark suites:  Powerstone [13], Mälardalen WCET 
Benchmarks [7] and SNU real-time benchmarks [14].  The 
statistics of all the applications are given in Table II. Each application 
is a task.  Each task set consists of a subset of tasks 
from the 20 applications.  There are no precedence constraints 
for the first five task sets. 
TABLE I:  Task sets 
Groups  Applications 
Set1    minver, jfdctint, fdct, statemate, ludcmp, compress, nsichneu, qurt, fir, select 
Set2    lms, adpcm, crc, engine, pocsag, matmult, jpeg, fft1k, edn, v42 
Set3    all the tasks from Set1 and Set2 
Set4    adpcm, jfdctint, ludcmp, edn, fft1k, fdct, lms, minver, jpeg, matmult 
Set5    statemate, nsichneu, qurt, select, fir, crc, engine, pocsag, v42, compress 
Set6    PapaBench 
The 6th task set comes from a real-life open-source UAV 
control application, PapaBench [5].  It consists of 28 tasks 
and operates in two modes:  fly by wire and autopilot, which 
means that the aircraft can be controlled both manually and automatically. 
Each mode consists of several tasks to control the 
aircraft and communicate with the ground station.  For our evaluation 
purposes, we separated the tasks from the original implementation, 
and maintained the control dependencies among 
these tasks. The statistics of all the tasks in the 6th task set can 
be found in [19]. 
TABLE II:  Tasks information 
Benchmark   Data size (bytes)   WCET with SPM (cycles) 
ludcmp      21680               12449 
qurt        92                  17856 
minver      2604                14264 
engine      478                 4859160 
pocsag      1216                1268830 
jpeg        77561               74621273 
statemate   227                 21326 
nsichneu    1588                202212 
fft1k       16484               5223830 
lms         1268                1762300 
jfdctint    340                 58678 
adpcm       2212                256275 
fir         592                 150293 
crc         1079                47786 
compress    1837                14758 
matmult     4840                167420 
fdct        220                 6275 
select      124                 6644 
edn         1884                164364 
v42         40973               40869080 
We implemented both our unified approach and the CR (Critical 
Path Reduction) approach proposed in [19].  When determining the maximum 
SPM size for each task, we set k to 2, and the threshold benefit 
vector αmin to (0.1, 0).  Since the CR approach does not handle 
individual deadlines, we revised it such that the interference 
between two tasks cannot be eliminated if delaying one task 
causes its deadline to be missed. In addition, we set the prior­
ity of each task to its deadline, and a smaller deadline implies 
a higher priority. 
We manually assigned each task in all the six task sets a re­
lease time and a deadline for every SPM configuration in such 
a way that many preemptions are needed in order to find a fea­
sible schedule. 
We  modified Chronos  4.0  [12]  to  calculate  the WCET of 
each application with different SPM sizes.  The infeasible path 
detection is enabled in Chronos.  The target architecture is an 
out-of-order,  pipelined processor,  with an instruction cache and 
perfect branch prediction. If the instruction cache is hit,  an in­
struction fetch takes 1 cycle.  Otherwise, it takes 100 cycles. 
The target processor uses scratchpad memory to replace data 
cache. The latencies of scratchpad memory and off-chip mem­
ory accesses are 1 cycle and 100 cycles as in [19],  respectively. 
The execution time of each instruction is 1 cycle. 
B. Results and Analysis 
We evaluated both our approach and the CR approach under 
three different SPM size configurations:  10%, 20% and 30% 
of the total data size. We use two performance metrics,  namely 
WCRT  and  feasible  schedule, to  compare  both  approaches. 
The WCRT  of  a  schedule  is  the  maximum  completion  time 
minus the minimum start time of all the tasks.  The WCRTs 
produced by both approaches under various configurations are 
shown in Figure 2. 
In each figure, the black bars are for our approach and the 
light bars for the CR approach. Each bar represents the relative 
WCRT reduction WCRTinc, which is computed as follows: 
WCRTinc = (WCRTbase - WCRTalg)/WCRTbase 
where WCRTbase is the WCRT of a schedule, computed by 
using the pEDF, for the same task set without any SPM, and 
WCRTalg is the WCRT computed by the approach being evaluated. 
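As a worked illustration of the two metrics, the following Python sketch computes the WCRT of a schedule and the relative measure WCRTinc. The function names and the two tiny schedules are hypothetical and are not taken from the experiments.

```python
def wcrt(schedule):
    """WCRT of a schedule: latest completion time minus earliest start time.

    schedule: task -> (start time, completion time).
    """
    starts = [s for s, _ in schedule.values()]
    finishes = [f for _, f in schedule.values()]
    return max(finishes) - min(starts)

def wcrt_inc(wcrt_base, wcrt_alg):
    """Relative WCRT measure: (WCRTbase - WCRTalg) / WCRTbase."""
    return (wcrt_base - wcrt_alg) / wcrt_base

base = wcrt({"A": (0, 40), "B": (40, 100)})  # hypothetical pEDF schedule, no SPM
alg = wcrt({"A": (0, 30), "B": (30, 80)})    # hypothetical schedule with SPM
gain = wcrt_inc(base, alg)
```

Here the baseline WCRT is 100 and the SPM-aware WCRT is 80, so the bar for this hypothetical configuration would stand at 20%.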
For the 10% SPM size in Set1, the CR approach cannot find 
a feasible schedule that meets all the deadlines while our ap­
proach does.  Therefore, the second bar is empty.  For Set3, 
the WCRTs computed by our approach and the CR approach 
are close.  For this task set, the schedules computed by both 
have the same number of preemptions. However,  our approach 
achieves a slightly better SPM utilization due to our more effi­
cient SPM allocation algorithm. As a result,  our approach per­
forms slightly better.  For all the other task sets, our approach 
performs significantly better.  The maximum WCRT improvement 
of our approach over the CR approach is 20.31%, which 
occurs in Set2 under the 30% SPM size configuration. 
There are two major reasons that our approach performs bet­
ter.  The first reason is that our approach preempts a task only 
if it is necessary. The second reason is that our SPM allocation 
algorithm is more efficient as we demonstrated in an example 
in Section IV. 
It is worth noting that SPM is much less effective for an out­
of-order processor than for an in-order processor used in [17]. 
The reason is that an out-of-order processor can hide off-chip 
memory access latencies by executing other ready instructions. 
VI. RELATED WORK 
The problems of scheduling tasks with various constraints 
have  been  extensively  studied  [8,  15,  16].  Various  schedul­
ing  techniques  have  been  proposed.  One  common  assump­
tion made by all the previous scheduling techniques is that the 
WCET of each task is known. If SPM is used to replace cache, 
this assumption does not hold any more.  As a result, all the 
previous scheduling techniques without considering SPM are 
not applicable to the processors with SPM. 
A number of research groups have studied the SPM alloca­
tion for a single task [1-3,9-11,17,24].  All the techniques 
proposed assume that the amount of the SPM allocated to each 
task is known, which is not true for typical embedded systems 
with concurrent tasks.  As a result, those techniques cannot be 
used to solve the SPM allocation problem. 
Recently, several research groups have studied the SPM/cache allocation 
problem for concurrent tasks. [22] exploits both cache 
partitioning and dynamic cache locking to provide worst-case 
performance estimates for multitasking systems. [6] studies 
the problem of placing multiple tasks in the cache to improve 
cache performance. It proposes an ILP based approach 
to optimally place multiple tasks in the cache.  The ILP formulations 
aim to minimize a cost function which is the total 
conflicts multiplied by a weight assigned to each task. [4] proposes 
a dynamic scratchpad memory code allocation technique 
that supports dynamically created processes.  Their approach 
partitions SPM into pages. At runtime, an SPM manager loads 
code pages of the running applications into the SPM on demand. 
It supports different sharing strategies that determine 
how the SPM is distributed among the running processes. 
[Figure 2: six bar-chart panels (Set1 to Set6); x-axis: SPM Size (10%, 20%, 30%); legend: Minimum Preemption (MP), Critical Path Reduction (CR)] 
Fig. 2.:  WCRT comparison between MP and CR for six task sets and under three SPM configurations 
[21] proposes scratchpad memory management techniques 
for priority-based preemptive multitasking systems.  The tech­
niques are applicable to a real-time environment.  It proposes 
three methods:  spatial,  temporal,  and hybrid methods,  with an 
objective to achieve energy reduction in the instruction memory 
subsystems. It formulates each method as an ILP problem that 
simultaneously determines (1) partitioning of scratchpad mem­
ory space for the tasks, and (2) allocation of program code to 
scratchpad memory space for each task. 
None of the above-mentioned approaches considers the mutual
impacts between task scheduling and SPM allocation. As
a result, they cannot achieve the best SPM utilization. In contrast,
[18] and [19] do consider these mutual impacts.
[18] proposes an integrated task mapping,
scheduling, SPM partitioning, and data allocation technique
based on ILP. All the tasks are free of timing constraints
and subject to precedence constraints. The ILP formulation explores
the optimal performance limit and shows that integrated
task scheduling and SPM optimization improves performance
by up to 80% for embedded applications.
[19] presents several dynamic scratchpad allocation techniques
that take process interferences into account to improve
the performance and predictability of the memory system.
It models the application as an MSC (Message Sequence
Chart) to capture the interprocess interactions. It proposes an
iterative allocation algorithm that consists of two critical steps:
(1) analyzing the MSC along with the existing allocation to
determine potential interference patterns, and (2) exploiting
this interference information to tune the scratchpad reloading
points and content so as to best improve the WCRT.
The approach proposed in [19] is the most closely related to ours.
Both their approach and ours take into account the mutual
impacts between task scheduling and SPM allocation. Both
consider real-time tasks with precedence constraints. However,
there are four key differences between our approach and
theirs. Firstly, our SPM allocation algorithm is more efficient
than their graph coloring based approach. Secondly, in our approach,
all the tasks are initially non-preemptible, which means
that each task occupies the whole SPM. A task is preempted
only if another task with an earlier deadline would otherwise miss
its deadline. In their approach, all the tasks are preemptible at the
beginning and are not allocated any SPM space. Detailed analysis
in the CR (Critical Path Interference Reduction) algorithm is
used to reduce the number of preemptions. As a result, our approach
leads to fewer preemptions and higher SPM utilization.
Thirdly, the task models are different. Under their task model,
all tasks are periodic tasks without any additional release times,
and all tasks have the same deadline. Our task model assumes
that each task has its own release time and deadline. Lastly,
their approach aims to minimize the worst-case response time.
In contrast, our approach aims to minimize the number of preemptions
while constructing a feasible schedule.
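
The preemption rule in the second difference can be illustrated with a small simulation. The sketch below is not the scheduling algorithm of this paper; it is a simplified illustration that assumes integer time ticks, ignores precedence constraints and SPM allocation, and uses hypothetical names (Task, schedule). A running task keeps the processor, and is preempted only when a released task with an earlier deadline could no longer meet that deadline by waiting.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record; field names are assumptions for this sketch.
    name: str
    release: int   # earliest start time
    wcet: int      # execution time in ticks
    deadline: int  # latest completion time


def schedule(tasks):
    """Simulate one tick at a time; return (timeline, preemptions, feasible)."""
    remaining = {t.name: t.wcet for t in tasks}
    finish = {}
    timeline = []
    preemptions = 0
    current = None
    now = 0
    while any(remaining.values()):
        ready = [t for t in tasks if t.release <= now and remaining[t.name] > 0]
        if current is not None and remaining[current.name] == 0:
            current = None  # the running task has completed
        if current is None:
            # Processor is free: start the ready task with the earliest deadline.
            if ready:
                current = min(ready, key=lambda t: t.deadline)
        else:
            # Preempt only if an earlier-deadline task would miss its deadline
            # by waiting for the running task to finish first.
            done_at = now + remaining[current.name]
            urgent = [t for t in ready
                      if t.deadline < current.deadline
                      and done_at + remaining[t.name] > t.deadline]
            if urgent:
                current = min(urgent, key=lambda t: t.deadline)
                preemptions += 1
        if current is None:
            timeline.append(None)  # idle tick, waiting for a release
        else:
            timeline.append(current.name)
            remaining[current.name] -= 1
            if remaining[current.name] == 0:
                finish[current.name] = now + 1
        now += 1
    feasible = all(finish[t.name] <= t.deadline for t in tasks)
    return timeline, preemptions, feasible


if __name__ == "__main__":
    # B has the earlier deadline but can wait, so A is not preempted.
    print(schedule([Task("A", 0, 4, 10), Task("B", 1, 2, 8)]))
    # B cannot wait for A to finish, so A is preempted once.
    print(schedule([Task("A", 0, 4, 10), Task("B", 1, 2, 4)]))
```

In the first call, plain preemptive EDF would preempt A when B is released at time 1, whereas the rule above lets A run to completion because B still meets its deadline afterwards.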
VII. CONCLUSION 
We have proposed a unified approach to the problem of
scheduling a set of tasks with individual release times, hard
deadlines, and precedence constraints on a single processor,
where an SPM replaces the data cache to store the stack data
of each task, and to the problem of allocating the SPM to each task.
Our approach consists of two algorithms: a task scheduling
algorithm and an SPM allocation algorithm. The former aims at
minimizing the number of preemptions by using a mix of preemptive
and non-preemptive EDF scheduling strategies. The
latter employs a novel data structure, namely, the preemption
graph, to allocate SPM to each task. Our simulation results
show that our unified approach performs better than the approach
proposed in [19]. Our future work is to extend our approach
to multiprocessor systems.
ACKNOWLEDGMENTS 
This research is supported by the Australian Research
Grants: DP0881330 and DP110104628.
REFERENCES 
[1]  Jean-Francois Deverge and Isabelle Puaut. WCET-directed
dynamic scratchpad memory allocation of data.
In ECRTS, pages 179-190, 2007.
[2]  Bernhard Egger, Chihun Kim, Choonki Jang, Yoonsung
Nam, Jaejin Lee, and Sang Lyul Min. A dynamic code
placement technique for scratchpad memory using postpass
optimization. In CASES, pages 223-233, 2007.
[3]  Bernhard Egger, Jaejin Lee, and Heonshik Shin. Dynamic
scratchpad memory management for code in portable systems
with an MMU. In TECS, 7(2), 2008.
[4]  Bernhard Egger, Jaejin Lee, and Heonshik Shin. Scratchpad
memory management in a multitasking environment.
In EMSOFT, pages 265-274, 2008.
[5]  Fadia Nemer, Hugues Cassé, Pascal Sainrat, Jean-Paul
Bahsoun, and Marianne De Michiel. Papabench: A free real-time
benchmark. In Workshop on Worst-Case Execution
Time Analysis, 2006.
[6]  Gernot Gebhard and Sebastian Altmeyer. Optimal task
placement to improve cache performance. In EMSOFT,
pages 259-268, 2007.
[7]  Jan Gustafsson, Adam Betts, Andreas Ermedahl, and
Björn Lisper. The Mälardalen WCET Benchmarks - Past,
Present and Future. In Workshop on Worst-Case Execution
Time Analysis, pages 137-147, 2010.
[8]  J. K. Lenstra, A. H. G. Rinnooy Kan, and P. Brucker.
Complexity of machine scheduling problems. Annals of
Discrete Mathematics, 1:343-362, 1977.
[9]  Lian Li, Hui Feng, and Jingling Xue. Compiler-directed
scratchpad memory management via graph coloring.
TACO, 6(3):9:1-9:17, October 2009.
[10]  Lian Li, Quan Hoang Nguyen, and Jingling Xue. Scratchpad
allocation for data aggregates in superperfect graphs.
In LCTES, pages 207-26, 2007.
[11]  Lian Li, Jingling Xue, and Jens Knoop. Scratchpad memory
allocation for data aggregates via interval coloring in
superperfect graphs. TECS, 10(2):28:1-28:42, January
2011.
[12]  Xianfeng Li, Yun Liang, Tulika Mitra, and Abhik Roychoudhury.
Chronos: A timing analyzer for embedded
software. Science of Computer Programming,
69(1-3):56-67, 2007.
[13]  Jeff Scott, Lea Hwang Lee, John Arends, and Bill Moyer.
Designing the low-power M*CORE architecture. In IEEE
Power Driven Microarchitecture Workshop, pages 145-150,
1998.
[14]  SNU. SNU Real-Time Benchmarks.
http://www.cprover.org/goto-cc/examples/snu.html.
[15]  John A. Stankovic. Deadline scheduling for real-time systems:
EDF and related algorithms. Springer, 1998.
[16]  John A. Stankovic. Scheduling algorithms. Springer,
2007.
[17]  Vivy Suhendra, Tulika Mitra, and Abhik Roychoudhury.
WCET centric data allocation to scratchpad memory. In
RTSS, pages 223-232, 2005.
[18]  Vivy Suhendra, Chandrashekar Raghavan, and Tulika Mitra.
Integrated scratchpad memory optimization and task
scheduling for MPSoC architectures. In CASES, pages
401-410, 2006.
[19]  Vivy Suhendra, Abhik Roychoudhury, and Tulika Mitra.
Scratchpad allocation for concurrent embedded software.
In TOPLAS, 32(4), 2010.
[20]  Hideki Takase, Hiroyuki Tomiyama, and Hiroaki Takada.
Allocation of scratchpad memory in priority-based multi-task
systems. In VLSI-DAT, pages 68-71, 2009.
[21]  Hideki Takase, Hiroyuki Tomiyama, and Hiroaki Takada.
Partitioning and allocation of scratchpad memory for
priority-based preemptive multi-task systems. In DATE,
pages 1124-1129, 2010.
[22]  Xavier Vera, Björn Lisper, and Jingling Xue. Data cache
locking for tight timing calculations. TECS, 7(1):4:1-4:38,
December 2007.
[23]  Qing Wan, Hui Wu, and Jingling Xue. WCET-aware
data selection and allocation for scratchpad memory. In
LCTES, pages 41-50, 2012.
[24]  Hui Wu, Jingling Xue, and Sri Parameswaran. Optimal
WCET-aware code selection for scratchpad memory. In
EMSOFT, pages 59-68, 2010.