Real-time Scheduling of Concurrent
Transactions in Multi-domain Ring Buses by Bui, Bach Duy et al.
Bach D. Bui, Rodolfo Pellizzoni, Marco Caccamo
University of Illinois at Urbana-Champaign
Department of Computer Science
{bachbui2, rpelliz2, mcaccamo}@illinois.edu
Abstract—We address the problem of scheduling concurrent periodic real-time transactions on Multi-Domain Ring Bus (MDRB).
The problem is challenging because although the bus allows multiple non-overlapping transactions to be executed concurrently, the
degree of concurrency depends on the topology of the bus and of executed transactions. This is a different challenge compared to
that of multi-processor real-time task scheduling. To solve this problem, first, we propose two novel efficient scheduling algorithms
for topographically-acyclic transaction sets. The first algorithm is optimal for transaction sets under restrictive assumptions while
the second one induces a competitive sufficient schedulable utilization bound for more general transaction sets. Then, we extend
these two algorithms for the scheduling of topographically-cyclic transaction sets. Extensive simulations show that the proposed
algorithms can schedule transaction sets with high bus utilization and outperform related works in most practical settings.
The implementation of the algorithms in a real test-bed shows that they have relatively low execution-time overhead.
Index Terms—real-time communications, real-time scheduling, real-time Network-on-Chip scheduling.
1 INTRODUCTION
Demand for high-performance computing systems
has recently created significant interest in many-core
Systems-on-Chip (SoC) in both industry and
academia [9], [10], [7]. In a many-core SoC, processing
cores and other parts of the system are interconnected
by a Network-on-Chip (NoC) [9], [10]. Since the
performance of NoC can greatly affect the performance
of the system as a whole, there have been many research
efforts on NoC architectures [2]. Commercial many-core
SoC with software-controlled NoC have also been de-
veloped. For example, the IBM Cell Broadband Engine
processor (CellBE) [9] is well-distinguished for its high
performance. The CellBE consists of twelve elements:
a PowerPC core, eight Synergistic Processing Elements
(SPE), a memory controller and two I/O controllers. All
elements are connected through a high-speed ring bus.
Many-core SoC with software-controlled NoC is well
suited for real-time applications. Consider highly crit-
ical real-time systems such as avionics and medical
systems. A typical task in such systems executes the fol-
lowing activities: 1) collecting data from sensors that are
tracking some physical events; 2) processing the data;
3) sending processed data or control signals to other
tasks or actuators. An example can be a multipurpose
status display task on an avionic system [18] which
shows the status of all aircraft avionics devices. The task
periodically gets data from I/O devices such as radars
every dozen milliseconds, then processes the data
before sending information to a display task. A good
implementation model for this software system on a
many-core SoC is the thread streaming model [1]. In this
model, different real-time tasks run on different process-
ing elements. Real-time data transactions between tasks,
I/O devices and main memory are executed through
the NoC. The key idea is that since NoC accesses are
software-controlled, software designers can schedule
these data transactions deterministically.
In this paper, we study the real-time data transaction
scheduling problem for a specific NoC architecture, the
Multi-Domain Ring Bus (MDRB) (which is similar to
that used in the CellBE). In a MDRB, bus elements are
connected through routers arranged in a ring configu-
ration. Transactions that do not overlap (i.e., they are
not routed through the same bus segment between
routers) can be transferred concurrently. MDRB has
been implemented in commercial systems [9], [3] and
is a proven cost-effective high-performance solution for
heterogeneous many-core SoC. The ring architecture
requires fewer wires than other
common NoC architectures such as torus and mesh. As
a consequence, this architecture is simpler and supports
higher clock rates. At the same time, network utiliza-
tion can be significantly higher compared to a simple
shared bus because non-overlapping transactions can
be transmitted in parallel.
However, exploiting such parallelism to maximize
MDRB utilization while still providing strict real-time
guarantees is challenging. At first glance, scheduling
transactions in a MDRB bears similarity to parallel
scheduling of tasks on multiprocessors, but there are
major differences between the two problems. In par-
ticular, the degree of concurrency in a multiprocessor
depends on the number of processing elements. However,
the degree of concurrency in a MDRB depends on
the topology of the bus and executed transactions.
To tackle this problem, we advocate the use of slot-based
scheduling, in which the timeline is divided into
consecutive equal slots and transactions are scheduled
on contention-free slots. This type of scheduling has
been used to build the PFair algorithm [5] and the
Boundary-Fair algorithm [25], which are optimal
scheduling algorithms for multiprocessors. Although
this approach requires synchronization between bus
elements and computation at the end nodes (i.e. bus
elements), it can significantly reduce the implementation
complexity of a real-time NoC because it eliminates the
need for buffers and arbiters at the routers. The slot-based
scheduling model has been successfully implemented
in the Aethereal NoC [11], a guaranteed-service
NoC developed at Philips Research.
This research has two main contributions. First, we
propose two novel scheduling algorithms POBase and
POGen for real-time transaction sets which are topo-
graphically acyclic. These are transaction sets whose
transaction overlaps do not create a cycle on the bus.
POBase is optimal under a restrictive assumption on
transactions’ periods. Meanwhile, POGen induces a suf-
ficient schedulable utilization bound for transaction sets
without the restrictive assumption. We will show that
the bound is highly competitive for typical MDRB im-
plementations. Second, we propose cPOGen which ex-
tends POGen for topographically-cyclic transaction sets.
To the best of our knowledge, our algorithms are the
first dynamic-priority algorithms proposed for MDRB.
Most previous works [23], [22], [16], [4] have focused on
fixed-priority scheduling. Extensive simulations show
that our approach allows much higher utilization for
typical MDRB compared to the related works. This gain
in performance comes at the cost of a
higher algorithm overhead. However, we will show in
our implementation that the overhead is relatively small
in typical system settings.
The paper is organized as follows. We survey related
works in Section 2. Section 3 defines the real-time bus
transactions and the scheduling model. In Section 4,
we discuss our proposed real-time scheduling algo-
rithms for MDRB starting from a simpler case, that of
topographically-acyclic transactions. Section 5 extends
the algorithms to cyclic transaction sets. Section 6 pro-
vides a simulation-based evaluation of the proposed al-
gorithms, while Section 7 discusses our implementation
of the algorithms in a real system. We conclude our
paper in Section 8.
2 RELATED WORKS
Many of the early works on hard real-time communi-
cation [15], [24], [20] focus on communication between
computers on single-domain bus networks. In these net-
works, only one transaction can be transferred on a bus
at any time because the bus is shared between all trans-
actions. A system with multiple buses is considered
in [12]. However, each bus in the system still has one
domain. Since a single-domain bus bears a similarity
to single-processor systems, the traditional real-time
scheduling theory for single-processor systems [17] is
applied or extended to solve the problem in these
works. The many-core SoC in which we are interested
have multi-domain buses where non-overlapping trans-
actions can be transferred concurrently. In addition, the
number of domains on a bus is determined by the
topology of bus transactions.
There has also been significant research focused on
real-time communication on multi-domain buses. Most
of these works [22], [23], [4], [16], [19] are concerned
with the fixed-priority scheduling paradigm. For exam-
ple, in [22], [23], Zheng et al. propose a solution to
optimally assign fixed priorities to real-time transac-
tions and a method to analyze the worst-case trans-
action latency (WTL) under a fixed-priority scheduling
algorithm. Although our work has the same assumption
about multi-domain buses, our proposed scheduling
algorithm is based on the dynamic-priority scheduling
paradigm. To the best of our knowledge, our research
is the first to do so. As will be shown in the evaluation
section, the performance of our approach on a typical
MDRB is better than that of related works.
A preliminary version of this work has appeared in
[8]. Compared to [8], this paper has the following
improvements. First, we modified both the POBase and
POGen algorithms. The modifications result in algorithms with
significantly lower time complexity than those of [8]. Second,
we developed a complete solution for topographically-cyclic
transaction sets, which is not in [8]. Third, we
added an evaluation section where we extensively
compared the performance of the proposed algorithms with
the state of the art. Finally, we have fully implemented
the algorithms in a real system and performed
experiments to measure the algorithm overhead.
3 REAL-TIME BUS TRANSACTION AND
SCHEDULING MODEL
A model of the Multi-Domain Ring Bus (MDRB), in
which we are interested, is shown in Figure 1. Our
research focuses on MDRB with one bidirectional ring
since they are most common; however, the proposed
scheduling algorithms can also be applied to MDRB
with two unidirectional rings (one clockwise and one
counterclockwise) by analyzing each direction sepa-
rately. In the MDRB shown in Figure 1, each bus
element is connected to the MDRB through a dedicated
bus router that is able to transmit a transaction toward
its destination. The MDRB has a ring structure in which
each router has direct connections through two bus
segments to its two neighboring routers.
We study systems where applications running on
multiple, possibly heterogeneous processing cores (i.e.
bus elements) exchange data through a MDRB. A data
transaction is defined as a request made by an appli-
cation to transfer a certain amount of data between
two bus elements. We consider a scheduling problem
where applications request periodic data transactions,
each comprising an infinite sequence of jobs. Each data
transaction is divided at the router level into multiple
fixed-size packets called atomic transactions, which are
then transferred hop-by-hop from the source to the
destination. We do not impose specific constraints on
the way routers and bus segments between them are
implemented: worm-hole routers [21] are particularly
suitable in networks-on-chip, but store-and-forward im-
plementations are also possible. We assume that each
data transaction has a fixed route which consists of
bus elements through which it reaches the destination.
Two data transactions overlap and cannot be transferred
concurrently if their routes share the same bus segment
(e.g., each bidirectional bus segment can only be ac-
cessed in one direction at a time). However, multiple
non-overlapping data transactions can be sent at the
same time.
Our selection of the bus scheduling model aims at
providing software designers with an accurate view of
the bus real-time performance, while abstracting away
the details of the low-level physical implementation. To
this end, the proposed bus scheduling model does not
schedule data transfers in terms of atomic transactions.
Instead, the bus elements which are the endpoints of a
data transaction are synchronized and programmed to
transfer a fixed portion of the data transaction at a time.
Each portion consists of multiple atomic transactions.
We call this portion of data a unit transaction. All
unit transactions have an equal transmission time that
is a slot, and each data transaction typically requires
multiple slots (i.e. it is composed of multiple unit
transactions). All scheduling decisions are made on a
slot-by-slot basis.
Our model effectively abstracts most low-level
implementation details for two main reasons. 1) Since
overlapping data transfers are never executed simul-
taneously, data packets (i.e. atomic transactions) never
contend for access to the bus. Therefore, the schedule
of atomic transactions is entirely predictable rather than
being dictated by specific low-level atomic arbitration
algorithms and/or router switching techniques. 2) The
variable end-to-end delay required to transfer an atomic
transaction from source to destination is hidden by the
slot abstraction. In particular, the transmission time of
a unit transaction includes the maximum transmission
delay between any two elements. The rest of this
section introduces the notation, terminology and basic
results that will be used in the scheduling algorithms
of Sections 4 and 5.
3.1 Data Transaction Model
Let bus elements be indexed with a unique number
in $[1, N^{\epsilon}]$ where $N^{\epsilon}$ is the number of bus elements.
Fig. 1. Bus Architecture and Acyclic transaction set (a ring of 12 bus elements, each attached to a router, carrying transactions $\tau_1$ to $\tau_8$)
Fig. 2. Indexed straight line representation (element indexes 1 to 13, transactions $\tau_1$ to $\tau_8$)
We define $T$ as the set of data transactions: $T = \{\tau_i : i \in [1, N^{\tau}]\}$.
A data transaction $\tau_i$ is characterized
by a tuple $\tau_i = (e_i, p_i, \epsilon_i^1, \epsilon_i^2)$ where $e_i$ is the time
that the bus spends to transmit a job of $\tau_i$, and $p_i$ is the
period of $\tau_i$. Each job must complete within its period,
i.e. relative deadlines are equal to periods. $\epsilon_i^1$ and $\epsilon_i^2$,
where $\epsilon_i^1 \neq \epsilon_i^2$, are the indexes of the source and
destination bus elements of the transaction. $\epsilon_i^1$, $\epsilon_i^2$ are
called the first and second endpoint of $\tau_i$, respectively.
A transaction has two endpoints $\epsilon_i^1$ and $\epsilon_i^2$ if its route
uses all consecutive routers from element $\epsilon_i^1$ to $\epsilon_i^2$ in the
clockwise direction. Transaction $\tau_i$ is said to go through
element $\epsilon$ if $\epsilon \neq \epsilon_i^1$, $\epsilon \neq \epsilon_i^2$ and element $\epsilon$ is on the
route of $\tau_i$. The bus utilization $u_i$ of $\tau_i$ is calculated as:
$u_i = e_i / p_i$. We assume that all data transactions arrive
at time 0. Let the hyper-period $h$ of $T$ be the least common
multiple of the periods of all transactions in $T$.
Two transactions are said to overlap, and cannot be
transferred concurrently on the bus, if they use the same
bus segment. Given a data transaction set $T$, we define
an overlap indicating function $OV : T \times T \mapsto \{0, 1\}$
where $OV(\tau_i, \tau_j) = 1$ if $\tau_i$ and $\tau_j$ overlap, and 0
otherwise. Figure 1 shows a transaction set where $\tau_1$, $\tau_2$,
$\tau_3$, and $\tau_4$ overlap each other but they do not overlap
$\tau_7$.
A pairwise overlap set (PO-set) $D$ is defined as a maximal
subset of $T$ such that $\forall \tau_i, \tau_j \in D : OV(\tau_i, \tau_j) = 1$.
For convenience, we consider that a transaction that
does not overlap any transactions belongs to a PO-set
that contains only that transaction. In general a transaction
may belong to more than one PO-set. Figure 1
shows an example of a transaction set with four PO-sets:
$D_1 = \{\tau_1, \tau_2, \tau_3, \tau_4\}$, $D_2 = \{\tau_2, \tau_4, \tau_5\}$, $D_3 = \{\tau_4, \tau_5, \tau_6\}$,
$D_4 = \{\tau_7, \tau_8\}$. Let the total number of PO-sets in a
transaction set be $N^D$. Since each PO-set contains at
least one element different from those of other PO-sets
and transactions are arranged in a one-dimensional
space, $N^D \le N^{\tau}$.
Fig. 3. Cyclic transaction set
A transaction set is said to be acyclic if there exists a
bus element which has no transaction going through.
The transaction set is cyclic otherwise. Figure 1 shows
an example of an acyclic transaction set in which elements 1, 7, and 8
have no transaction going through, whereas Figure 3
shows an example of a cyclic transaction set. For ease of
identifying the first and second endpoints in the figures,
we depict each transaction $\tau_i$ as an arrow which always
points from the first endpoint to the second endpoint
of $\tau_i$. The direction of the arrow does not imply the
direction of the transaction.
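The acyclicity test above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, and the endpoint values in `FIG1_LIKE` are hypothetical, chosen so that they reproduce the PO-sets and the untraversed elements reported for Figure 1 (the exact endpoints in the figure may differ). Each transaction is given as a (first endpoint, second endpoint) pair whose route runs clockwise on a ring of 1-based element indexes.

```python
from typing import List, Set, Tuple

Transaction = Tuple[int, int]  # (first endpoint, second endpoint), clockwise route

# Hypothetical endpoints for tau_1..tau_8 on a 12-element ring (assumption).
FIG1_LIKE: List[Transaction] = [
    (1, 3), (2, 5), (2, 4), (2, 7), (4, 7), (5, 7), (8, 1), (9, 12),
]


def interior_elements(first: int, second: int, n_elements: int) -> Set[int]:
    """Elements strictly between the two endpoints on the clockwise route."""
    interior: Set[int] = set()
    cur = first % n_elements + 1  # next element clockwise on a 1-based ring
    while cur != second:
        interior.add(cur)
        cur = cur % n_elements + 1
    return interior


def untraversed_elements(transactions: List[Transaction], n_elements: int) -> List[int]:
    """Elements that no transaction goes through; the set is acyclic iff non-empty."""
    crossed: Set[int] = set()
    for first, second in transactions:
        crossed |= interior_elements(first, second, n_elements)
    return [e for e in range(1, n_elements + 1) if e not in crossed]
```

Any element returned by `untraversed_elements` can serve as the first element when the ring is later cut open into a straight line.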
3.2 Scheduling Model
We adopt the discrete scheduling model of PFair
scheduling used in [5]. In this model scheduling decisions
are made at integral values, starting from 0.
The real interval between time $t \in \mathbb{N}$ and time $t + 1$,
i.e. $[t, t + 1)$, is called slot $t$. We assume that every
transaction's execution time and period are multiples
of slots. Hereafter, we will use a slot as a time unit
unless specified otherwise. A schedule $S$ is defined as
a function $S : T \times \mathbb{N} \mapsto \{0, 1\}$ where $S(\tau_i, t) = 1$ if and
only if $\tau_i$ is scheduled at slot $t$. A schedule $S$ is valid
if and only if, according to $S$, a transaction is never
scheduled in the same slot together with
one or more other transactions that overlap with it.
Given the constraint on overlapping transactions, a
necessary condition on the schedulability of a transaction
set can be easily derived as in Theorem 3.1.
Theorem 3.1: A transaction set $T$ is schedulable only
if:
$$\forall D \subseteq T : \; u_D = \sum_{\tau_i \in D} u_i \le 1 \tag{3.1}$$
Proof: Since, by definition, no two transactions of a
PO-set $D$ can be scheduled concurrently, all transactions
of $D$ must be scheduled in sequence. In other words, the
transactions of $D$ can be considered to be sharing one
resource. Therefore, Inequality 3.1 must be satisfied.
Let E(k) be a set of all transactions in T that use the
bus segment between bus element k and k + 1. The
following lemma is necessary for later discussion.
Lemma 3.1: Given a transaction set T that satisfies the
necessary condition, the following inequality holds.X
8i2E(k)
ui  1
Proof: Since transactions in E(k) pairwise overlap,
there exists D such that E(k)  D. Therefore the lemma
is implied by Theorem 3.1.
3.3 Indexed Straight-line Representation of Acyclic
Transaction Sets
For ease of presentation, hereafter we use the indexed
straight-line representation described below to model
acyclic transaction sets. Given an acyclic transaction
set, we select a bus element which has no transaction
going through to be the first element. Then, the bus
elements are indexed in ascending order from 1 to $N^{\epsilon}$ in the
clockwise direction, with the first element's index being
1. Bus elements in Figure 1 are indexed following this
definition. Since there is no transaction going through
element 1, the overlaps between transactions in the
acyclic transaction set remain the same if we apply the
following transformation: 1) let bus element $N^{\epsilon}$, instead
of connecting to bus element 1, connect to an additional
bus element which is indexed $N^{\epsilon} + 1$; 2) change every
transaction which has the second endpoint at 1, i.e.
$\tau_i = (e_i, p_i, \epsilon_i^1, 1)$, to be $\tau_i = (e_i, p_i, \epsilon_i^1, N^{\epsilon} + 1)$. For
example, in Figure 1, $\tau_7 = (e_7, p_7, 8, 1)$ will be changed
to be $\tau_7 = (e_7, p_7, 8, 13)$. Since the overlaps between
transactions are still the same after the transformation, a
valid schedule of the transformed transaction set is also
a valid schedule of the original transaction set and vice
versa. Given this transformation, the acyclic transaction
set can be represented as a set of overlapping line
intervals on an indexed straight line where each line
interval corresponds to a transaction and the straight
line is indexed from 1 to $N^{\epsilon} + 1$. Figure 2 shows the
indexed straight line representation of the transaction
set shown in Figure 1. The following properties follow
directly from the straight-line representation of an acyclic
transaction set.
Property 1: For every transaction $\tau_i$, we have $\epsilon_i^1 < \epsilon_i^2$.
Property 2: For every pair of transactions $\tau_i$ and $\tau_j$, $\tau_i$ and $\tau_j$
overlap if and only if $\epsilon_i^1 < \epsilon_j^2$ and $\epsilon_j^1 < \epsilon_i^2$.
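As an illustration of how the straight-line representation can be used, the following Python sketch applies Property 2 as an overlap test, enumerates PO-sets, and checks the necessary condition of Theorem 3.1. It is a sketch under assumptions rather than part of the proposed algorithms: the function names are ours, the endpoints in `EXAMPLE` are hypothetical values consistent with the PO-sets listed for Figure 1, and the periods and execution times are those of the same-period example (period 8). PO-sets are found as the maximal sets of transactions sharing one bus segment, which relies on the fact that pairwise-overlapping intervals on a line always share a common segment.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List, Tuple

# name -> (e_i, p_i, first endpoint, second endpoint) on the indexed straight line
TxSet = Dict[str, Tuple[int, int, int, int]]

# Hypothetical endpoints (assumption); tau_7 already ends at N + 1 = 13.
EXAMPLE: TxSet = {
    "t1": (2, 8, 1, 3), "t2": (1, 8, 2, 5), "t3": (2, 8, 2, 4), "t4": (3, 8, 2, 7),
    "t5": (4, 8, 4, 7), "t6": (1, 8, 5, 7), "t7": (4, 8, 8, 13), "t8": (4, 8, 9, 12),
}


def overlap(a: Tuple[int, int, int, int], b: Tuple[int, int, int, int]) -> bool:
    """Property 2: two line intervals overlap iff each starts before the other ends."""
    return a[2] < b[3] and b[2] < a[3]


def po_sets(txs: TxSet) -> List[FrozenSet[str]]:
    """Maximal sets of transactions that share one bus segment (k, k + 1)."""
    last = max(t[3] for t in txs.values())
    per_segment = {
        frozenset(n for n, t in txs.items() if t[2] <= k < t[3])
        for k in range(1, last)
    }
    per_segment.discard(frozenset())
    # Keep only the sets that are not strictly contained in another one.
    maximal = [s for s in per_segment if not any(s < o for o in per_segment)]
    return sorted(maximal, key=sorted)


def satisfies_necessary_condition(txs: TxSet) -> bool:
    """Theorem 3.1: every PO-set utilization must be at most 1."""
    return all(
        sum(Fraction(txs[n][0], txs[n][1]) for n in d) <= 1 for d in po_sets(txs)
    )
```

Exact rational arithmetic (`Fraction`) is used so that a PO-set with utilization exactly 1 is not rejected because of floating-point rounding.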
We study the scheduling problem for acyclic transac-
tion sets in Section 4. We then extend our solution to
cyclic transaction sets in Section 5.
4 SCHEDULING ALGORITHMS FOR ACYCLIC
TRANSACTIONS
In this section we present our scheduling algorithms
for the proposed real-time transaction sets on the ring
buses. The discussion is divided into two parts.
First, we propose an algorithm, namely POBase,
which schedules acyclic transaction sets whose
transactions have the same period. We will prove that
the necessary condition (Theorem 3.1) is also a sufficient
condition for a same-period acyclic transaction set to
be schedulable by POBase. Therefore, POBase is optimal
for these transaction sets.
Second, a scheduling algorithm, namely POGen, is
proposed to schedule acyclic transaction sets whose
transactions do not have the same period. POGen, which
builds on POBase, is an online algorithm. POGen
can schedule all transaction sets whose PO-set utilizations
satisfy the following utilization bound:
$$\forall D \subseteq T : \; u_D \le \frac{L - 1}{L}, \tag{4.1}$$
where $L$ is defined as the greatest common divisor of
all transaction periods. Although the utilization bound
is only sufficient, it approaches 1 when $L$ is large. We
believe that $L$ is large in most practical
real-time applications [18]. As we will show in the
implementation section, with the speed of state-of-the-art
many-core SoC [3], the practical slot size is about
10 µs to 100 µs (which is also the size of a time unit
in our definition). Meanwhile, the period granularity
in practical real-time applications [18] is at the level of
milliseconds. That means $L$ has practical values ranging
from 10 to 100 slots. This results in a utilization bound
between 0.9 and 0.99.
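The arithmetic behind these figures can be reproduced with a short Python sketch. The function name and the sample period values are assumptions made for illustration; periods are expressed in slots.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import Iterable


def pogen_bound(periods_in_slots: Iterable[int]) -> Fraction:
    """Return (L - 1) / L, where L is the GCD of all periods (in slots)."""
    L = reduce(gcd, periods_in_slots)
    return Fraction(L - 1, L)


# Periods of 2 ms, 5 ms and 10 ms with a 100 us slot: 20, 50 and 100 slots.
coarse_slots = pogen_bound([20, 50, 100])
# The same periods with a 10 us slot: 200, 500 and 1000 slots.
fine_slots = pogen_bound([200, 500, 1000])
```

With these sample values, $L$ is 10 in the first case and 100 in the second, which corresponds to the two ends of the range quoted above.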
4.1 The POBase algorithm
The problem of acyclic same-period transaction set
scheduling is similar to the Interval Coloring Problem
[14]. More specifically, the latter is a special case of the
former because it assumes that all transactions have
the same execution time. Hence, the coloring algorithm
in [14] can only handle this special case. Our proposed
algorithm POBase is a new algorithm to solve
the problem at hand. POBase is a first-fit algorithm
with respect to a transaction ordering. More specifically,
in POBase, the transactions are ordered in ascending order
of their first endpoint (stored in list $L$). Then, each
transaction in $L$ is assigned to the earliest slots to which
no smaller-ordered overlapping transaction has
already been assigned$^1$. This condition is enforced by the
use of array lastEndpoint (Step 6). lastEndpoint has size
equal to the transactions' period $p$. The initial value of
all items of lastEndpoint is 1, which is also the smallest
index of the bus elements. Except for the initial value,
during the algorithm execution, the value of item $t$ of
lastEndpoint will be the second endpoint (Step 8) of the
last transaction that has been assigned to slot $t$. Since
transactions are being assigned in ascending order of
their first endpoints, if condition $lastEndpoint[t] \le \epsilon_i^1$
in Step 6 is satisfied then $\tau_i$ does not overlap with any
transaction that has been assigned to $t$ before $\tau_i$. We
will formally prove this statement in Lemma 4.2. This
proof requires Lemma 4.1. Finally, the proof of POBase's
optimality will be shown in Theorem 4.1.
Figure 4 shows an example of the schedule generated
by POBase for the transaction set shown in Figure 1
whose transactions have period equal to 8 and execution
times: $e_1 = 2$, $e_2 = 1$, $e_3 = 2$, $e_4 = 3$, $e_5 = 4$, $e_6 = 1$,
$e_7 = 4$, $e_8 = 4$. Consider the schedule of the transactions
of $D_2 = \{\tau_2, \tau_4, \tau_5\}$. $\tau_5$ is scheduled in slots $\{0, 1, 3, 4\}$,
because its smaller-ordered overlapping transactions $\tau_2$
and $\tau_4$ are scheduled in slots $\{2, 5, 6, 7\}$.
Fig. 4. An example of the POBase algorithm (schedule of $\tau_1$ to $\tau_8$ over slots 0 to 8)
1. The transactions can also be ordered by their second endpoint,
with the schedule generated in descending order of the ordered list.
Lemma 4.1: At each iteration of Step 6, for every $\tau_j$
which has been assigned to slot $t$, $\epsilon_j^2 \le lastEndpoint[t]$.
Proof: We prove by induction.
Base case: Consider the first iteration. The lemma holds
because no transaction has been assigned yet.
Induction case: Assume that the lemma holds at iteration
$k$ where $\tau_j$ is being assigned; we will prove that it
also holds at iteration $k + 1$. Consider slot $t$ to which
$\tau_j$ is assigned. Due to the condition at Step 6, we have
$lastEndpoint[t] \le \epsilon_j^1$. Then by the induction assumption,
we have: for every $\tau_l$ that has been assigned to $t$
before iteration $k$, $\epsilon_l^2 \le \epsilon_j^1$. Furthermore, when $\tau_j$ is
assigned to slot $t$, we have $lastEndpoint[t] = \epsilon_j^2$
after Step 8. Since, by Property 1 of the indexed straight-line
representation, $\epsilon_j^1 < \epsilon_j^2$, the lemma holds at
iteration $k + 1$.
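Algorithm 1 POBase
Input: $T$ such that $\forall \tau_i \in T : p_i = p$
Output: schedule $S$ for period $p$
1: $L \leftarrow$ list of $\forall \tau_i \in T$ in ascending order of $\epsilon_i^1$
2: $\forall t \in [0, p) : lastEndpoint[t] \leftarrow 1$
3: for each $\tau_i \in L$ do
4:     $r \leftarrow e_i$
5:     for each $t \in [0, p)$ do
6:         if $lastEndpoint[t] \le \epsilon_i^1$ then
7:             $S(\tau_i, t) \leftarrow 1$
8:             $lastEndpoint[t] \leftarrow \epsilon_i^2$
9:             $r \leftarrow r - 1$
10:            if $r = 0$ then
11:                break // complete schedule assignment of $\tau_i$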
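A minimal Python sketch of Algorithm 1 follows, with comments keyed to the step numbers. The function name, the dictionary-based data layout, and the endpoint values in `EXAMPLE` are assumptions made for illustration; the endpoints are hypothetical values consistent with the PO-sets of Figure 1, while the execution times and the period of 8 are those of the example schedule discussed for Figure 4. The sketch raises an error if a transaction cannot be fully placed, a case the pseudocode leaves implicit.

```python
from typing import Dict, List, Tuple

# name -> (e_i, first endpoint, second endpoint); all transactions share period p
TxSet = Dict[str, Tuple[int, int, int]]

# Hypothetical endpoints on the indexed straight line (assumption).
EXAMPLE: TxSet = {
    "t1": (2, 1, 3), "t2": (1, 2, 5), "t3": (2, 2, 4), "t4": (3, 2, 7),
    "t5": (4, 4, 7), "t6": (1, 5, 7), "t7": (4, 8, 13), "t8": (4, 9, 12),
}


def pobase(txs: TxSet, p: int) -> Dict[str, List[int]]:
    """First-fit slot assignment in ascending order of first endpoint."""
    order = sorted(txs, key=lambda name: txs[name][1])    # Step 1 (stable sort)
    last_endpoint = [1] * p                                # Step 2
    schedule: Dict[str, List[int]] = {name: [] for name in txs}
    for name in order:                                     # Step 3
        remaining, first, second = txs[name]               # Step 4
        for t in range(p):                                 # Step 5
            if remaining == 0:                             # Steps 10-11
                break
            if last_endpoint[t] <= first:                  # Step 6
                schedule[name].append(t)                   # Step 7
                last_endpoint[t] = second                  # Step 8
                remaining -= 1                             # Step 9
        if remaining > 0:
            raise ValueError(f"{name} cannot be fully scheduled in {p} slots")
    return schedule


example_schedule = pobase(EXAMPLE, 8)
```

With these assumed endpoints, `example_schedule["t5"]` is the slot list `[0, 1, 3, 4]`, in agreement with the schedule described for $\tau_5$.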
Lemma 4.2: Slot $t$ has not been assigned to any
transaction overlapping $\tau_i$ if and only if $lastEndpoint[t] \le \epsilon_i^1$.
Proof:
Necessary condition: We prove this condition by showing
that if $lastEndpoint[t] > \epsilon_i^1$ then slot $t$ has
been assigned to a transaction that overlaps $\tau_i$. Since
$lastEndpoint[t] > \epsilon_i^1$, there must exist a transaction $\tau_j$
that has been assigned to $t$ before $\tau_i$ where $\epsilon_j^2 =
lastEndpoint[t] > \epsilon_i^1$. And since $\tau_j$ has been assigned to
the slot before $\tau_i$, $\epsilon_i^1 \ge \epsilon_j^1$. Therefore, we have: $\epsilon_j^2 > \epsilon_i^1$
and $\epsilon_i^2 > \epsilon_j^1$. Then, by Property 2 of the indexed straight-line
representation, $\tau_i$ and $\tau_j$ overlap.
Sufficient condition: If $lastEndpoint[t] \le \epsilon_i^1$, then by
Lemma 4.1 we have that: for every $\tau_j$ that has been
assigned to slot $t$ before $\tau_i$, $\epsilon_j^2 \le lastEndpoint[t] \le \epsilon_i^1$.
Therefore, by Property 2 of the indexed straight-line
representation, $\tau_j$ and $\tau_i$ do not overlap.
Theorem 4.1: POBase is optimal for same-period
acyclic transaction sets.
Proof: The generated schedule is valid because a
transaction is not scheduled in the same slot with its
overlapping transactions (Lemma 4.2). It remains to
show that if a transaction set satisfies the necessary
condition, then at the end of the algorithm,
$$\forall \tau_i : \; \sum_{x \in [0, p)} S(\tau_i, x) = e_i. \tag{4.2}$$
We will prove this by induction.
Base case: Consider the first iteration of the for-loop
starting at Step 3. In this iteration, the schedule of $\tau_1$
in $L$ is generated. Since all items of lastEndpoint have
value 1, the condition at Step 6 is satisfied for every $t$.
Furthermore, we have $e_1 \le p$. Therefore, at the end
of the iteration, Equation 4.2 must hold for $\tau_1$, i.e.
$\sum_{x \in [0, p)} S(\tau_1, x) = e_1$.
Induction case: Assume that after iteration $k$ of the for-loop
starting at Step 3, Equation 4.2 holds for all transactions
$\{\tau_i : i \in [1, k]\}$. We will prove that Equation 4.2
also holds for $\tau_{k+1}$ after iteration $k + 1$. By contradiction,
assume that at the end of iteration $k + 1$,
$\sum_{x \in [0, p)} S(\tau_{k+1}, x) < e_{k+1}$. Let $E(\epsilon_{k+1}^1)$ be the set of
transactions that use the bus segment between bus
elements $\epsilon_{k+1}^1$ and $\epsilon_{k+1}^1 + 1$. By the way the transactions
are ordered and Property 2 of the indexed straight-line
representation, we have that $\forall \tau_i \in T$, if the schedule
of $\tau_i$ has been generated before $\tau_{k+1}$ and $\tau_i$ overlaps
$\tau_{k+1}$, then $\epsilon_i^1 \le \epsilon_{k+1}^1 < \epsilon_i^2$, and therefore $\tau_i \in E(\epsilon_{k+1}^1)$.
In other words, among all the
transactions that overlap with $\tau_{k+1}$, only transactions in
$E(\epsilon_{k+1}^1)$ have had their schedule generated. Therefore,
the contradiction assumption occurs only when:
$$\sum_{\tau_i \in E(\epsilon_{k+1}^1)} \sum_{x \in [0, p)} S(\tau_i, x) = p. \tag{4.3}$$
Since the following is true:
$$\forall \tau_i \in E(\epsilon_{k+1}^1) \setminus \{\tau_{k+1}\} : \; \sum_{x \in [0, p)} S(\tau_i, x) \le e_i,$$
by the contradiction assumption and Equation 4.3 we
have: $\sum_{\tau_i \in E(\epsilon_{k+1}^1)} e_i > p$. This contradicts Lemma
3.1, which implies that $\sum_{\tau_i \in E(\epsilon_{k+1}^1)} e_i \le p$. Therefore, at
the end of the iteration, Equation 4.2 must hold for $\tau_{k+1}$.
This completes the proof.
POBase Analysis: an efficient sorting algorithm has time
complexity $O(N \log(N))$. Furthermore, Steps 6 to 11
require a constant number of operations. Therefore the
time complexity of POBase to build a schedule of $p$ slots
for $N$ transactions is $O(N \max(\log(N), p))$.
4.2 The POGen algorithm
In this subsection we propose an online scheduling
algorithm (POGen) for acyclic transaction sets whose
transactions do not have the same period. In POGen,
the execution timeline from 0 to the hyper-period $h$,
i.e. $[0, h)$, is divided into a set of consecutive scheduling
intervals: $\{int_k = [t_k, t_{k+1}) : k \in \mathbb{N} \wedge 0 \le t_k < t_{k+1} \le h\}$.
Let $|int_k| = t_{k+1} - t_k$. In each scheduling interval
$int_k$, each transaction $\tau_i$ is assigned an interval load $l_i^k$
which is the number of slots in the interval allocated
to schedule $\tau_i$. The interval loads of each transaction
are calculated such that at the end of each interval, the
transaction's execution approximates its execution in
the fluid scheduling model [13]. The interval load of
a PO-set is the sum of the interval loads of its transactions.
Given the interval loads of all transactions in
interval $int_k$, POBase is used to generate the schedule of
$int_k$. As shown in the previous subsection, the interval
schedule given by POBase will be feasible if and only
if:
$$\forall D \subseteq T : \; \sum_{\tau_i \in D} l_i^k \le |int_k|.$$
A schedule of a transaction set, which is generated
by POGen, is feasible if it satisfies the following two
conditions:
Condition 1: for each transaction $\tau_i$, the sum of the
interval loads over the transaction period is equal to $e_i$.
Condition 2: there is a feasible schedule for every
scheduling interval.
In the following paragraphs, we will discuss our solution
to identify the scheduling intervals and the interval
loads which induce a feasible schedule.
Our proposed solution is inspired by the work in [25].
However, since that work does not consider transaction
overlaps, its algorithms cannot
be used for the problem at hand. In POGen, scheduling
intervals must respect two fundamental properties: 1)
the arrival time (also deadline) of any transaction must
coincide with the finishing time of a scheduling interval
and the start time of the next one; 2) the
length of any scheduling interval must be at least $L$,
where $L$ is the greatest common divisor of all transaction
periods. As we will show in Theorem 4.2, the
second property is essential to induce a feasible utilization
bound: intuitively, longer scheduling intervals
allow POGen to better approximate the fluid scheduling
model. There are multiple feasible assignments of
scheduling intervals that respect the two properties.
For example, Figure 6 shows the scheduling intervals
induced by the set of three transactions shown in Figure
5, where $\tau_1 = \{e_1 = 1, p_1 = 2, \eta^1_1 = 1, \eta^2_1 = 3\}$,
$\tau_2 = \{e_2 = 1, p_2 = 3, \eta^1_2 = 1, \eta^2_2 = 4\}$ and
$\tau_3 = \{e_3 = 1, p_3 = 6, \eta^1_3 = 2, \eta^2_3 = 5\}$. In this example, the
scheduling intervals are the intervals between the two
closest arrival times of any two transactions. Note that
by definition of $L$, and since all transactions arrive at
time 0, the minimum length of the scheduling intervals
in the example is indeed $L$. An alternative feasible
definition for scheduling intervals consists in assigning
$t^k = kL$, i.e., all scheduling intervals have fixed length
$L$. By definition of $L$, it then follows that the arrival
time of any transaction coincides with the start time $t^k$
of some interval $int^k$. In the rest of this section, we will
not restrict ourselves to any specific interval
assignment, instead only assuming that scheduling
intervals respect the two fundamental properties.
Fig. 5. Example of three transactions (horizontal axis: element index, 1 to 10)
Fig. 6. Scheduling intervals on the execution time line (time 0 to 6, intervals $int^0$ to $int^3$)
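The two interval assignments discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, all transactions are assumed to arrive at time 0, and only the periods of the three example transactions (2, 3 and 6) are taken from the text.

```python
from functools import reduce
from math import gcd, lcm  # math.lcm requires Python 3.9 or later


def arrival_times(periods):
    """All arrival times (which are also deadlines) within one hyper-period."""
    hyper = lcm(*periods)
    return {k * p for p in periods for k in range(hyper // p + 1)}


def boundaries_by_arrivals(periods):
    """Interval boundaries t^0 < t^1 < ... placed at consecutive arrival times."""
    return sorted(arrival_times(periods))


def boundaries_fixed_length(periods):
    """Interval boundaries t^k = k * L, where L is the GCD of all periods."""
    L = reduce(gcd, periods)
    return list(range(0, lcm(*periods) + 1, L))


def respects_properties(boundaries, periods):
    """Property 1: every arrival is a boundary. Property 2: every length >= L."""
    L = reduce(gcd, periods)
    lengths = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return arrival_times(periods) <= set(boundaries) and min(lengths) >= L


periods = [2, 3, 6]
print(boundaries_by_arrivals(periods))   # [0, 2, 3, 4, 6]
print(boundaries_fixed_length(periods))  # [0, 1, 2, 3, 4, 5, 6]
```

For the example periods, the first assignment yields the four intervals [0, 2), [2, 3), [3, 4) and [4, 6), and both assignments pass the `respects_properties` check.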
With regard to the interval loads, we define for
each transaction $\tau_i$ and scheduling interval $int^k$ a lag
function:
\[ \mathrm{lag}(\tau_i, int^k) = u_i \cdot t^{k+1} - \sum_{x \in [0, t^k)} S(\tau_i, x). \]
The function calculates how much time $\tau_i$ must be
executed in interval $int^k$ so that, at the end of $int^k$, it is
scheduled according to the fluid scheduling model [13].
We also define for each PO-set $D$ a similar lag function:
\[ \mathrm{lag}(D, int^k) = u_D \cdot t^{k+1} - \sum_{\tau_i \in D} \sum_{x \in [0, t^k)} S(\tau_i, x). \]
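The goal of POGen is to generate a feasible load set for
each interval $int^k$, that is, a set of transaction loads that
satisfy the following inequalities:
\[ \forall \tau_i \in T : \lfloor \mathrm{lag}(\tau_i, int^k) \rfloor \le l^k_i \le \lceil \mathrm{lag}(\tau_i, int^k) \rceil; \qquad (4.4) \]
\[ \forall D \subseteq T : \lfloor \mathrm{lag}(D, int^k) \rfloor \le \sum_{\tau_i \in D} l^k_i \le \min(|int^k|, \lceil \mathrm{lag}(D, int^k) \rceil). \qquad (4.5) \]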
Inequality 4.4 sets conditions on the interval load for
each transaction, based on the closest integral values of
the lag functions. Inequality 4.5 sets conditions on the
total interval load of each PO-set. Note that the right
side of Inequality 4.5 guarantees that each PO-set with
feasible loads is schedulable in $int^k$ by POBase, that is,
Condition 2 is satisfied for $int^k$. Similarly, if all loads
satisfy the lower bounds of Inequalities 4.4, then the
generated schedule satisfies Condition 1. The reason is
as follows. Consider the last scheduling interval of a
period of transaction $\tau_i$: $int = [t, a \cdot p_i)$, where $t$ and $a$
are some integers. The lag function of $\tau_i$ is:
\[ \mathrm{lag}(\tau_i, int) = a \cdot u_i \cdot p_i - \sum_{x \in [0, t)} S(\tau_i, x). \]
Since $u_i \cdot p_i = e_i$ is an integer, and so is $S(\tau_i, x)$,
$\lfloor \mathrm{lag}(\tau_i, int) \rfloor = \mathrm{lag}(\tau_i, int)$. That means the total interval
loads of $\tau_i$ up to slot $a \cdot p_i$, which is calculated as
\[ \lfloor \mathrm{lag}(\tau_i, int) \rfloor + \sum_{x \in [0, t)} S(\tau_i, x), \]
is equal to $a \cdot e_i$ and satisfies Condition 1. However,
using only the lower bound loads does not guarantee
that Inequality 4.5 can be satisfied at the same time.
This is also true if only upper bound loads are used.
The following example illustrates this point. Consider
again the transaction set in Figures 5 and 6. If
the algorithm always sets the interval loads to their lower
bounds, then the schedule of interval $[4, 6)$ is not
feasible because the total load in this interval is 3, which
exceeds the interval length of 2. If, instead, only the upper
bound loads are used, then the schedule of interval $[0, 2)$
is not feasible because the total load in this interval is
also 3. An algorithm that generates feasible schedules
must use a combination of these values, and computing
such a combination is not trivial.
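The example can be reproduced numerically. The following Python sketch evaluates the per-transaction lag function for the three transactions with $(e_i, p_i)$ equal to (1, 2), (1, 3) and (1, 6) over the interval boundaries 0, 2, 3, 4, 6, always taking either the floor or the ceiling of the lag. The function name `bound_loads` and the data layout are assumptions made for illustration only.

```python
from fractions import Fraction
from math import ceil, floor


def bound_loads(transactions, boundaries, use_upper):
    """Per-interval loads when every transaction always takes the floor
    (use_upper=False) or the ceiling (use_upper=True) of its lag."""
    executed = [0] * len(transactions)  # running sum of S(tau_i, x)
    per_interval = []
    for t_next in boundaries[1:]:
        loads = []
        for i, (e, p) in enumerate(transactions):
            lag = Fraction(e, p) * t_next - executed[i]  # u_i * t^{k+1} - executed
            loads.append(ceil(lag) if use_upper else floor(lag))
        for i, load in enumerate(loads):
            executed[i] += load
        per_interval.append(loads)
    return per_interval


transactions = [(1, 2), (1, 3), (1, 6)]  # (e_i, p_i) of the three transactions
boundaries = [0, 2, 3, 4, 6]

lower = bound_loads(transactions, boundaries, use_upper=False)
upper = bound_loads(transactions, boundaries, use_upper=True)
print(sum(lower[-1]))  # 3 slots requested in [4, 6), which has length 2
print(sum(upper[0]))   # 3 slots requested in [0, 2), which has length 2
```

Since all three transactions overlap on the bus and therefore cannot run concurrently, a total load of 3 in an interval of length 2 cannot be scheduled in either case.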
POGen achieves this by iteratively computing a
feasible load set for each scheduling interval. It is an
online algorithm which is invoked at the beginning
of each interval and generates the schedule for that
interval. In Lemma 4.3, we first show that the following
inequalities initially hold at the beginning of the first
interval $int^0$:
\[ \forall D \subseteq T : \lfloor \mathrm{lag}(D, int^k) \rfloor \le |int^k|; \qquad (4.6) \]
\[ \forall D \subseteq T : \sum_{\tau_i \in D} \sum_{x \in [0, t^k)} S(\tau_i, x) \ge \lfloor u_D \cdot t^k \rfloor. \qquad (4.7) \]
A feasible load set is then computed in Step 1 of POGen
by the GenerateLoad procedure, which is assumed to
honor the following proposition:
Proposition 4.1: Assume that all PO-sets satisfy the
utilization bound in Inequalities 4.1. If Inequalities 4.6,
4.7 hold for $int^k$, then GenerateLoad computes a feasible
load set for $int^k$.
Given a feasible load set for interval $int^0$, Lemma 4.3
guarantees that Inequalities 4.6, 4.7 again hold for
$int^1$. Hence, in the next execution of POGen at $t^1$,
GenerateLoad can compute a feasible load set for $int^1$,
and so on for all scheduling intervals in
the hyper-period. Since a feasible load set is obtained
for all scheduling intervals, Condition 1 and Condition
2 are satisfied and thus POGen generates a feasible
schedule of $T$. In Section 4.3, we will prove
that GenerateLoad indeed honors Proposition 4.1.
Algorithm 2 POGen
Input: transaction set $T$, interval $int^k$
Output: schedule $S$ for $int^k$
1: $\{l^k_i : \forall i \in [1, N]\} \leftarrow$ GenerateLoad($T$, $int^k$)
2: $T' \leftarrow \{\{l^k_i, |int^k|, \eta^1_i, \eta^2_i\} : \forall i \in [1, N]\}$
3: $S$ for $int^k$ $\leftarrow$ POBase($T'$)
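To make the notion of a feasible load set in Step 1 concrete, the sketch below enumerates every floor/ceiling combination for one interval and keeps those that satisfy Inequalities 4.4 and 4.5. This exhaustive search is not the GenerateLoad procedure of the paper (which relies on a circulation problem, see Section 4.3); it is exponential in the number of transactions and only meant for tiny inputs. The function name, the argument layout, and the assumption that the utilization of a PO-set is the sum of the utilizations of its transactions are all illustrative.

```python
from fractions import Fraction
from itertools import product
from math import ceil, floor


def feasible_load_sets(utils, executed, po_sets, t_start, t_end):
    """Brute-force search for load sets satisfying Inequalities 4.4 and 4.5.

    utils[i]    : utilization u_i of transaction i (a Fraction)
    executed[i] : slots already given to transaction i in [0, t_start)
    po_sets     : PO-sets, each a tuple of transaction indices
    """
    length = t_end - t_start
    lags = [u * t_end - done for u, done in zip(utils, executed)]
    # Inequality 4.4: each load is the floor or the ceiling of its lag.
    choices = [sorted({floor(lag), ceil(lag)}) for lag in lags]
    found = []
    for loads in product(*choices):
        feasible = True
        for members in po_sets:
            lag_d = sum(lags[i] for i in members)  # assumes u_D = sum of u_i
            total = sum(loads[i] for i in members)
            # Inequality 4.5 for this PO-set.
            if not floor(lag_d) <= total <= min(length, ceil(lag_d)):
                feasible = False
                break
        if feasible:
            found.append(loads)
    return found


utils = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(feasible_load_sets(utils, [0, 0, 0], [(0, 1, 2)], 0, 2))
# [(1, 0, 1), (1, 1, 0)]
```

For the first interval $[0, 2)$ of the three-transaction example, with all three transactions treated as a single PO-set, exactly two load sets are feasible, and both mix lower and upper bound loads.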
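Lemma 4.3: If GenerateLoad honors Proposition 4.1,
then Inequalities 4.6, 4.7 hold for every scheduling
interval.
Proof: We prove by induction.
Base step: Consider the first scheduling interval $int^0 = [0, t^1)$.
Inequalities 4.6 for this interval hold because
\[ \forall D \subseteq T : \lfloor \mathrm{lag}(D, int^0) \rfloor = \lfloor u_D \cdot t^1 \rfloor \le |int^0|, \]
and Inequalities 4.7 hold because
\[ \forall D \subseteq T : \sum_{\tau_i \in D} \sum_{x \in [0, 0)} S(\tau_i, x) = 0 = \lfloor u_D \cdot 0 \rfloor. \]
Induction step: Assume that Inequalities 4.6, 4.7 hold
in every scheduling interval up to $int^k$. We prove that
Inequalities 4.6, 4.7 also hold before the execution of
GenerateLoad at interval $int^{k+1}$. Since Inequalities 4.6,
4.7 are satisfied at interval $int^k$, GenerateLoad generates
a feasible load set and POBase generates a feasible
schedule for the interval. Therefore, after Step 3, we
have:
\[ \forall D \subseteq T : \sum_{\tau_i \in D} \sum_{x \in [t^k, t^{k+1})} S(\tau_i, x) = \sum_{\tau_i \in D} l^k_i. \]
Then, by the left side of Inequalities 4.5, we obtain the
following, which proves that Inequalities 4.7 hold for
$int^{k+1}$:
\[ \begin{aligned}
\forall D \subseteq T : \sum_{\tau_i \in D} \sum_{x \in [0, t^{k+1})} S(\tau_i, x)
&= \sum_{\tau_i \in D} \sum_{x \in [0, t^k)} S(\tau_i, x) + \sum_{\tau_i \in D} l^k_i \\
&\ge \sum_{\tau_i \in D} \sum_{x \in [0, t^k)} S(\tau_i, x) + \lfloor u_D \cdot t^{k+1} \rfloor - \sum_{\tau_i \in D} \sum_{x \in [0, t^k)} S(\tau_i, x) \\
&= \lfloor u_D \cdot t^{k+1} \rfloor.
\end{aligned} \]
Now consider Inequalities 4.6. Notice that since $S(\tau_i, x)$
is an integer, we have:
\[ \forall D \subseteq T : \lfloor \mathrm{lag}(D, int^{k+1}) \rfloor = \lfloor u_D \cdot t^{k+2} \rfloor - \sum_{\tau_i \in D} \sum_{x \in [0, t^{k+1})} S(\tau_i, x). \]
Since Inequalities 4.7 hold for $int^{k+1}$, Inequalities 4.6
also hold because:
\[ \begin{aligned}
\forall D \subseteq T : \lfloor \mathrm{lag}(D, int^{k+1}) \rfloor
&= \lfloor u_D \cdot t^{k+2} \rfloor - \sum_{\tau_i \in D} \sum_{x \in [0, t^{k+1})} S(\tau_i, x) \\
&\le \lfloor u_D \cdot t^{k+2} \rfloor - \lfloor u_D \cdot t^{k+1} \rfloor \\
&\le \lceil u_D \cdot (t^{k+2} - t^{k+1}) \rceil \le |int^{k+1}|.
\end{aligned} \]
This completes the proof.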
4.3 The GenerateLoad procedure
As mentioned above, procedure GenerateLoad searches for
a feasible load set for each scheduling interval. There
are two questions that have to be answered: (1) is there
a feasible load set? (2) is there an efficient algorithm
to find it? We will show that the problem at hand is
equivalent to the problem of circulations in graphs with
loads and lower bounds [14]. This is the problem of
finding a feasible circulation flow in a directed graph
where each edge has a capacity and a lower bound.
Furthermore, we will prove that if the utilization of
each PO-set satisfies the utilization bound expressed
by Inequalities 4.1, there always exists a feasible
solution, which answers Question 1. Then, since the
Ford-Fulkerson algorithm [14] can be used to solve the
problem, Question 2 is also answered.
In the following, we will intuitively describe the
construction of a directed graph from the input of
GenerateLoad. Each vertex of the constructed graph
represents a PO-set $D_j$. We define for each vertex a PO-set
edge $g^D_j$ whose real-valued flow $f^D_j$ represents the
interval load of the corresponding PO-set. An integer-valued
lower bound $b^D_j$ and an integer-valued capacity
$c^D_j$ are defined for each of the PO-set edges such that
Inequalities 4.5 are imposed on their flow values, i.e.:
\[ \forall D_j \subseteq T : b^D_j \le f^D_j \le c^D_j, \qquad (4.8) \]
where
\[ b^D_j = \lfloor \mathrm{lag}(D_j, int^k) \rfloor, \qquad c^D_j = \min(|int^k|, \lceil \mathrm{lag}(D_j, int^k) \rceil). \]
Furthermore, for each transaction $\tau_i$, a transaction edge
is defined whose real-valued flow $f_i$ represents the
interval load of the corresponding transaction. A lower
bound value $b_i$ and a capacity $c_i$ are defined for each
of the transaction edges such that Inequalities 4.4 are
imposed on their flow values:
\[ \forall \tau_i \in T : b_i = \lfloor \mathrm{lag}(\tau_i, int^k) \rfloor \le f_i \le c_i = \lceil \mathrm{lag}(\tau_i, int^k) \rceil. \qquad (4.9) \]
The flow of a transaction edge entering a vertex represents
the contribution of the corresponding transaction's
interval load to the corresponding PO-set's interval
load. The endpoints and the direction of each edge
are defined in such a way that the values of the flows
into and out of a vertex preserve the relationship between
the interval load of the corresponding PO-set and that
of its transactions. The graph has a feasible circulation
flow which represents a feasible load set.
The following definition is necessary for the graph
construction. Let the index PO-set order of a transaction
set $T$ be an ordered list of all PO-sets in $T$ where a
PO-set $D_j$ with smaller $\min_{\tau_i \in D_j} \eta^2_i$ has smaller index.
Ties are broken arbitrarily. Since each PO-set has
only one value $\min_{\tau_i \in D_j} \eta^2_i$, the order is well-defined.
The transaction set in Figure 1 has the index PO-set
order $\{D_j : j \in [1, 4]\}$ where $D_1 = \{\tau_1, \tau_2, \tau_3, \tau_4\}$,
$D_2 = \{\tau_2, \tau_4, \tau_5\}$, $D_3 = \{\tau_4, \tau_5, \tau_6\}$, $D_4 = \{\tau_7, \tau_8\}$. Figure
7 shows the graph $G$ constructed from the transaction
set in Figure 1. Transaction edges are represented
by solid lines while PO-set edges are represented by
dotted lines.
Graph construction: let us define a tuple $G = (V, E)$
as follows:
• For each PO-set $D_j$ in the index PO-set order, define
a vertex $v_j$.
• For each PO-set $D_j$ in the index PO-set order,
define a directed edge $g^D_j$ with capacity
$c^D_j = \min(|int^k|, \lceil \mathrm{lag}(D_j, int^k) \rceil)$ and lower bound
$b^D_j = \lfloor \mathrm{lag}(D_j, int^k) \rfloor$. Let $g^D_j$ be a PO-set edge.
• For each transaction $\tau_i$, define a directed edge $g_i$
with capacity $c_i = \lceil \mathrm{lag}(\tau_i, int^k) \rceil$ and lower bound
$b_i = \lfloor \mathrm{lag}(\tau_i, int^k) \rfloor$. Let $g_i$ be a transaction edge.
• $\{g_i : \tau_i \in D_1\}$ are edges that enter $v_1$; $g^D_1$ is an edge
that exits $v_1$.
• $\forall j : 1 < j \le N^D$, $\{g_i : \tau_i \in D_j \setminus D_{j-1}\}$ and
$g^D_{j-1}$ are edges that enter $v_j$; $\{g_i : \tau_i \in D_{j-1} \setminus D_j\}$
and $g^D_j$ are edges that exit $v_j$. This construction
step deals with the situation where two PO-sets
$D_{j-1}, D_j$ share some transactions. Intuitively,
to preserve the relationship between the interval
load of a PO-set and that of its transactions, the
transaction edge of a transaction common to two
PO-sets would have to enter the two corresponding
vertices $v_{j-1}, v_j$. Since in a well-formed directed graph each
directed edge can enter at most one vertex, this
situation must be avoided. This can be accomplished
by representing the interval loads of the
common transactions on the second PO-set ($v_j$) as
the interval load of the first PO-set (i.e., $g^D_{j-1}$ enters
$v_j$) minus the interval load of the transactions that
are only in the first set (i.e., $\{g_i : \tau_i \in D_{j-1} \setminus D_j\}$
exit $v_j$). Lemma 4.4 will detail the proof of this
argument.
• $V = \{v_j : j \in [1, N^D]\}$
• $E = \{g_i : i \in [1, N]\} \cup \{g^D_j : j \in [1, N^D]\}$
Finally, the graph flow is subject to the flow conservation
constraint [14], in which, given a vertex, the sum
of the flow values entering it minus the sum of the
flow values exiting it is zero. As a graph construction
example, consider PO-set $D_2$. Vertex $v_2$ has an output
PO-set edge $g^D_2$ which represents the interval load of $D_2$.
Fig. 7. Constructed graph G (vertices $v_1$ to $v_4$; legend: transaction edge, PO-set edge)
Since $D_1$ has $\tau_2$ and $\tau_4$ in common with $D_2$ but not $\tau_1$
and $\tau_3$, $v_2$ has an input PO-set edge $g^D_1$ which represents
the interval load of $D_1$ and two output transaction
edges $g_1$ and $g_3$ that represent the interval loads of $\tau_1$
and $\tau_3$, respectively. Furthermore, note that edges $g_2$
and $g_4$ for the common transactions $\tau_2$ and $\tau_4$ enter
$v_1$, not $v_2$. $g_2$ exits $v_3$, but $g_4$ exits $v_4$ because $\tau_4$ also
belongs to $D_3$. Finally, $v_2$ has an input transaction edge
$g_5$ that represents the interval load of $\tau_5$. Since $\tau_5$ is in
$D_2$ and $D_3$ but not $D_1$ and $D_4$, $g_5$ exits $v_4$. Lemma 4.4
shows that $G$ is indeed a directed graph.
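The construction rules can be checked mechanically on the PO-sets of Figure 1. The Python sketch below records, for every edge, the vertex it exits and the vertex it enters; edge bounds and capacities are omitted. The function name `build_graph`, the string edge names, and the `(exits, enters)` tuple layout are assumptions made for illustration, and the input is assumed to be already in index PO-set order.

```python
def build_graph(po_sets):
    """Return {edge: (vertex it exits, vertex it enters)}, using None when
    the edge has no endpoint of that kind. Vertices are numbered from 1."""
    exits, enters = {}, {}

    def set_once(table, edge, vertex):
        # Each edge may exit (enter) at most one vertex, as Lemma 4.4 states.
        assert edge not in table, f"{edge} would get two endpoints of one kind"
        table[edge] = vertex

    n = len(po_sets)
    for i in po_sets[0]:                      # transactions of D_1 enter v_1
        set_once(enters, f"g{i}", 1)
    for j in range(1, n + 1):                 # PO-set edge exits v_j, enters v_{j+1}
        set_once(exits, f"gD{j}", j)
        if j < n:
            set_once(enters, f"gD{j}", j + 1)
    for j in range(2, n + 1):
        prev, cur = po_sets[j - 2], po_sets[j - 1]
        for i in cur - prev:                  # new in D_j: edge enters v_j
            set_once(enters, f"g{i}", j)
        for i in prev - cur:                  # only in D_{j-1}: edge exits v_j
            set_once(exits, f"g{i}", j)
    names = sorted(set(exits) | set(enters))
    return {name: (exits.get(name), enters.get(name)) for name in names}


figure1 = [{1, 2, 3, 4}, {2, 4, 5}, {4, 5, 6}, {7, 8}]
graph = build_graph(figure1)
print(graph["g1"], graph["g2"], graph["g4"], graph["g5"])
# (2, 1) (3, 1) (4, 1) (4, 2)
```

The output agrees with the example in the text: $g_1$ exits $v_2$, $g_2$ exits $v_3$, $g_4$ and $g_5$ exit $v_4$, and $g_5$ enters $v_2$.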
Lemma 4.4: $G$ is a directed graph.
Proof: Since every edge of $G$ is directed, it remains
to show that each edge has only one or two endpoints.
There is one edge defined for each PO-set and one edge
defined for each transaction.
For each PO-set $D_j$, the PO-set edge $g^D_j$ exits only $v_j$. In
addition, $g^D_j$ enters only $v_{j+1}$ when $j < N^D$. Therefore,
each PO-set edge exits exactly one vertex and enters
at most one vertex.
By the index PO-set ordering, if $\tau_i \in D_j \setminus D_{j-1}$, then
$\tau_i \notin D_k \setminus D_{k-1}$ where $j < k \le N^D$. Therefore, the
elements of the following set are disjoint:
$A = \{\{g_i : \tau_i \in D_1\}, \{g_i : \tau_i \in D_j \setminus D_{j-1}\} : j \in (1, N^D]\}$. By
definition, $A$ contains the transaction edges of $G$ that
enter some vertices. Also, the union of the elements of
$A$ is $\{g_i : \tau_i \in T\}$. Therefore, each transaction edge
enters exactly one vertex.
By a similar proof technique, we can show that each
transaction edge exits at most one vertex. Due to space
constraints, we skip the detailed proof. In conclusion,
every edge of $G$ has at most two endpoints and is
directed.
It remains to show that GenerateLoad honors Proposition
4.1 and therefore POGen generates feasible schedules
for all transaction sets that satisfy the utilization
bound of Inequalities 4.1. For simplicity of exposition,
we split the proof into multiple lemmas. First, Lemma
4.5 proves an important property of graph $G$ regarding
the flow values. Then, this property is used to
prove in Lemma 4.6 that graph $G$ has a feasible flow
if Inequalities 4.6, 4.7 are satisfied for interval $int^k$ and,
furthermore, all PO-sets satisfy a utilization constraint
based on $|int^k|$. Note that we know from [14] that if
graph $G$ has a feasible flow, then it has an integral
feasible flow which can be found by the Ford-Fulkerson
algorithm [14]. Therefore, to complete the proof, we
have to prove that a feasible load set can be derived
from an integral feasible flow of $G$ (Lemma 4.7). Finally,
we will show that the utilization bound of Inequalities
4.1 implies the utilization bound used in Lemma 4.6.
Hence, Proposition 4.1 holds.
Lemma 4.5: A flow in graph $G$ honors the flow conservation
constraint at every vertex $v_j$ if and only if the
following equalities hold for every PO-set $D_j$:
\[ \sum_{\tau_i \in D_j} f_i = f^D_j. \qquad (4.10) \]
Proof: We prove this by induction over the ordered
set of vertices.
Base case: By the graph construction rules regarding
the edges that enter and exit $v_1$, the flow conservation
constraint holds at vertex $v_1$ if and only if:
\[ \sum_{\tau_i \in D_1} f_i = f^D_1. \]
Induction case: Assume that the lemma holds for every
PO-set from $D_1$ to $D_{j-1}$. We will prove that the lemma
also holds for $D_j$. By the graph construction rules
regarding the edges that enter and exit $v_j$ and the
definition of the flow conservation constraint at vertex
$v_j$, we have
\[ \sum_{\tau_i \in D_j \setminus D_{j-1}} f_i = \sum_{\tau_i \in D_{j-1} \setminus D_j} f_i + f^D_j - f^D_{j-1}. \]
Since
\[ \sum_{\tau_i \in D_j} f_i = \sum_{\tau_i \in D_j \setminus D_{j-1}} f_i + \sum_{\tau_i \in D_j \cap D_{j-1}} f_i, \]
the flow conservation constraint holds at vertex $v_j$
if and only if
\[ \begin{aligned}
\sum_{\tau_i \in D_j} f_i &= \sum_{\tau_i \in D_{j-1} \setminus D_j} f_i + f^D_j - f^D_{j-1} + \sum_{\tau_i \in D_j \cap D_{j-1}} f_i \\
&= \sum_{\tau_i \in D_{j-1}} f_i + f^D_j - f^D_{j-1}.
\end{aligned} \]
Finally, by the induction hypothesis, the above equation
is equivalent to
\[ \sum_{\tau_i \in D_j} f_i = f^D_{j-1} + f^D_j - f^D_{j-1} = f^D_j. \]
This completes the proof.
Lemma 4.6: There exists a feasible flow in graph $G$
if Inequalities 4.6, 4.7 are satisfied for interval $int^k$ and,
furthermore, the PO-set utilizations satisfy the following
condition:
\[ \forall D_j \subseteq T : u_{D_j} \le \frac{|int^k| - 1}{|int^k|}. \qquad (4.11) \]
Proof: First note that Inequalities 4.6 are necessary
for the edge constraints on each PO-set edge (Inequality
4.8) to be satisfied. Let us construct a flow as follows:
\[ \forall \tau_i \in T : f_i = \mathrm{lag}(\tau_i, int^k), \qquad \forall D_j \subseteq T : f^D_j = \mathrm{lag}(D_j, int^k). \]
We have to prove that the constructed flow satisfies
the edge constraints and the flow conservation
constraints. Given the constructed flow, it is easy to
verify that the edge constraints of each transaction edge
(Inequality 4.9) and the left-side edge constraints of each
PO-set edge (Inequality 4.8) are satisfied. The right-side
edge constraints of each PO-set edge are satisfied
because, by the definition of the lag function and by
Inequalities 4.7, before the execution of GenerateLoad for
interval $int^k$ we have the following:
\[ \begin{aligned}
\mathrm{lag}(D_j, int^k) &= u_{D_j} \cdot t^{k+1} - \sum_{\tau_i \in D_j} \sum_{x \in [0, t^k)} S(\tau_i, x) \\
&\le u_{D_j} \cdot t^{k+1} - \lfloor u_{D_j} \cdot t^k \rfloor \\
&< u_{D_j} \cdot t^{k+1} - u_{D_j} \cdot t^k + 1.
\end{aligned} \]
Now, by Inequalities 4.11, the following holds:
\[ \mathrm{lag}(D_j, int^k) < u_{D_j} \cdot t^{k+1} - u_{D_j} \cdot t^k + 1 \le |int^k|. \]
It remains to verify that the flow conservation constraint
is honored at every vertex. Since the constructed
flow satisfies Equation 4.10, the sufficient condition of
Lemma 4.5 proves this statement.
Lemma 4.7: If there is an integral feasible flow in
graph $G$, then there is a feasible load set where
$\forall \tau_i \in T : l^k_i = f_i$.
Proof: Given an integral feasible flow, $\forall \tau_i \in T$ let
$l^k_i = f_i$. The following inequality holds:
\[ \forall \tau_i \in T : \lfloor \mathrm{lag}(\tau_i, int^k) \rfloor \le l^k_i \le \lceil \mathrm{lag}(\tau_i, int^k) \rceil. \]
Thus the interval loads satisfy Inequality 4.4. We now
have to prove that the interval loads also satisfy Inequality
4.5. By the necessary condition of Lemma 4.5,
we have $\sum_{\tau_i \in D_j} f_i = f^D_j$. Then, since
$\sum_{\tau_i \in D_j} l^k_i = \sum_{\tau_i \in D_j} f_i$ and $f^D_j$ is subject to the PO-set edge constraints
in Inequality 4.8, Inequality 4.5 is satisfied.
We can finally state our main theorem.
Theorem 4.2: An acyclic transaction set $T$ is schedulable
by POGen if:
\[ \forall D_j \subseteq T : u_{D_j} \le \frac{L - 1}{L}. \]
Proof: Since $L \le \min_k(|int^k|)$, Inequalities 4.11 hold.
Assume that Inequalities 4.6, 4.7 hold for a specific interval
$int^k$. Then, by Lemma 4.6 and [14], the constructed
graph $G$ has an integral feasible flow. Hence, by Lemma
4.7, algorithm GenerateLoad computes a feasible load
set, which proves Proposition 4.1. Since, furthermore,
according to Lemma 4.3, Inequalities 4.6, 4.7 hold for
every interval $int^k$, it follows that Inequalities 4.4 and
4.5, and therefore feasibility Conditions 1 and 2, also
hold for every interval. This concludes the proof.
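The sufficient test of Theorem 4.2 is straightforward to evaluate once the PO-sets are known. The sketch below is a minimal illustration, assuming that the PO-sets are supplied by the caller and that the utilization of a PO-set is the sum of the utilizations of its transactions; the function name and the `(e_i, p_i)` tuple layout are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def passes_pogen_bound(transactions, po_sets):
    """Check u_D <= (L - 1) / L for every PO-set D.

    transactions : list of (e_i, p_i) pairs
    po_sets      : list of tuples of transaction indices (assumed given)
    """
    L = reduce(gcd, (p for _, p in transactions))  # GCD of all periods
    bound = Fraction(L - 1, L)
    return all(
        sum(Fraction(*transactions[i]) for i in members) <= bound
        for members in po_sets
    )


# Utilizations 0.2 + 0.3 + 0.2 = 0.7 against a bound of 9/10 (L = 10).
print(passes_pogen_bound([(2, 10), (3, 10), (4, 20)], [(0, 1, 2)]))  # True
# Utilization 1.0 exceeds the bound of 9/10.
print(passes_pogen_bound([(5, 10), (5, 10)], [(0, 1)]))              # False
```

Exact rational arithmetic is used so that a PO-set sitting exactly on the bound is not rejected because of floating-point rounding.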
Algorithm analysis: Since $\forall g_i \in E : c_i - b_i \le 1$ and
$\forall g^D_j \in E : c^D_j - b^D_j \le 1$, the sum
$\sum_{g_i \in E} (c_i - b_i) + \sum_{g^D_j \in E} (c^D_j - b^D_j)$ is at most $N + N^D \le 2N$. The time complexity
of the Ford-Fulkerson algorithm in finding a feasible
circulation in graph $G$ is $O(|E| \cdot f_{max})$, where $f_{max}$ is
the maximum flow value of a graph derived from $G$ and
$f_{max}$ is bounded by the above sum (see [14] for details). Since this sum is at most $2N$, the time
complexity of GenerateLoad is $O(N^2)$. Finally, since the
time complexity of POBase is $O(N \max(\log(N), |int^k|))$,
the time complexity of POGen to generate the schedule
for $int^k$ is $O(N \max(N, |int^k|))$.
5 SCHEDULING ALGORITHMS FOR CYCLIC
TRANSACTION SETS
The cyclic transaction set scheduling problem is NP-complete
because the special case where all transmission
times are 1 and all periods are equal has been
shown to be NP-complete [14] (see footnote 2). In this section we
propose an approximation algorithm for this problem.
The proposed solution uses the transaction buffer at a
bus element to transform a cyclic transaction set into
an acyclic one such that the latter's schedule can be
used to execute the former. More specifically, we select
a bus element $\eta_k$ and split each transaction $\tau_i$ that goes
through $\eta_k$ into two pseudo transactions $\tau'_i$ and $\tau''_i$. $\tau'_i$
transfers data of $\tau_i$ from $\eta^1_i$ to $\eta_k$, and $\tau''_i$ transfers the
data which is stored in $\eta_k$ by $\tau'_i$ to $\eta^2_i$. We say that $\tau'_i$
and $\tau''_i$ feasibly transfer the data of $\tau_i$ if the data of every
job of $\tau_i$ is transferred to $\eta^2_i$ before its deadline. The
new transaction set is acyclic since there is no transaction
going through $\eta_k$. However, there is a precedence
constraint between the pseudo transactions, i.e., if $\tau''_i$
transfers data in slot $t$, then that data must have been stored
in $\eta_k$ by $\tau'_i$ before $t$. Since this constraint is not an
assumption of transaction sets that can be scheduled
by POGen, $\tau'_i$ and $\tau''_i$ may not feasibly transfer the data of
$\tau_i$. This happens when $\tau''_i$ has the same transmission
time as $\tau_i$ and is scheduled when there is no data of $\tau_i$
stored in $\eta_k$. We deal with this problem by increasing
the transmission time of $\tau'_i$ and $\tau''_i$ to be more than
that of $\tau_i$. However, the increment in the transmission
time of the pseudo transactions causes their utilization to be
bigger than that of $\tau_i$. The question is how to minimize
this increase. We will address this question in Lemma
5.2 after we formally describe the problem in the next
paragraphs.
As described above, we replace each transaction $\tau_i \in T$,
where $\tau_i = (e_i, p_i, \eta^1_i, \eta^2_i)$ and $\tau_i$ goes through $\eta_k$, with
two pseudo transactions (p-transactions) $\tau'_i$ and $\tau''_i$, where
$\tau'_i = (e^+_i, p_i, \eta^1_i, \eta_k)$, $\tau''_i = (e^+_i, p_i, \eta_k, \eta^2_i)$, and $e^+_i > e_i$. $\tau'_i$
and $\tau''_i$ have the same utilization $u^+_i = e^+_i / p_i > u_i$. $\tau_i$
is called the original transaction (o-transaction) of $\tau'_i$ and
$\tau''_i$. The new transaction set is called the pseudo transaction
set, denoted by $T'$. The following (work conserving) rule
is applied to the schedules of a p-transaction.
Rule 1: A p-transaction always transfers data of the
current job of its o-transaction in slot $t$ if there is data of
the job stored in the source element of the p-transaction
at time $t$.
2. The scheduling problem of the special case is equivalent to the
Circular-Arc Coloring Problem [14].
Fig. 8. Schedule of p-transactions where $L = 5$ (slots),
$\forall k : |int^k| = L$, $\tau_i = (8, 5L, \eta^1_i, \eta^2_i)$, $\tau'_i = (12, 5L, \eta^1_i, \eta_k)$, and
$\tau''_i = (12, 5L, \eta_k, \eta^2_i)$ (time axis 0 to 25, intervals $int^0$ to $int^4$ with boundaries $t^0$ to $t^5$; legend: empty, loaded).
Note that the execution of a p-transaction in slot $t$ can
transfer data of the current job of its o-transaction only
if the data has already been stored in its source element
before time $t$; otherwise the execution does nothing. In
the former case, we say that the execution slot of the
p-transaction is loaded, and in the latter that it is empty. Note that
the execution slots of $\tau'_i$ are always loaded until it transfers all
the data of the current job of $\tau_i$. This is because when a
job of $\tau'_i$ is ready, a job of $\tau_i$ is also ready and all data
of the job of $\tau_i$ has been stored in $\eta^1_i$. Therefore, the
statement is true by Rule 1. Figure 8 shows an example
of the schedule of $\tau'_i$ and $\tau''_i$ with the given transaction
parameters and with every scheduling interval having
size $L$. Consider $int^0$, in which $\tau'_i$ has 2 execution slots
and $\tau''_i$ has 3 execution slots (this number is determined
by function GenerateLoad). Since $\tau'_i$ is scheduled in slots
2 and 3 (this schedule is determined by POBase), there
is no data of $\tau_i$ stored in $\eta_k$ before time $t = 3$. Therefore,
the 3 execution slots of $\tau''_i$ in $int^0$ are empty.
We say that a p-transaction is effective in execution
slot $t$ when either of the following cases occurs: 1) the
execution slot is loaded, or 2) the execution slot is empty
and the p-transaction has transferred all data of the
current job of its o-transaction at time $t$. Note that $\tau'_i$ is
always effective in all slots because its execution slots
are always loaded until it transfers all the data of the
current job of $\tau_i$. However, that is not the case for $\tau''_i$.
In Figure 8, $\tau''_i$ is not effective in slots 0, 1, 2, and 12.
In these slots, there is still data of the current job of $\tau_i$
stored at $\eta^1_i$ but there is no data of the job stored at $\eta_k$.
The following lemma follows directly from Rule 1.
Lemma 5.1: Consider job $j$ of $\tau_i$ which is ready at $t$ and
has deadline at $t + p_i$, and consider scheduling interval
$int^k$ where $t \le t^k < t^{k+1} \le t + p_i$. If $\tau''_i$ is not effective in
one of its execution slots in $int^k$, then, at $t^{k+1}$, the amount
of data of $j$ that has been transferred by $\tau''_i$ must be at
least equivalent to $\lfloor u^+_i \cdot t^k \rfloor - u^+_i \cdot t$ slots.
Proof: By Rule 1, if $\tau''_i$ is not effective in $int^k$, then
by $t^{k+1}$, $\tau''_i$ must have transferred all the data of $j$ which
had been transferred by $\tau'_i$ until $t^k$, and there must be
a portion of data of $j$ still being stored at $\eta^1_i$ after $t^k$.
This also means that all execution slots of $\tau'_i$ between $t$
and $t^k$ are loaded. By POGen, the number of execution
slots of $\tau'_i$ between $t$ (i.e. when job $j$ is ready) and $t^k$
is $\lfloor u^+_i \cdot t^k \rfloor - u^+_i \cdot t$. Since $\tau'_i$ is always loaded in these
slots, the amount of data of $j$ that has been transferred
by $\tau'_i$ until $t^k$ is equivalent to at least $\lfloor u^+_i \cdot t^k \rfloor - u^+_i \cdot t$
slots. This completes the proof.
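We will now describe how to determine $e^+_i$ such that $\tau_i$
is schedulable and $u^+_i$ is minimized.
Lemma 5.2: Every job of $\tau_i$ completes before its deadline
if $T'$ is schedulable by POGen and
\[ \forall int^k : e^+_i \ge \left\lfloor \frac{e_i + 1}{\alpha} \right\rfloor, \qquad (5.1) \]
where $\alpha = 1 - |int^k| / p_i$.
Proof: Since $T'$ is schedulable by POGen, jobs of $\tau'_i$
and $\tau''_i$ complete before their deadlines, which are the
same as the deadline of the corresponding job of $\tau_i$.
Thus, to complete the proof, we only need to show that
the jobs of $\tau'_i$ and $\tau''_i$ together transfer all the data of $\tau_i$
during their execution.
Consider job $j$ of $\tau_i$ which is ready at $t$ and has
deadline at $t + p_i$. Let $int^k$, where $t \le t^k < t^{k+1} \le t + p_i$,
be the last scheduling interval where $\tau''_i$ is not effective
in at least one slot in $int^k$. We have:
• By POGen, the maximum number of execution slots
of $\tau''_i$ within $[t, t^{k+1})$ is $\lceil u^+_i \cdot t^{k+1} \rceil - u^+_i \cdot t$. Therefore,
the smallest number of execution slots of $\tau''_i$ within
$[t^{k+1}, t + p_i)$ is $A = e^+_i - \lceil u^+_i \cdot t^{k+1} \rceil + u^+_i \cdot t$.
• By Lemma 5.1, the amount of data of $j$ that $\tau''_i$
needs to transfer after $t^{k+1}$ is equivalent to at most
$B = e_i - \lfloor u^+_i \cdot t^k \rfloor + u^+_i \cdot t$ slots.
Since $\tau''_i$ is always effective within $[t^{k+1}, t + p_i)$, all data
of $\tau_i$ will be transferred to $\eta^2_i$ before the deadline if
\[ \begin{aligned}
A \ge B &\Leftrightarrow e^+_i - \lceil u^+_i \cdot t^{k+1} \rceil + u^+_i \cdot t \ge e_i - \lfloor u^+_i \cdot t^k \rfloor + u^+_i \cdot t \\
&\Leftrightarrow e^+_i - e_i \ge \lceil u^+_i \cdot t^{k+1} \rceil - \lfloor u^+_i \cdot t^k \rfloor. \qquad (5.2)
\end{aligned} \]
Since
\[ \lceil u^+_i \cdot t^{k+1} \rceil - \lfloor u^+_i \cdot t^k \rfloor \le \lceil u^+_i \cdot t^{k+1} \rceil - \lceil u^+_i \cdot t^k \rceil + 1 \le \lceil u^+_i \cdot |int^k| \rceil + 1, \]
if the following inequality is true then Inequality 5.2 is
also true:
\[ e^+_i - e_i \ge \lceil u^+_i \cdot |int^k| \rceil + 1. \qquad (5.3) \]
Since $e^+_i$ is an integer, Inequality 5.3 is equivalent to
\[ \lceil e^+_i \cdot \alpha \rceil \ge e_i + 1. \qquad (5.4) \]
We complete the proof by showing in the following that
if $e^+_i$ satisfies 5.1 then 5.4 is true:
\[ \lceil e^+_i \cdot \alpha \rceil \ge \left\lceil \left\lfloor \frac{e_i + 1}{\alpha} \right\rfloor \alpha \right\rceil \ge \left\lceil \left( \frac{e_i + 1}{\alpha} - 1 \right) \alpha \right\rceil = \lceil (e_i + 1) - \alpha \rceil = e_i + 1. \]
The last equation in the above derivation comes from
the fact that $e_i$ is an integer and $\alpha < 1$.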
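Inequality 5.3 can also be checked directly by enumeration, which is a convenient sanity check for a chosen transmission time. The sketch below searches for the smallest integer transmission time of the p-transactions that satisfies Inequality 5.3 for one interval length; it uses the fact that the utilization of a p-transaction is its transmission time divided by the period. The function name and the linear search up to the period are assumptions made for illustration, not the method of the paper.

```python
from fractions import Fraction
from math import ceil


def smallest_e_plus(e, p, interval_len):
    """Smallest integer e+ > e with e+ - e >= ceil(u+ * |int|) + 1,
    where u+ = e+ / p, or None if no value up to p qualifies."""
    for e_plus in range(e + 1, p + 1):
        u_plus = Fraction(e_plus, p)
        if e_plus - e >= ceil(u_plus * interval_len) + 1:
            return e_plus
    return None


# Parameters of the Fig. 8 caption: e_i = 8, p_i = 5L = 25, |int^k| = L = 5.
print(smallest_e_plus(8, 25, 5))  # 12
```

For the parameters of Figure 8 the search returns 12, the transmission time used for both p-transactions in that figure.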
If we set every scheduling interval such that its length
is the GCD of all periods, i.e. $\forall k : |int^k| = L$, then $u^+_i$ is
minimized when
\[ e^+_i = \left\lfloor \frac{e_i + 1}{1 - L / p_i} \right\rfloor. \qquad (5.5) \]
The scheduling algorithm for a cyclic transaction set $T$,
namely cPOGen, is summarized as follows.
Algorithm cPOGen:
• Step 1: Find $\eta_k$ such that the derived pseudo
transaction set $T'$, in which the execution times of the
p-transactions are calculated by Equation 5.5, passes
the sufficient utilization bound test of POGen.
• Step 2: Schedule $T'$ using POGen with every
scheduling interval of length $L$, and
schedule $T$ accordingly following Rule 1.
Algorithm analysis: Step 1 of algorithm cPOGen only
needs to be executed once and can be implemented with
time complexity $O(N^2)$. Step 2 is an execution of POGen
and therefore has the same time complexity as POGen.
6 EVALUATION
Most of the previous related works [22], [23], [4], [16],
[19] have focused on the Fixed-priority Scheduling Algorithm
(FPA). These works deal with methods for
schedulability analysis and priority assignment. More
specifically, Shi et al. have recently proposed in [22] a
branch-and-bound algorithm that searches for a feasible
priority set for a transaction set. If a feasible priority
set exists, then the transaction set is guaranteed to
be schedulable under the worst-case transaction latency
(WTL) analysis proposed in [23]. The works in [23], [22]
are the state of the art.
In this section, we compare the
performance of POGen/cPOGen on MDRB with the
solution proposed in [23], [22]. The analyzed performance
metric is the percentage of random transaction
sets which are schedulable under POGen/cPOGen and
FPA. Under POGen/cPOGen, an acyclic transaction set
is schedulable if it passes the utilization bound test of
Theorem 4.2, whereas a cyclic one is schedulable if its
pseudo transaction set is schedulable. A transaction set
is schedulable under FPA if it has a feasible priority set
generated by the algorithm in [22].
In our experiments, we used three controlled parameters:
1) the maximum PO-set utilization of a transaction
set (see footnote 3); 2) the size of transaction sets; and 3) the
number of bus elements. The transactions' sources and
destinations are randomly selected from the set of
bus elements. Meanwhile, the transactions' utilization,
transmission time and period are generated as follows.
Given a maximum PO-set utilization $u_{max}$, the utilization
of transaction $\tau_i$ is initially generated according
to the uniform distribution algorithm in [6] such that
the utilizations of all PO-sets are no larger than $u_{max}$.
The transmission time $e_i$ is generated as a uniformly
distributed random number in the range of 1 to 100
slots. The period $p_i$ is then determined as $\lceil e_i / (u_i L) \rceil L$.
Finally, given the pair $\{e_i, p_i\}$, we recalculate $u_i$ to be
$e_i / p_i$.
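The period rounding step can be sketched as follows. Only the formula for the period and the recalculation of the utilization come from the text; the function name and the use of exact fractions are illustrative assumptions, and the generation of the initial utilizations according to [6] is not shown.

```python
from fractions import Fraction
from math import ceil


def period_and_utilization(e, u, L):
    """Round the period up to a multiple of L, then recompute the utilization.

    e : transmission time in slots
    u : initially generated utilization (a Fraction)
    L : required common divisor of all periods, in slots
    """
    p = ceil(Fraction(e) / (u * L)) * L
    return p, Fraction(e, p)


p, u = period_and_utilization(7, Fraction(3, 10), 10)
print(p, u)  # 30 7/30
```

Because the period is rounded up, the recalculated utilization never exceeds the initially generated one, so the bound on the PO-set utilizations is preserved.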
3. it is equivalent to ”the maximum link utilization” in [22]
Fig. 9. Acceptance rate with various maximum PO-set
utilization (x-axis: Maximum PO-set Utilization, 0.5 to 1.0; y-axis: Acceptance Rate, 0.1 to 1.0; series: FPA and POGen/cPOGen with L = 5, 10, 20, 50, 100)
The following graphs depict the average acceptance
rate of POGen/cPOGen and FPA over 1000 different random
transaction sets; in each graph two controlled
variables are kept constant while the other one is varied.
The set of transaction sets used to draw each graph contains
both acyclic and cyclic transaction sets.
Figure 9 shows the average acceptance rate of FPA
and POGen/cPOGen (when $L$ is 5, 10, 20, 50, or 100)
under various maximum PO-set utilizations. In these
experiments, the size of transaction sets and the number
of bus elements are set at 20 and 10, respectively.
In this setting, approximately 80% of the generated
transaction sets are cyclic. It can be seen that in most
cases the acceptance rate of POGen/cPOGen is better
than that of FPA, especially when the maximum PO-set
utilization is high and $L > 5$. The better performance
of POGen/cPOGen comes from the fact that the WTL
analysis in [23] does not always take advantage of the
parallelism between non-overlapping transactions. For
example, consider the transaction set shown in Figure
1. Assume $\tau_3$ and $\tau_5$ have higher priority than $\tau_4$.
According to the WTL analysis in [23], the interference
of transactions $\tau_3$ and $\tau_5$ on the execution of $\tau_4$ is
calculated as if all transactions were using a single
shared resource. However, POGen/cPOGen allows $\tau_3$
and $\tau_5$ to be executed in parallel, as shown in Figure 4. In
other words, the acceptance rate of FPA is reduced
when the number of distinct PO-sets that contain a
given transaction increases. We denote this number as
PcT.
Figure 10 shows the acceptance rate with the maximum
PO-set utilization set to 0.9, the number of bus
elements set to 10, and various sizes of transaction sets.
We also draw in Figure 10 a bar graph which shows
the average (over all transaction sets) of the maximum
PcT in each set. The performance of POGen/cPOGen is
better than that of FPA, especially when the size of transaction
sets is larger. The reason is that, when the number of
bus elements is fixed, the larger the size of a transaction
set, the bigger the maximum PcT (as shown in the
bar graph). As a consequence, FPA suffers more from
the effect described in the previous paragraph. The
performance of POGen/cPOGen, however, stays almost
the same.
Fig. 10. Acceptance rate with various size of transaction
sets (x-axis: Transaction Set Size, 5 to 30; y-axes: Acceptance Rate and Number of PO-sets; series: Average of Maximum PcT, FPA, POGen/cPOGen (L=10))
Fig. 11. Acceptance rate with various bus length (x-axis: Bus Length, 6 to 16; y-axes: Acceptance Rate and Number of PO-sets; series: Average of Maximum PcT, FPA, POGen/cPOGen (L=10))
The same reason explains the better performance of
POGen/cPOGen in Figure 11, which shows the acceptance
rate of POGen/cPOGen (with $L = 10$) and FPA
when the number of bus elements is varied. In these
experiments, the maximum PO-set utilization is set at
0.9 and the size of transaction sets is fixed at 20. When the
number of bus elements increases, there are more
long transactions, which may belong to a higher number
of distinct PO-sets, as shown by the bar graph.
7 IMPLEMENTATION
In this section, we discuss a POGen implementation
on a Cell Broadband Engine processor [9] and report
its execution time overhead. The Cell processor has
1 PowerPC Processing Element (PPE) and 8 Synergistic
Processing Elements (SPEs), each of which is an element
on the Cell ring bus. There are also 3 additional bus
elements, which are a memory controller and two I/O
controllers.
Fig. 12. Average execution time of POGen (x-axis: L = 10, 20, 50, 100 slots; y-axis: Execution Time (ms), 0.000 to 0.025; series: N = 10, 20, 30, 40)
POGen is implemented to run on SPEs as an online
algorithm. It is invoked at the beginning of each scheduling
interval by a timer-interrupt handler and generates
the schedule of all transactions in that interval. Then, if
the generated schedule has S(i, t) = 1, a slot scheduler
transfers the data of transaction i in slot t using
Direct-Memory-Access (DMA) commands. The slot scheduler is
also invoked by a timer-interrupt handler. The execution of
POGen and of the slot scheduler incurs some overhead. We
describe the measurement of this overhead in the
next paragraphs.
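The two-level invocation described above can be sketched in a few lines of Python. This is only an illustrative simulation, not the Cell implementation: the names run_interval, generate_schedule, transfer, and toy_schedule are hypothetical, the timer interrupts are modeled as plain loop iterations, and the toy schedule is a placeholder that does not implement POGen.

```python
from typing import Callable, Dict, List

# Hypothetical representation: schedule[i][t] plays the role of S(i, t),
# i.e. 1 if transaction i transfers data in slot t, 0 otherwise.
Schedule = Dict[int, List[int]]


def run_interval(generate_schedule: Callable[[int], Schedule],
                 transfer: Callable[[int, int], None],
                 slots_per_interval: int) -> None:
    """Simulate one scheduling interval.

    generate_schedule stands in for the schedule generator, which runs
    once at the start of the interval. transfer stands in for the DMA
    commands issued by the slot scheduler.
    """
    # Models the timer interrupt at the start of the interval.
    schedule = generate_schedule(slots_per_interval)
    # Models one slot-scheduler timer interrupt per slot.
    for t in range(slots_per_interval):
        for i, flags in schedule.items():
            if flags[t] == 1:  # S(i, t) = 1
                transfer(i, t)


def toy_schedule(n_slots: int) -> Schedule:
    # Placeholder, NOT POGen: transaction 0 gets the even slots,
    # transaction 1 gets the odd slots.
    return {
        0: [1 if t % 2 == 0 else 0 for t in range(n_slots)],
        1: [1 if t % 2 == 1 else 0 for t in range(n_slots)],
    }


log = []
run_interval(toy_schedule, lambda i, t: log.append((i, t)), 4)
print(log)  # [(0, 0), (1, 1), (0, 2), (1, 3)]
```

Keeping schedule generation and per-slot dispatch as separate callbacks mirrors the split between the two timer-interrupt handlers, so the cost of each can be considered separately.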
POGen execution time was measured under slot sizes
of 10 µs, 20 µs, 50 µs, and 100 µs. We
assume that the period of every transaction is a multiple
of 1 ms, which is also the smallest possible scheduling
interval. We also selected the size of every scheduling
interval to be equal to the size of L, which happened
to be 1 ms in all generated transaction sets. Since L is
1 ms, these slot sizes give L a length of 100, 50, 20, and
10 slots, respectively.
We generated transaction sets of various sizes using
the same methodology discussed in Section 6. The sizes
of the transaction sets are 10, 20, 30, and 40.
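The conversion from slot size to the length of L in slots is a simple division. The short Python check below reproduces the four values; the function name slots_per_interval and the constant INTERVAL_US are hypothetical names introduced only for this illustration.

```python
INTERVAL_US = 1000  # scheduling interval L = 1 ms, expressed in microseconds


def slots_per_interval(slot_size_us: int, interval_us: int = INTERVAL_US) -> int:
    """Return the length of the interval measured in slots."""
    if interval_us % slot_size_us != 0:
        raise ValueError("slot size must divide the scheduling interval")
    return interval_us // slot_size_us


slot_counts = {s: slots_per_interval(s) for s in (10, 20, 50, 100)}
print(slot_counts)  # {10: 100, 20: 50, 50: 20, 100: 10}
```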
Figure 12 shows the execution time of POGen in
ms under various conditions. This execution time also
includes the latency of the timer-interrupt handler that
invokes POGen. It can be seen that the algorithm
overhead increases when L has a higher value. However, the
algorithm overhead is no more than 0.03 ms even when
L = 100 slots. Since each scheduling interval is 1 ms,
under the given conditions, the maximum algorithm
overhead is no more than 3% of the scheduling interval size.
Our measurement also shows that the execution time
of the slot scheduler is no more than 0.125 µs. In other
words, if the slot size is 10 µs, the overhead is no more
than 1.25% of the slot size. The overhead is smaller when
the slot size is bigger.
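The relative overheads follow directly from the measured upper bounds. The Python sketch below recomputes them from the figures reported in the text (0.03 ms per 1 ms interval for POGen, 0.125 microseconds per slot for the slot scheduler); the helper name overhead_percent and the variable names are hypothetical, and the inputs are upper bounds, so the results are upper bounds as well.

```python
def overhead_percent(cost: float, window: float) -> float:
    """Overhead as a percentage of the window; both in the same time unit."""
    return 100.0 * cost / window


# Schedule generation: at most 0.03 ms per 1 ms scheduling interval.
pogen_overhead = overhead_percent(0.03, 1.0)

# Slot scheduler: at most 0.125 us per slot, for each measured slot size (us).
slot_overhead = {size: overhead_percent(0.125, size) for size in (10, 20, 50, 100)}

print(f"schedule generation: {pogen_overhead:.2f}% of the interval")
for size, pct in slot_overhead.items():
    print(f"slot size {size} us: {pct:.3f}% of the slot")
```

The per-slot percentage shrinks in proportion to the slot size, which is why the largest relative overhead occurs at the 10-microsecond slot.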
8 CONCLUSION
We have investigated the problem of real-time
communication scheduling on multi-core processor buses with
ring topology. This scheduling problem rests on important
assumptions that differ from those of traditional real-time
problems. We proposed novel scheduling algorithms
to solve the problem at hand. Compared to previous
works, our algorithms employ a dynamic-priority
scheduling scheme and can achieve much higher bus
utilization. Our future work will focus on extending
the proposed algorithms to other bus topologies such as
mesh and torus.
REFERENCES
[1] Cell BE programming tutorial. IBM, 2007.
[2] Ankur Agarwal, Cyril Iskander, and Ravi Shankar. Survey of
network on chip (NoC) architectures & contributions. Journal of
Engineering, Computing and Architecture, 3(1), 2009.
[3] T. W. Ainsworth and T. M. Pinkston. Characterizing the Cell EIB
on-chip network. IEEE Micro, 27(5):6–14, 2007.
[4] Shobana Balakrishnan and Füsun Özgüner. A priority-driven
flow control mechanism for real-time traffic in multiprocessor
networks. IEEE Trans. Parallel Distrib. Syst., 9(7):664–678, 1998.
[5] S. K. Baruah, N. K. Cohen, C. G. Plaxton, and D. A. Varvel.
Proportionate progress: a notion of fairness in resource allocation.
In STOC ’93: Proceedings of the twenty-fifth annual ACM symposium
on Theory of computing, pages 345–354, New York, NY, USA, 1993.
ACM.
[6] Enrico Bini and Giorgio C. Buttazzo. Measuring the performance
of schedulability tests. Real-Time Syst., 30(1-2), 2005.
[7] Tobias Bjerregaard and Shankar Mahadevan. A survey of
research and practices of network-on-chip. ACM Comput. Surv.,
38(1):1, 2006.
[8] Bach D. Bui, Rodolfo Pellizzoni, Deepti K. Chivukula, and Marco
Caccamo. Real-time communication for multicore systems with
multi-domain ring buses. In RTCSA ’10: Proceedings of the
16th IEEE International Conference on Embedded and Real-Time
Computing Systems and Applications, Macau, China.
[9] T. Chen, R. Raghavan, J. Dale, and E. Iwata. Cell broadband
engine architecture and its first implementation: A performance
view. IBM Research, 2005.
[10] Jason Howard et al. A 48-core IA-32 message-passing processor
with DVFS in 45nm CMOS. In IEEE International Solid-State Circuits
Conference, 2010.
[11] K. Goossens, J. Dielissen, and A. Radulescu. Æthereal network
on chip: concepts, architectures, and implementations. IEEE Design
& Test of Computers, 22(5):414–421, 2005.
[12] Sathish Gopalakrishnan, Lui Sha, and Marco Caccamo. Hard
real-time communication in bus-based networks. In RTSS
’04: Proceedings of the 25th IEEE International Real-Time Systems
Symposium, pages 405–414, Washington, DC, USA, 2004. IEEE
Computer Society.
[13] Philip Holman and James H. Anderson. Adapting pfair
scheduling for symmetric multiprocessors. J. Embedded Comput.,
1(4):543–564, 2005.
[14] Jon Kleinberg and Éva Tardos. Algorithm Design. Addison
Wesley, March 2005.
[15] John P. Lehoczky and Lui Sha. Performance of real-time
bus scheduling algorithms. SIGMETRICS Perform. Eval. Rev.,
14(1):44–53, 1986.
[16] Jong-Pyng Li and Matt W. Mutka. Real-time virtual channel flow
control. J. Parallel Distrib. Comput., 32(1):49–65, 1996.
[17] C. L. Liu and James W. Layland. Scheduling algorithms for
multiprogramming in a hard-real-time environment. J. ACM,
20(1):46–61, 1973.
[18] C. D. Locke, D. R. Vogel, L. Lucas, and J. B. Goodenough. Generic
avionics software specification. Technical Report CMU/SEI-90-
TR-8, 1990.
[19] Zhonghai Lu, Axel Jantsch, and Ingo Sander. Feasibility analysis
of messages for on-chip networks using wormhole routing. In
ASP-DAC ’05: Proceedings of the 2005 Asia and South Pacific Design
Automation Conference, pages 960–964, New York, NY, USA, 2005.
ACM.
[20] Marco Di Natale and Antonio Meschi. Scheduling messages with
earliest deadline techniques. Real-Time Syst., 20(3):255–285, 2001.
[21] Lionel M. Ni and Philip K. McKinley. A survey of wormhole
routing techniques in direct networks. IEEE Computer, 26:62–76,
1993.
[22] Zheng Shi and Alan Burns. Priority assignment for real-time
wormhole communication in on-chip networks. In RTSS ’08:
Proceedings of the 2008 Real-Time Systems Symposium, pages 421–
430, Washington, DC, USA, 2008. IEEE Computer Society.
[23] Zheng Shi and Alan Burns. Real-time communication analysis
for on-chip networks with wormhole switching. In NOCS ’08:
Proceedings of the Second ACM/IEEE International Symposium on
Networks-on-Chip, pages 161–170, Washington, DC, USA, 2008.
IEEE Computer Society.
[24] K. Tindell, A. Burns, and A. Wellings. Analysis of hard real-time
communications. Real-Time Systems, 9:147–171, 1995.
[25] Dakai Zhu, Daniel Mosse, and Rami Melhem. Multiple-resource
periodic scheduling problem: How much fairness is necessary.
In 24th IEEE International Real-Time Systems Symposium, 2003.
