APPENDIX to DRAMbulism: Balancing Performance and Predictability through Dynamic Pipelining by Mirosanlou, Reza et al.
This document provides the appendix to:
Reza Mirosanlou, Mohamed Hassan and Rodolfo Pellizzoni,
DRAMbulism: Balancing Performance and Predictability
through Dynamic Pipelining. Proceedings of the 26th IEEE
Real-Time and Embedded Technology and Applications Symposium,
Sydney, Australia, April 2020.
A. Proofs of Section V
Proof: [Lemma 1] Let k denote the number of PRE
commands of other banks that conflict with the cua. Since
there are b − 1 other banks in the system and due to RR
arbitration in Rule 6, k is at most b − 1. We next focus on
the number of conflicting ACTs. Due to the tRRD constraint
between successive ACTs, the number of conflicting ACTs
is bounded by $\lceil (L_{PRE}+1)/t_{RRD} \rceil$ (one ACT every tRRD cycles; note
that adding one to LPRE is required to account for the clock
cycle when the cua is issued). Also note that due to intra-bank
constraints, an ACT must follow a PRE to the same bank by
at least tRP , and a PRE must follow the previous ACT to the
same bank by at least tRCD + tRTP . Hence, the number of
interfering ACTs belonging to the k banks with PRE commands




Noting that such constraint does not apply to the remaining
b − 1 − k banks, we obtain that the number of conflicting




$\lceil (L_{PRE}+1-t_{RCD}-t_{RTP})/t_{RRD} \rceil + b-1-k)$.
Finally, note that the distance between any two CASes
is at least tCCD; the distance between a PRE and a CAS
command to the same bank is at least tRP + tRCD; and
between a CAS and a PRE it is at least tRTP. Hence, we similarly




$\rceil + \lceil (L_{PRE}+1-t_{RP}-t_{RCD})/t_{CCD} \rceil + b-1-k)$.
Adding together the number of conflicting commands
of each type yields the proof.
Proof: [Lemma 2] Assuming that the current round has
the same direction as the previous round, regardless of the
direction, the inter-bank constraint between two CASes is
tCCD; since the last CAS of the previous round is issued one
clock cycle before the end of the round, the maximum value
of CAStimerinit is tCCD − 1. If the current and previous
rounds have different directions, assuming that the previous
round was read, the earliest time that a write CAS can be
issued is after tRTW cycles and CAStimerinit is thus at most
tRTW − 1. Similarly, if the previous round was write, the
earliest time a read CAS can be issued is after tWtoR cycles,
and the CAStimerinit is at most tWtoR − 1. This concludes
the proof.
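The three cases of Lemma 2 can be tabulated with a short sketch. The function name `castimer_init_max` and the timing values used below are illustrative assumptions; they are not taken from Table I.

```python
def castimer_init_max(prev_dir: str, cur_dir: str,
                      tCCD: int, tRTW: int, tWtoR: int) -> int:
    """Upper bound on CAStimer_init following the case analysis of Lemma 2.

    prev_dir / cur_dir are 'R' (read round) or 'W' (write round).
    """
    if prev_dir not in ("R", "W") or cur_dir not in ("R", "W"):
        raise ValueError("directions must be 'R' or 'W'")
    if prev_dir == cur_dir:
        gap = tCCD    # same direction: CAS-to-CAS constraint
    elif prev_dir == "R":
        gap = tRTW    # read round followed by a write round
    else:
        gap = tWtoR   # write round followed by a read round
    # The last CAS of the previous round is issued one clock cycle
    # before the end of that round, hence the "- 1".
    return gap - 1


# Hypothetical timing values in clock cycles (placeholders only).
example = dict(tCCD=4, tRTW=8, tWtoR=15)
bounds = {(p, c): castimer_init_max(p, c, **example)
          for p in "RW" for c in "RW"}
```

With these placeholder values, a write round followed by a read round gives the largest initial timer value, since tWtoR is the largest of the three constraints.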
Proof: [Lemma 3] The inter-bank ACT constraints are
tRRD and tFAW . For the tFAW constraint to delay the first
ACT in a round, we need four ACTs in the previous round.
Since we want to determine the maximum value of ACTtimer,
we assume that those ACTs are issued as late as possible, which
means tRRD after each other. The last ACT in the previous round
must be followed by its CAS command, which incurs the ACT-to-CAS
delay tRCD. Therefore, the total time from issuing
the first ACT of the sequence to the time the previous round
finishes is 3 · tRRD + tRCD + 1. Hence, the maximum delay
that tFAW can cause to an ACT at the beginning of the round
is tFAW − (3 · tRRD + tRCD + 1). We next consider tRRD.
Since for all devices described in Table I, tRCD > tRRD, it
follows that the tRRD constraint generated by the last ACT in
the previous round cannot delay the first ACT of the current
round. This yields the lemma.
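The arithmetic in the proof of Lemma 3 can be checked with a minimal sketch. The function name, the clamping of the result at zero, and the numeric values are assumptions of this example rather than part of the lemma.

```python
def acttimer_init_max(tFAW: int, tRRD: int, tRCD: int) -> int:
    """Largest delay that tFAW can impose on the first ACT of a round."""
    if tRCD <= tRRD:
        # The proof relies on tRCD > tRRD so that the tRRD constraint of
        # the last ACT of the previous round has already expired.
        raise ValueError("the argument of Lemma 3 assumes tRCD > tRRD")
    # Four ACTs spaced tRRD apart, then tRCD until the last CAS,
    # plus one cycle until the previous round finishes.
    elapsed = 3 * tRRD + tRCD + 1
    # Assumption of this sketch: a timer value cannot be negative.
    return max(0, tFAW - elapsed)


# Hypothetical timing values in clock cycles (placeholders only).
delay = acttimer_init_max(tFAW=40, tRRD=6, tRCD=14)
```

For the placeholder values above, the previous round finishes 33 cycles after the first of the four ACTs, so the tFAW window leaves a residual delay of 7 cycles.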
Proof: [Lemma 4] Since we assume that all other banks
issue a transaction in Round 2, by Rule 6, the tua has the
highest RR priority in Round 3. Hence, for a closed tua, its
ACT will be issued ACTtimerinit after the beginning of
Round 3. The CAS of the tua will then become intra-ready
tRCD after the ACT is issued. We now have two cases: 1) the
CAS becomes intra-ready at or before CAStimerinit after the
beginning of the round; or 2) the CAS becomes intra-ready after
CAStimerinit. In case 1), the CAS will be issued no later than
CAStimerinit + 1 (note that an ACT could cause a command
bus conflict at CAStimerinit), resulting in a latency bound
$L^{CR}_{R3} = CAStimer^{max,R}_{init} + 1$. In case 2), depicted in Figure 4,
the CAS could be delayed by another (lower priority) read CAS
in Round 3; in the worst-case such CAS could be issued the
clock cycle before the CAS under analysis becomes intra-ready,
resulting in an added delay of tCCD when accounting for bus
conflict. This results in $L^{CR}_{R3} = ACTtimer^{max}_{init} + t_{RCD} + t_{CCD}$.
Taking the maximum of Case 1) and 2) results in Equation 11.
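As an illustration of how the two cases of Lemma 4 combine, the following sketch evaluates both bounds and takes their maximum. The function name `lcr_r3` and the input values are hypothetical.

```python
def lcr_r3(castimer_init_max_r: int, acttimer_init_max: int,
           tRCD: int, tCCD: int) -> int:
    """Round 3 bound for a close read: maximum of Case 1) and Case 2)."""
    # Case 1): the CAS is intra-ready by CAStimer_init; one extra cycle
    # accounts for a possible command bus conflict with an ACT.
    case1 = castimer_init_max_r + 1
    # Case 2): the CAS becomes intra-ready later and may be delayed by a
    # lower-priority read CAS issued one cycle earlier.
    case2 = acttimer_init_max + tRCD + tCCD
    return max(case1, case2)


# Hypothetical inputs in clock cycles (placeholders only).
bound_act_dominated = lcr_r3(14, 7, tRCD=14, tCCD=4)
bound_cas_dominated = lcr_r3(30, 0, tRCD=14, tCCD=4)
```

Which case dominates depends on the device timings: a large initial CAS timer favours Case 1), while a large tRCD favours Case 2).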
Proof: [Lemma 5] Let t̄ be the time at which transactions
become pipe-blocked in the round, and let CAStimer denote the
value of the CAS timer at t̄. First note that based on Rule 4, no ACT
command can be issued after t̄; this means that CASes issued
after the blocking point cannot suffer bus conflict. Furthermore,
the number of CAS commands issued after t̄ is exactly equal
to the number of pending requests Npend at that time; and it
must be Npend ≥ 1, otherwise the round would have ended by
Rule 1. We analyze two cases: 1) all Npend CAS commands
are delayed (Figure 6(a)); 2) at least one of the Npend CASes
is not delayed (Figure 6(b)). For Case 1), considering that
each CAS delays the next one by tCCD since there are no bus
conflicts, and that the round ends one cycle after τN.C, we
immediately obtain:
\[ L_{pipe} = CAStimer + (N^{pend} - 1) \cdot t_{CCD} + 1. \tag{18} \]
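Since the third condition in Rule 4 must not hold at t̄, and all
quantities are integer numbers of clock cycles, we
have the following inequalities:
\begin{align}
& CAStimer + N^{wait} \cdot t_{CCD} - t_{RCD} - 1 < 0 \tag{19} \\
\implies\; & (CAStimer + N^{wait} \cdot t_{CCD} - t_{RCD} - 1) + t_{RCD} - t_{CCD} + 2 \le t_{RCD} - t_{CCD} + 1 \notag \\
\implies\; & CAStimer + (N^{wait} - 1) \cdot t_{CCD} + 1 \tag{20} \\
& \le t_{RCD} - t_{CCD} + 1 \tag{21}
\end{align}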
Since Npend ≤ Nwait, we have:
\begin{align}
L_{pipe} &= CAStimer + (N^{pend} - 1) \cdot t_{CCD} + 1 \notag \\
&\le CAStimer + (N^{wait} - 1) \cdot t_{CCD} + 1 \notag \\
&\le t_{RCD} - t_{CCD} + 1, \tag{22}
\end{align}
which is the first term of the max in Equation 12.
We now analyze Case 2). Let τj.C, with j ≤ N, be
the command with the largest index among the Npend pending
CASes that is not delayed; then CASes must be issuable at
τj.C.t − 1 since there are no bus conflicts. Hence, τj, ..., τN must all be
close transactions, otherwise they would be intra-ready at
t̄ and their CASes would be issued at or before τj.C.t − 1.
Then, since the minimum time that an ACT can arrive after
the previous ACT is tRRD, and ACTtimer must be zero
the clock cycle before the blocking point, it follows that
t̄ ≥ τj.A.t + (N − j + 1) · tRRD + 1, while the round ends
at τj.A.t + tRCD + (N − j) · tCCD + 1. Hence, we obtain:
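\begin{align}
L_{pipe} &= \tau_j.A.t + t_{RCD} + (N - j) \cdot t_{CCD} + 1 - \bar{t} \notag \\
&\le t_{RCD} + (N - j) \cdot t_{CCD} - (N - j + 1) \cdot t_{RRD} \notag \\
&= t_{RCD} - t_{RRD} + (N - j) \cdot (t_{CCD} - t_{RRD}); \tag{23}
\end{align}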
but since tRRD ≥ tCCD for all devices, Equation 23 is
maximized for j = N , yielding the second term of the max in
Equation 12. This completes the proof.
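The two terms of the pipe-blocking bound can be sanity-checked numerically. The sketch below enumerates every state allowed by inequality (19) for Case 1) and evaluates Equation 23 for Case 2). All function names, the enumeration limits and the timing values are assumptions made for illustration.

```python
def lpipe_bound(tRCD: int, tCCD: int, tRRD: int) -> int:
    """Maximum of the two terms derived in the proof of Lemma 5."""
    return max(tRCD - tCCD + 1, tRCD - tRRD)


def case1_worst(tRCD: int, tCCD: int, max_wait: int = 8):
    """Brute-force Equation 18 over all states satisfying inequality (19)."""
    worst = None
    for n_wait in range(1, max_wait + 1):
        for n_pend in range(1, n_wait + 1):       # N_pend <= N_wait
            for castimer in range(0, tRCD + 2):
                if castimer + n_wait * tCCD - tRCD - 1 < 0:
                    latency = castimer + (n_pend - 1) * tCCD + 1
                    worst = latency if worst is None else max(worst, latency)
    return worst


def case2_latency(tRCD: int, tCCD: int, tRRD: int, n_minus_j: int) -> int:
    """Right-hand side of Equation 23 for a given value of N - j."""
    return tRCD - tRRD + n_minus_j * (tCCD - tRRD)


# Hypothetical timing values in clock cycles, chosen with tRRD >= tCCD.
tRCD, tCCD, tRRD = 14, 4, 6
worst_case1 = case1_worst(tRCD, tCCD)
worst_case2 = max(case2_latency(tRCD, tCCD, tRRD, d) for d in range(5))
```

Under these placeholder timings the enumeration of Case 1) reaches tRCD − tCCD + 1 exactly, and Case 2) peaks at N − j = 0, matching the argument that Equation 23 is maximized for j = N.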
B. Latency Analysis for Open Read Requests
In this section we provide the equations for the latency
analysis of an open read request. The latency decomposition
of an open request is shown in Figure 11. The latency of an
open read request therefore consists of the following timing elements:
\[ L^{ORpR}_{req} = L^{ORpR}_{tran} + t_{RL} + t_{BUS}. \tag{24} \]
Notice that in order to have an open read, the request must be
preceded by a read as discussed in Section V. The transaction
latency can be obtained from:
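\begin{align}
L^{ORpR-self}_{tran} &= L^{OR}_{self} + L_{round}(b-1,\ CAStimer^{max,W}_{init},\ ACTtimer^{max}_{init}) + L^{OR-self}_{R3} \tag{25} \\
L^{ORpR-pipe}_{tran} &= L_{pipe} + L_{round}(b-2,\ CAStimer^{max,W}_{init},\ ACTtimer^{max}_{init}) + L^{OR-pipe}_{R3} \tag{26}
\end{align}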








Note that for pipe-blocking to occur when computing
$L^{ORpR-pipe}_{tran}$, there must be a close read transaction
that was blocked in the first read round according to Rule 4.
Hence, in the second round we must exclude, apart from the tua,
the request that caused the pipe-blocking, which leaves
b − 2 requests in the second round.
The maximum amount of time that an open transaction can
be self-blocked in a round is:
\[ L^{OR}_{self} = L_{round}(b, 0, 0) - t_{RL} - t_{BUS}. \tag{28} \]
The worst-case scenario for $L^{OR}_{self}$ happens when a requestor
sends another request while it is being serviced in the same round.
Note that the blocking time can be obtained as the difference
between the length of Round 1 and the time at which the data of
the previous request to the same bank is transferred. Hence, to
maximize $L^{OR}_{self}$, we maximize the length of Round 1, assuming
that the previous request completes as soon as possible, which
is tRL + tBUS after the beginning of the round (assuming
CAStimerinit = 0).
Fig. 11. Increasing the number of requestor in the system with DDR3 2133L. (Timeline labels: P, A, C, Data; tRP, tRCD, tRL/WL + tBUS; Ltran; "Open request arrives".)
The maximum times required to issue the CAS command
in Round 3 for an open read transaction are:
\begin{align}
L^{OR-self}_{R3} &= CAStimer^{max,R}_{init} + 1, \tag{29} \\
L^{OR-pipe}_{R3} &= CAStimer^{max,R}_{init} + t_{CCD} + 2. \tag{30}
\end{align}
For an open tua with pipe-blocking, there must be a close
transaction that was also pipe-blocked in Round 1, otherwise
the tua would not be pipe-blocked. This close transaction must
be issued in Round 3. Therefore, we consider two cases: 1) if
CAStimerinit + 1 < ACTtimerinit + tRCD, the open CAS
will be issued at either CAStimerinit or CAStimerinit + 1;
2) if CAStimerinit + 1 ≥ ACTtimerinit + tRCD, then the
CAS of the close transaction might be issued first (since it
might have higher priority) at CAStimerinit + 1, delaying the
CAS of the tua for an extra tCCD + 1 clock cycles, so that it is
issued at CAStimerinit + 1 + tCCD + 1. Taking the maximum
of the two cases and noting that other requestors have lower
priority in the RR queue results in CAStimerinit + tCCD + 2. For
an open tua with self-blocking, we only consider the first case
since there is no such outstanding close request in Round 3.
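The case analysis for the Round 3 CAS of an open read can be sketched as follows. The helper names `open_cas_issue_bound` and `lor_r3` and the timing values are assumptions of this example. The first function follows the two cases in the text, and the second evaluates the closed-form bounds of Equations 29 and 30.

```python
def open_cas_issue_bound(castimer_init: int, acttimer_init: int,
                         tRCD: int, tCCD: int, pipe_blocked: bool) -> int:
    """Latest cycle, relative to the start of Round 3, at which the
    CAS of an open tua is issued according to the case analysis."""
    if not pipe_blocked:
        # Self-blocking: no outstanding close request, first case only.
        return castimer_init + 1
    if castimer_init + 1 < acttimer_init + tRCD:
        # Case 1): the close CAS is not yet ready to go first.
        return castimer_init + 1
    # Case 2): the close CAS may be issued first at castimer_init + 1,
    # delaying the tua by an extra tCCD + 1 cycles.
    return castimer_init + 1 + tCCD + 1


def lor_r3(castimer_init_max_r: int, tCCD: int, pipe_blocked: bool) -> int:
    """Closed-form bounds of Equations 29 (self) and 30 (pipe)."""
    extra = tCCD + 2 if pipe_blocked else 1
    return castimer_init_max_r + extra


# Hypothetical values in clock cycles (placeholders only).
pipe_bound = lor_r3(14, tCCD=4, pipe_blocked=True)
self_bound = lor_r3(14, tCCD=4, pipe_blocked=False)
```

For any initial timer value up to the assumed maximum, the case-based issue time never exceeds the corresponding closed-form bound, which is the sense in which the two equations are obtained by taking the maximum over the cases.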
C. Latency Analysis for Write Requests
For write requests, we follow the same steps as for read
requests, but we need to consider the write-related timers and
timing constraints. After applying the appropriate changes, we
obtain the following formulas for write requests:
\begin{align}
L^{CWpR}_{req} &= t^{pR}_{\alpha} + L_{PRE} + t_{RP} + L^{CWpR}_{tran} + t_{WL} + t_{BUS}, \tag{31} \\
L^{CWpW}_{req} &= t^{pW}_{\alpha} + L_{PRE} + t_{RP} + L^{CWpW}_{tran} + t_{WL} + t_{BUS}; \tag{32}
\end{align}
where the transaction latencies can be obtained from:
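\begin{align}
L^{CWpW}_{tran} &= \max(L_{pipe}, L^{CW}_{self}) + L_{round}(b-1,\ CAStimer^{max,R}_{init},\ ACTtimer^{max}_{init}) + L^{CW}_{R3}. \tag{33} \\
L^{CWpR}_{tran} &= L_{pipe} + L_{round}(b-1,\ CAStimer^{max,R}_{init},\ ACTtimer^{max}_{init}) + L^{CW}_{R3}, \tag{34} \\
L^{CW}_{R3} &= \max(ACTtimer^{max}_{init} + t_{RCD} + t_{CCD},\ CAStimer^{max,W}_{init} + 1). \tag{35}
\end{align}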
The maximum amount of time a close write transaction can
be self-blocked in a round is:
\[ L^{CW}_{self} = L_{round}(b, 0, 0) - t^{pW}_{\alpha} - L_{PRE} - t_{RP} - t_{WL} - t_{BUS}. \tag{36} \]
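The additive structure of the write-request bounds can be illustrated with a minimal sketch. Lround is defined in the main paper and is not reproduced here, so its value is passed in as a plain number. The function names and all numeric inputs are hypothetical.

```python
def lcw_self(lround_b00: int, t_alpha_pW: int, LPRE: int,
             tRP: int, tWL: int, tBUS: int) -> int:
    """Self-blocking time of a close write (Equation 36).

    lround_b00 stands for the value of Lround(b, 0, 0), computed elsewhere.
    """
    return lround_b00 - t_alpha_pW - LPRE - tRP - tWL - tBUS


def lcw_req(t_alpha: int, LPRE: int, tRP: int,
            l_tran: int, tWL: int, tBUS: int) -> int:
    """Request latency of a close write (Equations 31 and 32).

    t_alpha is the arrival-related term for the relevant previous
    direction, and l_tran the matching transaction latency.
    """
    return t_alpha + LPRE + tRP + l_tran + tWL + tBUS


# Hypothetical component values in clock cycles (placeholders only).
self_block = lcw_self(lround_b00=100, t_alpha_pW=3, LPRE=10,
                      tRP=14, tWL=10, tBUS=4)
request = lcw_req(t_alpha=3, LPRE=10, tRP=14, l_tran=120, tWL=10, tBUS=4)
```

The terms subtracted in the self-blocking expression are the components of the request latency other than the transaction latency, which mirrors the structure of the open-read case with the close-request overheads added.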
