2-Bit Branch Predictor Modeling Using Markov Model  by Elkhouly, Reem et al.
Procedia Computer Science 62 (2015) 650–653
1877-0509 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license 
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of The 2015 International Conference on Soft Computing and Software Engineering 
(SCSE 2015)
doi: 10.1016/j.procs.2016.05.115 
ScienceDirect
Available online at www.sciencedirect.com
The 2015 International Conference on Soft Computing and Software Engineering (SCSE 2015)
2-Bit Branch Predictor Modeling Using Markov Model
Reem Elkhouly (a), Ahmed El-Mahdy (a, b), Amr Elmasry (a, c)
(a) Dept. of Computer Science and Engineering, Egypt-Japan University of Science and Technology (E-JUST), Alexandria, Egypt
(b) On leave from Dept. of Computer Engineering and Systems, Alexandria University, Alexandria, Egypt
(c) Dept. of Computer Engineering and Systems, Alexandria University, Alexandria, Egypt
Abstract
Power consumption is a very important issue for embedded devices; therefore, every processing cycle should be carefully
considered and optimally utilized. In speculative execution, highly mispredicted branches are a critical threat to both time
and power savings. In this paper, we show that, for a specific branch, the misprediction rate of a 2-bit branch predictor can be
precisely calculated using a Markov model. Furthermore, this calculation can be done offline for additional power savings. Thus,
one can decide whether to replace the branch with conditional (predicated) instructions instead of relying on the predictor.
Keywords: Branch Predictor; Modeling; Embedded Systems.
1. Introduction
Low-power processors, such as those used in embedded devices, are becoming ubiquitous. These limited hardware
platforms have to be carefully considered during the software design phase. Relying on speculative execution may
not always be optimal in terms of time cost, which translates into higher power consumption [1]. When a branch is
mispredicted, the processor loads the wrong next instruction and must then flush it and load the correct one instead [2],
wasting valuable processing cycles. If such power-hungry branches are precisely detected in advance, the processor
can stall instruction fetch until the branch target is known, saving a significant amount of power. Predicated
instructions were introduced to overcome misprediction by converting control dependence into data dependence via
guarded execution [3]. However, predication comes at the extra cost of executing 'nullified' instructions, potentially
degrading performance and costing more processing cycles. Moreover, branches interact by allowing different execution
schedules, and finding the optimal schedule is generally a hard combinatorial search problem. Therefore, predicated
instructions should be used with care. In this paper, we introduce a Markov model that estimates, offline, the
misprediction rate of the 2-bit branch predictor commonly used in embedded processors such as the ARMv6k. A compiler
could then use this information to convert selected branches into predicated instructions, which are otherwise costly,
improving overall performance and execution time and hence reducing power consumption.
2. The Model
The 2-bit saturating counter is a finite state machine (FSM) that is widely used as a branch predictor; see Figure 1.
We model this FSM as a Markov chain in which the probability of success (p) is the probability that the branch is
taken and the probability of failure (q = 1 − p) is the probability that the branch is not taken.

[Fig. 1. 2-bit Saturating Counter Branch Predictor: four states, Strongly Taken, Weakly Taken, Weakly Not Taken, and Strongly Not Taken; each Taken outcome moves the counter one state toward Strongly Taken and each Not Taken outcome moves it one state toward Strongly Not Taken, saturating at both ends.]

The model can then be expressed with the following equations:
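$$
\pi \times
\begin{pmatrix}
p & q & 0 & 0 \\
p & 0 & q & 0 \\
0 & p & 0 & q \\
0 & 0 & p & q
\end{pmatrix}
= \pi
$$

where the rows (current state) and columns (next state) of the matrix are ordered S_st, S_wt, S_wnt, S_snt (strongly taken, weakly taken, weakly not taken, strongly not taken), π = [x y z w], and x + y + z + w = 1. Solving gives

$$
x = \frac{p^3}{p^3 + p^2 q + p q^2 + q^3}, \quad
y = \frac{p^2 q}{p^3 + p^2 q + p q^2 + q^3}, \quad
z = \frac{p q^2}{p^3 + p^2 q + p q^2 + q^3}, \quad
w = \frac{q^3}{p^3 + p^2 q + p q^2 + q^3}
$$

$$
P(\text{correct prediction}) = P(\text{success}) = p(x + y) + q(z + w) = \frac{p^3 + q^3}{p^3 + p^2 q + p q^2 + q^3}
$$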
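As a worked check (an added derivation, not part of the original analysis), the condition p + q = 1 lets both the denominator and the numerator factor:

$$
p^3 + p^2 q + p q^2 + q^3 = (p + q)(p^2 + q^2) = p^2 + q^2,
\qquad
p^3 + q^3 = (p + q)(p^2 - pq + q^2) = p^2 + q^2 - pq,
$$

so that

$$
P(\text{success}) = 1 - \frac{pq}{p^2 + q^2},
\qquad
P(\text{misprediction}) = \frac{pq}{p^2 + q^2}.
$$

Since p^2 + q^2 ≥ 2pq, with equality only at p = q = 1/2, the misprediction rate is at most 1/2 and reaches it exactly at p = 1/2; it is 0 at p = 0 and at p = 1, and it is symmetric in p and q. For example, p = 0.1 gives pq = 0.09 and p^2 + q^2 = 0.82, i.e., a misprediction rate of about 0.11.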
[Fig. 2. Misprediction rate calculated by the model: misprediction rate (0 to 0.5) plotted against p (0 to 1).]
Applying the model over the whole range of the probability that the branch is taken, from p = 0 to p = 1, gives the
result shown in Figure 2. The peak of the graph at p = 0.5 represents maximum uncertainty: there, the branch is
mispredicted 50% of the time. At the tails of the graph (p = 0 or p = 1), the branch is always not taken or always
taken, and the prediction is 100% correct.
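The curve in Figure 2 can be reproduced numerically. The following Python sketch is illustrative only: the function names (transition_matrix, stationary, misprediction_rate, closed_form) are hypothetical, and it approximates π by plain power iteration instead of solving π × P = π symbolically. It then checks 1 − P(success) from the stationary probabilities against the closed-form expression of the model.

```python
def transition_matrix(p):
    """Transition matrix of the 2-bit counter chain.
    Rows (current state) and columns (next state): S_st, S_wt, S_wnt, S_snt."""
    q = 1.0 - p
    return [[p, q, 0.0, 0.0],
            [p, 0.0, q, 0.0],
            [0.0, p, 0.0, q],
            [0.0, 0.0, p, q]]


def stationary(p, iterations=1000):
    """Approximate pi = [x, y, z, w] with pi * P = pi by power iteration."""
    P = transition_matrix(p)
    pi = [0.25, 0.25, 0.25, 0.25]  # start from the uniform distribution
    for _ in range(iterations):
        # One step of pi <- pi * P (row vector times matrix).
        pi = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
    return pi


def misprediction_rate(p):
    """1 - P(success), where P(success) = p(x + y) + q(z + w)."""
    x, y, z, w = stationary(p)
    q = 1.0 - p
    return 1.0 - (p * (x + y) + q * (z + w))


def closed_form(p):
    """1 - (p^3 + q^3) / (p^3 + p^2 q + p q^2 + q^3)."""
    q = 1.0 - p
    return 1.0 - (p**3 + q**3) / (p**3 + p**2 * q + p * q**2 + q**3)


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"p={p:.2f}  power iteration={misprediction_rate(p):.4f}  "
              f"closed form={closed_form(p):.4f}")
```

For p = 0, 0.25, 0.5, 0.75 and 1 both columns should read approximately 0, 0.3, 0.5, 0.3 and 0, the symmetric shape of Figure 2. Power iteration is used only because it is short; for a 4-state chain any linear solver would do equally well.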
3. Experiments
[Fig. 3. ARM Performance Measurements: runtime, Time (s), from about 0.9 to 1.45, plotted against Taken Branch (%), 0 to 100, for the branched code (Branches) and the conditional-instruction code (Conditional Inst.).]
We simulated the behaviour of the 2-bit predictor
to capture the correlation between the input data,
which essentially determines the probability of taking
the branch (p), and the probability of correct
prediction (P_exper., the experimental value). We then
used the model to calculate the same probability
analytically (P_math., the calculated value). We ran an
iterative mergesort algorithm (see Algorithm 1) on
integer inputs drawn from different distributions:
uniform, normal, and sorted data. The comparison
between the mathematically calculated and the
simulation-counted probabilities is shown in Table 1. We also simulated the predictor and tested it on the Raspberry Pi
ARM1176JZF-S (ARMv6k) at 700 MHz, whose branch misprediction penalty is 6 cycles. To show how expensive a
misprediction is on this ARM processor, we tested a simple program in two versions: one written with branches (jumps),
which contains a single branch instruction, and one written with conditional (predicated) instructions. The two versions
are shown in Listing 2 (branched) and Listing 1 (conditional instructions). We executed both versions on different sets of
input data (1M integers), which control the probability of the branch being taken (or of the conditional instruction being
executed), and recorded the results shown in Figure 3. As in Figure 2, the runtime peaks at 50% taken branches, due to
maximum misprediction (predictor uncertainty). It is worth noting that the best runtime of the branched code equals the
runtime of the conditional-instruction code.
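The Markov model implicitly treats successive outcomes of a branch as independent, each taken with the same probability p. The Python sketch below makes that assumption explicit: it feeds independent taken/not-taken outcomes to a simulated 2-bit counter and compares the hit rate with P(success). It is an illustration, not the authors' simulator; the function name simulate_two_bit, the starting state (weakly not taken), the seed, and the sample size are all assumptions.

```python
import random


def simulate_two_bit(p, n, seed=0, state=1):
    """Feed n independent outcomes (True = taken, with probability p) to a
    2-bit saturating counter and return the fraction predicted correctly.
    States: 0 = strongly not taken, 1 = weakly not taken, 2 = weakly taken,
    3 = strongly taken. The initial state (default 1) is an assumption."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        taken = rng.random() < p
        correct += (state >= 2) == taken  # predict taken in states 2 and 3
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct / n


def model_success(p):
    """P(success) = (p^3 + q^3) / (p^3 + p^2 q + p q^2 + q^3)."""
    q = 1.0 - p
    return (p**3 + q**3) / (p**3 + p**2 * q + p * q**2 + q**3)


for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    sim = simulate_two_bit(p, 200_000)
    print(f"p={p:.1f}  simulated={sim:.4f}  model={model_success(p):.4f}")
```

With independent outcomes the simulated and modeled values should agree to within sampling noise (about two decimal places for 200,000 outcomes). The branches of a real program, such as those in Table 1, need not satisfy the independence assumption.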
Algorithm 1 Mergesort
1: function Mergesort(list, low, high)
2:   length ← high − low + 1
3:   if low = high then return
4:   end if
5:   pivot ← ⌊(low + high) / 2⌋
6:   Mergesort(list, low, pivot)
7:   Mergesort(list, pivot + 1, high)
8:   tempList ← list[low..high]
9:   indexLeft ← 0, indexRight ← pivot − low + 1
10:  for index ← 0 to length − 1 do
11:    if indexRight ≤ high − low then
12:      if indexLeft ≤ pivot − low then
13:        if tempList[indexLeft] > tempList[indexRight] then
14:          list[low + index] ← tempList[indexRight]
15:          indexRight ← indexRight + 1
16:        else
17:          list[low + index] ← tempList[indexLeft]
18:          indexLeft ← indexLeft + 1
19:        end if
20:      else
21:        list[low + index] ← tempList[indexRight]; indexRight ← indexRight + 1
22:      end if
23:    else
24:      list[low + index] ← tempList[indexLeft]; indexLeft ← indexLeft + 1
25:    end if
26:  end for
27:  return
28: end function
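The following Python sketch runs Algorithm 1 with one simulated 2-bit counter per branch, in the spirit of the experiment behind Table 1. It is an illustration under stated assumptions, not the authors' setup: a branch is counted as taken when its if- or loop-condition is false (the jump that skips the guarded block), every counter starts in the weakly-not-taken state, and the names TwoBitCounter, cond, and report are hypothetical. Under this convention, sorted input gives p = 0 for Branch2 and p = 1 for Branch4, as in the Sorted row of Table 1.

```python
import random


class TwoBitCounter:
    """2-bit saturating counter (Fig. 1). States: 0 = strongly not taken,
    1 = weakly not taken, 2 = weakly taken, 3 = strongly taken."""

    def __init__(self, state=1):  # assumed initial state: weakly not taken
        self.state = state
        self.correct = 0  # correct predictions
        self.total = 0    # dynamic executions of the branch
        self.taken = 0    # times the branch was taken

    def observe(self, taken):
        """Predict, compare with the actual outcome, then update the state."""
        self.correct += (self.state >= 2) == taken
        self.total += 1
        self.taken += taken
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


def model_success(p):
    """P(success) from the Markov model: (p^3 + q^3) / (p^3 + p^2 q + p q^2 + q^3)."""
    q = 1.0 - p
    return (p**3 + q**3) / (p**3 + p**2 * q + p * q**2 + q**3)


def cond(counter, value):
    """Evaluate a condition; assumed convention: taken = condition false."""
    counter.observe(not value)
    return value


def mergesort(lst, low, high, bp):
    """Algorithm 1, with bp[0..3] tracking Branch1..Branch4 of Table 1."""
    if low >= high:
        return
    pivot = (low + high) // 2
    mergesort(lst, low, pivot, bp)
    mergesort(lst, pivot + 1, high, bp)
    length = high - low + 1
    temp = lst[low:high + 1]  # copy of list[low..high]
    left, right = 0, pivot - low + 1
    index = 0
    while cond(bp[0], index < length):                     # Branch1: loop test
        if cond(bp[1], right <= high - low):               # Branch2
            if cond(bp[2], left <= pivot - low):           # Branch3
                if cond(bp[3], temp[left] > temp[right]):  # Branch4: key comparison
                    lst[low + index] = temp[right]
                    right += 1
                else:
                    lst[low + index] = temp[left]
                    left += 1
            else:  # left half exhausted
                lst[low + index] = temp[right]
                right += 1
        else:  # right half exhausted
            lst[low + index] = temp[left]
            left += 1
        index += 1


def report(data):
    """Sort data in place (assumes len(data) >= 2) and return
    (p, P_exper, P_math) for each of the four branches."""
    bp = [TwoBitCounter() for _ in range(4)]
    mergesort(data, 0, len(data) - 1, bp)
    return [(c.taken / c.total, c.correct / c.total, model_success(c.taken / c.total))
            for c in bp]


if __name__ == "__main__":
    rng = random.Random(1)
    inputs = {"uniform": [rng.randint(0, 10**6) for _ in range(1000)],
              "sorted": list(range(1000))}
    for name, data in inputs.items():
        for i, (p, exper, math_p) in enumerate(report(data), start=1):
            print(f"{name:8s} Branch{i}: p={p:.4f}  P_exper={exper:.4f}  P_math={math_p:.4f}")
```

Because P(success) in the model is symmetric in p and q, the taken/not-taken convention chosen here does not change P_math; it only relabels p. The printed numbers depend on the input size and seed and are not expected to reproduce Table 1 exactly.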
Table 1. Probability of Correct (Successful) Prediction for the Mergesort (Algorithm 1) Branches

Input data  Quantity           Branch1 (line 10)  Branch2 (line 11)  Branch3 (line 12)  Branch4 (line 14)
Uniform     p                  0.1284             0.105655           0.09817            0.52583
            P_exper.(success)  0.867704           0.866071           0.881864           0.453875
            P_math.(success)   0.855814           0.883489           0.892418           0.502662
Normal      p                  0.1284             0.105655           0.096506           0.52486
            P_exper.(success)  0.867704           0.869048           0.890183           0.480663
            P_math.(success)   0.855814           0.883489           0.894391           0.502466
Sorted      p                  0.1284             0.0                0.470238           1.0
            P_exper.(success)  0.867704           0.997024           0.660714           1.0
            P_math.(success)   0.855814           1.0                0.503531           1.0
4. Related Work
The complexity of modeling dynamic branch prediction schemes when computing the worst-case execution times (WCETs) of
critical tasks was studied in [4,5]. Finite state machines were used to model branch behaviour during runtime in [6].
Nevertheless, to our knowledge, no previous work addresses lowering power consumption by eliminating highly
mispredicted branches through offline modeling. Performance and power simulation of application execution on
heterogeneous architectures using high-level models, which enables early and fast power estimation, was presented in [7].
An approach to estimating power consumption during system simulation, with results very close to those of the actual
architecture, was introduced in [8]. The analysis in [9], with experiments on different CPU and GPU architectures, shows
that fine-tuning the customization of the memory hierarchy can lead to power-efficient processing. A power reduction
technique for embedded systems that uses code compression without performance loss was proposed in [10].
5. Conclusion and Future Work
In this paper, we revisit the problem of optimizing time cost in power-limited processors, which suffers from the work
a processor wastes when its branch predictor fails. We present a Markov model of the 2-bit branch predictor used in the
ARMv6 processor. This model enables offline analysis to detect highly mispredicted branches, which are better converted
into predicated execution using conditional instructions (already available in the ARM ISA). The model is tested on a
mergesort program and on a simple single-branch program. We are working to extend the model to more complex
architectures and to verify it against larger programs.
Listing 1. Conditional instruction code
    .data                          @ data section: not shown in the original listing; assumed so the code assembles
    .balign 4
scan_pattern: .asciz "%d"          @ scanf format for one integer
    .balign 4
message:      .asciz "%d\n"        @ hypothetical output format
    .balign 4
number_read:  .word 0
    .balign 4
result:       .word 0
    .balign 4
return:       .word 0              @ storage for the saved link register

    .text
    .global main
main:
    ldr r3, address_of_return
    str lr, [r3]                   @ save lr (the bl calls below overwrite it)
    ldr r0, address_of_scan_pattern
    ldr r1, address_of_number_read
    bl scanf                       @ scanf("%d", &number_read)
    ldr r1, address_of_number_read
    ldr r1, [r1]                   @ r1 = number_read
    cmp r1, #10
    addge r1, r1, r1               @ predicated: r1 = 2*r1 only if r1 >= 10, no branch
    ldr r0, address_of_result
    str r1, [r0]
    ldr r0, address_of_message
    ldr r1, address_of_result
    ldr r1, [r1]
    bl printf                      @ printf(message, result)
    ldr r3, address_of_return
    ldr lr, [r3]                   @ restore lr
    bx lr                          @ return from main
address_of_scan_pattern: .word scan_pattern
address_of_number_read:  .word number_read
address_of_return:       .word return
address_of_message:      .word message
address_of_result:       .word result
Listing 2. Branched code
    .data                          @ data section: not shown in the original listing; assumed so the code assembles
    .balign 4
scan_pattern: .asciz "%d"          @ scanf format for one integer
    .balign 4
message:      .asciz "%d\n"        @ hypothetical output format
    .balign 4
number_read:  .word 0
    .balign 4
result:       .word 0
    .balign 4
return:       .word 0              @ storage for the saved link register

    .text
    .global main
main:
    ldr r3, address_of_return
    str lr, [r3]                   @ save lr (the bl calls below overwrite it)
    ldr r0, address_of_scan_pattern
    ldr r1, address_of_number_read
    bl scanf                       @ scanf("%d", &number_read)
    ldr r1, address_of_number_read
    ldr r1, [r1]                   @ r1 = number_read
    cmp r1, #10
    blt br1                        @ conditional branch: skip the add if r1 < 10
    add r1, r1, r1                 @ r1 = 2*r1 (reached only if r1 >= 10)
br1:
    ldr r0, address_of_result
    str r1, [r0]
    ldr r0, address_of_message
    ldr r1, address_of_result
    ldr r1, [r1]
    bl printf                      @ printf(message, result)
    ldr r3, address_of_return
    ldr lr, [r3]                   @ restore lr
    bx lr                          @ return from main
address_of_scan_pattern: .word scan_pattern
address_of_number_read:  .word number_read
address_of_return:       .word return
address_of_message:      .word message
address_of_result:       .word result
References
1. D. Parikh, K. Skadron, Y. Zhang, M. Barcella, M. R. Stan, Power issues related to branch prediction, in: Proceedings of the 8th International
Symposium on High-Performance Computer Architecture, HPCA ’02, IEEE Computer Society, 2002, p. 233.
2. J. E. Smith, A study of branch prediction strategies, in: Proceedings of the 8th annual symposium on Computer Architecture, IEEE Computer
Society Press, 1981, pp. 135–148.
3. S. A. Mahlke, R. E. Hank, J. E. McCormick, D. I. August, W.-M. Hwu, A comparison of full and partial predicated execution support for ILP
processors, in: Proceedings of the 22nd Annual International Symposium on Computer Architecture, IEEE, 1995, pp. 138–149.
4. C. Burguiere, C. Rochange, On the complexity of modeling dynamic branch predictors when computing worst-case execution time, in:
Proceedings of the ERCIM/DECOS Workshop On Dependable Embedded Systems, 2007.
5. T. Mitra, A. Roychoudhury, A framework to model branch prediction for worst case execution time analysis, in: Proceedings of the 2nd
International Workshop on Worst-Case Execution Time Analysis, 2002.
6. T. Sherwood, B. Calder, Automated design of finite state machine predictors for customized processors, in: Proceedings of the 28th Annual
International Symposium on Computer Architecture, 2001, pp. 86–97.
7. A. Gerstlauer, S. Chakravarty, M. Kathuria, P. Razaghi, Abstract system-level models for early performance and power exploration, in:
Proceedings of the 17th Asia and South Pacific Design Automation Conference (ASP-DAC), 2012, pp. 213–218.
8. C. Trabelsi, R. Ben Atitallah, S. Meftali, J.-L. Dekeyser, A. Jemai, A model-driven approach for hybrid power estimation in embedded
systems design, EURASIP Journal on Embedded Systems (1) (2011) 569031.
9. A. Pedram, R. van de Geijn, A. Gerstlauer, Codesign tradeoffs for high-performance, low-power linear algebra architectures, IEEE
Transactions on Computers 61 (12) (2012) 1724–1736.
10. H. Lekatsas, J. Henkel, W. Wolf, Code compression for low power embedded system design, in: Proceedings of the 37th Annual Design
Automation Conference, DAC ’00, ACM, 2000, pp. 294–299.
