2-Bit Branch Predictor Modeling Using Markov Model  by unknown
 Procedia Computer Science  62 ( 2015 )  467 – 469 
Available online at www.sciencedirect.com
1877-0509 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license 
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of The 2015 International Conference on Soft Computing and Software Engineering 
(SCSE 2015)
doi: 10.1016/j.procs.2015.08.517 
The 2015 International Conference on Soft Computing and Software Engineering (SCSE 2015)
Abstract
Power consumption is a critical concern for embedded devices, so every processing cycle should be used as
efficiently as possible. Under speculative execution, frequently mispredicted branches waste both time and
power. In this paper, we show that, for a specific branch, the misprediction rate of a 2-bit branch predictor can be
calculated precisely using a Markov model. Further, this calculation can be done offline, which saves additional
power. The result can then inform a decision to replace the branch with predicated instructions instead of relying
on the predictor.
Keywords: Branch Predictor; Modeling; Embedded Systems;
1. Introduction
Low-power processors such as the ones used in embedded devices are becoming ubiquitous. These limited hardware
platforms have to be carefully considered in the software design phase. Relying on speculative execution may
not always be the optimal course of action in terms of time cost, which translates into more power consumption:
on a misprediction, the processor loads the wrong next instruction and then flushes it to load another one, thereby
wasting valuable processing cycles. If those power-greedy branches are detected in advance, the processor
can stall instruction fetch until the branch target is known, thus saving a significant amount of power. Predicated
instructions were introduced to overcome misprediction by converting control dependence into data dependence via
guarded execution. However, predication comes at the extra cost of executing 'nullified' instructions, potentially
degrading performance and costing more processing cycles. Moreover, branches interact by allowing different
execution schedules, and finding the optimal schedule is generally a hard combinatorial search problem.
Therefore, these types of instructions should be used with care.
In this paper we introduce a Markov model for the 2-bit branch predictor, commonly used in embedded processors
such as ARMv6k, that can estimate the misprediction rate offline. This information could then be used at
compile time to convert some branches into predicated instructions, which are otherwise costly, improving overall
performance and time cost and hence reducing power consumption.
2. Markov Chain of 2-bit Saturating Counter Branch Predictor
The 2-bit saturating counter is a finite state machine (FSM) that is widely used as a branch predictor (Figure 1).
We model this FSM as a Markov chain in which the probability of success p is the probability that the branch
is taken and the probability of failure q (= 1 − p) is the probability that it is not taken. Writing [x y z w] for the
stationary probabilities of the states S_st (strongly taken), S_wt (weakly taken), S_wnt (weakly not taken) and
S_snt (strongly not taken), the model can be expressed with the following equations.
[Figure 1: state diagram with the four states Strongly Taken, Weakly Taken, Weakly Not Taken and Strongly Not Taken, connected by edges labelled Taken and Not Taken.]
Fig. 1. 2-bit Saturating Counter Branch Predictor
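\[
\begin{bmatrix} x & y & z & w \end{bmatrix}
\times
\begin{pmatrix}
p & q & 0 & 0 \\
p & 0 & q & 0 \\
0 & p & 0 & q \\
0 & 0 & p & q
\end{pmatrix}
=
\begin{bmatrix} x & y & z & w \end{bmatrix},
\qquad x + y + z + w = 1,
\]
where the rows and columns of the transition matrix are ordered S_st, S_wt, S_wnt, S_snt. Solving gives
\[
x = \frac{p^3}{p^3 + p^2 q + p q^2 + q^3}, \qquad
y = \frac{p^2 q}{p^3 + p^2 q + p q^2 + q^3},
\]
\[
z = \frac{p q^2}{p^3 + p^2 q + p q^2 + q^3}, \qquad
w = \frac{q^3}{p^3 + p^2 q + p q^2 + q^3},
\]
\[
P(\text{correct prediction}) = P(\text{success}) = p(x + y) + q(z + w) = \frac{p^3 + q^3}{p^3 + p^2 q + p q^2 + q^3},
\]
where the last equality uses p + q = 1.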
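As a sanity check on the closed-form solution, the following minimal Python sketch compares it with the stationary vector obtained by repeatedly multiplying a starting distribution by the transition matrix. The function names, the uniform starting vector and the iteration count are illustrative assumptions, not part of the original work.

```python
def stationary_closed_form(p):
    """Closed-form stationary probabilities (x, y, z, w) for taken-probability p."""
    q = 1.0 - p
    d = p**3 + p**2 * q + p * q**2 + q**3
    return (p**3 / d, p**2 * q / d, p * q**2 / d, q**3 / d)


def transition_matrix(p):
    """Row-stochastic matrix over the states (S_st, S_wt, S_wnt, S_snt)."""
    q = 1.0 - p
    return [
        [p, q, 0.0, 0.0],
        [p, 0.0, q, 0.0],
        [0.0, p, 0.0, q],
        [0.0, 0.0, p, q],
    ]


def stationary_by_iteration(p, steps=10_000):
    """Approximate the stationary row vector by iterating pi <- pi * P."""
    matrix = transition_matrix(p)
    pi = [0.25, 0.25, 0.25, 0.25]  # assumed starting distribution
    for _ in range(steps):
        pi = [sum(pi[i] * matrix[i][j] for i in range(4)) for j in range(4)]
    return pi


def p_correct(p):
    """P(success) = p(x + y) + q(z + w): predict taken in S_st and S_wt."""
    x, y, z, w = stationary_closed_form(p)
    return p * (x + y) + (1.0 - p) * (z + w)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.9):
        print(p, stationary_closed_form(p), stationary_by_iteration(p), p_correct(p))
```

For 0 < p < 1 the two vectors should agree to within floating-point error, since the chain is irreducible and has self-loops at both end states.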
3. Model
[Figure 2: plot of the misprediction rate (vertical axis, 0 to 0.5) against p (horizontal axis, 0 to 1).]
Fig. 2. Misprediction rate calculated by the model
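The shape of this curve can be read off the expression for P(success) in Section 2. As a short worked step, using only the assumption p + q = 1 already made there, the denominator factors as (p + q)(p^2 + q^2) and the misprediction rate reduces to a single ratio:

```latex
\[
1 - P(\text{success})
  = \frac{p^2 q + p q^2}{p^3 + p^2 q + p q^2 + q^3}
  = \frac{pq\,(p + q)}{(p + q)(p^2 + q^2)}
  = \frac{pq}{1 - 2pq}.
\]
```

At p = 0.5 the product pq takes its largest value, 0.25, which gives 0.25 / 0.5 = 0.5. At p = 0 or p = 1 the product pq is zero, so the misprediction rate is zero.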
When we apply the model over the full range of branch-taken probability, from zero to one, we get the result
shown in Figure 2. The peak of the graph at p = 0.5 represents maximum uncertainty, where the branch is
mispredicted 50% of the time. This often happens in comparisons of random data, as commonly found in
searching and sorting algorithms. The tails of the graph, in contrast, indicate 100% correct prediction, since
the branch is always taken or always not taken. We also simulated the behaviour of the 2-bit predictor in code
to capture the correlation between the input data, which is the basic determinant of the probability of taking
the branch, and the probability of correct prediction. The algorithm considered for this experiment
is iterative mergesort. We then use the model to calculate the same probability. The comparison between
the mathematically calculated and the simulation-counted probabilities is shown in Table 1. The input
data distribution affects the probability of correct prediction, as shown for uniformly distributed, normally
distributed and sorted numbers. Usually, input data would be uniformly distributed.
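The kind of predictor simulation described above can be sketched in a few lines of Python. This is not the mergesort experiment from the paper: it assumes a synthetic stream of independent branch outcomes with taken-probability p, which is the same independence assumption the Markov model makes. The function names, the initial state, the stream length and the seed are all illustrative assumptions.

```python
import random


def simulate_success_rate(p, n_branches=200_000, seed=0):
    """Run a 2-bit saturating counter over a synthetic branch stream.

    States are 0 (strongly not taken) to 3 (strongly taken); the predictor
    predicts "taken" in states 2 and 3.
    """
    rng = random.Random(seed)
    state = 3  # assumed initial state: strongly taken
    correct = 0
    for _ in range(n_branches):
        taken = rng.random() < p
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Saturating update: move one step toward the observed outcome.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / n_branches


def model_success_rate(p):
    """Closed-form P(success) from the Markov model."""
    q = 1.0 - p
    return (p**3 + q**3) / (p**3 + p**2 * q + p * q**2 + q**3)


if __name__ == "__main__":
    for p in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(f"p={p:.2f} simulated={simulate_success_rate(p):.4f} "
              f"model={model_success_rate(p):.4f}")
```

On such a synthetic stream the simulated and modelled rates should be close. Real programs can produce correlated outcomes, in which case the two need not match as well.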
4. Analysis on Raspberry Pi
As a test case, we simulate the predictor and test it on the Raspberry Pi ARM1176JZF-S (ARMv6k) running at 700 MHz.
On this processor, the branch misprediction penalty is 6 cycles. To show how expensive misprediction is on the ARM
processor, we test a simple piece of code containing a single conditional operation, written once using a conditional
instruction and once using a branch. Both versions of the code are shown in Listings 1 and 2. When these pieces of code
run on different sets of input data (1M integers) that control the probability of the branch being taken (or of the
conditional instruction being executed), we record the results shown in Figure 3. Similar to Figure 2, the runtime peaks
at 50% taken branches due to maximum misprediction (predictor uncertainty). It is worth noting that the best runtime
of the branched code equals that of the conditional-instruction code, and would be even better for a larger if-body.
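To make the cost of misprediction concrete, the sketch below combines the 6-cycle penalty quoted above with the model's misprediction rate to estimate the expected penalty per executed branch. The decision rule `prefer_predication` and its `nullified_cycles` parameter are hypothetical illustrations, not a method from the paper: the caller is assumed to supply the expected extra cycles that predication would cost per execution.

```python
MISPREDICTION_PENALTY_CYCLES = 6  # ARM1176JZF-S penalty quoted in the text


def misprediction_rate(p):
    """1 - P(success) from the Markov model, for taken-probability p."""
    q = 1.0 - p
    return 1.0 - (p**3 + q**3) / (p**3 + p**2 * q + p * q**2 + q**3)


def expected_penalty_cycles(p, penalty=MISPREDICTION_PENALTY_CYCLES):
    """Expected misprediction cycles paid per execution of the branch."""
    return penalty * misprediction_rate(p)


def prefer_predication(p, nullified_cycles):
    """Hypothetical rule: predicate when the expected misprediction penalty
    exceeds the assumed extra cost of nullified instructions."""
    return expected_penalty_cycles(p) > nullified_cycles


if __name__ == "__main__":
    for p in (0.05, 0.25, 0.5):
        print(p, expected_penalty_cycles(p), prefer_predication(p, 1.0))
```

This ignores everything except the misprediction penalty, so it is only a rough first-order estimate of when a branch is worth converting.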
Table 1. Probability of Correct (Successful) Prediction of Mergesort Branches

Data          Quantity          Outer for loop    Left part merge    Right part merge    Data comparison
distribution                    branch            branch             branch              branch
Uniform       p                 99/771 = 0.1284   71/672 = 0.105655  59/601 = 0.09817    285/542 = 0.52583
              Pexper.(success)  0.867704          0.866071           0.881864            0.453875
              Pmath.(success)   0.855814          0.883489           0.892418            0.502662
Normal        p                 99/771 = 0.1284   71/672 = 0.105655  58/601 = 0.096506   285/543 = 0.52486
              Pexper.(success)  0.867704          0.869048           0.890183            0.480663
              Pmath.(success)   0.855814          0.883489           0.894391            0.502466
Sorted        p                 99/771 = 0.1284   0/672 = 0.0        316/672 = 0.470238  356/356 = 1.0
              Pexper.(success)  0.867704          0.997024           0.660714            1.0
              Pmath.(success)   0.855814          1.0                0.503531            1.0
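The Pmath.(success) rows of Table 1 can be recomputed from the p fractions with the closed-form expression from Section 2. The following Python sketch does so. The tuple layout and the names `TABLE_1`, `model_success` and `recompute` are illustrative assumptions; the numbers are copied from the table.

```python
def model_success(p):
    """P(success) = (p^3 + q^3) / (p^3 + p^2 q + p q^2 + q^3), with q = 1 - p."""
    q = 1.0 - p
    return (p**3 + q**3) / (p**3 + p**2 * q + p * q**2 + q**3)


# (distribution, branch, numerator of p, denominator of p, Pmath.(success) as printed)
TABLE_1 = [
    ("Uniform", "outer for loop", 99, 771, 0.855814),
    ("Uniform", "left part merge", 71, 672, 0.883489),
    ("Uniform", "right part merge", 59, 601, 0.892418),
    ("Uniform", "data comparison", 285, 542, 0.502662),
    ("Normal", "outer for loop", 99, 771, 0.855814),
    ("Normal", "left part merge", 71, 672, 0.883489),
    ("Normal", "right part merge", 58, 601, 0.894391),
    ("Normal", "data comparison", 285, 543, 0.502466),
    ("Sorted", "outer for loop", 99, 771, 0.855814),
    ("Sorted", "left part merge", 0, 672, 1.0),
    ("Sorted", "right part merge", 316, 672, 0.503531),
    ("Sorted", "data comparison", 356, 356, 1.0),
]


def recompute(table=TABLE_1):
    """Return (distribution, branch, recomputed value, printed value) tuples."""
    return [
        (dist, branch, model_success(num / den), printed)
        for dist, branch, num, den, printed in table
    ]


if __name__ == "__main__":
    for dist, branch, value, printed in recompute():
        print(f"{dist:8s} {branch:17s} model={value:.6f} table={printed:.6f}")
```

The recomputed values should agree with the printed ones to about four decimal places, with small differences depending on whether p is rounded before it is substituted.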
Listing 1. Conditional instruction code
main:
    ldr r3, address_of_return        @ save lr so main can return after the calls
    str lr, [r3]
    ldr r0, address_of_scan_pattern  @ scanf(scan_pattern, &number_read)
    ldr r1, address_of_number_read
    bl scanf
    ldr r1, address_of_number_read   @ r1 = number_read
    ldr r1, [r1]
    cmp r1, #10
    addge r1, r1, r1                 @ if r1 >= 10, double it (conditional instruction, no branch)
    ldr r0, address_of_result        @ result = r1
    str r1, [r0]
    ldr r0, address_of_message       @ printf(message, result)
    ldr r1, address_of_result
    ldr r1, [r1]
    bl printf
    ldr r3, address_of_return        @ restore lr and return
    ldr lr, [r3]
    bx lr
address_of_scan_pattern: .word scan_pattern
address_of_number_read: .word number_read
address_of_return: .word return
address_of_message: .word message
address_of_result: .word result
Listing 2. Branched code
main:
    ldr r3, address_of_return        @ save lr so main can return after the calls
    str lr, [r3]
    ldr r0, address_of_scan_pattern  @ scanf(scan_pattern, &number_read)
    ldr r1, address_of_number_read
    bl scanf
    ldr r1, address_of_number_read   @ r1 = number_read
    ldr r1, [r1]
    cmp r1, #10
    blt br1                          @ if r1 < 10, branch over the add
    add r1, r1, r1                   @ otherwise double r1
br1:
    ldr r0, address_of_result        @ result = r1
    str r1, [r0]
    ldr r0, address_of_message       @ printf(message, result)
    ldr r1, address_of_result
    ldr r1, [r1]
    bl printf
    ldr r3, address_of_return        @ restore lr and return
    ldr lr, [r3]
    bx lr
address_of_scan_pattern: .word scan_pattern
address_of_number_read: .word number_read
address_of_return: .word return
address_of_message: .word message
address_of_result: .word result
[Figure 3: plot of Time (s) (vertical axis, 0.9 to 1.45) against Taken Branch (%) (horizontal axis, 0 to 100), with two series: Branches and Conditional Inst.]
Fig. 3. ARM Performance Measurements
5. Related Work
The complexity of modeling dynamic branch prediction schemes when computing the worst-case execution
times (WCETs) of critical tasks is studied in [1, 2]. Finite state machines are used to model branch behaviour
at runtime in [3].
6. Conclusion and Future Work
In this paper, we revisit the problem of optimizing time cost in power-limited processors, where work is wasted
whenever the branch predictor mispredicts. We presented a Markov model of the 2-bit branch predictor used in
ARMv6 processors. This model enables offline analysis to detect highly mispredicted branches. Such branches
are better converted into predicated execution using conditional instructions, which already exist in the ARM ISA.
The model is tested on a mergesort program and on another simple piece of code. In future work, we plan to extend
the model to more complicated architectures and verify it against larger programs.
References
1. Burguiere, C., Rochange, C. On the complexity of modeling dynamic branch predictors when computing worst-case execution time. In:
Proceedings of the ERCIM/DECOS Workshop on Dependable Embedded Systems. 2007.
2. Mitra, T., Roychoudhury, A. A framework to model branch prediction for worst case execution time analysis. In: Proc. 2nd Intl. Workshop
on Worst-Case Execution Time Analysis. 2002.
3. Sherwood, T., Calder, B. Automated design of finite state machine predictors for customized processors. In: Computer Architecture, 2001.
Proceedings. 28th Annual International Symposium on. IEEE; 2001, p. 86–97.
