Branch prediction is a key task in the operation of a high-performance processor. An
inaccurate branch predictor results in increased program run-time and a rise in energy
consumption. The drive towards processors with limited die-space and tighter energy
requirements will continue to intensify over the coming years, as will the shift towards
multicore processors. Both trends make it more important, and more difficult,
to find effective and efficient branch predictor designs.
This thesis presents novel cooperative branch predictor designs that achieve savings
in energy and die-space.
The first contribution is a new take on how a hybrid dynamic-static branch predictor
allocates each branch to one of its sub-predictors. A new bias
parameter is introduced as a mechanism for trading off a small amount of performance
for savings in die-space and energy. This is achieved by predicting more branches
with the static predictor, ensuring that only the branches that will most benefit from
the dynamic predictor’s resources are predicted dynamically. This reduces pressure on
the dynamic predictor’s resources, allowing a smaller predictor to achieve very high
accuracy. An improvement in run-time of 7-8% over the baseline BTFN (backward taken,
forward not taken) predictor is observed with a branch predictor bit budget of much
less than 1KB.
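As an illustration only, the role of the bias parameter in allocating branches can be sketched as follows. This is a minimal sketch and not the thesis's actual allocation mechanism: the `BranchProfile` record, its per-branch accuracy estimates, and the threshold rule in `allocate` are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class BranchProfile:
    """Hypothetical per-branch profile; both accuracy fields are assumed inputs."""
    pc: int
    static_acc: float   # assumed: estimated accuracy of the static prediction
    dynamic_acc: float  # assumed: estimated accuracy of the dynamic predictor


def allocate(profiles, bias):
    """Return the set of branch PCs to be predicted dynamically.

    A branch is given to the dynamic predictor only if its estimated
    accuracy gain over the static predictor exceeds `bias`; every other
    branch is left to the static predictor.
    """
    return {p.pc for p in profiles if p.dynamic_acc - p.static_acc > bias}


profiles = [
    BranchProfile(0x400, static_acc=0.99, dynamic_acc=1.00),
    BranchProfile(0x410, static_acc=0.60, dynamic_acc=0.95),
    BranchProfile(0x420, static_acc=0.90, dynamic_acc=0.93),
]

no_bias = allocate(profiles, bias=0.0)     # every branch that gains at all
with_bias = allocate(profiles, bias=0.05)  # only the branch with a large gain
```

Under these assumed numbers, raising the bias moves the two branches with small gains to the static predictor, so fewer branches compete for entries in the dynamic predictor.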
Next, a novel approach to branch prediction for multicore data-parallel applications
is presented. The Peloton branch prediction scheme uses a pack of cyclists as an
illustration of how a group of processors running similar tasks can share branch predictions
to improve accuracy and reduce run-time. The results show that sharing updates
for conditional branches across the existing interconnect used for I-cache and D-cache
updates reduces mispredictions by up to 25% and run-time by up to 6%. McPAT is used
to present an energy model that suggests the savings are achieved with little to no
increase in energy consumption. The technique is then extended to
architectures where the size of the branch predictors may differ between cores. The
results show that such heterogeneity can dramatically reduce the die-space required
for an accurate branch predictor while having little impact on performance and yielding
energy savings of up to 9%. The approach can be combined with the Peloton branch
prediction scheme for a reduction in branch mispredictions of up to 5%.
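The idea of cores sharing updates for conditional branches, as in the Peloton scheme summarised above, can be illustrated with a toy model. This is a sketch under stated assumptions, not the thesis's design: the `Core` class, its table of 2-bit saturating counters, the modulo indexing, and the `resolve` helper are all hypothetical, and the interconnect is modelled as a simple loop over cores.

```python
class Core:
    """Hypothetical core with a small table of 2-bit saturating counters."""

    def __init__(self, entries=16):
        # Counters start at 1 (weakly not-taken); the table size is per core.
        self.table = [1] * entries

    def _index(self, pc):
        return pc % len(self.table)

    def predict(self, pc):
        # Predict taken when the counter is in one of the two upper states.
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def resolve(cores, core_id, pc, taken, share=True):
    """Core `core_id` resolves a branch.

    With share=True the outcome is applied to every core's table (standing
    in for an update sent over the interconnect); with share=False only the
    resolving core is updated.
    """
    targets = cores if share else [cores[core_id]]
    for core in targets:
        core.update(pc, taken)


shared = [Core(), Core()]
resolve(shared, core_id=0, pc=0x40, taken=True)

private = [Core(), Core()]
resolve(private, core_id=0, pc=0x40, taken=True, share=False)
```

In this toy model, the second core in `shared` predicts the branch as taken before it has executed it, whereas the second core in `private` still holds its initial state. Because each core indexes its table by its own size, the sketch also runs when cores are constructed with different `entries` values.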