INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING AND NETWORKING (IJHPCN) 1 A Latency-Conscious SMT Branch Prediction Architecture

Alex Ramirez; Ayose Falcón; Mateo Valero; Oliverio J. Santana

Abstract

Executing multiple threads has proved to be an effective way to partially hide the latencies that appear in a processor. When a thread is stalled because a long-latency operation is being processed, such as a memory access or a floating-point calculation, the processor can switch to another context so that another thread can take advantage of the idle resources. However, fetch stall conditions caused by branch predictor delay are not hidden by current SMT fetch designs, causing a performance drop due to the absence of instructions to execute. In this paper, we propose several solutions to reduce the effect of branch predictor delay on the performance of Simultaneous Multithreading (SMT) processors. First, we analyze the impact of varying the number of access ports. Then, we describe a decoupled implementation of an SMT fetch unit that helps to tolerate the predictor delay. Finally, we present an inter-thread pipelined branch predictor, based on creating a pipeline of interleaved predictions from different threads. Our results show that, combining all the proposed techniques, the performance obtained is similar to that obtained using an ideal, 1-cycle access branch predictor.

Index Terms: SMT, fetch engine, branch predictor delay, decoupled predictor, predictor pipelining.
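The idea of interleaving predictions from different threads in a pipelined predictor can be illustrated with a toy throughput model. This is a minimal sketch and not the simulator or the design evaluated in the paper. It assumes that the predictor accepts one new request per cycle, that every prediction takes a fixed number of cycles, and that a thread cannot start its next prediction until its previous one has completed. The function name `predictions_per_cycle` and its parameters are hypothetical.

```python
def predictions_per_cycle(num_threads, latency, cycles=1000):
    """Toy model of a pipelined predictor shared by several threads.

    Assumptions (not taken from the paper): one request may enter the
    pipeline per cycle, each prediction takes `latency` cycles, and a
    thread must wait for its previous prediction before issuing another.
    Returns the average number of predictions started per cycle.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may issue again
    issued = 0
    next_thread = 0  # round-robin pointer

    for cycle in range(cycles):
        # Look for the first ready thread, starting at the round-robin pointer.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency
                issued += 1
                next_thread = (t + 1) % num_threads
                break  # only one request enters the pipeline per cycle

    return issued / cycles


if __name__ == "__main__":
    for threads in (1, 2, 4):
        rate = predictions_per_cycle(threads, latency=4)
        print(f"{threads} thread(s), 4-cycle predictor: {rate:.2f} predictions/cycle")
```

Under these assumptions, a single thread leaves the pipeline mostly empty, while a number of threads equal to the predictor latency keeps one prediction starting every cycle, which is the intuition behind matching an ideal 1-cycle predictor.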

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.65.72...

Last time updated on 22/10/2014