Search CORE

5 research outputs found

Predicting multiple streams per cycle

Author: Ramírez Bellido Alejandro
Santana Jaria Oliverio J.
Valero Cortés Mateo
Publication venue
Publication date: 01/01/2005
Field of study

The next stream predictor is an accurate branch predictor that provides stream level sequencing. Every stream prediction contains a full stream of instructions, that is, a sequence of instructions from the target of a taken branch to the next taken branch, potentially containing multiple basic blocks. The long size of instruction streams makes it possible for the stream predictor to provide high fetch bandwidth and to tolerate the prediction table access latency. Therefore, an excellent way for improving the behavior of the next stream predictor is to enlarge instruction streams. In this paper, we provide a comprehensive analysis of dynamic instruction streams, showing that focusing on particular kinds of stream is not a good strategy due to Amdahl's law. Consequently, we propose the multiple stream predictor, a novel mechanism that deals with all kinds of streams by combining single streams into long virtual streams. We show that our multiple stream predictor is able to tolerate the prediction table access latency without requiring the complexity caused by additional hardware mechanisms like prediction overriding, also reducing the overall branch predictor energy consumption.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Enlarging instruction streams

Author: Ramírez Bellido Alejandro
Santana Jaria Oliverio J.
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

The stream fetch engine is a high-performance fetch architecture based on the concept of an instruction stream. We call a sequence of instructions from the target of a taken branch to the next taken branch, potentially containing multiple basic blocks, a stream. The long length of instruction streams makes it possible for the stream fetch engine to provide a high fetch bandwidth and to hide the branch predictor access latency, leading to performance results close to a trace cache at a lower implementation cost and complexity. Therefore, enlarging instruction streams is an excellent way to improve the stream fetch engine. In this paper, we present several hardware and software mechanisms focused on enlarging those streams that finalize at particular branch types. However, our results point out that focusing on particular branch types is not a good strategy due to Amdahl's law. Consequently, we propose the multiple-stream predictor, a novel mechanism that deals with all branch types by combining single streams into long virtual streams. This proposal tolerates the prediction table access latency without requiring the complexity caused by additional hardware mechanisms like prediction overriding. Moreover, it provides high-performance results which are comparable to state-of-the-art fetch architectures but with a simpler design that consumes less energy.Peer ReviewedPostprint (published version

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Techniques for enlarging instruction streams

Author: Ramírez Bellido Alejandro
Santana Jaria Oliverio J.
Valero Cortés Mateo
Publication venue
Publication date: 01/01/2005
Field of study

This work presents several techniques for enlarging instruction streams. We call stream to a sequence of instructions from the target of a taken branch to the next taken branch, potentially containing multiple basic blocks. The long size of instruction streams makes it possible for a fetch engine based on streams to provide high fetch bandwidth, which leads to obtaining performance results comparable to a trace cache. The long size of streams also enables the next stream predictor to tolerate the prediction table access latency. Therefore, enlarging instruction streams will improve the behavior of a fetch engine based on streams. We provide a comprehensive analysis of dynamic instruction streams, showing that focusing on particular kinds of stream is not a good strategy due to Amdahl's law. Consequently, we propose the multiple stream predictor, a novel mechanism that deals with all kinds of streams by combining single streams into long virtual streams. We show that our multiple stream predictor is able to tolerate the prediction access latency without requiring the complexity caused by additional hardware mechanisms like prediction overriding.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A Low-Complexity Fetch Architecture for High-Performance Superscalar Processors

Author: Larriba-Pey J.L. (Josep L.)
Ramirez A. (Alex)
Santana O.J. (Oliverio J.)
Valero M. (Mateo)
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

Fetch engine performance is a key topic in superscalar processors, since it limits the instructionlevel parallelism that can be exploited by the execution core. In the search of high performance, the fetch engine has evolved toward more efficient designs, but its complexity has also increased. In this paper, we present the stream fetch engine, a novel architecture based on the execution of long streams of sequential instructions, taking maximum advantage of code layout optimizations. We describe our design in detail, showing that it achieves high fetch performance, while requiring less complexity than other state-of-the-art fetch architectures

Carleton University's Institutional Repository

A low-complexity fetch architecture for high-performance superscalar processors

Author: Agarwal V.
Alex Ramirez
Black B.
Calder B.
Conte T.
Driesen K.
Friendly D. H.
Hrishikesh M. S.
Jacobson Q.
Jimenez D. A.
Jimenez D. A.
Josep L. Larriba-Pey
Kaeli D.
Mateo Valero
Oliverio J. Santana
Patel S. J.
Peleg A.
Ramirez A.
Ramirez A.
Ramirez A.
Ramirez A.
Reinman G.
Rosner R.
Rotenberg E.
Santana O. J.
Seznec A.
Sherwood T.
Shivakumar P.
Yeh T.-Y.
Yeh T.-Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref