    Multiple-Block Ahead Branch Predictors

    A basic rule in computer architecture is that a processor cannot execute an application faster than it fetches its instructions. To overcome the instruction fetch bottleneck faced by wide-dispatch «brainiac» processors, this paper presents a novel cost-effective mechanism, the multiple-block ahead branch predictor, which efficiently predicts the addresses of multiple basic blocks in a single cycle. Moreover, unlike previous multiple-predictor schemes, the multiple-block ahead branch predictor can use any branch prediction scheme to deliver the very accurate predictions required to achieve high performance on superscalar processors. Finally, we show that our predictor allows the branch prediction process to be pipelined, so that «speed demon» processors can achieve higher clock rates.
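
    As a rough illustration of the idea, the sketch below models a two-block ahead prediction table in C: the address of block i is used to predict the address of block i+2, so that two block addresses are available to the fetch engine each cycle. The table size, hashing, and fallthrough guess are all illustrative assumptions, not details from the paper.

        /* Two-block ahead prediction table: the address of block i is
         * used to look up a predicted address for block i+2. Sizes,
         * hashing and the fallthrough guess are illustrative. */
        #include <stdbool.h>
        #include <stdint.h>

        #define TABLE_BITS 10
        #define TABLE_SIZE (1u << TABLE_BITS)

        typedef struct {
            bool     valid;
            uint32_t tag;      /* partial tag of the predicting block */
            uint32_t target;   /* predicted address two blocks ahead  */
        } entry_t;

        static entry_t table[TABLE_SIZE];

        static uint32_t index_of(uint32_t block_addr) {
            return (block_addr >> 2) & (TABLE_SIZE - 1);
        }

        static uint32_t tag_of(uint32_t block_addr) {
            return block_addr >> (2 + TABLE_BITS);
        }

        /* Predict the address of block i+2 from the address of block i.
         * Block i+1 was predicted one step earlier, so the fetch engine
         * holds two predicted block addresses in a single cycle. */
        uint32_t predict_two_ahead(uint32_t block_i_addr) {
            entry_t *e = &table[index_of(block_i_addr)];
            if (e->valid && e->tag == tag_of(block_i_addr))
                return e->target;
            return block_i_addr + 64;   /* guess: straight-line code */
        }

        /* On branch resolution, record the actual address of block i+2. */
        void update(uint32_t block_i_addr, uint32_t actual_block_i_plus_2) {
            entry_t *e = &table[index_of(block_i_addr)];
            e->valid  = true;
            e->tag    = tag_of(block_i_addr);
            e->target = actual_block_i_plus_2;
        }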

    Don't Use the Page Number, But a Pointer on It

    Most newly announced microprocessors manipulate 64-bit virtual addresses, and the width of physical addresses is also growing. As a result, the relative size of the address tags in the L1 cache is increasing; this is particularly dramatic when small block sizes are used. At the same time, the performance of complex superscalar processors depends more and more on the accuracy of branch prediction, while the size of the Branch Target Buffer also grows linearly with the address width. In this paper, we apply the very simple principle enunciated in the title to limit the tag size of on-chip caches and the size of the Branch Target Buffer. In an indirect-tagged cache, the anachronistic duplication of the page number in the processor (in the TLB and in the cache tags) is removed: the cache tag stores a pointer onto the corresponding TLB entry rather than the page number itself. The tag check is then simplified, and the tag cost no longer depends on the address width. Applying the same principle, we then propose the Reduced Branch Target Buffer. The storage size in a Reduced Branch ..
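
    The cache side of the principle can be sketched as follows: instead of storing the page number in the tag, store the index of the TLB entry that holds it, so the tag check compares a small pointer whose width is independent of the address width. This is a minimal C model under assumed parameters (4 KB pages, a 4 KB direct-mapped cache with 64-byte blocks, a 64-entry TLB); the structure names are illustrative, not from the paper.

        /* Indirect-tagged cache: the tag stores a pointer onto the TLB
         * entry holding the page number, not the page number itself.
         * Page size, cache geometry and TLB size are assumptions. */
        #include <stdbool.h>
        #include <stdint.h>

        #define TLB_ENTRIES 64
        #define CACHE_SETS  64      /* 64 sets x 64-byte blocks = 4 KB */

        typedef struct {
            bool     valid;
            uint64_t vpn;           /* virtual page number  */
            uint64_t ppn;           /* physical page number */
        } tlb_entry_t;

        typedef struct {
            bool    valid;
            uint8_t tlb_ptr;        /* index of the mapping TLB entry */
        } cache_tag_t;

        static tlb_entry_t tlb[TLB_ENTRIES];
        static cache_tag_t tags[CACHE_SETS];

        /* Returns the index of the matching TLB entry, or -1 on a miss. */
        static int tlb_lookup(uint64_t vpn) {
            for (int i = 0; i < TLB_ENTRIES; i++)
                if (tlb[i].valid && tlb[i].vpn == vpn)
                    return i;
            return -1;
        }

        /* The tag check compares a 6-bit TLB-entry index, so the tag cost
         * no longer grows with the physical address width. When a TLB
         * entry is evicted, lines pointing at it must be invalidated. */
        bool cache_hit(uint64_t vaddr) {
            uint64_t vpn = vaddr >> 12;                     /* 4 KB pages  */
            uint32_t set = (vaddr >> 6) & (CACHE_SETS - 1); /* 64 B blocks */
            int idx = tlb_lookup(vpn);
            if (idx < 0)
                return false;       /* TLB miss is handled first */
            return tags[set].valid && tags[set].tlb_ptr == (uint8_t)idx;
        }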

    About Effective Cache Miss Penalty on Out-of-Order Superscalar Processors

    For many years, the performance of microprocessors has depended on the miss ratio of L1 caches: the whole processor would stall on a cache miss, so the contribution of a cache miss to the execution time was exactly the miss penalty. Limiting the miss ratio of L1 caches has therefore been a major issue for the last ten years, and studies showed that, for current cache sizes, 32- or 64-byte cache blocks were a good tradeoff. Today, technology has changed. Most newly announced processors implement a very complex superscalar microarchitecture allowing out-of-order execution. On these processors, instruction execution continues while L1 cache misses are serviced by a pipelined L2 cache. In this paper, we show that, on such superscalar processors, the effective contribution of a cache miss to the execution time is quite distinct from the miss penalty for the missing data or instruction. We also show that the L2 cache busy time becomes a major bottleneck and that decreasing the demanded throughput o..
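
    A back-of-the-envelope model makes the distinction concrete: out-of-order execution hides part of the miss latency, so only the exposed fraction adds to CPI, while each miss still occupies the pipelined L2 for some busy time, which bounds throughput independently of latency. All numbers below are illustrative assumptions, not measurements from the paper.

        /* Effective miss penalty on an out-of-order processor: part of
         * the miss latency is hidden by independent work, but every miss
         * still keeps the pipelined L2 busy for a few cycles. All the
         * numbers are illustrative assumptions. */
        #include <stdio.h>

        int main(void) {
            double miss_latency = 20.0;  /* cycles to service an L1 miss     */
            double overlap      = 14.0;  /* cycles of useful work overlapped */
            double l2_busy      = 4.0;   /* L2 busy cycles per miss          */
            double miss_rate    = 0.05;  /* L1 misses per instruction        */

            /* Only the exposed part of the latency contributes to CPI. */
            double exposed_penalty = miss_latency - overlap;
            double latency_cpi     = miss_rate * exposed_penalty;

            /* L2 occupancy sets a CPI floor regardless of latency. */
            double occupancy_cpi   = miss_rate * l2_busy;

            printf("CPI added by exposed miss latency: %.2f\n", latency_cpi);
            printf("CPI floor from L2 busy time:       %.2f\n", occupancy_cpi);
            return 0;
        }

    With these assumed numbers, the exposed latency adds 0.30 CPI while L2 occupancy alone imposes a 0.20 CPI floor, so the L2 service throughput, not just its latency, shapes performance.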