68 research outputs found

    EOLE: Toward a Practical Implementation of Value Prediction

    No full text

    Don't Use the Page Number, But a Pointer on It

    No full text
    : Most newly announced microprocessors manipulate 64-bit virtual addresses and the width of physical addresses is also growing. As a result, the relative size of the address tags in the L1 cache is increasing. This is particularly dramatic when small block sizes are used. At the same time, the performance of complex superscalar processors depends more and more on the accuracy of branch prediction, while the size of the Branch Target Buffer is also increasing linearly with the address width. In this paper, we apply the very simple principle enounced in the title for limiting the tag size of on-chip caches, and for limiting the size of the Branch Target Buffer. In an indirect-tagged cache, the anachronic duplication of the page number in processors (in TLB and in cache tags) is removed. The tag check is then simplified and the tag cost does not depend on the address width. Then applying the same principle, we propose the Reduced Branch Target Buffer. The storage size in a Reduced Branch ..

    Trading Conflict and Capacity Aliasing in Conditional Branch Predictors

    No full text
    As modern microprocessors employ deeper pipelines and issue multiple instructions per cycle, they are becoming increasingly dependent on accurate branch prediction. Because hardware resources for branch-predictor tables are invariably limited, it is not possible to hold all relevant branch history for all active branches at the same time, especially for large workloads consisting of multiple processes and operating-system code. The problem that results, commonly referred to as aliasing in the branch-predictor tables, is in many ways similar to the misses that occur in finite-sized hardware caches. In this paper we propose a new classification for branch aliasing based on the three-Cs model for caches, and show that conflict aliasing is a significant source of mispredictions. Unfortunately, the obvious method for removing conflicts -- adding tags and associativity to the predictor tables -- is not a cost-effective solution. To address this problem, we propose the skewed branch predict..

    Exploring Instruction-Fetch Bandwidth Requirement in Wide-Issue Superscalar Processors

    Get PDF
    The effective performance of wide-issue superscalar processors depends on many parameters, such as branch prediction accuracy, available instruction-level parallelism, and instruction-fetch bandwidth. This paper explores the relations between some of these parameters, and more particularly, the requirement in instruction-fetch bandwidth. We introduce new enhancements to boost effectively the instruction-fetch bandwidth of conventional fetch engines. However, experiments strongly show that performance improves less for a given instruction-fetch bandwidth gain as the base fetch bandwidth increases. At the level of bandwidth exhibited by the proposed schemes, the performance improvement is small. This clearly brings to light potential relations between the fetch bandwidth and the other parameters. We provide a model to explain this behavior and quantify some relations. Based on the experimental observation that the available parallelism in an instruction window of size N grows as the sq..
    • …
    corecore