18 research outputs found

    FPGA configuration of an alloyed correlated branch predictor used with RISC processor for educational purposes

    Get PDF
    Instructions pipelining is one of the most outstanding techniques used in improving processor speed; nonetheless, these pipelined stages are constantly facing stalls that caused by nested conditional branches. During the execution of nested conditional branches, the behavior of the running branch depends on the history information of the previous ones; therefore, these branches have the greatest effect in reducing the prediction accuracy of a branch predictor among conditional branches. The purpose of this research is to reduce the stall cycles caused by correlated branches misprediction by introducing a hardware model of a branch predictor that combines both local and global prediction techniques. This predictor integrates the prediction characteristics of the alloyed predictor with those of the correlated predictor. the predictor design which implemented in VHDL (Very high-speed IC hardware description language) was inserted in previously designed MIPS (microprocessor without interlocked pipelined stages) processor and its prediction accuracy was confirmed by executing a program using the selection sort algorithm to sort 100 input numbers of different combinations ascendingly

    Architectural support for probabilistic branches

    Get PDF
    A plethora of research efforts have focused on fine-tuning branch predictors to increasingly higher levels of accuracy. However, several important optimization, financial, and statistical data analysis algorithms rely on probabilistic computation. These applications draw random values from a distribution and steer control flow based on those values. Such probabilistic branches are challenging to predict because of their inherent probabilistic nature. As a result, probabilistic codes significantly suffer from branch mispredictions. This paper proposes Probabilistic Branch Support (PBS), a hardware/software cooperative technique that leverages the observation that the outcome of probabilistic branches needs to be correct only in a statistical sense. PBS stores the outcome and the probabilistic values that lead to the outcome of the current execution to direct the next execution of the probabilistic branch, thereby completely removing the penalty for mispredicted probabilistic branches. PBS relies on marking probabilistic branches in software for hardware to exploit. Our evaluation shows that PBS improves MPKI by 45% on average (and up to 99%) and IPC by 6.7% (up to 17%) over the TAGE-SC-L predictor. PBS requires 193 bytes of hardware overhead and introduces statistically negligible algorithmic inaccuracy

    Methods and systems for ordering instructions using future values

    Get PDF
    A method of ordering instructions. The method can include placing a first instruction that consumes a value of an object before a second instruction that produces the value of the object such that the first instruction is processed before the second instruction and a physical location is allocated to the value of the object upon processing the first instruction.https://digitalcommons.mtu.edu/patents/1022/thumbnail.jp

    An evaluation of multiple branch predictor and trace cache advanced fetch unit designs for dynamically scheduled superscalar processors

    Get PDF
    Semiconductor feature size continues to decrease permitting superscalar microprocessors to continue to increase the number of functional units available for execution. As the instruction issue width increases beyond the five instruction average basic block size of integer programs, more than one basic block must be issued per cycle to continue to increase instructions per cycle (IPC) performance. Researchers have created methods of fetching instructions beyond the first taken branch to overcome the bottleneck created by the limitations of conventional single branch predictors. We compare the performance of the multiple branch prediction (MBP) and trace cache (TC) fetch unit optimization methods. Multiple branch predictor fetch unit designs issue multiple basic blocks per cycle using a branch address cache and a multiple branch predictor. A trace cache uses the runtime instruction stream to create fixed length instruction traces that encapsulate multiple basic blocks. The comparison is performed by using a SPARC v8 timing based simulator. We simulate both advanced fetch methods and execute benchmarks from the SPEC CPU2000 suite. The results of the simulations are compared and a detailed analysis of both microarchitectures is performed. We find that both fetch unit designs provide a competitive IPC performance. As issue width is increased from an eight to sixteen way superscalar, the IPC performance improves implying that these fetch unit designs are able to take advantage of the wider issue widths. The MBP can use a smaller L1 instruction cache size than the TC and yet achieve a similar IPC performance. Pre-arranged instructions provided by the TC allow the pipeline stages to be shortened in comparison to the MBP. The shorter pipeline significantly improves the IPC performance. Prior trace cache research used two or more ports to the instruction cache to improve the chances of fetching a full basic block per cycle. This was at the expense of instruction cache line realignment complexity. Our results show good performance with a single instruction cache port. We study an approximately equal cost implementation for the MBP and TC. Of the six benchmarks studied, the TC outperforms the MBP over four of the benchmarks

    Power considerations for memory-related microarchitecture designs

    Get PDF
    The fast performance improvement of computer systems in the last decade comes with the consistent increase on power consumption. In recent years, power dissipation is becoming a design constraint even for high-performance systems. Higher power dissipation means higher packaging and cooling cost, and lower reliability. This Ph.D. dissertation will investigate several memory-related design and optimization issues of general-purpose computer microarchitectures, aiming at reducing the power consumption without sacrificing the performance. The memory system consumes a large percentage of the system\u27s power. In addition, its behavior affects the processor power consumption significantly. In this dissertation, we propose two schemes to address the power-aware architecture issues related to memory: (1) We develop and evaluate low-power techniques for high-associativity caches. By dynamically applying different access modes for cache hits and misses, our proposed cache structure can achieve nearly lowest power consumption with minimal performance penalty. (2) We propose and evaluate look-ahead architectural adaptation techniques to reduce power consumption in processor pipelines based on the memory access information. The scheme can significantly reduce the power consumption of memory-intensive applications. Combined with other adaptation techniques, our schemes can effectively reduce the power consumption for both computer- and memory-intensive applications. The significance, potential impacts, and contributions of this dissertation are: (1) Academia and industry R & D has solely targeted the objective of high performance in both hardware and software designs since the beginning stage of building computer systems. However, the pursuit of high performance without considering energy consumption will inevitably lead to increased power dissipation and thus will eventually limit the development and progress of increasingly demanded mobile, portable, and high-performance computing systems. (2) Since our proposed method adaptively combines the merits of existing low-power cache designs, it approaches the optimum in terms of both retaining performance and saving energy. This low power solution for highly associative caches can be easily deployed with a low cost. (3) Using a cache miss , a common program execution event, as a triggering signal to slow down the processor issue rate, our scheme can effectively reduce processor power consumption. This design can be easily and practically deployed in many processor architectures with a low cost

    Design of trace caches for high bandwidth instruction fetching

    Get PDF
    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (leaves 60-63).by Michael Sung.M.Eng

    Object Histories in Java

    Get PDF
    Developers are often faced with the task of implementing new features or diagnosing problems in large software systems. Convoluted control and data flows in large object-oriented software systems, however, make even simple tasks extremely difficult, time-consuming, and frustrating. Specifically, Java programs manipulate objects by adding and removing them from collections and by putting and getting them from other objects' fields. Complex object histories hinder program understanding by forcing software maintainers to track the provenance of objects through their past histories when diagnosing software faults. In this thesis, we present a novel approach which answers queries about the evolution of objects throughout their lifetime in a program. On-demand answers to object history queries aids the maintenance of large software systems by allowing developers to pinpoint relevant details quickly. We describe an event-based, flow-insensitive, interprocedural program analysis technique for computing object histories and answering history queries. Our analysis technique identifies all relevant events affecting an object and uses pointer analysis to filter out irrelevant events. It uses prior knowledge of the meanings of methods in the Java collection classes to improve the quality of the histories. We present the details of our technique and experimental results that highlight the utility of object histories in common programming tasks
    corecore