47 research outputs found

    Hardware schemes for early register release

    Get PDF
    Register files are becoming one of the critical components of current out-of-order processors in terms of delay and power consumption, since their potential to exploit instruction-level parallelism is quite related to the size and number of ports of the register file. In conventional register renaming schemes, register releasing is conservatively done only after the instruction that redefines the same register is committed. Instead, we propose a scheme that releases registers as soon as the processor knows that there will be no further use of them. We present two early releasing hardware implementations with different performance/complexity trade-offs. Detailed cycle-level simulations show either a significant speedup for a given register file size, or a reduction in register file size for a given performance level.Peer ReviewedPostprint (published version

    Late allocation and early release of physical registers

    Get PDF
    The register file is one of the critical components of current processors in terms of access time and power consumption. Among other things, the potential to exploit instruction-level parallelism is closely related to the size and number of ports of the register file. In conventional register renaming schemes, both register allocation and releasing are conservatively done, the former at the rename stage, before registers are loaded with values, and the latter at the commit stage of the instruction redefining the same register, once registers are not used any more. We introduce VP-LAER, a renaming scheme that allocates registers later and releases them earlier than conventional schemes. Specifically, physical registers are allocated at the end of the execution stage and released as soon as the processor realizes that there will be no further use of them. VP-LAER enhances register utilization, that is, the fraction of allocated registers having a value to be read in the future. Detailed cycle-level simulations show either a significant speedup for a given register file size or a reduction in the register file size for a given performance level, especially for floating-point codes, where the register file pressure is usually high.Peer ReviewedPostprint (published version

    DESIGN SPACE EXPLORATION AND OPTIMIZATION OF SUPER SCALAR PROCESSOR

    Get PDF
    Designing a microprocessor involves determining the optimal microarchitecture for a given objective function and a given set of constraints. Superscalar processing is the latest in along series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors[1] are capable of executing more than one instruction in a clock cycle.The architectural design of super scalar processor involves a lot of trade off issues when selecting parameter values for instruction level parallelism.The use of critical quantitative analysis based upon the Simple Scalar simulations is necessary to select optimal parameter values for the processor aimed at specific target environment. This paper aims at finding optimal values for the super scalar processor and determines which processor parameters have the greatest impact on the simulated execution time

    Dynamic Instruction Scheduling For Microprocessors Having Out Of Order Execution

    Get PDF
    Dynamic Instruction Scheduling is very much needed for fast working of multiprocessor and reduction of overhead by the processor. The Instruction scheduling logic mainly depends on associative searching of the entries to the dynamic wakeup instructions for the execution. We also describes the scheduler concept which also the concern for scalability and complexity of the multiprocessor. We have different Dynamic Instruction Scheduling Logic highlighting the objectives, goals, advantages and challenges facing during scheduling logic like energy issues and complexity issues as well as full description of dynamic instruction Scheduling logic. In this paper, we will be presented in a comprehensive analysis to reschedule the execution order of instructions for improve the performance of microprocessor. General Terms: Design, Performance, Measurement Keywords: Dynamic Instruction Scheduling, Instruction Grouping, Issue Queue

    StressTest: an automatic approach to test generation via activity monitors

    Get PDF

    A Comparison of x86 Computer Architecture Simulators

    Get PDF
    The significance of computer architecture simulators in advancing computer architecture research is widely acknowledged. Computer architects have developed numerous simulators in the past few decades and their number continues to rise. This paper explores different simulation techniques and surveys many simulators. Comparing simulators with each other and validating their correctness has been a challenging task. In this paper, we compare and contrast x86 simulators in terms of flexibility, level of details, user friendliness and simulation models. In addition, we measure the experimental error and compare the speed of four contemporary x86 simulators: gem5, Sniper, Multi2sim and PTLsim. We also discuss the strengths and limitations of the different simulators. We believe that this paper provides insights into different simulation strategies and aims to help computer architects understand the differences among the existing simulation tools

    Power considerations for memory-related microarchitecture designs

    Get PDF
    The fast performance improvement of computer systems in the last decade comes with the consistent increase on power consumption. In recent years, power dissipation is becoming a design constraint even for high-performance systems. Higher power dissipation means higher packaging and cooling cost, and lower reliability. This Ph.D. dissertation will investigate several memory-related design and optimization issues of general-purpose computer microarchitectures, aiming at reducing the power consumption without sacrificing the performance. The memory system consumes a large percentage of the system\u27s power. In addition, its behavior affects the processor power consumption significantly. In this dissertation, we propose two schemes to address the power-aware architecture issues related to memory: (1) We develop and evaluate low-power techniques for high-associativity caches. By dynamically applying different access modes for cache hits and misses, our proposed cache structure can achieve nearly lowest power consumption with minimal performance penalty. (2) We propose and evaluate look-ahead architectural adaptation techniques to reduce power consumption in processor pipelines based on the memory access information. The scheme can significantly reduce the power consumption of memory-intensive applications. Combined with other adaptation techniques, our schemes can effectively reduce the power consumption for both computer- and memory-intensive applications. The significance, potential impacts, and contributions of this dissertation are: (1) Academia and industry R & D has solely targeted the objective of high performance in both hardware and software designs since the beginning stage of building computer systems. However, the pursuit of high performance without considering energy consumption will inevitably lead to increased power dissipation and thus will eventually limit the development and progress of increasingly demanded mobile, portable, and high-performance computing systems. (2) Since our proposed method adaptively combines the merits of existing low-power cache designs, it approaches the optimum in terms of both retaining performance and saving energy. This low power solution for highly associative caches can be easily deployed with a low cost. (3) Using a cache miss , a common program execution event, as a triggering signal to slow down the processor issue rate, our scheme can effectively reduce processor power consumption. This design can be easily and practically deployed in many processor architectures with a low cost
    corecore