
    Reducing Register Ports Using Delayed Write-Back Queues And Operand Pre-Fetch

    In high-performance wide-issue microprocessors, the access time, energy, and area of the register file are often critical to overall performance, because these parameters grow superlinearly as read and write ports are added to support wide issue. This paper presents techniques that reduce the number of ports of a register file intended for a wide-issue microprocessor without noticeably impacting its IPC. Our results show that the 16-read/8-write-port register file of an eight-issue processor can be replaced with an 8-read/8-write-port file with an insignificant impact on IPC. This is accomplished by adding some small auxiliary memory structures. Furthermore, the access time of the smaller file plus the auxiliary structures is such that, if it were the critical path, a 45-50% increase in clock speed would be possible. Finally, there is an energy-per-access saving of about 20% and an area saving of 40%, and the smaller area has the potential for further savings by shortening global interconnect in the layout. An extension to the scheme that reduces the number of write ports from 8 to 6 is also presented. It suffers a modest IPC penalty but further reduces energy and area. Depending on implementation characteristics, it could yield a further increase in performance.
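
    The abstract does not describe the auxiliary structures in detail, so the following is only a minimal Python sketch of the general idea named in the title, not the paper's design. Results that cannot be written in the current cycle, because the register file has fewer write ports than results produced, wait in a small FIFO (a "delayed write-back queue"), and reads search that queue before the register file so they still observe the newest value. The class name `DelayedWriteBackRF`, the FIFO drain policy, and the stall-on-full behavior are all hypothetical assumptions; operand pre-fetch is not modeled.

```python
from collections import deque


class DelayedWriteBackRF:
    """Hypothetical model: a register file with a limited number of write
    ports, fronted by a FIFO write-back queue. This is an illustrative
    assumption, not the design from the paper."""

    def __init__(self, num_regs, write_ports, queue_capacity):
        self.regs = [0] * num_regs          # architectural register storage
        self.write_ports = write_ports      # writes the file accepts per cycle
        self.queue_capacity = queue_capacity
        self.queue = deque()                # pending (reg, value) writes, oldest first

    def can_accept(self, n):
        """True if n more results fit in the queue without a stall."""
        return len(self.queue) + n <= self.queue_capacity

    def produce(self, results):
        """Enqueue this cycle's (reg, value) results in program order."""
        if not self.can_accept(len(results)):
            # Assumed policy: the pipeline must stall when the queue is full.
            raise RuntimeError("write-back queue full: pipeline must stall")
        self.queue.extend(results)

    def read(self, reg):
        """Return the newest value of reg, forwarding from the queue if pending."""
        # Search newest to oldest so the youngest pending write wins.
        for r, v in reversed(self.queue):
            if r == reg:
                return v
        return self.regs[reg]

    def cycle(self):
        """Drain up to write_ports oldest entries into the register file."""
        drained = 0
        while self.queue and drained < self.write_ports:
            r, v = self.queue.popleft()
            self.regs[r] = v
            drained += 1
        return drained
```

    For example, with 2 write ports, three results produced in one cycle take two cycles to drain, yet a read of the third register returns its new value immediately because it is forwarded from the queue. Draining in FIFO order keeps repeated writes to the same register in program order.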

    Register Multimapping: Reducing Register Bank Conflicts Through One-to-Many Logical-to-Physical Register Mapping

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory.

    Banked microarchitectures for complexity-effective superscalar microprocessors

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006, by Jessica Hui-Chun Tseng. Includes bibliographical references (pp. 95-99).

    High-performance superscalar microarchitectures exploit instruction-level parallelism (ILP) to improve processor performance by executing instructions out of program order and by speculating on branch instructions. Monolithic, centralized structures with global communication, including issue windows and register files, are used to buffer in-flight instructions and to maintain machine state. These structures scale poorly to greater issue widths and deeper pipelines because they must support simultaneous global accesses from all active instructions. The lack of scalability is exacerbated in future technologies, which have increasing global interconnect delay and a much greater emphasis on reducing both switching and leakage power. However, these fully orthogonal structures are over-engineered for typical use. Banked microarchitectures, which consist of multiple interleaved banks of fewer-ported cells, can significantly reduce the power, area, and latency of these structures. Although banked structures incur a minor performance penalty, the significant reductions in delay and power can potentially be used to increase the clock rate and lead to more complexity-effective designs.

    This thesis makes two main contributions. First, it proposes a speculative control scheme that simplifies the complicated control logic involved in managing a less-ported banked register file for high-frequency superscalar processors. Second, it introduces and evaluates the RingScalar architecture, a complexity-effective out-of-order superscalar microarchitecture based on a ring topology of banked structures.
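
    As a rough illustration of the bank conflicts that a banked register file with fewer-ported cells introduces, the Python sketch below assumes registers are interleaved across banks by their low-order bits (`reg % num_banks`) and that each bank has a fixed number of read ports. Reads that exceed a bank's ports in a given cycle are deferred. The function names and the interleaving are hypothetical assumptions, the thesis's speculative control scheme is not modeled, and duplicate reads of the same register are simply counted as separate port uses.

```python
from collections import Counter


def bank_of(reg, num_banks):
    """Assumed low-order interleaving: consecutive registers map to consecutive banks."""
    return reg % num_banks


def schedule_reads(read_regs, num_banks, ports_per_bank):
    """Split one cycle's register reads into (granted, deferred) lists.

    Hypothetical model: each bank serves at most ports_per_bank reads per
    cycle, in request order; any further read to that bank is a bank
    conflict and is deferred to a later cycle.
    """
    used = Counter()            # read ports consumed per bank this cycle
    granted, deferred = [], []
    for reg in read_regs:
        b = bank_of(reg, num_banks)
        if used[b] < ports_per_bank:
            used[b] += 1
            granted.append(reg)
        else:
            deferred.append(reg)  # bank conflict: retry next cycle
    return granted, deferred
```

    With 4 banks of 2 read ports each, `schedule_reads([0, 4, 8, 1, 5], 4, 2)` grants registers 0, 4, 1 and 5 but defers 8, since registers 0, 4 and 8 all map to bank 0.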