5 research outputs found

    Inherently workload-balanced clustered microarchitecture

    Get PDF
    The performance of clustered microarchitectures relies on steering schemes that try to find the best trade-off between workload balance and inter-cluster communication penalties. In previously proposed clustered processors, reducing communication penalties and balancing the workload are opposite targets, since improving one usually implies a detriment in the other. In this paper we propose a new clustered microarchitecture that can minimize communication penalties without compromising workload balance. The key idea is to arrange the clusters in a ring topology in such a way that results of one cluster can be forwarded to the neighbor cluster with a very short latency. In this way, minimizing communication penalties is favored when the producer of a value and its consumer are placed in adjacent clusters, which also favors workload balance. The proposed microarchitecture is shown to outperform a state-of-the-art clustered processor. For instance, for an 8-cluster configuration and just one fully pipelined unidirectional bus, 15% speedup is achieved on average for FP programs.Peer ReviewedPostprint (published version

    RingScalar: A Complexity-Effective Out-of-Order Superscalar Microarchitecture

    Get PDF
    RingScalar is a complexity-effective microarchitecture for out-of-order superscalar processors, that reduces the area, latency, and power of all major structures in the instruction flow. The design divides an N-way superscalar into N columns connected in a unidirectional ring, where each column contains a portion of the instruction window, a bank of the register file, and an ALU. The design exploits the fact that most decoded instructions are waiting on just one operand to use only a single tag per issue window entry, and to restrict instruction wakeup and value bypass to only communicate with the neighboring column. Detailed simulations of four-issue single-threaded machines running SPECint2000 show that RingScalar has IPC only 13% lower than an idealized superscalar, while providing large reductions in area, power, and circuit latency

    Inherently Workload-Balanced Clustered Microarchitecture

    No full text

    Banked microarchitectures for complexity-effective superscalar microprocessors

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.Includes bibliographical references (p. 95-99).High performance superscalar microarchitectures exploit instruction-level parallelism (ILP) to improve processor performance by executing instructions out of program order and by speculating on branch instructions. Monolithic centralized structures with global communications, including issue windows and register files, are used to buffer in-flight instructions and to maintain machine state. These structures scale poorly to greater issue widths and deeper pipelines, as they must support simultaneous global accesses from all active instructions. The lack of scalability is exacerbated in future technologies, which have increasing global interconnect delay and a much greater emphasis on reducing both switching and leakage power. However, these fully orthogonal structures are over-engineered for typical use. Banked microarchitectures that consist of multiple interleaved banks of fewer ported cells can significantly reduce power, area, and latency of these structures.(cont.) Although banked structures exhibit a minor performance penalty, significant reductions in delay and power can potentially be used to increase clock rate and lead to more complexity-effective designs. There are two main contributions in this thesis. First, a speculative control scheme is proposed to simplify the complicated control logic that is involved in managing a less-ported banked register file for high-frequency superscalar processors. Second, the RingScalar architecture, a complexity-effective out-of-order superscalar microarchitecture, based on a ring topology of banked structures, is introduced and evaluated.by Jessica Hui-Chun Tseng.Ph.D

    Inherently workload-balanced clustered microarchitecture

    No full text
    The performance of clustered microarchitectures relies on steering schemes that try to find the best trade-off between workload balance and inter-cluster communication penalties. In previously proposed clustered processors, reducing communication penalties and balancing the workload are opposite targets, since improving one usually implies a detriment in the other. In this paper we propose a new clustered microarchitecture that can minimize communication penalties without compromising workload balance. The key idea is to arrange the clusters in a ring topology in such a way that results of one cluster can be forwarded to the neighbor cluster with a very short latency. In this way, minimizing communication penalties is favored when the producer of a value and its consumer are placed in adjacent clusters, which also favors workload balance. The proposed microarchitecture is shown to outperform a state-of-the-art clustered processor. For instance, for an 8-cluster configuration and just one fully pipelined unidirectional bus, 15% speedup is achieved on average for FP programs.Peer Reviewe
    corecore