20 research outputs found

    PEZY-SC3: A MIMD Many-core Processor for Energy-efficient Computing

    PEZY-SC3 is a highly energy- and area-efficient processor for supercomputers developed using TSMC 7nm process technology. It is the third generation of the PEZY-SCx series developed by PEZY Computing, K.K. Supercomputers equipped with the PEZY-SCx series have been deployed at several research centers and are used for large-scale scientific calculations. PEZY-SC3 outperforms previous PEZY-SCx processors and other processors in terms of energy and area efficiency. To achieve high efficiency, PEZY-SC3 employs a MIMD many-core design, fine-grained multithreading, and a non-coherent cache, focusing on applications with high thread-level parallelism. Our MIMD many-core-based architecture achieves high efficiency while providing higher programmability than existing architectures based on wide SIMD or on specialized tensor units with limited functionality. Another key point of this architecture is that it achieves both high efficiency and high throughput without using complex and expensive units such as out-of-order schedulers. Moreover, our novel non-coherent, hierarchical cache system enables high scalability across many cores without compromising programmability. The energy efficiency of a system equipped with PEZY-SC3 is approximately 24.6 GFlops/W, and it ranked 12th in the Green500 (November 2021), which measures the energy efficiency of supercomputers. In terms of processor architecture, all the systems ranked higher than the PEZY-SC3 system are equipped with NVIDIA A100 or Preferred Networks MN-Core, and thus PEZY-SC3 is the third-ranked processor after them. While A100 and MN-Core achieve high energy efficiency with tensor units specialized for specific functions, PEZY-SC3 does not have such specialized tensor units and thus has higher programmability.

    Performance of Dynamic Instruction Window Resizing for a Given Power Budget under DVFS Control

    No full text

    Improvement of Renamed Trace Cache through the Reduction of Dependent Path Length for High Energy Efficiency

    No full text

    FXA: Executing Instructions in Front-End for Energy Efficiency

    No full text

    Address Order Violation Detection with Parallel Counting Bloom Filters

    No full text

    Skewed Multistaged Multibanked Register File for Area and Energy Efficiency

    No full text

    Design of a Register Cache System with an Open Source Process Design Kit for 45nm Technology

    No full text