
    National Natural Science Foundation of China

    Abstract: The advent of multi-core/many-core chip technology offers both an extraordinary opportunity and a profound challenge. In particular, computer architects and system software designers have a unique opportunity to introduce new architecture features together with adequate compiler technology; combined, the two may have a profound impact. This paper presents a case study (using the 1-D stencil computation) of compiler-amenable performance optimization techniques on the Godson-T many-core architecture. Godson-T has several unique features that were chosen for this study: (1) chip-level globally addressable memory, in particular the scratchpad memories (SPM) local to the processing cores; and (2) fine-grain memory-based synchronization (e.g. full-empty bits). Leveraging state-of-the-art performance optimization methods for 1-D stencil parallelization (e.g. timed tiling and its variants), we developed and implemented a number of many-core-based optimizations for Godson-T. Our experimental study shows good improvements in both execution-time speedup and scalability, validates the value of globally accessible SPM and the fine-grain synchronization mechanism (full-empty bits) on Godson-T, and provides some useful guidelines for future compiler technology for many-core chip architectures.
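For readers unfamiliar with the computation named in the abstract, the following is a minimal sketch of a 1-D stencil and of one common tiling-in-time scheme (overlapped tiles with halo cells). It is an illustration under stated assumptions, not the paper's implementation: the 3-point averaging kernel, the fixed end points, and the names `stencil_step`, `stencil_naive`, and `stencil_time_tiled` are all hypothetical. It does not model Godson-T's scratchpad memories or full-empty bits; the per-tile `local` buffer only stands in for data a core might keep close to itself.

```python
def stencil_step(a):
    """One sweep of an assumed 3-point averaging stencil; end points stay fixed."""
    out = list(a)
    for i in range(1, len(a) - 1):
        out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return out


def stencil_naive(a, steps):
    """Reference version: sweep the whole array once per time step."""
    for _ in range(steps):
        a = stencil_step(a)
    return a


def stencil_time_tiled(a, steps, tile=4, tb=2):
    """Overlapped tiling in time (illustrative only).

    Each tile copies its cells plus a halo of k cells per side, advances
    k time steps on that private buffer, and writes back only its own cells.
    Halo cells go stale by one position per step, so a halo of width k is
    exactly enough to keep the tile interior correct for k steps.
    """
    n = len(a)
    cur = list(a)
    done = 0
    while done < steps:
        k = min(tb, steps - done)          # time steps in this block
        nxt = list(cur)
        for lo in range(0, n, tile):       # tiles are independent of each other
            hi = min(lo + tile, n)
            g_lo = max(lo - k, 0)          # halo clipped at the domain edges
            g_hi = min(hi + k, n)
            local = cur[g_lo:g_hi]         # private working buffer for the tile
            for _ in range(k):
                local = stencil_step(local)
            nxt[lo:hi] = local[lo - g_lo:hi - g_lo]
        cur = nxt
        done += k
    return cur


if __name__ == "__main__":
    data = [float((i * i) % 7) for i in range(20)]
    ref = stencil_naive(data, 5)
    tiled = stencil_time_tiled(data, 5, tile=4, tb=2)
    print(max(abs(x - y) for x, y in zip(ref, tiled)))
```

In this sketch the tiles within one time block never read each other's results, which is what makes them candidates for parallel execution; the price is redundant computation in the halos. Schemes that avoid the redundancy need neighbouring tiles to exchange boundary values as they become ready, which is the kind of dependence that fine-grain synchronization is meant to express.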