
    High Performance Matrix Multiplication on Many Cores

    Abstract. Moore’s Law suggests that the number of processing cores on a single chip will increase exponentially. Future performance gains will therefore come mainly from thread-level parallelism exploited by multi/many-core processors (MCPs). It is thus necessary to determine both how to build MCP hardware and how to program parallelism on such MCPs. In this work, we aim to identify the key architectural mechanisms and software optimizations that guarantee high performance for multithreaded programs. As a case study, we customize a dense matrix multiplication algorithm for the Godson-T MCP to demonstrate efficient synergy and interaction between hardware and software. Experiments on a cycle-accurate simulator show that the optimized matrix multiplication achieves 124.3 GFLOPS, or 97.1% of Godson-T's peak performance.
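    The abstract does not spell out which optimizations the customized kernel uses. The sketch below is therefore only a generic illustration of loop tiling (blocking), a standard technique for dense matrix multiplication on chips with small fast local storage. It is not the Godson-T kernel, and the `tile` parameter and function names are hypothetical. It also shows the arithmetic behind the reported figures: an n x n multiply performs 2n^3 floating-point operations, and 124.3 GFLOPS at 97.1% of peak implies a peak of about 124.3 / 0.971 ≈ 128 GFLOPS.

```python
def matmul_naive(A, B):
    """Reference triple loop: C = A * B for square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul_tiled(A, B, tile=4):
    """Blocked multiply (generic sketch, not the paper's kernel).

    The loops walk over tile x tile blocks so that each block of A, B and C
    is reused many times while it would sit in fast local storage.
    `tile` is a hypothetical tuning parameter.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # min() handles edge blocks when n is not a multiple of tile.
                for i in range(ii, min(ii + tile, n)):
                    Ci = C[i]
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]
                        Bk = B[k]
                        for j in range(jj, min(jj + tile, n)):
                            Ci[j] += a * Bk[j]
    return C


def matmul_flops(n):
    """Operation count of a dense n x n multiply: n^3 multiplies plus n^3 adds."""
    return 2 * n ** 3


def implied_peak_gflops(achieved_gflops, fraction_of_peak):
    """Peak rate implied by an achieved rate and its fraction of peak."""
    return achieved_gflops / fraction_of_peak


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_tiled(A, B, tile=1))                    # [[19.0, 22.0], [43.0, 50.0]]
    print(round(implied_peak_gflops(124.3, 0.971), 1))   # 128.0
```

Tiling does not change the 2n^3 operation count. It only reorders the loops to raise data reuse, which is what lets a kernel approach the hardware peak.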