    Large Matrix Multiplication on a Novel Heterogeneous

    Abstract. This paper introduces a novel master-multi-SIMD on-chip multi-core architecture for embedded signal processing and describes the parallel architecture and its memory subsystem. We evaluate the performance of large matrix multiplication on this architecture and compare it with a SIMD-extended data-parallel architecture. We also examine how well the new architecture scales with different numbers of SIMD co-processors. The experimental results show that the memory subsystem of the ePUMA architecture can effectively hide the data access overhead. With its 8-way SIMD data path and multi-SIMD parallel execution, the ePUMA architecture achieves a 45x speedup on matrix multiplication over the conventional SIMD extension.
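
The abstract does not describe ePUMA's programming model, so the following is only a minimal sketch of the general idea behind multi-SIMD matrix multiplication: the output matrix is split into tiles, and row tiles are handed out to several workers. The function name `matmul_tiled`, the parameters `tile` and `num_workers`, and the round-robin assignment are all assumptions made for illustration; the workers are simulated sequentially here, whereas real co-processors would run concurrently, and none of this is taken from the paper.

```python
def matmul_tiled(a, b, tile=8, num_workers=4):
    """Multiply a (n x k) by b (k x m) using square tiles.

    Hypothetical illustration: row tiles of the result are assigned
    round-robin to `num_workers` simulated workers. The tile size of 8
    only echoes the 8-way SIMD width mentioned in the abstract; the
    actual ePUMA tiling is not specified there.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    row_tiles = list(range(0, n, tile))

    for worker in range(num_workers):
        # Each simulated worker owns every num_workers-th row tile.
        for i0 in row_tiles[worker::num_workers]:
            for j0 in range(0, m, tile):
                for k0 in range(0, k, tile):
                    # Multiply one tile of a by one tile of b and
                    # accumulate into the matching tile of c.
                    for i in range(i0, min(i0 + tile, n)):
                        row_a, row_c = a[i], c[i]
                        for kk in range(k0, min(k0 + tile, k)):
                            a_ik, row_b = row_a[kk], b[kk]
                            for j in range(j0, min(j0 + tile, m)):
                                row_c[j] += a_ik * row_b[j]
    return c
```

Because each worker writes to a disjoint set of output rows, the workers need no synchronization on the result matrix, which is one reason this kind of partitioning is commonly used for parallel matrix multiplication.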