MG3MConv: Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor
Convolution is at the core of artificial intelligence applications, and
research on it has become a hot topic in high performance computing. With the
rapid development of the emerging SW26010 processor in artificial intelligence,
there is an urgent need for high-performance convolution algorithms on the
processor. However, current support for convolution on SW26010 is still
rudimentary. The only existing studies achieve sufficient peak runtime
performance but lack adaptability to diverse convolution scenes. To improve convolution
algorithms on SW26010, we propose a multi-grained matrix-multiplication-mapping
convolution algorithm called MG3MConv, which targets the architectural features
of SW26010. MG3MConv supports diversified mapping schemes of convolution tasks
based on the concept of the thread block proposed in this paper. The
architecture-oriented optimization methods are designed at four
levels to fully exploit the hardware efficiency of SW26010. Experiments
show that the hardware efficiency of MG3MConv reaches up to 84.78%, which is
1.75 times that of cuDNN on an NVIDIA K80m GPU. Moreover,
MG3MConv outperforms cuDNN in most convolution scenes. We also use six
representative CNNs as real-world cases, and the hardware efficiency of
MG3MConv reaches up to 67.04% on the VGG network model, which is 1.37 times and
1.96 times that of cuDNN and swDNN, respectively.
Supercomputing Frontiers
This open access book constitutes the refereed proceedings of the 6th Asian Supercomputing Conference, SCFA 2020, which was planned for February 2020; the physical conference was cancelled due to the COVID-19 pandemic. The 8 full papers presented in this book were carefully reviewed and selected from 22 submissions. They cover a range of topics including file systems, memory hierarchy, HPC cloud platform, container image configuration workflow, large-scale applications, and scheduling.