Optimizing GPU Convnets

Abstract

Convolution layers are a key component for improving the accuracy of neural networks. In networks such as CosmoFlow, which stack multiple consecutive convolution layers, the convolution layers dominate the end-to-end runtime. Several convolution algorithms, such as implicit GEMM, the fast Fourier transform (FFT), and Winograd, have been optimized for different platforms. Achieving performance close to theoretical bounds often requires manual fine-tuning specific to the target architecture. We use the DaCe framework to develop portable optimizations of the implicit GEMM and direct convolution algorithms for 3D convolutions on GPUs, and we benchmark the optimized code against the available manually tuned library implementations.
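To make the implicit GEMM formulation mentioned above concrete, the following is a minimal NumPy sketch, not the paper's DaCe implementation: it materializes the im2col columns explicitly and then performs a single matrix multiply, whereas a true implicit-GEMM kernel forms the same columns on the fly inside the GEMM without ever storing them. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def conv3d_implicit_gemm(x, w):
    """3D convolution (no padding, stride 1) recast as a single GEMM.

    x: input of shape (C_in, D, H, W)
    w: filters of shape (C_out, C_in, KD, KH, KW)
    Returns output of shape (C_out, OD, OH, OW).
    """
    c_in, d, h, wd = x.shape
    c_out, _, kd, kh, kw = w.shape
    od, oh, ow = d - kd + 1, h - kh + 1, wd - kw + 1

    # Gather every receptive field into a column ("im2col" lowering);
    # an implicit-GEMM kernel would compute these columns lazily instead.
    cols = np.empty((c_in * kd * kh * kw, od * oh * ow), dtype=x.dtype)
    col = 0
    for z in range(od):
        for y in range(oh):
            for xx in range(ow):
                cols[:, col] = x[:, z:z+kd, y:y+kh, xx:xx+kw].ravel()
                col += 1

    # One matrix multiply then produces all output channels at once.
    out = w.reshape(c_out, -1) @ cols
    return out.reshape(c_out, od, oh, ow)
```

Comparing the result against a naive seven-loop direct convolution is a quick way to validate such a lowering before optimizing it.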