    OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA

    [Abstract] Improving the performance of the convolution operation has become a key target for High Performance Computing (HPC) developers due to its prevalence in deep learning, applied mainly to video processing. The improvement is being pushed by algorithmic and implementation innovations. Algorithmically, the convolution can be computed directly from its mathematical definition, but other methods transform it into a Fast Fourier Transform (FFT) or a GEneral Matrix Multiplication (GEMM). In this latter group, the Winograd algorithm is a state-of-the-art variant that is especially suitable for smaller convolutions. In this paper, we present openCNN, an optimized CUDA C++ implementation of the Winograd convolution algorithm. Our approach achieves speedups of up to 1.76× on Turing RTX 2080Ti and up to 1.85× on Ampere RTX 3090 with respect to Winograd convolution in cuDNN 8.2.0. OpenCNN is released as open-source software. This research was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00, AEI/FEDER/EU, 10.13039/501100011033) and the predoctoral grant of Roberto L. Castro (FPU19/03974), and by the Xunta de Galicia, co-funded by the European Regional Development Fund (ERDF) under the Consolidation Programme of Competitive Reference Groups (ED431C 2021/30). CITIC, Centro de Investigación de Galicia ref. ED431G 2019/01, receives financial support from Consellería de Educación, Universidade e Formación Profesional, Xunta de Galicia, through the ERDF (80%) and Secretaría Xeral de Universidades (20%).
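
To illustrate the idea behind Winograd minimal filtering, here is a minimal sketch of the smallest 1D case, usually written F(2,3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The tile size, the function names `direct_f23` and `winograd_f23`, and the pure-Python form are assumptions made for illustration; the abstract does not state which tile sizes openCNN uses, and this is not its CUDA implementation.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap filter g slid over 4 inputs d (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Same two outputs using the F(2,3) minimal filtering form (4 multiplications)."""
    # Filter transform: depends only on g, so it can be precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform followed by the four element-wise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform: only additions and subtractions.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [1.0, 0.0, -1.0]
    print(direct_f23(d, g), winograd_f23(d, g))
```

The saving comes from moving work into the filter transform, which is computed once and reused across every input tile; 2D variants nest the same transforms along both axes.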

    Accurate deep neural network inference using computational phase-change memory

    In-memory computing is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. Crossbar arrays of resistive memory devices can be used to encode the network weights and perform efficient analog matrix-vector multiplications without intermediate movements of data. However, due to device variability and noise, the network needs to be trained in a specific way so that transferring the digitally trained weights to the analog resistive memory devices will not result in significant loss of accuracy. Here, we introduce a methodology to train ResNet-type convolutional neural networks that results in no appreciable accuracy loss when transferring weights to in-memory computing hardware based on phase-change memory (PCM). We also propose a compensation technique that exploits the batch normalization parameters to improve the accuracy retention over time. We achieve a classification accuracy of 93.7% on the CIFAR-10 dataset and a top-1 accuracy on the ImageNet benchmark of 71.6% after mapping the trained weights to PCM. Our hardware results on CIFAR-10 with ResNet-32 demonstrate an accuracy above 93.5% retained over a one day period, where each of the 361,722 synaptic weights of the network is programmed on just two PCM devices organized in a differential configuration. Comment: This is a pre-print of an article accepted for publication in Nature Communications.
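
The differential configuration mentioned above can be sketched as follows: each signed weight is stored as the difference of two non-negative conductances, and a matrix-vector product is read out as the difference of the two resulting currents. Everything in this sketch is an assumption for illustration, including the names `to_differential` and `analog_mvm`, the linear weight-to-conductance scaling, and the additive Gaussian noise model; it is a toy model, not the device model or training method of the paper.

```python
import random


def to_differential(w, w_max, g_max=1.0):
    """Map a signed weight to a (G_plus, G_minus) pair of non-negative conductances."""
    g = abs(w) / w_max * g_max
    return (g, 0.0) if w >= 0 else (0.0, g)


def analog_mvm(weights, x, noise_std=0.0, g_max=1.0, seed=0):
    """Toy crossbar matrix-vector product with per-device Gaussian conductance noise."""
    rng = random.Random(seed)
    # Scale so that the largest-magnitude weight maps to g_max (hypothetical choice).
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    out = []
    for row in weights:
        current = 0.0
        for w, xi in zip(row, x):
            g_plus, g_minus = to_differential(w, w_max, g_max)
            g_plus += rng.gauss(0.0, noise_std)   # device noise (assumed model)
            g_minus += rng.gauss(0.0, noise_std)
            # Input acts as a voltage; the two device currents are subtracted.
            current += (g_plus - g_minus) * xi
        out.append(current * w_max / g_max)       # rescale back to weight units
    return out


if __name__ == "__main__":
    W = [[0.5, -1.0], [0.25, 0.0]]
    x = [2.0, 1.0]
    print(analog_mvm(W, x))                  # noise-free readout
    print(analog_mvm(W, x, noise_std=0.05))  # noisy readout
```

With `noise_std=0.0` the result matches the exact product; raising it shows how device noise perturbs the output, which is the effect that noise-aware training is meant to tolerate.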

    Nucleon-Nucleon Scattering in a Wave-Packet Formalism

    In this thesis I analyse the prospect of leveraging statistical analyses of the strong nuclear interaction by using the wave-packet continuum discretisation (WPCD) method to efficiently compute nucleon-nucleon (NN) scattering observables on a graphics processing unit (GPU). The WPCD method gives approximate solutions to the S-matrix at multiple scattering energies at the cost of a single eigendecomposition of the NN channel Hamiltonian. In particular, I demonstrate and analyse the accuracy and inherent parallelism of the WPCD method by computing the most common NN scattering observables using a chiral Hamiltonian at next-to-next-to-leading order. I present an in-depth numerical study of the WPCD method and the GPU acceleration thereof. Additionally, I discuss which windows of opportunity are open for studying the strong nuclear interaction using data from few-nucleon scattering experiments.