6 research outputs found

OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA

Authors: Andrade Diego; Fraguela Basilio B.; López Castro Roberto
Publication venue: 'MDPI AG'
Publication date: 01/01/2021

Improving the performance of the convolution operation has become a key target for High Performance Computing (HPC) developers due to its prevalence in deep learning, applied mainly to video processing. The improvement is being driven by algorithmic and implementation innovations. Algorithmically, the convolution can be computed directly from its mathematical definition, but other methods transform it into a Fast Fourier Transform (FFT) or a GEneral Matrix Multiplication (GEMM) problem. In this latter group, the Winograd algorithm is a state-of-the-art variant that is especially suitable for smaller convolutions. In this paper, we present openCNN, an optimized CUDA C++ implementation of the Winograd convolution algorithm. Our approach achieves speedups of up to 1.76× on Turing RTX 2080Ti and up to 1.85× on Ampere RTX 3090 with respect to Winograd convolution in cuDNN 8.2.0. OpenCNN is released as open-source software.

This research was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00, AEI/FEDER/EU, 10.13039/501100011033), by the predoctoral grant of Roberto L. Castro (FPU19/03974), and by the Xunta de Galicia, co-funded by the European Regional Development Fund (ERDF) under the Consolidation Programme of Competitive Reference Groups (ED431C 2021/30). CITIC, Centro de Investigación de Galicia ref. ED431G 2019/01, receives financial support from Consellería de Educación, Universidade e Formación Profesional, Xunta de Galicia, through the ERDF (80%) and Secretaría Xeral de Universidades (20%).

Xunta de Galicia; ED431C 2021/30

Xunta de Galicia; ED431G 2019/0

Multidisciplinary Digital Publishing Institute

Repositorio da Universidade da Coruña

Accurate deep neural network inference using computational phase-change memory

Authors: Boybat Irem; Dazzi Martino; Eleftheriou Evangelos; Gallo Manuel Le; Haefeli Simon; Joshi Vinay; Nandakumar S. R.; Piveteau Christophe; Rajendran Bipin; Sebastian Abu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

In-memory computing is a promising non-von Neumann approach for making energy-efficient deep learning inference hardware. Crossbar arrays of resistive memory devices can be used to encode the network weights and perform efficient analog matrix-vector multiplications without intermediate movements of data. However, due to device variability and noise, the network needs to be trained in a specific way so that transferring the digitally trained weights to the analog resistive memory devices will not result in significant loss of accuracy. Here, we introduce a methodology to train ResNet-type convolutional neural networks that results in no appreciable accuracy loss when transferring weights to in-memory computing hardware based on phase-change memory (PCM). We also propose a compensation technique that exploits the batch normalization parameters to improve the accuracy retention over time. We achieve a classification accuracy of 93.7% on the CIFAR-10 dataset and a top-1 accuracy on the ImageNet benchmark of 71.6% after mapping the trained weights to PCM. Our hardware results on CIFAR-10 with ResNet-32 demonstrate an accuracy above 93.5% retained over a one-day period, where each of the 361,722 synaptic weights of the network is programmed on just two PCM devices organized in a differential configuration.

Comment: This is a pre-print of an article accepted for publication in Nature Communication

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Repository for Publications and Research Data

King's Research Portal

Nucleon-Nucleon Scattering in a Wave-Packet Formalism

Author: Miller Sean
Publication date: 01/01/2020

In this thesis I analyse the prospect of leveraging statistical analyses of the strong nuclear interaction by using the wave-packet continuum discretisation (WPCD) method to efficiently compute nucleon-nucleon (NN) scattering observables on a graphics processing unit (GPU). The WPCD method gives approximate solutions to the S-matrix at multiple scattering energies at the cost of a single eigendecomposition of the NN channel Hamiltonian. In particular, I demonstrate and analyse the accuracy and inherent parallelism of the WPCD method by computing the most common NN scattering observables using a chiral Hamiltonian at next-to-next-to-leading order. I present an in-depth numerical study of the WPCD method and the GPU acceleration thereof. Additionally, I discuss which windows of opportunity are open for studying the strong nuclear interaction using data from few-nucleon scattering experiments.

Chalmers Research