TensorDash is a hardware-level technique for enabling data-parallel MAC units
to take advantage of sparsity in their input operand streams. When used to
compose a hardware accelerator for deep learning, TensorDash can speed up the
training process while also increasing energy efficiency. TensorDash combines a
low-cost, sparse input operand interconnect comprising an 8-input multiplexer
per multiplier input, with an area-efficient hardware scheduler. While the
interconnect allows a very limited set of movements per operand, the scheduler
can effectively extract sparsity when it is present in the activations, weights
or gradients of neural networks. Over a wide set of models covering various
applications, TensorDash accelerates the training process by 1.95× while being
1.89× more energy-efficient (1.6× when on-chip and off-chip memory accesses
are taken into account). While
TensorDash works with any datatype, we demonstrate it with both
single-precision floating-point and bfloat16 units.
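
To make the general idea concrete, below is a minimal, purely behavioral Python sketch of sparsity-aware operand movement: idle lanes steal nearby effectual (nonzero) operand pairs so that zero products never occupy a multiplier. The lane count, the lookahead/lookaside window, and the greedy selection policy are illustrative assumptions and do not reflect TensorDash's actual 8-input multiplexer connectivity or its hardware scheduler.

```python
# Behavioral sketch only: models the *idea* of limited operand movement plus a
# scheduler that skips ineffectual (zero) products. Constants and policy are
# assumptions for illustration, not TensorDash's hardware design.
import numpy as np

LANES = 4        # data-parallel MAC lanes (assumed)
LOOKAHEAD = 2    # time steps a lane may pull work from (assumed)
LOOKASIDE = 1    # neighbouring lanes a lane may pull work from (assumed)

def process(a_stream, b_stream):
    """Accumulate sum(a*b) while letting lanes grab nearby effectual pairs;
    returns (sum, steps with skipping, dense baseline steps)."""
    a = np.asarray(a_stream, dtype=float).reshape(-1, LANES)
    b = np.asarray(b_stream, dtype=float).reshape(-1, LANES)
    # only pairs whose product is nonzero actually need a multiplier
    pending = {(t, l) for t in range(a.shape[0]) for l in range(LANES)
               if a[t, l] != 0 and b[t, l] != 0}
    acc, steps, t = 0.0, 0, 0
    while pending:
        steps += 1
        for lane in range(LANES):
            # positions this lane's multiplexer is allowed to select from;
            # the lane's own position is tried first so no pair is stranded
            cands = [(t + dt, lane + dl)
                     for dt in range(LOOKAHEAD + 1)
                     for dl in (0, -LOOKASIDE, LOOKASIDE)]
            for pos in cands:
                if pos in pending:
                    acc += a[pos] * b[pos]
                    pending.discard(pos)
                    break
        t += 1
    return acc, steps, a.shape[0]

# a zero in either stream makes a pair ineffectual and therefore skippable
a = [1, 0, 0, 2,   0, 3, 0, 0,   4, 0, 5, 0]
b = [1, 1, 1, 1,   1, 1, 0, 1,   1, 1, 1, 1]
total, sparse_steps, dense_steps = process(a, b)
print(f"sum = {total}, steps = {sparse_steps} (dense baseline: {dense_steps})")
```

Under these assumptions the example finishes in 2 steps instead of the 3 a dense schedule would need, which is the kind of work reduction the scheduler and interconnect are meant to expose in hardware.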