1 research outputs found
TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference
TensorDash is a hardware level technique for enabling data-parallel MAC units
to take advantage of sparsity in their input operand streams. When used to
compose a hardware accelerator for deep learning, TensorDash can speedup the
training process while also increasing energy efficiency. TensorDash combines a
low-cost, sparse input operand interconnect comprising an 8-input multiplexer
per multiplier input, with an area-efficient hardware scheduler. While the
interconnect allows a very limited set of movements per operand, the scheduler
can effectively extract sparsity when it is present in the activations, weights
or gradients of neural networks. Over a wide set of models covering various
applications, TensorDash accelerates the training process by
while being more energy-efficient, more energy
efficient when taking on-chip and off-chip memory accesses into account. While
TensorDash works with any datatype, we demonstrate it with both
single-precision floating-point units and bfloat16