Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers
Transformers have transformed the field of natural language processing. Their
performance is largely attributed to the use of stacked self-attention layers,
each of which consists of matrix multiplications as well as softmax operations. As a
result, unlike in other neural networks, the softmax operation accounts for a
significant fraction of the total run-time of Transformers. To address this, we
propose Softermax, a hardware-friendly softmax design. Softermax consists of
base replacement, low-precision softmax computations, and an online
normalization calculation. We show that Softermax results in 2.35x the energy
efficiency at 0.90x the size of a comparable baseline, with negligible impact
on network accuracy.

Comment: To appear in Proceedings of the 58th Design Automation Conference (DAC '21).
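The abstract names three ingredients: base replacement (computing exponentials in base 2 rather than base e), low-precision arithmetic, and an online normalization calculation. The sketch below is a minimal illustration of a base-2 softmax with online (single-pass) normalization, assuming the standard running-max/running-sum formulation; it is written in plain floating-point Python for clarity, whereas the actual Softermax design relies on low-precision hardware units, so this is not the authors' implementation.

```python
def softermax(scores):
    """Illustrative base-2 softmax with online normalization.

    A single pass over the inputs maintains a running maximum and a
    running sum of 2**(x - max); whenever the maximum grows, the sum
    is rescaled so the final normalization needs no second pass over
    the raw scores.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for x in scores:
        new_max = max(running_max, x)
        # Rescale the accumulated sum to the new maximum, then add
        # the contribution of the current score.
        running_sum = running_sum * 2.0 ** (running_max - new_max) \
            + 2.0 ** (x - new_max)
        running_max = new_max
    # Normalize each score with the statistics gathered in one pass.
    return [2.0 ** (x - running_max) / running_sum for x in scores]


if __name__ == "__main__":
    print(softermax([1.0, 2.0, 3.0]))  # sums to 1.0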