Open-Source GEMM Hardware Kernels Generator: Toward Numerically-Tailored Computations
Many scientific computing problems can be reduced to Matrix-Matrix
Multiplications (MMM), making the General Matrix Multiply (GEMM) kernels in the
Basic Linear Algebra Subprograms (BLAS) of interest to the high-performance
computing community. However, these workloads have a wide range of numerical
requirements. Ill-conditioned linear systems require high-precision arithmetic
to ensure correct and reproducible results. In contrast, emerging workloads
such as deep neural networks, which can have millions to billions of
parameters, have shown resilience to arithmetic tinkering and precision
lowering.
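As a purely illustrative sketch of the precision trade-off described above (not code from the generator itself, which produces hardware kernels), the following Python/NumPy snippet runs the same GEMM at two precisions and measures the discrepancy:

import numpy as np

def gemm(A, B, dtype):
    # Naive C = A @ B with operands cast to the requested precision.
    return A.astype(dtype) @ B.astype(dtype)

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
B = rng.standard_normal((64, 64))

ref = gemm(A, B, np.float64)   # high-precision reference
low = gemm(A, B, np.float16)   # reduced-precision variant
print("max abs error fp16 vs fp64:", np.abs(ref - low.astype(np.float64)).max())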
Dynamic Power Consumption of the Full Posit Processing Unit: Analysis and Experiments
Since its introduction in 2017, the Posit™ format for representing real numbers has attracted a
lot of interest as an alternative to the IEEE 754 floating-point representation. Several hardware
implementations of arithmetic operations between posit numbers have also been proposed in recent
years. In this work, we analyze the dynamic power consumption of the Full Posit Processing Unit
(FPPU) recently developed at the University of Pisa. Experimental results show that we can model
the dynamic power consumption of the FPPU with an acceptable approximation error ranging from 2.84%
(32-bit FPPU) to 7.32% (8-bit FPPU). Furthermore, from the synthesis of the power monitoring
unit alongside the FPPU, we demonstrate that the additional power module has an area cost that
ranges from ∼5% (32-bit FPPU) to ∼30% (8-bit FPPU) of the total unit area occupation.
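For readers unfamiliar with the posit format named above, here is a minimal, illustrative decoder in Python following the standard sign/regime/exponent/fraction decomposition; it is an assumption for exposition only and is unrelated to the FPPU's actual implementation or bit widths:

def decode_posit(bits, n, es):
    # Decode an n-bit posit with es exponent bits into a Python float.
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                 # NaR (Not a Real)
    sign = -1.0 if bits >> (n - 1) else 1.0
    if sign < 0:
        bits = (-bits) & mask               # negate via two's complement
    body = bits & ((1 << (n - 1)) - 1)      # drop the sign bit
    first = (body >> (n - 2)) & 1           # regime: run of identical bits
    run, i = 0, n - 2
    while i >= 0 and ((body >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run
    i -= 1                                  # skip the regime terminator bit
    e = 0
    for _ in range(es):                     # exponent: up to es bits, zero-padded
        e = (e << 1) | ((body >> i) & 1 if i >= 0 else 0)
        i -= 1
    frac_bits = max(i + 1, 0)               # fraction: whatever bits remain
    frac = body & ((1 << frac_bits) - 1)
    return sign * 2.0 ** (k * (1 << es) + e) * (1 + frac / (1 << frac_bits))

print(decode_posit(0b01000000, 8, 2))       # 1.0
print(decode_posit(0b01100000, 8, 2))       # 16.0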
FP8 Formats for Deep Learning
FP8 is a natural progression for accelerating deep learning training and
inference beyond the 16-bit formats common in modern processors. In this paper
we propose an 8-bit floating point (FP8) binary interchange format consisting
of two encodings - E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit
exponent and 2-bit mantissa). While E5M2 follows IEEE 754 conventions for
representation of special values, E4M3's dynamic range is extended by not
representing infinities and having only one mantissa bit-pattern for NaNs. We
demonstrate the efficacy of the FP8 format on a variety of image and language
tasks, effectively matching the result quality achieved by 16-bit training
sessions. Our study covers the main modern neural network architectures - CNNs,
RNNs, and Transformer-based models, leaving all the hyperparameters unchanged
from the 16-bit baseline training sessions. Our training experiments include
large, up to 175B parameter, language models. We also examine FP8
post-training-quantization of language models trained using 16-bit formats that
resisted fixed-point int8 quantization.
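As an illustrative sketch only (not the paper's reference code), the following Python decoder follows the encoding rules stated above (E5M2 keeps IEEE 754 conventions for special values, while E4M3 drops infinities and reserves a single mantissa pattern per sign for NaN), with the standard biases of 7 for E4M3 and 15 for E5M2 assumed for these field widths:

def decode_fp8(byte, exp_bits, man_bits, bias, e4m3_style):
    # Decode one FP8 byte given its exponent/mantissa split and bias.
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    if exp == (1 << exp_bits) - 1:              # all-ones exponent
        if e4m3_style:
            # E4M3: no infinities; only the all-ones mantissa encodes NaN,
            # the rest of this exponent row holds ordinary normal numbers.
            if man == (1 << man_bits) - 1:
                return float("nan")
        else:
            # E5M2 follows IEEE 754: inf when mantissa is zero, NaN otherwise.
            return sign * float("inf") if man == 0 else float("nan")
    if exp == 0:                                # zero and subnormals
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

decode_e4m3 = lambda b: decode_fp8(b, 4, 3, 7, True)
decode_e5m2 = lambda b: decode_fp8(b, 5, 2, 15, False)

print(decode_e4m3(0x7E))   # largest finite E4M3 value: 448.0
print(decode_e5m2(0x7B))   # largest finite E5M2 value: 57344.0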