Many scientific computing problems can be reduced to Matrix-Matrix
Multiplications (MMM), making the General Matrix Multiply (GEMM) kernels of
the Basic Linear Algebra Subprograms (BLAS) of interest to the
high-performance computing community. However, these workloads span a wide
range of numerical
requirements. Ill-conditioned linear systems require high-precision arithmetic
to ensure correct and reproducible results. In contrast, emerging workloads
such as deep neural networks, whose models can have millions to billions of
parameters, have shown resilience to arithmetic modifications and lowered
precision.
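To make this contrast concrete, the sketch below (an illustrative example, not taken from this work; it assumes NumPy and uses a small Hilbert matrix as a stand-in for an ill-conditioned system) computes the same GEMM in float64 and float16 and solves an ill-conditioned linear system in float32. The well-behaved product tolerates lowered precision with only a modest relative error, while the ill-conditioned solve loses nearly all accuracy.

```python
import numpy as np

# Illustrative sketch (assumed example): the same GEMM computed in
# float64 versus float16, to show the effect of precision lowering.
rng = np.random.default_rng(0)
n = 256
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

C_ref = A @ B  # float64 reference product
C_low = (A.astype(np.float16) @ B.astype(np.float16)).astype(np.float64)

rel_err = np.linalg.norm(C_ref - C_low) / np.linalg.norm(C_ref)
print(f"relative error of float16 GEMM: {rel_err:.2e}")

# Ill-conditioned case (hypothetical stand-in): a 12x12 Hilbert matrix,
# whose condition number is near 1/eps for float64. Solving it in
# float32 loses essentially all significant digits.
m = 12
H = np.array([[1.0 / (i + j + 1) for j in range(m)] for i in range(m)])
x_true = np.ones(m)
b = H @ x_true
x32 = np.linalg.solve(H.astype(np.float32), b.astype(np.float32))
print(f"cond(H) = {np.linalg.cond(H):.1e}")
print(f"error of float32 solve: {np.linalg.norm(x32 - x_true):.2e}")
```

Under these assumptions, the float16 product stays within a few digits of the reference, whereas the float32 solve of the ill-conditioned system can be wrong in every digit, which is why such systems demand high-precision arithmetic.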