Quantum circuit simulation provides the foundation for the development of
quantum algorithms and the verification of quantum supremacy. Among the various
methods for quantum circuit simulation, tensor network contraction has been
increasing in popularity due to its ability to simulate a larger number of
qubits. During tensor contraction, the input tensors are reshaped into matrices
and multiplied by a GEMM operation; these GEMM operations can account for up to
90\% of the total computation time. GEMM throughput can be improved by
utilizing mixed-precision hardware such as Tensor Cores, but a straightforward
implementation yields insufficient fidelity for deep and large quantum
circuits. Prior work has demonstrated that compensated summation with careful
handling of the rounding mode can fully recover the FP32 precision of SGEMM even
when using TF32 or FP16 Tensor Cores. The exponent range is a critical issue
when applying such techniques to quantum circuit simulation. While TF32
supports almost the same exponent range as FP32, FP16 supports a much smaller
exponent range. In this work, we use the exponent range statistics of input
tensor elements to select which Tensor Cores we use for the GEMM. We evaluate
our method on Random Circuit Sampling (RCS), including Sycamore's quantum
circuit, and show that the throughput is up to 1.86 times higher while
maintaining accuracy.

Comment: This paper has been accepted to ISC'2
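
The selection idea described above can be illustrated with a minimal NumPy sketch. It is not the implementation evaluated in this work: the function names, the `margin_bits` parameter, and the strict "every nonzero element must fit" rule are all hypothetical choices made for illustration. The sketch only assumes the standard FP16 normal exponent range of -14 to 15 and falls back to TF32, whose exponent range is almost the same as FP32, whenever an input does not fit.

```python
import numpy as np

# Unbiased exponent range of normal IEEE 754 binary16 (FP16) values.
FP16_MIN_EXP = -14
FP16_MAX_EXP = 15


def exponent_range(x):
    """Return (min, max) of floor(log2(|v|)) over the nonzero real and
    imaginary parts of x, or None if every element is zero."""
    x = np.asarray(x)
    mags = np.abs(np.concatenate([x.real.ravel(), x.imag.ravel()]))
    mags = mags[mags > 0]
    if mags.size == 0:
        return None
    # np.frexp gives |v| = m * 2**e with m in [0.5, 1), so floor(log2|v|) = e - 1.
    _, e = np.frexp(mags)
    return int(e.min()) - 1, int(e.max()) - 1


def fits_fp16(x, margin_bits=0):
    """Hypothetical criterion: all nonzero elements lie inside the FP16
    normal range, with an optional safety margin at the small end."""
    r = exponent_range(x)
    if r is None:
        return True
    lo, hi = r
    return lo >= FP16_MIN_EXP + margin_bits and hi <= FP16_MAX_EXP


def select_tensor_core(a, b, margin_bits=0):
    """Pick 'fp16' only if both GEMM inputs fit; otherwise use 'tf32'."""
    if fits_fp16(a, margin_bits) and fits_fp16(b, margin_bits):
        return "fp16"
    return "tf32"
```

For example, `select_tensor_core(a, b)` could be called once per contraction step on the two reshaped input matrices before dispatching the GEMM. A practical criterion might instead tolerate a small fraction of out-of-range elements or rescale the inputs; that choice is left open here.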