Convolutional Neural Networks (CNNs) demonstrate excellent performance in
various applications but have high computational complexity. Quantization is
applied to reduce the latency and storage cost of CNNs. Among the quantization
methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique
advantage over 8-bit and 4-bit quantization. They replace the multiplication
operations in CNNs with additions, which are favoured on In-Memory-Computing
(IMC) devices. IMC acceleration for BWNs has been widely studied. However,
though TWNs have higher accuracy and better sparsity than BWNs, IMC
acceleration for TWNs has limited research. TWNs on existing IMC devices are
inefficient because the sparsity is not well utilized, and the addition
operation is not efficient.
In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we
propose a Sparse Addition Control Unit, which utilizes the sparsity of TWNs to
skip the null operations on zero weights. Second, we propose a fast addition
scheme based on the memory Sense Amplifier to avoid the time overhead of both
carry propagation and writing back the carry to memory cells. Third, we further
propose a Combined-Stationary data mapping to reduce the data movement of
activations and weights and increase the parallelism across memory columns.
Simulation results show that for addition operations at the Sense Amplifier
level, FAT achieves 2.00X speedup, 1.22X power efficiency, and 1.22X area
efficiency compared with a State-Of-The-Art IMC accelerator ParaPIM. FAT
achieves 10.02X speedup and 12.19X energy efficiency compared with ParaPIM on
networks with 80% average sparsity.Comment: 14 page