Processing-in-memory (PIM), a novel computing paradigm, provides significant
performance benefits by reducing data movement. SRAM-based PIM has been
demonstrated as one of the most promising
candidates due to its endurance and compatibility. However, the integration
density of SRAM-based PIM is much lower than that of non-volatile memory-based
counterparts, because it inherently needs a 6T cell to store a single bit. Within
comparable area constraints, SRAM-based PIM exhibits notably lower capacity.
Thus, aiming to unleash its capacity potential, we propose DDC-PIM, an
efficient algorithm/architecture co-design methodology that effectively doubles
the equivalent data capacity. At the algorithmic level, we propose a
filter-wise complementary correlation (FCC) algorithm to obtain a bitwise
complementary pair. At the architecture level, we exploit the intrinsic
cross-coupled structure of 6T SRAM to store the bitwise complementary pair in
their complementary states ($Q/\overline{Q}$), thereby maximizing the data
capacity of each SRAM cell. The dual-broadcast input structure and
reconfigurable unit support both depthwise and pointwise convolution, adhering
to the requirements of various neural networks. Evaluation results show that
DDC-PIM yields about 2.84× speedup on MobileNetV2 and 2.69× on
EfficientNet-B0 with negligible accuracy loss compared with a PIM baseline
implementation. Compared with state-of-the-art SRAM-based PIM macros, DDC-PIM
achieves up to 8.41× and 2.75× improvement in weight density and
area efficiency, respectively.

Comment: 14 pages, to be published in IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems (TCAD)
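
The complementary-storage idea described in the abstract can be illustrated with a small numerical sketch. This is not the DDC-PIM implementation: the 8-bit two's-complement weight format, the function names, and the sample values are all assumptions made for illustration, and the FCC algorithm that produces the complementary pair is not modeled. The sketch only shows that if each cell holds a bit in Q, the Q-bar nodes of the same cells already hold the bitwise complement, so a second filter equal to that complement needs no extra cells.

```python
import numpy as np

def cell_states(weights):
    """Model each cell as a (Q, Q_bar) bit pair for every bit of every weight.

    Assumes signed 8-bit weights; this precision is not stated in the abstract.
    """
    raw = np.asarray(weights, dtype=np.int8).view(np.uint8)
    q = np.unpackbits(raw[:, None], axis=1)   # one row of 8 bits per weight
    q_bar = 1 - q                             # the cross-coupled node holds NOT Q
    return q, q_bar

def bits_to_int8(bits):
    """Reassemble rows of 8 bits (MSB first) into signed 8-bit integers."""
    return np.packbits(bits, axis=1).ravel().view(np.int8)

def complementary_filter(weights):
    """Read the filter that is implicitly stored on the Q_bar side."""
    _, q_bar = cell_states(weights)
    return bits_to_int8(q_bar)

def dot(x, w):
    """Integer dot product with a wide accumulator."""
    return int(np.dot(np.asarray(x, dtype=np.int64), np.asarray(w, dtype=np.int64)))

# Hypothetical filter and input values, chosen only for the demonstration.
w_a = np.array([3, -5, 0, 127, -128], dtype=np.int8)
w_b = complementary_filter(w_a)
x = np.array([1, 2, 3, 4, 5])
y_a, y_b = dot(x, w_a), dot(x, w_b)
```

In two's complement the bitwise complement of `w` equals `-w - 1`, so under these assumptions the second output is tied to the first by `y_b == -y_a - sum(x)`. How the paper's hardware and the FCC algorithm exploit the pair is described in the paper itself and is not reproduced here.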