FARe: Fault-Aware GNN Training on ReRAM-based PIM Accelerators
Resistive random-access memory (ReRAM)-based processing-in-memory (PIM)
architectures are an attractive solution for training Graph Neural Networks
(GNNs) on edge platforms. However, the immature fabrication process and limited
write endurance of ReRAMs make them prone to hardware faults, thereby limiting
their widespread adoption for GNN training. Further, the existing
fault-tolerant solutions prove inadequate for effectively training GNNs in the
presence of faults. In this paper, we propose a fault-aware framework referred
to as FARe that mitigates the effect of faults during GNN training. FARe
outperforms existing approaches in terms of both accuracy and timing overhead.
Experimental results demonstrate that the FARe framework can restore GNN test
accuracy by 47.6% on faulty ReRAM hardware with a ~1% timing overhead compared
to the fault-free counterpart.
Comment: This paper has been accepted to the conference DATE (Design,
Automation and Test in Europe) - 202