MEMTI: optimizing on-chip non-volatile storage for visual multi-task inference at the edge

Abstract

The combination of specialized hardware and embedded non-volatile memories (eNVMs) holds promise for energy-efficient deep neural network (DNN) inference at the edge. However, integrating DNN hardware accelerators with eNVMs still presents several challenges. Multi-level programming is desirable for maximizing on-chip storage density, but the stochastic nature of eNVM writes makes them error-prone and further increases write energy and latency. We present MEMTI, a memory architecture that leverages a multi-task learning technique to maximize the reuse of DNN parameters across multiple visual tasks. We show that by retraining and updating only 10% of all DNN parameters, we can efficiently adapt the model across a variety of visual inference tasks. We evaluate system performance by integrating the memory with the open-source NVIDIA Deep Learning Accelerator (NVDLA).
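To make the storage argument concrete, the arithmetic below compares eNVM parameter counts for storing one full model per task against a shared-parameter scheme. This is a minimal sketch under stated assumptions, not MEMTI's actual layout. It assumes that the 90% of parameters that are not retrained are shared and written once, and that each task stores its own 10%. The function name `storage_params` and the example values (1,000,000 parameters, 5 tasks) are hypothetical.

```python
def storage_params(total_params: int, num_tasks: int, task_fraction: float = 0.10) -> dict:
    """Compare eNVM parameter counts with and without cross-task sharing.

    Hypothetical model: a fraction `task_fraction` of the parameters is
    retrained per task; the remainder is shared and stored only once.
    """
    per_task = round(total_params * task_fraction)  # task-specific parameters
    shared = total_params - per_task                # written to eNVM once
    separate = total_params * num_tasks             # one full model per task
    multitask = shared + per_task * num_tasks       # shared backbone + per-task deltas
    return {"separate": separate, "multitask": multitask, "ratio": separate / multitask}


stats = storage_params(total_params=1_000_000, num_tasks=5)
print(stats)
```

Under these assumptions, five tasks need 1.4M stored parameters instead of 5M. Switching tasks would then rewrite only the task-specific 10%, which matters because eNVM writes are costly.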
