The combination of specialized hardware and embedded non-volatile memories (eNVMs) holds promise for energy-efficient
DNN inference at the edge. However, integrating DNN hardware accelerators with eNVMs still presents several challenges. Multi-level
programming is desirable for achieving maximal storage density on chip, but the stochastic nature of eNVM writes makes them prone
to errors and further increases the write energy and latency. We present MEMTI, a memory architecture that leverages a multi-task
learning technique for maximal reuse of DNN parameters across multiple visual tasks. We show that by retraining and updating only
10% of all DNN parameters, we can achieve efficient model adaptation across a variety of visual inference tasks. The system
performance is evaluated by integrating the memory with the open-source NVIDIA Deep Learning Accelerator (NVDLA).
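
To make the parameter-reuse idea concrete, the following is a minimal sketch of how updating only a small task-specific subset of parameters limits the number of memory writes needed to switch tasks. It is illustrative only: the function names, the random choice of which parameters are task-specific, and the toy model size are assumptions, and the abstract does not specify how MEMTI selects or stores the updated parameters.

```python
import random


def split_parameters(num_params, update_fraction=0.10, seed=0):
    """Partition parameter indices into a shared set and a task-specific set.

    Assumption: the task-specific subset is chosen at random here purely for
    illustration; the source does not state how the subset is selected.
    """
    rng = random.Random(seed)
    indices = list(range(num_params))
    rng.shuffle(indices)
    num_task = round(num_params * update_fraction)
    task_specific = set(indices[:num_task])
    shared = set(indices[num_task:])
    return shared, task_specific


def adapt_to_task(stored, retrained, task_specific):
    """Overwrite only the task-specific entries and count the writes issued."""
    writes = 0
    for i in task_specific:
        stored[i] = retrained[i]
        writes += 1
    return writes


# Toy model with 1000 parameters (hypothetical size).
num_params = 1000
shared, task_specific = split_parameters(num_params)

stored = [0.0] * num_params      # parameters currently held in memory
retrained = [1.0] * num_params   # stand-in for values retrained on a new task

writes = adapt_to_task(stored, retrained, task_specific)
write_fraction = writes / num_params
print(f"writes issued: {writes} ({write_fraction:.0%} of all parameters)")
```

Under these assumptions, switching tasks rewrites one tenth of the stored values while the remaining parameters stay untouched, which is the property that would reduce exposure to costly, error-prone eNVM writes.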