Accelerating Low Bit-Width Deep Convolution Neural Network in MRAM

Deep Convolutional Neural Networks (CNNs) have achieved outstanding performance in image recognition on large-scale datasets. However, the pursuit of higher inference accuracy leads to CNN architectures with deeper layers and denser connections, which inevitably makes their hardware implementations demand ever more memory and computational resources. This can be interpreted as the 'CNN power and memory wall'. Recent research efforts have significantly reduced both model size and computational complexity by using low bit-width weights, activations, and gradients, while keeping reasonably good accuracy. In this work, we present different emerging nonvolatile Magnetic Random Access Memory (MRAM) designs that could be leveraged to implement a 'bit-wise in-memory convolution engine', which simultaneously stores network parameters and computes low bit-width convolutions. This computing model leverages the 'in-memory computing' concept to accelerate CNN inference and reduce convolution energy consumption, owing to its intrinsic logic-in-memory design and the reduction of data communication.
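
To make the 'bit-wise' arithmetic concrete, the following Python emulation is a minimal sketch assuming the common 1-bit (binarized) formulation, in which a dot product over {-1, +1} values reduces to an XNOR followed by a popcount. The function names and bit-packing scheme are illustrative only and are not taken from the paper's MRAM design.

def pack_bits(values):
    # Pack a list of {-1, +1} values into an integer word;
    # bit i is set iff values[i] == +1.
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word

def binary_dot(a_word, w_word, n_bits):
    # Dot product of two packed {-1, +1} vectors via XNOR + popcount:
    # each matching bit contributes +1 and each mismatch -1, so the
    # result is 2 * popcount(XNOR(a, w)) - n_bits.
    xnor = ~(a_word ^ w_word) & ((1 << n_bits) - 1)  # mask back to n_bits
    return 2 * bin(xnor).count("1") - n_bits

# Example: one 3x3 convolution window flattened to 9 elements.
activations = [1, -1, 1, 1, -1, -1, 1, 1, -1]
weights     = [1, 1, -1, 1, -1, -1, -1, 1, 1]
assert binary_dot(pack_bits(activations), pack_bits(weights), 9) == \
    sum(a * w for a, w in zip(activations, weights))  # both give 1

In the hardware the abstract describes, this XNOR/popcount-style computation would take place inside the memory array that stores the weights, so the packed weight words never travel to a separate processor; the emulation above only mirrors the arithmetic, not the circuit.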