Ultra Power-Efficient CNN Domain Specific Accelerator with 9.3TOPS/Watt for Mobile and Embedded Applications
Computer vision performance has been significantly improved in recent years
by Convolutional Neural Networks (CNNs). Currently, applications using CNN
algorithms are deployed mainly on general-purpose hardware such as CPUs, GPUs,
or FPGAs. However, power consumption, speed, accuracy, memory footprint, and
die size must all be taken into consideration for mobile and embedded
applications. A Domain Specific Architecture (DSA) for CNNs is an efficient and
practical solution for CNN deployment and implementation. We designed and
produced a 28nm Two-Dimensional CNN-DSA accelerator with an ultra
power-efficient performance of 9.3 TOPS/Watt, with all processing done in
internal memory rather than in external DRAM. It classifies 224x224 RGB image
inputs at more than 140 fps, with peak power consumption under 300 mW and
accuracy comparable to the VGG benchmark. The CNN-DSA accelerator is
reconfigurable to support CNN model coefficients of various layer sizes and
layer types, including convolution, depth-wise convolution, short-cut
connections, max pooling, and ReLU. Furthermore, to better support
real-world deployment across various application scenarios, especially on
low-end mobile and embedded platforms and MCUs (Microcontroller Units), we also
designed algorithms that utilize the CNN-DSA accelerator fully and efficiently
by reducing the dependency on external computation resources, including
implementing Fully-Connected (FC) layers within the accelerator and
compressing the features extracted by the CNN-DSA accelerator. Live demos
with our CNN-DSA accelerator on mobile and embedded systems show its
capability to be widely and practically applied in the real world.
Comment: 9 pages, 10 figures. Accepted by the CVPR 2018 Efficient Deep Learning
for Computer Vision workshop.
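The headline figures above can be cross-checked with simple arithmetic: multiplying the reported 9.3 TOPS/Watt efficiency by the 300 mW peak power gives an implied peak throughput of roughly 2.8 TOPS. This is a back-of-envelope sketch, not a figure stated in the abstract:

```python
# Back-of-envelope check of the accelerator's headline numbers.
# Assumes implied peak throughput = efficiency x peak power
# (a derived quantity, not reported in the abstract).

tops_per_watt = 9.3   # reported power efficiency
peak_power_w = 0.300  # reported peak power (300 mW)
fps = 140             # reported classification rate

peak_tops = tops_per_watt * peak_power_w
print(f"Implied peak throughput: {peak_tops:.2f} TOPS")

# Compute budget available per 224x224 frame at 140 fps,
# assuming the accelerator runs at peak throughput.
ops_per_frame = peak_tops * 1e12 / fps
print(f"Ops budget per frame: {ops_per_frame:.2e}")
```

At about 2 x 10^10 operations per frame, the budget comfortably covers a VGG-class forward pass (on the order of 3 x 10^10 multiply-accumulates, i.e. ~6 x 10^10 ops, for VGG-16), which is consistent with the claim of VGG-comparable accuracy at 140 fps only if the deployed model is somewhat lighter than full VGG-16.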
Effective, Fast, and Memory-Efficient Compressed Multi-function Convolutional Neural Networks for More Accurate Medical Image Classification
Convolutional Neural Networks (CNNs) usually use the same activation
function, such as ReLU, for all convolutional layers, but using only ReLU
limits performance. To achieve better classification performance, reduce
training and testing times, and reduce power consumption and memory usage, a
new "Compressed Multi-function CNN" is developed. Google's Inception-V4, for
example, is a very deep CNN consisting of 4 Inception-A blocks, 7 Inception-B
blocks, and 3 Inception-C blocks, with ReLU used for all convolutional layers.
A new "Compressed Multi-function Inception-V4" (CMI) that can use different
activation functions is created with k Inception-A blocks, m Inception-B
blocks, and n Inception-C blocks, where k in {1, 2, 3, 4}, m in {1, 2, 3, 4,
5, 6, 7}, n in {1, 2, 3}, and (k+m+n) < 14. For performance analysis, a
dataset for classifying brain MRI images into one of the four stages of
Alzheimer's disease is used to compare three CMI architectures with
Inception-V4 in terms of F1-score, training and testing times (related to power
consumption), and memory usage (model size). Overall, simulations show that
the new CMI models can outperform both the standard Inception-V4 and an
Inception-V4 variant that uses different activation functions. In the future, other
"Compressed Multi-function CNNs", such as "Compressed Multi-function ResNets
and DenseNets" that have a reduced number of convolutional blocks using
different activation functions, will be developed to further increase
classification accuracy, reduce training and testing times, reduce
computational power, and reduce memory usage (model size) for building more
effective healthcare systems, such as implementing accurate and convenient
disease diagnosis systems on mobile devices that have limited battery power and
memory.
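The CMI design space described above is small enough to enumerate directly. A minimal sketch (an illustrative count, not a result from the paper) shows that the constraint k+m+n < 14 excludes exactly one combination, the full (4, 7, 3) Inception-V4 layout, leaving 83 compressed configurations:

```python
from itertools import product

# Enumerate valid CMI block configurations: k Inception-A blocks,
# m Inception-B blocks, n Inception-C blocks, with strictly fewer
# total blocks than Inception-V4's 4 + 7 + 3 = 14.
valid = [
    (k, m, n)
    for k, m, n in product(range(1, 5), range(1, 8), range(1, 4))
    if k + m + n < 14
]

print(len(valid))          # number of compressed configurations
print((4, 7, 3) in valid)  # the full Inception-V4 layout is excluded
```

Since 4 x 7 x 3 = 84 total combinations and only (4, 7, 3) sums to 14, the search space for choosing a CMI architecture (before also choosing per-block activation functions) is 83 configurations.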