XPert: Peripheral Circuit & Neural Architecture Co-search for Area and Energy-efficient Xbar-based Computing
The hardware-efficiency and accuracy of Deep Neural Networks (DNNs)
implemented on In-memory Computing (IMC) architectures primarily depend on the
DNN architecture and the peripheral circuit parameters. It is therefore
essential to holistically co-search the network and peripheral parameters to
achieve optimal performance. To this end, we propose XPert, which co-searches
network architecture in tandem with peripheral parameters such as the type and
precision of analog-to-digital converters, crossbar column sharing and the
layer-specific input precision using an optimization-based design space
exploration. Compared to VGG16 baselines, XPert achieves 10.24x (4.7x) lower
EDAP, 1.72x (1.62x) higher TOPS/W, 1.93x (3x) higher TOPS/mm2 at 92.46% (56.7%)
accuracy for CIFAR10 (TinyImagenet) datasets. The code for this paper is
available at https://github.com/Intelligent-Computing-Lab-Yale/XPert.
Comment: Accepted to the Design Automation Conference (DAC)
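The co-search idea of picking peripheral parameters against an area/energy objective can be illustrated with a toy design-space sweep. The cost numbers and the `best_design` helper below are hypothetical stand-ins, not values or code from the XPert paper:

```python
# Hypothetical lookup: (adc_bits, columns_shared) -> (relative EDAP, accuracy).
# These numbers are illustrative only, not from the XPert paper.
design_points = {
    (4, 8):  (1.0, 0.89),
    (6, 8):  (1.6, 0.92),
    (4, 16): (0.7, 0.87),
    (6, 16): (1.1, 0.91),
}

def best_design(min_accuracy):
    """Pick the lowest-EDAP peripheral configuration meeting an accuracy floor."""
    feasible = [(edap, cfg) for cfg, (edap, acc) in design_points.items()
                if acc >= min_accuracy]
    return min(feasible)[1] if feasible else None

print(best_design(0.90))  # lowest-EDAP config with accuracy >= 0.90
```

A real co-search explores this space jointly with the network architecture rather than enumerating a fixed table, but the feasibility-then-minimize structure is the same.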
Computing-In-Memory Neural Network Accelerators for Safety-Critical Systems: Can Small Device Variations Be Disastrous?
Computing-in-Memory (CiM) architectures based on emerging non-volatile memory
(NVM) devices have demonstrated great potential for deep neural network (DNN)
acceleration thanks to their high energy efficiency. However, NVM devices
suffer from various non-idealities, especially device-to-device variations due
to fabrication defects and cycle-to-cycle variations due to the stochastic
behavior of devices. As such, the DNN weights actually mapped to NVM devices
could deviate significantly from the expected values, leading to large
performance degradation. To address this issue, most existing works focus on
maximizing average performance under device variations. This objective would
work well for general-purpose scenarios. But for safety-critical applications,
the worst-case performance must also be considered. Unfortunately, this has
been rarely explored in the literature. In this work, we formulate the problem
of determining the worst-case performance of CiM DNN accelerators under the
impact of device variations. We further propose a method to effectively find
the specific combination of device variation in the high-dimensional space that
leads to the worst-case performance. We find that even with very small device
variations, the accuracy of a DNN can drop drastically, causing concerns when
deploying CiM accelerators in safety-critical applications. Finally, we show
that, surprisingly, none of the existing methods used to enhance average DNN
performance in CiM accelerators remains effective when extended to the
worst case; further research is needed to address this problem.
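Searching a bounded variation space for the worst-case perturbation resembles a projected-gradient attack on the weights rather than the inputs. The sketch below does this on a toy linear model; the data, bound, and ascent loop are illustrative assumptions, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier on synthetic data; the "device variation" is a
# bounded perturbation delta added to the nominal weights w.
X = rng.normal(size=(200, 8))
w = rng.normal(size=8)
y = (X @ w > 0).astype(float)

def loss(delta):
    p = 1.0 / (1.0 + np.exp(-(X @ (w + delta))))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def grad(delta):
    p = 1.0 / (1.0 + np.exp(-(X @ (w + delta))))
    return X.T @ (p - y) / len(y)

# Projected gradient *ascent*: search the variation ball for the worst case.
eps, delta = 0.05, np.zeros(8)
for _ in range(100):
    delta = np.clip(delta + 0.1 * grad(delta), -eps, eps)

print(loss(np.zeros(8)), loss(delta))  # worst-case loss vs. nominal loss
```

Even a small bound (here 0.05 per weight) measurably inflates the loss, which mirrors the abstract's finding that small device variations can cause large accuracy drops.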
NASCaps: A Framework for Neural Architecture Search to Optimize the Accuracy and Hardware Efficiency of Convolutional Capsule Networks
Deep Neural Networks (DNNs) have made significant improvements to reach the
desired accuracy to be employed in a wide variety of Machine Learning (ML)
applications. Recently, the Google Brain team demonstrated the ability of
Capsule Networks (CapsNets) to encode and learn spatial correlations between
different input features, thereby obtaining superior learning capabilities
compared to traditional (i.e., non-capsule based) DNNs. However, designing
CapsNets using conventional methods is a tedious job and incurs significant
training effort. Recent studies have shown that Neural Architecture Search
(NAS) algorithms offer a powerful means to automatically select the best DNN
model configuration for a given set of applications and a training dataset.
Moreover, due to their extreme computational and memory requirements, DNNs are
deployed on specialized hardware accelerators in IoT-Edge/CPS devices. In this
paper, we propose NASCaps, an
automated framework for the hardware-aware NAS of different types of DNNs,
covering both traditional convolutional DNNs and CapsNets. We study the
efficacy of a multi-objective Genetic Algorithm based on NSGA-II. The proposed
framework can jointly optimize the network
accuracy and the corresponding hardware efficiency, expressed in terms of
energy, memory, and latency of a given hardware accelerator executing the DNN
inference. Besides supporting the traditional DNN layers, our framework is the
first to model and support the specialized capsule layers and dynamic routing
in the NAS-flow. We evaluate our framework on different datasets, generating
different network configurations, and demonstrate the tradeoffs between the
different output metrics. We will open-source the complete framework and
configurations of the Pareto-optimal architectures at
https://github.com/ehw-fit/nascaps.
Comment: To appear at the IEEE/ACM International Conference on Computer-Aided
Design (ICCAD '20), November 2-5, 2020, Virtual Event, US
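The multi-objective selection step in an NSGA-II-style search hinges on Pareto dominance between candidates scored on accuracy and hardware cost. A minimal sketch with hypothetical (accuracy, energy) candidates, not results from NASCaps, is:

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is (accuracy, energy); higher accuracy and lower energy are
    better. p dominates q if p is at least as good in both objectives and
    strictly better in at least one.
    """
    def dominates(p, q):
        return (p[0] >= q[0] and p[1] <= q[1]) and (p[0] > q[0] or p[1] < q[1])
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical candidate architectures: (accuracy, energy in mJ).
candidates = [(0.91, 5.0), (0.93, 7.0), (0.90, 9.0), (0.88, 4.0)]
print(pareto_front(candidates))  # (0.90, 9.0) is dominated by (0.91, 5.0)
```

NSGA-II layers this with non-dominated sorting into successive fronts and a crowding-distance tiebreak, but dominance is the core comparison.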
Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators
While maximizing deep neural networks' (DNNs') acceleration efficiency
requires a joint search/design of three different yet highly coupled aspects,
including the networks, bitwidths, and accelerators, the challenges associated
with such a joint search have not yet been fully understood and addressed. The
key challenges include (1) the dilemma of whether to explode the memory
consumption due to the huge joint space or achieve sub-optimal designs, (2) the
discrete nature of the accelerator design space that is coupled yet different
from that of the networks and bitwidths, and (3) the chicken and egg problem
associated with network-accelerator co-search, i.e., co-search requires
operation-wise hardware cost, which is lacking during search as the optimal
accelerator depending on the whole network is still unknown during search. To
tackle these daunting challenges towards optimal and fast development of DNN
accelerators, we propose a framework dubbed Auto-NBA to enable jointly
searching for the Networks, Bitwidths, and Accelerators, by efficiently
localizing the optimal design within the huge joint design space for each
target dataset and acceleration specification. Our Auto-NBA integrates a
heterogeneous sampling strategy to achieve unbiased search with constant memory
consumption, and a novel joint-search pipeline equipped with a generic
differentiable accelerator search engine. Extensive experiments and ablation
studies validate that both Auto-NBA generated networks and accelerators
consistently outperform state-of-the-art designs (including
co-search/exploration techniques, hardware-aware NAS methods, and DNN
accelerators), in terms of search time, task accuracy, and accelerator
efficiency. Our code is available at https://github.com/RICE-EIC/Auto-NBA.
Comment: Accepted at ICML 202
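A common way to make a discrete choice such as a layer's bitwidth differentiable during a joint search is a softmax relaxation over architecture parameters, so an expected hardware cost can enter the loss. The bitwidth options, costs, and parameter values below are illustrative assumptions, not Auto-NBA's actual search engine:

```python
import numpy as np

# Candidate bitwidths and a hypothetical per-bitwidth hardware cost
# (illustrative numbers, not from the Auto-NBA paper).
bitwidths = np.array([4, 8, 16])
costs = np.array([1.0, 2.3, 5.1])

# Architecture parameters alpha are relaxed via softmax so the expected
# cost is differentiable and can be penalized in the training loss.
alpha = np.array([0.2, 1.5, -0.3])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

probs = softmax(alpha)
expected_cost = float(probs @ costs)     # differentiable cost surrogate
chosen = int(bitwidths[probs.argmax()])  # discretized choice after search
print(chosen, round(expected_cost, 3))
```

During search, gradients of the task loss plus the cost surrogate update `alpha`; after convergence the argmax discretizes each choice.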
Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors
Basecalling, an essential step in many genome analysis studies, relies on
large Deep Neural Networks (DNNs) to achieve high accuracy. Unfortunately,
these DNNs are computationally slow and inefficient, leading to considerable
delays and resource constraints in the sequence analysis process. A
Computation-In-Memory (CIM) architecture using memristors can significantly
accelerate the performance of DNNs. However, inherent device non-idealities and
architectural limitations of such designs can greatly degrade the basecalling
accuracy, which is critical for accurate genome analysis. To facilitate the
adoption of memristor-based CIM designs for basecalling, it is important to (1)
conduct a comprehensive analysis of potential CIM architectures and (2) develop
effective strategies for mitigating the possible adverse effects of inherent
device non-idealities and architectural limitations.
This paper proposes Swordfish, a novel hardware/software co-design framework
that can effectively address the two aforementioned issues. Swordfish
incorporates seven circuit and device restrictions or non-idealities from
characterized real memristor-based chips. Swordfish leverages various
hardware/software co-design solutions to mitigate the basecalling accuracy loss
due to such non-idealities. To demonstrate the effectiveness of Swordfish, we
take Bonito, the state-of-the-art (i.e., accurate and fast) open-source
basecaller, as a case study. Our experimental results using Swordfish show that
a CIM architecture can realistically accelerate Bonito for a wide range of real
datasets by an average of 25.7x, with an accuracy loss of 6.01%.
Comment: To appear in the 56th IEEE/ACM International Symposium on
Microarchitecture (MICRO), 202
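The effect of memristor non-idealities on accuracy can be sketched by injecting conductance variation into a model's weights and re-evaluating. The toy classifier and lognormal variation model below are illustrative assumptions, not Swordfish's characterized device data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a basecaller: a linear classifier whose weights are
# "programmed" onto memristors with multiplicative conductance variation.
# The variation model and magnitude are illustrative, not from Swordfish.
X = rng.normal(size=(500, 16))
w = rng.normal(size=16)
y = (X @ w > 0)

def accuracy(weights):
    return float(np.mean((X @ weights > 0) == y))

# Device-to-device variation: w_ij -> w_ij * exp(sigma * N(0, 1)).
sigma = 0.3
w_noisy = w * np.exp(sigma * rng.normal(size=w.shape))

print(accuracy(w), accuracy(w_noisy))  # nominal vs. post-variation accuracy
```

A framework like Swordfish runs this kind of evaluation with measured non-ideality models and then applies hardware/software co-design mitigations to recover the lost accuracy.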