Efficient Error-Tolerant Quantized Neural Network Accelerators
Neural networks are currently among the most widely deployed machine
learning algorithms. In particular, Convolutional Neural Networks (CNNs) are
gaining popularity and are being evaluated for deployment in safety-critical
applications such as self-driving vehicles. Modern CNNs feature enormous memory
bandwidth and high computational needs, challenging existing hardware platforms
to meet throughput, latency and power requirements. Functional safety and error
tolerance need to be considered as additional requirements in safety-critical
systems. In general, fault-tolerant operation can be achieved by adding
redundancy to the system, which further exacerbates the computational
demands. Furthermore, the question arises whether pruning and quantization
methods used for performance scaling turn out to be counterproductive with regard
to fail-safety requirements. In this work we present a methodology to evaluate
the impact of permanent faults affecting Quantized Neural Networks (QNNs) and
to effectively decrease their effects in hardware accelerators. We use
FPGA-based hardware-accelerated error injection to enable fast
evaluation. A detailed analysis shows that QNNs containing
convolutional layers are far less robust to faults than commonly believed
and can suffer accuracy drops of up to 10%. To mitigate this, we propose two
different methods to increase their robustness: 1) selective channel
replication, which adds significantly less redundancy than the common
triple modular redundancy, and 2) fault-aware scheduling of processing
elements for folded implementations.
Comment: 6 pages, 5 figures
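As a rough illustration of the ideas in this abstract, the sketch below injects a permanent stuck-at fault into one output channel of a hypothetical 8-bit quantized convolution layer and then masks it by voting over replicated copies of that channel. All names, shapes, and the median-vote scheme are illustrative assumptions, not the paper's actual methodology, which uses FPGA-accelerated error injection rather than software simulation.

import numpy as np

# Hypothetical 8-bit quantized conv-layer weights: (out_channels, in_channels, k, k).
rng = np.random.default_rng(0)
weights = rng.integers(-128, 128, size=(16, 8, 3, 3), dtype=np.int8)

def inject_stuck_at_1(w, channel, bit):
    # Force one bit of every weight in one output channel to 1, emulating a
    # permanent stuck-at fault in the accelerator's weight memory.
    w = w.copy()
    w[channel] |= np.int8(1 << bit)
    return w

def channel_responses(w, x):
    # Toy per-channel response: correlation of each output channel with one
    # input patch (stands in for a full convolution + accuracy evaluation).
    return np.einsum('oikl,ikl->o', w.astype(np.int32), x.astype(np.int32))

x = rng.integers(-128, 128, size=(8, 3, 3), dtype=np.int8)
faulty = inject_stuck_at_1(weights, channel=3, bit=6)

print("fault-free response:", channel_responses(weights, x)[3])
print("faulty response    :", channel_responses(faulty, x)[3])

# Selective channel replication (illustrative): compute the vulnerable channel on
# two additional processing elements and take the median, a majority-style vote
# that masks a single faulty copy at far lower cost than triplicating the layer.
copies = [
    channel_responses(faulty, x)[3],    # copy mapped onto the faulty PE
    channel_responses(weights, x)[3],   # redundant, fault-free copy
    channel_responses(weights, x)[3],   # redundant, fault-free copy
]
print("voted response     :", int(np.median(copies)))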
Algorithm and Hardware Co-Design for Local/Edge Computing
Advances in VLSI manufacturing and design technology over the decades have created many computing paradigms for disparate computing needs. With concerns about the transmission cost, security, and latency of centralized computing, edge/local computing is increasingly prevalent in fast-growing sectors such as the Internet of Things (IoT) and in sectors that require energy- and connectivity-autonomous systems, such as biomedical and industrial applications.
Energy and power efficiency are the main design constraints in local and edge computing. While a wide range of low-power design techniques exists, they are often underutilized in custom circuit designs because the algorithms are developed independently of the hardware. Such a compartmentalized design approach fails to take advantage of the many compatible algorithmic and hardware techniques that can improve the efficiency of the entire system. Algorithm-hardware co-design explores the design space with whole-stack awareness.
The main goal of the algorithm-hardware co-design methodology is to enable and improve small-form-factor edge and local VLSI systems operating under strict area and energy-efficiency constraints. This thesis presents selected works on application-specific digital and mixed-signal integrated circuit designs. The application space ranges from implantable biomedical devices to edge machine learning acceleration.