2,267 research outputs found
The Effects of Approximate Multiplication on Convolutional Neural Networks
This paper analyzes the effects of approximate multiplication when performing
inferences on deep convolutional neural networks (CNNs). The approximate
multiplication can reduce the cost of the underlying circuits so that CNN
inferences can be performed more efficiently in hardware accelerators. The
study identifies the critical factors in the convolution, fully-connected, and
batch normalization layers that allow more accurate CNN predictions despite the
errors from approximate multiplication. The same factors also provide an
arithmetic explanation of why bfloat16 multiplication performs well on CNNs.
The experiments are performed with recognized network architectures to show
that the approximate multipliers can produce predictions that are nearly as
accurate as the FP32 references, without additional training. For example, the
ResNet and Inception-v4 models with Mitch-6 multiplication produces Top-5
errors that are within 0.2% compared to the FP32 references. A brief cost
comparison of Mitch-6 against bfloat16 is presented, where a MAC operation
saves up to 80% of energy compared to the bfloat16 arithmetic. The most
far-reaching contribution of this paper is the analytical justification that
multiplications can be approximated while additions need to be exact in CNN MAC
operations.Comment: 12 pages, 11 figures, 4 tables, accepted for publication in the IEEE
Transactions on Emerging Topics in Computin
Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications
The challenging deployment of compute-intensive applications from domains
such Artificial Intelligence (AI) and Digital Signal Processing (DSP), forces
the community of computing systems to explore new design approaches.
Approximate Computing appears as an emerging solution, allowing to tune the
quality of results in the design of a system in order to improve the energy
efficiency and/or performance. This radical paradigm shift has attracted
interest from both academia and industry, resulting in significant research on
approximation techniques and methodologies at different design layers (from
system down to integrated circuits). Motivated by the wide appeal of
Approximate Computing over the last 10 years, we conduct a two-part survey to
cover key aspects (e.g., terminology and applications) and review the
state-of-the art approximation techniques from all layers of the traditional
computing stack. In Part II of our survey, we classify and present the
technical details of application-specific and architectural approximation
techniques, which both target the design of resource-efficient
processors/accelerators & systems. Moreover, we present a detailed analysis of
the application spectrum of Approximate Computing and discuss open challenges
and future directions.Comment: Under Review at ACM Computing Survey
A Survey on Approximate Multiplier Designs for Energy Efficiency: From Algorithms to Circuits
Given the stringent requirements of energy efficiency for Internet-of-Things
edge devices, approximate multipliers, as a basic component of many processors
and accelerators, have been constantly proposed and studied for decades,
especially in error-resilient applications. The computation error and energy
efficiency largely depend on how and where the approximation is introduced into
a design. Thus, this article aims to provide a comprehensive review of the
approximation techniques in multiplier designs ranging from algorithms and
architectures to circuits. We have implemented representative approximate
multiplier designs in each category to understand the impact of the design
techniques on accuracy and efficiency. The designs can then be effectively
deployed in high-level applications, such as machine learning, to gain energy
efficiency at the cost of slight accuracy loss.Comment: 38 pages, 37 figure
Design Space Exploration of Neural Network Activation Function Circuits
The widespread application of artificial neural networks has prompted
researchers to experiment with FPGA and customized ASIC designs to speed up
their computation. These implementation efforts have generally focused on
weight multiplication and signal summation operations, and less on activation
functions used in these applications. Yet, efficient hardware implementations
of nonlinear activation functions like Exponential Linear Units (ELU), Scaled
Exponential Linear Units (SELU), and Hyperbolic Tangent (tanh), are central to
designing effective neural network accelerators, since these functions require
lots of resources. In this paper, we explore efficient hardware implementations
of activation functions using purely combinational circuits, with a focus on
two widely used nonlinear activation functions, i.e., SELU and tanh. Our
experiments demonstrate that neural networks are generally insensitive to the
precision of the activation function. The results also prove that the proposed
combinational circuit-based approach is very efficient in terms of speed and
area, with negligible accuracy loss on the MNIST, CIFAR-10 and IMAGENET
benchmarks. Synopsys Design Compiler synthesis results show that circuit
designs for tanh and SELU can save between 3.13-7.69 and 4.45-8:45 area
compared to the LUT/memory-based implementations, and can operate at 5.14GHz
and 4.52GHz using the 28nm SVT library, respectively. The implementation is
available at: https://github.com/ThomasMrY/ActivationFunctionDemo.Comment: 5 pages, 5 figures, 16 conferenc
Number Systems for Deep Neural Network Architectures: A Survey
Deep neural networks (DNNs) have become an enabling component for a myriad of
artificial intelligence applications. DNNs have shown sometimes superior
performance, even compared to humans, in cases such as self-driving, health
applications, etc. Because of their computational complexity, deploying DNNs in
resource-constrained devices still faces many challenges related to computing
complexity, energy efficiency, latency, and cost. To this end, several research
directions are being pursued by both academia and industry to accelerate and
efficiently implement DNNs. One important direction is determining the
appropriate data representation for the massive amount of data involved in DNN
processing. Using conventional number systems has been found to be sub-optimal
for DNNs. Alternatively, a great body of research focuses on exploring suitable
number systems. This article aims to provide a comprehensive survey and
discussion about alternative number systems for more efficient representations
of DNN data. Various number systems (conventional/unconventional) exploited for
DNNs are discussed. The impact of these number systems on the performance and
hardware design of DNNs is considered. In addition, this paper highlights the
challenges associated with each number system and various solutions that are
proposed for addressing them. The reader will be able to understand the
importance of an efficient number system for DNN, learn about the widely used
number systems for DNN, understand the trade-offs between various number
systems, and consider various design aspects that affect the impact of number
systems on DNN performance. In addition, the recent trends and related research
opportunities will be highlightedComment: 28 page
- …