VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
The hardware implementation of deep neural networks (DNNs) has recently
received tremendous attention: many applications in fact require high-speed
operations that suit a hardware implementation. However, numerous elements and
complex interconnections are usually required, leading to a large area
occupation and high power consumption. Stochastic computing has shown
promising results for low-power, area-efficient hardware implementations, even
though existing stochastic algorithms require long streams that cause long
latencies. In this paper, we propose an integer form of stochastic computation
and introduce some elementary circuits. We then propose an efficient
implementation of a DNN based on integral stochastic computing. The proposed
architecture has been implemented on a Virtex-7 FPGA, resulting in 45% and 62%
average reductions in area and latency compared to the best reported
architecture in the literature. We also synthesize the circuits in a 65 nm CMOS
technology and we show that the proposed integral stochastic architecture
results in up to 21% reduction in energy consumption compared to the binary
radix implementation at the same misclassification rate. Due to the fault-tolerant
nature of stochastic architectures, we also consider a quasi-synchronous
implementation, which yields a 33% reduction in energy consumption w.r.t. the
binary radix implementation without any compromise on performance.
Comment: 11 pages, 12 figures
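To make the idea of an integer-valued stochastic stream concrete, the following Python sketch compares conventional stochastic arithmetic with integral stochastic addition. It is an illustrative software model, not the paper's circuits: it assumes unipolar encoding, assumes an integral stream is formed by summing m independent binary streams, and all function names are hypothetical.

```python
import random


def unipolar_stream(p, n, rng):
    """Binary stochastic stream: each bit is 1 with probability p (0 <= p <= 1)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def integral_stream(x, m, n, rng):
    """Assumed integral stochastic stream for x in [0, m]: the element-wise
    sum of m independent binary streams, each encoding x / m."""
    streams = [unipolar_stream(x / m, n, rng) for _ in range(m)]
    return [sum(bits) for bits in zip(*streams)]


def decode(stream):
    """Estimate the encoded value as the mean of the stream."""
    return sum(stream) / len(stream)


def sc_multiply(a, b):
    """Conventional unipolar SC multiplication: bitwise AND of independent streams."""
    return [x & y for x, y in zip(a, b)]


def sc_scaled_add(a, b, rng):
    """Conventional SC addition: a 2-to-1 MUX with a random select line,
    which yields (a + b) / 2 rather than a + b."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def integral_add(a, b):
    """Integral SC addition: a plain integer adder, so no down-scaling is
    needed; the output range grows to [0, m_a + m_b]."""
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    a, b = unipolar_stream(0.5, n, rng), unipolar_stream(0.4, n, rng)
    print("AND product  ~", decode(sc_multiply(a, b)))         # close to 0.2
    print("MUX sum      ~", decode(sc_scaled_add(a, b, rng)))  # close to 0.45
    ia, ib = integral_stream(1.5, 2, n, rng), integral_stream(0.8, 2, n, rng)
    print("integral sum ~", decode(integral_add(ia, ib)))      # close to 2.3
```

The design point it illustrates: a MUX-based adder must halve its result to stay in [0, 1], whereas an integer adder on integral streams keeps the full sum at the cost of a wider output range.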
Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation
This paper presents a structural design of a hardware-efficient module for
implementing the basic operation of a convolutional neural network (CNN) with reduced
implementation complexity. For this purpose, we use a modification of the
Winograd minimal filtering method as well as computation vectorization
principles. The module calculates the inner products of two consecutive segments of
the original data sequence, formed by a sliding window of length 3, with the
elements of a filter impulse response. The fully parallel structure of the
module for calculating these two inner products, based on the implementation of
a naive method of calculation, requires 6 binary multipliers and 4 binary
adders. The Winograd minimal filtering method makes it possible to construct a
module structure that requires only 4 binary multipliers and 8 binary adders.
Because multipliers are considerably more costly in hardware than adders, and a
high-performance convolutional neural network can contain tens or even
hundreds of such modules, this reduction can have a significant effect.
Comment: 3 pages, 5 figures
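The multiplier and adder counts above match the textbook Winograd F(2,3) algorithm, which computes two outputs of a 3-tap filter from a 4-sample window. The sketch below is that standard formulation in Python, assuming the filter-side terms are precomputed once per filter; the paper applies "some modification" of the method, so its exact structure may differ, and the function names are hypothetical.

```python
def naive_two_outputs(d, g):
    """Direct form: two length-3 inner products -> 6 multiplications, 4 additions."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return y0, y1


def filter_transform(g):
    """Filter-side terms; for a fixed filter these are computed once,
    so they are not counted in the per-window cost."""
    return (g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2])


def winograd_f23(d, gt):
    """Winograd F(2,3): the same two outputs with 4 multiplications.
    Additions: 4 on the input side + 4 on the output side = 8."""
    m1 = (d[0] - d[2]) * gt[0]
    m2 = (d[1] + d[2]) * gt[1]
    m3 = (d[2] - d[1]) * gt[2]
    m4 = (d[1] - d[3]) * gt[3]
    return m1 + m2 + m3, m2 - m3 - m4


if __name__ == "__main__":
    d, g = [1, 2, 3, 4], [0.5, -1, 2]
    print(naive_two_outputs(d, g))                # (4.5, 6.0)
    print(winograd_f23(d, filter_transform(g)))   # (4.5, 6.0)
```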
Analog VLSI Implementation of Feed Forward Neural Network for Signal Processing
With the emergence of VLSI technology in the electronics industry, the applications of integrated circuits in high-performance computing, consumer electronics, and telecommunications have been growing steadily and rapidly. Neural networks, an integral part of artificial intelligence, are based on mathematical equations and artificial neurons. The focus here is the implementation of a Neural Network Architecture (NNA) with on-chip learning in analog VLSI for generic signal processing applications. The artificial neural network comprises analog components such as multipliers and adders, along with a tan-sigmoid function generating circuit. The architecture uses components such as a Gilbert cell mixer (GCM) and a neuron activation function (NAF) circuit to implement the functions of an artificial neural network. The balanced operation of the Gilbert cell yields a cleaner output by eliminating unwanted signals. The architecture is designed in 180 nm CMOS technology using the Cadence Virtuoso tool.
DOI: 10.17762/ijritcc2321-8169.150517
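As a rough behavioural reference for the signal chain described in this abstract (multipliers, adder, tan-sigmoid), the sketch below models an idealized neuron in Python. It ignores all analog non-idealities of the Gilbert cell and the NAF circuit, and the learning rule is an assumption: the abstract does not name one, so a simple delta-rule (gradient descent) update is used here. All names are hypothetical.

```python
import math


def tansig(n):
    """Tan-sigmoid transfer function 2 / (1 + exp(-2n)) - 1, equal to tanh(n)."""
    return math.tanh(n)


def neuron(inputs, weights, bias=0.0):
    """Idealized neuron: multipliers form w_i * x_i, an adder sums them with
    the bias, and the activation circuit applies tansig."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return tansig(s)


def layer(inputs, weight_rows, biases):
    """Feed-forward layer: one neuron per row of weights."""
    return [neuron(inputs, row, b) for row, b in zip(weight_rows, biases)]


def train_step(inputs, weights, bias, target, lr=0.1):
    """One assumed on-chip learning step: gradient descent on
    0.5 * (target - y)**2, using tansig'(s) = 1 - y**2."""
    y = neuron(inputs, weights, bias)
    delta = (target - y) * (1.0 - y * y)
    new_weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
    return new_weights, bias + lr * delta


if __name__ == "__main__":
    w, b = [0.0, 0.0], 0.0
    for _ in range(500):
        w, b = train_step([1.0, 0.5], w, b, target=0.5)
    print(neuron([1.0, 0.5], w, b))  # converges toward the target 0.5
```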
Hardware emulation of stochastic p-bits for invertible logic
The common feature of nearly all logic and memory devices is that they make
use of stable units to represent 0's and 1's. A completely different paradigm
is based on three-terminal stochastic units which could be called "p-bits",
where the output is a random telegraphic signal continuously fluctuating
between 0 and 1 with a tunable mean. p-bits can be interconnected to receive
weighted contributions from others in a network, and these weighted
contributions can be chosen not only to solve problems of optimization and
inference but also to implement precise Boolean functions in an inverted mode.
This inverted operation of Boolean gates is particularly striking: they provide
inputs consistent with a given output, along with unique outputs for a given set of
inputs. The existing demonstrations of accurate invertible logic are
intriguing, but will these striking properties observed in computer simulations
carry over to hardware implementations? In this paper, we use individual
microcontrollers to emulate p-bits and present results for a 4-bit ripple carry
adder with 48 p-bits and a 4-bit multiplier with 46 p-bits working in inverted
mode as a factorizer. Our results constitute a first step towards implementing
p-bits with nanodevices such as stochastic magnetic tunnel junctions.
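A p-bit network can be emulated in a few lines of software. The sketch below assumes the widely used p-bit update m_i = sgn(tanh(I_i) - r), with r uniform in [-1, 1] and I_i = I0 (Σ_j J_ij m_j + h_i), and uses a hypothetical 3-p-bit AND gate whose coupling values were chosen for illustration (the comments state the property that makes them work). It is not the paper's microcontroller firmware and does not reproduce the adder or multiplier.

```python
import math
import random

# Hypothetical 3-p-bit AND gate: index 0 = A, 1 = B (inputs), 2 = C (output),
# bipolar states +1 / -1. Energy: E(m) = -(sum_{i<j} J[i][j] m_i m_j + sum_i h[i] m_i).
# With these illustrative values, the four rows of the AND truth table are
# exactly the minimum-energy states (E = -3); every other state has E >= 1.
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
h = [1, 1, -2]


def energy(m):
    """Ising-style energy of a 3-p-bit state m."""
    pair = sum(J[i][j] * m[i] * m[j] for i in range(3) for j in range(i + 1, 3))
    return -(pair + sum(h[i] * m[i] for i in range(3)))


def run_pbits(clamp, steps, i0=1.0, seed=0):
    """Emulate the p-bits with sequential updates.

    Each unclamped p-bit i is updated as m_i = sgn(tanh(I_i) - r) with
    I_i = i0 * (sum_j J[i][j] m_j + h[i]) and r uniform in [-1, 1].
    `clamp` maps p-bit indices to fixed values. Returns a histogram of
    the states visited after each sweep.
    """
    rng = random.Random(seed)
    m = [rng.choice((-1, 1)) for _ in range(3)]
    for i, v in clamp.items():
        m[i] = v
    free = [i for i in range(3) if i not in clamp]
    counts = {}
    for _ in range(steps):
        for i in free:
            current = i0 * (sum(J[i][j] * m[j] for j in range(3)) + h[i])
            m[i] = 1 if math.tanh(current) > rng.uniform(-1.0, 1.0) else -1
        state = tuple(m)
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    # Inverted mode: clamp the output C = +1 and let the inputs fluctuate.
    hist = run_pbits({2: 1}, steps=20000)
    print(max(hist, key=hist.get))  # (1, 1, 1) should dominate
```

Clamping C = +1 and letting A and B fluctuate is the inverted mode: the network should then spend almost all of its time in A = B = +1, the only input pair consistent with that output. Clamping A and B instead gives ordinary forward operation.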
An instruction systolic array architecture for multiple neural network types
Modern electronic systems, especially sensor and imaging systems, are beginning to
incorporate their own neural network subsystems. In order for these neural systems to learn in
real time, they must be implemented using VLSI technology, with as much of the learning
process incorporated on-chip as possible. The majority of current VLSI implementations
literally implement a series of neural processing cells, which can be connected together in an
arbitrary fashion. Many do not perform the entire neural learning process on-chip, instead
relying on other external systems to carry out part of the computational requirements of the
algorithm.
The work presented here utilises two dimensional instruction systolic arrays in an attempt to
define a general neural architecture that is closer to the biological basis of neural networks: it
is the synapses themselves, rather than the neurons, that have dedicated processing units. A
unified architecture is described which can be programmed at the microcode level in order to
facilitate the processing of multiple neural network types.
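The idea of giving each synapse its own processing element can be sketched as a weight-stationary two-dimensional systolic array, where PE (i, j) holds the weight from input j to neuron i. The abstract does not describe the array's dataflow or its instruction stream, so the schedule below is an assumption chosen for illustration; it omits the microcode broadcast that makes the array instruction-systolic, and the function name is hypothetical.

```python
def systolic_matvec(W, x):
    """Cycle-level sketch of a weight-stationary 2-D systolic array.

    PE (i, j) stores synapse weight W[i][j] (input j -> neuron i).
    Input x[j] enters column j at cycle j and moves down one row per cycle;
    partial sums move right one column per cycle. Row i's net input leaves
    the last column at cycle i + cols - 1. Returns the net inputs W @ x.
    """
    rows, cols = len(W), len(W[0])
    x_reg = [[0.0] * cols for _ in range(rows)]  # value each PE passes down
    p_reg = [[0.0] * cols for _ in range(rows)]  # partial sum each PE passes right
    out = [None] * rows
    for t in range(rows + cols - 1):
        new_x = [[0.0] * cols for _ in range(rows)]
        new_p = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                # Input from the PE above, or from the skewed feed on row 0.
                if i == 0:
                    x_in = x[j] if t == j else 0.0
                else:
                    x_in = x_reg[i - 1][j]
                p_in = p_reg[i][j - 1] if j > 0 else 0.0
                new_x[i][j] = x_in
                new_p[i][j] = p_in + W[i][j] * x_in  # one synapse MAC per PE
        x_reg, p_reg = new_x, new_p  # all registers update together each cycle
        for i in range(rows):
            if t == i + cols - 1:
                out[i] = p_reg[i][cols - 1]
    return out


if __name__ == "__main__":
    print(systolic_matvec([[1, 2, 3], [4, 5, 6]], [1, 0, -1]))  # [-2.0, -2.0]
```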
An essential part of neural network processing is the neuron activation function, which can
range from a sequential algorithm to a discrete mathematical expression. The architecture
presented can easily carry out the sequential functions, and introduces a fast method of
mathematical approximation for the more complex functions. This can be evaluated on-chip,
thus implementing the entire neural process within a single system.
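The abstract does not specify its approximation method. As one well-known example of a hardware-friendly approach, the sketch below implements the PLAN piecewise-linear approximation of the logistic sigmoid, whose slopes are powers of two so that only shifts and adds are needed. It is a swapped-in technique for illustration, not the method used in this work, and the function name is hypothetical.

```python
import math


def plan_sigmoid(x):
    """PLAN piecewise-linear approximation of 1 / (1 + exp(-x)).

    Segments are defined on |x|; negative inputs use the symmetry
    sigmoid(-x) = 1 - sigmoid(x). Slopes 1/4, 1/8 and 1/32 are shifts.
    """
    a = abs(x)
    if a >= 5.0:
        y = 1.0
    elif a >= 2.375:
        y = 0.03125 * a + 0.84375
    elif a >= 1.0:
        y = 0.125 * a + 0.625
    else:
        y = 0.25 * a + 0.5
    return y if x >= 0 else 1.0 - y


if __name__ == "__main__":
    for v in (-4.0, -1.0, 0.0, 1.0, 3.0):
        print(v, plan_sigmoid(v), 1.0 / (1.0 + math.exp(-v)))
```

The trade-off is accuracy for area: the maximum absolute error of this scheme is just under 0.02, in exchange for avoiding exponentials and division.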
VHDL circuit descriptions for the chip have been generated, and the systolic processing
algorithms and associated microcode instruction set for three different neural paradigms have
been designed. A software simulator of the architecture has been written, giving results for
several common applications in the field.
A Mixed-Signal Feed-Forward Neural Network Architecture Using A High-Resolution Multiplying D/A Conversion Method
Artificial Neural Networks (ANNs) are parallel processors capable of learning from a set of sample data using a specific learning rule. Such systems are commonly used in applications where the human brain may surpass conventional computers, such as image processing, speech and character recognition, intelligent control, and robotics. In this thesis, a mixed-signal neural network architecture is proposed that employs a high-resolution Multiplying Digital-to-Analog Converter (MDAC) designed using Delta-Sigma Modulation (DSM). To reduce chip area, multiplexing is used in addition to analog implementation of arithmetic operations. This work employs a new method for filtering the high bit-rate signals using the neurons' nonlinear transfer function, which already exists in the network. Therefore, a configuration of a few MOS transistors replaces the large resistors required to implement the low-pass filter in the network. This configuration noticeably decreases the chip area and also makes multiplexing feasible for hardware implementation.
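The role of a delta-sigma-based multiplying DAC can be illustrated with an idealized model: a first-order, single-bit modulator turns a digital weight into a ±1 bitstream, the bitstream switches the sign of the analog input, and a low-pass filter recovers the product. The modulator order and the use of a plain average as the filter are assumptions of this sketch; the thesis instead performs the filtering with the neurons' nonlinear transfer function, and the function names are hypothetical.

```python
def delta_sigma(w, n):
    """First-order, single-bit delta-sigma modulator: encodes a constant
    weight w in [-1, 1] as a +/-1 bitstream whose average tracks w."""
    integ, fb, bits = 0.0, 0.0, []
    for _ in range(n):
        integ += w - fb                    # integrate input minus fed-back output
        fb = 1.0 if integ >= 0 else -1.0   # 1-bit quantizer, fed back next cycle
        bits.append(fb)
    return bits


def mdac_multiply(x, w, n=4096):
    """Idealized multiplying DAC: the analog input x is passed as x or -x
    each cycle according to the weight's bitstream, then low-pass filtered.
    A plain average stands in for the filter here."""
    stream = [x * b for b in delta_sigma(w, n)]
    return sum(stream) / n


if __name__ == "__main__":
    print(mdac_multiply(0.6, -0.35))  # close to 0.6 * -0.35 = -0.21
```

Because the integrator state stays bounded for |w| <= 1, the averaged bitstream differs from w by at most a few counts out of n, which is why a long, high-rate stream followed by filtering gives a high-resolution product.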