30 research outputs found
Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation
This paper presents a structural design of the hardware-efficient module for
implementation of the convolutional neural network (CNN) basic operation with reduced
implementation complexity. For this purpose we use a modification of the
Winograd minimal filtering method as well as computation vectorization
principles. This module calculates the inner products of two consecutive segments of
the original data sequence, formed by a sliding window of length 3, with the
elements of a filter impulse response. The fully parallel structure of the
module for calculating these two inner products, based on the implementation of
a naive method of calculation, requires 6 binary multipliers and 4 binary
adders. The Winograd minimal filtering method makes it possible to construct a
module structure that requires only 4 binary multipliers and 8 binary adders.
Since a high-performance convolutional neural network can contain tens or even
hundreds of such modules, such a reduction can have a significant effect.
Comment: 3 pages, 5 figures
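As a rough illustration of the operation counts quoted above, the following Python sketch compares the naive computation of the two inner products with the textbook Winograd F(2,3) minimal filtering formulas. It is not the authors' modified scheme: the function names are hypothetical, and the filter-side sums are assumed to be precomputed, so they are not counted among the 8 additions.

```python
from fractions import Fraction


def naive_f23(d, g):
    """Two outputs of a 3-tap filter: 6 multiplications, 4 additions."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    y0 = d0 * g0 + d1 * g1 + d2 * g2
    y1 = d1 * g0 + d2 * g1 + d3 * g2
    return y0, y1


def winograd_f23(d, g):
    """Textbook Winograd F(2,3): 4 multiplications, 8 data-path additions."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = (Fraction(x) for x in g)

    # Filter-side terms: assumed precomputed once per filter (not counted).
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2

    # 4 additions on the data side, then 4 multiplications.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3

    # 4 additions on the output side.
    y0 = m1 + m2 + m3
    y1 = m2 - m3 - m4
    return y0, y1


if __name__ == "__main__":
    data = (1, 2, 3, 4)      # sliding window covering two overlapping segments
    taps = (5, -2, 7)        # hypothetical filter coefficients
    print(naive_f23(data, taps), winograd_f23(data, taps))
```

Exact fractions are used only so that the two results can be compared without rounding; a hardware module would work with fixed-point values and halved filter terms stored in advance.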
An algorithm for discrete fractional Hadamard transform
We present a novel algorithm for calculating the discrete fractional Hadamard
transform for data vectors whose size N is a power of two. A direct method for
calculation of the discrete fractional Hadamard transform requires N^2
multiplications, while in the proposed algorithm the number of real
multiplications is reduced to N log_2 N.
Comment: 22 pages, 4 figures
Some Schemes for Implementation of Arithmetic Operations with Complex Numbers Using Squaring Units
In this paper, new schemes for a squarer, multiplier and divider of complex
numbers are proposed. Traditional structural solutions for each of these
operations require a number of general-purpose binary multipliers. The
advantage of our solutions is that multiplications are removed by replacing
them with less costly squarers. We use Logan's trick and the quarter-square
technique, which replace the calculation of the product of two real numbers
with a sum of squares. Replacing conventional multipliers with digital squarers
reduces both power consumption and hardware circuit complexity. Since a squarer
requires less area and power than a general-purpose multiplier, it is worth
assessing the use of squarers in the implementation of complex arithmetic.
Comment: 3 pages, 3 figures, 2 tables
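The idea of trading multipliers for squarers can be sketched in a few lines of Python. The quarter-square identity ab = ((a+b)^2 - (a-b)^2)/4 is standard; the three-square identity ab = ((a+b)^2 - a^2 - b^2)/2 is taken here, as an assumption, to be the form the abstract calls Logan's trick. All function names are hypothetical, and this direct substitution is not claimed to match the paper's circuit structures or their squarer counts.

```python
def square(x):
    """Stands in for a hardware squaring unit."""
    return x * x


def mul_quarter_square(a, b):
    """Product of two reals from two squares: ((a+b)^2 - (a-b)^2) / 4."""
    return (square(a + b) - square(a - b)) / 4


def mul_three_squares(a, b):
    """Product of two reals from three squares: ((a+b)^2 - a^2 - b^2) / 2."""
    return (square(a + b) - square(a) - square(b)) / 2


def complex_multiply(a, b, c, d):
    """(a + jb)(c + jd) with every real product replaced by quarter squares."""
    real = mul_quarter_square(a, c) - mul_quarter_square(b, d)
    imag = mul_quarter_square(a, d) + mul_quarter_square(b, c)
    return real, imag


def complex_square(a, b):
    """(a + jb)^2 from three squares: a^2, b^2 and (a+b)^2."""
    a2, b2 = square(a), square(b)
    real = a2 - b2
    imag = square(a + b) - a2 - b2   # equals 2ab, no multiplier needed
    return real, imag


if __name__ == "__main__":
    print(complex_multiply(1, 2, 3, 4))
    print(complex_square(3, 4))
```

In this naive form the complex product uses eight squaring operations, while the complex square needs only three, since the squares a^2 and b^2 are shared between the real and imaginary parts.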
Hardware-Efficient Schemes of Quaternion Multiplying Units for 2D Discrete Quaternion Fourier Transform Processors
In this paper, we offer and discuss three efficient structural solutions for
the hardware-oriented implementation of discrete quaternion Fourier transform
basic operations with reduced implementation complexity. The first solution is
a scheme for calculating the sq product, the second is a scheme for calculating
the qt product, and the third is a scheme for calculating the sqt product,
where s is a so-called i-quaternion, t is a j-quaternion, and q is an
ordinary quaternion. The direct multiplication of two ordinary quaternions
requires 16 real multiplications (or two-operand multipliers in the case of a
fully parallel hardware implementation) and 12 real additions (or binary
adders). In contrast, our solutions make it possible to design computation
units that use only 6 multipliers and 6 two-input adders to implement the sq or
qt basic operations, and 9 binary multipliers, 6 two-input adders and 4
four-input adders to implement the sqt basic operation.
Comment: 3 pages, 3 figures
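The counts above can be made concrete with a small Python sketch. The first function is the direct Hamilton product (16 multiplications, 12 additions). The second is only one possible way to reach 6 multiplications and 6 additions for an sq product, and rests on two assumptions that the abstract does not state: that an i-quaternion has the form s = s0 + s1*i, and that sums depending only on s can be precomputed. It should not be read as the authors' scheme, and the function names are hypothetical.

```python
def hamilton_product(p, q):
    """Direct quaternion product: 16 multiplications, 12 additions."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,   # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,   # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,   # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,   # k part
    )


def sq_product(s0, s1, q):
    """s*q for s = s0 + s1*i (assumed form): 6 multiplications, 6 additions.

    The product splits into two complex products, (s0 + s1*i)(q0 + q1*i) and
    (s0 + s1*i)(q2 + q3*i), each computed with three multiplications.
    """
    q0, q1, q2, q3 = q

    # Terms depending only on s: assumed precomputed, so not counted.
    s_sum = s0 + s1
    s_diff = s1 - s0

    t1 = s0 * (q0 + q1)
    r = t1 - s_sum * q1      # s0*q0 - s1*q1
    i = t1 + s_diff * q0     # s0*q1 + s1*q0

    t2 = s0 * (q2 + q3)
    j = t2 - s_sum * q3      # s0*q2 - s1*q3
    k = t2 + s_diff * q2     # s0*q3 + s1*q2
    return (r, i, j, k)


if __name__ == "__main__":
    q = (1, 4, -2, 5)
    print(hamilton_product((2, 3, 0, 0), q))
    print(sq_product(2, 3, q))
```

Both calls print the same quaternion, which checks the reduced form against the direct product for this input.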