30 research outputs found

    Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation

    Full text link
    This paper presents a structural design of the hardware-efficient module for implementation of convolution neural network (CNN) basic operation with reduced implementation complexity. For this purpose we utilize some modification of the Winograd minimal filtering method as well as computation vectorization principles. This module calculate inner products of two consecutive segments of the original data sequence, formed by a sliding window of length 3, with the elements of a filter impulse response. The fully parallel structure of the module for calculating these two inner products, based on the implementation of a naive method of calculation, requires 6 binary multipliers and 4 binary adders. The use of the Winograd minimal filtering method allows to construct a module structure that requires only 4 binary multipliers and 8 binary adders. Since a high-performance convolutional neural network can contain tens or even hundreds of such modules, such a reduction can have a significant effect.Comment: 3 pages, 5 figure

    An algorithm for discrete fractional Hadamard transform

    Full text link
    We present a novel algorithm for calculating the discrete fractional Hadamard transform for data vectors whose size N is a power of two. A direct method for calculation of the discrete fractional Hadamard transform requires N2N^2 multiplications, while in proposed algorithm the number of real multiplications is reduced to NNlog2N_2N.Comment: 22 pages, 4 figure

    Some Schemes for Implementation of Arithmetic Operations with Complex Numbers Using Squaring Units

    Full text link
    In this paper, new schemes for a squarer, multiplier and divider of complex numbers are proposed. Traditional structural solutions for each of these operations require the presence some number of general-purpose binary multipliers. The advantage of our solutions is a removing of multiplications through replacing them by less costly squarers. We use Logan's trick and quarter square technique, which propose to replace the calculation of the product of two real numbers by summing the squares. Replacing usual multipliers on digital squares implies reducing power consumption as well as decreases hardware circuit complexity. The squarer requiring less area and power as compared to general-purpose multiplier, it is interesting to assess the use of squarers to implementation of complex arithmetic.Comment: 3 pages. 3 figures, 2 table

    Hardware-Efficient Schemes of Quaternion Multiplying Units for 2D Discrete Quaternion Fourier Transform Processors

    Full text link
    In this paper, we offer and discuss three efficient structural solutions for the hardware-oriented implementation of discrete quaternion Fourier transform basic operations with reduced implementation complexities. The first solution: a scheme for calculating sq product, the second solution: a scheme for calculating qt product, and the third solution: a scheme for calculating sqt product, where s is a so-called i-quaternion, t is an j-quaternion, and q is an usual quaternion. The direct multiplication of two usual quaternions requires 16 real multiplications (or two-operand multipliers in the case of fully parallel hardware implementation) and 12 real additions (or binary adders). At the same time, our solutions allow to design the computation units, which consume only 6 multipliers plus 6 two input adders for implementation of sq or qt basic operations and 9 binary multipliers plus 6 two-input adders and 4 four-input adders for implementation of sqt basic operation.Comment: 3 pages, 3 figure
    corecore