6 research outputs found

    Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation

    This paper presents a structural design of a hardware-efficient module for implementing the basic operation of a convolutional neural network (CNN) with reduced implementation complexity. For this purpose, we utilize a modification of the Winograd minimal filtering method together with computation vectorization principles. The module calculates the inner products of two consecutive segments of the original data sequence, formed by a sliding window of length 3, with the elements of a filter impulse response. A fully parallel structure of the module for calculating these two inner products, based on the naive method of computation, requires 6 binary multipliers and 4 binary adders. Using the Winograd minimal filtering method makes it possible to construct a module structure that requires only 4 binary multipliers and 8 binary adders. Since a high-performance convolutional neural network may contain tens or even hundreds of such modules, this reduction can have a significant effect. Comment: 3 pages, 5 figures
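
    The arithmetic saving described above corresponds to the classical Winograd F(2,3) minimal filtering identity: two outputs of a 3-tap filter over a 4-sample window can be computed with 4 multiplications instead of 6. The sketch below (Python, with illustrative names; the paper describes a hardware structure, not this software) checks the Winograd form against the naive inner products.

    ```python
    def naive_f23(d, g):
        """Two inner products of the 3-tap filter g over a sliding window of d
        (d holds 4 samples): 6 multiplications, 4 additions."""
        y0 = d[0]*g[0] + d[1]*g[1] + d[2]*g[2]
        y1 = d[1]*g[0] + d[2]*g[1] + d[3]*g[2]
        return y0, y1

    def winograd_f23(d, g):
        """Winograd minimal filtering F(2,3): 4 multiplications.
        The transformed filter taps G1 and G2 are constant per filter and can be
        precomputed, so they do not add to the per-sample multiplier count."""
        G0 = g[0]
        G1 = (g[0] + g[1] + g[2]) / 2
        G2 = (g[0] - g[1] + g[2]) / 2
        G3 = g[2]
        m1 = (d[0] - d[2]) * G0
        m2 = (d[1] + d[2]) * G1
        m3 = (d[2] - d[1]) * G2
        m4 = (d[1] - d[3]) * G3
        return m1 + m2 + m3, m2 - m3 - m4

    d = [1.0, 2.0, -3.0, 0.5]
    g = [0.25, -1.0, 0.75]
    assert all(abs(a - b) < 1e-9 for a, b in zip(naive_f23(d, g), winograd_f23(d, g)))
    ```

    Counting only the data-path operations, the Winograd form needs 4 multiplications and 8 additions (4 on the input samples, 4 to combine the products), which matches the multiplier/adder budget reported in the abstract.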

    Field programmable gate array based sigmoid function implementation using differential lookup table and second order nonlinear function

    Artificial neural network (ANN) is an established artificial intelligence technique that is widely used to solve problems such as classification and clustering in various fields. A major limitation of ANN, however, is execution time: a network with a huge number of neurons takes a long time to evaluate in software. To overcome this, the ANN is implemented in hardware, namely on a field-programmable gate array (FPGA). Implementing an ANN on an FPGA, however, introduces a new problem related to the sigmoid function: often used as the activation function of an ANN, the sigmoid function cannot be implemented directly in FPGA logic. Owing to its accuracy, a lookup table (LUT) has traditionally been used to implement the sigmoid function on FPGA, but obtaining high accuracy from a LUT is expensive, particularly in terms of FPGA memory. The second-order nonlinear function (SONF) is an appealing replacement for the LUT because of its small memory requirement, although it trades accuracy for memory size. Taking advantage of both approaches, this thesis proposes a combination of the SONF and a modified LUT called the differential lookup table (dLUT). The deviations between the SONF and the sigmoid function are used to create the dLUT. The SONF is used as a first step to approximate the sigmoid function; as a second step, the value stored in the dLUT is added or subtracted, as demonstrated via simulation. This combination successfully reduces the deviation, and the reduction is significant compared with previous implementations such as the SONF alone and the LUT alone. A further simulation evaluated the accuracy of an ANN that uses the proposed method as its sigmoid function when detecting objects in an indoor environment. The results show that the proposed method produces output almost as accurate as a software implementation for the indoor positioning problem, so it can be applied in any field that demands high processing speed and high accuracy of the sigmoid output.
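
    To make the two-step scheme concrete, here is a minimal software sketch of the idea: a piecewise second-order approximation of the sigmoid followed by a correction taken from a table of stored deviations. The specific SONF formula, the input range, and the quantization step below are assumptions chosen for illustration (a commonly cited second-order sigmoid approximation), not the exact parameters of the thesis.

    ```python
    import math

    def sonf(x):
        # One common piecewise second-order approximation of the sigmoid
        # (assumed form; the thesis's SONF may use different coefficients).
        if x <= -4.0:
            return 0.0
        if x >= 4.0:
            return 1.0
        if x < 0.0:
            return 0.5 * (1.0 + x / 4.0) ** 2
        return 1.0 - 0.5 * (1.0 - x / 4.0) ** 2

    # Differential LUT: store sigmoid(x) - SONF(x) at quantized input points.
    STEP = 0.25                  # hypothetical quantization step
    XMIN, XMAX = -4.0, 4.0
    dlut = [1.0 / (1.0 + math.exp(-(XMIN + i * STEP))) - sonf(XMIN + i * STEP)
            for i in range(int((XMAX - XMIN) / STEP) + 1)]

    def sigmoid_sonf_dlut(x):
        # Step 1: SONF approximation. Step 2: add the stored (signed) deviation.
        if x <= XMIN:
            return 0.0
        if x >= XMAX:
            return 1.0
        idx = int(round((x - XMIN) / STEP))
        return sonf(x) + dlut[idx]

    for x in (-3.3, -1.0, 0.0, 0.7, 2.5):
        exact = 1.0 / (1.0 + math.exp(-x))
        print(f"x={x:+.2f}  exact={exact:.5f}  sonf={sonf(x):.5f}  "
              f"sonf+dlut={sigmoid_sonf_dlut(x):.5f}")
    ```

    Because the dLUT stores only small deviation values rather than full sigmoid samples, it can use fewer bits per entry than a conventional sigmoid LUT of the same resolution, which is the memory saving the abstract alludes to.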

    Heterogeneous computing system with field programmable gate array coprocessor for decision tree learning

    This paper presents a heterogeneous computing system and a new hybrid decision tree learning algorithm, Dataflow Decision Tree Construction (DF-DTC). The DF-DTC algorithm is based on the C4.5 algorithm. The heterogeneous system contains a coprocessor implemented on a field programmable gate array (FPGA). The coprocessor architecture and the hybrid DF-DTC algorithm were developed using a hardware/software co-design methodology. The processing of the nominal attributes of the training set is performed in the coprocessor, and the algorithm was adapted with modified data structures and support for multithreaded execution. Performance was evaluated by measuring the total program execution time as well as the execution time of the key parts of the algorithm. The evaluation used synthetic training sets and training sets publicly available in the UCI repository. The performance of DF-DTC was compared with an existing software implementation of the EC4.5 algorithm. Processing of nominal attributes on DF-DTC is on average 3.00 times faster than the software implementation of EC4.5; for the overall program execution, the best speedup is 1.18 times. The DF-DTC implementation demonstrates the potential of FPGAs as a platform for accelerating decision tree learning.
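
    For readers unfamiliar with what "processing of nominal attributes" involves in a C4.5-style learner, the sketch below shows the kind of per-attribute work that is a candidate for offloading to a coprocessor: partitioning the examples by attribute value and computing the gain ratio of the split. The code and names are illustrative Python, not the DF-DTC implementation or its FPGA dataflow.

    ```python
    import math
    from collections import Counter, defaultdict

    def entropy(labels):
        """Shannon entropy of a class-label multiset."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gain_ratio(values, labels):
        """C4.5-style gain ratio for one nominal attribute.
        values[i] is the attribute value of example i, labels[i] its class."""
        n = len(labels)
        by_value = defaultdict(list)
        for v, y in zip(values, labels):
            by_value[v].append(y)
        info = sum(len(sub) / n * entropy(sub) for sub in by_value.values())
        gain = entropy(labels) - info
        split_info = -sum(len(sub) / n * math.log2(len(sub) / n)
                          for sub in by_value.values())
        return gain / split_info if split_info > 0 else 0.0

    # Toy example: nominal attribute "outlook" versus class "play".
    outlook = ["sunny", "sunny", "overcast", "rain", "rain", "overcast"]
    play    = ["no",    "no",    "yes",      "yes",  "no",   "yes"]
    print(f"gain ratio = {gain_ratio(outlook, play):.3f}")
    ```

    This per-attribute counting and scoring is repeated at every tree node for every candidate attribute, which is why accelerating it in hardware yields the nominal-attribute speedup reported above.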