VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
The hardware implementation of deep neural networks (DNNs) has recently
received tremendous attention: many applications in fact require high-speed
operations that suit a hardware implementation. However, numerous elements and
complex interconnections are usually required, leading to a large area
footprint and high power consumption. Stochastic computing has shown
promising results for low-power area-efficient hardware implementations, even
though existing stochastic algorithms require long streams that cause long
latencies. In this paper, we propose an integer form of stochastic computation
and introduce some elementary circuits. We then propose an efficient
implementation of a DNN based on integral stochastic computing. The proposed
architecture has been implemented on a Virtex-7 FPGA, resulting in 45% and 62%
average reductions in area and latency compared to the best reported
architecture in the literature. We also synthesize the circuits in a 65 nm CMOS
technology and we show that the proposed integral stochastic architecture
results in up to 21% reduction in energy consumption compared to the binary
radix implementation at the same misclassification rate. Due to the fault-tolerant
nature of stochastic architectures, we also consider a quasi-synchronous
implementation, which yields a 33% reduction in energy consumption with respect to the
binary radix implementation without any compromise on performance.
Comment: 11 pages, 12 figures
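The Python sketch below models integral stochastic computing in software, assuming the usual formulation: a value x in [0, m] is carried as a stream of integers in {0, ..., m}, obtained by summing m independent unipolar bit-streams that each encode x/m. The function names, seed, and stream length are illustrative assumptions, not the circuits proposed in the paper.

```python
import random

def unipolar_stream(p, length, rng):
    """Encode p in [0, 1] as a Bernoulli bit-stream (each bit is 1 with probability p)."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def integral_stream(x, m, length, rng):
    """Encode x in [0, m] as the element-wise sum of m independent unipolar
    streams that each encode x / m; every element lies in {0, ..., m}."""
    parts = [unipolar_stream(x / m, length, rng) for _ in range(m)]
    return [sum(column) for column in zip(*parts)]

def decode(stream):
    """Estimate the encoded value as the mean of the stream elements."""
    return sum(stream) / len(stream)

def sc_multiply(a, b):
    """Element-wise product of two independent streams.
    For unipolar bits this is a logical AND; for integral streams it needs
    a small integer multiplier."""
    return [x * y for x, y in zip(a, b)]

def sc_add(a, b):
    """Element-wise integer sum: the result encodes a + b without the 1/2
    scaling of a multiplexer-based unipolar adder."""
    return [x + y for x, y in zip(a, b)]

if __name__ == "__main__":
    rng = random.Random(0)                  # fixed seed so runs are repeatable
    a = integral_stream(1.5, 2, 4096, rng)  # hypothetical value 1.5 with m = 2
    b = integral_stream(0.5, 1, 4096, rng)  # m = 1 is an ordinary unipolar stream
    print("a * b ~", decode(sc_multiply(a, b)))  # expectation is 0.75
    print("a + b ~", decode(sc_add(a, b)))       # expectation is 2.0
```

Because the streams are independent, the mean of the element-wise product estimates the product of the encoded values, and the estimate tightens as the stream length grows. Wider integer elements (larger m) extend the representable range at the cost of slightly larger arithmetic per element.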
Heterogeneous computing system with field programmable gate array coprocessor for decision tree learning
This paper presents a heterogeneous computing system and a new hybrid algorithm for decision tree learning, Dataflow Decision Tree Construction (DF‑DTC). The DF‑DTC algorithm is based on the C4.5 algorithm. The heterogeneous system contains a coprocessor implemented in a field programmable gate array (FPGA). The coprocessor architecture and the hybrid DF‑DTC algorithm were developed using a hardware/software co-design methodology. The coprocessor processes the nominal attributes of the training set, while the algorithm was extended with adapted data structures and support for multithreaded execution. Performance was evaluated by measuring the total program execution time and the execution time of the key parts of the algorithm. The evaluation used synthetic training sets as well as training sets publicly available from the UCI repository. The performance of DF‑DTC was compared with that of an existing software implementation of the EC4.5 algorithm. On average, DF‑DTC processes nominal attributes 3.00 times faster than the software implementation of EC4.5. For complete program execution, the best speedup is 1.18 times. The DF‑DTC implementation demonstrated the potential of the FPGA as a platform for accelerating decision tree learning.
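To make the nominal-attribute workload concrete, here is a minimal Python sketch of the gain-ratio criterion that C4.5 uses to score a split on a nominal attribute. It first builds a per-value class-count histogram and then derives information gain and split information from it. Treating the histogram pass as the part a coprocessor would accelerate is an assumption for illustration; the function names and toy data are hypothetical and this is not the DF‑DTC implementation.

```python
import math
from collections import Counter, defaultdict

def entropy(counts):
    """Shannon entropy (bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def class_histogram(values, labels):
    """Map each attribute value to a Counter of class labels.
    Assumption: this counting pass is the part offloaded to hardware."""
    hist = defaultdict(Counter)
    for v, y in zip(values, labels):
        hist[v][y] += 1
    return hist

def gain_ratio(values, labels):
    """C4.5 gain ratio of a nominal attribute with respect to the class labels."""
    n = len(labels)
    hist = class_histogram(values, labels)
    base = entropy(list(Counter(labels).values()))
    # Expected class entropy after splitting on every distinct attribute value.
    cond = sum(sum(c.values()) / n * entropy(list(c.values())) for c in hist.values())
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([sum(c.values()) for c in hist.values()])
    return (base - cond) / split_info if split_info > 0 else 0.0

if __name__ == "__main__":
    # Hypothetical toy data: 'color' perfectly predicts the class, 'size' does not.
    color = ["red", "red", "blue", "blue"]
    size = ["s", "l", "s", "l"]
    label = ["yes", "yes", "no", "no"]
    print(gain_ratio(color, label))  # 1.0
    print(gain_ratio(size, label))   # 0.0
```

The histogram is a single streaming pass of simple increments over the training set, which is the kind of regular, data-parallel work that tends to map well onto FPGA logic, while the floating-point entropy arithmetic stays cheap because it runs only once per distinct attribute value.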