4 research outputs found

    Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL

    Full text link
    Recent technological advances have proliferated the available computing power, memory, and speed of modern Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs). Consequently, the performance and complexity of Artificial Neural Networks (ANNs) are burgeoning. While GPU-accelerated Deep Neural Networks (DNNs) currently offer state-of-the-art performance, they consume large amounts of power. Training such networks on CPUs is inefficient, as data throughput and parallel computation are limited. FPGAs are considered a suitable candidate for performance-critical, low-power systems, e.g. Internet of Things (IoT) edge devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development environment, networks described using the high-level OpenCL framework can be accelerated on heterogeneous platforms. Moreover, the resource utilization and power consumption of DNNs can be further reduced by utilizing regularization techniques that binarize network weights. In this paper, we introduce, to the best of our knowledge, the first FPGA-accelerated stochastically binarized DNN implementations, and compare them to implementations accelerated using both GPUs and FPGAs. Our developed networks are trained and benchmarked using the popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art performance, while offering a >16-fold improvement in power consumption compared to conventional GPU-accelerated networks. Both our FPGA-accelerated deterministic and stochastic BNNs reduce inference times on MNIST and CIFAR-10 by >9.89x and >9.91x, respectively.
    Comment: 4 pages, 3 figures, 1 table
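    The listing does not reproduce the paper's OpenCL kernels. As background on the two schemes the abstract contrasts, a minimal plain-C sketch follows: deterministic binarization maps a real-valued weight to sign(w), while stochastic binarization samples +1 with a probability given by a hard sigmoid of the weight. The hard-sigmoid rule is taken from the BinaryConnect literature and is an assumption here, since the abstract does not state the exact formulation; the weight values are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hard sigmoid: clamp((w + 1) / 2, 0, 1), the probability used in
 * BinaryConnect-style stochastic binarization. */
static float hard_sigmoid(float w) {
    float p = (w + 1.0f) / 2.0f;
    if (p < 0.0f) p = 0.0f;
    if (p > 1.0f) p = 1.0f;
    return p;
}

/* Deterministic binarization: the sign of the real-valued weight. */
static float binarize_det(float w) {
    return (w >= 0.0f) ? 1.0f : -1.0f;
}

/* Stochastic binarization: +1 with probability hard_sigmoid(w), else -1. */
static float binarize_stoch(float w) {
    float r = (float)rand() / (float)RAND_MAX;
    return (r < hard_sigmoid(w)) ? 1.0f : -1.0f;
}

int main(void) {
    float weights[] = {-0.8f, -0.1f, 0.0f, 0.3f, 0.9f};
    for (int i = 0; i < 5; i++) {
        printf("w=%5.2f  det=%+.0f  stoch=%+.0f\n",
               weights[i], binarize_det(weights[i]), binarize_stoch(weights[i]));
    }
    return 0;
}
```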

    FPGA Implementation of Double Precision Floating Point Multiplier

    Get PDF
    High-speed computation is a core requirement of today's processors. To achieve it, many functions are implemented directly in processor hardware rather than computed in software. The majority of the operations a processor executes are arithmetic operations, which dominate applications requiring heavy mathematical computation such as scientific calculation and image and signal processing. In signal processing especially, multiplication and division are used extensively. The principal difficulty with implementing these operations in hardware is that iterative algorithms are slow, while fast algorithms require complex computation within each cycle. A division operation yields either a quotient and remainder or a floating-point number, which is the main reason it is more complex than multiplication.
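    The abstract does not spell out the multiplication algorithm itself. For reference, a conventional IEEE-754 double-precision multiply follows the sign/exponent/significand decomposition sketched below in plain C: XOR the signs, add the biased exponents, multiply the 53-bit significands, then normalize. This is an illustration of the textbook algorithm, not the paper's FPGA design; it truncates rather than rounds, ignores subnormals, infinities and NaNs, and the 128-bit product type is a GCC/Clang extension.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified IEEE-754 double multiplication, normal inputs only. */
static double fp64_mul(double a, double b) {
    uint64_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);

    uint64_t sign = (ua ^ ub) & 0x8000000000000000ULL;   /* XOR of signs */
    int64_t  ea   = (int64_t)((ua >> 52) & 0x7FF);       /* biased exponents */
    int64_t  eb   = (int64_t)((ub >> 52) & 0x7FF);
    /* Significands with the implicit leading 1 restored (53 bits each). */
    uint64_t ma = (ua & 0xFFFFFFFFFFFFFULL) | (1ULL << 52);
    uint64_t mb = (ub & 0xFFFFFFFFFFFFFULL) | (1ULL << 52);

    /* 53x53 -> 106-bit significand product. */
    __uint128_t prod = (__uint128_t)ma * mb;
    int64_t exp = ea + eb - 1023;           /* remove one bias from the sum */

    /* Normalize: the product lies in [2^104, 2^106); keep the top 53 bits. */
    uint64_t mant;
    if (prod >> 105) {                      /* significand product >= 2.0 */
        mant = (uint64_t)(prod >> 53);
        exp += 1;
    } else {
        mant = (uint64_t)(prod >> 52);
    }
    mant &= 0xFFFFFFFFFFFFFULL;             /* drop the implicit 1 (truncates) */

    uint64_t ur = sign | ((uint64_t)exp << 52) | mant;
    double r;
    memcpy(&r, &ur, sizeof r);
    return r;
}

int main(void) {
    printf("sketch: %.17g  hardware: %.17g\n", fp64_mul(3.5, -2.25), 3.5 * -2.25);
    return 0;
}
```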

    FPGA-Based Acceleration of the Self-Organizing Map (SOM) Algorithm using High-Level Synthesis

    Get PDF
    One of the fastest growing and most demanding areas of computer science is Machine Learning (ML). The Self-Organizing Map (SOM), categorized as unsupervised ML, is a popular data-mining algorithm widely used in Artificial Neural Networks (ANNs) for mapping high-dimensional data into low-dimensional feature maps. Being computationally intensive, SOM requires substantial computation time and power when dealing with large datasets. Many computationally intensive algorithms can be accelerated using Field-Programmable Gate Arrays (FPGAs), but the traditional Hardware Description Language (HDL) based design methodology demands extensive hardware knowledge and long development times. Open Computing Language (OpenCL) is a standard framework for writing parallel computing programs that execute on heterogeneous computing systems. The Intel FPGA Software Development Kit for OpenCL (IFSO) is a High-Level Synthesis (HLS) tool that provides a more efficient alternative to HDL-based design. This research presents an optimized OpenCL implementation of the SOM algorithm on Stratix V and Arria 10 FPGAs using IFSO. Compared to recent SOM implementations on Central Processing Units (CPUs) and Graphics Processing Units (GPUs), our OpenCL implementation on FPGAs delivers superior speed and power consumption results. Stratix V achieves speedups of 1.41x - 16.55x over AMD and Intel CPUs and 2.18x over an Nvidia GPU, whereas Arria 10 achieves speedups of 1.63x - 19.15x over AMD and Intel CPUs and 2.52x over an Nvidia GPU. In terms of power consumption, Stratix V is 35.53x and 42.53x more power efficient than the CPU and GPU respectively, while Arria 10 is 15.82x and 15.93x more power efficient.
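    For context on the computation being accelerated, one SOM training iteration finds the best-matching unit (BMU) by Euclidean distance and then pulls the BMU and its grid neighbours toward the input sample under a Gaussian neighbourhood. A minimal scalar C sketch follows; the map size, input dimensionality, and the learning-rate and radius schedules are illustrative assumptions, not values from the paper, and the paper's parallel OpenCL kernels are not reproduced.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define GRID 8      /* 8x8 map of neurons (illustrative size) */
#define DIM  3      /* input dimensionality (illustrative) */

static float w[GRID][GRID][DIM];   /* codebook vectors */

/* One SOM training step: locate the BMU, then move each codebook vector
 * toward the sample, weighted by a Gaussian neighbourhood of radius sigma. */
static void som_step(const float *x, float lr, float sigma) {
    int bi = 0, bj = 0;
    float best = INFINITY;
    for (int i = 0; i < GRID; i++)
        for (int j = 0; j < GRID; j++) {
            float d = 0.0f;
            for (int k = 0; k < DIM; k++) {
                float diff = x[k] - w[i][j][k];
                d += diff * diff;       /* squared Euclidean distance */
            }
            if (d < best) { best = d; bi = i; bj = j; }
        }
    for (int i = 0; i < GRID; i++)
        for (int j = 0; j < GRID; j++) {
            float g = (float)((i - bi) * (i - bi) + (j - bj) * (j - bj));
            float h = expf(-g / (2.0f * sigma * sigma));
            for (int k = 0; k < DIM; k++)
                w[i][j][k] += lr * h * (x[k] - w[i][j][k]);
        }
}

int main(void) {
    /* Random initialization, then training on random 3-D points with a
     * linearly decaying learning rate and neighbourhood radius. */
    for (int i = 0; i < GRID; i++)
        for (int j = 0; j < GRID; j++)
            for (int k = 0; k < DIM; k++)
                w[i][j][k] = (float)rand() / RAND_MAX;
    for (int t = 0; t < 1000; t++) {
        float x[DIM];
        for (int k = 0; k < DIM; k++) x[k] = (float)rand() / RAND_MAX;
        float frac = (float)t / 1000.0f;
        som_step(x, 0.5f * (1.0f - frac), 3.0f * (1.0f - frac) + 0.5f);
    }
    printf("w[0][0] = (%f, %f, %f)\n", w[0][0][0], w[0][0][1], w[0][0][2]);
    return 0;
}
```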