
    Accelerating Deterministic and Stochastic Binarized Neural Networks on FPGAs Using OpenCL

    Recent technological advances have increased the available computing power, memory, and speed of modern Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Field Programmable Gate Arrays (FPGAs). Consequently, the performance and complexity of Artificial Neural Networks (ANNs) are burgeoning. While GPU-accelerated Deep Neural Networks (DNNs) currently offer state-of-the-art performance, they consume large amounts of power. Training such networks on CPUs is inefficient, as data throughput and parallel computation are limited. FPGAs are considered suitable candidates for performance-critical, low-power systems, e.g., Internet of Things (IoT) edge devices. Using the Xilinx SDAccel or Intel FPGA SDK for OpenCL development environments, networks described using the high-level OpenCL framework can be accelerated on heterogeneous platforms. Moreover, the resource utilization and power consumption of DNNs can be further improved by utilizing regularization techniques that binarize network weights. In this paper, we introduce, to the best of our knowledge, the first FPGA-accelerated stochastically binarized DNN implementations, and compare them to implementations accelerated using both GPUs and FPGAs. Our developed networks are trained and benchmarked using the popular MNIST and CIFAR-10 datasets, and achieve near state-of-the-art performance, while offering a >16-fold improvement in power consumption compared to conventional GPU-accelerated networks. Both our FPGA-accelerated deterministic and stochastic BNNs reduce inference times on MNIST and CIFAR-10 by >9.89x and >9.91x, respectively. Comment: 4 pages, 3 figures, 1 table
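
The abstract above contrasts deterministic and stochastic weight binarization without giving the rules. As a minimal sketch, assuming the commonly used formulation (sign thresholding for the deterministic case, and a hard-sigmoid probability for the stochastic case), the two could look as follows. The function names are hypothetical and this is not the paper's OpenCL implementation.

```python
import random


def hard_sigmoid(x):
    # Assumed probability function: clip (x + 1) / 2 into [0, 1].
    return min(1.0, max(0.0, (x + 1.0) / 2.0))


def binarize_deterministic(weights):
    # Map each real-valued weight to +1 if it is non-negative, else -1.
    return [1 if w >= 0 else -1 for w in weights]


def binarize_stochastic(weights, rng):
    # Map each weight to +1 with probability hard_sigmoid(w), else -1.
    return [1 if rng.random() < hard_sigmoid(w) else -1 for w in weights]


weights = [-1.5, -0.2, 0.0, 0.4, 2.0]
rng = random.Random(0)  # seeded only to make the sketch repeatable
deterministic = binarize_deterministic(weights)
stochastic = binarize_stochastic(weights, rng)
```

Under this assumption, weights at or beyond -1 and +1 always binarize to the same value in both schemes; only weights strictly inside that interval are randomized in the stochastic case.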

    Document Classification Systems in Heterogeneous Computing Environments

    Datacenter workloads demand high-throughput, low-cost, and power-efficient solutions. In most data centers, operating costs dominate the infrastructure cost. The ever-growing amounts of data and the critical need for higher-throughput, more energy-efficient document classification solutions motivated us to investigate alternatives to the traditional homogeneous CPU-based implementations of document classification systems. Several heterogeneous systems have been investigated in the past in which CPUs were combined with GPUs and FPGAs as system accelerators. The increasing complexity of FPGAs has made them attractive devices in heterogeneous computing environments, but also difficult to program using hardware description languages. We explore the trade-offs between high-level synthesis (HLS) and low-level synthesis when programming FPGAs. Low-level synthesis uses fewer hardware resources on FPGAs and also offers higher throughput than an HLS tool. An HLS tool, on the other hand, can also target other heterogeneous computing devices such as multicore CPUs and GPUs. Through our implementation experience and empirical results for data-centric applications, we conclude that power-efficient results can be achieved for this set of applications using either low-level or high-level synthesis for programming FPGAs.

    High-speed, in-band performance measurement instrumentation for next generation IP networks

    Facilitating always-on instrumentation of Internet traffic for the purposes of performance measurement is crucial in order to enable accountability of resource usage and automated network control, management and optimisation. This has proven infeasible to date due to the lack of native measurement mechanisms that can form an integral part of the network's main forwarding operation. However, the Internet Protocol version 6 (IPv6) specification enables the efficient encoding and processing of optional per-packet information as a native part of the network layer, and this constitutes a strong reason for IPv6 to be adopted as the ubiquitous next generation Internet transport. In this paper we present a very high-speed hardware implementation of in-line measurement, a truly native traffic instrumentation mechanism for the next generation Internet, which facilitates performance measurement of the actual data-carrying traffic at small timescales between two points in the network. This system is designed to operate as part of the routers' fast path and to incur minimal impact on network operation even while instrumenting traffic between the edges of very high capacity links. Our results show that the implementation can be easily accommodated by current FPGA technology, and real Internet traffic traces verify that the overhead incurred by instrumenting every packet over a 10 Gb/s operational backbone link carrying a typical workload is indeed negligible.

    An FPGA-based real-time event sampler

    This paper presents the design and FPGA implementation of a sampler that is suited for sampling real-time events in embedded systems. Such sampling is useful, for example, to test whether real-time events are handled in time on such systems. By designing and implementing the sampler as a logic analyzer on an FPGA, several design parameters can be explored and easily modified to match the behavior of different kinds of embedded systems. Moreover, the trade-off between price and performance becomes easy to make, as it mainly consists of choosing the appropriate type and speed grade of an FPGA family.