
3 research outputs found

Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model

Authors: Ab Rahman A. A. H.; Ayat S. O.; Khalil-Hani M.
Publication venue: 'The Scientific and Technological Research Council of Turkey'
Publication date: 01/01/2018

In recent years, the convolutional neural network (CNN) has found wide acceptance in solving practical computer vision and image recognition problems. Also recently, due to its flexibility, faster development time, and energy efficiency, the field-programmable gate array (FPGA) has become an attractive solution to exploit the inherent parallelism in the feedforward process of the CNN. However, to meet the demands for high accuracy of today's practical recognition applications that typically have massive datasets, the sizes of CNNs have to be larger and deeper. Enlargement of the CNN aggravates the problem of off-chip memory bottleneck in the FPGA platform since there is not enough space to save large datasets on-chip. In this work, we propose a memory system architecture that best matches the off-chip memory traffic with the optimum throughput of the computation engine, while it operates at the maximum allowable frequency. With the help of an extended version of the Roofline model proposed in this work, we can estimate memory bandwidth utilization of the system at different operating frequencies since the proposed model considers operating frequency in addition to bandwidth utilization and throughput. In order to find the optimal solution that has the best energy efficiency, we make a trade-off between energy efficiency and computational throughput. This solution saves 18% of energy utilization with the trade-off having less than 2% reduction in throughput performance. We also propose to use a race-to-halt strategy to further improve the energy efficiency of the designed CNN accelerator. Experimental results show that our CNN accelerator can achieve a peak performance of 52.11 GFLOPS and energy efficiency of 10.02 GFLOPS/W on a ZYNQ ZC706 FPGA board running at 250 MHz, which outperforms most previous approaches.

Universiti Teknologi Malaysia Institutional Repository

Energy and performance trade-off optimization in heterogeneous computing via reinforcement learning

Authors: Abbasi QH; Abdulghani AM; Dashtipour K; Heidari H; Imran MA; Machado P; Yu Z; Zahid A
Publication venue: 'MDPI AG'
Publication date: 01/01/2020

This paper suggests an optimisation approach in heterogeneous computing systems to balance energy power consumption and efficiency. The work proposes a power measurement utility for a reinforcement learning (PMU-RL) algorithm to dynamically adjust the resource utilisation of heterogeneous platforms in order to minimise power consumption. A reinforcement learning (RL) technique is applied to analyse and optimise the resource utilisation of field programmable gate array (FPGA) control state capabilities, which is built for a simulation environment with a Xilinx ZYNQ multi-processor systems-on-chip (MPSoC) board. In this study, the balance operation mode for improving power consumption and performance is established to dynamically change the programmable logic (PL) end work state. It is based on an RL algorithm that can quickly discover the optimization effect of PL on different workloads to improve energy efficiency. The results demonstrate a substantial reduction of 18% in energy consumption without affecting the application’s performance. Thus, the proposed PMU-RL technique has the potential to be considered for other heterogeneous computing platforms.

Multidisciplinary Digital Publishing Institute

Heriot Watt Pure

Nottingham Trent Institutional Repository (IRep)

Enlighten

Accurate power control and monitoring in ZYNQ boards

Authors: Beldachi Arash Farhadi; Nunez-Yanez Jose L.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Crossref

Explore Bristol Research