25 research outputs found

Optimised OpenCL workgroup synthesis for hybrid ARM-FPGA devices

Authors: Hosseinabady Mohammad; Nunez-Yanez Jose L
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 07/10/2015

Crossref

Explore Bristol Research

Sparse and dense matrix multiplication hardware for heterogeneous multi-precision neural networks

Authors: Hosseinabady Mohammad; Nunez-Yanez Jose L
Publication venue: Elsevier BV
Publication date: 02/11/2021

Explore Bristol Research

Energy Optimization of FPGA-Based Stream-Oriented Computing with Power Gating

Authors: Hosseinabady Mohammad; Nunez-Yanez Jose
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/12/2015

Crossref

Explore Bristol Research

Run-Time Power Modelling in Embedded GPUs with Dynamic Voltage and Frequency Scaling

Authors: Eder Kerstin; Hosseinabady Mohammad; Nikov Kris; Nunez-Yanez Jose
Publication date: 21/01/2020

This paper investigates applying to embedded GPUs a robust CPU-based power modelling methodology that performs an automatic search of explanatory events derived from performance counters. A 64-bit Tegra TX1 SoC is configured with DVFS enabled, and multiple CUDA benchmarks are used to train and test models optimized for each frequency and voltage point. These optimized models are then compared with a simpler unified model that uses a single set of model coefficients for all frequency and voltage points of interest. To obtain this unified model, a number of experiments are conducted to extract information on idle, clock and static power so that power usage can be derived from a single reference equation. The results show that the unified model offers competitive accuracy, with an average 5% error using four explanatory variables on the test data set, and that it can correctly predict the impact of voltage, frequency and temperature on power consumption. This model could be used to replace direct power measurements when these are not available due to hardware limitations or worst-case analysis in emulation platforms.

arXiv.org e-Print Archive

Explore Bristol Research

Energy Optimization in Commercial FPGAs with Voltage, Frequency and Logic Scaling

Authors: Farhadi Beldachi Arash F; Hosseinabady Mohammad; Nunez-Yanez Jose L
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2016

This paper investigates the energy reductions possible in commercially available FPGAs configured to support voltage, frequency and logic scalability combined with power gating. Voltage and frequency scaling is based on in-situ detectors that allow the device to detect valid working voltage and frequency pairs at run-time, while logic scalability is achieved with partial dynamic reconfiguration. The considered devices are FPGA-processor hybrids with independent power domains fabricated in 28 nm process nodes. The test case is based on a number of operational scenarios in which the FPGA side is loaded with a motion estimation core that can be configured with a variable number of execution units. The results demonstrate that voltage scalability reduces power by up to 60 percent compared with nominal voltage operation at the same frequency. The energy analysis shows that the most energy-efficient core configuration depends on the performance requirements. A low-performance scenario shows that serial computation is more energy efficient than the parallel configuration, while the opposite is true when the performance requirements increase. An algorithm is proposed to effectively combine adaptive voltage/logic scaling and power gating in the proposed system and application.

Crossref

UWE Bristol Research Repository

Explore Bristol Research

Dynamic Energy Management of FPGA Accelerators in Embedded Systems

Authors: Hosseinabady Mohammad; Nunez-Yanez Jose
Publication venue: Association for Computing Machinery (ACM)
Publication date: 22/05/2018

In this article, we investigate how to utilise a Field-Programmable Gate Array (FPGA) in an embedded system to save energy. For this purpose, we study the energy efficiency of a hybrid FPGA-CPU device that can switch task execution between hardware and software, with a focus on periodic tasks. To increase the applicability of this task switching, we also consider voltage and frequency scaling (VFS) applied to the FPGA to reduce the system energy consumption. We show that in some cases, if the task's period is higher than a specific level, the FPGA accelerator cannot reduce the energy consumption associated with the task and the software version is the most energy-efficient option. We have applied the proposed techniques to a robot map creation algorithm as a case study, which shows up to 38% energy reduction compared to the FPGA implementation. Overall, experimental results show up to 48% energy reduction by applying the proposed techniques at runtime on 13 individual tasks.

Crossref

UWE Bristol Research Repository

Explore Bristol Research

A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication using High-Level Synthesis

Authors: Hosseinabady Mohammad; Nunez-Yanez Jose
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 23/04/2019

Using high-level synthesis techniques, this paper proposes an adaptable high-performance streaming dataflow engine for sparse matrix dense vector multiplication (SpMV) suitable for embedded FPGAs. As SpMV is a memory-bound algorithm, this engine combines the three concepts of loop pipelining, dataflow graph, and data streaming to utilize most of the memory bandwidth available to the FPGA. The main goal of this paper is to show that FPGAs can provide performance for memory-bound applications comparable to that of the corresponding CPUs and GPUs but with significantly less energy consumption. The experimental results indicate that the FPGA provides higher performance than embedded GPUs for small and medium-size matrices by an average factor of 3.25, whereas the embedded GPU is faster for larger matrices by an average factor of 1.58. In addition, the FPGA implementation is more energy efficient for the range of considered matrices by an average factor of 8.9 compared to the embedded CPU and GPU. A case study based on adapting the proposed SpMV optimization to accelerate the support vector machine (SVM) algorithm, one of the successful classification techniques in the machine learning literature, justifies the benefits of utilizing the proposed FPGA-based SpMV compared to that of the embedded CPU and GPU. The experimental results show that the FPGA is faster by an average factor of 1.7 and consumes less energy by an average factor of 6.8 compared to the GPU.

UWE Bristol Research Repository

Explore Bristol Research

Pipelined Streaming Computation of Histogram in FPGA OpenCL

Authors: Hosseinabady Mohammad; Nunez-Yanez Jose Luis
Publication venue: IOS Press
Publication date: 07/03/2018

Explore Bristol Research