25 research outputs found

    Optimised OpenCL workgroup synthesis for hybrid ARM-FPGA devices

    Get PDF

    Energy Optimization of FPGA-Based Stream-Oriented Computing with Power Gating

    Get PDF

    Run-Time Power Modelling in Embedded GPUs with Dynamic Voltage and Frequency Scaling

    Full text link
    This paper investigates the application of a robust CPU-based power modelling methodology that performs an automatic search of explanatory events derived from performance counters to embedded GPUs. A 64-bit Tegra TX1 SoC is configured with DVFS enabled and multiple CUDA benchmarks are used to train and test models optimized for each frequency and voltage point. These optimized models are then compared with a simpler unified model that uses a single set of model coefficients for all frequency and voltage points of interest. To obtain this unified model, a number of experiments are conducted to extract information on idle, clock and static power to derive power usage from a single reference equation. The results show that the unified model offers competitive accuracy with an average 5\% error with four explanatory variables on the test data set and it is capable to correctly predict the impact of voltage, frequency and temperature on power consumption. This model could be used to replace direct power measurements when these are not available due to hardware limitations or worst-case analysis in emulation platforms

    Energy Optimization in Commercial FPGAs with Voltage, Frequency and Logic Scaling

    Get PDF
    This paper investigates the energy reductions possible in commercially available FPGAs configured to support voltage, frequency and logic scalability combined with power gating. Voltage and frequency scaling is based on in-situ detectors that allow the device to detect valid working voltage and frequency pairs at run-time while logic scalability is achieved with partial dynamic reconfiguration. The considered devices are FPGA-processor hybrids with independent power domains fabricated in 28 nm process nodes. The test case is based on a number of operational scenarios in which the FPGA side is loaded with a motion estimation core that can be configured with a variable number of execution units. The results demonstrate that voltage scalability reduces power by up to 60 percent compared with nominal voltage operation at the same frequency. The energy analysis show that the most energy efficiency core configuration depends on the performance requirements. A low performance scenario shows that serial computation is more energy efficient than the parallel configuration while the opposite is true when the performance requirements increase. An algorithm is proposed to combine effectively adaptive voltage/logic scaling and power gating in the proposed system and application

    Dynamic Energy Management of FPGA Accelerators in Embedded Systems

    Get PDF
    In this article, we investigate how to utilise an Field-Programmable Gate Array (FPGA) in an embedded system to save energy. For this purpose, we study the energy efficiency of a hybrid FPGA-CPU device that can switch task execution between hardware and software with a focus on periodic tasks. To increase the applicability of this task switching, we also consider the voltage and frequency scaling (VFS) applied to the FPGA to reduce the system energy consumption. We show that in some cases, if the task's period is higher than a specific level, the FPGA accelerator cannot reduce the energy consumption associated to the task and the software version is the most energy efficient option. We have applied the proposed techniques to a robot map creation algorithm as a case study which shows up to 38% energy reduction compared to the FPGA implementation. Overall, experimental results show up to 48% energy reduction by applying the proposed techniques at runtime on 13 individual tasks

    A Streaming Dataflow Engine for Sparse Matrix-Vector Multiplication using High-Level Synthesis

    Get PDF
    Using high-level synthesis techniques, this paper proposes an adaptable high-performance streaming dataflow engine for sparse matrix dense vector multiplication (SpMV) suitable for embedded FPGAs. As the SpMV is a memory-bound algorithm, this engine combines the three concepts of loop pipelining, dataflow graph, and data streaming to utilize most of the memory bandwidth available to the FPGA. The main goal of this paper is to show that FPGAs can provide comparable performance for memory-bound applications to that of the corresponding CPUs and GPUs but with significantly less energy consumption. The experimental results indicate that the FPGA provides higher performance compared to that of embedded GPUs for small and medium-size matrices by an average factor of 3.25 whereas the embedded GPU is faster for larger size matrices by an average factor of 1.58. In addition, the FPGA implementation is more energy efficient for the range of considered matrices by an average factor of 8.9 compared to the embedded CPU and GPU. A case study based on adapting the proposed SpMV optimization to accelerate the support vector machine (SVM) algorithm, one of the successful classification techniques in the machine learning literature, justifies the benefits of utilizing the proposed FPGA-based SpMV compared to that of the embedded CPU and GPU. The experimental results show that the FPGA is faster by an average factor of 1.7 and consumes less energy by an average factor of 6.8 compared to the GPU
    corecore