A Comprehensive and Accurate Energy Model for Arm's Cortex-M0 Processor
Energy modeling can enable energy-aware software development and assist the
developer in meeting an application's energy budget. Although many energy
models for embedded processors exist, most do not account for
processor-specific configurations, nor are they suitable for static energy
consumption estimation. This paper introduces a comprehensive energy model for
Arm's Cortex-M0 processor, ready to support energy-aware development of edge
computing applications using either profiling- or static-analysis-based energy
consumption estimation. The model accounts for the Frequency, PreFetch, and
WaitState processor configurations, all of which have a significant impact on the
execution time and energy consumption of edge computing applications. All
models have a prediction error of less than 5%.
Comment: 10 pages, 1 figure, 2 tables
Run-Time Power Modelling in Embedded GPUs with Dynamic Voltage and Frequency Scaling
This paper investigates the application of a robust CPU-based power modelling
methodology that performs an automatic search of explanatory events derived
from performance counters to embedded GPUs. A 64-bit Tegra TX1 SoC is
configured with DVFS enabled and multiple CUDA benchmarks are used to train and
test models optimized for each frequency and voltage point. These optimized
models are then compared with a simpler unified model that uses a single set of
model coefficients for all frequency and voltage points of interest. To obtain
this unified model, a number of experiments are conducted to extract
information on idle, clock and static power to derive power usage from a single
reference equation. The results show that the unified model offers competitive
accuracy, with an average 5% error using four explanatory variables on the test
data set, and that it can correctly predict the impact of voltage,
frequency and temperature on power consumption. This model could be used to
replace direct power measurements when these are not available due to hardware
limitations or worst-case analysis in emulation platforms.
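To make the idea of a single reference equation concrete, here is a minimal sketch of a unified power model in which one coefficient set is rescaled to any voltage and frequency point. It is not the paper's equation: the function name, the coefficient names, the V^2 * f scaling of dynamic terms, the linear scaling of static power with voltage, and all numeric values are assumptions for illustration, and temperature is ignored.

```python
# Hypothetical unified power model: one coefficient set for every DVFS point.
# Coefficient names and values are illustrative placeholders, not paper data.

def unified_power(voltage, freq_hz, event_rates, coeffs, ref_voltage, ref_freq_hz):
    """Estimate power in watts at one operating point.

    event_rates: events per second for each explanatory counter event.
    coeffs: 'static' and 'clock' power (W) at the reference point, plus
            'events', the energy per event (J) at the reference voltage.
    Assumes dynamic terms scale with V^2 * f and static power with V.
    """
    v_ratio = voltage / ref_voltage
    f_ratio = freq_hz / ref_freq_hz
    static = coeffs["static"] * v_ratio
    clock = coeffs["clock"] * v_ratio ** 2 * f_ratio
    # Event rates are already per second, so only the voltage scaling applies.
    events = sum(coeffs["events"][name] * rate for name, rate in event_rates.items())
    return static + clock + events * v_ratio ** 2


coeffs = {"static": 0.5, "clock": 1.0, "events": {"inst_executed": 1e-9}}
at_ref = unified_power(1.0, 1.0e9, {"inst_executed": 1.0e9}, coeffs, 1.0, 1.0e9)
at_half = unified_power(1.0, 0.5e9, {"inst_executed": 0.5e9}, coeffs, 1.0, 1.0e9)
```

With these placeholder coefficients, halving the frequency (and with it the event rate) at constant voltage halves the clock and event terms while the static term is unchanged.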
EnergyAnalyzer: Using Static WCET Analysis Techniques to Estimate the Energy Consumption of Embedded Applications
This paper presents EnergyAnalyzer, a code-level static analysis tool for
estimating the energy consumption of embedded software based on statically
predictable hardware events. The tool utilises techniques usually used for
worst-case execution time (WCET) analysis together with bespoke energy models
developed for two predictable architectures - the ARM Cortex-M0 and the Gaisler
LEON3 - to perform energy usage analysis. EnergyAnalyzer has been applied in
various use cases, such as selecting candidates for an optimised convolutional
neural network, analysing the energy consumption of a camera pill prototype,
and analysing the energy consumption of satellite communications software. The
tool was developed as part of a larger project called TeamPlay, which aimed to
provide a toolchain for developing embedded applications where energy
properties are first-class citizens, allowing the developer to reflect directly
on these properties at the source code level. The analysis capabilities of
EnergyAnalyzer are validated across a large number of benchmarks for the two
target architectures and the results show that the statically estimated energy
consumption has, with a few exceptions, less than 1% difference compared to the
underlying empirical energy models, which have been validated on real hardware.
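As a rough illustration of how a static estimate can combine an energy model with execution-count bounds, the sketch below sums per-instruction energy costs over basic blocks and weights each block by an upper bound on how often it runs. This is a simplified stand-in, not EnergyAnalyzer's actual analysis: the function names, the per-instruction energy values, and the block counts are all hypothetical, and the bounds are assumed to be given rather than derived.

```python
# Hypothetical static energy bound: per-block cost times a bounded count.
# Energy values (nJ) and execution-count bounds are made-up placeholders.

def block_energy(instructions, energy_per_instr):
    """Energy of one pass through a basic block, in nJ."""
    return sum(energy_per_instr[op] for op in instructions)


def program_energy_bound(blocks, max_counts, energy_per_instr):
    """Upper bound on program energy given per-block execution-count bounds."""
    return sum(
        max_counts[name] * block_energy(instrs, energy_per_instr)
        for name, instrs in blocks.items()
    )


energy_per_instr = {"add": 1.0, "ldr": 2.0, "b": 1.5}
blocks = {"entry": ["ldr", "add"], "loop": ["ldr", "add", "b"]}
max_counts = {"entry": 1, "loop": 10}  # loop bound assumed known
bound_nj = program_energy_bound(blocks, max_counts, energy_per_instr)
```

Here the entry block costs 3.0 nJ once and the loop body costs 4.5 nJ for at most ten iterations, giving a bound of 48.0 nJ.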
High-Performance Simultaneous Multiprocessing for Heterogeneous System-on-Chip
This paper presents a methodology for simultaneous heterogeneous computing,
named ENEAC, where a quad core ARM Cortex-A53 CPU works in tandem with a
preprogrammed on-board FPGA accelerator. A heterogeneous scheduler distributes
the tasks optimally among all the resources and all compute units run
asynchronously, which allows for improved performance for irregular workloads.
ENEAC achieves up to 17% performance improvement when using all platform
resources compared to just using the FPGA accelerators, and up to 865%
performance increase compared to using just the CPU. The workflow uses existing
commercial tools and C/C++ as a single programming language for both accelerator
design and CPU programming for improved productivity and ease of verification.
Comment: 7 pages, 5 figures, 1 table. Presented at the 13th International
Workshop on Programmability and Architectures for Heterogeneous Multicores,
2020 (arXiv:2005.07619)
Lightweight asynchronous scheduling in heterogeneous reconfigurable systems
The trend for heterogeneous embedded systems is the integration of accelerators and general-purpose CPU cores on the same die. In these integrated architectures, like the Zynq UltraScale+ board (CPU+FPGA) that we target in this work, hardware support for shared memory and low-overhead synchronization between the accelerator and the CPU cores make the case for exploring strategies that exploit a tight collaboration between the CPUs and the accelerator. In this paper we propose a novel lightweight scheduling strategy, FastFit, targeted to FPGA accelerators, and a new scheduler based on it, named MultiFastFit, which asynchronously tackles heterogeneous systems comprised of a variety of CPU cores and FPGA IPs. Our strategy significantly reduces the overhead of automatically computing the near-optimal chunksizes when compared to a previous state-of-the-art auto-tuned approach, which makes our approach more suitable for fine-grained applications. Additionally, our scheduler MultiFastFit has been designed to enable the efficient co-execution of work among compute devices in such a way that all the devices are busy while minimizing the load imbalance.
Our approaches have been evaluated using four benchmarks carefully tuned for the low-power UltraScale+ platform. Our experiments demonstrate that the FastFit strategy always finds the near-optimal FPGA chunksize for any device configuration at a reasonable cost, even for fine-grained and irregular applications, and that heterogeneous CPU+FPGA co-executions that exploit all the compute devices are usually faster and more energy efficient than the CPU-only and FPGA-only executions. We have also compared MultiFastFit with other state-of-the-art scheduling strategies, finding that it outperforms other auto-tuned approaches by up to 2x and that it achieves similar results to manually-tuned schedulers without requiring an offline search of the ideal CPU-FPGA partition or FPGA chunk granularity.
This work was partially supported by the Spanish projects PID2019-105396RB-I00, UMA18-FEDERJA-108, and UK EPSRC projects ENEAC (EP/N002539/1), HOPWARE (EP/V040863/1) and RS MINET (INF\R2\192044). Funding for open access charge: Universidad de Málaga / CBUA
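The co-execution idea described in this abstract, where every device repeatedly takes a chunk from a shared iteration space so that none sits idle, can be sketched with a small simulation. This is a generic dynamic chunk scheduler, not FastFit or MultiFastFit: the function name, the fixed per-device chunk sizes, and the throughput figures are hypothetical, whereas the paper's contribution is computing near-optimal chunk sizes automatically.

```python
import heapq

# Hypothetical simulation of asynchronous chunked co-execution.
# Chunk sizes and throughputs (items/s) are made-up placeholders.

def simulate_coexecution(total_items, devices):
    """devices maps name -> (chunk_size, items_per_second).

    Whichever device becomes free first takes the next chunk from the shared
    iteration space. Returns (makespan_seconds, items_done_per_device).
    """
    remaining = total_items
    done = {name: 0 for name in devices}
    # Min-heap of (time_when_free, device_name); all devices start idle.
    free_at = [(0.0, name) for name in sorted(devices)]
    heapq.heapify(free_at)
    makespan = 0.0
    while remaining > 0:
        now, name = heapq.heappop(free_at)
        chunk_size, rate = devices[name]
        chunk = min(chunk_size, remaining)
        remaining -= chunk
        done[name] += chunk
        finish = now + chunk / rate
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, name))
    return makespan, done


devices = {"cpu": (10, 100.0), "fpga": (40, 400.0)}
makespan, done = simulate_coexecution(1000, devices)
fpga_only, _ = simulate_coexecution(1000, {"fpga": (40, 400.0)})
```

In this toy setup the chunk sizes are proportional to device throughput, so both devices finish each chunk at the same time and the combined run is shorter than the FPGA-only run.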
The TeamPlay project: analysing and optimising time, energy, and security for cyber-physical systems
Funding: This work was supported by the EU Horizon-2020 project TeamPlay (https://www.teamplay-h2020.eu), grant #779882.
Non-functional properties, such as energy, time, and security (ETS), are becoming increasingly important for the programming of Cyber-Physical Systems (CPS). This paper describes TeamPlay, a research project funded under the EU Horizon 2020 programme between January 2018 and June 2021. TeamPlay aimed to provide the system designer with a toolchain for developing embedded applications where ETS properties are first-class citizens, allowing the developer to reflect directly on energy, time and security properties at the source code level. In this paper we give an overview of the TeamPlay methodology, introduce the challenges and solutions of our approach and summarise the results achieved. Overall, applying our TeamPlay methodology led to improvements of up to 18% in performance and 52% in energy usage over traditional approaches.