20,977 research outputs found
Modeling and visualizing networked multi-core embedded software energy consumption
In this report we present a network-level multi-core energy model and a
software development process workflow that allows software developers to
estimate the energy consumption of multi-core embedded programs. This work
focuses on a high performance, cache-less and timing predictable embedded
processor architecture, XS1. Prior modelling work is improved to increase
accuracy, then extended to be parametric with respect to voltage and frequency
scaling (VFS) and then integrated into a larger scale model of a network of
interconnected cores. The modelling is supported by enhancements to an open
source instruction set simulator to provide the first network timing aware
simulations of the target architecture. Simulation based modelling techniques
are combined with methods of results presentation to demonstrate how such work
can be integrated into a software developer's workflow, enabling the developer
to make informed, energy aware coding decisions. A set of single-,
multi-threaded and multi-core benchmarks are used to exercise and evaluate the
models and provide use case examples for how results can be presented and
interpreted. The models all yield accuracy within an average +/-5 % error
margin
Yield-driven power-delay-optimal CMOS full-adder design complying with automotive product specifications of PVT variations and NBTI degradations
We present the detailed results of the application of mathematical optimization algorithms to transistor sizing in a full-adder cell design, to obtain the maximum expected fabrication yield. The approach takes into account all the fabrication process parameter variations specified in an industrial PDK, in addition to operating condition range and NBTI aging. The final design solutions present transistor sizing, which depart from intuitive transistor sizing criteria and show dramatic yield improvements, which have been verified by Monte Carlo SPICE analysis
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique for both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%.Comment: To appear at the DSN 2020 conferenc
A software controlled voltage tuning system using multi-purpose ring oscillators
This paper presents a novel software driven voltage tuning method that
utilises multi-purpose Ring Oscillators (ROs) to provide process variation and
environment sensitive energy reductions. The proposed technique enables voltage
tuning based on the observed frequency of the ROs, taken as a representation of
the device speed and used to estimate a safe minimum operating voltage at a
given core frequency. A conservative linear relationship between RO frequency
and silicon speed is used to approximate the critical path of the processor.
Using a multi-purpose RO not specifically implemented for critical path
characterisation is a unique approach to voltage tuning. The parameters
governing the relationship between RO and silicon speed are obtained through
the testing of a sample of processors from different wafer regions. These
parameters can then be used on all devices of that model. The tuning method and
software control framework is demonstrated on a sample of XMOS XS1-U8A-64
embedded microprocessors, yielding a dynamic power saving of up to 25% with no
performance reduction and no negative impact on the real-time constraints of
the embedded software running on the processor
- …