28,315 research outputs found
Automated Circuit Approximation Method Driven by Data Distribution
We propose an application-tailored data-driven fully automated method for
functional approximation of combinational circuits. We demonstrate how an
application-level error metric such as the classification accuracy can be
translated to a component-level error metric needed for an efficient and fast
search in the space of approximate low-level components that are used in the
application. This is possible by employing a weighted mean error distance
(WMED) metric for steering the circuit approximation process which is conducted
by means of genetic programming. WMED introduces a set of weights (calculated
from the data distribution measured on a selected signal in a given
application) determining the importance of each input vector for the
approximation process. The method is evaluated using synthetic benchmarks and
application-specific approximate MAC (multiply-and-accumulate) units that are
designed to provide the best trade-offs between the classification accuracy and
power consumption of two image classifiers based on neural networks.Comment: Accepted for publication at Design, Automation and Test in Europe
(DATE 2019). Florence, Ital
Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems
Deep Learning is increasingly being adopted by industry for computer vision
applications running on embedded devices. While Convolutional Neural Networks'
accuracy has achieved a mature and remarkable state, inference latency and
throughput are a major concern especially when targeting low-cost and low-power
embedded platforms. CNNs' inference latency may become a bottleneck for Deep
Learning adoption by industry, as it is a crucial specification for many
real-time processes. Furthermore, deployment of CNNs across heterogeneous
platforms presents major compatibility issues due to vendor-specific technology
and acceleration libraries. In this work, we present QS-DNN, a fully automatic
search based on Reinforcement Learning which, combined with an inference engine
optimizer, efficiently explores through the design space and empirically finds
the optimal combinations of libraries and primitives to speed up the inference
of CNNs on heterogeneous embedded devices. We show that, an optimized
combination can achieve 45x speedup in inference latency on CPU compared to a
dependency-free baseline and 2x on average on GPGPU compared to the best vendor
library. Further, we demonstrate that, the quality of results and time
"to-solution" is much better than with Random Search and achieves up to 15x
better results for a short-time search
- …