23,184 research outputs found
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration
We empirically evaluate an undervolting technique, i.e., underscaling the
circuit supply voltage below the nominal level, to improve the power-efficiency
of Convolutional Neural Network (CNN) accelerators mapped to Field Programmable
Gate Arrays (FPGAs). Undervolting below a safe voltage level can lead to timing
faults due to excessive circuit latency increase. We evaluate the
reliability-power trade-off for such accelerators. Specifically, we
experimentally study the reduced-voltage operation of multiple components of
real FPGAs, characterize the corresponding reliability behavior of CNN
accelerators, propose techniques to minimize the drawbacks of reduced-voltage
operation, and combine undervolting with architectural CNN optimization
techniques, i.e., quantization and pruning. We investigate the effect of
environmental temperature on the reliability-power trade-off of such
accelerators. We perform experiments on three identical samples of modern
Xilinx ZCU102 FPGA platforms with five state-of-the-art image classification
CNN benchmarks. This approach allows us to study the effects of our
undervolting technique for both software and hardware variability. We achieve
more than 3X power-efficiency (GOPs/W) gain via undervolting. 2.6X of this gain
is the result of eliminating the voltage guardband region, i.e., the safe
voltage region below the nominal level that is set by FPGA vendor to ensure
correct functionality in worst-case environmental and circuit conditions. 43%
of the power-efficiency gain is due to further undervolting below the
guardband, which comes at the cost of accuracy loss in the CNN accelerator. We
evaluate an effective frequency underscaling technique that prevents this
accuracy loss, and find that it reduces the power-efficiency gain from 43% to
25%.Comment: To appear at the DSN 2020 conferenc
Deep Feature-based Face Detection on Mobile Devices
We propose a deep feature-based face detector for mobile devices to detect
user's face acquired by the front facing camera. The proposed method is able to
detect faces in images containing extreme pose and illumination variations as
well as partial faces. The main challenge in developing deep feature-based
algorithms for mobile devices is the constrained nature of the mobile platform
and the non-availability of CUDA enabled GPUs on such devices. Our
implementation takes into account the special nature of the images captured by
the front-facing camera of mobile devices and exploits the GPUs present in
mobile devices without CUDA-based frameorks, to meet these challenges.Comment: ISBA 201
Local Motion Planner for Autonomous Navigation in Vineyards with a RGB-D Camera-Based Algorithm and Deep Learning Synergy
With the advent of agriculture 3.0 and 4.0, researchers are increasingly
focusing on the development of innovative smart farming and precision
agriculture technologies by introducing automation and robotics into the
agricultural processes. Autonomous agricultural field machines have been
gaining significant attention from farmers and industries to reduce costs,
human workload, and required resources. Nevertheless, achieving sufficient
autonomous navigation capabilities requires the simultaneous cooperation of
different processes; localization, mapping, and path planning are just some of
the steps that aim at providing to the machine the right set of skills to
operate in semi-structured and unstructured environments. In this context, this
study presents a low-cost local motion planner for autonomous navigation in
vineyards based only on an RGB-D camera, low range hardware, and a dual layer
control algorithm. The first algorithm exploits the disparity map and its depth
representation to generate a proportional control for the robotic platform.
Concurrently, a second back-up algorithm, based on representations learning and
resilient to illumination variations, can take control of the machine in case
of a momentaneous failure of the first block. Moreover, due to the double
nature of the system, after initial training of the deep learning model with an
initial dataset, the strict synergy between the two algorithms opens the
possibility of exploiting new automatically labeled data, coming from the
field, to extend the existing model knowledge. The machine learning algorithm
has been trained and tested, using transfer learning, with acquired images
during different field surveys in the North region of Italy and then optimized
for on-device inference with model pruning and quantization. Finally, the
overall system has been validated with a customized robot platform in the
relevant environment
Deep Learning in the Automotive Industry: Applications and Tools
Deep Learning refers to a set of machine learning techniques that utilize
neural networks with many hidden layers for tasks, such as image
classification, speech recognition, language understanding. Deep learning has
been proven to be very effective in these domains and is pervasively used by
many Internet services. In this paper, we describe different automotive uses
cases for deep learning in particular in the domain of computer vision. We
surveys the current state-of-the-art in libraries, tools and infrastructures
(e.\,g.\ GPUs and clouds) for implementing, training and deploying deep neural
networks. We particularly focus on convolutional neural networks and computer
vision use cases, such as the visual inspection process in manufacturing plants
and the analysis of social media data. To train neural networks, curated and
labeled datasets are essential. In particular, both the availability and scope
of such datasets is typically very limited. A main contribution of this paper
is the creation of an automotive dataset, that allows us to learn and
automatically recognize different vehicle properties. We describe an end-to-end
deep learning application utilizing a mobile app for data collection and
process support, and an Amazon-based cloud backend for storage and training.
For training we evaluate the use of cloud and on-premises infrastructures
(including multiple GPUs) in conjunction with different neural network
architectures and frameworks. We assess both the training times as well as the
accuracy of the classifier. Finally, we demonstrate the effectiveness of the
trained classifier in a real world setting during manufacturing process.Comment: 10 page
- …