Deep Learning-Based Multiple Object Visual Tracking on Embedded System for IoT and Mobile Edge Computing Applications
Compute and memory demands of state-of-the-art deep learning methods are
still a shortcoming that must be addressed to make them useful at IoT
end-nodes. In particular, recent results depict a hopeful prospect for image
processing using Convolutional Neural Networks (CNNs), but the gap between
software and hardware implementations is already considerable for IoT and
mobile edge computing applications due to their high power consumption. This
proposal performs low-power, real-time deep learning-based multiple object
visual tracking implemented on an NVIDIA Jetson TX2 development kit. It
includes a camera and wireless connection capability and it is battery powered
for mobile and outdoor applications. A collection of representative sequences
captured with the on-board camera, dETRUSC video dataset, is used to exemplify
the performance of the proposed algorithm and to facilitate benchmarking. The
results in terms of power consumption and frame rate demonstrate the
feasibility of deep learning algorithms on embedded platforms, although more
effort on the joint algorithm and hardware design of CNNs is needed.
Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
Towards a Scalable Hardware/Software Co-Design Platform for Real-time Pedestrian Tracking Based on a ZYNQ-7000 Device
Currently, most designers in hardware/software co-design face the daunting
task of researching different design flows and learning the intricacies of
specific software from various manufacturers. Creating a scalable
hardware/software co-design platform has therefore become a key strategic
element in developing integrated hardware/software systems. In this paper,
we propose a new design flow for building a scalable co-design platform on
an FPGA-based system-on-chip. We employ an integrated approach to implement
histogram of oriented gradients (HOG) feature extraction and support vector
machine (SVM) classification on a programmable device for pedestrian
tracking. We report a hardware resource analysis and also analyse the
precision and success rates of pedestrian tracking on nine open-access image
data sets. Finally, the proposed design flow can be used for any real-time
image-processing-related product on programmable ZYNQ-based embedded
systems; it reduces design time and provides a scalable solution for
embedded image processing products.
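The HOG and SVM stages named in the abstract above can be illustrated with a minimal software sketch. It is not the authors' FPGA design: the cell size, the 9-bin unsigned-orientation histogram, the L2 normalization, and the function names `cell_histogram`, `l2_normalize`, and `svm_score` are all assumptions chosen for illustration.

```python
import math

def cell_histogram(patch, bins=9):
    """Gradient-orientation histogram for one grayscale cell (unsigned, 0-180 degrees)."""
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Central differences on interior pixels only (border pixels are skipped).
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes into one bin, weighted by gradient magnitude.
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

def l2_normalize(vec, eps=1e-6):
    """Scale a feature vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec) + eps)
    return [v / norm for v in vec]

def svm_score(features, weights, bias):
    """Linear SVM decision value; a positive score means 'pedestrian' here."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# A 4x4 cell containing a vertical edge: all gradient energy falls in bin 0.
cell = [[0, 0, 10, 10]] * 4
features = l2_normalize(cell_histogram(cell))
score = svm_score(features, [1.0] + [0.0] * 8, bias=-0.5)  # hypothetical weights
```

A full detector would concatenate many cell histograms over a sliding window and use trained weights; the weights and bias here are placeholders.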
LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing
LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.
Peer Reviewed. Postprint (author's final draft)
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
Convolutional neural networks (CNNs) have become the dominant neural network
architecture for solving many state-of-the-art (SOA) visual processing tasks.
Even though Graphical Processing Units (GPUs) are most often used in training
and deploying CNNs, their power efficiency is less than 10 GOp/s/W for
single-frame runtime inference. We propose a flexible and efficient CNN
accelerator architecture called NullHop that implements SOA CNNs useful for
low-power and low-latency application scenarios. NullHop exploits the sparsity
of neuron activations in CNNs to accelerate the computation and reduce memory
requirements. The flexible architecture allows high utilization of available
computing resources across kernel sizes ranging from 1x1 to 7x7. NullHop can
process up to 128 input and 128 output feature maps per layer in a single pass.
We implemented the proposed architecture on a Xilinx Zynq FPGA platform and
present results showing how our implementation reduces external memory
transfers and compute time in five different CNNs ranging from small ones up to
the widely known large VGG16 and VGG19 CNNs. Post-synthesis simulations using
Mentor Modelsim in a 28nm process with a clock frequency of 500 MHz show that
the VGG19 network achieves over 450 GOp/s. By exploiting sparsity, NullHop
achieves an efficiency of 368%, maintains over 98% utilization of the MAC
units, and achieves a power efficiency of over 3 TOp/s/W in a core area of
6.3 mm². As further proof of NullHop's usability, we interfaced its FPGA
implementation with a neuromorphic event camera for real-time interactive
demonstrations.
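The idea of exploiting activation sparsity, as described in the abstract above, can be sketched in software. The encoding below (a binary mask plus a list of nonzero values) is an assumption made for illustration, as are the names `compress` and `sparse_dot`; the abstract does not specify the accelerator's actual data format.

```python
def compress(feature_map):
    """Encode a flat feature map as (binary mask, list of nonzero values)."""
    mask = [1 if v != 0 else 0 for v in feature_map]
    values = [v for v in feature_map if v != 0]
    return mask, values

def sparse_dot(mask, values, weights):
    """Dot product that performs a multiply-accumulate only for nonzero activations."""
    acc = 0
    macs = 0
    nonzero = iter(values)
    for bit, w in zip(mask, weights):
        if bit:
            acc += next(nonzero) * w
            macs += 1
    return acc, macs

# Example: 8 activations (e.g. after ReLU), only 3 of them nonzero.
activations = [0, 3, 0, 0, 2, 0, 0, 1]
weights = [1, 2, 3, 4, 5, 6, 7, 8]
mask, values = compress(activations)
result, macs_done = sparse_dot(mask, values, weights)
# result matches the dense dot product, with 3 MACs instead of 8.
```

Only the nonzero values and the mask need to be stored or transferred, which is where the reduction in memory traffic would come from in a scheme of this kind.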
SVITE: A Spike-Based VITE Neuro-Inspired Robot Controller
This paper presents an implementation of a neuro-inspired algorithm
called VITE (Vector Integration To End Point) on an FPGA in the spike domain.
VITE aims to generate a non-planned trajectory for reaching tasks in robots.
The algorithm has been adapted to work completely in the spike domain under
Simulink simulations. The FPGA implementation consists of four VITE blocks in parallel
for controlling a 4-degree-of-freedom stereo-vision robot. This work represents
the main layer of a complex spike-based architecture for robot neuro-inspired
reaching tasks in FPGAs. It has been implemented in two Xilinx FPGA
families: Virtex-5 and Spartan-6. A comparison of resource consumption between
the two devices is presented. The results obtained for the Spartan device could allow
controlling complex robotic structures with up to 96 degrees of freedom per
FPGA while providing, in parallel, high-speed connectivity with other neuromorphic
systems that send movement references. Exponential and gamma distribution
tests over the inter-spike interval have been performed to verify the approach to the
proposed neural code.
Ministerio de Economía y Competitividad TEC2012-37868-C04-0
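For orientation, the classical rate-based form of VITE can be sketched for a single joint. This is not the spike-based FPGA implementation described above: it integrates the standard equations dV/dt = alpha(-V + T - P) and dP/dt = G·max(V, 0) with a simple Euler step, and the constant GO signal, the parameter values, and the function name `vite_trajectory` are assumptions made for illustration.

```python
def vite_trajectory(target, start, alpha=30.0, go=5.0, dt=0.001, steps=5000):
    """Euler integration of rate-based VITE dynamics for one degree of freedom.

    v: difference vector (tracks target minus present position)
    p: present position command
    go: GO signal, held constant here as a simplification
    """
    v = 0.0
    p = start
    path = [p]
    for _ in range(steps):
        dv = alpha * (-v + target - p)   # difference vector follows the remaining error
        dp = go * max(v, 0.0)            # position integrates the rectified difference
        v += dv * dt
        p += dp * dt
        path.append(p)
    return path

# One joint moving from 0.0 to 1.0; four such units in parallel would
# correspond to a 4-degree-of-freedom arrangement.
path = vite_trajectory(target=1.0, start=0.0)
```

No trajectory is planned in advance: the path emerges from the two coupled equations, and the GO signal sets the movement speed.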