Benchmarking Edge Computing Devices for Grape Bunches and Trunks
Detection using Accelerated Object Detection Single Shot MultiBox Deep
Learning Models
Purpose: Visual perception enables robots to sense their surrounding
environment. The visual data is processed by computer vision algorithms that
are usually computationally expensive and require powerful devices to run in
real time, which is infeasible for open-field robots with limited energy
budgets. This work benchmarks the real-time object detection performance of
different heterogeneous platforms spanning three accelerator architectures:
embedded GPUs -- Graphics Processing Units (such as the NVIDIA Jetson Nano
2 GB and 4 GB, and the NVIDIA Jetson TX2), TPUs -- Tensor Processing Units
(such as the Coral Dev Board TPU), and DPUs -- Deep Learning Processor Units
(such as in the AMD-Xilinx ZCU104 Development Board and the AMD-Xilinx Kria
KV260 Starter Kit).
Method: The authors used a RetinaNet ResNet-50 model fine-tuned on the
natural VineSet dataset. Afterwards, the trained model was converted and
compiled into target-specific hardware formats to improve execution
efficiency.
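The abstract does not state the exact conversion toolchain for each device.
The sketch below is a minimal, hypothetical example assuming a
PyTorch/torchvision RetinaNet checkpoint exported to ONNX as a common
intermediate before device-specific compilation (e.g., TensorRT for the
Jetson boards, the Edge TPU compiler for the Coral board, Vitis AI for the
DPU boards); the file names, input size, and class count are assumptions.

    import torch
    from torchvision.models.detection import retinanet_resnet50_fpn

    # Assumed fine-tuned checkpoint and class count (grape bunch, trunk, background).
    model = retinanet_resnet50_fpn(num_classes=3)
    model.load_state_dict(torch.load("vineset_retinanet.pt", map_location="cpu"))
    model.eval()

    # torchvision detection models take a list of 3D image tensors as input.
    dummy = [torch.randn(3, 640, 640)]  # assumed input resolution

    # Export to ONNX; the resulting file is then compiled per target device.
    torch.onnx.export(model, dummy, "retinanet_vineset.onnx", opset_version=11)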
Conclusions and Results: The platforms were assessed in terms of detection
performance (evaluation metrics) and efficiency (inference time). The
Graphics Processing Units (GPUs) were the slowest devices, running at 3 FPS
to 5 FPS, while the Field Programmable Gate Arrays (FPGAs) were the fastest,
running at 14 FPS to 25 FPS. The Tensor Processing Unit (TPU) offered no
notable speed advantage, performing similarly to the NVIDIA Jetson TX2. The
TPU and the GPUs were the most power-efficient devices, consuming about 5 W.
The differences in the evaluation metrics across devices were negligible,
with an F1 score of about 70 % and a mean Average Precision (mAP) of about
60 %.
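As an illustration of how throughput (FPS) and F1 figures of this kind can be
obtained, the following sketch times repeated forward passes and derives F1
from detection counts. It is not the authors' benchmarking code; the function
names and the warm-up count are assumptions.

    import time

    def measure_fps(infer, images, warmup=10):
        # infer: callable that runs one forward pass on the target device.
        for img in images[:warmup]:              # warm-up runs, excluded from timing
            infer(img)
        start = time.perf_counter()
        for img in images[warmup:]:
            infer(img)
        elapsed = time.perf_counter() - start
        return (len(images) - warmup) / elapsed  # frames per second

    def f1_score(tp, fp, fn):
        # F1 is the harmonic mean of precision and recall over all detections.
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)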