1,037 research outputs found

    FPGA-Based Hardware Accelerators for Deep Learning in Mobile Robotics

    The increasing demand for real-time, low-power hardware processing systems capable of compute-intensive applications has accentuated the inadequacy of conventional multicore general-purpose processor architectures. In an effort to meet this demand, edge computing hardware accelerators have come to the forefront, notably with regard to deep learning and robotic systems. This thesis explores preeminent hardware accelerators and examines the performance, accuracy, and power consumption of a GPU-based and an FPGA-based platform, both specifically designed for edge computing applications. The experiments were conducted using three deep neural network models, namely AlexNet, GoogLeNet, and ResNet-18, trained to perform binary image classification in a known environment. Our results demonstrate that the FPGA-based platform, a Kria KV260 Vision AI starter kit, exhibited inference speeds up to 9.5 times faster than the GPU-based Jetson Nano developer kit. Additionally, the FPGA platform delivered as much as five times the efficiency of the Jetson Nano in terms of inference speed per watt, with a mere 5.4% drop in accuracy caused by the quantization process required by the FPGA. However, the Jetson Nano showed a 1.6 times faster inference rate with the AlexNet model over the KV260, and its deployment process proved to be less challenging.
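    The accuracy drop mentioned above comes from mapping 32-bit float weights to low-precision integers for the FPGA. A minimal numpy sketch of symmetric per-tensor int8 quantization illustrates the idea (an illustrative scheme, not the actual Vitis AI/KV260 toolchain):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: scale floats into [-127, 127]."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = float(np.max(np.abs(w - w_hat)))
# Round-to-nearest bounds the reconstruction error by half a quantization step.
assert err <= scale / 2 + 1e-7
```

The residual error per weight is bounded by half a quantization step, which is why well-conditioned networks lose only a few percentage points of accuracy after quantization.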

    Generative Adversarial Super-Resolution at the Edge with Knowledge Distillation

    Single-Image Super-Resolution can support robotic tasks in environments where a reliable visual stream is required to monitor the mission, handle teleoperation, or study relevant visual details. In this work, we propose an efficient Generative Adversarial Network model for real-time Super-Resolution. We adopt a tailored architecture of the original SRGAN and apply model quantization to boost execution on CPU and Edge TPU devices, achieving up to 200 fps inference. We further optimize our model by distilling its knowledge to a smaller version of the network and obtain remarkable improvements compared to the standard training approach. Our experiments show that our fast and lightweight model preserves satisfying image quality compared to heavier state-of-the-art models. Finally, we conduct experiments on image transmission with bandwidth degradation to highlight the advantages of the proposed system for mobile robotic applications.
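    Knowledge distillation of the kind described here typically trains the small student against a mix of the ground truth and the larger teacher's output. A minimal sketch of such a combined loss (the L1 terms and the alpha weighting are illustrative assumptions, not the paper's exact objective):

```python
import numpy as np

def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Weighted sum of a supervised L1 term (to the ground-truth image)
    and a distillation L1 term (to the teacher network's output)."""
    sup = np.mean(np.abs(student_out - target))
    distill = np.mean(np.abs(student_out - teacher_out))
    return (1 - alpha) * sup + alpha * distill

# Toy tensors standing in for super-resolved image patches.
student = np.float32([1.0, 1.0])
teacher = np.float32([0.0, 0.0])
target  = np.float32([2.0, 2.0])
loss = distillation_loss(student, teacher, target, alpha=0.5)
# sup = 1.0, distill = 1.0 -> loss = 0.5*1.0 + 0.5*1.0 = 1.0
assert abs(loss - 1.0) < 1e-6
```

Setting alpha between 0 and 1 trades fidelity to the data against fidelity to the teacher; alpha = 0 recovers the standard training approach the abstract compares against.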

    Tiny Machine Learning Environment: Enabling Intelligence on Constrained Devices

    Running machine learning (ML) algorithms on constrained devices at the extreme edge of the network is problematic due to the computational overhead of ML algorithms, the resources available on the embedded platform, and the application budget (i.e., real-time requirements, power constraints, etc.). This has required the development of specific solutions and development tools for what is now referred to as TinyML. In this dissertation, we focus on improving the deployment and performance of TinyML applications, taking into consideration the aforementioned challenges, especially memory requirements. This dissertation contributes to the construction of the Edge Learning Machine (ELM) environment, a platform-independent open-source framework that provides three main TinyML services: shallow ML, self-supervised ML, and binary deep learning on constrained devices. In this context, the work includes the following steps, which are reflected in the thesis structure. First, we present a performance analysis of state-of-the-art shallow ML algorithms, including dense neural networks, implemented on mainstream microcontrollers. The comprehensive analysis in terms of algorithms, hardware platforms, datasets, preprocessing techniques, and configurations shows performance results similar to those of a desktop machine and highlights the impact of these factors on overall performance. Second, despite the common assumption that the scarcity of resources limits TinyML to model inference only, we go a step further and enable self-supervised on-device training on microcontrollers and tiny IoT devices by developing the Autonomous Edge Pipeline (AEP) system. AEP achieves accuracy comparable to the typical TinyML paradigm, i.e., models trained on resource-abundant devices and then deployed on microcontrollers. Next, we present a memory allocation strategy for convolutional neural network (CNN) layers that optimizes memory requirements.
    This approach reduces the memory footprint without affecting accuracy or latency. Moreover, e-skin systems share the main requirements of the TinyML field: enabling intelligence with low memory, low power consumption, and low latency. Therefore, we designed an efficient Tiny CNN architecture for e-skin applications. The architecture leverages the memory allocation strategy presented earlier and provides better performance than existing solutions. A major contribution of the thesis is CBin-NN, a library of functions for implementing extremely efficient binary neural networks on constrained devices. The library outperforms state-of-the-art NN deployment solutions by drastically reducing memory footprint and inference latency. All the solutions proposed in this thesis have been implemented on representative devices and tested in relevant applications, the results of which are reported and discussed. The ELM framework is open source, and it is becoming a useful, versatile toolkit for the IoT and TinyML research and development community.
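    Binary deep learning of the kind CBin-NN targets replaces floating-point multiply-accumulate with XNOR and popcount on bit-packed {-1, +1} vectors, which is what makes it cheap on microcontrollers. A toy sketch of the core trick (independent of the actual CBin-NN API):

```python
def pack(vec):
    """Pack a list of +1/-1 values into an integer bitmask (+1 -> bit set)."""
    bits = 0
    for i, v in enumerate(vec):
        if v > 0:
            bits |= 1 << i
    return bits

def bin_dot(a_bits, b_bits, n):
    """Dot product of two {-1,+1} vectors of length n packed as bitmasks:
    matches = popcount(XNOR(a, b)); dot = 2*matches - n."""
    matches = bin(~(a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
# Reference dot product: 1*1 + (-1)*1 + 1*(-1) + 1*1 = 0
assert bin_dot(pack(a), pack(b), 4) == sum(x * y for x, y in zip(a, b))
```

On real hardware the popcount maps to a single instruction, so an n-element dot product costs one XOR, one NOT, and one popcount per machine word instead of n multiplications.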

    Just Round: Quantized Observation Spaces Enable Memory Efficient Learning of Dynamic Locomotion

    Deep reinforcement learning (DRL) is one of the most powerful tools for synthesizing complex robotic behaviors. But training DRL models is incredibly compute- and memory-intensive, requiring large training datasets and replay buffers to achieve performant results. This poses a challenge for the next generation of field robots, which will need to learn on the edge to adapt to their environment. In this paper, we begin to address this issue through observation space quantization. We evaluate our approach using four simulated robot locomotion tasks and two state-of-the-art DRL algorithms, the on-policy Proximal Policy Optimization (PPO) and the off-policy Soft Actor-Critic (SAC), and find that observation space quantization reduces overall memory costs by as much as 4.2x without impacting learning performance. Comment: Accepted to ICRA 202
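    The memory saving comes from storing quantized observations in the replay buffer instead of float32 vectors. A minimal numpy sketch of uint8 observation quantization under known bounds (the affine rounding scheme here is an assumption for illustration, not the paper's exact method):

```python
import numpy as np

def quantize_obs(obs, low, high):
    """Map float32 observations in [low, high] to uint8 codes for storage."""
    scale = (high - low) / 255.0
    return np.round((obs - low) / scale).astype(np.uint8)

def dequantize_obs(q, low, high):
    """Recover approximate float32 observations when sampling the buffer."""
    return q.astype(np.float32) * (high - low) / 255.0 + low

low, high = -1.0, 1.0  # assumed known from the environment's observation spec
obs = np.float32([0.123, -0.5, 0.99])
q = quantize_obs(obs, low, high)
back = dequantize_obs(q, low, high)
# uint8 storage is 4x smaller than float32, at <= half-step reconstruction error.
assert q.dtype == np.uint8
assert np.all(np.abs(back - obs) <= (high - low) / 255.0 / 2 + 1e-6)
```

Storing one byte per dimension instead of four immediately yields a 4x buffer reduction, in line with the up-to-4.2x savings reported above.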

    Benchmarking Object Detection Deep Learning Models in Embedded Devices

    Object detection is an essential capability for performing complex tasks in robotic applications. Today, deep learning (DL) approaches are the basis of state-of-the-art solutions in computer vision, where they provide very high accuracy, albeit at high computational cost. Due to the physical limitations of robotic platforms, embedded devices are not as powerful as desktop computers, and adjustments have to be made to deep learning models before transferring them to robotic applications. This work benchmarks deep learning object detection models on embedded devices. Furthermore, some hardware selection guidelines are included, together with a description of the most relevant features of the two boards selected for this benchmark. Embedded electronic devices integrate a powerful AI co-processor to accelerate DL applications. To take advantage of these co-processors, models must be converted to a specific embedded runtime format. Five quantization levels applied to a collection of DL models are considered; two of them allow the execution of models on the embedded general-purpose CPU and are used as the baseline to assess the improvements obtained when running the same models with the three remaining quantization levels on the AI co-processors. The benchmark procedure is explained in detail, and a comprehensive analysis of the collected data is presented. Finally, the feasibility and challenges of implementing embedded object detection applications are discussed. This work has received support from the following programs: PID2019-104966GB-I00 (Spanish Ministry of Science and Innovation), IT-1244-19 (Basque Government), KK-2020/00049, KK-2021/00111 and KK-2021/00095 (Elkartek projects 3KIA, ERTZEAN and SIGZE, funded by the SPRI-Basque Government) and the AI-PROFICIENT project funded by European Union’s Horizon 2020 research and innovation program under grant agreement no. 9573
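    A benchmark like this hinges on a consistent latency measurement across quantization levels. A minimal stdlib harness sketch (the warmup/iteration counts and the stand-in workload are assumptions; the callable would wrap the board's actual runtime, e.g. an interpreter invoke):

```python
import time
import statistics

def benchmark(run_inference, warmup=5, iters=50):
    """Time a single-inference callable: warm up caches/co-processor first,
    then return the median latency in milliseconds over `iters` runs."""
    for _ in range(warmup):
        run_inference()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)

# Stand-in workload; in the real benchmark this would call the model runtime.
latency_ms = benchmark(lambda: sum(i * i for i in range(1000)))
assert latency_ms > 0
```

Using the median rather than the mean keeps a single slow run (e.g. a background interrupt on the embedded CPU) from skewing the comparison between quantization levels.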

    Segmentation and Detection of Woody Trunks Using Deep Learning for Agricultural Robotics

    Agricultural robots need image processing algorithms that are reliable under all weather conditions and computationally efficient. Several limitations may arise, such as the characteristic vineyard terrain irregularities or overfitting during the training of neural networks, which may affect performance. In parallel, Deep Learning models have become more complex, demanding increased computational resources; thus, not all processors can handle such models efficiently. Developing a system with real-time performance on low-power processors is therefore a research and development challenge, because annotated real datasets and expedite tools to support this work are lacking. To support the deployment of deep-learning technology in agricultural robots, this dissertation presents VineSet, the first public large collection of vine trunk images, with annotations for each trunk. The dataset was built from scratch and contains a total of 9481 real image frames. VineSet is composed of RGB and thermal images of 5 different Douro vineyards, with 952 frames initially collected by the AgRob V16 robot and another 8529 image frames resulting from a vast number of augmentation operations. To check the validity and usefulness of the dataset, this work presents an experimental baseline study using state-of-the-art Deep Learning object detection models together with the Google Tensor Processing Unit. To simplify the creation of future datasets, we propose an assisted labelling procedure that uses our trained models to reduce labelling time, in some cases by a factor of ten per frame.
    Preliminary results support future research on this topic: with VineSet it is possible to train (via transfer learning) existing deep neural networks to an Average Precision (AP) higher than 80% for vine trunk detection; for example, an AP of 84.16% was achieved with SSD MobileNet-V1. The models trained with VineSet also show good results in other environments, such as orchards or forests. Our automatic labelling tool confirms this, reducing annotation time by more than 30% in various areas of agriculture and by more than 70% in vineyards. This dissertation also proposes the segmentation of vine trunks. First, object detection models were used together with VineSet to perform trunk segmentation; to evaluate the different models, a script implementing several semantic segmentation metrics was built. The results showed that the object detection models trained with VineSet were suitable not only for trunk detection but also for trunk segmentation; for example, a DICE Similarity Index (DSI) of 70.78% was achieved with SSD MobileNet-V1. Finally, semantic segmentation was briefly approached: a subset of VineSet images was used to train several models, and results show that semantic segmentation can substitute DL-based object detection models for pixel-based classification if a proper training set is provided. All this work will allow the integration of edge-AI algorithms in SLAM, like Vine-SLAM, which will serve for the localisation and mapping of the robot through natural markers in the vineyards.
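    The DICE Similarity Index used to score the trunk segmentations is a standard overlap metric between binary masks. A minimal numpy sketch (the tie-breaking value for two empty masks is an assumed convention):

```python
import numpy as np

def dice(pred, truth):
    """DICE Similarity Index between two binary masks:
    DSI = 2*|A intersect B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 1.0  # both empty -> perfect match

# Toy 2x3 trunk masks: prediction overlaps ground truth on 2 of 3 pixels each.
pred  = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
# intersection = 2, |pred| = 3, |truth| = 3 -> DSI = 2*2 / (3+3) = 4/6
assert abs(dice(pred, truth) - 4 / 6) < 1e-9
```

A DSI of 70.78%, as reported for SSD MobileNet-V1, thus means roughly 71% pixel-level agreement between the predicted and annotated trunk regions.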