Segmentation and detection of Woody Trunks using Deep Learning for Agricultural Robotics

Abstract

This project aims to support the implementation of image processing algorithms in agricultural robots, so that they are robust to factors such as weather conditions and vineyard terrain irregularities, and efficient enough to run on small robots with low energy consumption. At the same time, Deep Learning models have become more complex, so not all processors can handle them. Developing a real-time detection system for low-power processors is therefore demanding, because annotated real datasets of vine trunks and expeditious tools to support this work are lacking. To support the deployment of deep-learning technology in agricultural robots, this dissertation presents the first public dataset of vine trunk images, called VineSet, with annotations for each trunk. The dataset was built from scratch and has a total of 9481 images of 5 different Douro vineyards, resulting from the images initially collected by AgRob V16 and various augmentation operations. The dataset was then used to train different state-of-the-art Deep Learning object detection models, together with the Google Tensor Processing Unit. In parallel, this work presents an assisted labelling procedure that uses our trained models to reduce the time spent on labelling when creating new datasets. This dissertation also proposes the segmentation of vine trunks, using object detection models and semantic segmentation models. In this way, all the work done will allow the integration of edge-AI algorithms in SLAM, like Vine-SLAM, which will serve for the localisation and mapping of the robot through natural markers in the vineyards.

Agricultural robots need image processing algorithms that are reliable under all weather conditions and computationally efficient. Furthermore, several limitations may arise that affect performance, such as the characteristic irregularities of vineyard terrain or overfitting in the training of neural networks.
In parallel, Deep Learning models have grown more complex and more computationally demanding, so not all processors can handle them efficiently. Developing a system with real-time performance for low-power processors is therefore demanding and remains a research and development challenge, because annotated real datasets and expeditious tools to support this work are lacking. To support the deployment of deep-learning technology in agricultural robots, this dissertation presents VineSet, the first large public collection of vine trunk images. The dataset was built from scratch, with a total of 9481 real image frames and vine trunk annotations for each of them. VineSet is composed of RGB and thermal images of 5 different Douro vineyards, with 952 frames initially collected by the AgRob V16 robot and a further 8529 frames resulting from a large number of augmentation operations. To check the validity and usefulness of VineSet, this work presents an experimental baseline study using state-of-the-art Deep Learning models together with the Google Tensor Processing Unit. To simplify the task of augmentation in the creation of future datasets, we propose an assisted labelling procedure that uses our trained models to reduce the labelling time, in some cases making it ten times faster per frame. This dissertation presents preliminary results to support future research on this topic. For example, VineSet makes it possible to train existing deep neural networks, by transfer learning, to an Average Precision (AP) higher than 80% for vineyard trunk detection; an AP of 84.16% was achieved with SSD MobileNet-V1. Also, the models trained with VineSet present good results in other environments such as orchards or forests.
Our automatic labelling tool proves this, reducing annotation time by more than 30% in various areas of agriculture and by more than 70% in vineyards. In this dissertation, we also propose the segmentation of vine trunks. First, object detection models were used together with VineSet to perform trunk segmentation. To evaluate the performance of the different models, a script implementing several semantic segmentation metrics was built. The results showed that the object detection models trained with VineSet were suitable not only for trunk detection but also for trunk segmentation. For example, a Dice Similarity Index (DSI) of 70.78% was achieved with SSD MobileNet-V1. Finally, semantic segmentation was also briefly explored: a subset of the VineSet images was used to train several models. The results show that semantic segmentation can substitute DL-based object detection models for pixel-based classification if a proper training set is provided. In this way, all the work done will allow the integration of edge-AI algorithms in SLAM, like Vine-SLAM, which will serve for the localisation and mapping of the robot through natural markers in the vineyards.
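
As an illustration of the kind of metric mentioned above, the following is a minimal sketch of how a Dice Similarity Index could be computed when the output of an object detector is treated as a segmentation. It assumes that predicted bounding boxes are simply rasterised into a binary mask and compared with a ground-truth mask; the function names `boxes_to_mask` and `dice_similarity`, the box format and the example coordinates are hypothetical and are not taken from the dissertation's evaluation script.

```python
import numpy as np


def boxes_to_mask(boxes, height, width):
    """Rasterise axis-aligned boxes into a binary mask.

    Assumed (hypothetical) box format: (x_min, y_min, x_max, y_max) in pixels,
    with the max coordinates exclusive.
    """
    mask = np.zeros((height, width), dtype=bool)
    for x_min, y_min, x_max, y_max in boxes:
        mask[y_min:y_max, x_min:x_max] = True
    return mask


def dice_similarity(pred_mask, true_mask):
    """Dice Similarity Index: 2 * |A ∩ B| / (|A| + |B|)."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    total = pred_mask.sum() + true_mask.sum()
    if total == 0:
        # Both masks are empty; treated here as perfect agreement (a convention).
        return 1.0
    intersection = np.logical_and(pred_mask, true_mask).sum()
    return 2.0 * intersection / total


# Toy example on a 10x10 image: two 4-pixel-wide "trunks" overlapping by half.
true_mask = boxes_to_mask([(2, 0, 6, 10)], height=10, width=10)
pred_mask = boxes_to_mask([(4, 0, 8, 10)], height=10, width=10)
print(dice_similarity(pred_mask, true_mask))  # 0.5
```

In this toy case each mask covers 40 pixels and they share 20, so the index is 2 × 20 / 80 = 0.5. A box-shaped mask always includes some background around a trunk, which is one reason a detector scored this way would be expected to reach a lower DSI than a pixel-level segmentation model trained on well-annotated masks.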
