
    Instance segmentation of standing dead trees in dense forest from aerial imagery using deep learning

    "© 2022 The Author(s). Published by Elsevier B.V. on behalf of International Society of Photogrammetry and Remote Sensing (isprs). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)"Mapping standing dead trees, especially, in natural forests is very important for evaluation of the forest's health status, and its capability for storing Carbon, and the conservation of biodiversity. Apparently, natural forests have larger areas which renders the classical field surveying method very challenging, time-consuming, labor-intensive, and unsustainable. Thus, for effective forest management, there is the need for an automated approach that would be cost-effective. With the advent of Machine Learning, Deep Learning has proven to successfully achieve excellent results. This study presents an adjusted Mask R-CNN Deep Learning approach for detecting and segmenting standing dead trees in a mixed dense forest from CIR aerial imagery using a limited (195 images) training dataset. First, transfer learning is considered coupled with the image augmentation technique to leverage the limitation of training datasets. Then, we strategically selected hyperparameters to suit appropriately our model's architecture that fits well with our type of data (dead trees in images). Finally, to assess the generalization capability of our model's performance, a test dataset that was not confronted to the deep neural network was used for comprehensive evaluation. Our model recorded promising results reaching a mean average precision, average recall, and average F1-Score of 0.85, 0.88, and 0.87 respectively, despite our relatively low resolution (20 cm) dataset. Consequently, our model could be used for automation in standing dead tree detection and segmentation for enhanced forest management. This is equally significant for biodiversity conservation, and forest Carbon storage estimation.publishedVersio

    Naval Mine Detection and Seabed Segmentation in Sonar Images with Deep Learning

    Underwater mines are a cost-effective weapon in asymmetric warfare and are commonly used to block shipping lanes and restrict naval operations. Consequently, they threaten commercial and military vessels, disrupt humanitarian aid, and damage marine environments. There is strong international interest in using sonars and AI for mine countermeasures and undersea surveillance. High-resolution imaging sonars are well suited to detecting underwater mines and other targets; compared to other sensors, sonars are more effective in undersea environments with low visibility. This project investigates deep learning algorithms for two important tasks in undersea surveillance: naval mine detection and seabed terrain segmentation. Our goal is to automatically classify the composition of the seabed and localise naval mines. This research uses real sonar data provided by the Defence Science and Technology Group (DSTG). To conduct the experiments, we annotated 150 sonar images for semantic segmentation; the annotation was guided by experts from the DSTG. We also used 152 sonar images with mine detection annotations prepared by members of the Centre for Signal and Information Processing at the University of Wollongong. Our results show that Faster-RCNN achieves the highest performance in object detection. We evaluated transfer learning and data augmentation for object detection; each method improved our detection models' mAP by 11.9% and 16.9% and mAR by 17.8% and 21.1%, respectively. Furthermore, we developed a data augmentation algorithm called Evolutionary Cut-Paste, which yielded a 20.2% increase in performance. For segmentation, we found that highly tuned DeepLabV3 and U-Net++ models perform best. We evaluated various configurations of optimisers, learning rate schedules, and encoder networks for each model architecture. Additionally, model hyperparameters were tuned prior to training using various tests. Finally, we applied Median Frequency Balancing to mitigate model bias towards frequently occurring classes. We favour DeepLabV3 due to its reliable detection of underrepresented classes, as opposed to the more accurate boundaries produced by U-Net++. All of the models satisfied the real-time operation constraint when running on an NVIDIA GTX 1070.
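    Median Frequency Balancing, mentioned above, re-weights the segmentation loss by the ratio of the median class frequency to each class's own frequency. The sketch below illustrates how such weights are typically computed from labelled masks; the class layout and toy masks are made up and do not reflect the DSTG data or the project's actual code.

```python
# Illustrative computation of Median Frequency Balancing weights from
# integer-labelled segmentation masks (class IDs here are hypothetical).
import numpy as np

def median_frequency_weights(masks, num_classes):
    """weight_c = median_freq / freq_c, where freq_c is the pixel frequency of
    class c computed over the images in which class c actually appears."""
    pixel_counts = np.zeros(num_classes)
    image_pixels = np.zeros(num_classes)   # total pixels of images containing c
    for mask in masks:
        for c in np.unique(mask):
            pixel_counts[c] += np.sum(mask == c)
            image_pixels[c] += mask.size
    freq = np.where(image_pixels > 0, pixel_counts / np.maximum(image_pixels, 1), 0.0)
    median_freq = np.median(freq[freq > 0])
    return np.where(freq > 0, median_freq / np.maximum(freq, 1e-12), 0.0)

# Toy example: two 4x4 masks, three seabed classes (0, 1, 2).
m1 = np.zeros((4, 4), dtype=np.int64); m1[0, 0] = 1      # class 1 is rare
m2 = np.zeros((4, 4), dtype=np.int64); m2[:2, :] = 2     # class 2 is moderate
print(median_frequency_weights([m1, m2], num_classes=3))
```

    The resulting per-class weights would typically be passed to a weighted cross-entropy loss so that rare seabed classes contribute more to the training signal.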

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection in indoor environments for robotics. Concretely, it exploits knowledge of the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, used to propose regions of interest in which to find objects, and recursive Bayesian filtering, used to integrate observations over time. The proposal is evaluated on six virtual indoor environments, covering the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves recall and F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as a baseline, at the cost of small time overheads (120 ms) and a precision loss (0.92).
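    To make the homography idea concrete, the sketch below warps a previously detected bounding box into the next frame with a known planar homography, producing a region-of-interest proposal. The homography matrix and box coordinates are invented for illustration; the paper's actual propagation and the recursive Bayesian fusion of observations are not reproduced here.

```python
# Sketch: propagate a detected box from frame t-1 to frame t using a known
# planar homography H (a made-up example), yielding a region-of-interest proposal.
import numpy as np
import cv2

def propagate_box(box_xyxy, H):
    """Warp the four corners of an axis-aligned box with homography H and
    return the axis-aligned bounding box of the warped corners."""
    x1, y1, x2, y2 = box_xyxy
    corners = np.array([[[x1, y1]], [[x2, y1]], [[x2, y2]], [[x1, y2]]], dtype=np.float32)
    warped = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
    xs, ys = warped[:, 0], warped[:, 1]
    return [float(xs.min()), float(ys.min()), float(xs.max()), float(ys.max())]

# Hypothetical camera motion between frames: slight rotation plus a translation.
theta = np.deg2rad(2.0)
H = np.array([[np.cos(theta), -np.sin(theta), 5.0],
              [np.sin(theta),  np.cos(theta), 3.0],
              [0.0,            0.0,           1.0]])
prev_detection = [120, 80, 220, 200]          # box detected in frame t-1
print(propagate_box(prev_detection, H))       # proposed region in frame t
```

    In the paper, proposals of this kind are then combined with fresh detector observations through recursive Bayesian filtering; that fusion step is omitted from the sketch.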

    Novel deep learning architectures for marine and aquaculture applications

    Alzayat Saleh's research applied artificial intelligence and machine learning to autonomously recognise fish and their morphological features in digital images. He created new deep learning architectures that solve computer vision problems specific to the marine and aquaculture context, and found that these techniques can facilitate aquaculture management and environmental protection. Fisheries and conservation agencies can use his results for better monitoring strategies and sustainable fishing practices.

    Pixel-Level Deep Multi-Dimensional Embeddings for Homogeneous Multiple Object Tracking

    The goal of Multiple Object Tracking (MOT) is to locate multiple objects and keep track of their individual identities and trajectories given a sequence of (video) frames. A popular approach to MOT is tracking by detection, which consists of two processing components: detection (identification of objects of interest in individual frames) and data association (connecting data from multiple frames). This work addresses the detection component by introducing a method based on semantic instance segmentation, i.e., assigning labels to all visible pixels such that they are unique among different instances. Modern tracking methods are often built around Convolutional Neural Networks (CNNs) and additional, explicitly defined post-processing steps. This work introduces two detection methods that incorporate multi-dimensional embeddings. We train deep CNNs to produce easily clusterable embeddings for semantic instance segmentation and to enable object detection through pose estimation. The use of embeddings allows the method to identify per-pixel instance membership for both tasks. Our method specifically targets applications that require long-term tracking of homogeneous targets using a stationary camera. Furthermore, the method was developed and evaluated on a livestock tracking application, which presents exceptional challenges that generalized tracking methods are not equipped to solve, largely because contemporary datasets for multiple object tracking lack properties that are specific to livestock environments: a high degree of visual similarity between targets, complex physical interactions, long-term inter-object occlusions, and a fixed-cardinality set of targets. For these reasons, our method is developed and tested with the livestock application in mind; specifically, group-housed pigs are evaluated in this work. On the publicly available dataset, our method reliably detects pigs in a group-housed environment with 99% precision and 95% recall using pose estimation, and achieves 80% accuracy when using semantic instance segmentation at a 50% IoU threshold. Results demonstrate our method's ability to achieve consistent identification and tracking of group-housed livestock, even in cases where the targets are occluded and despite the fact that they lack uniquely identifying features. The pixel-level embeddings used by the proposed method are thoroughly evaluated in order to demonstrate their properties and behaviors when applied to real data. Adviser: Lance C. Pérez
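    One common way to turn "easily clusterable" per-pixel embeddings into instance labels is to cluster them, for example with mean shift. The sketch below applies scikit-learn's MeanShift to synthetic embeddings standing in for a CNN's output; the embedding dimensionality, bandwidth, and data are illustrative assumptions rather than the dissertation's settings.

```python
# Illustrative post-processing step: grouping per-pixel embeddings into
# instances by clustering (here with mean shift). The embeddings are random
# stand-ins for the output of a trained CNN.
import numpy as np
from sklearn.cluster import MeanShift

h, w, d = 32, 32, 8                      # small feature map, 8-D embeddings
rng = np.random.default_rng(0)

# Fake "two instances": pixels in the left half share one embedding centre,
# pixels in the right half another, plus a little noise.
centres = rng.normal(size=(2, d))
emb = np.empty((h, w, d))
emb[:, : w // 2] = centres[0] + 0.05 * rng.normal(size=(h, w // 2, d))
emb[:, w // 2 :] = centres[1] + 0.05 * rng.normal(size=(h, w // 2, d))

flat = emb.reshape(-1, d)
labels = MeanShift(bandwidth=1.0).fit_predict(flat).reshape(h, w)
print("instances found:", len(np.unique(labels)))   # expected: 2
```

    In a real pipeline the cluster labels would form the instance masks that feed the data-association stage of the tracker.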

    Deep Neural Networks and Data for Automated Driving

    This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence. Environment perception for highly automated driving relies heavily on deep neural networks, which face many challenges: How much data do we need for training and testing? How can synthetic data be used to save labeling costs for training? How do we increase robustness and decrease memory usage? And, for inevitably poor conditions, how do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem, particularly for DNNs employed in automated driving: what are useful validation techniques, and what about safety? This book unites the views of academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification, and, last but not least, safety are at its core. The book is unique: its first part provides an extended survey of all the relevant aspects, and its second part contains the detailed technical elaboration of the various questions mentioned above.

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model addresses face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of non-verbal social signals in addition to spoken messages. Social signals have to be interpreted through a recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we recast this theory for the case in which one of the interactants is a robot; the recognition phases performed by the robot and by the human then have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal under consideration.

    Representations and representation learning for image aesthetics prediction and image enhancement

    With the continual improvement of cell phone cameras and the connectivity of mobile devices, we have seen an exponential increase in the number of images captured, stored, and shared on social media. For example, as of July 1st 2017, Instagram had over 715 million registered users who had posted just shy of 35 billion images, representing approximately seven- and nine-fold increases in the number of users and photos on Instagram since 2012. Whether the images are stored on personal computers or reside on social networks (e.g. Instagram, Flickr), their sheer number calls for methods to determine various image properties, such as object presence or appeal, for the purpose of automatic image management and curation. One of the central problems in consumer photography is determining the aesthetic appeal of an image, and it motivates us to explore questions related to understanding aesthetic preferences, image enhancement, and the possibility of using such models on devices with constrained resources. In this dissertation, we present our work on exploring representations and representation learning approaches for aesthetic inference, composition ranking, and their application to image enhancement. Firstly, we discuss early representations, which mainly consisted of expert features, and their potential to enhance Convolutional Neural Networks (CNNs). Secondly, we discuss the ability of resource-constrained CNNs, under different architecture choices (input size and layer depth), to solve various aesthetic inference tasks: binary classification, regression, and image cropping. We show that, if trained for fine-grained aesthetics inference, such models can rival the cropping performance of other aesthetics-based croppers, but they fall short of models trained for composition ranking. Lastly, we discuss our work on exploring and identifying the design choices in training composition ranking functions, with the goal of using them for image composition enhancement.
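    Composition ranking functions of the kind described above are commonly trained with a pairwise ranking objective, in which a preferred composition must score higher than a less preferred one by a margin. The sketch below shows this pattern with a toy scoring network and PyTorch's MarginRankingLoss; the network, image sizes, and data are placeholders, not the dissertation's models.

```python
# Minimal pairwise ranking sketch: a small scoring network is trained so that a
# better-composed image scores higher than a worse one by a fixed margin.
import torch
import torch.nn as nn

class CompositionScorer(nn.Module):
    """Toy stand-in for a CNN that maps an image to a scalar composition score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        return self.head(self.features(x).flatten(1)).squeeze(1)

scorer = CompositionScorer()
loss_fn = nn.MarginRankingLoss(margin=0.5)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-3)

# Dummy batch: "better" and "worse" crops of the same scenes.
better = torch.rand(4, 3, 64, 64)
worse = torch.rand(4, 3, 64, 64)
target = torch.ones(4)    # +1 means the first input should rank higher
loss = loss_fn(scorer(better), scorer(worse), target)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```

    A ranker trained this way can then be swept over candidate crops of an image, with the highest-scoring crop used for composition enhancement.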