
    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing DL models.

    Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing.

    TractorEYE: Vision-based Real-time Detection for Autonomous Vehicles in Agriculture

    Agricultural vehicles such as tractors and harvesters have for decades been able to navigate automatically and more efficiently using commercially available products such as auto-steering and tractor-guidance systems. However, a human operator is still required inside the vehicle to ensure the safety of the vehicle and especially of its surroundings, such as humans and animals. To get fully autonomous vehicles certified for farming, computer vision algorithms and sensor technologies must detect obstacles with human-level performance or better. Furthermore, detections must run in real-time to allow vehicles to actuate and avoid collisions.

    This thesis proposes a detection system (TractorEYE), a dataset (FieldSAFE), and procedures to fuse information from multiple sensor technologies to improve detection of obstacles and to generate a map. TractorEYE is a multi-sensor detection system for autonomous vehicles in agriculture. The multi-sensor system consists of three hardware-synchronized and registered sensors (stereo camera, thermal camera and multi-beam lidar) mounted on/in a ruggedized and water-resistant casing. Algorithms have been developed to run a total of six detection algorithms (four for the RGB camera, one for the thermal camera and one for the multi-beam lidar) and to fuse detection information in a common format using either 3D positions or inverse sensor models. A GPU-powered computational platform is able to run the detection algorithms online.

    For the RGB camera, a deep learning algorithm, DeepAnomaly, is proposed to perform real-time anomaly detection of distant, heavily occluded and unknown obstacles in agriculture. Compared to a state-of-the-art object detector, Faster R-CNN, DeepAnomaly detects humans better and at longer ranges (45-90 m) in an agricultural use case, with a smaller memory footprint and 7.3-times faster processing. The low memory footprint and fast processing make DeepAnomaly suitable for real-time applications running on an embedded GPU.

    FieldSAFE is a multi-modal dataset for detection of static and moving obstacles in agriculture. The dataset includes synchronized recordings from an RGB camera, stereo camera, thermal camera, 360-degree camera, lidar and radar. Precise localization and pose are provided using IMU and GPS. Ground truth for static and moving obstacles (humans, mannequin dolls, barrels, buildings, vehicles, and vegetation) is available as an annotated orthophoto, with GPS coordinates for moving obstacles. Detection information from multiple detection algorithms and sensors is fused into a map using inverse sensor models and occupancy grid maps.

    This thesis presents several scientific contributions to the state of the art in perception for autonomous tractors, including a dataset, a sensor platform, detection algorithms and procedures for multi-sensor fusion. Furthermore, important engineering contributions to autonomous farming vehicles are presented, such as easily applicable, open-source software packages and algorithms that have been demonstrated in an end-to-end real-time detection system. The contributions of this thesis have demonstrated, addressed and solved critical issues in utilizing camera-based perception systems that are essential to make autonomous vehicles in agriculture a reality.
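    As a rough illustration of the fusion step described above, the following Python sketch fuses per-sensor detections into an occupancy grid using log-odds updates derived from an inverse sensor model. Grid dimensions, resolution, and per-sensor confidence values are invented for illustration; the thesis's actual models and parameters are not reproduced here.

```python
import numpy as np

# Hypothetical sketch of occupancy-grid fusion with inverse sensor models.
# Grid size, resolution, and per-sensor confidences are illustrative
# assumptions, not values taken from the thesis.

GRID_SIZE = (200, 200)          # grid cells
RESOLUTION = 0.5                # metres per cell
log_odds = np.zeros(GRID_SIZE)  # log-odds 0 == occupancy prior of 0.5

def integrate_detection(x_m, y_m, p_occupied):
    """Fuse one detection at world position (x_m, y_m) into the grid."""
    i, j = int(x_m / RESOLUTION), int(y_m / RESOLUTION)
    if 0 <= i < GRID_SIZE[0] and 0 <= j < GRID_SIZE[1]:
        # Inverse sensor model: detection confidence -> log-odds increment.
        log_odds[i, j] += np.log(p_occupied / (1.0 - p_occupied))

# Detections from different sensors carry different confidences,
# e.g. a lidar return vs. a thermal-camera human detection.
integrate_detection(12.0, 40.0, p_occupied=0.9)  # lidar
integrate_detection(12.2, 40.3, p_occupied=0.7)  # thermal camera

# Recover occupancy probabilities for mapping or collision checks.
occupancy = 1.0 - 1.0 / (1.0 + np.exp(log_odds))
```

    Because log-odds updates are additive, evidence from any number of sensors accumulates in the same grid, which is what makes this representation convenient for multi-sensor fusion.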

    Lidar-based Obstacle Detection and Recognition for Autonomous Agricultural Vehicles

    Today, agricultural vehicles are available that can drive autonomously and follow exact route plans more precisely than human operators. Combined with advancements in precision agriculture, autonomous agricultural robots can reduce manual labor, improve workflow, and optimize yield. However, as of today, human operators are still required for monitoring the environment and acting upon potential obstacles in front of the vehicle. To eliminate this need, safety must be ensured by accurate and reliable obstacle detection and avoidance systems.

    In this thesis, lidar-based obstacle detection and recognition in agricultural environments have been investigated. A rotating multi-beam lidar generating 3D point clouds was used for point-wise classification of agricultural scenes, while multi-modal fusion with cameras and radar was used to increase performance and robustness. Two research perception platforms were presented and used for data acquisition. The proposed methods were all evaluated on recorded datasets that represented a wide range of realistic agricultural environments and included both static and moving obstacles.

    For 3D point cloud classification, two methods were proposed for handling density variations during feature extraction. One method outperformed a frequently used generic 3D feature descriptor, whereas the other showed promising preliminary results using deep learning on 2D range images. For multi-modal fusion, four methods were proposed for combining lidar with color camera, thermal camera, and radar. Gradual improvements in classification accuracy were seen as spatial, temporal, and multi-modal relationships were introduced in the models. Finally, occupancy grid mapping was used to fuse and map detections globally, and runtime obstacle detection was applied to mapped detections along the vehicle path, thus simulating an actual traversal.

    The proposed methods serve as a first step towards full autonomy for agricultural vehicles. The study has thus shown that recent advancements in autonomous driving can be transferred to the agricultural domain when accurate distinctions are made between obstacles and processable vegetation. Future research in the domain has further been facilitated with the release of the multi-modal obstacle dataset, FieldSAFE.
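    To make the 2D range-image representation mentioned above concrete, here is a minimal Python sketch that projects a 3D lidar point cloud onto a range image whose rows correspond to beam elevation and whose columns correspond to azimuth. The beam count, angular resolution, and field of view are invented example values, not the sensor parameters used in the thesis.

```python
import numpy as np

def pointcloud_to_range_image(points: np.ndarray,
                              n_beams: int = 32,
                              h_res_deg: float = 0.2,
                              v_fov_deg: tuple = (-30.0, 10.0)) -> np.ndarray:
    """Project an (N, 3) lidar point cloud into a 2D range image.

    Rows index beam elevation angles, columns index azimuth; the cell
    value is the measured range in metres (0 where no return fell).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.degrees(np.arctan2(y, x))                    # (-180, 180]
    elevation = np.degrees(np.arcsin(z / np.maximum(r, 1e-6)))

    n_cols = int(360.0 / h_res_deg)
    cols = ((azimuth + 180.0) / h_res_deg).astype(int) % n_cols
    v_lo, v_hi = v_fov_deg
    rows = ((elevation - v_lo) / (v_hi - v_lo) * (n_beams - 1)).astype(int)
    rows = np.clip(rows, 0, n_beams - 1)

    image = np.zeros((n_beams, n_cols), dtype=np.float32)
    image[rows, cols] = r   # keeps the last range written per cell
    return image
```

    The appeal of this projection is that a dense 2D grid can be fed directly to standard convolutional networks, sidestepping the density variations of raw 3D point clouds.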

    Semantic location extraction from crowdsourced data

    Crowdsourced Data (CSD) has recently received increased attention in many application areas, including disaster management. Convenience of production and use, data currency and abundance are some of the key reasons for this high interest. Conversely, quality issues like incompleteness, credibility and relevance prevent the direct use of such data in important applications like disaster management. Moreover, the availability of location information in CSD is problematic, as it remains very low on many crowdsourced platforms such as Twitter. Also, the recorded location mostly relates to the mobile device or user location and often does not represent the event location. In CSD, the event location is discussed descriptively in the comments, in addition to the recorded location (which is generated by means of the mobile device's GPS or the mobile communication network). This study attempts to semantically extract CSD location information with the help of an ontological gazetteer and other available resources. Tweets from the 2011 Queensland flood and Ushahidi Crowd Map data were semantically analysed to extract location information with the support of the Queensland Gazetteer, which was converted to an ontological gazetteer, and a global gazetteer. Preliminary results show that the use of ontologies and semantics can improve the accuracy of place name identification in CSD and the process of location information extraction.
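    The gazetteer-lookup idea can be illustrated with a toy example: the Python sketch below scans tweet text for word n-grams that appear in a small place-name dictionary. The gazetteer entries, tweet, and matching rule are all invented for illustration; the study's ontological gazetteer encodes far richer semantics than a flat dictionary.

```python
# Toy gazetteer: place name -> coordinates and feature type (invented data).
GAZETTEER = {
    "brisbane": {"lat": -27.47, "lon": 153.03, "type": "city"},
    "ipswich": {"lat": -27.61, "lon": 152.76, "type": "city"},
    "bremer river": {"lat": -27.64, "lon": 152.75, "type": "river"},
}

def extract_locations(text: str, max_ngram: int = 3):
    """Return gazetteer matches for word n-grams found in the text."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    matches = []
    for n in range(max_ngram, 0, -1):           # prefer longer place names
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in GAZETTEER:
                matches.append((phrase, GAZETTEER[phrase]))
    return matches

tweet = "Bremer River rising fast, roads cut between Ipswich and Brisbane"
print(extract_locations(tweet))
# [('bremer river', ...), ('ipswich', ...), ('brisbane', ...)]
```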

    Multimodal headpose estimation and applications

    This thesis presents new research into human head pose estimation and its applications in multi-modal data. We develop new methods for head pose estimation spanning RGB-D Human Computer Interaction (HCI) to far away "in the wild" surveillance-quality data. We present a state-of-the-art solution for both head detection and head pose estimation through a new end-to-end Convolutional Neural Network architecture that reuses all of the computation for detection and pose estimation. In contrast to prior work, our method successfully spans close-up HCI to low-resolution surveillance data and is cross-modality, operating on both RGB and RGB-D data. We further address the problems of the limited amount of standard data and of varying annotation quality through semi-supervised learning and novel data augmentation. (This latter contribution also finds application in the domain of life sciences.) We report the highest accuracy by a large margin: a 60% improvement; and demonstrate leading performance on multiple standardized datasets. In HCI we reduce the angular error by 40% relative to the previously reported literature. Furthermore, by defining a probabilistic spatial gaze model from the head pose, we show applications in human-human and human-scene interaction understanding. We present state-of-the-art results on the standard interaction datasets. A new metric to model "social mimicry" through the temporal correlation of the head pose signal is contributed and shown to be valid qualitatively and intuitively. As an application in surveillance, it is shown that with the robust head pose signal as a prior, state-of-the-art results in tracking under occlusion can be achieved using a Kalman filter. This model is named the Intentional Tracker, and it improves visual tracking metrics by up to 15%.

    We also apply the ALICE loss, developed for the end-to-end detection and classification, to dense classification of underwater coral reef imagery. The objective of this work is to solve the challenging task of recognizing and segmenting underwater coral imagery in the wild with sparse point-based ground truth labelling. To achieve this, we propose an integrated Fully Convolutional Neural Network (FCNN) and Fully-Connected Conditional Random Field (CRF) based classification and segmentation algorithm. Our contributions lie in four major areas. First, we show that multi-scale crop-based training is useful for learning the initial weights in the canonical one-class classification problem. Second, we propose a modified ALICE loss for training the FCNN on sparse labels with class imbalance and establish its significance empirically. Third, we show that by artificially enhancing the point labels to small regions based on a class distance transform, we can improve the classification accuracy further. Fourth, we improve the segmentation results with fully connected CRFs by using a bilateral message passing prior. We improve upon state-of-the-art results on all publicly available datasets by a significant margin.
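    As an illustration of the third contribution above (growing sparse point labels into small regions via a distance transform), the following Python sketch assigns every pixel within a radius of an annotated point the class of its nearest point. The array shapes, class ids, and radius are invented example values, not those used in the thesis.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def expand_point_labels(labels: np.ndarray, radius: float) -> np.ndarray:
    """Expand non-zero point labels into small discs of the same class.

    labels: (H, W) int array, 0 = unlabelled, >0 = class id at a point.
    Returns an (H, W) array where each pixel within `radius` of a point
    takes that nearest point's class id; everything else stays 0.
    """
    # Distance to the nearest labelled pixel, plus that pixel's indices.
    dist, (ri, ci) = distance_transform_edt(labels == 0, return_indices=True)
    expanded = labels[ri, ci]      # propagate nearest labelled class
    expanded[dist > radius] = 0    # keep only a small disc per point
    return expanded

# Two point annotations on an otherwise unlabelled image (toy example).
sparse = np.zeros((100, 100), dtype=np.int32)
sparse[20, 30] = 1   # e.g. a coral class
sparse[70, 80] = 2   # e.g. a background class
dense = expand_point_labels(sparse, radius=5.0)
```

    Training on these small regions instead of single pixels gives the FCNN more supervision per annotation, which is the intuition behind the reported accuracy improvement.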

    Autonomous Vehicles

    This edited volume, Autonomous Vehicles, is a collection of reviewed and relevant research chapters, offering a comprehensive overview of recent developments in the field of vehicle autonomy. The book comprises nine chapters authored by various researchers and edited by an expert active in the field of study. Each chapter is complete in itself, but all are united under a common research topic. This publication aims to provide a thorough overview of the latest research efforts by international authors, to open possible new research paths for further novel developments, and to inspire younger generations to pursue relevant academic studies and professional careers within the autonomous vehicle field.

    Mutual Segmentation of Objects of Interest in Multispectral Stereo Image Sequences

    The automated video surveillance systems currently deployed around the world are still quite far, in terms of capabilities, from the ones that have inspired countless science fiction works over the past few years. One of the reasons behind this lag in development is the lack of low-level tools that allow raw image data to be processed directly in the field. This preprocessing is used to reduce the amount of information transferred to centralized servers that have to interpret the captured visual content for further use. The identification of objects of interest in raw images based on motion is an example of a preprocessing step that might be required by a large system. However, in a surveillance context, the preprocessing method can seldom rely on an appearance or shape model to recognize these objects, since their exact nature cannot be known in advance. This complicates the elaboration of low-level image processing methods. In this thesis, we present different methods that detect and segment objects of interest from video sequences in a fully unsupervised fashion.
    We first explore monocular video segmentation approaches based on background subtraction. These approaches are based on the idea that the background of an observed scene can be modeled over time, and that any drastic variation in appearance that is not predicted by the model actually reveals the presence of an intruding object. The main challenge that must be met by background subtraction methods is that their model must be able to adapt to dynamic changes in scene conditions. The designed methods must also remain sensitive to the emergence of new objects of interest despite this increased robustness to predictable dynamic scene behaviors. We propose two methods that introduce different modeling techniques to improve background appearance description in an illumination-invariant way, and that analyze local background persistence to improve the detection of temporarily stationary objects. We also introduce new feedback mechanisms used to adjust the hyperparameters of our methods based on the observed dynamics of the scene and the quality of the generated output.
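    In the same spirit as the background subtraction approaches explored here, the following deliberately simple Python sketch maintains a running-average background model and applies a feedback rule that nudges the decision threshold according to scene dynamism. The learning rate, threshold bounds, and feedback rule are illustrative assumptions, not the thesis's actual update equations.

```python
import numpy as np

class RunningAverageSubtractor:
    """Toy background subtractor for grayscale frames of shape (H, W)."""

    def __init__(self, first_frame, alpha=0.02, threshold=30.0):
        self.model = first_frame.astype(np.float32)  # background estimate
        self.alpha = alpha                           # adaptation rate
        self.threshold = threshold                   # foreground cutoff

    def apply(self, frame):
        frame = frame.astype(np.float32)
        foreground = np.abs(frame - self.model) > self.threshold

        # Update the model only where the scene looks like background, so
        # temporarily stationary objects are not absorbed too quickly.
        bg = ~foreground
        self.model[bg] += self.alpha * (frame[bg] - self.model[bg])

        # Feedback: raise the threshold in dynamic scenes (many flagged
        # pixels), lower it again when the scene calms down.
        churn = foreground.mean()
        self.threshold = float(np.clip(
            self.threshold + 10.0 * (churn - 0.05), 10.0, 80.0))
        return foreground
```

    Even this crude feedback loop shows the trade-off the thesis addresses: adapting fast enough to absorb illumination changes while staying sensitive to genuinely new objects.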

    Deep Learning Methods for Remote Sensing

    Remote sensing is a field where important physical characteristics of an area are extracted from emitted radiation, generally captured by satellite cameras, sensors onboard aerial vehicles, etc. Captured data help researchers develop solutions to sense and detect various phenomena such as forest fires, flooding, changes in urban areas, crop diseases, and soil moisture. The recent impressive progress in artificial intelligence (AI) and deep learning has sparked innovations in technologies, algorithms, and approaches, and has led to results that were unachievable until recently in multiple areas, among them remote sensing. This book consists of sixteen peer-reviewed papers covering new advances in the use of AI for remote sensing.

    Context-Enabled Visualization Strategies for Automation-Enabled Human-in-the-Loop Inspection Systems to Enhance the Situation Awareness of Windstorm Risk Engineers

    An insurance loss prevention survey, specifically a windstorm risk inspection survey, is the process of investigating potential damage associated with a building or structure in the event of an extreme weather condition such as a hurricane or tornado. Traditionally, the risk inspection process is highly subjective and depends on the skills of the engineer performing it. This dissertation investigates the sensemaking process of risk engineers while performing risk inspections, with special focus on the various factors influencing it. This research then investigates how context-based visualization strategies enhance the situation awareness and performance of windstorm risk engineers.

    An initial study investigated the sensemaking process and situation awareness requirements of windstorm risk engineers. The data-frame theory of sensemaking was used as the framework for this study. Ten windstorm risk engineers were interviewed, and the data collected were analyzed following an inductive thematic approach. The themes that emerged from the data explained the sensemaking process of risk engineers, the process of making sense of contradicting information, the importance of experience level, the internal and external biases influencing the inspection process, the difficulty of developing mental models, and potential technology interventions. More recently, human-in-the-loop systems such as drones have been used to improve the efficiency of windstorm risk inspection. This study provides recommendations to guide the design of such systems to support the sensemaking process and situation awareness of windstorm visual risk inspection.

    The second study investigated the effect of context-based visualization strategies on the situation awareness of windstorm risk engineers. More specifically, it investigated how different types of information contribute to the three levels of situation awareness. Following a between-subjects study design, 65 civil/construction engineering students completed the study. Checklist-based and predictive-display-based decision aids were tested and found to be effective in supporting the situation awareness requirements as well as the performance of windstorm risk engineers. However, the predictive display only helped with certain tasks, such as understanding the interaction among different components on the rooftop; for the remaining tasks, the checklist alone was sufficient. Moreover, the decision aids did not place any additional cognitive demand on the participants. This study helped us understand the advantages and disadvantages of the decision aids tested.

    The final study evaluated the transfer-of-training effect of the checklist-based and predictive-display-based decision aids. One week after the previous study, participants completed a follow-up study without any decision aids. The performance and situation awareness of participants in the checklist and predictive display groups did not change significantly from the first trial to the second. However, the performance and situation awareness of participants in the control condition improved significantly in the second trial. Participants attributed this to their exposure to the SAGAT questionnaire in the first study: they knew what issues to look for and what tasks needed to be completed in the simulation. The confounding effect of SAGAT questionnaires needs to be studied in future research efforts.