Flood dynamics derived from video remote sensing
Flooding is by far the most pervasive natural hazard, and the human impacts of floods are expected to worsen in the coming decades due to climate change. Hydraulic models are a key tool for understanding flood dynamics and play a pivotal role in unravelling the processes that occur during a flood event, including inundation flow patterns and velocities. In the realm of river basin dynamics, video remote sensing is emerging as a transformative tool that can offer insights into flow dynamics and, together with other remotely sensed data, has the potential to be used to estimate discharge. Moreover, the integration of video remote sensing data with hydraulic models offers a significant opportunity to enhance the predictive capacity of these models.
Hydraulic models are traditionally built with accurate terrain, flow and bathymetric data and are often calibrated and validated using observed data to obtain meaningful and actionable model predictions. Data for accurately calibrating and validating hydraulic models are not always available, leaving the assessment of the predictive capabilities of some models deployed in flood risk management in question. Recent advances in remote sensing have heralded the availability of vast, high-resolution video datasets. The parallel evolution of computing capabilities, coupled with advances in artificial intelligence, is enabling the processing of data at unprecedented scales and complexities, allowing us to glean meaningful insights from datasets that can be integrated with hydraulic models. The aims of the research presented in this thesis were twofold. The first aim was to evaluate and explore the potential applications of video from air- and space-borne platforms to comprehensively calibrate and validate two-dimensional hydraulic models. The second aim was to estimate river discharge using satellite video combined with high-resolution topographic data. In the first of three empirical chapters, non-intrusive image velocimetry techniques were employed to estimate river surface velocities in a rural catchment. For the first time, a 2D hydraulic model was fully calibrated and validated using velocities derived from Unpiloted Aerial Vehicle (UAV) image velocimetry approaches. This highlighted the value of these data in mitigating the limitations associated with traditional data sources used in parameterizing two-dimensional hydraulic models. This finding inspired the subsequent chapter, where river surface velocities, derived using Large Scale Particle Image Velocimetry (LSPIV), and flood extents, derived using deep neural network-based segmentation, were extracted from satellite video and used to rigorously assess the skill of a two-dimensional hydraulic model. Harnessing the ability of deep neural networks to learn complex features and deliver accurate and contextually informed flood segmentation, the potential value of satellite video for validating two-dimensional hydraulic model simulations is demonstrated. In the final empirical chapter, the convergence of satellite video imagery and high-resolution topographic data bridges the gap between visual observations and quantitative measurements by enabling the direct extraction of velocities from video imagery, which is used to estimate river discharge. Overall, this thesis demonstrates the significant potential of emerging video-based remote sensing datasets and offers approaches for integrating these data into hydraulic modelling and discharge estimation practice. The incorporation of LSPIV techniques into flood modelling workflows signifies a methodological progression, especially in areas lacking robust data collection infrastructure. Satellite video remote sensing heralds a major step forward in our ability to observe river dynamics in real time, with potentially significant implications in the domain of flood modelling science.
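To make the velocimetry step concrete, the following is a minimal sketch of the cross-correlation at the core of LSPIV-style techniques, not the thesis's actual pipeline: a window from one video frame is matched against shifted windows in the next frame, and the best-matching pixel displacement, scaled by the ground sampling distance and frame interval, yields a surface velocity estimate. The frames, window sizes, and scale factors below are illustrative assumptions.

```python
# Minimal LSPIV-style sketch: estimate surface velocity by cross-correlating
# an interrogation window between two consecutive frames. Frame arrays,
# window size, and the pixel-to-metre scale are illustrative assumptions.
import numpy as np

def window_displacement(frame_a, frame_b, y, x, win=32, search=16):
    """Return (dy, dx) pixel displacement of the window at (y, x)."""
    template = frame_a[y:y + win, x:x + win].astype(float)
    template -= template.mean()
    best, best_dy, best_dx = -np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            patch = frame_b[y + dy:y + dy + win, x + dx:x + dx + win].astype(float)
            patch -= patch.mean()
            score = (template * patch).sum()  # zero-mean cross-correlation
            if score > best:
                best, best_dy, best_dx = score, dy, dx
    return best_dy, best_dx

# Usage: displacement per frame interval, scaled by the ground sampling
# distance, gives a surface velocity estimate in m/s.
frame_a = np.random.rand(256, 256)            # stand-ins for two video frames
frame_b = np.roll(frame_a, (0, 3), axis=(0, 1))
dy, dx = window_displacement(frame_a, frame_b, 100, 100)
gsd, dt = 0.05, 1 / 25                        # m per pixel, s per frame (assumed)
print("velocity ~", (dx * gsd) / dt, "m/s")
```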
Multitemporal Feature-Level Fusion on Hyperspectral and LiDAR Data in the Urban Environment
Synthetic Aperture Radar (SAR) Meets Deep Learning
This reprint focuses on applications that combine synthetic aperture radar (SAR) with deep learning technology, and aims to further promote the development of intelligent SAR image interpretation. A synthetic aperture radar is an important active microwave imaging sensor, whose all-day, all-weather imaging capability gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is therefore valuable and meaningful to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has driven significant progress in the computer vision community, e.g., in face recognition, autonomous driving, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction, which can greatly improve the performance of various applications. This reprint provides a platform for researchers to tackle these significant challenges and present their innovative and cutting-edge research results when applying deep learning to SAR, in various manuscript types, e.g., articles, letters, reviews and technical reports.
Sea Ice Extraction via Remote Sensed Imagery: Algorithms, Datasets, Applications and Challenges
Deep learning, a dominant technique in artificial intelligence, has transformed image understanding over the past decade, and as a consequence the sea ice extraction (SIE) problem has entered a new era. We present a comprehensive review of four important aspects of SIE: algorithms, datasets, applications, and future trends. Our review focuses on research published from 2016 to the present, with a specific focus on deep learning-based approaches in the last five years. We divide the surveyed algorithms into three categories: classical image segmentation approaches, machine learning-based approaches, and deep learning-based methods. We review the accessible sea ice datasets, including SAR-based datasets, optical datasets, and others. The applications are presented in four areas: climate research, navigation, geographic information system (GIS) production, and others. The review also provides insightful observations and inspiring future research directions.
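As a concrete instance of the classical image segmentation category surveyed above, here is a minimal, self-contained sketch of Otsu thresholding applied to a single-band intensity image, assuming brighter returns correspond to ice; the synthetic data and scaling are illustrative, not drawn from any dataset in the review.

```python
# A minimal sketch of the classical-segmentation category: Otsu thresholding
# on a single-band intensity image to separate ice from open water.
import numpy as np

def otsu_threshold(img, bins=256):
    """Return the threshold that maximises between-class variance."""
    hist, edges = np.histogram(img.ravel(), bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                         # class-0 (water) weight
    w1 = 1.0 - w0                             # class-1 (ice) weight
    mu0 = np.cumsum(p * centers) / np.clip(w0, 1e-12, None)
    mu1 = ((p * centers).sum() - np.cumsum(p * centers)) / np.clip(w1, 1e-12, None)
    sigma_b = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
    return centers[np.argmax(sigma_b)]

# Synthetic stand-in: dark water pixels and bright ice pixels (assumed).
img = np.concatenate([np.random.normal(0.2, 0.05, 5000),
                      np.random.normal(0.7, 0.05, 5000)])
t = otsu_threshold(img)
ice_mask = img > t                            # binary sea-ice mask
print(f"threshold={t:.2f}, ice fraction={ice_mask.mean():.2f}")
```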
Visual Guidance for Unmanned Aerial Vehicles with Deep Learning
Unmanned Aerial Vehicles (UAVs) have been widely applied in the military and civilian domains. In recent years, the operation mode of UAVs has been evolving from teleoperation to autonomous flight. In order to fulfill the goal of autonomous flight, a reliable guidance system is essential. Since the combination of the Global Positioning System (GPS) and an Inertial Navigation System (INS) cannot sustain autonomous flight in situations where GPS is degraded or unavailable, using computer vision as a primary method for UAV guidance has been widely explored. Moreover, GPS provides the robot with no information about the presence of obstacles.
Stereo cameras have a complex architecture and need a minimum baseline to generate a disparity map. By contrast, monocular cameras are simple and require fewer hardware resources. Benefiting from state-of-the-art Deep Learning (DL) techniques, especially Convolutional Neural Networks (CNNs), a monocular camera is sufficient to extrapolate mid-level visual representations such as depth maps and optical flow (OF) maps from the environment. Therefore, the objective of this thesis is to develop a real-time visual guidance method for UAVs in cluttered environments using a monocular camera and DL.
The three major tasks performed in this thesis are investigating the development of DL techniques and monocular depth estimation (MDE), developing real-time CNNs for MDE, and developing visual guidance methods on the basis of the developed MDE system. A comprehensive survey is conducted, which covers Structure from Motion (SfM)-based methods, traditional handcrafted feature-based methods, and state-of-the-art DL-based methods. More importantly, it also investigates the application of MDE in robotics. Based on the survey, two CNNs for MDE are developed. In addition to promising accuracy, these two CNNs run at high frame rates (126 fps and 90 fps, respectively) on a single modest-power Graphics Processing Unit (GPU).
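To illustrate the general shape of a lightweight MDE network of the kind described, the following is a minimal encoder-decoder sketch in PyTorch; the layer sizes are illustrative assumptions, not the thesis architecture.

```python
# A minimal sketch of a lightweight encoder-decoder CNN for monocular depth
# estimation; layer sizes are illustrative stand-ins.
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(          # downsample RGB by 4x
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(          # upsample back to input size
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))   # one depth value per pixel

net = TinyDepthNet().eval()
with torch.no_grad():
    depth = net(torch.randn(1, 3, 192, 256))   # mock camera frame
print(depth.shape)                             # torch.Size([1, 1, 192, 256])
```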
As regards the third task, the visual guidance for UAVs is first developed on top of the designed MDE networks. To improve the robustness of UAV guidance, OF maps are integrated into the developed visual guidance method. A cross-attention module is applied to fuse the features learned from the depth maps and OF maps. The fused features are then passed through a deep reinforcement learning (DRL) network to generate the policy for guiding the flight of the UAV. Additionally, a simulation framework is developed that integrates AirSim, Unreal Engine and PyTorch. The effectiveness of the developed visual guidance method is validated through extensive experiments in the simulation framework.
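A minimal sketch of the cross-attention fusion idea follows, assuming flattened feature tokens from the depth and optical-flow branches; the dimensions and the use of nn.MultiheadAttention are assumptions, not the thesis design.

```python
# Depth-branch tokens attend to optical-flow tokens; the fused result keeps
# a residual connection back to the depth features.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, depth_feats, flow_feats):
        # depth features are queries; flow features act as keys and values
        fused, _ = self.attn(depth_feats, flow_feats, flow_feats)
        return self.norm(depth_feats + fused)   # residual connection

depth_feats = torch.randn(2, 64, 128)  # (batch, tokens, dim) from depth branch
flow_feats = torch.randn(2, 64, 128)   # same shape from optical-flow branch
fused = CrossAttentionFusion()(depth_feats, flow_feats)
print(fused.shape)                     # torch.Size([2, 64, 128])
```

The fused tokens would then feed the DRL policy network described above.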
Occlusion-Ordered Semantic Instance Segmentation
Conventional semantic instance segmentation methods offer a segmentation mask for each object instance in an image along with its semantic class label. These methods excel at distinguishing instances, whether they belong to the same class or different classes, providing valuable information about the scene. However, they lack depth-related information and are thus unable to capture the 3D geometry of the scene.
One option to derive 3D information about a scene is monocular depth estimation. It predicts the absolute distance from the camera to each pixel in an image. However, monocular depth estimation has limitations. It lacks semantic information about object classes. Furthermore, it is not precise enough to reliably detect instances or establish depth order for known instances.
Even a coarse 3D geometry, such as the relative depth or occlusion order of objects is useful to obtain rich 3D-informed scene analysis. Based on this, we address occlusion-ordered semantic instance segmentation (OOSIS), which augments standard semantic instance segmentation by incorporating a coarse 3D geometry of the scene. By leveraging occlusion as a strong depth cue, OOSIS estimates a partial relative depth ordering of instances based on their occlusion relations. OOSIS produces two outputs: instance masks and their classes, as well as the occlusion ordering of those predicted instances.
Existing works pre-date deep learning and rely on simple visual cues, such as the y-coordinate of objects, for occlusion ordering. This thesis introduces two deep learning-based approaches for OOSIS. The first approach, following a top-down strategy, determines the pairwise occlusion order between instances obtained by a standard instance segmentation method. However, this approach lacks global occlusion ordering consistency and can produce undesired cyclic orderings, as the sketch below illustrates. Our second approach is bottom-up: it simultaneously derives instances and their occlusion order by grouping pixels into instances and assigning occlusion order labels, which ensures a globally consistent occlusion ordering. As part of this approach, we develop a novel deep model that predicts the boundaries where occlusion occurs, together with the orientation of occlusion at the boundary, indicating which side occludes the other. The output of this model is used to obtain instances and their corresponding ordering through our proposed discrete optimization formulation.
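A minimal sketch of the consistency issue, assuming instances are nodes and each pairwise prediction "a occludes b" is a directed edge: a cycle in this graph means no global depth ordering exists. The toy edges are illustrative.

```python
# Pairwise occlusion predictions form a directed graph; a cycle means the
# predictions admit no consistent global ordering.
def has_cycle(num_instances, edges):
    """Detect a cycle in a directed occlusion graph via DFS colouring."""
    adj = [[] for _ in range(num_instances)]
    for a, b in edges:            # edge a -> b means "a occludes b"
        adj[a].append(b)
    state = [0] * num_instances   # 0 = unseen, 1 = on stack, 2 = done

    def dfs(u):
        state[u] = 1
        for v in adj[u]:
            if state[v] == 1 or (state[v] == 0 and dfs(v)):
                return True
        state[u] = 2
        return False

    return any(state[u] == 0 and dfs(u) for u in range(num_instances))

# 0 occludes 1, 1 occludes 2, 2 occludes 0 -- cyclic, hence inconsistent.
print(has_cycle(3, [(0, 1), (1, 2), (2, 0)]))  # True
print(has_cycle(3, [(0, 1), (1, 2)]))          # False
```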
To assess the performance of OOSIS methods, we introduce a novel evaluation metric capable of simultaneously evaluating instance segmentation and occlusion ordering. In addition, we utilize standard metrics for evaluating the quality of instance masks, and we evaluate occlusion ordering consistency and oriented occlusion boundaries. We conduct evaluations on the KINS and COCOA datasets.
Proceedings of the 8th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2023)
This volume gathers the papers presented at the Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023), held in Tampere, Finland, on 21–22 September 2023.
Spatial Modeling of Compact Polarimetric Synthetic Aperture Radar Imagery
The RADARSAT Constellation Mission (RCM) utilizes compact polarimetric (CP) mode to provide data with varying resolutions, supporting a wide range of applications including oil spill detection, sea ice mapping, and land cover analysis. However, the complexity and variability of CP data, influenced by factors such as weather conditions and satellite infrastructure, introduce signature ambiguity. This ambiguity poses challenges in accurate object classification, reducing discriminability and increasing uncertainty. To address these challenges, this thesis introduces tailored spatial models in CP SAR imagery through the utilization of machine learning techniques.
Firstly, to enhance oil spill monitoring, a novel conditional random field (CRF) model is introduced. The CRF model leverages the statistical properties of CP SAR data and exploits similarities in labels and features among neighboring pixels to effectively model spatial interactions. By mitigating the impact of speckle noise and accurately distinguishing oil spill candidates from oil-free water, the CRF model achieves successful results even when labeled samples are scarce, highlighting its capability in handling situations with little training data.
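To make the neighborhood-smoothing idea concrete, below is a minimal sketch of iterated conditional modes (ICM) over a Potts-style pairwise model, not the thesis's CRF: each pixel trades its per-label data cost against agreement with its four neighbours, which suppresses isolated speckle-like labels. All costs and weights are illustrative assumptions.

```python
# ICM over a Potts model: pixels balance unary (data) costs against
# agreement with their 4-neighbours, smoothing away speckle-like noise.
import numpy as np

def icm(unary, beta=0.8, iters=5):
    """unary: (H, W, 2) per-pixel costs for labels {0: water, 1: oil}."""
    labels = unary.argmin(axis=2)
    H, W = labels.shape
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                costs = unary[y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # Potts penalty for disagreeing with a neighbour
                        costs += beta * (np.arange(2) != labels[ny, nx])
                labels[y, x] = costs.argmin()
    return labels

# Toy scene: a square 'slick' observed through heavy noise (assumed data).
rng = np.random.default_rng(0)
truth = np.zeros((20, 20)); truth[5:15, 5:15] = 1
noisy = truth + rng.normal(0, 0.6, truth.shape)
unary = np.stack([(noisy - 0) ** 2, (noisy - 1) ** 2], -1)  # quadratic costs
print((icm(unary) == truth).mean(), "agreement with the clean slick")
```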
Secondly, to improve the accuracy of sea ice mapping, a region-based automated classification methodology is developed. This methodology incorporates learned features, spatial context, and statistical properties from various SAR modes, resulting in enhanced classification accuracy and improved algorithmic efficiency.
Thirdly, a high degree of heterogeneity in the target distribution poses an additional challenge in land cover mapping tasks, further compounded by signature ambiguity. To address this, a novel transformer model is proposed that incorporates both fine- and coarse-grained spatial dependencies between pixels and leverages different levels of features to enhance the accuracy of land cover type detection.
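One simple way to picture fine- and coarse-grained spatial dependencies in a transformer, sketched below under assumed dimensions (this is not the proposed model): fine patch tokens carry local detail, pooled tokens carry broader context, and each fine token attends over both at once.

```python
# Self-attention over two token scales: fine patch tokens plus coarse
# tokens obtained by average-pooling groups of four fine tokens.
import torch
import torch.nn as nn

class TwoScaleAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, fine_tokens):
        B, N, D = fine_tokens.shape
        coarse = fine_tokens.reshape(B, N // 4, 4, D).mean(dim=2)
        context = torch.cat([fine_tokens, coarse], dim=1)
        # Each fine token attends over both scales at once.
        out, _ = self.attn(fine_tokens, context, context)
        return out

tokens = torch.randn(2, 64, 64)           # 64 fine patch tokens per image
print(TwoScaleAttention()(tokens).shape)  # torch.Size([2, 64, 64])
```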
The proposed approaches have undergone extensive experimentation in various remote sensing tasks, validating their effectiveness. By introducing tailored spatial models and innovative algorithms, this thesis successfully addresses the inherent complexity and variability of CP data, thereby ensuring the accuracy and reliability of diverse applications in the field of remote sensing.
Explain what you see: argumentation-based learning and robotic vision
In this thesis, we have introduced new techniques for the problems of open-ended learning, online incremental learning, and explainable learning. These methods have applications in the classification of tabular data, 3D object category recognition, and 3D object part segmentation. We have utilized argumentation theory and probability theory to develop these methods. The first proposed open-ended online incremental learning approach is Argumentation-Based online incremental Learning (ABL). ABL works with tabular data and can learn from a small number of learning instances using an abstract argumentation framework and a bipolar argumentation framework. It has a higher learning speed than state-of-the-art online incremental techniques; however, it has high computational complexity. We have addressed this problem by introducing Accelerated Argumentation-Based Learning (AABL), which uses only an abstract argumentation framework and employs two strategies to accelerate the learning process and reduce the complexity. The second proposed open-ended online incremental learning approach is the Local Hierarchical Dirichlet Process (Local-HDP). Local-HDP addresses two problems: open-ended category recognition of 3D objects and segmentation of 3D object parts. We have utilized Local-HDP for the task of object part segmentation in combination with AABL to obtain an interpretable model that explains why a certain 3D object belongs to a certain category. The explanations of this model tell a user that a certain object has specific object parts that look like a set of the typical parts of certain categories. Moreover, integrating AABL and Local-HDP leads to a model that can handle a high degree of occlusion.
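For readers unfamiliar with abstract argumentation, the following is a minimal sketch of one standard building block, computing the grounded extension of a Dung-style framework by iterating the characteristic function; it is not the ABL algorithm itself, and the toy arguments and attacks are illustrative.

```python
# Grounded extension of an abstract argumentation framework: start from the
# empty set and repeatedly add every argument the current set defends.
def grounded_extension(arguments, attacks):
    """attacks: set of (a, b) pairs meaning 'a attacks b'."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}
    extension = set()
    while True:
        # An argument is acceptable if every attacker is itself attacked
        # by some member of the current extension.
        defended = {a for a in arguments
                    if all(any((d, att) in attacks for d in extension)
                           for att in attackers[a])}
        if defended == extension:
            return extension
        extension = defended

args = {"A", "B", "C"}
atts = {("A", "B"), ("B", "C")}                # A attacks B, B attacks C
print(sorted(grounded_extension(args, atts)))  # ['A', 'C']
```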
Generative Adversarial Network and Its Application in Aerial Vehicle Detection and Biometric Identification System
In recent years, generative adversarial networks (GANs) have shown great potential in advancing the state of the art in many areas of computer vision, most notably in image synthesis and manipulation tasks. A GAN is a generative model that simultaneously trains a generator and a discriminator in an adversarial manner to produce realistic synthetic data by capturing the underlying data distribution. Owing to its powerful ability to generate high-quality and visually pleasing results, we apply it to super-resolution and image-to-image translation techniques to address vehicle detection in low-resolution aerial images and cross-spectral, cross-resolution iris recognition. First, we develop a Multi-scale GAN (MsGAN) with multiple intermediate outputs, which progressively learns the details and features of high-resolution aerial images at different scales. The upscaled super-resolved aerial images are then fed to a You Only Look Once-version 3 (YOLO-v3) object detector, and the detection loss is jointly optimized along with a super-resolution loss to emphasize target vehicles sensitive to the super-resolution process. A further problem remains when detection takes place at night or in a dark environment, which requires an infrared (IR) detector, and training such a detector needs a large number of IR images. To address these challenges, we develop a GAN-based joint cross-modal super-resolution framework where low-resolution (LR) IR images are translated and super-resolved to high-resolution (HR) visible (VIS) images before applying detection. This approach significantly improves the accuracy of aerial vehicle detection by leveraging the benefits of super-resolution techniques in a cross-modal domain. Second, to increase the performance and reliability of deep learning-based biometric identification systems, we focus on developing conditional GAN (cGAN)-based cross-spectral, cross-resolution iris recognition and offer two different frameworks. The first approach trains a cGAN to jointly translate and super-resolve LR near-infrared (NIR) iris images to HR VIS iris images, so that cross-spectral, cross-resolution iris matching is performed at the same resolution and within the same spectrum. In the second approach, we design a coupled GAN (cpGAN) architecture to project both VIS and NIR iris images into a low-dimensional embedding domain, with the goal of ensuring maximum pairwise similarity between the feature vectors from the two iris modalities of the same subject. We have also proposed a pose attention-guided coupled profile-to-frontal face recognition network to learn discriminative and pose-invariant features in an embedding subspace. To show that the feature vectors learned by this deep subspace can be used for tasks beyond recognition, we implement a GAN architecture that is able to reconstruct a frontal face from its corresponding profile face. This capability can be used in various face analysis tasks, such as emotion detection and expression tracking, where having a frontal face image can improve accuracy and reliability. Overall, our research has demonstrated its efficacy by achieving new state-of-the-art results through extensive experiments on publicly available datasets reported in the literature.
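To ground the adversarial-training idea these works build on, below is a minimal generic GAN training loop in PyTorch, a sketch of the standard formulation rather than any of the models above; the toy networks, data, and hyperparameters are illustrative assumptions.

```python
# A generator maps noise to samples while a discriminator scores real vs
# fake; each is optimised against the other in alternation.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))  # noise -> sample
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))   # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(32, 2) * 0.5 + 2.0     # toy 'real' distribution
    fake = G(torch.randn(32, 16))

    # Discriminator: push real logits towards 1, fake logits towards 0.
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator into scoring fakes as real.
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(f"final d_loss={d_loss.item():.3f}, g_loss={g_loss.item():.3f}")
```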