Remote Sensing Object Detection Meets Deep Learning: A Meta-review of Challenges and Advances
Remote sensing object detection (RSOD), one of the most fundamental and
challenging tasks in the remote sensing field, has received longstanding
attention. In recent years, deep learning techniques have demonstrated robust
feature representation capabilities and led to a big leap in the development of
RSOD techniques. In this era of rapid technical evolution, this paper presents a
comprehensive review of recent achievements in deep-learning-based RSOD
methods, covering more than 300 papers. We
identify five main challenges in RSOD, namely multi-scale object detection,
rotated object detection, weak object detection, tiny object detection, and
object detection with limited supervision, and systematically review the
corresponding methods, organized hierarchically. We also
review the widely used benchmark datasets and evaluation metrics within the
field of RSOD, as well as the application scenarios for RSOD. Future research
directions are provided to further promote research in RSOD.
Comment: Accepted by IEEE Geoscience and Remote Sensing Magazine. More than
300 papers relevant to the RSOD field were reviewed in this survey.
Towards Urban General Intelligence: A Review and Outlook of Urban Foundation Models
Machine learning techniques are now integral to the advancement of
intelligent urban services, playing a crucial role in elevating the efficiency,
sustainability, and livability of urban environments. The recent emergence of
foundation models such as ChatGPT marks a revolutionary shift in the fields of
machine learning and artificial intelligence. Their unparalleled capabilities
in contextual understanding, problem solving, and adaptability across a wide
range of tasks suggest that integrating these models into urban domains could
have a transformative impact on the development of smart cities. Despite
growing interest in Urban Foundation Models (UFMs), this burgeoning field faces
challenges such as a lack of clear definitions, systematic reviews, and
universalizable solutions. To this end, this paper first introduces the concept
of UFM and discusses the unique challenges involved in building them. We then
propose a data-centric taxonomy that categorizes current UFM-related works,
based on urban data modalities and types. Furthermore, to foster advancement in
this field, we present a promising framework aimed at the prospective
realization of UFMs, designed to overcome the identified challenges.
Additionally, we explore the application landscape of UFMs, detailing their
potential impact in various urban contexts. Relevant papers and open-source
resources have been collated and are continuously updated at
https://github.com/usail-hkust/Awesome-Urban-Foundation-Models
A Review of Landcover Classification with Very-High Resolution Remotely Sensed Optical Images: Analysis Unit, Model Scalability and Transferability
As an important application in remote sensing, landcover classification remains one of the most challenging tasks in very-high-resolution (VHR) image analysis. As a rapidly increasing number of deep learning (DL)-based landcover methods and training strategies are claimed to be state-of-the-art, the already fragmented technical landscape of landcover mapping methods has become even more complicated. Although many literature reviews attempt to guide researchers in making an informed choice of landcover mapping methods, these articles either review applications in a specific area or revolve around general deep learning models, and thus lack a systematic view of the continually advancing landcover mapping methods. In addition, issues related to training samples and model transferability have become more critical than ever in an era dominated by data-driven approaches, but previous review articles on remote sensing classification addressed them to a lesser extent. Therefore, in this paper, we present a systematic overview of existing methods, starting from learning methods and the various basic analysis units for landcover mapping tasks, and proceeding to challenges and solutions in three aspects of scalability and transferability with a remote sensing classification focus: (1) sparsity and imbalance of data; (2) domain gaps across different geographical regions; and (3) multi-source and multi-view fusion. We discuss each category of methods in detail, draw concluding remarks on these developments, and recommend potential directions for continued research.
Improving semantic segmentation of high-resolution remote sensing images using Wasserstein generative adversarial network
Semantic segmentation of remote sensing images with high spatial resolution has many applications across a wide range of problems in this field. In recent years, advanced techniques based on fully convolutional neural networks have achieved high and impressive accuracies. However, these methods estimate the label of each pixel independently, so the segmentation is generally too coarse because the relationships between pixels are not taken into account. In addition, because of the convolution filters and computational limitations, the field of view of these filters remains limited in deep layers. In this study, a method based on a generative adversarial network (GAN) is proposed to strengthen spatial vicinity in the output segmentation map. The segmentation model receives assistance from the GAN model in the form of a higher-order potential loss. Furthermore, to improve stability and performance during training, a Wasserstein GAN is used to optimize the model. We show an increase in semantic segmentation accuracy on the challenging ISPRS Vaihingen benchmark dataset.
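The abstract does not spell out the loss formulation. As a hedged illustration only, the sketch below combines a standard pixel-wise cross-entropy with the original Wasserstein GAN objective (critic loss plus weight clipping), where the critic compares ground-truth label maps ("real") with predicted maps ("fake"). The function names, the weighting `lam`, and the use of precomputed critic scores are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def pixel_cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean per-pixel cross-entropy; probs[i] is the predicted class distribution of pixel i."""
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def critic_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """WGAN critic loss: minimising it maximises E[D(real)] - E[D(fake)].

    real_scores: critic outputs on ground-truth label maps.
    fake_scores: critic outputs on predicted segmentation maps.
    """
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)


def segmenter_loss(probs: Sequence[Sequence[float]], labels: Sequence[int],
                   fake_scores: Sequence[float], lam: float = 0.1) -> float:
    """Pixel-wise loss plus the adversarial term -E[D(fake)].

    The adversarial term scores whole maps rather than single pixels, so it acts as a
    map-level (higher-order) penalty. `lam` is a hypothetical weighting.
    """
    adversarial = -sum(fake_scores) / len(fake_scores)
    return pixel_cross_entropy(probs, labels) + lam * adversarial


def clip_weights(weights: List[float], c: float = 0.01) -> List[float]:
    """The original WGAN keeps the critic roughly Lipschitz by clipping each weight to [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]
```

In this formulation the segmenter is pushed to produce maps the critic scores like real label maps, which is one way to encode spatial consistency beyond per-pixel terms. A gradient penalty is a common alternative to weight clipping.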
Multi-Modal Earth Observation and Deep Learning for Urban Scene Understanding
This research explores the nuances of semantic segmentation in remote sensing data through deep learning, with a focus on multi-modal data integration, the impact of label noise, and the need for diverse datasets in Earth Observation (EO). It introduces a novel model named TransFusion, designed to integrate 2D images and 3D point clouds directly, avoiding the complexities common to traditional fusion methods used in the context of semantic segmentation. This approach improved segmentation accuracy, as demonstrated by higher mean Intersection over Union (mIoU) scores on the Vaihingen and Potsdam datasets, indicating the model's ability to better interpret spatial and structural information from multi-modal data. The study also investigates the effects of label noise (incorrect annotations in training data), a prevalent issue in remote sensing. Through experiments on high-resolution aerial images with intentionally inaccurate labels, it was found that label noise affects model performance differently across object classes, with object size significantly affecting the model's ability to cope with errors. The research highlights that models are somewhat resilient to random noise, although accuracy decreases even with a small proportion of incorrect labels. Addressing the geographic bias of urban semantic segmentation datasets, which focus primarily on Europe and North America, the research introduces the UAVPal dataset from Bhopal, India. This effort, along with the development of a new dense predictor head for semantic segmentation, aims to better represent the diverse urban landscapes of the world.
The new segmentation head, which efficiently leverages multi-scale features and notably reduces computational demands, showed improved mIoU scores across various classes and datasets. Overall, the study contributes to the field of semantic segmentation for EO by improving data fusion methods, offering insights into the effects of label noise, and encouraging the inclusion of diverse geographic data for broader representation. These efforts are steps toward more accurate and efficient remote sensing applications.
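The mIoU metric reported above averages the per-class Intersection over Union, IoU = TP / (TP + FP + FN), computed from a confusion matrix. A minimal sketch over flattened label maps follows; skipping classes that appear in neither the ground truth nor the prediction is one common convention and is assumed here. Benchmark-specific details (for example, excluded classes or boundary handling) are not modelled.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU = TP / (TP + FP + FN) over classes that occur at all."""
    cm = confusion_matrix(y_true, y_pred, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                 # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(num_classes)) - tp  # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:  # assumed convention: ignore classes absent from both maps
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

For example, with ground truth `[0, 0, 1, 1]` and prediction `[0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, giving an mIoU of 7/12.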
Semantic data ingestion for intelligent, value-driven big data analytics
In this position paper we describe a conceptual
model for intelligent Big Data analytics based on both semantic
and machine learning AI techniques (called AI ensembles). These
processes are linked to business outcomes by explicitly modelling
data value and using semantic technologies as the underlying
mode for communication between the diverse processes and
organisations creating AI ensembles. Furthermore, we show
how data governance can direct and enhance these ensembles
by providing recommendations and insights to ensure that the
output generated produces the highest possible value for the
organisation.
Toward Global Localization of Unmanned Aircraft Systems using Overhead Image Registration with Deep Learning Convolutional Neural Networks
Global localization, in which an unmanned aircraft system (UAS) estimates its unknown current location without access to its take-off location or other locational data from its flight path, is a challenging problem. This research brings together aspects of the remote sensing, geoinformatics, and machine learning disciplines by framing the global localization problem as a geospatial image registration problem in which overhead aerial and satellite imagery serve as a proxy for UAS imagery. A literature review is conducted covering the use of deep learning convolutional neural networks (DLCNN) for global localization and other related geospatial imagery applications. Differences between geospatial imagery taken from the overhead perspective and terrestrial imagery are discussed, as well as difficulties in using geospatial overhead imagery for image registration due to a lack of suitable machine learning datasets. Geospatial analysis is conducted to identify suitable areas for future UAS imagery collection. One of these areas, Jerusalem northeast (JNE), is selected as the area of interest (AOI) for this research. Multi-modal, multi-temporal, and multi-resolution geospatial overhead imagery is aggregated from a variety of publicly available sources and processed to create a controlled image dataset called Jerusalem northeast rural controlled imagery (JNE RCI). JNE RCI is tested on coarse-grained image registration with the handcrafted feature-based methods SURF and SIFT and with a non-handcrafted, feature-based, pre-trained and fine-tuned VGG-16 DLCNN. Both the handcrafted and the non-handcrafted feature-based methods had difficulty with the coarse-grained registration process. The format of JNE RCI is determined to be unsuitable for coarse-grained registration with DLCNNs, and the process of creating a new supervised machine learning dataset, Jerusalem northeast machine learning (JNE ML), is covered in detail.
A multi-resolution grid-based approach is used, where each grid cell ID is treated as the supervised training label for that respective resolution. Pre-trained, fine-tuned VGG-16 DLCNNs, two custom-architecture two-channel DLCNNs, and a custom chain DLCNN are trained on JNE ML for each spatial resolution of subimages in the dataset. All of these DLCNNs coarsely registered the JNE ML subimages more accurately than the pre-trained, fine-tuned VGG-16 DLCNN registered JNE RCI. This indicates that the process for creating JNE ML is valid and that the resulting dataset is suitable for applying machine learning to the coarse-grained registration problem. All custom-architecture two-channel DLCNNs and the custom chain DLCNN also coarsely registered the JNE ML subimages more accurately than the pre-trained, fine-tuned VGG-16 approach. Both the two-channel custom DLCNNs and the chain DLCNN generalized well to new imagery on which these networks had not been trained. This research lays a foundation for future work on the UAS global localization problem within the rural, forested JNE AOI.
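The abstract states that each grid cell ID serves as the supervised label at its resolution but does not specify how cells are indexed. The sketch below assumes square n-by-n grids over a rectangular AOI and row-major IDs counted from the south-west corner; `grid_cell_id`, `multi_resolution_labels`, and these indexing choices are hypothetical, chosen only to show how one location yields one class label per resolution.

```python
from typing import Dict, Sequence, Tuple

Bounds = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y) of the AOI


def grid_cell_id(x: float, y: float, bounds: Bounds, n: int) -> int:
    """Row-major ID of the cell containing (x, y) in an n-by-n grid over `bounds`.

    Rows are counted upward from min_y and columns rightward from min_x (assumed).
    Points on the max edges are assigned to the last row/column.
    """
    min_x, min_y, max_x, max_y = bounds
    if not (min_x <= x <= max_x and min_y <= y <= max_y):
        raise ValueError("point lies outside the area of interest")
    col = min(int((x - min_x) / (max_x - min_x) * n), n - 1)
    row = min(int((y - min_y) / (max_y - min_y) * n), n - 1)
    return row * n + col


def multi_resolution_labels(x: float, y: float, bounds: Bounds, levels: Sequence[int]) -> Dict[int, int]:
    """One classification label per grid resolution, keyed by grid size n."""
    return {n: grid_cell_id(x, y, bounds, n) for n in levels}
```

Framed this way, coarse registration becomes an n*n-way classification per resolution: a subimage centred at (75, 10) in a 100 by 100 AOI would receive label 1 on a 2 by 2 grid and label 3 on a 4 by 4 grid.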
Spatial Object Recommendation with Hints: When Spatial Granularity Matters
Existing spatial object recommendation algorithms generally treat objects
identically when ranking them. However, spatial objects often cover different
levels of spatial granularity and thereby are heterogeneous. For example, one
user may prefer to be recommended a region (say Manhattan), while another user
might prefer a venue (say a restaurant). Even for the same user, preferences
can change at different stages of data exploration. In this paper, we study how
to support top-k spatial object recommendations at varying levels of spatial
granularity, treating spatial objects of any granularity, such as a city,
suburb, or building, as a Point of Interest (POI). To solve this problem, we
propose the use of a POI tree, which captures spatial containment relationships
between POIs. We design a novel multi-task learning model called MPR (short for
Multi-level POI Recommendation), where each task aims to return the top-k POIs
at a certain spatial granularity level. Each task consists of two subtasks: (i)
attribute-based representation learning; (ii) interaction-based representation
learning. The first subtask learns the feature representations for both users
and POIs, capturing attributes directly from their profiles. The second subtask
incorporates user-POI interactions into the model. Additionally, MPR can
provide insights into why certain recommendations are being made to a user
based on three types of hints: user-aspect, POI-aspect, and interaction-aspect.
We empirically validate our approach on two real-life datasets and show
promising performance improvements over several state-of-the-art methods.
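The POI tree and the per-level top-k retrieval described above can be sketched as follows. This is not MPR itself: the learned attribute-based and interaction-based representations are replaced by a hypothetical precomputed `scores` dictionary, and the class and function names (`POINode`, `nodes_at_level`, `top_k_at_level`) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class POINode:
    """A POI in the containment tree; level 0 is the coarsest (e.g. a city)."""
    name: str
    level: int
    children: List["POINode"] = field(default_factory=list)

    def add(self, child: "POINode") -> "POINode":
        """Attach a spatially contained POI and return it for further nesting."""
        self.children.append(child)
        return child


def nodes_at_level(root: POINode, level: int) -> List[POINode]:
    """Collect all POIs at one granularity level via depth-first traversal."""
    stack, found = [root], []
    while stack:
        node = stack.pop()
        if node.level == level:
            found.append(node)
        elif node.level < level:  # only descend while still coarser than the target
            stack.extend(node.children)
    return found


def top_k_at_level(root: POINode, level: int, scores: Dict[str, float], k: int) -> List[str]:
    """Rank POIs of one granularity by a (hypothetical) user-specific score."""
    candidates = nodes_at_level(root, level)
    ranked = sorted(candidates, key=lambda n: scores.get(n.name, float("-inf")), reverse=True)
    return [n.name for n in ranked[:k]]
```

Usage with made-up names: build a city with two suburbs and three venues, then ask for the best suburb (level 1) or the best two venues (level 2). Each granularity level corresponds to one task in the multi-task setting, and the tree keeps containment explicit.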