Deep learning in remote sensing: a review
Standing at the paradigm shift towards data-intensive science, machine
learning techniques are becoming increasingly important. In particular, as a
major breakthrough in the field, deep learning has proven to be an extremely
powerful tool in many fields. Shall we embrace deep learning as the key to all?
Or, should we resist a 'black-box' solution? There are controversial opinions
in the remote sensing community. In this article, we analyze the challenges of
using deep learning for remote sensing data analysis, review the recent
advances, and provide resources to make deep learning in remote sensing
ridiculously simple to start with. More importantly, we encourage remote sensing
scientists to bring their expertise into deep learning and to use it as an
implicit general model to tackle unprecedented, large-scale, influential
challenges, such as climate change and urbanization.
Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publication
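Detection metrics such as average precision are usually built on the intersection-over-union (IoU) between a predicted box and a ground-truth box. The following is a minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2); the helper names iou and is_true_positive and the 0.5 default threshold (the PASCAL VOC convention) are illustrative choices, not code from the survey.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    # Overlap rectangle; width and height clamp to zero when the boxes are disjoint.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Hypothetical matching rule: a detection counts if its IoU reaches the threshold."""
    return iou(pred, gt) >= threshold


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, i.e. about 0.143
```

Benchmarks differ in how they use this quantity; COCO-style evaluation, for instance, averages precision over several IoU thresholds rather than fixing a single one.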
Remote Sensing Object Detection Meets Deep Learning: A Meta-review of Challenges and Advances
Remote sensing object detection (RSOD), one of the most fundamental and
challenging tasks in the remote sensing field, has received longstanding
attention. In recent years, deep learning techniques have demonstrated robust
feature representation capabilities and led to a big leap in the development of
RSOD techniques. In this era of rapid technical evolution, this review aims to
present a comprehensive overview of recent achievements in deep learning-based
RSOD methods. More than 300 papers are covered in this review. We
identify five main challenges in RSOD, including multi-scale object detection,
rotated object detection, weak object detection, tiny object detection, and
object detection with limited supervision, and systematically review the
corresponding methods within a hierarchical taxonomy. We also
review the widely used benchmark datasets and evaluation metrics within the
field of RSOD, as well as the application scenarios for RSOD. Future research
directions are provided to further promote research in RSOD.
Comment: Accepted by IEEE Geoscience and Remote Sensing Magazine. More than
300 papers relevant to the RSOD field were reviewed in this survey
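Rotated object detectors in remote sensing commonly describe an oriented box by its centre, width, height and rotation angle, (cx, cy, w, h, θ). As a minimal sketch under that assumed parameterization (angle conventions differ between datasets, and the function name obb_to_corners is hypothetical), converting such a box to its four corner points looks like this:

```python
import math
from typing import List, Tuple


def obb_to_corners(cx: float, cy: float, w: float, h: float,
                   theta: float) -> List[Tuple[float, float]]:
    """Corner points of an oriented box rotated by theta radians about its centre.

    Assumes a standard y-up rotation; datasets differ in angle range and sign.
    """
    c, s = math.cos(theta), math.sin(theta)
    # Half-extent offsets of the unrotated box, listed in a consistent winding order.
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate it to the box centre.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in local]
```

Corner form is convenient for drawing boxes or computing polygon overlap, while the five-parameter form is the compact target that a detector head typically regresses.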
Self-Supervised Learning for Monocular Depth Estimation from Aerial Imagery
Supervised learning-based methods for monocular depth estimation usually
require large amounts of extensively annotated training data. In the case of
aerial imagery, this ground truth is particularly difficult to acquire.
Therefore, in this paper, we present a method for self-supervised learning for
monocular depth estimation from aerial imagery that does not require annotated
training data. For this, we only use an image sequence from a single moving
camera and learn to simultaneously estimate depth and pose information. By
sharing the weights between pose and depth estimation, we achieve a relatively
small model, which favors real-time application. We evaluate our approach on
three diverse datasets and compare the results to conventional methods that
estimate depth maps based on multi-view geometry. We achieve a threshold
accuracy (δ < 1.25) of up to 93.5%. In addition, we have paid particular
attention to the generalization of a trained model to unknown data and the
self-improving capabilities of our approach. We conclude that, even though the
results of monocular depth estimation are inferior to those achieved by
conventional methods, they are well suited to provide a good initialization for
methods that rely on image matching or to provide estimates in regions where
image matching fails, e.g., occluded or texture-less regions.
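The δ < 1.25 figure is a threshold accuracy: the fraction of pixels whose predicted depth lies within a factor of 1.25 of the ground truth. The sketch below assumes the definition commonly used in the depth estimation literature, max(pred/gt, gt/pred) < 1.25, evaluated only over pixels with positive predicted and ground-truth depth; the function name and the flat-list input are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def threshold_accuracy(pred: Sequence[float], gt: Sequence[float],
                       threshold: float = 1.25) -> float:
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold."""
    # Skip pixels without a usable depth value (e.g. missing ground truth).
    valid = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not valid:
        raise ValueError("no pixels with positive predicted and ground-truth depth")
    hits = sum(1 for p, g in valid if max(p / g, g / p) < threshold)
    return hits / len(valid)


# Ratios are 1.1, 1.0, 1.33 and 2.0, so two of four pixels pass.
print(threshold_accuracy([1.0, 2.0, 3.0, 10.0], [1.1, 2.0, 4.0, 5.0]))  # 0.5
```

Because the ratio is symmetric, over- and under-estimation by the same factor are penalized equally, which suits depth values that span a wide range.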
Survey on video anomaly detection in dynamic scenes with moving cameras
The increasing popularity of compact and inexpensive cameras, e.g., dash
cameras, body cameras, and cameras equipped on robots, has sparked a growing
interest in detecting anomalies within dynamic scenes recorded by moving
cameras. However, existing reviews primarily concentrate on Video Anomaly
Detection (VAD) methods assuming static cameras. The VAD literature with moving
cameras remains fragmented, lacking comprehensive reviews to date. To address
this gap, we endeavor to present the first comprehensive survey on Moving
Camera Video Anomaly Detection (MC-VAD). We delve into the research papers
related to MC-VAD, critically assessing their limitations and highlighting
associated challenges. Our exploration encompasses three application domains:
security, urban transportation, and marine environments, which in turn cover
six specific tasks. We compile an extensive list of 25 publicly available
datasets spanning four distinct environments: underwater, water surface,
ground, and aerial. We summarize the types of anomalies these datasets
correspond to or contain, and present five main categories of approaches for
detecting such anomalies. Lastly, we identify future research directions and
discuss novel contributions that could advance the field of MC-VAD. With this
survey, we aim to offer a valuable reference for researchers and practitioners
striving to develop and advance state-of-the-art MC-VAD methods.
Comment: Under review
Generalized Label-Efficient 3D Scene Parsing via Hierarchical Feature Aligned Pre-Training and Region-Aware Fine-tuning
Deep neural network models have achieved remarkable progress in 3D scene
understanding when trained in the closed-set setting with full labels.
However, the major bottleneck for current 3D recognition approaches is that
they cannot recognize unseen novel classes beyond the training categories in
diverse real-world applications. Meanwhile, current state-of-the-art 3D scene
understanding approaches primarily require high-quality labels to train neural
networks and perform well only in a fully supervised manner. This work
presents a generalized and simple
framework for dealing with 3D scene understanding when the labeled scenes are
quite limited. To extract knowledge for novel categories from the pre-trained
vision-language models, we propose a hierarchical feature-aligned pre-training
and knowledge distillation strategy to extract and distill meaningful
information from large-scale vision-language models, which benefits
open-vocabulary scene understanding tasks. To leverage the boundary
information, we propose a novel energy-based loss with boundary awareness
benefiting from the region-level boundary predictions. To encourage latent
instance discrimination and to guarantee efficiency, we propose an
unsupervised region-level semantic contrastive learning scheme for point
clouds, using confident predictions of the neural network to discriminate the
intermediate feature embeddings at multiple stages. Extensive experiments with
both indoor and outdoor scenes demonstrated the effectiveness of our approach
in both data-efficient learning and open-world few-shot learning. All code,
models, and data are made publicly available at:
https://drive.google.com/drive/folders/1M58V-PtR8DBEwD296zJkNg_m2qq-MTAP?usp=sharing.
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence,
Manuscript Info: 22 Pages, 16 Figures, and 8 Tables
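The abstract does not specify the exact form of its region-level contrastive objective. As a generic illustration only, the sketch below implements a supervised-contrastive (InfoNCE-style) loss whose labels are pseudo-labels taken from confident network predictions: embeddings sharing a predicted class are pulled together and all others pushed apart. The confidence threshold, the temperature, the use of cosine similarity and all function names are assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def pseudo_label_contrastive_loss(
    feats: Sequence[Sequence[float]],
    probs: Sequence[Sequence[float]],
    conf_threshold: float = 0.9,
    temperature: float = 0.1,
) -> float:
    """Hypothetical supervised-contrastive loss over confidently pseudo-labelled embeddings.

    feats[i] is the embedding of point or region i; probs[i] holds its class probabilities.
    """
    # Keep only confident predictions and use their argmax class as the pseudo-label.
    kept = [
        (f, max(range(len(p)), key=p.__getitem__))
        for f, p in zip(feats, probs)
        if max(p) >= conf_threshold
    ]
    losses: List[float] = []
    for i, (f_i, y_i) in enumerate(kept):
        # Temperature-scaled similarities to every other kept embedding.
        logits = {j: _cosine(f_i, f_j) / temperature
                  for j, (f_j, _) in enumerate(kept) if j != i}
        if not logits:
            continue
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Every other embedding with the same pseudo-label is a positive for anchor i;
        # each term is -log(exp(logit_j) / sum_k exp(logit_k)).
        for j, (_, y_j) in enumerate(kept):
            if j != i and y_j == y_i:
                losses.append(log_denom - logits[j])
    return sum(losses) / len(losses) if losses else 0.0
```

Filtering by confidence trades coverage for label quality: a higher threshold yields fewer but cleaner positive pairs.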
- …