221 research outputs found
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensin
Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders
Current perception models in autonomous driving heavily rely on large-scale
labelled 3D data, which is both costly and time-consuming to annotate. This
work proposes a solution to reduce the dependence on labelled 3D training data
by leveraging pre-training on large-scale unlabeled outdoor LiDAR point clouds
using masked autoencoders (MAE). While existing masked point autoencoding
methods mainly focus on small-scale indoor point clouds or pillar-based
large-scale outdoor LiDAR data, our approach introduces a new self-supervised
masked occupancy pre-training method called Occupancy-MAE, specifically
designed for voxel-based large-scale outdoor LiDAR point clouds. Occupancy-MAE
takes advantage of the gradually sparse voxel occupancy structure of outdoor
LiDAR point clouds and incorporates a range-aware random masking strategy and a
pretext task of occupancy prediction. By randomly masking voxels based on their
distance to the LiDAR and predicting the masked occupancy structure of the
entire 3D surrounding scene, Occupancy-MAE encourages the extraction of
high-level semantic information to reconstruct the masked voxel using only a
small number of visible voxels. Extensive experiments demonstrate the
effectiveness of Occupancy-MAE across several downstream tasks. For 3D object
detection, Occupancy-MAE reduces the labelled data required for car detection
on the KITTI dataset by half and improves small object detection by
approximately 2% in AP on the Waymo dataset. For 3D semantic segmentation,
Occupancy-MAE outperforms training from scratch by around 2% in mIoU. For
multi-object tracking, Occupancy-MAE improves on training from scratch by
approximately 1% in terms of AMOTA and AMOTP. Code is publicly available at
https://github.com/chaytonmin/Occupancy-MAE.
Comment: Accepted by TI
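The range-aware random masking described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `range_aware_mask`, the distance bands, and the mask ratios are all hypothetical placeholders that do not come from the abstract. The only idea taken from the abstract is that the probability of masking a voxel depends on its distance to the LiDAR.

```python
import math
import random

# Hypothetical (max_range_m, mask_ratio) bands; placeholder values only.
DEFAULT_BANDS = [(30.0, 0.9), (50.0, 0.7), (float("inf"), 0.5)]


def range_aware_mask(voxel_centers, bands=DEFAULT_BANDS, seed=0):
    """Return one boolean per voxel; True means the voxel is masked.

    voxel_centers: list of (x, y, z) centers of occupied voxels,
                   with the LiDAR sensor assumed to sit at the origin.
    bands: list of (max_range_m, mask_ratio), sorted by max_range_m.
    """
    rng = random.Random(seed)
    mask = []
    for x, y, _z in voxel_centers:
        dist = math.hypot(x, y)  # horizontal distance to the sensor
        ratio = bands[-1][1]     # fallback: ratio of the farthest band
        for max_range, band_ratio in bands:
            if dist < max_range:
                ratio = band_ratio
                break
        # Mask this voxel with the probability assigned to its band.
        mask.append(rng.random() < ratio)
    return mask


if __name__ == "__main__":
    centers = [(5.0, 2.0, 0.0), (40.0, 1.0, 0.5), (70.0, -3.0, 1.0)]
    print(range_aware_mask(centers))
```

In a pre-training setup of this kind, only the unmasked voxels would be passed to the encoder, and the occupancy of the full grid (occupied or empty per voxel) would serve as the prediction target.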
DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection
Although the YOLOv2 approach is extremely fast at object detection, its backbone
network has a limited ability to extract features and fails to make full use of
multi-scale local region features, which restricts the improvement of object
detection accuracy. Therefore, this paper proposes a DC-SPP-YOLO (Dense
Connection and Spatial Pyramid Pooling Based YOLO) approach for improving
the object detection accuracy of YOLOv2. Specifically, the dense connection of
convolution layers is employed in the backbone network of YOLOv2 to strengthen
the feature extraction and alleviate the vanishing-gradient problem. Moreover,
an improved spatial pyramid pooling is introduced to pool and concatenate the
multi-scale local region features, so that the network can learn the object
features more comprehensively. The DC-SPP-YOLO model is built and trained
with a new loss function composed of mean square error and cross entropy, and
is then used to perform object detection. Experiments demonstrate that the mAP
(mean Average Precision) of the proposed DC-SPP-YOLO on the PASCAL VOC and
UA-DETRAC datasets is higher than that of YOLOv2; by strengthening feature
extraction and using multi-scale local region features, DC-SPP-YOLO achieves
higher object detection accuracy than YOLOv2.
Comment: 23 pages, 9 figures, 9 table
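The idea of pooling and concatenating multi-scale local region features, as described in the abstract above, can be illustrated with a small pure-Python sketch. It is an assumption-laden toy, not the paper's network: the function names `max_pool_same` and `spp_concat`, the kernel sizes, and the single-channel 2D input are all hypothetical choices made for illustration. Each pooled map keeps the input's spatial size (stride 1, window clipped at the borders), so the maps can be stacked as channels.

```python
def max_pool_same(fmap, k):
    """Stride-1 max pooling with a k x k window (k odd).

    At the borders, only the part of the window that lies inside
    the map is used, so the output has the same size as the input.
    """
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                fmap[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def spp_concat(fmap, kernel_sizes=(5, 9, 13)):
    """Stack the input map with one pooled map per kernel size.

    The kernel sizes are illustrative defaults, not values
    taken from the abstract.
    """
    return [fmap] + [max_pool_same(fmap, k) for k in kernel_sizes]


if __name__ == "__main__":
    feature_map = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for channel in spp_concat(feature_map, kernel_sizes=(3,)):
        print(channel)
```

Larger windows summarize wider neighbourhoods, so the stacked output carries local information at several scales for every spatial position.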
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetic
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers on object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and the recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio