Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and the recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible publication.
Towards Large-Scale Small Object Detection: Survey and Benchmarks
With the rise of deep convolutional neural networks, object detection has
achieved prominent advances in past years. However, such prosperity could not
camouflage the unsatisfactory situation of Small Object Detection (SOD), one of
the notoriously challenging tasks in computer vision, owing to the poor visual
appearance and noisy representation caused by the intrinsic structure of small
targets. In addition, the lack of large-scale datasets for benchmarking small object
detection methods remains a bottleneck. In this paper, we first conduct a
thorough review of small object detection. Then, to catalyze the development of
SOD, we construct two large-scale Small Object Detection dAtasets (SODA),
SODA-D and SODA-A, which focus on the Driving and Aerial scenarios
respectively. SODA-D includes 24828 high-quality traffic images and 278433
instances of nine categories. For SODA-A, we harvest 2513 high resolution
aerial images and annotate 872069 instances over nine classes. The proposed
datasets, to our knowledge, are the first attempt at large-scale benchmarks with
a vast collection of exhaustively annotated instances tailored for
multi-category SOD. Finally, we evaluate the performance of mainstream methods
on SODA. We expect the released benchmarks could facilitate the development of
SOD and spawn more breakthroughs in this field. Datasets and codes are
available at: https://shaunyuan22.github.io/SODA
Vehicle Detection of Multi-source Remote Sensing Data Using Active Fine-tuning Network
Vehicle detection in remote sensing images has attracted increasing interest
in recent years. However, detection performance is limited by the lack of
well-annotated samples, especially in densely crowded scenes. Furthermore,
since a variety of remotely sensed data sources is available, efficient
exploitation of useful information from multi-source data for better vehicle
detection is challenging. To solve the above issues, a multi-source active
fine-tuning vehicle detection (Ms-AFt) framework is proposed, which integrates
transfer learning, segmentation, and active classification into a unified
framework for auto-labeling and detection. The proposed Ms-AFt employs a
fine-tuning network to first generate a vehicle training set from an
unlabeled dataset. To cope with the diversity of vehicle categories, a
multi-source based segmentation branch is then designed to construct additional
candidate object sets. High-quality vehicle samples are then separated by an
attentive classification network. Finally, all three branches are
combined to achieve vehicle detection. Extensive experimental results conducted
on two open ISPRS benchmark datasets, namely the Vaihingen village and Potsdam
city datasets, demonstrate the superiority and effectiveness of the proposed
Ms-AFt for vehicle detection. In addition, the generalization ability of Ms-AFt
in dense remote sensing scenes is further verified on stereo aerial imagery of
a large camping site.
Automated High-resolution Earth Observation Image Interpretation: Outcome of the 2020 Gaofen Challenge
In this article, we introduce the 2020 Gaofen Challenge and relevant scientific outcomes. The 2020 Gaofen Challenge is an international competition organized by the China High-Resolution Earth Observation Conference Committee and the Aerospace Information Research Institute, Chinese Academy of Sciences, and technically cosponsored by the IEEE Geoscience and Remote Sensing Society and the International Society for Photogrammetry and Remote Sensing. It aims at promoting the academic development of automated high-resolution earth observation image interpretation. Six independent tracks have been organized in this challenge, which cover the challenging problems in the field of object detection and semantic segmentation. With the development of convolutional neural networks, deep-learning-based methods have achieved good performance on image interpretation. In this article, we report the details and the best-performing methods presented so far in the scope of this challenge.
Automatic image annotation system using deep learning method to analyse ambiguous images
Image annotation has attracted considerable attention recently because of how quickly image data has grown. Together with image analysis and interpretation, image annotation, which can semantically describe images, has a variety of uses in allied fields, including urban planning engineering. Even without big data and image identification technologies, it is challenging to manually analyze a diverse variety of photos. Improvements to Automated Image Annotation (AIA) labeling systems have been the subject of several scholarly studies. The authors discuss how to use image databases and the AIA system in this paper. The proposed method extracts image features using an improved VGG-19 and then uses neighboring features to automatically predict image labels. The proposed study accounts for correlations between labels and images as well as correlations within images. The number of labels is also estimated using a label quantity prediction (LQP) model, which improves label prediction precision. The suggested method addresses automatic annotation methodologies for pixel-level images of unusual things while incorporating supervisory information via interactive spherical skins. The real objects, which were converted into metadata and identified as belonging to pre-existing categories, were classified by the authors using a supervised deep learning approach, a convolutional neural network (CNN). Certain object monitoring systems strive for a high detection rate (true positives) together with a low false-positive rate. The authors created a KD-tree based on k-nearest neighbors (KNN) to speed up annotation. To take the collected image background into account, the proposed method transforms the conventional two-class object detection problem into a multi-class classification problem, breaking the independent and identically distributed assumption of machine learning methods.
It is also simple to use because it only requires pixel information and ignores any other supporting elements from various color schemes. The following factors are taken into consideration while comparing the five different AIA approaches: main idea, significant contribution, computational framework, computing speed, and annotation accuracy. A set of publicly accessible images that serve as benchmarks for assessing AIA methods is also provided, along with a brief description of the four common evaluation measures.
CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery
Accurate and robust detection of multi-class objects in optical remote
sensing images is essential to many real-world applications such as urban
planning, traffic control, searching and rescuing, etc. However,
state-of-the-art object detection techniques designed for images captured using
ground-level sensors usually experience a sharp performance drop when directly
applied to remote sensing images, largely due to the object appearance
differences in remote sensing images in terms of sparse texture, low contrast,
arbitrary orientations, large scale variations, etc. This paper presents a
novel object detection network (CAD-Net) that exploits attention-modulated
features as well as global and local contexts to address the new challenges in
detecting objects from remote sensing images. The proposed CAD-Net learns
global and local contexts of objects by capturing their correlations with the
global scene (at scene-level) and the local neighboring objects or features (at
object-level), respectively. In addition, it designs a spatial-and-scale-aware
attention module that guides the network to focus on more informative regions
and features as well as more appropriate feature scales. Experiments over two
publicly available object detection datasets for remote sensing images
demonstrate that the proposed CAD-Net achieves superior detection performance.
The implementation code will be made publicly available to facilitate
future research.
Remote Sensing Object Detection Meets Deep Learning: A Meta-review of Challenges and Advances
Remote sensing object detection (RSOD), one of the most fundamental and
challenging tasks in the remote sensing field, has received longstanding
attention. In recent years, deep learning techniques have demonstrated robust
feature representation capabilities and led to a big leap in the development of
RSOD techniques. In this era of rapid technical evolution, this article aims to
present a comprehensive review of the recent achievements in deep learning
based RSOD methods. More than 300 papers are covered in this review. We
identify five main challenges in RSOD, including multi-scale object detection,
rotated object detection, weak object detection, tiny object detection, and
object detection with limited supervision, and systematically review the
corresponding methods developed in a hierarchical division manner. We also
review the widely used benchmark datasets and evaluation metrics within the
field of RSOD, as well as the application scenarios for RSOD. Future research
directions are provided for further promoting the research in RSOD.
Comment: Accepted by IEEE Geoscience and Remote Sensing Magazine. More than
300 papers relevant to the RSOD field were reviewed in this survey.
Boundary-semantic collaborative guidance network with dual-stream feedback mechanism for salient object detection in optical remote sensing imagery
With the increasing application of deep learning in various domains, salient
object detection in optical remote sensing images (ORSI-SOD) has attracted
significant attention. However, most existing ORSI-SOD methods predominantly
rely on local information from low-level features to infer salient boundary
cues and supervise them using boundary ground truth, but fail to sufficiently
optimize and protect the local information, and almost all approaches ignore
the potential advantages offered by the last layer of the decoder to maintain
the integrity of saliency maps. To address these issues, we propose a novel
method named boundary-semantic collaborative guidance network (BSCGNet) with
dual-stream feedback mechanism. First, we propose a boundary protection
calibration (BPC) module, which effectively reduces the loss of edge position
information during forward propagation and suppresses noise in low-level
features without relying on boundary ground truth. Second, based on the BPC
module, a dual feature feedback complementary (DFFC) module is proposed, which
aggregates boundary-semantic dual features and provides effective feedback to
coordinate features across different layers, thereby enhancing cross-scale
knowledge communication. Finally, to obtain more complete saliency maps, we
consider the uniqueness of the last layer of the decoder for the first time and
propose the adaptive feedback refinement (AFR) module, which further refines
feature representation and eliminates differences between features through a
unique feedback mechanism. Extensive experiments on three benchmark datasets
demonstrate that BSCGNet exhibits distinct advantages in challenging scenarios
and outperforms 17 state-of-the-art (SOTA) approaches proposed in recent
years. Codes and results have been released on GitHub:
https://github.com/YUHsss/BSCGNet
Comment: Accepted by TGR