1,235 research outputs found
Object Detection in 20 Years: A Survey
Object detection, as of one the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc, and makes an in-deep
analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
LR-CNN: Local-aware Region CNN for Vehicle Detection in Aerial Imagery
State-of-the-art object detection approaches such as Fast/Faster R-CNN, SSD,
or YOLO have difficulties detecting dense, small targets with arbitrary
orientation in large aerial images. The main reason is that using interpolation
to align RoI features can result in a lack of accuracy or even loss of location
information. We present the Local-aware Region Convolutional Neural Network
(LR-CNN), a novel two-stage approach for vehicle detection in aerial imagery.
We enhance translation invariance to detect dense vehicles and address the
boundary quantization issue amongst dense vehicles by aggregating the
high-precision RoIs' features. Moreover, we resample high-level semantic pooled
features, making them regain location information from the features of a
shallower convolutional block. This strengthens the local feature invariance
for the resampled features and enables detecting vehicles in an arbitrary
orientation. The local feature invariance enhances the learning ability of the
focal loss function, and the focal loss further helps to focus on the hard
examples. Taken together, our method better addresses the challenges of aerial
imagery. We evaluate our approach on several challenging datasets (VEDAI,
DOTA), demonstrating a significant improvement over state-of-the-art methods.
We demonstrate the good generalization ability of our approach on the DLR 3K
dataset.Comment: 8 page
Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network
The detection performance of small objects in remote sensing images is not
satisfactory compared to large objects, especially in low-resolution and noisy
images. A generative adversarial network (GAN)-based model called enhanced
super-resolution GAN (ESRGAN) shows remarkable image enhancement performance,
but reconstructed images miss high-frequency edge information. Therefore,
object detection performance degrades for small objects on recovered noisy and
low-resolution remote sensing images. Inspired by the success of edge enhanced
GAN (EEGAN) and ESRGAN, we apply a new edge-enhanced super-resolution GAN
(EESRGAN) to improve the image quality of remote sensing images and use
different detector networks in an end-to-end manner where detector loss is
backpropagated into the EESRGAN to improve the detection performance. We
propose an architecture with three components: ESRGAN, Edge Enhancement Network
(EEN), and Detection network. We use residual-in-residual dense blocks (RRDB)
for both the ESRGAN and EEN, and for the detector network, we use the faster
region-based convolutional network (FRCNN) (two-stage detector) and single-shot
multi-box detector (SSD) (one stage detector). Extensive experiments on a
public (car overhead with context) and a self-assembled (oil and gas storage
tank) satellite dataset show superior performance of our method compared to the
standalone state-of-the-art object detectors.Comment: This paper contains 27 pages and accepted for publication in MDPI
remote sensing journal. GitHub Repository:
https://github.com/Jakaria08/EESRGAN (Implementation
Automatic CNN channel selection and effective detection on face and rotated aerial objects
Balancing accuracy and computational cost is a challenging task in computer vision. This is especially true for convolutional neural networks (CNNs), which required far larger scale of processing power than traditional learning algorithms. This thesis is aimed at the development of new CNN structures and loss functions to tackle the unbalanced accuracy-effciency issue in image classification and object detection, which are two fundamental yet challenging tasks of computer vision. For a CNN based object detector, the main computational cost is caused by the feature extractor (backbone), which has been originally applied to image classification.;Optimising the structure of CNN applied to image classification will bring benefits when it is applied to object detection. Although the outputs of detectors may vary across detection tasks, the challenges and the design principles among detectors are similar. Therefore, this thesis will start with face detection (i.e. a single object detection task), which is a significant branch of objection detection and has been widely used in real life. After that, object detection on aerial image will be investigated, which is a more challenging detection task.;Specifically, the objectives of this thesis are: 1. Optimising the CNN structures for image classification; 2. Developing a face detector which enables a trade-off between computational cost and accuracy; and 3. Proposing an object detector for aerial images, which suppresses the background noise without damaging the inference efficiency.;For the first target, this thesis aims to automatically optimise the topology of CNNs to generate the structure of fixed-length models, in which unnecessary convolutional kernels are removed. Experimental results have demonstrated that the optimised model can achieve comparable accuracy to the state-of-the-art models, across a broad range of datasets, whilst significantly reducing the number of parameters.;To tackle the unbalanced accuracy-effciency challenge in face detection, a novel context enhanced approach is proposed which improves the performance of the face detector in terms of both loss function and structure. For loss function optimisation, a hierarchical loss, referred to as 'triple loss' in this thesis, is introduced to optimise the feature pyramid network (FPN) based face detector. For structural optimisation, this thesis proposes a context-sensitive structure to increase the capacity of the network prediction. Experimental results indicate that the proposed method achieves a good balance between the accuracy and computational cost of face detection.;To suppress the background noise in aerial image object detection, this thesis presents a two-stage detector, named as 'SAFDet'. To be more specific, a rotation anchor-free-branch (RAFB) is proposed to regress the precise rectangle boundary. Asthe RAFB is anchor free, the computational cost is negligible during training. Meanwhile,a centre prediction module (CPM) is introduced to enhance the capabilities oftarget localisation and noise suppression from the background. As the CPM is only deployed during training, it does not increase the computational cost of inference. Experimental results indicate that the proposed method achieves a good balance between the accuracy and computational cost, and it effectively suppresses the background noise at the same time.Balancing accuracy and computational cost is a challenging task in computer vision. This is especially true for convolutional neural networks (CNNs), which required far larger scale of processing power than traditional learning algorithms. This thesis is aimed at the development of new CNN structures and loss functions to tackle the unbalanced accuracy-effciency issue in image classification and object detection, which are two fundamental yet challenging tasks of computer vision. For a CNN based object detector, the main computational cost is caused by the feature extractor (backbone), which has been originally applied to image classification.;Optimising the structure of CNN applied to image classification will bring benefits when it is applied to object detection. Although the outputs of detectors may vary across detection tasks, the challenges and the design principles among detectors are similar. Therefore, this thesis will start with face detection (i.e. a single object detection task), which is a significant branch of objection detection and has been widely used in real life. After that, object detection on aerial image will be investigated, which is a more challenging detection task.;Specifically, the objectives of this thesis are: 1. Optimising the CNN structures for image classification; 2. Developing a face detector which enables a trade-off between computational cost and accuracy; and 3. Proposing an object detector for aerial images, which suppresses the background noise without damaging the inference efficiency.;For the first target, this thesis aims to automatically optimise the topology of CNNs to generate the structure of fixed-length models, in which unnecessary convolutional kernels are removed. Experimental results have demonstrated that the optimised model can achieve comparable accuracy to the state-of-the-art models, across a broad range of datasets, whilst significantly reducing the number of parameters.;To tackle the unbalanced accuracy-effciency challenge in face detection, a novel context enhanced approach is proposed which improves the performance of the face detector in terms of both loss function and structure. For loss function optimisation, a hierarchical loss, referred to as 'triple loss' in this thesis, is introduced to optimise the feature pyramid network (FPN) based face detector. For structural optimisation, this thesis proposes a context-sensitive structure to increase the capacity of the network prediction. Experimental results indicate that the proposed method achieves a good balance between the accuracy and computational cost of face detection.;To suppress the background noise in aerial image object detection, this thesis presents a two-stage detector, named as 'SAFDet'. To be more specific, a rotation anchor-free-branch (RAFB) is proposed to regress the precise rectangle boundary. Asthe RAFB is anchor free, the computational cost is negligible during training. Meanwhile,a centre prediction module (CPM) is introduced to enhance the capabilities oftarget localisation and noise suppression from the background. As the CPM is only deployed during training, it does not increase the computational cost of inference. Experimental results indicate that the proposed method achieves a good balance between the accuracy and computational cost, and it effectively suppresses the background noise at the same time
- …