2,467 research outputs found
A CNN based hybrid approach towards automatic image registration
Image registration is a key component of spatial analyses that involve different data sets of the same area. Automatic approaches in this domain have witnessed the application of several intelligent methodologies over the past decade; however accuracy of these approaches have been limited due to the inability to properly model shape as well as contextual information. In this paper, we investigate the possibility of an evolutionary computing based framework towards automatic image registration. Cellular Neural Network has been found to be effective in improving feature matching as well as resampling stages of registration, and complexity of the approach has been considerably reduced using corset optimization. CNN-prolog based approach has been adopted to dynamically use spectral and spatial information for representing contextual knowledge. The salient features of this work are feature point optimisation, adaptive resampling and intelligent object modelling. Investigations over various satellite images revealed that considerable success has been achieved with the procedure. Methodology also illustrated to be effective in providing intelligent interpretation and adaptive resampling
Object Detection in 20 Years: A Survey
Object detection, as of one the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc, and makes an in-deep
analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
CPSeg: Finer-grained Image Semantic Segmentation via Chain-of-Thought Language Prompting
Natural scene analysis and remote sensing imagery offer immense potential for
advancements in large-scale language-guided context-aware data utilization.
This potential is particularly significant for enhancing performance in
downstream tasks such as object detection and segmentation with designed
language prompting. In light of this, we introduce the CPSeg, Chain-of-Thought
Language Prompting for Finer-grained Semantic Segmentation), an innovative
framework designed to augment image segmentation performance by integrating a
novel "Chain-of-Thought" process that harnesses textual information associated
with images. This groundbreaking approach has been applied to a flood disaster
scenario. CPSeg encodes prompt texts derived from various sentences to
formulate a coherent chain-of-thought. We propose a new vision-language
dataset, FloodPrompt, which includes images, semantic masks, and corresponding
text information. This not only strengthens the semantic understanding of the
scenario but also aids in the key task of semantic segmentation through an
interplay of pixel and text matching maps. Our qualitative and quantitative
analyses validate the effectiveness of CPSeg.Comment: WACV 202
From Blurry to Brilliant Detection: YOLOv5-Based Aerial Object Detection with Super Resolution
The demand for accurate object detection in aerial imagery has surged with
the widespread use of drones and satellite technology. Traditional object
detection models, trained on datasets biased towards large objects, struggle to
perform optimally in aerial scenarios where small, densely clustered objects
are prevalent. To address this challenge, we present an innovative approach
that combines super-resolution and an adapted lightweight YOLOv5 architecture.
We employ a range of datasets, including VisDrone-2023, SeaDroneSee, VEDAI, and
NWPU VHR-10, to evaluate our model's performance. Our Super Resolved YOLOv5
architecture features Transformer encoder blocks, allowing the model to capture
global context and context information, leading to improved detection results,
especially in high-density, occluded conditions. This lightweight model not
only delivers improved accuracy but also ensures efficient resource
utilization, making it well-suited for real-time applications. Our experimental
results demonstrate the model's superior performance in detecting small and
densely clustered objects, underlining the significance of dataset choice and
architectural adaptation for this specific task. In particular, the method
achieves 52.5% mAP on VisDrone, exceeding top prior works. This approach
promises to significantly advance object detection in aerial imagery,
contributing to more accurate and reliable results in a variety of real-world
applications
Towards Large-Scale Small Object Detection: Survey and Benchmarks
With the rise of deep convolutional neural networks, object detection has
achieved prominent advances in past years. However, such prosperity could not
camouflage the unsatisfactory situation of Small Object Detection (SOD), one of
the notoriously challenging tasks in computer vision, owing to the poor visual
appearance and noisy representation caused by the intrinsic structure of small
targets. In addition, large-scale dataset for benchmarking small object
detection methods remains a bottleneck. In this paper, we first conduct a
thorough review of small object detection. Then, to catalyze the development of
SOD, we construct two large-scale Small Object Detection dAtasets (SODA),
SODA-D and SODA-A, which focus on the Driving and Aerial scenarios
respectively. SODA-D includes 24828 high-quality traffic images and 278433
instances of nine categories. For SODA-A, we harvest 2513 high resolution
aerial images and annotate 872069 instances over nine classes. The proposed
datasets, as we know, are the first-ever attempt to large-scale benchmarks with
a vast collection of exhaustively annotated instances tailored for
multi-category SOD. Finally, we evaluate the performance of mainstream methods
on SODA. We expect the released benchmarks could facilitate the development of
SOD and spawn more breakthroughs in this field. Datasets and codes are
available at: \url{https://shaunyuan22.github.io/SODA}
- …