DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation
In real-world crowd counting applications, crowd density varies greatly across space and time. A detection-based counting method estimates crowds accurately in low-density scenes, but its reliability degrades in congested areas. A regression-based approach, on the other hand, captures the general density information in crowded regions; without knowing the location of each person, however, it tends to overestimate the count in low-density areas. Thus, relying exclusively on either approach is insufficient to handle all scenes with varying densities. To address this issue, we propose a novel end-to-end crowd counting framework named DecideNet (DEteCtIon and Density Estimation Network). It adaptively decides the appropriate counting mode for different locations in the image based on the local density conditions. DecideNet starts by estimating crowd density with separately generated detection-based and regression-based density maps. To capture the inevitable variation in densities, it incorporates an attention module that adaptively assesses the reliability of the two types of estimates. The final crowd counts are obtained under the guidance of the attention module, which adopts the more suitable estimate from the two density maps. Experimental results show that our method achieves state-of-the-art performance on three challenging
crowd counting datasets. Comment: CVPR 2018
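To make the fusion step concrete, here is a minimal sketch, not the authors' code, of attention-guided fusion in the DecideNet style: a small convolutional head predicts a per-pixel weight that arbitrates between the detection-based and regression-based density maps. Module names, channel sizes, and the exact inputs to the attention head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionGuidedFusion(nn.Module):
    """Per-pixel arbitration between detection- and regression-based densities."""
    def __init__(self, in_channels: int = 64):  # channel size is an assumption
        super().__init__()
        # Small conv head predicting an attention weight in [0, 1] per pixel.
        self.attention = nn.Sequential(
            nn.Conv2d(in_channels + 2, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feats, det_density, reg_density):
        # feats:       (B, C, H, W) shared image features
        # det_density: (B, 1, H, W) detection-based density map
        # reg_density: (B, 1, H, W) regression-based density map
        alpha = self.attention(torch.cat([feats, det_density, reg_density], dim=1))
        # Convex combination: alpha near 1 trusts detection (sparse regions),
        # alpha near 0 trusts regression (congested regions).
        fused = alpha * det_density + (1.0 - alpha) * reg_density
        count = fused.sum(dim=(1, 2, 3))  # final crowd count per image
        return fused, count

# Usage with dummy tensors:
fusion = AttentionGuidedFusion(in_channels=64)
feats = torch.randn(2, 64, 32, 32)
fused, count = fusion(feats, torch.rand(2, 1, 32, 32), torch.rand(2, 1, 32, 32))
print(fused.shape, count.shape)  # torch.Size([2, 1, 32, 32]) torch.Size([2])
```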
DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion
Infrared-visible object detection aims to achieve robust, full-day object detection by fusing the complementary information of infrared and visible images. However, the highly dynamic, variable complementary characteristics and the common modality misalignment make fusing this complementary information difficult. In this paper, we propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address both challenges simultaneously. Specifically, we propose a Modality Competitive Query Selection strategy to provide useful prior information: it dynamically selects the basic salient modality feature representation for each object. To effectively mine complementary information and adapt to misalignment, we propose a Multispectral Deformable Cross-attention module that adaptively samples and aggregates multi-semantic-level features of infrared and visible images for each object. In addition, we adopt the cascade structure of DETR to further mine complementary information. Experiments on four public datasets covering different scenes demonstrate significant improvements over other state-of-the-art methods. The code will be released at https://github.com/gjj45/DAMSDet.
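The query selection idea can be illustrated with a simplified sketch; this is an assumption about the mechanism, not the released DAMSDet code. Each candidate token is scored independently in the infrared and visible streams, the higher-scoring modality "wins" that position, and the top-k winning tokens become the decoder's initial object queries. Dimensions, scoring heads, and the top-k scheme are hypothetical.

```python
import torch
import torch.nn as nn

class CompetitiveQuerySelection(nn.Module):
    """Per-position competition between infrared and visible encoder tokens."""
    def __init__(self, dim: int = 256, num_queries: int = 100):
        super().__init__()
        self.num_queries = num_queries
        self.score_ir = nn.Linear(dim, 1)   # objectness head for infrared tokens
        self.score_vis = nn.Linear(dim, 1)  # objectness head for visible tokens

    def forward(self, ir_tokens, vis_tokens):
        # ir_tokens, vis_tokens: (B, N, D) flattened encoder features
        s_ir = self.score_ir(ir_tokens).squeeze(-1)     # (B, N)
        s_vis = self.score_vis(vis_tokens).squeeze(-1)  # (B, N)
        # At each position, keep whichever modality scores higher.
        use_ir = s_ir > s_vis
        best_score = torch.where(use_ir, s_ir, s_vis)
        best_tok = torch.where(use_ir.unsqueeze(-1), ir_tokens, vis_tokens)
        # Top-k winning tokens initialize the decoder's object queries.
        topk = best_score.topk(self.num_queries, dim=1).indices  # (B, K)
        idx = topk.unsqueeze(-1).expand(-1, -1, best_tok.size(-1))
        return best_tok.gather(1, idx)  # (B, K, D)

selector = CompetitiveQuerySelection(dim=256, num_queries=100)
ir, vis = torch.randn(2, 1024, 256), torch.randn(2, 1024, 256)
print(selector(ir, vis).shape)  # torch.Size([2, 100, 256])
```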
Two-Stream Contextualized CNN for Fine-Grained Image Classification
The human cognitive system suggests that contextual information provides potentially powerful clues when recognizing objects. However, for fine-grained image classification, the contribution of context may vary from image to image, and sometimes the context even confuses the classification result. To alleviate this problem, we develop a novel approach, a two-stream contextualized Convolutional Neural Network, which provides a simple but effective context-content joint classification model under the deep learning framework. The network requires only the raw image and a coarse segmentation as input to extract both content and context features, without any human interaction. Moreover, our network adopts a weighted fusion scheme to combine the content and context classifiers, with a subnetwork introduced to adaptively determine the weight for each image. According to our experiments on public datasets, our approach achieves considerably high recognition accuracy without any tedious human involvement, compared with state-of-the-art approaches.
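The weighted fusion scheme can be sketched as follows, under assumed layer names and sizes: two classifier streams produce logits from content and context features, and a small subnetwork predicts a per-image weight in [0, 1] that combines them. This is an illustrative reading of the abstract, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Content and context classifiers combined by an adaptive per-image weight."""
    def __init__(self, feat_dim: int = 512, num_classes: int = 200):
        super().__init__()
        self.content_cls = nn.Linear(feat_dim, num_classes)
        self.context_cls = nn.Linear(feat_dim, num_classes)
        # Subnetwork that decides how much each stream should contribute.
        self.weight_net = nn.Sequential(
            nn.Linear(2 * feat_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 1),
            nn.Sigmoid(),
        )

    def forward(self, content_feat, context_feat):
        # content_feat, context_feat: (B, feat_dim) features from the two streams
        logits_content = self.content_cls(content_feat)
        logits_context = self.context_cls(context_feat)
        w = self.weight_net(torch.cat([content_feat, context_feat], dim=1))  # (B, 1)
        # Per-image convex combination: w near 1 trusts content, near 0 trusts context.
        return w * logits_content + (1.0 - w) * logits_context

model = TwoStreamFusion()
print(model(torch.randn(4, 512), torch.randn(4, 512)).shape)  # torch.Size([4, 200])
```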
Grayscale-Inversion and Rotation Invariant Texture Description Using Sorted Local Gradient Pattern
Few-Shot Remote Sensing Scene Classification via Subspace Based on Multiscale Feature Learning
Because accurately labeling remote sensing (RS) scene images is difficult and new scene classes must be identified, few-shot learning has shown significant advantages for remote sensing scene classification (RSSC) tasks and has attracted growing interest. However, due to scale variations of targets and irrelevant, complex backgrounds in scene images, current few-shot methods suffer from two problems: the limited extraction capability of the feature extractor under the few-shot mechanism, and the limited separability of the few-shot RS scene classifier. To solve these problems, we introduce an approach called few-shot RSSC via subspace based on multiscale feature learning. We first design a multiscale feature learning technique to address scale variations of the targets in scene images. Concretely, different branches are used to learn scene features at various scales, and a self-attention mechanism is embedded in each branch to incorporate global information into the features at each scale. After that, a multiscale feature fusion operation incorporating channel attention merges the different scale features to obtain a more precise feature representation of RS scene images. Furthermore, a subspace is used to capture the shared characteristics of each category, reducing the impact of complex, irrelevant backgrounds in the scene images. Experiments conducted on publicly available RS scene datasets demonstrate the strong competitiveness of our approach.
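For intuition on the subspace classifier, here is a minimal sketch with assumed details (rank, embedding size, and residual-based scoring are not taken from the paper): each class's support embeddings span a low-rank subspace obtained by truncated SVD, and a query is assigned to the class whose subspace reconstructs it with the smallest residual.

```python
import torch

def subspace_classify(support, query, rank: int = 3):
    # support: (C, K, D) K support embeddings for each of C classes
    # query:   (Q, D)   query embeddings
    residuals = []
    for feats in support:                      # feats: (K, D) one class's support set
        mu = feats.mean(dim=0, keepdim=True)   # class mean
        centered = feats - mu
        # Orthonormal basis of the class subspace via truncated SVD.
        _, _, vh = torch.linalg.svd(centered, full_matrices=False)
        basis = vh[:rank]                      # (rank, D)
        diff = query - mu                      # (Q, D)
        proj = diff @ basis.t() @ basis        # projection onto the subspace
        residuals.append(((diff - proj) ** 2).sum(dim=1))  # (Q,) squared residual
    # Smaller residual means the query lies closer to that class's subspace.
    return torch.stack(residuals, dim=1).argmin(dim=1)     # (Q,) predicted classes

support = torch.randn(5, 5, 64)  # 5-way 5-shot episode, 64-d embeddings
query = torch.randn(10, 64)
print(subspace_classify(support, query))  # 10 predicted class indices
```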