
    DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation

    In real-world crowd counting applications, crowd densities vary greatly across the spatial and temporal domains. A detection-based counting method estimates crowds accurately in low-density scenes, but its reliability degrades in congested areas. A regression-based approach, on the other hand, captures the general density information in crowded regions, yet without knowing the location of each person it tends to overestimate the count in low-density areas. Thus, using either one exclusively is not sufficient to handle all scenes with varying densities. To address this issue, a novel end-to-end crowd counting framework, named DecideNet (DEteCtIon and Density Estimation Network), is proposed. It adaptively decides the appropriate counting mode for different locations in the image based on the real density conditions. DecideNet starts by estimating the crowd density with separately generated detection-based and regression-based density maps. To capture the inevitable variation in densities, it incorporates an attention module that adaptively assesses the reliability of the two types of estimation. The final crowd counts are obtained under the guidance of the attention module, which adopts suitable estimates from the two kinds of density maps. Experimental results show that our method achieves state-of-the-art performance on three challenging crowd counting datasets. Comment: CVPR 2018
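
    A minimal PyTorch sketch of the attention-guided fusion idea described above: a small attention head predicts, per pixel, how much to trust the detection-based density map versus the regression-based one, and the final count is the sum of the blended map. The module name QualityNet, the layer widths, and conditioning the attention on the image plus both maps are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class QualityNet(nn.Module):
    """Predicts a per-pixel attention weight in [0, 1] from the image and
    the two candidate density maps (a hypothetical head, not the paper's
    exact layers)."""
    def __init__(self, in_ch: int = 3 + 2):  # RGB image + two density maps
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 1), nn.Sigmoid(),
        )

    def forward(self, image, det_map, reg_map):
        x = torch.cat([image, det_map, reg_map], dim=1)  # (B, 5, H, W)
        return self.body(x)                              # (B, 1, H, W)

def fused_count(image, det_map, reg_map, attention: QualityNet):
    w = attention(image, det_map, reg_map)
    # Per-pixel mixture: trust detection where w is high, regression elsewhere.
    final_map = w * det_map + (1.0 - w) * reg_map
    return final_map.sum(dim=(1, 2, 3))  # one crowd count per image
```

    In low-density regions the learned weight should lean toward the detection map, and in congested regions toward the regression map, which is exactly the adaptive mode selection the abstract describes.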

    DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion

    Infrared-visible object detection aims to achieve robust, full-day object detection by fusing the complementary information of infrared and visible images. However, the highly dynamic, variable complementary characteristics and the commonly present modality misalignment make fusing this complementary information difficult. In this paper, we propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address these two challenges simultaneously. Specifically, we propose a Modality Competitive Query Selection strategy to provide useful prior information; it dynamically selects the basic salient modality feature representation for each object. To effectively mine complementary information and adapt to misalignment, we propose a Multispectral Deformable Cross-attention module that adaptively samples and aggregates multi-semantic-level features of the infrared and visible images for each object. In addition, we adopt the cascade structure of DETR to better mine complementary information. Experiments on four public datasets covering different scenes demonstrate significant improvements over other state-of-the-art methods. The code will be released at https://github.com/gjj45/DAMSDet
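
    The competitive-selection idea can be illustrated with a short sketch: for each encoder token, the modality (infrared or visible) with the higher objectness score "wins", and the top-k winning features seed the decoder queries. The tensor shapes, the pre-computed scores, and the function name are assumptions; the actual DAMSDet couples this with a DETR-style pipeline and deformable cross-attention.

```python
import torch

def competitive_query_selection(feat_ir, feat_vis, score_ir, score_vis, k=300):
    # feat_*: (B, N, C) flattened encoder tokens; score_*: (B, N) objectness.
    winner_is_ir = score_ir >= score_vis                      # (B, N) bool
    scores = torch.where(winner_is_ir, score_ir, score_vis)   # winning score per token
    feats = torch.where(winner_is_ir.unsqueeze(-1), feat_ir, feat_vis)
    # Keep the k most confident winners as decoder queries.
    top_scores, idx = scores.topk(k, dim=1)                   # (B, k)
    queries = feats.gather(1, idx.unsqueeze(-1).expand(-1, -1, feats.size(-1)))
    return queries, top_scores
```

    The per-token competition is what makes the selection dynamic: in a dark scene most winners come from the infrared branch, while in daylight the visible branch dominates.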

    Two-Stream Contextualized CNN for Fine-Grained Image Classification

    The human cognitive system suggests that context provides a potentially powerful clue when recognizing objects. For fine-grained image classification, however, the contribution of context may vary across images, and sometimes the context even confuses the classification result. To alleviate this problem, we develop a novel approach, the two-stream contextualized Convolutional Neural Network, which provides a simple but efficient context-content joint classification model within a deep learning framework. The network requires only the raw image and a coarse segmentation as input to extract both content and context features, with no need for human interaction. Moreover, our network adopts a weighted fusion scheme to combine the content and context classifiers, while a subnetwork is introduced to adaptively determine the weight for each image. In our experiments on public datasets, our approach achieves considerably high recognition accuracy without any tedious human involvement, as compared with state-of-the-art approaches.
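
    A minimal sketch of the two-stream design: one CNN stream sees the content, the other the coarse segmentation/context, and a small subnetwork predicts a per-image weight that mixes the two classifiers' logits. The tiny backbone, layer widths, and names here are stand-in assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

def tiny_backbone(feat_dim: int = 128) -> nn.Module:
    """Stand-in feature extractor (any CNN pooled to a vector works)."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, feat_dim)
    )

class TwoStreamContextNet(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 128):
        super().__init__()
        self.content_stream = tiny_backbone(feat_dim)  # sees the object content
        self.context_stream = tiny_backbone(feat_dim)  # sees the coarse context
        self.content_cls = nn.Linear(feat_dim, num_classes)
        self.context_cls = nn.Linear(feat_dim, num_classes)
        # Subnetwork predicting a per-image fusion weight in [0, 1].
        self.weight_net = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, content_img, context_img):
        fc = self.content_stream(content_img)
        fx = self.context_stream(context_img)
        w = self.weight_net(torch.cat([fc, fx], dim=1))  # (B, 1)
        # Adaptive weighted fusion of the two classifiers' logits.
        return w * self.content_cls(fc) + (1 - w) * self.context_cls(fx)
```

    Because the weight is predicted per image, the model can down-weight the context stream exactly on those images where context would confuse the decision.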

    Few-Shot Remote Sensing Scene Classification via Subspace Based on Multiscale Feature Learning

    Because accurately labeling remote sensing (RS) scene images is difficult and new scene classes must be identified, few-shot learning has shown significant advantages for remote sensing scene classification (RSSC) tasks and has attracted growing interest. However, due to the scale variation of targets and the irrelevant, complex backgrounds in scene images, current few-shot methods suffer from two problems: the limited extraction capability of the feature extractor under the few-shot mechanism, and the limited separability of the few-shot RS scene classifier. To solve these problems, we introduce an approach called few-shot RSSC via subspace based on multiscale feature learning. We first design a multiscale feature learning technique to address the scale variation of targets in scene images. Concretely, different branches learn scene features at various scales, and a self-attention mechanism is embedded in each branch to incorporate global information into the features at each scale. A multiscale feature fusion operation incorporating channel attention is then devised to effectively merge the different-scale features and obtain a more precise feature representation of RS scene images. Furthermore, a subspace is used to capture the shared characteristics of each category, reducing the impact of complex, irrelevant backgrounds in the scene images. Experiments conducted on publicly available RS scene datasets demonstrate the strong competitiveness of our approach.
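
    A hedged sketch of the subspace classifier the abstract describes: each class's support embeddings span a low-dimensional subspace (via truncated SVD), and a query is assigned to the class whose subspace reconstructs it best. The embedding dimensions, the subspace rank, and the function name are assumptions; the multiscale attention-based extractor that produces the embeddings is omitted.

```python
import torch

def subspace_classify(support, query, n_dim=3):
    # support: (n_way, n_shot, d) embedded support images; query: (q, d).
    # Requires n_dim <= n_shot (e.g. rank 3 for a 5-shot task).
    logits = []
    for cls_feats in support:                      # (n_shot, d) per class
        mean = cls_feats.mean(dim=0, keepdim=True)
        centered = cls_feats - mean                # center before SVD
        # Right singular vectors span the class subspace.
        _, _, vh = torch.linalg.svd(centered, full_matrices=False)
        basis = vh[:n_dim].T                       # (d, n_dim)
        diff = query - mean                        # (q, d)
        proj = diff @ basis @ basis.T              # projection onto subspace
        resid = ((diff - proj) ** 2).sum(dim=1)    # reconstruction error
        logits.append(-resid)                      # smaller error -> higher score
    return torch.stack(logits, dim=1)              # (q, n_way)
```

    Scoring by reconstruction error rather than distance to a single prototype is what lets the classifier model the shared within-class structure while ignoring background directions that do not recur across the support shots.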