4 research outputs found

    Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization

    Full text link
    Pedestrian attribute recognition has been an emerging research topic in the area of video surveillance. To predict the presence of a particular attribute, a model must localize the image regions related to that attribute; however, region annotations are not available in this task, and how to carve out these attribute-related regions remains challenging. Existing methods apply attribute-agnostic visual attention or heuristic body-part localization mechanisms to enhance local feature representations, while neglecting to employ the attributes themselves to define local feature areas. We propose a flexible Attribute Localization Module (ALM) that adaptively discovers the most discriminative regions and learns regional features for each attribute at multiple levels. Moreover, a feature pyramid architecture is introduced to enhance attribute-specific localization at low levels with high-level semantic guidance. The proposed framework requires no additional region annotations and can be trained end-to-end with multi-level deep supervision. Extensive experiments show that the proposed method achieves state-of-the-art results on three pedestrian attribute datasets: PETA, RAP, and PA-100K.
    Comment: Accepted by ICCV 2019
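    The core idea of attribute-specific localization — letting each attribute attend to its own image regions rather than sharing one attention map — can be illustrated with a minimal sketch. The function name, plain-Python feature-map representation, and dot-product scoring below are illustrative assumptions, not the paper's ALM implementation.

```python
import math

def attribute_attention_pool(feature_map, attr_weights):
    """Illustrative attribute-specific attention pooling (not the paper's ALM).

    feature_map: list of H*W spatial feature vectors, each of dimension C.
    attr_weights: one attribute's C-dim weight vector scoring each location.
    Returns a C-dim regional feature specific to that attribute.
    """
    # Score each spatial location for this particular attribute
    scores = [sum(w * f for w, f in zip(attr_weights, feat)) for feat in feature_map]
    # Softmax over locations -> per-attribute spatial attention map
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Attention-weighted sum of local features = attribute-specific regional feature
    dim = len(feature_map[0])
    return [sum(a * feat[c] for a, feat in zip(attn, feature_map)) for c in range(dim)]

# Toy example: 3 spatial locations with 2-dim features, one attribute's weights
fmap = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
regional = attribute_attention_pool(fmap, [2.0, -1.0])
```

    In the full framework, one such module would run per attribute and per pyramid level, so each attribute pools from its own learned regions at multiple scales.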

    Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning

    Full text link
    Compositional zero-shot learning (CZSL) aims to recognize unseen compositions given prior knowledge of known primitives (attributes and objects). Previous CZSL works often struggle with the contextuality between attribute and object, the discriminability of visual features, and the long-tailed distribution of real-world compositional data. We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues. CoT employs object and attribute experts in distinct ways, using the visual network hierarchically to generate representative embeddings. The object expert extracts representative object embeddings from the final layer in a bottom-up manner, while the attribute expert produces attribute embeddings in a top-down manner with a proposed object-guided attention module that models contextuality explicitly. To remedy the biased predictions caused by an imbalanced data distribution, we develop a simple minority attribute augmentation (MAA) that synthesizes virtual samples by mixing two images and oversampling minority attribute classes. Our method achieves state-of-the-art performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL. We also demonstrate the effectiveness of CoT in improving visual discrimination and mitigating model bias from the imbalanced data distribution. The code is available at https://github.com/HanjaeKim98/CoT.
    Comment: ICCV 2023
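    The augmentation idea — oversample images carrying a rare attribute and blend each with a random partner image to synthesize virtual samples — can be sketched as follows. The function name, flat pixel lists, fixed mixing coefficient, and label-union rule are illustrative assumptions, not the paper's exact MAA formulation.

```python
import random

def minority_attribute_augmentation(images, labels, minority_attr, lam=0.6, n_virtual=4):
    """Illustrative MAA-style mixing (assumed names/details, not the paper's code).

    images: list of flat pixel lists; labels: list of attribute-id sets.
    Oversamples images containing `minority_attr` and mixes each with a
    randomly chosen partner image to produce virtual training samples.
    """
    # Pool of images that contain the rare attribute
    minority_pool = [i for i, lab in enumerate(labels) if minority_attr in lab]
    virtual = []
    for _ in range(n_virtual):
        i = random.choice(minority_pool)      # oversample the minority class
        j = random.randrange(len(images))     # random partner image
        # Pixel-wise convex combination of the two images
        mixed = [lam * a + (1 - lam) * b for a, b in zip(images[i], images[j])]
        virtual.append((mixed, labels[i] | labels[j]))  # union of attribute labels
    return virtual

random.seed(0)
images = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.8]]
labels = [{0}, {1}, {0, 2}]  # attribute 2 is the rare (minority) attribute
virtual = minority_attribute_augmentation(images, labels, minority_attr=2)
```

    Every synthesized sample is guaranteed to carry the minority attribute, so the attribute's effective frequency in each training batch increases without collecting new data.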

    CAM-PAR

    Get PDF
    Graduate School of Artificial Intelligence
    As a sub-task of multi-label classification, a pedestrian attribute recognition (PAR) task aims to train a model that detects various attributes in a given image. To achieve better model performance, it is necessary to understand the characteristics of pedestrian images. Most pedestrian images inevitably have low resolution because they come from surveillance cameras, and some pedestrian attributes are known to be highly correlated with one another. A number of previous methods have been proposed to reflect these characteristics. J. Jia et al. propose the disentangled attribute feature learning (DAFL) framework for robust training against noisy pedestrian images. DAFL disentangles one shared encoder feature into attribute-specific features using multi-head attention and achieves significant improvements in model performance, but the additional modules used for disentanglement make the model more complicated. To address this, we propose Class Activation Map guided Pedestrian Attribute Recognition (CAM-PAR), which disentangles features without additional parameters and explores the use of class activation maps in the multi-label classification domain. Other works focus on relations among pedestrian attributes and propose methods that exploit this prior when predicting attributes, but these previous works are limited to modeling pairwise correlations of pedestrian attributes. We propose a Collaborative Filtering for Attribute Recognition (CFAR) module that models correlations among attribute sets using collaborative filtering and utilizes them for attribute prediction. Experiments on the PA100K and RAPv1 datasets show that our proposed model surpasses the baseline method and achieves competitive results against previous state-of-the-art methods.
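    The underlying intuition of collaborative filtering over attributes — refining one attribute's score using the scores of attributes it frequently co-occurs with in the training set — can be sketched minimally. The function name, conditional-probability weighting, and blending coefficient below are illustrative assumptions, not the thesis's actual CFAR module.

```python
def cooccurrence_refine(scores, train_labels, alpha=0.3):
    """Illustrative co-occurrence-based score refinement (not the thesis's CFAR).

    scores: raw per-attribute scores in [0, 1] for one image.
    train_labels: training-set attribute annotations as binary vectors.
    Blends each raw score with evidence from correlated attributes.
    """
    n = len(scores)
    # Attribute frequencies and pairwise co-occurrence counts from training labels
    count = [sum(lab[a] for lab in train_labels) for a in range(n)]
    co = [[sum(lab[a] and lab[b] for lab in train_labels) for b in range(n)]
          for a in range(n)]
    refined = []
    for a in range(n):
        # Evidence for attribute a, weighted by P(a | b) for every other attribute b
        support = sum((co[a][b] / count[b]) * scores[b]
                      for b in range(n) if b != a and count[b])
        norm = sum(co[a][b] / count[b] for b in range(n) if b != a and count[b])
        cf = support / norm if norm else scores[a]
        refined.append((1 - alpha) * scores[a] + alpha * cf)
    return refined

train_labels = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1]]
refined = cooccurrence_refine([0.5, 0.9, 0.1], train_labels)
```

    In the toy run above, attribute 0's score is pulled upward because attribute 1, which always co-occurs with it in the training labels, has a high raw score — the same effect the CFAR module aims for at the level of whole attribute sets rather than pairs.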