Improving Pedestrian Attribute Recognition With Weakly-Supervised Multi-Scale Attribute-Specific Localization
Pedestrian attribute recognition has been an emerging research topic in the
area of video surveillance. To predict the presence of a particular attribute,
it is necessary to localize the regions related to that attribute. However,
region annotations are not available in this task, so carving out these
attribute-related regions remains challenging. Existing methods apply
attribute-agnostic visual attention or heuristic body-part localization
mechanisms to enhance local feature representations, but neglect to
employ the attributes themselves to define local feature areas. We propose a
flexible Attribute Localization Module (ALM) that adaptively discovers the most
discriminative regions and learns the regional features for each attribute at
multiple levels. Moreover, a feature pyramid architecture is introduced to
enhance attribute-specific localization at low levels with high-level
semantic guidance. The proposed framework requires no additional region
annotations and can be trained end-to-end with multi-level deep supervision.
Extensive experiments show that the proposed method achieves state-of-the-art
results on three pedestrian attribute datasets: PETA, RAP, and PA-100K.

Comment: Accepted by ICCV 2019
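The core idea of the abstract, pooling a shared feature map into one regional feature per attribute via attribute-specific spatial attention, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the per-attribute query vectors and the single-level setting are assumptions (the actual ALM uses a learned attention sub-network at multiple pyramid levels).

```python
import numpy as np

def attribute_localization(feat, queries):
    """Toy sketch of attribute-specific localization.

    feat:    (C, H, W) feature map from one pyramid level.
    queries: (A, C) one hypothetical query vector per attribute.
    Returns: (A, C) one attended regional feature per attribute.
    """
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W)               # (C, HW)
    scores = queries @ flat                     # (A, HW) attribute-vs-location scores
    scores -= scores.max(axis=1, keepdims=True) # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)     # softmax over spatial locations
    return attn @ flat.T                        # (A, C) attention-weighted pooling

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))     # 8 channels, 4x4 spatial grid
queries = rng.standard_normal((5, 8))     # 5 attributes
regional = attribute_localization(feat, queries)
print(regional.shape)  # (5, 8)
```

Each attribute attends to its own spatial region of the shared map, which is how attribute-specific localization avoids per-region annotations.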
Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning
Compositional zero-shot learning (CZSL) aims to recognize unseen compositions
with prior knowledge of known primitives (attributes and objects). Previous
CZSL works often struggle to capture the contextuality between attribute and
object, the discriminability of visual features, and the long-tailed
distribution of real-world compositional data. We propose a simple and scalable
framework called Composition Transformer (CoT) to address these issues. CoT
employs object and attribute experts in distinctive manners to generate
representative embeddings, using the visual network hierarchically. The object
expert extracts representative object embeddings from the final layer in a
bottom-up manner, while the attribute expert makes attribute embeddings in a
top-down manner with a proposed object-guided attention module that models
contextuality explicitly. To remedy biased prediction caused by imbalanced data
distribution, we develop a simple minority attribute augmentation (MAA) that
synthesizes virtual samples by mixing two images and oversampling minority
attribute classes. Our method achieves SoTA performance on several benchmarks,
including MIT-States, C-GQA, and VAW-CZSL. We also demonstrate the
effectiveness of CoT in improving visual discrimination and addressing the
model bias from the imbalanced data distribution. The code is available at
https://github.com/HanjaeKim98/CoT.

Comment: ICCV 2023
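The minority attribute augmentation (MAA) described above synthesizes virtual samples by mixing two images while oversampling minority attribute classes. A mixup-style sketch of that blending step, under the assumption that both image and multi-hot label are mixed with the same weight (the exact MAA formulation may differ), looks like this:

```python
import numpy as np

def minority_attribute_augment(x_minor, x_other, y_minor, y_other, lam=0.7):
    """Sketch of MAA's sample synthesis, assuming a mixup-style blend.

    A virtual sample is a convex combination of a minority-attribute image
    and another image; its multi-hot label is mixed with the same weight.
    `lam` (mixing weight) is an assumed parameter, not from the paper.
    """
    x_virtual = lam * x_minor + (1.0 - lam) * x_other
    y_virtual = lam * y_minor + (1.0 - lam) * y_other
    return x_virtual, y_virtual

# Example: blend a rare-attribute image toward a common one.
x1 = np.ones((3, 2, 2))      # image containing the minority attribute
x2 = np.zeros((3, 2, 2))     # other image
y1 = np.array([1.0, 0.0])    # multi-hot label: rare attribute present
y2 = np.array([0.0, 1.0])
xv, yv = minority_attribute_augment(x1, x2, y1, y2, lam=0.7)
print(yv)  # [0.7 0.3]
```

Oversampling would then simply draw minority-attribute images as `x_minor` more often than their natural frequency, counteracting the long tail.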
CAM-PAR
Graduate School of Artificial Intelligence

As a sub-task of multi-label classification, pedestrian attribute recognition
(PAR) aims to train a model that detects various attributes in a given image.
Achieving good model performance requires understanding the characteristics of
pedestrian images: most have low resolution because they originate from
surveillance cameras, and some pedestrian attributes are known to be highly
correlated with each other. A number of previous methods have been proposed to
reflect these characteristics. J. Jia et al. propose the disentangled attribute
feature learning (DAFL) framework for robust training against noisy pedestrian
images. DAFL disentangles one shared encoder feature into attribute-specific
features using multi-head attention and achieves significant improvements in
model performance, but the additional modules used for disentanglement make the
model more complicated. To address this, we propose Class Activation Map guided
Pedestrian Attribute Recognition (CAM-PAR), which disentangles features without
additional parameters and explores the use of class activation maps in the
multi-label classification domain. Other works focus on relations among
pedestrian attributes and propose methods that exploit this prior when
predicting attributes, but they are limited to modeling pairwise correlations.
We propose a Collaborative Filtering for Attribute Recognition (CFAR) module
that models the correlation of attribute sets using collaborative filtering and
utilizes it for attribute prediction. Experiments on the PA100K and RAPv1
datasets show that our proposed model surpasses the baseline method and
achieves competitive results against previous state-of-the-art methods.
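The parameter-free disentanglement idea in CAM-PAR, reusing the classifier's own weights to form per-attribute class activation maps (CAMs) and pooling the shared feature map with them, can be sketched as follows. The details here (ReLU on the CAMs, sum-normalized pooling) are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def cam_guided_features(feat, cls_weights):
    """Sketch of CAM-guided attribute-specific feature extraction.

    feat:        (C, H, W) shared backbone feature map.
    cls_weights: (A, C) linear classifier weights, one row per attribute.
    Returns:     (A, C) per-attribute features, obtained by computing each
                 attribute's CAM from the shared classifier weights and
                 reusing it as spatial weights to pool the shared map, so
                 no extra parameters are introduced.
    """
    C, H, W = feat.shape
    flat = feat.reshape(C, H * W)                   # (C, HW)
    cams = cls_weights @ flat                       # (A, HW) class activation maps
    cams = np.maximum(cams, 0.0)                    # keep positive evidence only
    norm = cams.sum(axis=1, keepdims=True) + 1e-8   # normalize each CAM
    return (cams / norm) @ flat.T                   # (A, C) CAM-weighted pooling

rng = np.random.default_rng(1)
feat = rng.standard_normal((16, 7, 7))   # 16 channels, 7x7 grid
w = rng.standard_normal((6, 16))         # classifier weights for 6 attributes
attr_feats = cam_guided_features(feat, w)
print(attr_feats.shape)  # (6, 16)
```

Because the CAMs are derived from weights the classifier already has, the disentanglement step adds no parameters, which is exactly the advantage claimed over DAFL's extra attention modules.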