Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection
Human-Object Interaction (HOI) detection is a challenging computer vision
task that requires visual models to address the complex interactive
relationship between humans and objects and predict HOI triplets. Despite the
challenges posed by the numerous interaction combinations, they also offer
opportunities for multimodal learning of visual texts. In this paper, we
present a systematic and unified framework (RmLR) that enhances HOI detection
by incorporating structured text knowledge. Firstly, we qualitatively and
quantitatively analyze the loss of interaction information in the two-stage HOI
detector and propose a re-mining strategy to generate more comprehensive visual
representation. Secondly, we design more fine-grained sentence- and word-level
alignment and knowledge transfer strategies to effectively address the
many-to-many matching problem between multiple interactions and multiple
texts. These strategies alleviate the matching confusion problem that arises
when multiple interactions occur simultaneously, thereby improving the
effectiveness of the alignment process. Finally, HOI reasoning by visual
features augmented with textual knowledge substantially improves the
understanding of interactions. Experimental results illustrate the
effectiveness of our approach, where state-of-the-art performance is achieved
on public benchmarks. We further analyze the effects of different components of
our approach to provide insights into its efficacy.
Comment: ICCV202
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision
community because it plays an important role in video surveillance. Many
algorithms have been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. Firstly, we introduce the background of pedestrian attribute
recognition (PAR), including the fundamental concepts of pedestrian
attributes and corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criteria. Thirdly, we
analyse the concepts of multi-task learning and multi-label learning, and also
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures that
have been widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attribute group-based and part-based
methods. Fifthly, we show some applications that take pedestrian
attributes into consideration and achieve better performance. Finally, we
summarize this paper and give several possible research directions for
pedestrian attribute recognition. The project page of this paper can be found
at the following website:
https://sites.google.com/view/ahu-pedestrianattributes/.
Comment: Check our project page for a high-resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes