Mining Cross-Person Cues for Body-Part Interactiveness Learning in HOI Detection
Human-Object Interaction (HOI) detection plays a crucial role in activity
understanding. Though significant progress has been made, interactiveness
learning remains a challenging problem in HOI detection: existing methods
usually generate redundant negative H-O pair proposals and fail to effectively
extract interactive pairs. Though interactiveness has been studied at both the
whole-body and part levels and facilitates H-O pairing, previous works
focus on only one target person at a time (i.e., a local perspective) and
overlook information from the other persons. In this paper, we argue that
comparing the body parts of multiple persons simultaneously can provide more useful
and supplementary interactiveness cues. That is, we learn body-part
interactiveness from a global perspective: when classifying a target person's
body-part interactiveness, visual cues are explored not only from
herself/himself but also from other persons in the image. We construct
body-part saliency maps based on self-attention to mine cross-person
informative cues and learn the holistic relationships between all the
body-parts. We evaluate the proposed method on widely-used benchmarks HICO-DET
and V-COCO. With our new perspective, the holistic global-local body-part
interactiveness learning achieves significant improvements over
state-of-the-art. Our code is available at
https://github.com/enlighten0707/Body-Part-Map-for-Interactiveness.
Comment: To appear in ECCV 202
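The idea of mining cues across persons with self-attention can be illustrated with a minimal sketch. It assumes each body part of each person is already encoded as a feature vector and applies plain scaled dot-product attention over all of them, without learned projections. The function name `cross_person_attention` and the toy features are hypothetical and are not taken from the paper's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_person_attention(part_feats):
    """Attend from every body part to the body parts of all persons.

    part_feats: list of equal-length feature vectors, one per
    (person, body part) pair in the image (hypothetical layout).
    Returns (attended_features, attention_weights).
    """
    d = len(part_feats[0])
    scale = math.sqrt(d)
    attended, weights = [], []
    for q in part_feats:
        # Similarity of this part to every part, including other persons' parts.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in part_feats]
        w = softmax(scores)
        # Weighted sum of all part features: a crude cross-person summary.
        out = [sum(w[j] * part_feats[j][c] for j in range(len(part_feats)))
               for c in range(d)]
        weights.append(w)
        attended.append(out)
    return attended, weights


# Toy input: parts 0 and 2 (from different persons) look alike, part 1 differs.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
attended, weights = cross_person_attention(feats)
```

In this toy case each row of `weights` sums to one, and part 0 attends more strongly to the similar part 2 than to the dissimilar part 1. A real model would add learned query, key, and value projections.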
Detecting Human-Object Interactions via Functional Generalization
We present an approach for detecting human-object interactions (HOIs) in
images, based on the idea that humans interact with functionally similar
objects in a similar manner. The proposed model is simple and efficiently uses
the data, visual features of the human, relative spatial orientation of the
human and the object, and the knowledge that functionally similar objects take
part in similar interactions with humans. We provide extensive experimental
validation for our approach and demonstrate state-of-the-art results for HOI
detection. On the HICO-DET dataset, our method achieves a gain of over 2.5
absolute percentage points in mean average precision (mAP) over the state of the art. We also
show that our approach leads to significant performance gains for zero-shot HOI
detection in the seen object setting. We further demonstrate that using a
generic object detector, our model can generalize to interactions involving
previously unseen objects.
Comment: AAAI 202
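For readers unfamiliar with the metric, the following is a minimal sketch of mean average precision, assuming non-interpolated AP computed from ranked detections that are already marked as true or false positives. The official benchmark evaluation may differ in matching rules and interpolation; the function names are hypothetical.

```python
def average_precision(scores, labels, num_gt):
    """Non-interpolated AP for one interaction class.

    scores: detection confidences; labels: 1 for a true positive, 0 otherwise;
    num_gt: number of ground-truth instances of the class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            true_positives += 1
            # Precision measured at the rank of each true positive.
            precision_sum += true_positives / rank
    return precision_sum / num_gt if num_gt else 0.0


def mean_average_precision(per_class):
    """per_class: list of (scores, labels, num_gt) tuples, one per class."""
    aps = [average_precision(s, l, n) for s, l, n in per_class]
    return sum(aps) / len(aps)


ap_a = average_precision([0.9, 0.8, 0.7], [1, 0, 1], 2)
map_value = mean_average_precision([
    ([0.9, 0.8, 0.7], [1, 0, 1], 2),
    ([0.6, 0.4], [0, 1], 1),
])
```

A gain of 2.5 absolute points then means the difference between two such mAP values, expressed in percent, is 2.5.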
Human-Object Interaction Detection: A Quick Survey and Examination of Methods
Human-object interaction detection is a relatively new task in the world of
computer vision and visual semantic information extraction. With the goal of
machines identifying interactions that humans perform on objects, there are
many real-world use cases for the research in this field. To our knowledge,
this is the first general survey of the state-of-the-art and milestone works in
this field. We provide a basic survey of the developments in the field of
human-object interaction detection. Many works in this field use multi-stream
convolutional neural network architectures, which combine features from
multiple sources in the input image. Most commonly these are the humans and
objects in question, as well as the spatial quality of the two. As far as we
are aware, there have not been in-depth studies performed that look into the
performance of each component individually. In order to provide insight to
future researchers, we perform an individualized study that examines the
performance of each component of a multi-stream convolutional neural network
architecture for human-object interaction detection. Specifically, we examine
the HORCNN architecture as it is a foundational work in the field. In addition,
we provide an in-depth look at the HICO-DET dataset, a popular benchmark in the
field of human-object interaction detection. Code and papers can be found at
https://github.com/SHI-Labs/Human-Object-Interaction-Detection.
Comment: Published at The 1st International Workshop On Human-Centric
Multimedia Analysis, at ACM Multimedia Conference 202
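The multi-stream design and the per-component study described above can be sketched as follows. This is a minimal illustration that assumes late fusion by summing per-class logits from a human, an object, and a spatial stream, followed by a sigmoid. The stream names, the `active` parameter, and the toy numbers are assumptions for illustration, not the survey's code.

```python
import math


def fuse_streams(stream_logits, active=None):
    """Fuse per-class logits from several streams by summation.

    stream_logits: dict mapping a stream name to a list of per-class logits.
    active: optional list of stream names to keep, which allows a simple
    ablation of individual components.
    Returns per-class probabilities after a sigmoid.
    """
    names = list(stream_logits) if active is None else list(active)
    num_classes = len(next(iter(stream_logits.values())))
    fused = [sum(stream_logits[name][c] for name in names)
             for c in range(num_classes)]
    return [1.0 / (1.0 + math.exp(-z)) for z in fused]


# Toy logits for two interaction classes (made-up values).
logits = {
    "human": [1.0, -1.0],
    "object": [0.5, 0.0],
    "spatial": [-1.5, 1.0],
}
all_streams = fuse_streams(logits)
human_only = fuse_streams(logits, active=["human"])
```

Comparing `all_streams` with outputs such as `human_only` on a labeled set is one simple way to gauge how much each stream contributes.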
DecAug: Augmenting HOI Detection via Decomposition
Human-object interaction (HOI) detection requires a large amount of annotated
data. Current algorithms suffer from insufficient training samples and category
imbalance within datasets. To increase data efficiency, in this paper, we
propose an efficient and effective data augmentation method called DecAug for
HOI detection. Based on our proposed object state similarity metric, object
patterns across different HOIs are shared to augment local object appearance
features without changing their state. Further, we shift spatial correlation
between humans and objects to other feasible configurations with the aid of a
pose-guided Gaussian Mixture Model while preserving their interactions.
Experiments show that our method brings improvements of up to 3.3 mAP and 1.6 mAP
on the V-COCO and HICO-DET datasets, respectively, for two advanced models. Specifically,
interactions with fewer samples enjoy more notable improvement. Our method can
be easily integrated into various HOI detection models with negligible extra
computational consumption. Our code will be made publicly available
- …
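The spatial part of the augmentation can be illustrated with a minimal sketch. It assumes the object's offset from the human-box center, measured in units of the human box's width and height, is drawn from a two-dimensional Gaussian mixture with diagonal covariance, and that the object's size is kept fixed. Unlike the method described above, this sketch is not conditioned on pose; the function name, the component format, and all numbers are hypothetical.

```python
import random


def sample_shifted_box(human_box, object_box, components, rng):
    """Move an object box to a new location sampled from a Gaussian mixture.

    human_box, object_box: (x1, y1, x2, y2).
    components: list of (weight, (mean_dx, mean_dy), (std_dx, std_dy)), with
    offsets relative to the human-box center in units of human width/height.
    rng: a random.Random instance, passed in for reproducibility.
    """
    hx1, hy1, hx2, hy2 = human_box
    hw, hh = hx2 - hx1, hy2 - hy1
    hcx, hcy = hx1 + hw / 2, hy1 + hh / 2
    ox1, oy1, ox2, oy2 = object_box
    ow, oh = ox2 - ox1, oy2 - oy1
    # Pick one mixture component according to its weight.
    mix_weights = [c[0] for c in components]
    _, (mx, my), (sx, sy) = rng.choices(components, weights=mix_weights, k=1)[0]
    # Sample a relative offset, then place the object with its size unchanged.
    dx, dy = rng.gauss(mx, sx), rng.gauss(my, sy)
    cx, cy = hcx + dx * hw, hcy + dy * hh
    return (cx - ow / 2, cy - oh / 2, cx + ow / 2, cy + oh / 2)


rng = random.Random(0)
human = (0.0, 0.0, 10.0, 20.0)
obj = (2.0, 2.0, 6.0, 4.0)
# Hypothetical mixture: object mostly to the right of, or below, the human center.
mixture = [(0.7, (0.5, 0.0), (0.1, 0.1)), (0.3, (0.0, 0.4), (0.1, 0.1))]
new_box = sample_shifted_box(human, obj, mixture, rng)
```

Because only the box center changes, the object keeps its width and height. Whether a sampled configuration still depicts the same interaction is the part the pose-guided model is meant to handle.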