Diagnosing Rarity in Human-Object Interaction Detection
Human-object interaction (HOI) detection is a core task in computer vision.
The goal is to localize all human-object pairs and recognize their
interactions. An interaction is defined by a <verb, object> tuple, which leads to a long-tailed visual recognition challenge since many combinations are rarely represented. The performance of existing models is especially limited for these tail categories, yet little has been done to understand why. To that end, in this paper we propose to diagnose rarity in HOI detection with a three-step strategy, namely Detection, Identification and Recognition, in which we carefully analyse the limiting factors by studying state-of-the-art models. Our findings indicate that the detection and identification steps are degraded by interaction signals such as occlusion and relative location, which in turn limits recognition accuracy.
Comment: Accepted at CVPR'20 Workshop on Learning from Limited Labels
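The abstract names the three diagnostic steps but gives no protocol. As a rough illustration only, the sketch below (function names, dictionary keys, and the 0.5 IoU threshold are all assumptions, not the paper's procedure) attributes each missed ground-truth interaction to the first stage that fails.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def diagnose(gt, preds, thr=0.5):
    """Attribute a ground-truth HOI to the first failing stage.

    gt and each prediction are dicts with 'human' and 'object' boxes
    and a 'verb' label (a hypothetical format for this sketch).
    """
    # Stage 1 (Detection): was each box localized at all?
    human_found = any(iou(p['human'], gt['human']) >= thr for p in preds)
    object_found = any(iou(p['object'], gt['object']) >= thr for p in preds)
    if not (human_found and object_found):
        return 'detection'
    # Stage 2 (Identification): were the two boxes paired with each other?
    paired = [p for p in preds
              if iou(p['human'], gt['human']) >= thr
              and iou(p['object'], gt['object']) >= thr]
    if not paired:
        return 'identification'
    # Stage 3 (Recognition): was the verb classified correctly?
    if any(p['verb'] == gt['verb'] for p in paired):
        return 'correct'
    return 'recognition'
```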
Detecting Human-Object Interactions via Functional Generalization
We present an approach for detecting human-object interactions (HOIs) in
images, based on the idea that humans interact with functionally similar
objects in a similar manner. The proposed model is simple and data-efficient: it uses visual features of the human, the relative spatial orientation of the human and the object, and the knowledge that functionally similar objects take part in similar interactions with humans. We provide extensive experimental validation for our approach and demonstrate state-of-the-art results for HOI detection. On the HICO-DET dataset our method achieves a gain of over 2.5 absolute points in mean average precision (mAP) over the previous state of the art. We also
show that our approach leads to significant performance gains for zero-shot HOI
detection in the seen object setting. We further demonstrate that using a
generic object detector, our model can generalize to interactions involving
previously unseen objects.
Comment: AAAI 2020
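The abstract lists the cues the model combines but not how. The minimal PyTorch sketch below (dimensions, module names, and the concatenation scheme are assumptions, not the authors' architecture) illustrates the core idea: feeding a word embedding of the object category into the verb classifier, so that objects with nearby embeddings, i.e. functionally similar ones, receive similar verb scores.

```python
import torch
import torch.nn as nn

class HOIScorer(nn.Module):
    """Hypothetical verb scorer over human appearance, spatial layout,
    and an object-category word embedding (the functional cue)."""
    def __init__(self, num_verbs, app_dim=2048, spat_dim=8, emb_dim=300):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(app_dim + spat_dim + emb_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_verbs),
        )

    def forward(self, human_feat, spatial_feat, object_emb):
        # Concatenate the three cues and score every verb jointly.
        x = torch.cat([human_feat, spatial_feat, object_emb], dim=-1)
        return self.mlp(x)

scorer = HOIScorer(num_verbs=117)     # HICO-DET has 117 verb classes
human_feat = torch.randn(1, 2048)     # ROI-pooled human appearance
spatial_feat = torch.randn(1, 8)      # relative box geometry encoding
object_emb = torch.randn(1, 300)      # e.g. a GloVe-style word vector
verb_logits = scorer(human_feat, spatial_feat, object_emb)
```

Because the object enters only through its embedding, swapping in an unseen but functionally similar object leaves the input nearly unchanged, which is what enables generalization with a generic detector.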
Visual Compositional Learning for Human-Object Interaction Detection
Human-object interaction (HOI) detection aims to localize and infer relationships between humans and objects in an image. It is challenging because the enormous number of possible combinations of object and verb types forms a long-tailed distribution. We devise a deep Visual Compositional Learning (VCL) framework, a simple yet efficient way to address this problem. VCL first decomposes an HOI representation into object- and verb-specific features, and then composes new interaction samples in the feature space by stitching the decomposed features. The integration of decomposition
and composition enables VCL to share object and verb features among different
HOI samples and images, and to generate new interaction samples and new types
of HOI, and thus largely alleviates the long-tail distribution problem and
benefits low-shot or zero-shot HOI detection. Extensive experiments demonstrate
that the proposed VCL can effectively improve the generalization of HOI
detection on HICO-DET and V-COCO and outperforms the recent state-of-the-art
methods on HICO-DET. Code is available at https://github.com/zhihou7/VCL.
Comment: Accepted at ECCV 2020
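The released code is at the URL above; as a simplified illustration of the compose step alone (not the authors' implementation; shapes and names are assumptions), the sketch below stitches verb-specific and object-specific features across samples to manufacture new <verb, object> training pairs, including combinations never observed together in any image.

```python
import torch

def compose_novel_hois(verb_feats, verb_labels, obj_feats, obj_labels):
    """Cross-pair every decomposed verb feature with every object feature.

    verb_feats: (N, Dv) verb-specific features, verb_labels: (N,)
    obj_feats:  (M, Do) object-specific features, obj_labels: (M,)
    Returns stitched features (N*M, Dv+Do) and their composed labels.
    """
    n, m = verb_feats.size(0), obj_feats.size(0)
    v = verb_feats.unsqueeze(1).expand(n, m, -1)   # (N, M, Dv)
    o = obj_feats.unsqueeze(0).expand(n, m, -1)    # (N, M, Do)
    feats = torch.cat([v, o], dim=-1).reshape(n * m, -1)
    labels = [(int(v_lab), int(o_lab))
              for v_lab in verb_labels for o_lab in obj_labels]
    return feats, labels

# Two real samples yield four composed (verb, object) training pairs.
verb_feats, verb_labels = torch.randn(2, 256), torch.tensor([1, 5])
obj_feats, obj_labels = torch.randn(2, 256), torch.tensor([3, 7])
feats, labels = compose_novel_hois(verb_feats, verb_labels,
                                   obj_feats, obj_labels)
print(feats.shape, labels)  # torch.Size([4, 512]) and 4 label tuples
```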
CPARR: Category-based Proposal Analysis for Referring Relationships
The task of referring relationships is to localize the subject and object entities in an image that satisfy a relationship query given in the form <subject, predicate, object>; both entities must be localized simultaneously under the specified relationship. We introduce a simple yet effective proposal-based method for referring relationships. Unlike existing methods such as SSAS, our method generates high-resolution results while reducing complexity and ambiguity.
Our method is composed of two modules: a category-based proposal generation
module to select the proposals related to the entities and a predicate analysis
module to score the compatibility of pairs of selected proposals. We show
state-of-the-art performance on the referring relationship task on two public
datasets: Visual Relationship Detection and Visual Genome.
Comment: CVPR 2020 Workshop on Multimodal Learning
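As a rough sketch of the two-module design the abstract describes (all names and the toy scoring function are placeholders, not the authors' code), the snippet below first keeps only proposals whose detected category matches the queried subject or object, then scores every surviving subject-object pair for compatibility with the queried predicate.

```python
def select_by_category(categories, target_cat, topk=10):
    """Indices of the top-k proposals whose predicted class matches
    target_cat (proposals assumed sorted by detector confidence)."""
    idx = [i for i, c in enumerate(categories) if c == target_cat]
    return idx[:topk]

def refer_relationship(boxes, categories, pair_scorer, query):
    """query = (subject_cat, predicate, object_cat); returns the
    (subject_index, object_index) pair with the highest compatibility."""
    subj_idx = select_by_category(categories, query[0])
    obj_idx = select_by_category(categories, query[2])
    best, best_score = None, float('-inf')
    for si in subj_idx:
        for oi in obj_idx:
            # Predicate-analysis module: any callable scoring how well
            # this proposal pair matches the queried predicate.
            score = pair_scorer(boxes[si], boxes[oi], query[1])
            if score > best_score:
                best, best_score = (si, oi), score
    return best

# Toy usage with a hypothetical geometric scorer for "above":
boxes = [(10, 10, 50, 50), (20, 60, 60, 100)]
categories = ['person', 'table']
above = lambda s, o, pred: (o[1] - s[3]) if pred == 'above' else 0.0
print(refer_relationship(boxes, categories, above,
                         ('person', 'above', 'table')))  # -> (0, 1)
```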