Although we have witnessed significant progress in human-object interaction (HOI) detection, with increasingly high mAP (mean Average Precision) scores, a single mAP number is too coarse to provide an informative summary of a model's performance or to explain why one approach outperforms another. In this paper, we introduce a diagnosis toolbox for analyzing the error sources of existing HOI detection models. We first conduct a holistic investigation of the HOI detection pipeline, which consists of human-object pair detection followed by interaction classification. We define a set of errors and an oracle that fixes each of them. By measuring the mAP improvement obtained when an error is fixed by its oracle, we can analyze the significance of each error type in detail. We then examine human-object pair detection and interaction classification separately, analyzing the model's behavior in each. For the first
detection task, we investigate both recall and precision, measuring the
coverage of ground-truth human-object pairs as well as the noise level in
the detections. For the second classification task, we compute mAP for
interaction classification only, without considering the detection scores. We
also measure how well the models differentiate human-object pairs with and without actual interactions, using the AP (Average Precision)
score. Our toolbox is applicable to different methods across different datasets and is available at https://github.com/neu-vi/Diag-HOI.