Learning to Relate from Captions and Bounding Boxes
In this work, we propose a novel approach that predicts the relationships
between various entities in an image in a weakly supervised manner by relying
on image captions and object bounding box annotations as the sole source of
supervision. Our proposed approach uses a top-down attention mechanism to align
entities in captions to objects in the image, and then leverages the syntactic
structure of the captions to align the relations. We use these alignments to
train a relation classification network, thereby obtaining both grounded
captions and dense relationships. We demonstrate the effectiveness of our model
on the Visual Genome dataset by achieving a recall@50 of 15% and recall@100 of
25% on the relationships present in the image. We also show that the model
successfully predicts relations that are not present in the corresponding
captions. Comment: ACL 201
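The recall@K figures quoted above can be made concrete with a small sketch. It assumes each predicted relation is a scored (subject, predicate, object) triple and that a prediction counts as a hit when the triple exactly matches a ground-truth triple. The function name, the data layout, and the exact-match criterion are illustrative assumptions; real Visual Genome evaluation typically also checks bounding-box overlap, which is omitted here.

```python
def recall_at_k(predictions, ground_truth, k):
    """Fraction of ground-truth triples recovered among the top-k predictions.

    predictions: list of (score, triple) pairs (hypothetical layout).
    ground_truth: set of (subject, predicate, object) triples.
    """
    if not ground_truth:
        return 0.0
    # Keep the k highest-scoring predictions.
    top_k = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    # Use a set so duplicate predictions of the same triple count once.
    hits = {triple for _, triple in top_k if triple in ground_truth}
    return len(hits) / len(ground_truth)


predictions = [
    (0.9, ("man", "riding", "horse")),
    (0.8, ("man", "wearing", "hat")),
    (0.3, ("horse", "on", "grass")),
]
ground_truth = {("man", "riding", "horse"), ("horse", "on", "grass")}

r2 = recall_at_k(predictions, ground_truth, 2)
r3 = recall_at_k(predictions, ground_truth, 3)
```

With these toy inputs, only one of the two ground-truth relations falls within the top 2 predictions, while both fall within the top 3.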
Salvage of Supervision in Weakly Supervised Detection
Weakly supervised object detection (WSOD) has recently attracted much
attention. However, the method, performance and speed gaps between WSOD and
fully supervised detection prevent WSOD from being applied in real-world tasks.
To bridge the gaps, this paper proposes a new framework, Salvage of Supervision
(SoS), with the key idea being to harness every potentially useful supervisory
signal in WSOD: the weak image-level labels, the pseudo-labels, and the power
of semi-supervised object detection. This paper shows that each type of
supervisory signal brings notable improvements and that the framework
outperforms existing WSOD methods (which mainly use only the weak labels) by
large margins. The proposed SoS-WSOD method achieves 64.4 on VOC2007, 61.9 on
VOC2012 and 16.4 on MS-COCO, and also has fast inference speed. Ablations and
visualization further verify the effectiveness of SoS.
Dissimilarity coefficient based weakly supervised object detection
We consider the problem of weakly supervised object detection, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object category. In order to model the uncertainty in the location of the objects, we employ a dissimilarity-coefficient-based probabilistic learning objective. The learning objective minimizes the difference between an annotation-agnostic prediction distribution and an annotation-aware conditional distribution. The main computational challenge is the complex nature of the conditional distribution, which consists of terms over hundreds or thousands of variables. The complexity of the conditional distribution rules out the possibility of explicitly modeling it. Instead, we exploit the fact that deep learning frameworks rely on stochastic optimization. This allows us to use a state-of-the-art discrete generative model that can provide annotation-consistent samples from the conditional distribution. Extensive experiments on the PASCAL VOC 2007 and 2012 datasets demonstrate the efficacy of our proposed approach.
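The abstract does not spell out the objective, so the following is only a sketch of how a dissimilarity coefficient between two distributions could be estimated from samples. It assumes the Rao-style form, in which the expected loss between the two distributions is reduced by a weighted sum of each distribution's self-diversity. The function names, the weight gamma, and the toy loss are illustrative assumptions, not the authors' implementation.

```python
def expected_loss(samples_a, samples_b, loss):
    """Monte Carlo estimate of the expected loss between two sample sets."""
    total = sum(loss(a, b) for a in samples_a for b in samples_b)
    return total / (len(samples_a) * len(samples_b))


def dissimilarity_coefficient(pred_samples, cond_samples, loss, gamma=0.5):
    """Assumed Rao-style form: cross diversity minus weighted self-diversities.

    pred_samples: samples from the prediction distribution (hypothetical).
    cond_samples: samples from the conditional distribution (hypothetical).
    gamma: assumed weight in [0, 1] balancing the two self-diversity terms.
    """
    cross = expected_loss(pred_samples, cond_samples, loss)
    self_pred = expected_loss(pred_samples, pred_samples, loss)
    self_cond = expected_loss(cond_samples, cond_samples, loss)
    return cross - gamma * self_pred - (1.0 - gamma) * self_cond


def toy_loss(a, b):
    # Toy task loss: absolute difference between two scalar "outputs".
    return abs(a - b)


same = dissimilarity_coefficient([0, 1], [0, 1], toy_loss)
apart = dissimilarity_coefficient([0, 0], [1, 1], toy_loss)
```

In this toy setting the coefficient is zero when both sample sets coincide and grows as they move apart, which is the behavior a training objective of this kind would minimize. Working from samples rather than explicit probabilities mirrors the abstract's point that the conditional distribution is too complex to model explicitly.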