Learning to Relate from Captions and Bounding Boxes

Author: Aviral Anshu
Bollimpalli Priyatham
Garg Sarthak
Moniz Joel Ruben Antony
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

In this work, we propose a novel approach that predicts the relationships between various entities in an image in a weakly supervised manner by relying on image captions and object bounding box annotations as the sole source of supervision. Our proposed approach uses a top-down attention mechanism to align entities in captions to objects in the image, and then leverages the syntactic structure of the captions to align the relations. We use these alignments to train a relation classification network, thereby obtaining both grounded captions and dense relationships. We demonstrate the effectiveness of our model on the Visual Genome dataset by achieving a recall@50 of 15% and recall@100 of 25% on the relationships present in the image. We also show that the model successfully predicts relations that are not present in the corresponding captions. Comment: ACL 201
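The recall@50 and recall@100 figures in the abstract above measure how many ground-truth relationships appear among the top-K scored predictions for an image. A minimal sketch of that idea follows, assuming relationships are compared as exact (subject, predicate, object) label triples. The function name `recall_at_k` and the sample data are hypothetical, and the sketch ignores bounding-box localization, which published evaluations typically also require; it is not the authors' evaluation code.

```python
def recall_at_k(scored_predictions, ground_truth, k):
    """Fraction of ground-truth triples found among the k highest-scoring predictions.

    scored_predictions: list of (triple, score) pairs (hypothetical format).
    ground_truth: set of triples annotated for the image.
    """
    if not ground_truth:
        return 0.0
    # Keep only the k predictions with the highest confidence scores.
    top_k = sorted(scored_predictions, key=lambda p: p[1], reverse=True)[:k]
    # A ground-truth triple counts as recalled if it appears in the top k.
    hits = {triple for triple, _ in top_k} & set(ground_truth)
    return len(hits) / len(ground_truth)


# Illustrative data only; not taken from Visual Genome.
predictions = [
    (("man", "riding", "horse"), 0.9),
    (("man", "wearing", "hat"), 0.7),
    (("horse", "on", "grass"), 0.2),
]
truth = {("man", "riding", "horse"), ("horse", "on", "grass")}

print(recall_at_k(predictions, truth, k=2))
print(recall_at_k(predictions, truth, k=3))
```

Raising K can only keep or increase the score, which is consistent with the recall@100 figure being higher than the recall@50 figure.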

arXiv.org e-Print Archive

Crossref

Salvage of Supervision in Weakly Supervised Detection

Author: Sui Lin
Wu Jianxin
Zhang Chen-Lin
Publication date: 07/06/2021

Weakly supervised object detection (WSOD) has recently attracted much attention. However, the method, performance and speed gaps between WSOD and fully supervised detection prevent WSOD from being applied in real-world tasks. To bridge the gaps, this paper proposes a new framework, Salvage of Supervision (SoS), with the key idea being to harness every potentially useful supervisory signal in WSOD: the weak image-level labels, the pseudo-labels, and the power of semi-supervised object detection. This paper shows that each type of supervisory signal brings notable improvements, and that SoS outperforms existing WSOD methods (which mainly use only the weak labels) by large margins. The proposed SoS-WSOD method achieves 64.4 $m\text{AP}_{50}$ on VOC2007, 61.9 $m\text{AP}_{50}$ on VOC2012 and 16.4 $m\text{AP}_{50:95}$ on MS-COCO, and also has fast inference speed. Ablations and visualization further verify the effectiveness of SoS.
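The subscripts in the metrics above refer to intersection-over-union (IoU) thresholds: under the 50 setting a detection counts as correct when its box overlaps a ground-truth box with IoU of at least 0.5, and the 50:95 setting averages over thresholds from 0.50 to 0.95 in steps of 0.05. A minimal sketch of the IoU test follows, assuming axis-aligned boxes given as (x1, y1, x2, y2). The function names are hypothetical, and this is neither the paper's code nor a full mAP computation.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """True when the predicted box matches the ground truth at the given IoU threshold."""
    return iou(pred_box, gt_box) >= threshold


# Thresholds averaged over in the 50:95 setting: 0.50, 0.55, ..., 0.95.
coco_thresholds = [0.50 + 0.05 * i for i in range(10)]

print(iou((0, 0, 2, 2), (1, 0, 3, 2)))
print(is_true_positive((0, 0, 2, 2), (1, 0, 3, 2)))
```

Because the 50:95 setting also demands matches at much stricter overlaps, its scores are usually far lower than the 50-threshold scores on the same detector, which fits the gap between the VOC and MS-COCO numbers quoted above.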

arXiv.org e-Print Archive

Dissimilarity coefficient based weakly supervised object detection

Author: Arun A
Jawahar CV
Mudigonda P
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

We consider the problem of weakly supervised object detection, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object category. In order to model the uncertainty in the location of the objects, we employ a dissimilarity-coefficient-based probabilistic learning objective. The learning objective minimizes the difference between an annotation-agnostic prediction distribution and an annotation-aware conditional distribution. The main computational challenge is the complex nature of the conditional distribution, which consists of terms over hundreds or thousands of variables. The complexity of the conditional distribution rules out the possibility of explicitly modeling it. Instead, we exploit the fact that deep learning frameworks rely on stochastic optimization. This allows us to use a state-of-the-art discrete generative model that can provide annotation-consistent samples from the conditional distribution. Extensive experiments on the PASCAL VOC 2007 and 2012 data sets demonstrate the efficacy of our proposed approach.

Oxford University Research Archive
