    Category-Aware Spatial Constraint for Weakly Supervised Object Detection

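    Object detection is a very important research problem in computer vision. With the development of deep convolutional neural networks in recent years, object detection algorithms based on deep learning have made great progress in performance. However, current state-of-the-art object detection algorithms need data with precise object location labels to train their models; annotating such labels costs a great deal of labor and resources, and it also introduces human annotation bias. This thesis studies object detection under weak supervision, that is, learning an object detector from image-level category labels alone, without precise object location labels. Weakly supervised object detection has broad applications and considerable significance, and it has been a popular research topic in computer vision in recent years. Most current weakly supervised object detection algorithms rely on local, proposal-level information. To address this, this...

    Visual object detection lies at the core of computer vision research. Recently, with the development of deep convolutional neural networks, object detection methods based on deep learning have made great progress. However, current state-of-the-art object detection algorithms require training datasets with object-level labels to learn their models. It is time-consuming and labor-intensive to ob...

    Degree: Master of Engineering. Department and major: School of Information Science and Technology, Master of Engineering (Computer Technology). Student ID: 3152014115330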

    Weakly-supervised Visual Grounding of Phrases with Linguistic Structures

    We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks. Specifically, the model is trained with images and their associated image-level captions, without any explicit region-to-phrase correspondence annotations. To this end, we introduce an end-to-end model which learns visual groundings of phrases with two types of carefully designed loss functions. In addition to the standard discriminative loss, which enforces that attended image regions and phrases are consistently encoded, we propose a novel structural loss which makes use of the parse tree structures induced by the sentences. In particular, we ensure complementarity among the attention masks that correspond to sibling noun phrases, and compositionality of attention masks among the children and parent phrases, as defined by the sentence parse tree. We validate the effectiveness of our approach on the Microsoft COCO and Visual Genome datasets. Comment: CVPR 201