
    Extreme clicking for efficient object annotation

    Manually annotating object bounding boxes is central to building computer vision datasets, and it is very time-consuming (annotating ILSVRC [53] took 35s for one high-quality box [62]). It involves clicking on imaginary corners of a tight box around the object. This is difficult as these corners are often outside the actual object and several adjustments are required to obtain a tight box. We propose extreme clicking instead: we ask the annotator to click on four physical points on the object: the top, bottom, left- and right-most points. This task is more natural and these points are easy to find. We crowd-source extreme point annotations for PASCAL VOC 2007 and 2012 and show that (1) annotation time is only 7s per box, 5x faster than the traditional way of drawing boxes [62]; (2) the quality of the boxes is as good as the original ground-truth drawn the traditional way; (3) detectors trained on our annotations are as accurate as those trained on the original ground-truth. Moreover, our extreme clicking strategy not only yields box coordinates, but also four accurate boundary points. We show (4) how to incorporate them into GrabCut to obtain more accurate segmentations than those delivered when initializing it from bounding boxes; (5) semantic segmentation models trained on these segmentations outperform those trained on segmentations derived from bounding boxes. Comment: ICCV 201
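To make the relation between the four extreme clicks and the resulting box concrete, here is a minimal sketch. It assumes clicks arrive as `(x, y)` pixel coordinates with `y` growing downward; the function name `box_from_extreme_points` and the sample coordinates are hypothetical and not taken from the paper.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates; y grows downward (assumption)
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_from_extreme_points(points: Iterable[Point]) -> Box:
    """Return the tight axis-aligned box spanned by four extreme clicks.

    The order of the clicks does not matter: the left-most click fixes x_min,
    the right-most x_max, the top-most y_min and the bottom-most y_max.
    """
    pts = list(points)
    if len(pts) != 4:
        raise ValueError("expected exactly four extreme points")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical clicks: top, bottom, left-most, right-most.
clicks = [(120, 40), (95, 210), (30, 130), (200, 150)]
box = box_from_extreme_points(clicks)
print(box)
```

Because each click lies on the object boundary, the same four points can also serve as boundary hints for a segmentation method, which is the second use the abstract describes.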

    Deep Extreme Cut: From Extreme Points to Object Segmentation

    This paper explores the use of extreme points in an object (left-most, right-most, top, bottom pixels) as input to obtain precise object segmentation for images and videos. We do so by adding an extra channel to the image in the input of a convolutional neural network (CNN), which contains a Gaussian centered in each of the extreme points. The CNN learns to transform this information into a segmentation of an object that matches those extreme points. We demonstrate the usefulness of this approach for guided segmentation (grabcut-style), interactive segmentation, video object segmentation, and dense segmentation annotation. We show that we obtain the most precise results to date, also with less user input, in an extensive and varied selection of benchmarks and datasets. All our models and code are publicly available on http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr/. Comment: CVPR 2018 camera ready. Project webpage and code: http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr
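The extra input channel described above can be sketched as follows. This is an illustration only, not the authors' code: the function name `extreme_point_heatmap`, the default `sigma`, and the choice to combine the four Gaussians with a pixel-wise maximum are all assumptions. It uses plain Python lists so that it runs without third-party packages; a real pipeline would more likely use an array library.

```python
import math
from typing import List, Sequence, Tuple


def extreme_point_heatmap(
    height: int,
    width: int,
    points: Sequence[Tuple[float, float]],
    sigma: float = 10.0,  # assumed spread in pixels; not a value from the paper
) -> List[List[float]]:
    """Build a height x width map with a 2D Gaussian centered on each point.

    Points are (x, y) pixel coordinates. Overlapping Gaussians are merged
    with a pixel-wise maximum (an assumption), so every value lies in [0, 1].
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                dist_sq = (x - px) ** 2 + (y - py) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma ** 2))
                if value > heatmap[y][x]:
                    heatmap[y][x] = value
    return heatmap


# Hypothetical extreme points on a 20 x 30 image: top, bottom, left, right.
points = [(15, 0), (15, 19), (0, 10), (29, 10)]
channel = extreme_point_heatmap(20, 30, points, sigma=3.0)
# `channel` would be stacked with the RGB planes to form a 4-channel input.
```

The map peaks at 1.0 exactly at each clicked pixel and decays with distance, giving the network a soft spatial cue rather than four isolated pixels.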

    Learning Intelligent Dialogs for Bounding Box Annotation

    We introduce Intelligent Annotation Dialogs for bounding box annotation. We train an agent to automatically choose a sequence of actions for a human annotator to produce a bounding box in a minimal amount of time. Specifically, we consider two actions: box verification, where the annotator verifies a box generated by an object detector, and manual box drawing. We explore two kinds of agents, one based on predicting the probability that a box will be positively verified, and the other based on reinforcement learning. We demonstrate that (1) our agents are able to learn efficient annotation strategies in several scenarios, automatically adapting to the image difficulty, the desired quality of the boxes, and the detector strength; (2) in all scenarios the resulting annotation dialogs speed up annotation compared to manual box drawing alone and box verification alone, while also outperforming any fixed combination of verification and drawing in most scenarios; (3) in a realistic scenario where the detector is iteratively re-trained, our agents evolve a series of strategies that reflect the shifting trade-off between verification and drawing as the detector grows stronger. Comment: This paper appeared at CVPR 201
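The trade-off between verification and drawing can be illustrated with a simplified one-step decision rule. This is not the paper's agent: it assumes a single verification attempt followed by manual drawing on rejection, and the function names and timing values are hypothetical.

```python
def expected_time_verify_first(p_accept: float, t_verify: float, t_draw: float) -> float:
    """Expected seconds if the annotator verifies one detector box first.

    Verification always costs t_verify; with probability (1 - p_accept)
    the box is rejected and the annotator then draws it manually.
    """
    return t_verify + (1.0 - p_accept) * t_draw


def choose_action(p_accept: float, t_verify: float, t_draw: float) -> str:
    """Pick 'verify' when it is cheaper in expectation than drawing directly."""
    if not 0.0 <= p_accept <= 1.0:
        raise ValueError("p_accept must be a probability in [0, 1]")
    if expected_time_verify_first(p_accept, t_verify, t_draw) < t_draw:
        return "verify"
    return "draw"


# Hypothetical timings in seconds; only the ratio t_verify / t_draw matters.
strong_detector = choose_action(p_accept=0.9, t_verify=2.0, t_draw=7.0)
weak_detector = choose_action(p_accept=0.1, t_verify=2.0, t_draw=7.0)
print(strong_detector, weak_detector)
```

Under these assumptions, verification pays off exactly when `p_accept > t_verify / t_draw`, which matches the abstract's observation that the preferred strategy shifts toward verification as the detector grows stronger.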

    Efficient human annotation schemes for training object class detectors

    A central task in computer vision is detecting object classes such as cars and horses in complex scenes. Training an object class detector typically requires a large set of images labeled with tight bounding boxes around every object instance. Obtaining such data requires human annotation, which is very expensive and time consuming. Alternatively, researchers have tried to train models in a weakly supervised setting (i.e., given only image-level labels), which is much cheaper but leads to weaker detectors. In this thesis, we propose new and efficient human annotation schemes for training object class detectors that bypass the need for drawing bounding boxes and reduce the annotation cost while still obtaining high quality object detectors. First, we propose to train object class detectors from eye tracking data. Instead of drawing tight bounding boxes, the annotators only need to look at the image and find the target object. We track the eye movements of annotators while they perform this visual search task and we propose a technique for deriving object bounding boxes from these eye fixations. To validate our idea, we augment an existing object detection dataset with eye tracking data. Second, we propose a scheme for training object class detectors, which only requires annotators to verify bounding-boxes produced automatically by the learning algorithm. Our scheme introduces human verification as a new step into a standard weakly supervised framework which typically iterates between re-training object detectors and re-localizing objects in the training images. We use the verification signal to improve both re-training and re-localization. Third, we propose another scheme where annotators are asked to click on the center of an imaginary bounding box, which tightly encloses the object. We then incorporate these clicks into a weakly supervised object localization technique, to jointly localize object bounding boxes over all training images. 
    Both our center-clicking and human verification schemes deliver detectors performing almost as well as those trained in a fully supervised setting. Finally, we propose extreme clicking. We ask the annotator to click on four physical points on the object: the top, bottom, left- and right-most points. This task is more natural than the traditional way of drawing boxes and these points are easy to find. Our experiments show that annotating objects with extreme clicking is 5x faster than the traditional way of drawing boxes and it leads to boxes of the same quality as the original ground-truth drawn the traditional way. Moreover, we use the resulting extreme points to obtain more accurate segmentations than those derived from bounding boxes.