Extreme clicking for efficient object annotation
Manually annotating object bounding boxes is central to building computer
vision datasets, and it is very time consuming (annotating ILSVRC [53] took 35s
for one high-quality box [62]). It involves clicking on imaginary corners of a
tight box around the object. This is difficult as these corners are often
outside the actual object and several adjustments are required to obtain a
tight box. We propose extreme clicking instead: we ask the annotator to click
on four physical points on the object: the top, bottom, left- and right-most
points. This task is more natural and these points are easy to find. We
crowd-source extreme point annotations for PASCAL VOC 2007 and 2012 and show
that (1) annotation time is only 7s per box, 5x faster than the traditional way
of drawing boxes [62]; (2) the quality of the boxes is as good as the original
ground-truth drawn the traditional way; (3) detectors trained on our
annotations are as accurate as those trained on the original ground-truth.
Moreover, our extreme clicking strategy not only yields box coordinates, but
also four accurate boundary points. We show (4) how to incorporate them into
GrabCut to obtain more accurate segmentations than those delivered when
initializing it from bounding boxes; (5) semantic segmentation models trained
on these segmentations outperform those trained on segmentations derived from
bounding boxes.
Comment: ICCV 201
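As a minimal illustration of why four extreme clicks are enough to define a box, the sketch below derives a tight axis-aligned box from the clicked points. It is not the authors' code: the names `Box` and `box_from_extreme_points` and the click coordinates are hypothetical, and image coordinates are assumed to be (x, y) with y growing downward.

```python
from typing import NamedTuple, Sequence, Tuple

Point = Tuple[float, float]  # (x, y); assumed image coordinates, y grows downward


class Box(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def box_from_extreme_points(points: Sequence[Point]) -> Box:
    """Return the tight axis-aligned box enclosing four extreme clicks.

    The order of the clicks does not matter: the box is simply the
    minimum and maximum of the clicked x and y coordinates.
    """
    if len(points) != 4:
        raise ValueError("expected exactly four extreme points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


# Hypothetical clicks: top, bottom, left-most and right-most points.
clicks = [(120.0, 40.0), (135.0, 210.0), (60.0, 130.0), (190.0, 125.0)]
box = box_from_extreme_points(clicks)
print(box)
```

Unlike a box drawn from two corner clicks, the four input points also lie on the object boundary, which is what makes them usable as extra cues for segmentation.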
Deep Extreme Cut: From Extreme Points to Object Segmentation
This paper explores the use of extreme points in an object (left-most,
right-most, top, bottom pixels) as input to obtain precise object segmentation
for images and videos. We do so by adding an extra channel to the image in the
input of a convolutional neural network (CNN), which contains a Gaussian
centered in each of the extreme points. The CNN learns to transform this
information into a segmentation of an object that matches those extreme points.
We demonstrate the usefulness of this approach for guided segmentation
(grabcut-style), interactive segmentation, video object segmentation, and dense
segmentation annotation. We show that we obtain the most precise results to
date, also with less user input, in an extensive and varied selection of
benchmarks and datasets. All our models and code are publicly available on
http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr/.
Comment: CVPR 2018 camera ready. Project webpage and code:
http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr
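The extra input channel described in this abstract can be sketched as follows. This is a hedged, pure-Python illustration rather than the released DEXTR code: the function names, the value of `sigma`, and the choice to merge the four Gaussians with a per-pixel maximum are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


def extreme_point_heatmap(height: int, width: int,
                          points: Sequence[Point],
                          sigma: float = 10.0) -> List[List[float]]:
    """Build an H x W map with a 2D Gaussian centered on each point.

    Overlapping Gaussians are merged with a per-pixel maximum (an
    assumption of this sketch), so every value stays in [0, 1].
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                if value > heatmap[y][x]:
                    heatmap[y][x] = value
    return heatmap


def add_heatmap_channel(image, heatmap):
    """Append the heatmap as one extra channel to an H x W x C image."""
    return [
        [list(pixel) + [heatmap[y][x]] for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]


# Hypothetical 8 x 8 RGB image and extreme points (top, bottom, left, right).
rgb = [[[0.0, 0.0, 0.0] for _ in range(8)] for _ in range(8)]
points = [(4, 0), (4, 7), (0, 3), (7, 4)]
heatmap = extreme_point_heatmap(8, 8, points, sigma=1.0)
net_input = add_heatmap_channel(rgb, heatmap)  # 8 x 8 x 4
```

A real pipeline would build the map with array operations and feed the four-channel result to the network; the nested lists here only keep the example dependency-free.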
Learning Intelligent Dialogs for Bounding Box Annotation
We introduce Intelligent Annotation Dialogs for bounding box annotation. We
train an agent to automatically choose a sequence of actions for a human
annotator to produce a bounding box in a minimal amount of time. Specifically,
we consider two actions: box verification, where the annotator verifies a box
generated by an object detector, and manual box drawing. We explore two kinds
of agents, one based on predicting the probability that a box will be
positively verified, and the other based on reinforcement learning. We
demonstrate that (1) our agents are able to learn efficient annotation
strategies in several scenarios, automatically adapting to the image
difficulty, the desired quality of the boxes, and the detector strength; (2) in
all scenarios the resulting annotation dialogs speed up annotation compared to
manual box drawing alone and box verification alone, while also outperforming
any fixed combination of verification and drawing in most scenarios; (3) in a
realistic scenario where the detector is iteratively re-trained, our agents
evolve a series of strategies that reflect the shifting trade-off between
verification and drawing as the detector grows stronger.
Comment: This paper appeared at CVPR 201
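The trade-off between verification and drawing can be illustrated with a one-step expected-time rule. This is a deliberate simplification, not the paper's agents: it assumes a single verification attempt followed by drawing on rejection, and the function names and timing values are hypothetical.

```python
def expected_time_verify_first(p_accept: float, t_verify: float,
                               t_draw: float) -> float:
    """Expected seconds if we verify once and draw only on rejection."""
    return t_verify + (1.0 - p_accept) * t_draw


def choose_action(p_accept: float, t_verify: float, t_draw: float) -> str:
    """Pick the action with the lower expected annotation time.

    Verification wins when t_verify + (1 - p) * t_draw < t_draw,
    which is equivalent to p > t_verify / t_draw.
    """
    if not 0.0 <= p_accept <= 1.0:
        raise ValueError("p_accept must be a probability in [0, 1]")
    if expected_time_verify_first(p_accept, t_verify, t_draw) < t_draw:
        return "verify"
    return "draw"


# Hypothetical timings: 2 s to verify a box, 7 s to draw one.
strong_detector = choose_action(p_accept=0.9, t_verify=2.0, t_draw=7.0)
weak_detector = choose_action(p_accept=0.1, t_verify=2.0, t_draw=7.0)
print(strong_detector, weak_detector)
```

Under these assumed timings the break-even acceptance probability is 2/7, so a stronger detector shifts the choice toward verification, which matches the shifting trade-off the abstract describes.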
Efficient human annotation schemes for training object class detectors
A central task in computer vision is detecting object classes such as cars and horses
in complex scenes. Training an object class detector typically requires a large set of
images labeled with tight bounding boxes around every object instance. Obtaining
such data requires human annotation, which is very expensive and time consuming.
Alternatively, researchers have tried to train models in a weakly supervised setting (i.e.,
given only image-level labels), which is much cheaper but leads to weaker detectors.
In this thesis, we propose new and efficient human annotation schemes for training
object class detectors that bypass the need for drawing bounding boxes and reduce the
annotation cost while still obtaining high quality object detectors.
First, we propose to train object class detectors from eye tracking data. Instead
of drawing tight bounding boxes, the annotators only need to look at the image and
find the target object. We track the eye movements of annotators while they perform
this visual search task and we propose a technique for deriving object bounding boxes
from these eye fixations. To validate our idea, we augment an existing object detection
dataset with eye tracking data.
Second, we propose a scheme for training object class detectors, which only requires
annotators to verify bounding boxes produced automatically by the learning
algorithm. Our scheme introduces human verification as a new step into a standard
weakly supervised framework which typically iterates between re-training object detectors
and re-localizing objects in the training images. We use the verification signal
to improve both re-training and re-localization.
Third, we propose another scheme where annotators are asked to click on the center
of an imaginary bounding box, which tightly encloses the object. We then incorporate
these clicks into a weakly supervised object localization technique, to jointly localize
object bounding boxes over all training images. Both our center-clicking and human
verification schemes deliver detectors performing almost as well as those trained in a
fully supervised setting.
Finally, we propose extreme clicking. We ask the annotator to click on four physical
points on the object: the top, bottom, left- and right-most points. This task is more
natural than the traditional way of drawing boxes and these points are easy to find. Our
experiments show that annotating objects with extreme clicking is 5x faster than the
traditional way of drawing boxes and it leads to boxes of the same quality as the original
ground-truth drawn the traditional way. Moreover, we use the resulting extreme
points to obtain more accurate segmentations than those derived from bounding boxes
- …