Click Carving: Segmenting Objects in Video with Point Clicks
We present a novel form of interactive video object segmentation where a few
clicks by the user help the system produce a full spatio-temporal segmentation
of the object of interest. Whereas conventional interactive pipelines take the
user's initialization as a starting point, we show the value in the system
taking the lead even in initialization. In particular, for a given video frame,
the system precomputes a ranked list of thousands of possible segmentation
hypotheses (also referred to as object region proposals) using image and motion
cues. Then, the user looks at the top ranked proposals, and clicks on the
object boundary to carve away erroneous ones. This process iterates (typically
2-3 times), and each time the system revises the top ranked proposal set, until
the user is satisfied with a resulting segmentation mask. Finally, the mask is
propagated across the video to produce a spatio-temporal object tube. On three
challenging datasets, we provide extensive comparisons with both existing work
and simpler alternative methods. In all, the proposed Click Carving approach
strikes an excellent balance of accuracy and human effort. It outperforms all
similarly fast methods, and is competitive or better than those requiring 2 to
12 times the effort.
Comment: A preliminary version of the material in this document was filed as
University of Texas technical report no. UT AI16-0
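The interactive loop described in the abstract (rank proposals, let clicks carve away inconsistent ones, iterate a few rounds) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `score`, `consistent`, and `get_clicks` are hypothetical callbacks standing in for the cue-based ranker, the boundary-consistency test, and the user interaction.

```python
def click_carving(proposals, score, consistent, get_clicks, max_rounds=3):
    """Hypothetical sketch of the Click Carving interaction loop.

    proposals  : candidate segmentation masks (any hashable stand-in here)
    score      : ranks proposals from image/motion cues (higher = better)
    consistent : True if a proposal's boundary agrees with the clicks so far
    get_clicks : returns the user's new boundary clicks for the top proposal
                 (empty when the user is satisfied)
    """
    ranked = sorted(proposals, key=score, reverse=True)
    clicks = []
    for _ in range(max_rounds):            # typically 2-3 rounds suffice
        new = get_clicks(ranked[0])        # user inspects the top proposal
        if not new:                        # no clicks: user accepts the mask
            break
        clicks += new
        kept = [p for p in ranked if consistent(p, clicks)]
        if kept:                           # carve away inconsistent proposals
            ranked = kept
    return ranked[0]
```

In the paper the accepted mask is then propagated across frames to form the spatio-temporal object tube; that propagation step is outside this sketch.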
What makes for effective detection proposals?
Current top performing object detectors employ detection proposals to guide
the search for objects, thereby avoiding exhaustive sliding window search
across images. Despite the popularity and widespread use of detection
proposals, it is unclear which trade-offs are made when using them during
object detection. We provide an in-depth analysis of twelve proposal methods
along with four baselines regarding proposal repeatability, ground truth
annotation recall on PASCAL, ImageNet, and MS COCO, and their impact on DPM,
R-CNN, and Fast R-CNN detection performance. Our analysis shows that for object
detection improving proposal localisation accuracy is as important as improving
recall. We introduce a novel metric, the average recall (AR), which rewards
both high recall and good localisation and correlates surprisingly well with
detection performance. Our findings show common strengths and weaknesses of
existing methods, and provide insights and metrics for selecting and tuning
proposal methods.
Comment: TPAMI final version, duplicate proposals removed in experiment
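The average recall (AR) metric introduced above rewards both high recall and good localisation by averaging recall over increasingly strict IoU thresholds. A minimal sketch, assuming per-ground-truth best-proposal IoUs as input; the function name and the 0.05-step discretization of the [0.5, 1.0] range are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def average_recall(best_ious, thresholds=None):
    """Average Recall: recall averaged over IoU thresholds in [0.5, 1.0].

    best_ious: for each ground-truth object, the best IoU achieved by
    any proposal. Recall at threshold t is the fraction of objects
    whose best IoU is at least t.
    """
    if thresholds is None:
        thresholds = np.linspace(0.5, 1.0, 11)  # 0.5, 0.55, ..., 1.0
    best_ious = np.asarray(best_ious, dtype=float)
    recalls = [(best_ious >= t).mean() for t in thresholds]
    return float(np.mean(recalls))
```

Because loosely localised proposals pass only the lowest thresholds, they contribute far less to AR than tightly localised ones, which is what makes AR correlate with downstream detection performance.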
Measuring the Accuracy of Object Detectors and Trackers
The accuracy of object detectors and trackers is most commonly evaluated by
the Intersection over Union (IoU) criterion. To date, most approaches are
restricted to axis-aligned or oriented boxes and, as a consequence, many
datasets are only labeled with boxes. Nevertheless, axis-aligned or oriented
boxes cannot accurately capture an object's shape. To address this, a number of
densely segmented datasets have started to emerge in both the object detection
and the object tracking communities. However, evaluating the accuracy of object
detectors and trackers that are restricted to boxes on densely segmented data
is not straightforward. To close this gap, we introduce the relative
Intersection over Union (rIoU) accuracy measure. The measure normalizes the IoU
with the optimal box for the segmentation to generate an accuracy measure that
ranges between 0 and 1 and allows a more precise measurement of accuracies.
Furthermore, it enables an efficient and easy way to understand scenes and the
strengths and weaknesses of an object detection or tracking approach. We
display how the new measure can be efficiently calculated and present an
easy-to-use evaluation framework. The framework is tested on the DAVIS and the
VOT2016 segmentations and has been made available to the community.
Comment: 10 pages, 7 Figure
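The rIoU measure above normalizes a box's IoU with the segmentation by the IoU of the best achievable box for that segmentation. A minimal sketch on binary masks; the helper names are assumptions, and in practice the optimal box would be found by searching over candidate boxes rather than being supplied directly:

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over Union of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def box_mask(shape, x0, y0, x1, y1):
    """Rasterize an axis-aligned box (half-open coords) as a boolean mask."""
    m = np.zeros(shape, dtype=bool)
    m[y0:y1, x0:x1] = True
    return m

def relative_iou(pred_box, seg_mask, optimal_box):
    """rIoU: IoU of the predicted box with the segmentation, divided by
    the IoU of the optimal box, so a perfect box-bound detection scores 1."""
    pred = box_mask(seg_mask.shape, *pred_box)
    best = box_mask(seg_mask.shape, *optimal_box)
    denom = iou(best, seg_mask)
    return iou(pred, seg_mask) / denom if denom else 0.0
```

The normalization is the point: since no box can match a non-rectangular segmentation exactly, plain IoU caps below 1 even for the best possible box, while rIoU spans the full [0, 1] range.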
Crowdsourcing in Computer Vision
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision.
Comment: A 69-page meta review of the field, Foundations and Trends in
Computer Graphics and Vision, 201