Crowdsourcing in Computer Vision
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision.
Comment: a 69-page meta review of the field. Foundations and Trends in Computer Graphics and Vision, 201
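The survey mentions strategies for intelligently selecting the most important data instances to annotate. One common such strategy (an illustrative sketch, not necessarily the survey's own method) is least-confidence uncertainty sampling: send the instances the current model is least sure about to crowd annotators first. The function name and example probabilities below are hypothetical.

```python
def least_confidence_ranking(probabilities):
    """Rank unlabeled instances by annotation priority.

    probabilities: one predicted class distribution per instance
    (a list of floats summing to 1). Instances whose top class
    probability is lowest, i.e. where the model is least confident,
    are ranked first.
    """
    # Uncertainty score: 1 minus the probability of the most likely class.
    scores = [1.0 - max(p) for p in probabilities]
    # Indices sorted by descending uncertainty.
    return sorted(range(len(scores)), key=lambda i: -scores[i])

# Hypothetical model outputs for three unlabeled images.
preds = [[0.9, 0.05, 0.05],   # confident prediction
         [0.4, 0.35, 0.25],   # very uncertain
         [0.7, 0.2, 0.1]]     # moderately confident
ranking = least_confidence_ranking(preds)  # annotate index 1 first
```

Under this scheme the annotation budget is spent where a label changes the model most, which is the intuition behind the instance-selection strategies the survey reviews.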
Changeable Context of the New Technology Artefact and the Changeable Research Outcomes
Computer Aided Drawing (vector-based) and painting (raster-based) packages allow the mock-up of designs in virtual space. While this is beneficial for visualising the end product, both methods of drawing have also been applied to new and existing machining technologies, so that some aspect of the product is derived from a computer file. Today, the applied artist has an abundance of CAD/CAM (Computer Aided Drawing / Computer Aided Manufacturing) technologies awaiting them. So much so that, in the past fifteen years, many makers have embarked upon practice-led research to find out what a particular technology can do with regard to their design interests. Within such research, the object is the manifestation of what has been discovered through the research activity.
This paper considers the relationship between the content of the research object and the context for the object's reception, examined with regard to the author's research with new technologies for the applied arts. Examples highlight how the characteristics of artefacts arising from the research can help determine who the audience for the work is, and how the technology might be used by different kinds of craft practitioners. Reference is also made to the work of other designer-makers working with and researching similar technologies. Evidence from practical examples is supported by a more theoretical discussion. The implications of supplying, or not supplying, background information for an audience in a variety of settings, both for the perceived content/context of the object and for the communication of the research, are also discussed. It is concluded that, when developing new processes, keeping the work open to a number of audiences can maximise the outcomes and increase the chances of the process being integrated within practice.
The discussion also highlights a trend of positioning the consumer/viewer at the forefront of the research, and a need to evaluate their experiences.
View-Invariant Object Category Learning, Recognition, and Search: How Spatial and Object Attention Are Coordinated Using Surface-Based Attentional Shrouds
Air Force Office of Scientific Research (F49620-01-1-0397); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
Matterport3D: Learning from RGB-D Data in Indoor Environments
Access to large, diverse RGB-D datasets is critical for training RGB-D scene
understanding algorithms. However, existing datasets still cover only a limited
number of views or a restricted scale of spaces. In this paper, we introduce
Matterport3D, a large-scale RGB-D dataset containing 10,800 panoramic views
from 194,400 RGB-D images of 90 building-scale scenes. Annotations are provided
with surface reconstructions, camera poses, and 2D and 3D semantic
segmentations. The precise global alignment and comprehensive, diverse
panoramic set of views over entire buildings enable a variety of supervised and
self-supervised computer vision tasks, including keypoint matching, view
overlap prediction, normal prediction from color, semantic segmentation, and
region classification.
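The dataset figures quoted above are internally consistent, which a quick arithmetic check makes explicit (the interpretation that each panorama is stitched from a fixed number of RGB-D images is an assumption for illustration):

```python
# Consistency check of the Matterport3D figures quoted in the abstract.
panoramas = 10_800
rgbd_images = 194_400
scenes = 90

# Assumed structure: each panorama is stitched from a fixed set of images.
images_per_panorama = rgbd_images // panoramas   # 194,400 / 10,800 = 18
panoramas_per_scene = panoramas // scenes        # 10,800 / 90 = 120
```

So the quoted counts imply 18 RGB-D images per panoramic view and, on average, 120 panoramas per building-scale scene.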
Pose Induction for Novel Object Categories
We address the task of predicting pose for objects of unannotated object
categories from a small seed set of annotated object classes. We present a
generalized classifier that can reliably induce pose given a single instance of
a novel category. In case of availability of a large collection of novel
instances, our approach then jointly reasons over all instances to improve the
initial estimates. We empirically validate the various components of our
algorithm and quantitatively show that our method produces reliable pose
estimates. We also show qualitative results on a diverse set of classes and
further demonstrate the applicability of our system for learning shape models
of novel object classes.