Advanced content-based semantic scene analysis and information retrieval: the SCHEMA project
The aim of the SCHEMA Network of Excellence is to bring together a critical mass of universities, research centers, industrial partners and end users, in order to design a reference system for content-based semantic scene analysis, interpretation and understanding. Relevant research areas include: content-based multimedia analysis and automatic annotation of semantic multimedia content, combined textual and multimedia information retrieval, the semantic web, the MPEG-7 and MPEG-21 standards, user interfaces and human factors. In this paper, recent advances in content-based analysis, indexing and retrieval of digital media within the SCHEMA Network are presented. These advances will be integrated in the SCHEMA module-based, expandable reference system.
Multiresolution hierarchy co-clustering for semantic segmentation in sequences with small variations
This paper presents a co-clustering technique that, given a collection of
images and their hierarchies, clusters nodes from these hierarchies to obtain a
coherent multiresolution representation of the image collection. We formalize
the co-clustering as a Quadratic Semi-Assignment Problem and solve it with a
linear programming relaxation approach that makes effective use of information
from hierarchies. Initially, we address the problem of generating an optimal,
coherent partition per image and, afterwards, we extend this method to a
multiresolution framework. Finally, we particularize this framework to an
iterative multiresolution video segmentation algorithm in sequences with small
variations. We evaluate the algorithm on the Video Occlusion/Object Boundary
Detection Dataset, showing that it produces state-of-the-art results in these
scenarios.
Comment: International Conference on Computer Vision (ICCV) 201
Predicting Deeper into the Future of Semantic Segmentation
The ability to predict and therefore to anticipate the future is an important
attribute of intelligence. It is also of utmost importance in real-time
systems, e.g. in robotics or autonomous driving, which depend on visual scene
understanding for decision making. While prediction of the raw RGB pixel values
in future video frames has been studied in previous work, here we introduce the
novel task of predicting semantic segmentations of future frames. Given a
sequence of video frames, our goal is to predict segmentation maps of not yet
observed video frames that lie up to a second or further in the future. We
develop an autoregressive convolutional neural network that learns to
iteratively generate multiple frames. Our results on the Cityscapes dataset
show that directly predicting future segmentations is substantially better than
predicting and then segmenting future RGB frames. Prediction results up to half
a second in the future are visually convincing and are much more accurate than
those of a baseline based on warping semantic segmentations using optical flow.
Comment: Accepted to ICCV 2017. Supplementary material available on the
authors' webpage
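The autoregressive idea in the abstract above (predict one future segmentation map, then feed that prediction back in as input to predict the next) can be illustrated with a minimal sketch. This is not the authors' network: `rollout`, `shift_right`, and the toy list-of-lists segmentation maps are hypothetical names and data chosen for illustration, and the learned convolutional model is replaced by a trivial hand-written stand-in.

```python
from collections import deque
from typing import Callable, List, Sequence

# A toy segmentation map: one integer class label per pixel.
Frame = List[List[int]]


def rollout(predict_next: Callable[[Sequence[Frame]], Frame],
            context: Sequence[Frame],
            steps: int) -> List[Frame]:
    """Autoregressively generate `steps` future maps.

    `context` must be non-empty. Each prediction is appended to a
    fixed-size sliding window, so later steps condition on earlier
    predictions rather than on observed frames.
    """
    window = deque(context, maxlen=len(context))
    outputs: List[Frame] = []
    for _ in range(steps):
        nxt = predict_next(list(window))
        outputs.append(nxt)
        window.append(nxt)  # feed the prediction back as input
    return outputs


def shift_right(frames: Sequence[Frame]) -> Frame:
    """Hypothetical stand-in for a learned one-step model: moves every
    label in the most recent map one pixel to the right and fills the
    vacated column with background (label 0)."""
    last = frames[-1]
    return [[0] + row[:-1] for row in last]


if __name__ == "__main__":
    observed = [[[1, 0, 0, 0]]]  # one observed 1x4 map
    for t, seg in enumerate(rollout(shift_right, observed, steps=2), start=1):
        print(f"t+{t}: {seg}")
```

Because each step consumes its own earlier outputs, errors can compound over the horizon, which is consistent with the abstract reporting its most convincing results for predictions up to half a second ahead.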
Crowdsourcing in Computer Vision
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision.
Comment: A 69-page meta review of the field, Foundations and Trends in
Computer Graphics and Vision, 201