971 research outputs found
Recurrent Convolutional Neural Networks for Scene Parsing
Scene parsing is a technique that consist on giving a label to all pixels in
an image according to the class they belong to. To ensure a good visual
coherence and a high class accuracy, it is essential for a scene parser to
capture image long range dependencies. In a feed-forward architecture, this can
be simply achieved by considering a sufficiently large input context patch,
around each pixel to be labeled. We propose an approach consisting of a
recurrent convolutional neural network which allows us to consider a large
input context, while limiting the capacity of the model. Contrary to most
standard approaches, our method does not rely on any segmentation methods, nor
any task-specific features. The system is trained in an end-to-end manner over
raw pixels, and models complex spatial dependencies with low inference cost. As
the context size increases with the built-in recurrence, the system identifies
and corrects its own errors. Our approach yields state-of-the-art performance
on both the Stanford Background Dataset and the SIFT Flow Dataset, while
remaining very fast at test time
Visual Chunking: A List Prediction Framework for Region-Based Object Detection
We consider detecting objects in an image by iteratively selecting from a set
of arbitrarily shaped candidate regions. Our generic approach, which we term
visual chunking, reasons about the locations of multiple object instances in an
image while expressively describing object boundaries. We design an
optimization criterion for measuring the performance of a list of such
detections as a natural extension to a common per-instance metric. We present
an efficient algorithm with provable performance for building a high-quality
list of detections from any candidate set of region-based proposals. We also
develop a simple class-specific algorithm to generate a candidate region
instance in near-linear time in the number of low-level superpixels that
outperforms other region generating methods. In order to make predictions on
novel images at testing time without access to ground truth, we develop
learning approaches to emulate these algorithms' behaviors. We demonstrate that
our new approach outperforms sophisticated baselines on benchmark datasets.Comment: to appear at ICRA 201
Finding Temporally Consistent Occlusion Boundaries in Videos using Geometric Context
We present an algorithm for finding temporally consistent occlusion
boundaries in videos to support segmentation of dynamic scenes. We learn
occlusion boundaries in a pairwise Markov random field (MRF) framework. We
first estimate the probability of an spatio-temporal edge being an occlusion
boundary by using appearance, flow, and geometric features. Next, we enforce
occlusion boundary continuity in a MRF model by learning pairwise occlusion
probabilities using a random forest. Then, we temporally smooth boundaries to
remove temporal inconsistencies in occlusion boundary estimation. Our proposed
framework provides an efficient approach for finding temporally consistent
occlusion boundaries in video by utilizing causality, redundancy in videos, and
semantic layout of the scene. We have developed a dataset with fully annotated
ground-truth occlusion boundaries of over 30 videos ($5000 frames). This
dataset is used to evaluate temporal occlusion boundaries and provides a much
needed baseline for future studies. We perform experiments to demonstrate the
role of scene layout, and temporal information for occlusion reasoning in
dynamic scenes.Comment: Applications of Computer Vision (WACV), 2015 IEEE Winter Conference
o
Sample and Filter: Nonparametric Scene Parsing via Efficient Filtering
Scene parsing has attracted a lot of attention in computer vision. While
parametric models have proven effective for this task, they cannot easily
incorporate new training data. By contrast, nonparametric approaches, which
bypass any learning phase and directly transfer the labels from the training
data to the query images, can readily exploit new labeled samples as they
become available. Unfortunately, because of the computational cost of their
label transfer procedures, state-of-the-art nonparametric methods typically
filter out most training images to only keep a few relevant ones to label the
query. As such, these methods throw away many images that still contain
valuable information and generally obtain an unbalanced set of labeled samples.
In this paper, we introduce a nonparametric approach to scene parsing that
follows a sample-and-filter strategy. More specifically, we propose to sample
labeled superpixels according to an image similarity score, which allows us to
obtain a balanced set of samples. We then formulate label transfer as an
efficient filtering procedure, which lets us exploit more labeled samples than
existing techniques. Our experiments evidence the benefits of our approach over
state-of-the-art nonparametric methods on two benchmark datasets.Comment: Please refer to the CVPR-2016 version of this manuscrip
Lifting GIS Maps into Strong Geometric Context for Scene Understanding
Contextual information can have a substantial impact on the performance of
visual tasks such as semantic segmentation, object detection, and geometric
estimation. Data stored in Geographic Information Systems (GIS) offers a rich
source of contextual information that has been largely untapped by computer
vision. We propose to leverage such information for scene understanding by
combining GIS resources with large sets of unorganized photographs using
Structure from Motion (SfM) techniques. We present a pipeline to quickly
generate strong 3D geometric priors from 2D GIS data using SfM models aligned
with minimal user input. Given an image resectioned against this model, we
generate robust predictions of depth, surface normals, and semantic labels. We
show that the precision of the predicted geometry is substantially more
accurate other single-image depth estimation methods. We then demonstrate the
utility of these contextual constraints for re-scoring pedestrian detections,
and use these GIS contextual features alongside object detection score maps to
improve a CRF-based semantic segmentation framework, boosting accuracy over
baseline models
- …