1,299 research outputs found
Efficient MRF Energy Propagation for Video Segmentation via Bilateral Filters
Segmentation of an object from a video is a challenging task in multimedia
applications. Depending on the application, automatic or interactive methods
are desired; however, regardless of the application type, efficient computation
of video object segmentation is crucial for time-critical applications;
specifically, mobile and interactive applications require near real-time
efficiencies. In this paper, we address the problem of video segmentation from
the perspective of efficiency. We initially redefine the problem of video
object segmentation as the propagation of MRF energies along the temporal
domain. For this purpose, a novel and efficient method is proposed to propagate
MRF energies throughout the frames via bilateral filters without using any
global texture, color or shape model. Recently presented bi-exponential filter
is utilized for efficiency, whereas a novel technique is also developed to
dynamically solve graph-cuts for varying, non-lattice graphs in general linear
filtering scenario. These improvements are experimented for both automatic and
interactive video segmentation scenarios. Moreover, in addition to the
efficiency, segmentation quality is also tested both quantitatively and
qualitatively. Indeed, for some challenging examples, significant time
efficiency is observed without loss of segmentation quality.Comment: Multimedia, IEEE Transactions on (Volume:16, Issue: 5, Aug. 2014
Structured learning of sum-of-submodular higher order energy functions
Submodular functions can be exactly minimized in polynomial time, and the
special case that graph cuts solve with max flow \cite{KZ:PAMI04} has had
significant impact in computer vision
\cite{BVZ:PAMI01,Kwatra:SIGGRAPH03,Rother:GrabCut04}. In this paper we address
the important class of sum-of-submodular (SoS) functions
\cite{Arora:ECCV12,Kolmogorov:DAM12}, which can be efficiently minimized via a
variant of max flow called submodular flow \cite{Edmonds:ADM77}. SoS functions
can naturally express higher order priors involving, e.g., local image patches;
however, it is difficult to fully exploit their expressive power because they
have so many parameters. Rather than trying to formulate existing higher order
priors as an SoS function, we take a discriminative learning approach,
effectively searching the space of SoS functions for a higher order prior that
performs well on our training set. We adopt a structural SVM approach
\cite{Joachims/etal/09a,Tsochantaridis/etal/04} and formulate the training
problem in terms of quadratic programming; as a result we can efficiently
search the space of SoS priors via an extended cutting-plane algorithm. We also
show how the state-of-the-art max flow method for vision problems
\cite{Goldberg:ESA11} can be modified to efficiently solve the submodular flow
problem. Experimental comparisons are made against the OpenCV implementation of
the GrabCut interactive segmentation technique \cite{Rother:GrabCut04}, which
uses hand-tuned parameters instead of machine learning. On a standard dataset
\cite{Gulshan:CVPR10} our method learns higher order priors with hundreds of
parameter values, and produces significantly better segmentations. While our
focus is on binary labeling problems, we show that our techniques can be
naturally generalized to handle more than two labels
Fast Multi-frame Stereo Scene Flow with Motion Segmentation
We propose a new multi-frame method for efficiently computing scene flow
(dense depth and optical flow) and camera ego-motion for a dynamic scene
observed from a moving stereo camera rig. Our technique also segments out
moving objects from the rigid scene. In our method, we first estimate the
disparity map and the 6-DOF camera motion using stereo matching and visual
odometry. We then identify regions inconsistent with the estimated camera
motion and compute per-pixel optical flow only at these regions. This flow
proposal is fused with the camera motion-based flow proposal using fusion moves
to obtain the final optical flow and motion segmentation. This unified
framework benefits all four tasks - stereo, optical flow, visual odometry and
motion segmentation leading to overall higher accuracy and efficiency. Our
method is currently ranked third on the KITTI 2015 scene flow benchmark.
Furthermore, our CPU implementation runs in 2-3 seconds per frame which is 1-3
orders of magnitude faster than the top six methods. We also report a thorough
evaluation on challenging Sintel sequences with fast camera and object motion,
where our method consistently outperforms OSF [Menze and Geiger, 2015], which
is currently ranked second on the KITTI benchmark.Comment: 15 pages. To appear at IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2017). Our results were submitted to KITTI 2015 Stereo
Scene Flow Benchmark in November 201
NEFI: Network Extraction From Images
Networks and network-like structures are amongst the central building blocks
of many technological and biological systems. Given a mathematical graph
representation of a network, methods from graph theory enable a precise
investigation of its properties. Software for the analysis of graphs is widely
available and has been applied to graphs describing large scale networks such
as social networks, protein-interaction networks, etc. In these applications,
graph acquisition, i.e., the extraction of a mathematical graph from a network,
is relatively simple. However, for many network-like structures, e.g. leaf
venations, slime molds and mud cracks, data collection relies on images where
graph extraction requires domain-specific solutions or even manual. Here we
introduce Network Extraction From Images, NEFI, a software tool that
automatically extracts accurate graphs from images of a wide range of networks
originating in various domains. While there is previous work on graph
extraction from images, theoretical results are fully accessible only to an
expert audience and ready-to-use implementations for non-experts are rarely
available or insufficiently documented. NEFI provides a novel platform allowing
practitioners from many disciplines to easily extract graph representations
from images by supplying flexible tools from image processing, computer vision
and graph theory bundled in a convenient package. Thus, NEFI constitutes a
scalable alternative to tedious and error-prone manual graph extraction and
special purpose tools. We anticipate NEFI to enable the collection of larger
datasets by reducing the time spent on graph extraction. The analysis of these
new datasets may open up the possibility to gain new insights into the
structure and function of various types of networks. NEFI is open source and
available http://nefi.mpi-inf.mpg.de
Pseudo Mask Augmented Object Detection
In this work, we present a novel and effective framework to facilitate object
detection with the instance-level segmentation information that is only
supervised by bounding box annotation. Starting from the joint object detection
and instance segmentation network, we propose to recursively estimate the
pseudo ground-truth object masks from the instance-level object segmentation
network training, and then enhance the detection network with top-down
segmentation feedbacks. The pseudo ground truth mask and network parameters are
optimized alternatively to mutually benefit each other. To obtain the promising
pseudo masks in each iteration, we embed a graphical inference that
incorporates the low-level image appearance consistency and the bounding box
annotations to refine the segmentation masks predicted by the segmentation
network. Our approach progressively improves the object detection performance
by incorporating the detailed pixel-wise information learned from the
weakly-supervised segmentation network. Extensive evaluation on the detection
task in PASCAL VOC 2007 and 2012 [12] verifies that the proposed approach is
effective
- …