238 research outputs found
Structured learning of sum-of-submodular higher order energy functions
Submodular functions can be exactly minimized in polynomial time, and the
special case that graph cuts solve with max flow \cite{KZ:PAMI04} has had
significant impact in computer vision
\cite{BVZ:PAMI01,Kwatra:SIGGRAPH03,Rother:GrabCut04}. In this paper we address
the important class of sum-of-submodular (SoS) functions
\cite{Arora:ECCV12,Kolmogorov:DAM12}, which can be efficiently minimized via a
variant of max flow called submodular flow \cite{Edmonds:ADM77}. SoS functions
can naturally express higher order priors involving, e.g., local image patches;
however, it is difficult to fully exploit their expressive power because they
have so many parameters. Rather than trying to formulate existing higher order
priors as an SoS function, we take a discriminative learning approach,
effectively searching the space of SoS functions for a higher order prior that
performs well on our training set. We adopt a structural SVM approach
\cite{Joachims/etal/09a,Tsochantaridis/etal/04} and formulate the training
problem in terms of quadratic programming; as a result we can efficiently
search the space of SoS priors via an extended cutting-plane algorithm. We also
show how the state-of-the-art max flow method for vision problems
\cite{Goldberg:ESA11} can be modified to efficiently solve the submodular flow
problem. Experimental comparisons are made against the OpenCV implementation of
the GrabCut interactive segmentation technique \cite{Rother:GrabCut04}, which
uses hand-tuned parameters instead of machine learning. On a standard dataset
\cite{Gulshan:CVPR10} our method learns higher order priors with hundreds of
parameter values, and produces significantly better segmentations. While our
focus is on binary labeling problems, we show that our techniques can be
naturally generalized to handle more than two labels
Learning sparse representations of depth
This paper introduces a new method for learning and inferring sparse
representations of depth (disparity) maps. The proposed algorithm relaxes the
usual assumption of the stationary noise model in sparse coding. This enables
learning from data corrupted with spatially varying noise or uncertainty,
typically obtained by laser range scanners or structured light depth cameras.
Sparse representations are learned from the Middlebury database disparity maps
and then exploited in a two-layer graphical model for inferring depth from
stereo, by including a sparsity prior on the learned features. Since they
capture higher-order dependencies in the depth structure, these priors can
complement smoothness priors commonly used in depth inference based on Markov
Random Field (MRF) models. Inference on the proposed graph is achieved using an
alternating iterative optimization technique, where the first layer is solved
using an existing MRF-based stereo matching algorithm, then held fixed as the
second layer is solved using the proposed non-stationary sparse coding
algorithm. This leads to a general method for improving solutions of state of
the art MRF-based depth estimation algorithms. Our experimental results first
show that depth inference using learned representations leads to state of the
art denoising of depth maps obtained from laser range scanners and a time of
flight camera. Furthermore, we show that adding sparse priors improves the
results of two depth estimation methods: the classical graph cut algorithm by
Boykov et al. and the more recent algorithm of Woodford et al.Comment: 12 page
Parsimonious Labeling
We propose a new family of discrete energy minimization problems, which we
call parsimonious labeling. Specifically, our energy functional consists of
unary potentials and high-order clique potentials. While the unary potentials
are arbitrary, the clique potentials are proportional to the {\em diversity} of
set of the unique labels assigned to the clique. Intuitively, our energy
functional encourages the labeling to be parsimonious, that is, use as few
labels as possible. This in turn allows us to capture useful cues for important
computer vision applications such as stereo correspondence and image denoising.
Furthermore, we propose an efficient graph-cuts based algorithm for the
parsimonious labeling problem that provides strong theoretical guarantees on
the quality of the solution. Our algorithm consists of three steps. First, we
approximate a given diversity using a mixture of a novel hierarchical
Potts model. Second, we use a divide-and-conquer approach for each mixture
component, where each subproblem is solved using an effficient
-expansion algorithm. This provides us with a small number of putative
labelings, one for each mixture component. Third, we choose the best putative
labeling in terms of the energy value. Using both sythetic and standard real
datasets, we show that our algorithm significantly outperforms other graph-cuts
based approaches
Filter-Based Probabilistic Markov Random Field Image Priors: Learning, Evaluation, and Image Analysis
Markov random fields (MRF) based on linear filter responses are one of the most popular forms for modeling image priors due to their rigorous probabilistic interpretations and versatility in various applications. In this dissertation, we propose an application-independent method to quantitatively evaluate MRF image priors using model samples. To this end, we developed an efficient auxiliary-variable Gibbs samplers for a general class of MRFs with flexible potentials. We found that the popular pairwise and high-order MRF priors capture image statistics quite roughly and exhibit poor generative properties. We further developed new learning strategies and obtained high-order MRFs that well capture the statistics of the inbuilt features, thus being real maximum-entropy models, and other important statistical properties of natural images, outlining the capabilities of MRFs. We suggest a multi-modal extension of MRF potentials which not only allows to train more expressive priors, but also helps to reveal more insights of MRF variants, based on which we are able to train compact, fully-convolutional restricted Boltzmann machines (RBM) that can model visual repetitive textures even better than more complex and deep models.
The learned high-order MRFs allow us to develop new methods for various real-world image analysis problems. For denoising of natural images and deconvolution of microscopy images, the MRF priors are employed in a pure generative setting. We propose efficient sampling-based methods to infer Bayesian minimum mean squared error (MMSE) estimates, which substantially outperform maximum a-posteriori (MAP) estimates and can compete with state-of-the-art discriminative methods. For non-rigid registration of live cell nuclei in time-lapse microscopy images, we propose a global optical flow-based method. The statistics of noise in fluorescence microscopy images are studied to derive an adaptive weighting scheme for increasing model robustness. High-order MRFs are also employed to train image filters for extracting important features of cell nuclei and the deformation of nuclei are then estimated in the learned feature spaces. The developed method outperforms previous approaches in terms of both registration accuracy and computational efficiency
Structured learning of sum-of-submodular higher order energy functions
Submodular functions can be exactly minimized in polynomial time, and the special case that graph cuts solve with max flow [18] has had significant impact in computer vision [5, 20, 27]. In this paper we address the important class of sum-of-submodular (SoS) functions [2, 17], which can be efficiently minimized via a variant of max flow called submodular flow [6]. SoS functions can naturally express higher order priors involving, e.g., local image patches; however, it is difficult to fully exploit their expressive power because they have so many parameters. Rather than trying to formulate existing higher order priors as an SoS function, we take a discriminative learning approach, effectively searching the space of SoS functions for a higher order prior that performs well on our training set. We adopt a structural SVM approach [14, 33] and formulate the training problem in terms of quadratic programming; as a result we can efficiently search the space of SoS priors via an extended cutting-plane algorithm. We also show how the state-of-the-art max flow method for vision problems [10] can be modified to efficiently solve the submodular flow problem. Experimental comparisons are made against the OpenCV implementation of the GrabCut interactive segmentation technique [27], which uses hand-tuned parameters instead of machine learning. On a standard dataset [11] our method learns higher order priors with hundreds of parameter values, and produces significantly better segmentations. While our focus is on binary labeling problems, we show that our techniques can be naturally generalized to handle more than two labels. 1
Online Mutual Foreground Segmentation for Multispectral Stereo Videos
The segmentation of video sequences into foreground and background regions is
a low-level process commonly used in video content analysis and smart
surveillance applications. Using a multispectral camera setup can improve this
process by providing more diverse data to help identify objects despite adverse
imaging conditions. The registration of several data sources is however not
trivial if the appearance of objects produced by each sensor differs
substantially. This problem is further complicated when parallax effects cannot
be ignored when using close-range stereo pairs. In this work, we present a new
method to simultaneously tackle multispectral segmentation and stereo
registration. Using an iterative procedure, we estimate the labeling result for
one problem using the provisional result of the other. Our approach is based on
the alternating minimization of two energy functions that are linked through
the use of dynamic priors. We rely on the integration of shape and appearance
cues to find proper multispectral correspondences, and to properly segment
objects in low contrast regions. We also formulate our model as a frame
processing pipeline using higher order terms to improve the temporal coherence
of our results. Our method is evaluated under different configurations on
multiple multispectral datasets, and our implementation is available online.Comment: Preprint accepted for publication in IJCV (December 2018
Rich probabilistic models for semantic labeling
Das Ziel dieser Monographie ist es die Methoden und Anwendungen des semantischen Labelings zu erforschen. Unsere Beiträge zu diesem sich rasch entwickelten Thema sind bestimmte Aspekte der Modellierung und der Inferenz in probabilistischen Modellen und ihre Anwendungen in den interdisziplinären Bereichen der Computer Vision sowie medizinischer Bildverarbeitung und Fernerkundung
- …