277 research outputs found
Discriminative Training of Deep Fully-connected Continuous CRF with Task-specific Loss
Recent works on deep conditional random fields (CRF) have set new records on
many vision tasks involving structured predictions. Here we propose a
fully-connected deep continuous CRF model for both discrete and continuous
labelling problems. We exemplify the usefulness of the proposed model on
multi-class semantic labelling (discrete) and the robust depth estimation
(continuous) problems.
In our framework, we model both the unary and the pairwise potential
functions as deep convolutional neural networks (CNN), which are jointly
learned in an end-to-end fashion. The proposed method possesses the main
advantage of continuously-valued CRF, which is a closed-form solution for the
Maximum a posteriori (MAP) inference.
To better adapt to different tasks, instead of using the commonly employed
maximum likelihood CRF parameter learning protocol, we propose task-specific
loss functions for learning the CRF parameters.
It enables direct optimization of the quality of the MAP estimates during the
course of learning.
Specifically, we optimize the multi-class classification loss for the
semantic labelling task and the Turkey's biweight loss for the robust depth
estimation problem.
Experimental results on the semantic labelling and robust depth estimation
tasks demonstrate that the proposed method compare favorably against both
baseline and state-of-the-art methods.
In particular, we show that although the proposed deep CRF model is
continuously valued, with the equipment of task-specific loss, it achieves
impressive results even on discrete labelling tasks
Superpixel Convolutional Networks using Bilateral Inceptions
In this paper we propose a CNN architecture for semantic image segmentation.
We introduce a new 'bilateral inception' module that can be inserted in
existing CNN architectures and performs bilateral filtering, at multiple
feature-scales, between superpixels in an image. The feature spaces for
bilateral filtering and other parameters of the module are learned end-to-end
using standard backpropagation techniques. The bilateral inception module
addresses two issues that arise with general CNN segmentation architectures.
First, this module propagates information between (super) pixels while
respecting image edges, thus using the structured information of the problem
for improved results. Second, the layer recovers a full resolution segmentation
result from the lower resolution solution of a CNN. In the experiments, we
modify several existing CNN architectures by inserting our inception module
between the last CNN (1x1 convolution) layers. Empirical results on three
different datasets show reliable improvements not only in comparison to the
baseline networks, but also in comparison to several dense-pixel prediction
techniques such as CRFs, while being competitive in time.Comment: European Conference on Computer Vision (ECCV), 201
Incorporating Boltzmann Machine Priors for Semantic Labeling in Images and Videos
Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field (CRF), which is well-suited to modeling local interactions among adjacent image regions. However the CRF is limited in dealing with complex, global (long-range) interactions between regions in an image, and between frames in a video. This thesis presents approaches to modeling long-range interactions within images and videos, for use in semantic labeling.
In order to model these long-range interactions, we incorporate priors based on the restricted Boltzmann machine (RBM). The RBM is a generative model which has demonstrated the ability to learn the shape of an object and the CRBM is a temporal extension which can learn the motion of an object. Although the CRF is a good baseline labeler, we show how the RBM and CRBM can be added to the architecture to model both the global object shape within an image and the temporal dependencies of the object from previous frames in a video. We demonstrate the labeling performance of our models for the parts of complex face images from the Labeled Faces in the Wild database (for images) and the YouTube Faces Database (for videos). Our hybrid models produce results that are both quantitatively and qualitatively better than the baseline CRF alone for both images and videos
Integrated Inference and Learning of Neural Factors in Structural Support Vector Machines
Tackling pattern recognition problems in areas such as computer vision,
bioinformatics, speech or text recognition is often done best by taking into
account task-specific statistical relations between output variables. In
structured prediction, this internal structure is used to predict multiple
outputs simultaneously, leading to more accurate and coherent predictions.
Structural support vector machines (SSVMs) are nonprobabilistic models that
optimize a joint input-output function through margin-based learning. Because
SSVMs generally disregard the interplay between unary and interaction factors
during the training phase, final parameters are suboptimal. Moreover, its
factors are often restricted to linear combinations of input features, limiting
its generalization power. To improve prediction accuracy, this paper proposes:
(i) Joint inference and learning by integration of back-propagation and
loss-augmented inference in SSVM subgradient descent; (ii) Extending SSVM
factors to neural networks that form highly nonlinear functions of input
features. Image segmentation benchmark results demonstrate improvements over
conventional SSVM training methods in terms of accuracy, highlighting the
feasibility of end-to-end SSVM training with neural factors
Rich probabilistic models for semantic labeling
Das Ziel dieser Monographie ist es die Methoden und Anwendungen des semantischen Labelings zu erforschen. Unsere Beiträge zu diesem sich rasch entwickelten Thema sind bestimmte Aspekte der Modellierung und der Inferenz in probabilistischen Modellen und ihre Anwendungen in den interdisziplinären Bereichen der Computer Vision sowie medizinischer Bildverarbeitung und Fernerkundung
- …