24,004 research outputs found
Multi-object segmentation using coupled nonparametric shape and relative pose priors
We present a new method for multi-object segmentation in a maximum a posteriori estimation framework. Our method is motivated by the observation that neighboring or coupling objects in images generate configurations and co-dependencies which could potentially aid in segmentation if properly exploited. Our approach employs coupled shape and inter-shape pose priors that are computed using training images in a nonparametric multi-variate kernel density estimation framework. The coupled shape prior is obtained by estimating the joint shape distribution of multiple objects and the inter-shape pose priors are modeled via standard moments. Based on such statistical models, we formulate an optimization problem for segmentation, which we solve by an algorithm based on active contours. Our technique provides significant improvements in the segmentation of weakly contrasted objects in a number of applications. In particular for medical image analysis, we use our method to extract brain Basal Ganglia structures, which are members of a complex multi-object system posing a challenging segmentation problem. We also apply our technique to the problem of handwritten character segmentation. Finally, we use our method to segment cars in urban scenes
Recurrent Pixel Embedding for Instance Grouping
We introduce a differentiable, end-to-end trainable framework for solving
pixel-level grouping problems such as instance segmentation consisting of two
novel components. First, we regress pixels into a hyper-spherical embedding
space so that pixels from the same group have high cosine similarity while
those from different groups have similarity below a specified margin. We
analyze the choice of embedding dimension and margin, relating them to
theoretical results on the problem of distributing points uniformly on the
sphere. Second, to group instances, we utilize a variant of mean-shift
clustering, implemented as a recurrent neural network parameterized by kernel
bandwidth. This recurrent grouping module is differentiable, enjoys convergent
dynamics and probabilistic interpretability. Backpropagating the group-weighted
loss through this module allows learning to focus on only correcting embedding
errors that won't be resolved during subsequent clustering. Our framework,
while conceptually simple and theoretically abundant, is also practically
effective and computationally efficient. We demonstrate substantial
improvements over state-of-the-art instance segmentation for object proposal
generation, as well as demonstrating the benefits of grouping loss for
classification tasks such as boundary detection and semantic segmentation
What Can Help Pedestrian Detection?
Aggregating extra features has been considered as an effective approach to
boost traditional pedestrian detection methods. However, there is still a lack
of studies on whether and how CNN-based pedestrian detectors can benefit from
these extra features. The first contribution of this paper is exploring this
issue by aggregating extra features into CNN-based pedestrian detection
framework. Through extensive experiments, we evaluate the effects of different
kinds of extra features quantitatively. Moreover, we propose a novel network
architecture, namely HyperLearner, to jointly learn pedestrian detection as
well as the given extra feature. By multi-task training, HyperLearner is able
to utilize the information of given features and improve detection performance
without extra inputs in inference. The experimental results on multiple
pedestrian benchmarks validate the effectiveness of the proposed HyperLearner.Comment: Accepted to IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR) 201
Task Decomposition and Synchronization for Semantic Biomedical Image Segmentation
Semantic segmentation is essentially important to biomedical image analysis.
Many recent works mainly focus on integrating the Fully Convolutional Network
(FCN) architecture with sophisticated convolution implementation and deep
supervision. In this paper, we propose to decompose the single segmentation
task into three subsequent sub-tasks, including (1) pixel-wise image
segmentation, (2) prediction of the class labels of the objects within the
image, and (3) classification of the scene the image belonging to. While these
three sub-tasks are trained to optimize their individual loss functions of
different perceptual levels, we propose to let them interact by the task-task
context ensemble. Moreover, we propose a novel sync-regularization to penalize
the deviation between the outputs of the pixel-wise segmentation and the class
prediction tasks. These effective regularizations help FCN utilize context
information comprehensively and attain accurate semantic segmentation, even
though the number of the images for training may be limited in many biomedical
applications. We have successfully applied our framework to three diverse 2D/3D
medical image datasets, including Robotic Scene Segmentation Challenge 18
(ROBOT18), Brain Tumor Segmentation Challenge 18 (BRATS18), and Retinal Fundus
Glaucoma Challenge (REFUGE18). We have achieved top-tier performance in all
three challenges.Comment: IEEE Transactions on Medical Imagin
Multiresolution hierarchy co-clustering for semantic segmentation in sequences with small variations
This paper presents a co-clustering technique that, given a collection of
images and their hierarchies, clusters nodes from these hierarchies to obtain a
coherent multiresolution representation of the image collection. We formalize
the co-clustering as a Quadratic Semi-Assignment Problem and solve it with a
linear programming relaxation approach that makes effective use of information
from hierarchies. Initially, we address the problem of generating an optimal,
coherent partition per image and, afterwards, we extend this method to a
multiresolution framework. Finally, we particularize this framework to an
iterative multiresolution video segmentation algorithm in sequences with small
variations. We evaluate the algorithm on the Video Occlusion/Object Boundary
Detection Dataset, showing that it produces state-of-the-art results in these
scenarios.Comment: International Conference on Computer Vision (ICCV) 201
Cross Pixel Optical Flow Similarity for Self-Supervised Learning
We propose a novel method for learning convolutional neural image
representations without manual supervision. We use motion cues in the form of
optical flow, to supervise representations of static images. The obvious
approach of training a network to predict flow from a single image can be
needlessly difficult due to intrinsic ambiguities in this prediction task. We
instead propose a much simpler learning goal: embed pixels such that the
similarity between their embeddings matches that between their optical flow
vectors. At test time, the learned deep network can be used without access to
video or flow information and transferred to tasks such as image
classification, detection, and segmentation. Our method, which significantly
simplifies previous attempts at using motion for self-supervision, achieves
state-of-the-art results in self-supervision using motion cues, competitive
results for self-supervision in general, and is overall state of the art in
self-supervised pretraining for semantic image segmentation, as demonstrated on
standard benchmarks
Reconstructive Sparse Code Transfer for Contour Detection and Semantic Labeling
We frame the task of predicting a semantic labeling as a sparse
reconstruction procedure that applies a target-specific learned transfer
function to a generic deep sparse code representation of an image. This
strategy partitions training into two distinct stages. First, in an
unsupervised manner, we learn a set of generic dictionaries optimized for
sparse coding of image patches. We train a multilayer representation via
recursive sparse dictionary learning on pooled codes output by earlier layers.
Second, we encode all training images with the generic dictionaries and learn a
transfer function that optimizes reconstruction of patches extracted from
annotated ground-truth given the sparse codes of their corresponding image
patches. At test time, we encode a novel image using the generic dictionaries
and then reconstruct using the transfer function. The output reconstruction is
a semantic labeling of the test image.
Applying this strategy to the task of contour detection, we demonstrate
performance competitive with state-of-the-art systems. Unlike almost all prior
work, our approach obviates the need for any form of hand-designed features or
filters. To illustrate general applicability, we also show initial results on
semantic part labeling of human faces.
The effectiveness of our approach opens new avenues for research on deep
sparse representations. Our classifiers utilize this representation in a novel
manner. Rather than acting on nodes in the deepest layer, they attach to nodes
along a slice through multiple layers of the network in order to make
predictions about local patches. Our flexible combination of a generatively
learned sparse representation with discriminatively trained transfer
classifiers extends the notion of sparse reconstruction to encompass arbitrary
semantic labeling tasks.Comment: to appear in Asian Conference on Computer Vision (ACCV), 201
- …