Error Correction for Dense Semantic Image Labeling
Pixelwise semantic image labeling is an important, yet challenging, task with
many applications. Typical approaches to tackle this problem involve either the
training of deep networks on vast amounts of images to directly infer the
labels or the use of probabilistic graphical models to jointly model the
dependencies of the input (i.e. images) and output (i.e. labels). Yet, the
former approaches do not capture the structure of the output labels, which is
crucial for the performance of dense labeling, and the latter rely on carefully
hand-designed priors that require costly parameter tuning via optimization
techniques, which in turn leads to long inference times. To alleviate these
restrictions, we explore how to arrive at dense semantic pixel labels given
both the input image and an initial estimate of the output labels. We propose a
parallel architecture that: 1) exploits the context information through a
LabelPropagation network to propagate correct labels from nearby pixels to
improve the object boundaries, 2) uses a LabelReplacement network to directly
replace possibly erroneous, initial labels with new ones, and 3) combines the
different intermediate results via a Fusion network to obtain the final
per-pixel label. We experimentally validate our approach on two different
datasets for the semantic segmentation and face parsing tasks respectively,
where we show improvements over the state-of-the-art. We also provide both a
quantitative and qualitative analysis of the generated results.
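The error-correction idea above can be illustrated with a toy sketch in plain Python. The function name and the majority-vote rule are illustrative assumptions, not the paper's actual LabelPropagation/LabelReplacement/Fusion networks; it only shows how context from neighbouring pixels can overwrite a low-confidence initial label:

```python
from collections import Counter

def propagate_labels(labels, confidence, threshold=0.5):
    """Toy label-propagation pass: each low-confidence pixel adopts the
    majority label of its 4-connected neighbours in the initial estimate."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for i in range(h):
        for j in range(w):
            if confidence[i][j] >= threshold:
                continue  # trust the initial label
            neigh = [labels[x][y]
                     for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= x < h and 0 <= y < w]
            if neigh:
                out[i][j] = Counter(neigh).most_common(1)[0][0]
    return out
```

On a 3x3 grid whose centre pixel is mislabeled with low confidence, the pass replaces the centre label with the surrounding consensus while leaving confident pixels untouched.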
Parameter Efficient Local Implicit Image Function Network for Face Segmentation
Face parsing is defined as the per-pixel labeling of images containing human
faces. The labels are defined to identify key facial regions like eyes, lips,
nose, hair, etc. In this work, we make use of the structural consistency of the
human face to propose a lightweight face-parsing method using a Local Implicit
Function network, FP-LIIF. We propose a simple architecture having a
convolutional encoder and a pixel MLP decoder that uses 1/26th the number of
parameters of state-of-the-art models, yet matches or
outperforms them on multiple datasets such as CelebAMask-HQ
and LaPa. We do not use any pretraining, and compared to other works, our
network can also generate segmentation at different resolutions without any
changes in the input resolution. This work enables the use of facial
segmentation on low-compute or low-bandwidth devices because of its higher FPS
and smaller model size.
Comment: Accepted at CVPR 202
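The resolution-independence claim rests on the local implicit function view: the decoder is queried at continuous coordinates, so output size is decoupled from input size. A minimal sketch in plain Python (hypothetical names; bilinear feature lookup standing in for the actual LIIF machinery):

```python
def query_feature(grid, x, y):
    """Bilinearly interpolate a 2D scalar feature grid at continuous (x, y),
    with coordinates in grid units (0..W-1, 0..H-1)."""
    h, w = len(grid), len(grid[0])
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bot = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def decode_at_resolution(grid, out_h, out_w, classify):
    """Render a segmentation at any output resolution by querying the
    continuous field per output pixel, LIIF-style."""
    h, w = len(grid), len(grid[0])
    return [[classify(query_feature(grid,
                                    j * (w - 1) / max(out_w - 1, 1),
                                    i * (h - 1) / max(out_h - 1, 1)))
             for j in range(out_w)] for i in range(out_h)]
```

The same encoded grid can be decoded at 2x3, 64x64, or any other size without re-running the encoder, which is the property the abstract highlights.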
Bottom-Up and Top-Down Reasoning with Hierarchical Rectified Gaussians
Convolutional neural nets (CNNs) have demonstrated remarkable performance in
recent history. Such approaches tend to work in a unidirectional bottom-up
feed-forward fashion. However, practical experience and biological evidence
tell us that feedback plays a crucial role, particularly for detailed spatial
understanding tasks. This work explores bidirectional architectures that also
reason with top-down feedback: neural units are influenced by both lower and
higher-level units.
We do so by treating units as rectified latent variables in a quadratic
energy function, which can be seen as a hierarchical Rectified Gaussian (RG)
model. We show that RGs can be optimized with a quadratic program (QP), which
can in turn be solved with a recurrent neural network (with rectified linear
units). This allows RGs to be trained with GPU-optimized gradient descent. From
a theoretical perspective, RGs help establish a connection between CNNs and
hierarchical probabilistic models. From a practical perspective, RGs are well
suited for detailed spatial tasks that can benefit from top-down reasoning. We
illustrate them on the challenging task of keypoint localization under
occlusions, where local bottom-up evidence may be misleading. We demonstrate
state-of-the-art results on challenging benchmarks.
Comment: To appear in CVPR 201
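The QP-with-rectified-units idea can be sketched minimally, assuming the standard non-negatively constrained quadratic form. The update below is plain projected gradient descent with a ReLU, mirroring the recurrent-network view described in the abstract rather than the authors' actual model:

```python
def relu_qp(A, b, eta=0.1, steps=500):
    """Minimize 0.5 * x^T A x - b^T x subject to x >= 0 by iterating
    x <- relu(x - eta * (A x - b)): each step is a gradient move followed
    by a rectification, i.e. one pass of a recurrent net with ReLU units."""
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i]
                for i in range(n)]
        x = [max(0.0, x[i] - eta * grad[i]) for i in range(n)]
    return x
```

For a diagonal `A`, a coordinate with positive optimum converges to it, while one whose unconstrained optimum is negative is clamped to zero by the rectification, which is exactly the role of the rectified latent variables.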
Deep Learning for Semantic Part Segmentation with High-Level Guidance
In this work we address the task of segmenting an object into its parts, or
semantic part segmentation. We start by adapting a state-of-the-art semantic
segmentation system to this task, and show that a combination of a
fully-convolutional Deep CNN system coupled with Dense CRF labelling provides
excellent results for a broad range of object categories. Still, this approach
remains agnostic to high-level constraints between object parts. We introduce
such prior information by means of a Restricted Boltzmann Machine adapted to
our task, and train our model in a discriminative fashion, as a hidden CRF,
demonstrating that prior information can yield additional improvements. We also
investigate the performance of our approach ``in the wild'', without
information concerning the objects' bounding boxes, using an object detector to
guide a multi-scale segmentation scheme. We evaluate the performance of our
approach on the Penn-Fudan and LFW datasets for the tasks of pedestrian parsing
and face labelling respectively. We show superior performance with respect to
competitive methods that have been extensively engineered on these benchmarks,
as well as realistic qualitative results on part segmentation, even for
occluded or deformable objects. We also provide quantitative and extensive
qualitative results on three classes from the PASCAL Parts dataset. Finally, we
show that our multi-scale segmentation scheme can boost accuracy, recovering
segmentations for finer parts.
Comment: 11 pages (including references), 3 figures, 2 tables
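The coupling of per-pixel CNN scores with a CRF smoothness prior can be sketched as one mean-field-style update on a toy chain. This plain-Python stand-in is a simplification for illustration, not the Dense CRF inference or the hidden-CRF training used by the authors:

```python
import math

def mean_field_step(unary, weight=1.0):
    """One mean-field-style update on a chain CRF with Potts-like smoothness:
    each node's class scores are its unary (CNN) scores plus a bonus for
    agreeing with its neighbours' current soft labels."""
    def softmax(scores):
        m = max(scores)
        e = [math.exp(s - m) for s in scores]
        z = sum(e)
        return [v / z for v in e]

    q = [softmax(u) for u in unary]  # current soft label distributions
    out = []
    for i, u in enumerate(unary):
        neigh = [q[j] for j in (i - 1, i + 1) if 0 <= j < len(unary)]
        scores = [u[c] + weight * sum(n[c] for n in neigh)
                  for c in range(len(u))]
        out.append(softmax(scores))
    return out
```

With weak, slightly wrong unary evidence at the middle node and confident neighbours, a single update flips the middle node to the neighbours' label, which is the smoothing effect the CRF contributes on top of the CNN.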
DEEP FULLY RESIDUAL CONVOLUTIONAL NEURAL NETWORK FOR SEMANTIC IMAGE SEGMENTATION
Department of Computer Science and Engineering
The goal of semantic image segmentation is to partition the pixels of an image into semantically meaningful parts and to classify those parts according to a predefined label set. Although object recognition models have recently achieved remarkable performance, even surpassing humans' ability to recognize objects, semantic segmentation models still lag behind. One reason semantic segmentation is a relatively hard problem is that it requires image understanding at the pixel level while accounting for global context, as opposed to object recognition. Another challenge is transferring the knowledge of an object recognition model to the task of semantic segmentation. In this thesis, we delineate some of the main challenges we faced in approaching semantic image segmentation with machine learning algorithms.
Our main focus was on how to use deep learning algorithms for this task, since they require the least amount of feature engineering and have been shown to scale to large datasets while exhibiting remarkable performance. More precisely, we worked on a variation of convolutional
neural networks (CNN) suitable for the semantic segmentation task. We proposed a model called deep
fully residual convolutional networks (DFRCN) to tackle this problem. Utilizing residual learning makes training deep models feasible, which ultimately yields a rich, powerful visual representation. Our model also benefits from skip connections, which ease the propagation of information from the encoder module to the decoder module. This enables our model to have fewer parameters in the decoder module while also achieving better performance. We also benchmarked an effective variation of the proposed model on a semantic segmentation benchmark.
We first make a thorough review of current high-performance models and of the problems one might face when trying to replicate them, which mainly arise from a lack of sufficient published detail. Then, we describe our own novel method, which we call the deep fully residual convolutional network (DFRCN). We show that our method exhibits state-of-the-art performance on a challenging benchmark for aerial image segmentation.
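The two structural ingredients of the abstract, residual learning and encoder-to-decoder skip connections, reduce to simple operations. A minimal illustrative sketch in plain Python (hypothetical names, not the DFRCN implementation):

```python
def residual_block(x, f):
    """Residual unit: output = x + f(x). When f learns the zero function the
    block is an identity, which is what makes very deep stacks trainable."""
    fx = f(x)
    return [a + b for a, b in zip(x, fx)]

def skip_concat(decoder_feat, encoder_feat):
    """Encoder-to-decoder skip connection: concatenate the matching encoder
    activation onto the decoder feature, so fine spatial detail reaches the
    decoder and the decoder itself can stay small."""
    return decoder_feat + encoder_feat  # list concatenation = channel concat
```

Because the decoder receives encoder detail through the skip path rather than having to regenerate it, it can be built with fewer parameters, which is the trade-off the thesis highlights.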