Graph-FCN for image semantic segmentation
Semantic segmentation with deep learning has made great progress in
classifying the pixels of an image. However, high-level feature extraction in
deep networks usually discards local location information, which is important
for image semantic segmentation. To address this problem, we propose a graph
model initialized by a fully convolutional network (FCN), named Graph-FCN, for
image semantic segmentation. First, a convolutional network extends the image
grid data to graph-structured data, which transforms the semantic segmentation
problem into a graph node classification problem. We then apply a graph
convolutional network to solve this node classification problem. To the best
of our knowledge, this is the first application of graph convolutional
networks to image semantic segmentation. Our method achieves competitive mean
intersection over union (mIOU) on the VOC dataset (about a 1.34% improvement
over the original FCN model).
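The grid-to-graph conversion described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the 4-neighbour connectivity, the self-loops, and the single symmetrically-normalised graph-convolution layer are all simplifying assumptions, used only to show how per-pixel features become per-node classification.

```python
import numpy as np

def grid_to_graph(h, w):
    """Adjacency for an h x w feature grid: each node is one spatial
    location, connected to its 4-neighbours, plus a self-loop."""
    n = h * w
    A = np.eye(n)
    for i in range(h):
        for j in range(w):
            u = i * w + j
            for di, dj in ((1, 0), (0, 1)):
                ni, nj = i + di, j + dj
                if ni < h and nj < w:
                    v = ni * w + nj
                    A[u, v] = A[v, u] = 1.0
    return A

def gcn_layer(A, H, W):
    """One graph-convolution step with symmetric normalisation:
    H' = relu(D^-1/2 A D^-1/2 H W)."""
    d = A.sum(1)
    A_hat = A / np.sqrt(np.outer(d, d))
    return np.maximum(A_hat @ H @ W, 0.0)

rng = np.random.default_rng(0)
h, w, c, n_classes = 4, 4, 8, 21          # tiny toy grid, VOC-sized label set
feat = rng.standard_normal((h * w, c))    # FCN features, one row per node
A = grid_to_graph(h, w)
logits = gcn_layer(A, feat, rng.standard_normal((c, n_classes)))
pred = logits.argmax(1).reshape(h, w)     # per-node class = per-pixel label
print(pred.shape)                         # (4, 4)
```

Classifying graph nodes and reshaping the result back to the grid is what lets a node classifier act as a segmenter.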
Deep Interactive Object Selection
Interactive object selection is an important research problem with many
applications. Previous algorithms require substantial user interaction to
estimate the foreground and background distributions. In this paper, we
present a novel deep-learning-based algorithm that has a much better
understanding of objectness and can therefore reduce user interaction to just
a few clicks. Our algorithm transforms user-provided positive and negative
clicks into two Euclidean distance maps, which are then concatenated with the
RGB channels of the image to form (image, user interactions) pairs. We
generate many such pairs by combining several random sampling strategies to
model user click patterns, and use them to fine-tune deep fully convolutional
networks (FCNs). Finally, the output probability maps of our FCN-8s model are
integrated with graph cut optimization to refine the boundary segments. Our
model is trained on the PASCAL segmentation dataset and evaluated on other
datasets with different object classes. Experimental results on both seen and
unseen objects clearly demonstrate that our algorithm generalizes well and is
superior to all existing interactive object selection approaches.
Comment: Computer Vision and Pattern Recognition
High-Quality Correspondence and Segmentation Estimation for Dual-Lens Smart-Phone Portraits
Estimating the correspondence between two images and extracting the foreground
object are two challenges in computer vision. With dual-lens smartphones such
as the iPhone 7 Plus and Huawei P9 coming onto the market, two images of
slightly different views provide new information for unifying the two topics.
We propose a joint method that tackles them simultaneously via a joint fully
connected conditional random field (CRF) framework. The regional
correspondence is used to handle textureless regions in matching and to make
our CRF system computationally efficient. Our method is evaluated on over
2,000 new image pairs and produces promising results on challenging portrait
images.
Automatic Real-time Background Cut for Portrait Videos
In this paper, we solve the problem of high-quality automatic real-time
background cut for 720p portrait videos. We first handle the background
ambiguity issue in semantic segmentation by proposing a global background
attenuation model. A spatial-temporal refinement network is developed to
further refine the segmentation errors in each frame and to ensure temporal
coherence in the segmentation map. We form an end-to-end network for training
and testing, with each module designed for both efficiency and accuracy. We
build a portrait dataset of 8,000 images with high-quality labeled maps for
training and testing. To further improve performance, we build a portrait
video dataset with 50 sequences to fine-tune video segmentation. Our framework
benefits many video processing applications.
Joint Multi-Person Pose Estimation and Semantic Part Segmentation
Human pose estimation and semantic part segmentation are two complementary
tasks in computer vision. In this paper, we propose to solve the two tasks
jointly for natural multi-person images: the estimated pose provides an
object-level shape prior to regularize the part segments, while the part-level
segments constrain the variation of pose locations. Specifically, we first
train two fully convolutional networks (FCNs), namely Pose FCN and Part FCN,
to provide initial estimates of the pose joint potential and the semantic part
potential. Then, to refine the pose joint locations, the two types of
potentials are fused with a fully-connected conditional random field (FCRF),
in which a novel segment-joint smoothness term encourages semantic and spatial
consistency between parts and joints. To refine the part segments, the refined
pose and the original part potential are integrated through a second Part FCN,
where the skeleton feature derived from the pose serves as an additional
regularization cue for the part segments. Finally, to reduce the complexity of
the FCRF, we introduce human detection boxes and run inference on the graph
inside each box, making inference forty times faster.
Since no existing dataset contains both part segments and pose labels, we
extend the PASCAL VOC part dataset with human pose joints and perform
extensive experiments comparing our method against several recent strategies.
On this dataset, our algorithm surpasses competing methods by a large margin
in both tasks.
Comment: This paper has been accepted by CVPR 201
Keypoint Based Weakly Supervised Human Parsing
Fully convolutional networks (FCNs) have achieved great success in human
parsing in recent years. Conventional human parsing tasks require pixel-level
labels to guide training, which usually involves enormous human labeling
effort. To ease this effort, we propose a novel weakly supervised human
parsing method that requires only simple object keypoint annotations for
learning. We develop an iterative learning method that generates pseudo part
segmentation masks from keypoint labels. With these pseudo masks, we train an
FCN to output pixel-level human parsing predictions. Furthermore, we develop a
correlation network that jointly predicts part and object segmentation masks
and improves segmentation performance. Experimental results show that our
weakly supervised method achieves very competitive human parsing results.
Although our method uses only simple keypoint annotations for learning, its
performance is comparable to fully supervised methods that use expensive
pixel-level annotations.
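The keypoint-to-pseudo-mask idea can be sketched in a toy form. The disc-shaped initial masks and the growing radius standing in for "predictions improve each round" are both assumptions made purely for illustration; the actual iterative scheme trains an FCN between rounds.

```python
import numpy as np

def keypoints_to_pseudo_mask(kps, h, w, radius=6):
    """Initial pseudo part mask: label a small disc around each keypoint.
    (The disc prior and its radius are assumptions, not the paper's method.)"""
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w), dtype=np.int64)      # 0 = background
    for label, (cy, cx) in enumerate(kps, start=1):
        mask[np.hypot(ys - cy, xs - cx) <= radius] = label
    return mask

def iterate_pseudo_masks(kps, h, w, steps=3, grow=2):
    """Toy stand-in for the iterative scheme: each round re-estimates the
    pseudo masks (here the discs simply grow; in the paper, an FCN trained
    on the previous masks produces the refined ones)."""
    return [keypoints_to_pseudo_mask(kps, h, w, radius=6 + grow * t)
            for t in range(steps)]

masks = iterate_pseudo_masks([(16, 16), (16, 40)], 32, 56)
print([int((m > 0).sum()) for m in masks])  # labelled area grows each round
```

Each round's masks supervise the next round of FCN training, which is what lets cheap keypoint labels bootstrap a pixel-level parser.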
Scene Parsing via Dense Recurrent Neural Networks with Attentional Selection
Recurrent neural networks (RNNs) have shown the ability to improve scene
parsing by capturing long-range dependencies among image units. In this paper,
we propose dense RNNs for scene labeling, which explore various long-range
semantic dependencies among image units. Unlike existing RNN-based approaches,
our dense RNNs capture richer contextual dependencies for each image unit by
enabling immediate connections between every pair of image units, which
significantly enhances their discriminative power. In addition, to select
relevant dependencies and restrain irrelevant ones for each unit among the
dense connections, we introduce an attention model into the dense RNNs. The
attention model automatically assigns more importance to helpful dependencies
and less weight to unconcerned ones. Integrating with convolutional neural
networks (CNNs), we develop an end-to-end scene labeling system. Extensive
experiments on three large-scale benchmarks demonstrate that the proposed
approach improves the baselines by large margins and outperforms other
state-of-the-art algorithms.
Comment: 10 pages. arXiv admin note: substantial text overlap with
arXiv:1801.0683
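The dense, attention-weighted aggregation over all image units can be sketched as follows. Plain dot-product attention is used here as a stand-in for the paper's attention model (an assumption); the point is only that every unit attends over every other unit, with softmax weights selecting the helpful dependencies.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(H, Wq, Wk):
    """For every image unit, attend over ALL units (dense connections):
    scores = (H Wq)(H Wk)^T, context_i = sum_j softmax(scores)_ij * h_j."""
    Q, K = H @ Wq, H @ Wk
    attn = softmax(Q @ K.T, axis=1)   # (n, n): weight of unit j for unit i
    return attn @ H, attn

rng = np.random.default_rng(1)
n, d = 16, 8                          # 16 image units, 8-dim hidden states
H = rng.standard_normal((n, d))
ctx, attn = dense_attention(H, rng.standard_normal((d, d)),
                            rng.standard_normal((d, d)))
print(ctx.shape, np.allclose(attn.sum(1), 1.0))  # (16, 8) True
```

Each row of `attn` sums to one, so every unit's context is a convex combination of all other units' states: high weight for helpful dependencies, near-zero for irrelevant ones.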
A fully end-to-end deep learning approach for real-time simultaneous 3D reconstruction and material recognition
This paper addresses the problem of simultaneous 3D reconstruction and
material recognition and segmentation. Enabling robots to recognise different
materials (concrete, metal, etc.) in a scene is important for many tasks, e.g.
robotic interventions in nuclear decommissioning. Previous work on 3D semantic
reconstruction has predominantly focused on the recognition of everyday
domestic objects (tables, chairs, etc.), whereas previous work on material
recognition has largely been confined to single 2D images without any 3D
reconstruction. Meanwhile, most 3D semantic reconstruction methods rely on
computationally expensive post-processing with fully-connected conditional
random fields (CRFs) to achieve consistent segmentations. In contrast, we
propose a deep learning method that performs 3D reconstruction while
simultaneously recognising different types of materials and labelling them at
the pixel level. Unlike previous methods, our approach is fully end-to-end and
requires neither hand-crafted features nor CRF post-processing. Instead, we
use only learned features, and the CRF segmentation constraints are
incorporated inside the fully end-to-end learned system. We present the
results of experiments in which we trained our system to perform real-time 3D
semantic reconstruction for 23 different materials in a real-world
application. The run-time performance of the system can be boosted to around
10 Hz using a conventional GPU, which is enough to achieve real-time semantic
reconstruction with a 30 fps RGB-D camera. To the best of our knowledge, this
work is the first real-time end-to-end system for simultaneous 3D
reconstruction and material recognition.
Comment: 8 pages, 7 figures, 4 tables
A Review on Deep Learning Techniques Applied to Semantic Segmentation
Image semantic segmentation is of increasing interest to computer vision and
machine learning researchers. Many emerging applications need accurate and
efficient segmentation mechanisms: autonomous driving, indoor navigation, and
even virtual or augmented reality systems, to name a few. This demand
coincides with the rise of deep learning approaches in almost every field or
application target related to computer vision, including semantic segmentation
and scene understanding. This paper provides a review of deep learning methods
for semantic segmentation applied to various application areas. First, we
describe the terminology of this field as well as the necessary background
concepts. Next, the main datasets and challenges are presented to help
researchers decide which ones best suit their needs and targets. Then,
existing methods are reviewed, highlighting their contributions and their
significance in the field. After that, quantitative results are given for the
described methods and the datasets on which they were evaluated, followed by a
discussion of the results. Finally, we point out a set of promising directions
for future work and draw our own conclusions about the state of the art of
semantic segmentation using deep learning techniques.
Comment: Submitted to TPAMI on Apr. 22, 201
Dense Recurrent Neural Networks for Scene Labeling
Recently, recurrent neural networks (RNNs) have demonstrated the ability to
improve scene labeling by capturing long-range dependencies among image units.
In this paper, we propose dense RNNs for scene labeling, which explore various
long-range semantic dependencies among image units. In comparison with
existing RNN-based approaches, our dense RNNs capture richer contextual
dependencies for each image unit via dense connections between every pair of
image units, which significantly enhances their discriminative power. In
addition, to select relevant dependencies and restrain irrelevant ones for
each unit among the dense connections, we introduce an attention model into
the dense RNNs. The attention model automatically assigns more importance to
helpful dependencies and less weight to unconcerned ones. Integrating with
convolutional neural networks (CNNs), our method achieves state-of-the-art
performance on the PASCAL Context, MIT ADE20K and SiftFlow benchmarks.
Comment: Tech. Report