18,839 research outputs found
Matching-CNN Meets KNN: Quasi-Parametric Human Parsing
Both parametric and non-parametric approaches have demonstrated encouraging
performances in the human parsing task, namely segmenting a human image into
several semantic regions (e.g., hat, bag, left arm, face). In this work, we aim
to develop a new solution with the advantages of both methodologies, namely
supervision from annotated data and the flexibility to use newly annotated
(possibly uncommon) images, and present a quasi-parametric human parsing model.
Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, the
parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict
the matching confidence and displacements of the best matched region in the
testing image for a particular semantic region in one KNN image. Given a
testing image, we first retrieve its KNN images from the
annotated/manually-parsed human image corpus. Then each semantic region in each
KNN image is matched with confidence to the testing image using M-CNN, and the
matched regions from all KNN images are further fused, followed by a superpixel
smoothing procedure to obtain the ultimate human parsing result. The M-CNN
differs from the classic CNN in that the tailored cross image matching filters
are introduced to characterize the matching between the testing image and the
semantic region of a KNN image. The cross image matching filters are defined at
different convolutional layers, each aiming to capture a particular range of
displacements. Comprehensive evaluations over a large dataset with 7,700
annotated human images well demonstrate the significant performance gain from
the quasi-parametric model over the state-of-the-arts, for the human parsing
task.Comment: This manuscript is the accepted version for CVPR 201
Driving Scene Perception Network: Real-time Joint Detection, Depth Estimation and Semantic Segmentation
As the demand for enabling high-level autonomous driving has increased in
recent years and visual perception is one of the critical features to enable
fully autonomous driving, in this paper, we introduce an efficient approach for
simultaneous object detection, depth estimation and pixel-level semantic
segmentation using a shared convolutional architecture. The proposed network
model, which we named Driving Scene Perception Network (DSPNet), uses
multi-level feature maps and multi-task learning to improve the accuracy and
efficiency of object detection, depth estimation and image segmentation tasks
from a single input image. Hence, the resulting network model uses less than
850 MiB of GPU memory and achieves 14.0 fps on NVIDIA GeForce GTX 1080 with a
1024x512 input image, and both precision and efficiency have been improved over
combination of single tasks.Comment: 9 pages, 7 figures, WACV'1
Multi-view Face Detection Using Deep Convolutional Neural Networks
In this paper we consider the problem of multi-view face detection. While
there has been significant research on this problem, current state-of-the-art
approaches for this task require annotation of facial landmarks, e.g. TSM [25],
or annotation of face poses [28, 22]. They also require training dozens of
models to fully capture faces in all orientations, e.g. 22 models in HeadHunter
method [22]. In this paper we propose Deep Dense Face Detector (DDFD), a method
that does not require pose/landmark annotation and is able to detect faces in a
wide range of orientations using a single model based on deep convolutional
neural networks. The proposed method has minimal complexity; unlike other
recent deep learning object detection methods [9], it does not require
additional components such as segmentation, bounding-box regression, or SVM
classifiers. Furthermore, we analyzed scores of the proposed face detector for
faces in different orientations and found that 1) the proposed method is able
to detect faces from different angles and can handle occlusion to some extent,
2) there seems to be a correlation between dis- tribution of positive examples
in the training set and scores of the proposed face detector. The latter
suggests that the proposed methods performance can be further improved by using
better sampling strategies and more sophisticated data augmentation techniques.
Evaluations on popular face detection benchmark datasets show that our
single-model face detector algorithm has similar or better performance compared
to the previous methods, which are more complex and require annotations of
either different poses or facial landmarks.Comment: in International Conference on Multimedia Retrieval 2015 (ICMR
Mask-guided Style Transfer Network for Purifying Real Images
Recently, the progress of learning-by-synthesis has proposed a training model
for synthetic images, which can effectively reduce the cost of human and
material resources. However, due to the different distribution of synthetic
images compared with real images, the desired performance cannot be achieved.
To solve this problem, the previous method learned a model to improve the
realism of the synthetic images. Different from the previous methods, this
paper try to purify real image by extracting discriminative and robust features
to convert outdoor real images to indoor synthetic images. In this paper, we
first introduce the segmentation masks to construct RGB-mask pairs as inputs,
then we design a mask-guided style transfer network to learn style features
separately from the attention and bkgd(background) regions and learn content
features from full and attention region. Moreover, we propose a novel
region-level task-guided loss to restrain the features learnt from style and
content. Experiments were performed using mixed studies (qualitative and
quantitative) methods to demonstrate the possibility of purifying real images
in complex directions. We evaluate the proposed method on various public
datasets, including LPW, COCO and MPIIGaze. Experimental results show that the
proposed method is effective and achieves the state-of-the-art results.Comment: arXiv admin note: substantial text overlap with arXiv:1903.0582
- …