Structure propagation for zero-shot learning
The key to zero-shot learning (ZSL) is finding an information transfer model that bridges the gap between images and semantic information (texts or attributes). Existing ZSL methods usually construct a compatibility function between images and class labels that accounts for the relevance among semantic classes (the manifold structure of semantic classes). However, the relationship among image classes (the manifold structure of image classes) is also important for constructing the compatibility model. This relationship is difficult to capture because some classes are unseen, so the manifold structure of image classes is often ignored in ZSL. To let the manifold structure of image classes and that of semantic classes complement each other, we propose structure propagation (SP) to improve ZSL classification performance. SP jointly considers the manifold structure of image classes and that of semantic classes to approximate the intrinsic structure of object classes. Moreover, SP imposes a constraint between the compatibility function and these manifold structures to balance their influence across structure propagation iterations. The SP solution yields not only unseen class labels but also the relationship between the two manifold structures, which encodes the positive transfer in structure propagation. Experimental results demonstrate that SP attains promising results on the AwA, CUB, Dogs and SUN databases.
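As a rough sketch of the compatibility-plus-propagation idea the abstract describes, consider the bilinear compatibility score F(x, y) = x^T W a_y common to many ZSL methods, smoothed over a class-similarity graph. All names (X, A, W, S) and the propagation update below are illustrative assumptions, not the paper's actual formulation.

    import numpy as np

    def compatibility(X, A, W):
        """Bilinear ZSL score F(x, y) = x^T W a_y.
        X: (n_images, d_img) image features
        A: (n_classes, d_attr) class attribute (semantic) vectors
        W: (d_img, d_attr) learned compatibility matrix
        Returns an (n_images, n_classes) score matrix."""
        return X @ W @ A.T

    def propagate(scores, S, alpha=0.5, iters=10):
        """Smooth class scores over a row-normalized class-similarity
        graph S, loosely mimicking propagation along a class manifold."""
        F = scores.copy()
        for _ in range(iters):
            F = alpha * (F @ S) + (1 - alpha) * scores
        return F

    # Usage: predict the class (seen or unseen) with the highest
    # propagated score, e.g.
    # y_hat = propagate(compatibility(X, A, W), S).argmax(axis=1)

Here the graph S would encode the class relationships (semantic and, per the paper's motivation, image-side structure); the sketch uses a single graph only to keep the example short.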
Semi-supervised Segmentation Fusion of Multi-spectral and Aerial Images
A Semi-supervised Segmentation Fusion algorithm is proposed using consensus and distributed learning. The aim of Unsupervised Segmentation Fusion (USF) is to achieve a consensus among segmentation outputs obtained from different segmentation algorithms by computing an approximate solution to an NP-hard problem with reduced computational complexity. Semi-supervision is incorporated into USF via a new algorithm called Semi-supervised Segmentation Fusion (SSSF). In SSSF, side information about the co-occurrence of pixels in the same or different segments is formulated as the constraints of a convex optimization problem. Experiments on artificial and real-world benchmark multi-spectral and aerial images show that the proposed algorithms perform better than the individual state-of-the-art segmentation algorithms.
Comment: A version of the manuscript was published in ICPR 201
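To illustrate how pairwise side information can steer a consensus over base segmentations, here is a minimal Python sketch built on a co-association matrix with must-link/cannot-link pairs clamped. This is a deliberate simplification under stated assumptions; the paper instead formulates the side information as constraints of a convex optimization problem.

    import numpy as np

    def co_association(segmentations):
        """Fraction of base segmentations that put each pixel pair in
        the same segment. segmentations: (K, n_pixels) integer label maps."""
        K, n = segmentations.shape
        C = np.zeros((n, n))
        for seg in segmentations:
            C += (seg[:, None] == seg[None, :]).astype(float)
        return C / K

    def apply_constraints(C, must_link, cannot_link):
        """Clamp entries for supervised pixel pairs (the semi-supervision):
        pairs known to co-occur get affinity 1, known-separate pairs get 0."""
        for i, j in must_link:
            C[i, j] = C[j, i] = 1.0
        for i, j in cannot_link:
            C[i, j] = C[j, i] = 0.0
        return C

A consensus segmentation can then be read off by clustering the constrained matrix, e.g. agglomerative clustering with 1 - C as the pairwise distance.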
Unsupervised Domain Adaptation for Face Recognition in Unlabeled Videos
Despite rapid advances in face recognition, there remains a clear gap between the performance of still-image-based face recognition and video-based face recognition, due to the vast difference in visual quality between the two domains and the difficulty of curating diverse large-scale video datasets. This paper addresses both of these challenges through an image-to-video feature-level domain adaptation approach that learns discriminative video frame representations. The framework utilizes large-scale unlabeled video data to reduce the gap between the domains while transferring discriminative knowledge from large-scale labeled still images. Given a face recognition network pretrained in the image domain, adaptation is achieved by (i) distilling knowledge from the network to a video adaptation network through feature matching, (ii) performing feature restoration through synthetic data augmentation, and (iii) learning a domain-invariant feature through a domain-adversarial discriminator. We further improve performance through a discriminator-guided feature fusion that boosts high-quality frames while suppressing those degraded by video domain-specific factors. Experiments on the YouTube Faces and IJB-A datasets demonstrate that each module contributes to our feature-level domain adaptation framework and substantially improves video face recognition performance, achieving state-of-the-art accuracy. We demonstrate qualitatively that the network learns to suppress diverse artifacts in videos, such as pose, illumination, and occlusion, without being explicitly trained for them.
Comment: accepted for publication at International Conference on Computer Vision (ICCV) 201
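A minimal PyTorch sketch of two of the three training signals enumerated above, (i) feature-matching distillation and (iii) the domain-adversarial loss; (ii) feature restoration is omitted for brevity. The module names image_net, video_net, and discriminator and the exact loss forms are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def adaptation_losses(image_net, video_net, discriminator,
                          still_images, video_frames):
        # (i) Distillation: the video network should reproduce the frozen,
        # pretrained image network's embeddings on stills (feature matching).
        with torch.no_grad():
            teacher_feat = image_net(still_images)
        image_feat = video_net(still_images)
        distill_loss = F.mse_loss(image_feat, teacher_feat)

        # (iii) The discriminator learns to separate image-domain from
        # video-domain embeddings ...
        video_feat = video_net(video_frames)
        d_img = discriminator(image_feat.detach())
        d_vid = discriminator(video_feat.detach())
        disc_loss = (
            F.binary_cross_entropy_with_logits(d_img, torch.ones_like(d_img))
            + F.binary_cross_entropy_with_logits(d_vid, torch.zeros_like(d_vid)))

        # ... while the video network is trained to fool it, pushing the
        # two domains toward a domain-invariant feature space.
        adv_loss = F.binary_cross_entropy_with_logits(
            discriminator(video_feat), torch.ones_like(d_vid))

        return distill_loss, disc_loss, adv_loss

In practice the two would be optimized alternately: disc_loss updates only the discriminator, while distill_loss plus adv_loss update only the video adaptation network.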
Integrated Deep and Shallow Networks for Salient Object Detection
Deep convolutional neural network (CNN) based salient object detection methods have achieved state-of-the-art performance and outperform unsupervised methods by a wide margin. In this paper, we propose to integrate deep and unsupervised saliency for salient object detection under a unified framework. Specifically, our method takes the results of an unsupervised saliency method (Robust Background Detection, RBD) and normalized color images as inputs, and directly learns an end-to-end mapping between the inputs and the corresponding saliency maps. The color images are fed into a Fully Convolutional Neural Network (FCNN) adapted from semantic segmentation to exploit high-level semantic cues for salient object detection. The results from the deep FCNN and RBD are then concatenated and fed into a shallow network that maps the concatenated feature maps to saliency maps. Finally, to obtain a spatially consistent saliency map with sharp object boundaries, we fuse superpixel-level saliency maps at multiple scales. Extensive experimental results on 8 benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches by a clear margin.
Comment: Accepted by IEEE International Conference on Image Processing (ICIP) 201
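The concatenate-then-shallow-network step can be sketched in a few lines of PyTorch; the two-channel input (deep FCNN saliency plus the RBD prior) matches the description above, but the layer widths are illustrative assumptions rather than the paper's architecture.

    import torch
    import torch.nn as nn

    class ShallowFusion(nn.Module):
        """Maps concatenated deep and unsupervised saliency maps to a
        fused saliency map (a hypothetical stand-in for the shallow
        network described in the abstract)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(2, 16, kernel_size=3, padding=1),  # deep + RBD
                nn.ReLU(inplace=True),
                nn.Conv2d(16, 1, kernel_size=3, padding=1),
            )

        def forward(self, deep_saliency, rbd_saliency):
            # Both inputs: (N, 1, H, W) maps in [0, 1].
            x = torch.cat([deep_saliency, rbd_saliency], dim=1)
            return torch.sigmoid(self.net(x))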
