Cross-Domain Visual Recognition via Domain Adaptive Dictionary Learning
In real-world visual recognition problems, the assumption that the training
data (source domain) and test data (target domain) are sampled from the same
distribution is often violated. This is known as the domain adaptation problem.
In this work, we propose a novel domain-adaptive dictionary learning framework
for cross-domain visual recognition. Our method generates a set of intermediate
domains. These intermediate domains form a smooth path and bridge the gap
between the source and target domains. Specifically, we not only learn a common
dictionary to encode the domain-shared features, but also learn a set of
domain-specific dictionaries to model the domain shift. The separation of the
common and domain-specific dictionaries enables us to learn more compact and
reconstructive dictionaries for domain adaptation. These dictionaries are
learned by alternating between domain-adaptive sparse coding and dictionary
updating steps. Meanwhile, our approach gradually recovers the feature
representations of both source and target data along the domain path. By
aligning all the recovered domain data, we derive the final domain-adaptive
features for cross-domain visual recognition. Extensive experiments on three
public datasets demonstrate that our approach outperforms most
state-of-the-art methods.
Comment: Submitted to IEEE TIP Journal
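As a rough illustration of the alternating scheme sketched in the abstract (not the authors' implementation: the ISTA sparse-coding step, the least-squares dictionary refit, and all hyperparameters below are assumptions), a minimal numpy sketch of learning one common dictionary plus per-domain dictionaries:

```python
import numpy as np

def soft_threshold(Z, lam):
    # Proximal operator of the l1 norm (element-wise soft-thresholding).
    return np.sign(Z) * np.maximum(np.abs(Z) - lam, 0.0)

def sparse_code(D, X, lam, n_iter=100):
    # ISTA for min_A 0.5*||X - D A||_F^2 + lam*||A||_1.
    L = np.linalg.norm(D, 2) ** 2 + 1e-8        # step size from spectral norm
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - D.T @ (D @ A - X) / L, lam / L)
    return A

def normalize_cols(D):
    return D / np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)

def update_dictionary(X, A):
    # Least-squares refit of the dictionary to (X, A), then renormalize atoms.
    G = A @ A.T + 1e-6 * np.eye(A.shape[0])
    return normalize_cols(X @ A.T @ np.linalg.inv(G))

def domain_adaptive_dl(domains, k_common=64, k_dom=32, lam=0.1, n_outer=10):
    # domains: list of (d, n_i) feature matrices, ordered along the domain
    # path (source, intermediates, target).
    rng = np.random.default_rng(0)
    d = domains[0].shape[0]
    Dc = normalize_cols(rng.standard_normal((d, k_common)))       # shared atoms
    Ds = [normalize_cols(rng.standard_normal((d, k_dom))) for _ in domains]
    for _ in range(n_outer):
        codes = []
        for i, X in enumerate(domains):
            A = sparse_code(np.hstack([Dc, Ds[i]]), X, lam)       # joint coding
            codes.append(A)
            # Refit the domain-specific atoms on what the shared atoms miss.
            Ds[i] = update_dictionary(X - Dc @ A[:k_common], A[k_common:])
        # Refit the shared dictionary on all domains, specific parts removed.
        R = np.hstack([X - Ds[i] @ codes[i][k_common:]
                       for i, X in enumerate(domains)])
        Dc = update_dictionary(R, np.hstack([A[:k_common] for A in codes]))
    return Dc, Ds
```

Recovered features for the final alignment could then be taken, under the same assumptions, as the shared-part reconstructions Dc @ A[:k_common] of each domain.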
Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment
Heatmap regression has been used for landmark localization for quite a while
now. Most methods use a very deep stack of bottleneck modules for the
heatmap classification stage, followed by heatmap regression to extract the
keypoints. In this paper, we present a single dendritic CNN, termed the Pose
Conditioned Dendritic Convolutional Neural Network (PCD-CNN), in which a
classification network is followed by a second, modular classification
network, trained end-to-end to obtain accurate landmark points.
Following a Bayesian formulation, we disentangle the 3D pose of a face image
explicitly by conditioning the landmark estimation on pose, making it different
from multi-tasking approaches. Extensive experimentation shows that
conditioning on pose reduces the localization error by making it agnostic to
face pose. The proposed model can be extended to yield a variable number of
landmark points, broadening its applicability to other datasets.
Instead of increasing the depth or width of the network, we train the CNN
efficiently with a Mask-Softmax loss and hard sample mining to reduce the
error compared to state-of-the-art methods on extreme- and medium-pose face
images from challenging datasets including AFLW, AFW, COFW and
IBUG.
Comment: CVPR'18
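To make the pose-conditioning idea concrete, here is a toy PyTorch sketch, not the paper's dendritic architecture: a pose branch predicts the 3D pose, and its output modulates the landmark features before heatmap prediction. The FiLM-style modulation is our stand-in assumption; PCD-CNN instead conditions through its dendritic arrangement of convolutions.

```python
import torch
import torch.nn as nn

class PoseConditionedLandmarks(nn.Module):
    # Toy sketch: the landmark head is explicitly conditioned on the predicted
    # pose, rather than sharing features with a pose head as in multi-tasking.
    def __init__(self, n_landmarks=68, width=64):
        super().__init__()
        self.backbone = nn.Sequential(                # stand-in feature extractor
            nn.Conv2d(3, width // 2, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width // 2, width, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.pose_head = nn.Sequential(               # predicts yaw, pitch, roll
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(width, 3),
        )
        self.film = nn.Linear(3, 2 * width)           # pose -> (gamma, beta)
        self.heatmap_head = nn.Conv2d(width, n_landmarks, 1)

    def forward(self, x):
        f = self.backbone(x)                          # (B, width, H/4, W/4)
        pose = self.pose_head(f)                      # (B, 3)
        gamma, beta = self.film(pose).chunk(2, dim=1) # conditioning parameters
        f = f * gamma[:, :, None, None] + beta[:, :, None, None]
        return pose, self.heatmap_head(f)             # pose + per-landmark heatmaps
```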
Learning Simple Thresholded Features with Sparse Support Recovery
The thresholded feature has recently emerged as an extremely efficient, yet
rough, empirical approximation of the time-consuming sparse coding inference
process. Such an approximation has not yet been rigorously examined, and
standard dictionaries often lead to non-optimal performance when used for
computing thresholded features. In this paper, we first present two theoretical
recovery guarantees for the thresholded feature to exactly recover the nonzero
support of the sparse code. Motivated by them, we then formulate the Dictionary
Learning for Thresholded Features (DLTF) model, which learns an optimized
dictionary for applying the thresholded feature. In particular, a novel
proximal operator with log-linear time complexity is derived for the norm
involved. We evaluate the performance of DLTF on a wide range of
synthetic and real-data tasks, where DLTF demonstrates remarkable efficiency,
effectiveness and robustness in all experiments. In addition, we briefly
discuss the potential link between DLTF and deep learning building blocks.
Comment: Accepted by IEEE Transactions on Circuits and Systems for Video
Technology (TCSVT)
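The approximation itself is a one-liner, which is what makes it attractive; a minimal numpy sketch (the threshold value and names are illustrative):

```python
import numpy as np

def thresholded_feature(D, x, theta=0.2):
    # One-step stand-in for sparse coding: correlate the signal with every
    # dictionary atom, then keep only the strong responses. Under the kind of
    # incoherence conditions the paper formalizes, the surviving entries match
    # the nonzero support of the true sparse code.
    z = D.T @ x                        # single matrix-vector product
    return z * (np.abs(z) > theta)     # hard thresholding

# Cost comparison: this is one GEMV, while an exact l1 solve (ISTA, LARS, ...)
# runs many such multiplications per signal.
```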
Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks
The large domain discrepancy between faces captured in the polarimetric (or
conventional) thermal domain and the visible domain makes cross-domain face
verification a
highly challenging problem for human examiners as well as computer vision
algorithms. Previous approaches utilize either a two-step procedure (visible
feature estimation and visible image reconstruction) or an input-level fusion
technique, where different Stokes images are concatenated and used as a
multi-channel input to synthesize the visible image given the corresponding
polarimetric signatures. Although these methods have yielded improvements, we
argue that input-level fusion alone may not be sufficient to realize the full
potential of the available Stokes images. We propose a Generative Adversarial
Network (GAN) based multi-stream feature-level fusion technique to synthesize
high-quality visible images from polarimetric thermal images. The proposed
network consists of a generator sub-network, constructed using an
encoder-decoder network based on dense residual blocks, and a multi-scale
discriminator sub-network. The generator network is trained by optimizing an
adversarial loss in addition to a perceptual loss and an identity preserving
loss to enable photo-realistic generation of visible images while preserving
discriminative characteristics. An extended dataset consisting of polarimetric
thermal facial signatures of 111 subjects is also introduced. Experiments
under multiple evaluation protocols demonstrate that the
proposed method achieves state-of-the-art performance. Code will be made
available at https://github.com/hezhangsprinter.
Comment: Note that the extended dataset is available upon request. Researchers
can contact Dr. Sean Hu from ARL at [email protected] to obtain the
dataset.
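A schematic PyTorch sketch of the feature-level fusion idea, as opposed to stacking Stokes images as input channels (layer sizes, loss weights, and the frozen feat_net/id_net networks standing in for the perceptual and identity-preserving terms are all assumptions; the dense residual blocks and multi-scale discriminator of the paper are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiStreamGenerator(nn.Module):
    # Each Stokes image gets its own encoder stream; fusion happens on the
    # learned features, not on the raw input channels.
    def __init__(self, n_streams=3, width=32):
        super().__init__()
        self.encoders = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, width, 4, 2, 1), nn.ReLU(),
                          nn.Conv2d(width, width, 4, 2, 1), nn.ReLU())
            for _ in range(n_streams)
        ])
        self.fuse = nn.Conv2d(n_streams * width, width, 1)  # feature-level fusion
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(width, width, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(width, 3, 4, 2, 1), nn.Tanh(),  # visible RGB out
        )

    def forward(self, stokes):            # stokes: list of (B, 1, H, W) tensors
        feats = [enc(s) for enc, s in zip(self.encoders, stokes)]
        return self.decoder(self.fuse(torch.cat(feats, dim=1)))

def generator_loss(d_fake, fake, real, feat_net, id_net, lam_p=1.0, lam_id=0.1):
    # Adversarial + perceptual + identity-preserving terms, as in the abstract;
    # feat_net and id_net are assumed frozen pretrained networks.
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    perceptual = F.l1_loss(feat_net(fake), feat_net(real))
    identity = F.l1_loss(id_net(fake), id_net(real))
    return adv + lam_p * perceptual + lam_id * identity
```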
Crystal Loss and Quality Pooling for Unconstrained Face Verification and Recognition
In recent years, the performance of face verification and recognition systems
based on deep convolutional neural networks (DCNNs) has significantly improved.
A typical pipeline for face verification includes training a deep network for
subject classification with softmax loss, using the penultimate layer output as
the feature descriptor, and generating a cosine similarity score given a pair
of face images or videos. The softmax loss function does not optimize the
features to have a higher similarity score for positive pairs and a lower
similarity score for negative pairs, which leads to a performance gap. In this
paper, we propose a new loss function, called Crystal Loss, that restricts the
features to lie on a hypersphere of a fixed radius. The loss can be easily
implemented using existing deep learning frameworks. We show that integrating
this simple step in the training pipeline significantly improves the
performance of face verification and recognition systems. We achieve
state-of-the-art performance for face verification and recognition on
challenging LFW, IJB-A, IJB-B and IJB-C datasets over a large range of false
alarm rates (10^-1 to 10^-7).
Comment: Previously, portions of this work appeared in arXiv:1703.09507, which
was a conference version. This version is an extended journal version of it.
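The constraint is simple to add to an existing pipeline, which is the point of the paper; a minimal PyTorch sketch (the radius alpha and shapes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrystalLoss(nn.Module):
    # Softmax cross-entropy over features constrained to a hypersphere: the
    # feature is unit-normalized and rescaled to a fixed radius alpha before
    # classification, so every sample contributes with the same feature norm.
    def __init__(self, feat_dim, n_classes, alpha=50.0):
        super().__init__()
        self.alpha = alpha                          # fixed hypersphere radius
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, features, labels):
        features = self.alpha * F.normalize(features, p=2, dim=1)
        return F.cross_entropy(self.classifier(features), labels)
```

At test time the penultimate-layer features are used as before: unit-normalize them and score pairs by cosine similarity.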
Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection
In this paper, we propose a novel object detection algorithm named "Deep
Regionlets" by integrating deep neural networks and a conventional detection
schema for accurate generic object detection. Motivated by the effectiveness of
regionlets for modeling object deformations and multiple aspect ratios, we
incorporate regionlets into an end-to-end trainable deep learning framework.
The deep regionlets framework consists of a region selection network and a deep
regionlet learning module. Specifically, given a detection bounding box
proposal, the region selection network provides guidance on where to select
the sub-regions from which features are learned. An object proposal
typically contains 3-16 sub-regions. The regionlet learning module focuses on
local feature selection and transformations to alleviate the effects of
appearance variations. To this end, we first realize non-rectangular region
selection within the detection framework to accommodate variations in object
appearance. Moreover, we design a "gating network" within the regionlet
learning module to enable instance-dependent soft feature selection and
pooling. The Deep Regionlets framework is trained end-to-end without
additional effort. We
present ablation studies and extensive experiments on the PASCAL VOC dataset
and the Microsoft COCO dataset. The proposed method yields competitive
performance over state-of-the-art algorithms, such as RetinaNet and Mask R-CNN,
even without additional segmentation labels.
Comment: arXiv admin note: substantial text overlap with arXiv:1712.0240
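As a toy illustration of the gating idea (shapes and names are assumptions, and the region selection network, which in the paper predicts where to sample sub-regions, is omitted), a PyTorch sketch of instance-dependent soft selection over sub-region features:

```python
import torch
import torch.nn as nn

class RegionletGating(nn.Module):
    # Per-instance soft weights over sub-region features before pooling,
    # instead of a fixed, hard selection of regionlets.
    def __init__(self, feat_dim=256):
        super().__init__()
        self.gate = nn.Linear(feat_dim, 1)     # one weight per sub-region

    def forward(self, subregion_feats):        # (B, n_subregions, feat_dim)
        weights = torch.sigmoid(self.gate(subregion_feats))   # (B, n, 1)
        return (weights * subregion_feats).sum(dim=1)         # gated pooling
```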