Translation-Invariant Representation for Cumulative Foot Pressure Images
Humans can be distinguished by their limb movements and unique ground
reaction forces. A cumulative foot pressure image is a 2-D record of the
cumulative ground reaction force over one gait cycle. Although it captures
both the spatial and the temporal distribution of pressure, it suffers from
several problems, including shoe variation and noise, when put into practice
as a new biometric for pedestrian identification. In this paper, we propose
a hierarchical translation-invariant representation for cumulative foot
pressure images, inspired by the success of convolutional deep belief
networks for digit classification. The key contribution of our approach is a
discriminative hierarchical sparse coding scheme that helps learn useful
discriminative high-level visual features. Based on this feature
representation of cumulative foot pressure images, we develop a pedestrian
recognition system that is invariant to three different shoe types and
slight local shape changes. Experiments are conducted on a proposed open
dataset that contains more than 2800 cumulative foot pressure images from
118 subjects. Evaluations suggest the effectiveness of the proposed method
and the potential of cumulative foot pressure images as a biometric.
Comment: 6 pages
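To illustrate why pooling over filter responses tolerates translation, the property a translation-invariant representation builds on, here is a minimal NumPy sketch. The filter, image size, and shift amount are arbitrary illustrative choices, not details from the paper:

```python
import numpy as np

def feature_map(img, kernel):
    """Valid 2-D cross-correlation: one filter response map."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def pooled_feature(img, kernel):
    """Global max pooling over the response map: a shift-tolerant scalar."""
    return feature_map(img, kernel).max()

rng = np.random.default_rng(0)
img = np.zeros((12, 12))
img[2:5, 2:5] = rng.random((3, 3))                     # a small pressure blob
shifted = np.roll(np.roll(img, 3, axis=0), 3, axis=1)  # translated copy

kernel = rng.random((3, 3))
a = pooled_feature(img, kernel)
b = pooled_feature(shifted, kernel)
print(abs(a - b) < 1e-12)  # True: the pooled feature ignores the shift
```

The same principle, applied layer by layer with learned filters, is what makes a hierarchical representation tolerant to local shifts of the pressure pattern.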
A Light CNN for Deep Face Representation with Noisy Labels
Convolutional neural network (CNN) models proposed for face recognition have
continuously grown larger to better fit large amounts of training data. When
training data are obtained from the internet, the labels are likely to be
ambiguous and inaccurate. This paper presents a Light CNN framework to learn
a compact embedding on large-scale face data with massive noisy labels.
First, we introduce a variation of the maxout activation, called
Max-Feature-Map (MFM), into each convolutional layer of the CNN. Unlike the
maxout activation, which uses many feature maps to linearly approximate an
arbitrary convex activation function, MFM does so via a competitive
relationship. MFM can not only separate noisy from informative signals but
also perform feature selection between two feature maps. Second, three
networks are carefully designed to obtain better performance while reducing
the number of parameters and computational cost. Lastly, a semantic
bootstrapping method is proposed to make the predictions of the networks
more consistent with noisy labels. Experimental results show that the
proposed framework can utilize large-scale noisy data to learn a Light model
that is efficient in both computation and storage. The learned single
network with a 256-D representation achieves state-of-the-art results on
various face benchmarks without fine-tuning. The code is released at
https://github.com/AlfredXiangWu/LightCNN.
Comment: arXiv admin note: text overlap with arXiv:1507.04844. The models
are released at https://github.com/AlfredXiangWu/LightCNN. IEEE Transactions
on Information Forensics and Security, 201
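The MFM operation described above can be sketched as an element-wise max over two halves of the channel dimension; the tensor shapes below are illustrative, not the paper's architecture:

```python
import numpy as np

def max_feature_map(x):
    """MFM: split the channel axis in two and take the element-wise max.

    x: feature tensor of shape (channels, height, width), channels even.
    The output has half as many maps; the max acts as a competitive
    selection between each pair of feature maps.
    """
    c = x.shape[0]
    assert c % 2 == 0, "MFM needs an even number of channels"
    return np.maximum(x[: c // 2], x[c // 2:])

x = np.arange(16, dtype=float).reshape(4, 2, 2)  # 4 channels of 2x2 maps
y = max_feature_map(x)
print(y.shape)  # (2, 2, 2): the channel count is halved
```

Because only the larger of each pair of responses survives, the operation both suppresses weaker (potentially noisy) signals and halves the parameter count of subsequent layers.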
Deep Semantic Ranking Based Hashing for Multi-Label Image Retrieval
With the rapid growth of web images, hashing has received increasing
interest in large-scale image retrieval. Research efforts have been devoted
to learning compact binary codes that preserve semantic similarity based on
labels. However, most of these hashing methods are designed to handle simple
binary similarity; the complex multilevel semantic structure of images
associated with multiple labels has not yet been well explored. Here we
propose a deep semantic ranking based method for learning hash functions
that preserve multilevel semantic similarity between multi-label images. In
our approach, a deep convolutional neural network is incorporated into the
hash functions to jointly learn feature representations and the mappings
from them to hash codes, which avoids the limited semantic representation
power of hand-crafted features. Meanwhile, a ranking list that encodes the
multilevel similarity information is employed to guide the learning of such
deep hash functions. An effective scheme based on surrogate losses is used
to solve the intractable optimization problem posed by the non-smooth,
multivariate ranking measures involved in the learning procedure.
Experimental results show the superiority of our proposed approach over
several state-of-the-art hashing methods in terms of ranking evaluation
metrics when tested on multi-label image datasets.
Comment: CVPR 201
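A common surrogate for such non-smooth ranking measures is a hinge loss on relaxed (real-valued) codes, binarized only at test time. The margin value and toy vectors below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def triplet_surrogate_loss(q, pos, neg, margin=2.0):
    """Hinge surrogate for a ranking measure on relaxed hash codes.

    `pos` shares more labels with the query `q` than `neg` does; the
    hinge pushes the squared-distance gap beyond `margin`, giving a
    differentiable stand-in for the non-smooth ranking objective.
    """
    d_pos = np.sum((q - pos) ** 2)
    d_neg = np.sum((q - neg) ** 2)
    return max(0.0, margin + d_pos - d_neg)

def to_hash(code):
    """Quantize a relaxed code to bits at test time."""
    return (code > 0).astype(np.uint8)

q   = np.array([ 0.9, -0.8,  0.7])
pos = np.array([ 1.0, -1.0,  0.5])
neg = np.array([-0.9,  0.6, -0.4])
loss = triplet_surrogate_loss(q, pos, neg)
print(loss)        # 0.0: this ranking constraint is already satisfied
print(to_hash(q))  # [1 0 1]
```

Averaging such hinge terms over many (query, more-similar, less-similar) triples drawn from a ranking list yields a loss that gradient descent can handle.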
Coupled Deep Learning for Heterogeneous Face Recognition
Heterogeneous face matching is a challenging problem in face recognition due
to the large domain difference as well as insufficient pairwise images
across modalities during training. This paper proposes a coupled deep
learning (CDL) approach for heterogeneous face matching. CDL seeks a shared
feature space in which the heterogeneous face matching problem can be
approximately treated as a homogeneous one. The objective function of CDL
consists of two main parts. The first part contains a trace norm and a
block-diagonal prior as relevance constraints, which not only encourage
unpaired images from multiple modalities to be clustered and correlated, but
also regularize the parameters to alleviate overfitting. An approximate
variational formulation is introduced to deal with the difficulty of
optimizing the low-rank constraint directly. The second part contains a
cross-modal ranking over triplets of domain-specific images, which maximizes
the margin between different identities and augments the small number of
training samples. In addition, an alternating minimization method is
employed to iteratively update the parameters of CDL. Experimental results
on the challenging CASIA NIR-VIS 2.0 face recognition database, the IIIT-D
Sketch database, CUHK Face Sketch (CUFS), and CUHK Face Sketch FERET (CUFSF)
show that CDL significantly outperforms state-of-the-art heterogeneous face
recognition methods.
Comment: AAAI 201
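Trace-norm (low-rank) constraints like the one above are typically handled through the proximal operator of the nuclear norm, singular value thresholding. This is a generic sketch of that operator, not the paper's full variational formulation:

```python
import numpy as np

def svt(W, tau):
    """Singular value thresholding: proximal step for the trace norm.

    Shrinking singular values toward zero (and dropping those below tau)
    is the standard surrogate for a non-smooth low-rank constraint on a
    projection matrix.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(1)
W = rng.standard_normal((6, 4))
W_low = svt(W, tau=1.0)
print(np.linalg.matrix_rank(W_low) <= np.linalg.matrix_rank(W))  # True
```

Alternating such a proximal step with gradient updates on the ranking term is one standard way to realize the alternating minimization the abstract describes.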
Deep Steganalysis: End-to-End Learning with Supervisory Information beyond Class Labels
Recently, deep learning has shown its power in steganalysis. However, the
proposed deep models have usually been learned from pre-calculated noise
residuals obtained with fixed high-pass filters rather than from raw images.
In this paper, we propose a new end-to-end learning framework that learns
steganalytic features directly from pixels; the high-pass filters are
learned automatically as well. Besides class labels, we make use of
additional pixel-level supervision from cover-stego image pairs to jointly
and iteratively train the proposed network, which consists of a residual
calculation network and a steganalysis network. The experimental results
demonstrate the effectiveness of the proposed architecture.
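For context, the conventional fixed high-pass residual that the abstract argues should be learned instead can be sketched with the well-known 5x5 "KV" kernel from steganalysis; the input here is a toy array, not data from the paper:

```python
import numpy as np

# Fixed high-pass "KV" kernel traditionally used to pre-compute noise
# residuals; an end-to-end framework would instead learn such filters
# as its first convolutional layer.
KV = np.array([[-1,  2,  -2,  2, -1],
               [ 2, -6,   8, -6,  2],
               [-2,  8, -12,  8, -2],
               [ 2, -6,   8, -6,  2],
               [-1,  2,  -2,  2, -1]], dtype=float) / 12.0

def residual(img, kernel=KV):
    """Valid cross-correlation of an image with a high-pass kernel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

flat = np.full((8, 8), 100.0)          # constant image: no texture
print(np.allclose(residual(flat), 0))  # True: high-pass removes the DC part
```

Because the kernel's coefficients sum to zero, smooth image content is suppressed and only the noise-like component, where embedding traces live, survives.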
Deep Supervised Discrete Hashing
With the rapid growth of image and video data on the web, hashing has been
extensively studied for image and video search in recent years. Benefiting
from recent advances in deep learning, deep hashing methods have achieved
promising results for image retrieval. However, previous deep hashing
methods have some limitations (e.g., the semantic information is not fully
exploited). In this paper, we develop a deep supervised discrete hashing
algorithm based on the assumption that the learned binary codes should be
ideal for classification. Both the pairwise label information and the
classification information are used to learn the hash codes within a
one-stream framework. We constrain the outputs of the last layer to be
binary codes directly, which has rarely been investigated in deep hashing
algorithms. Because of the discrete nature of hash codes, an alternating
minimization method is used to optimize the objective function. Experimental
results show that our method outperforms current state-of-the-art methods on
benchmark datasets.
Comment: Accepted by NIPS 201
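Constraining last-layer outputs to binary codes and ranking by Hamming distance can be sketched as follows; the toy activations are invented for illustration and are not from the paper:

```python
import numpy as np

def sign_codes(z):
    """Binarize last-layer outputs directly into {-1, +1} hash codes."""
    return np.where(z >= 0, 1, -1)

def hamming(b1, b2):
    """Hamming distance between {-1, +1} codes via an inner product."""
    return (len(b1) - int(b1 @ b2)) // 2

z_query = np.array([ 0.7, -0.2,  1.3, -0.9])
z_match = np.array([ 0.4, -0.6,  0.2, -0.1])
z_other = np.array([-0.8,  0.5, -0.3,  0.6])

q = sign_codes(z_query)                 # [ 1 -1  1 -1]
print(hamming(q, sign_codes(z_match)))  # 0: identical code, close match
print(hamming(q, sign_codes(z_other)))  # 4: every bit differs
```

The non-differentiability of the sign step is exactly why an alternating minimization (updating network weights and discrete codes in turn) is needed rather than plain gradient descent.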
Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition
Heterogeneous face recognition (HFR) aims to match facial images acquired
from different sensing modalities, with mission-critical applications in
forensics, security, and commercial sectors. However, HFR is a much more
challenging problem than traditional face recognition because of the large
intra-class variation of heterogeneous face images and the limited training
samples of cross-modality face image pairs. This paper proposes a novel
approach, Wasserstein CNN (WCNN), to learn invariant features between
near-infrared and visual face images (i.e., NIR-VIS face recognition). The
low-level layers of the WCNN are trained with widely available face images
in the visual spectrum. The high-level layer is divided into three parts:
the NIR layer, the VIS layer, and the NIR-VIS shared layer. The first two
learn modality-specific features, while the NIR-VIS shared layer is designed
to learn a modality-invariant feature subspace. The Wasserstein distance is
introduced into the NIR-VIS shared layer to measure the dissimilarity
between the heterogeneous feature distributions, so WCNN training minimizes
the Wasserstein distance between the NIR and VIS distributions to obtain an
invariant deep feature representation of heterogeneous face images. To avoid
overfitting on small-scale heterogeneous face data, a correlation prior is
imposed on the fully-connected layers of the WCNN to reduce the parameter
space; this prior is implemented as a low-rank constraint in an end-to-end
network. The joint formulation leads to an alternating minimization for deep
feature representation at the training stage and efficient computation for
heterogeneous data at the testing stage. Extensive experiments on three
challenging NIR-VIS face recognition databases demonstrate the significant
superiority of WCNN over state-of-the-art methods.
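If the NIR and VIS feature distributions are modeled as Gaussians (a simplifying assumption this sketch makes; the toy feature values are invented), the 2-Wasserstein distance has a closed form in the moments, which is what makes it usable as a differentiable training loss:

```python
import numpy as np

def w2_gaussian(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between two 1-D Gaussians.

    For N(m1, s1^2) and N(m2, s2^2) the closed form is
    (m1 - m2)^2 + (s1 - s2)^2: differentiable in the moments, so
    feature statistics can be pushed together by gradient descent.
    """
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

nir_feats = np.array([0.2, 0.4, 0.3, 0.5])  # toy shared-layer activations
vis_feats = np.array([0.8, 1.0, 0.9, 1.1])
d = w2_gaussian(nir_feats.mean(), nir_feats.std(),
                vis_feats.mean(), vis_feats.std())
print(round(d, 4))  # 0.36: the gap the loss would drive toward zero
```

Driving this quantity to zero aligns the two modality distributions in the shared layer, which is the sense in which the learned features become modality-invariant.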
Accelerating Deep Neural Networks with Spatial Bottleneck Modules
This paper presents an efficient module, named spatial bottleneck, for
accelerating the convolutional layers in deep neural networks. The core idea
is to decompose a convolution into two stages, which first reduce the
spatial resolution of the feature map and then restore it to the desired
size. This operation decreases the sampling density in the spatial domain,
which is independent of, yet complementary to, network acceleration
approaches in the channel domain. Using different sampling rates, we can
trade off recognition accuracy against model complexity.
As a basic building block, a spatial bottleneck can replace any single
convolutional layer or any pair of consecutive convolutional layers. We
empirically verify its effectiveness by applying it to deep residual
networks. Spatial bottleneck achieves 2x and 1.4x speedup on the regular and
channel-bottlenecked residual blocks, respectively, with accuracy retained
in recognizing low-resolution images and even improved in recognizing
high-resolution images.
Comment: 9 pages, 5 figures
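The compute saving from the two-stage decomposition can be sketched with a raw multiply-accumulate count. The layer sizes below are arbitrary assumptions, and the measured ~2x wall-clock speedup reported above is smaller than this idealized ratio:

```python
def conv_macs(h, w, cin, cout, k=3, stride=1):
    """Multiply-accumulates of a k x k convolution with same padding."""
    return (h // stride) * (w // stride) * cin * cout * k * k

h = w = 56
c = 64
plain = 2 * conv_macs(h, w, c, c)  # two regular 3x3 convolutions
# Spatial bottleneck: a stride-2 convolution halves the resolution, then
# a transposed convolution (operating on the reduced grid) restores it.
bottleneck = conv_macs(h, w, c, c, stride=2) + conv_macs(h // 2, w // 2, c, c)
print(plain / bottleneck)  # 4.0: the idealized MAC ratio
```

Both stages of the bottleneck work on the quarter-size grid, which is where the operation count drops; memory traffic and the restore stage keep the real speedup below the raw ratio.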
Learning Structured Ordinal Measures for Video based Face Recognition
This paper presents a structured ordinal measure method for video-based face
recognition that simultaneously learns ordinal filters and structured
ordinal features. The problem is posed as a non-convex integer programming
problem with two parts. The first part learns stable ordinal filters that
project video data into a large-margin ordinal space. The second part seeks
self-correcting, discrete codes by balancing the projected data against a
rank-one ordinal matrix in a structured low-rank way. Both unsupervised and
supervised structures are considered for the ordinal matrix. In addition, as
a complement to the hierarchical structures, deep feature representations
are integrated into our method to enhance coding stability. An alternating
minimization method is employed to handle the discrete and low-rank
constraints, yielding high-quality codes that capture the prior structures
well. Experimental results on three commonly used face video databases show
that our method, with a simple voting classifier, can achieve
state-of-the-art recognition rates using fewer features and samples.
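The basic idea of an ordinal feature, storing order relations between filter responses rather than their magnitudes, can be sketched as follows; the pairs and values are illustrative, not the learned filters of the method:

```python
import numpy as np

def ordinal_code(features, pairs):
    """Encode a feature vector by pairwise order relations.

    Each code bit is 1 iff features[i] > features[j] for the pair (i, j):
    only the ordering is stored, which makes the code robust to any
    monotonic change in the raw responses.
    """
    return np.array([1 if features[i] > features[j] else 0 for i, j in pairs])

f = np.array([0.9, 0.1, 0.5, 0.3])
pairs = [(0, 1), (2, 3), (1, 3), (0, 2)]
print(ordinal_code(f, pairs))          # [1 1 0 1]
print(ordinal_code(2 * f + 1, pairs))  # same code under monotonic scaling
```

This invariance to monotonic transformations is what makes ordinal codes attractive for face video, where lighting changes rescale raw filter responses without reordering them.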
Supervised Discrete Hashing with Relaxation
Data-dependent hashing has recently attracted attention because it supports
efficient retrieval and storage of high-dimensional data such as documents,
images, and videos. In this paper, we propose a novel learning-based hashing
method called "Supervised Discrete Hashing with Relaxation" (SDHR), based on
"Supervised Discrete Hashing" (SDH). SDH uses ordinary least squares
regression with a traditional zero-one matrix encoding of the class label
information as the regression target (code words), thus fixing the
regression target. In SDHR, the regression target is instead optimized: the
optimized target matrix satisfies a large-margin constraint for the correct
classification of each example. Compared with SDH, which uses the
traditional zero-one matrix, SDHR utilizes the learned regression target
matrix and therefore measures the classification error of the regression
model more accurately and is more flexible. As expected, SDHR generally
outperforms SDH. Experimental results on two large-scale image datasets
(CIFAR-10 and MNIST) and a large-scale and challenging face dataset (FRGC)
demonstrate the effectiveness and efficiency of SDHR.
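The SDH-style regression step that SDHR relaxes can be sketched as a ridge regression of binary codes onto fixed zero-one targets. The codes and labels below are a toy example, and SDHR's large-margin target optimization itself is not shown:

```python
import numpy as np

def sdh_regression(B, y, n_classes, lam=1e-3):
    """SDH-style step: ridge-regress binary codes onto zero-one targets.

    B: (n, bits) codes in {-1, +1}; y: integer labels. SDH fixes the
    target matrix Y to one-hot rows; SDHR would instead optimize Y under
    a large-margin constraint.
    """
    n, bits = B.shape
    Y = np.eye(n_classes)[y]  # fixed zero-one regression targets
    W = np.linalg.solve(B.T @ B + lam * np.eye(bits), B.T @ Y)
    return W

B = np.array([[ 1,  1, -1],
              [ 1, -1, -1],
              [-1,  1,  1],
              [-1, -1,  1]], dtype=float)
y = np.array([0, 0, 1, 1])
W = sdh_regression(B, y, n_classes=2)
pred = (B @ W).argmax(axis=1)
print(pred)  # [0 0 1 1]: the codes linearly recover the labels
```

Replacing the fixed `Y` with a learned target matrix is what gives SDHR its extra flexibility: the regression is no longer forced to hit exactly 0/1 values, only to classify each example with a margin.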