Deep Adaptive Feature Embedding with Local Sample Distributions for Person Re-identification
Person re-identification (re-id) aims to match pedestrians observed by
disjoint camera views. It attracts increasing attention in computer vision due
to its importance to surveillance systems. To combat the major challenge of
cross-view visual variations, deep embedding approaches are proposed by
learning a compact feature space from images such that the Euclidean distances
correspond to their cross-view similarity metric. However, the global Euclidean
distance cannot faithfully characterize the ideal similarity in a complex
visual feature space because features of pedestrian images exhibit unknown
distributions due to large variations in poses, illumination and occlusion.
Moreover, intra-personal training samples within a local range can robustly
guide deep embedding against uncontrolled variations, but a global Euclidean
distance cannot capture them. In this paper, we study the problem of
person re-id by proposing a novel sampling strategy that mines suitable \textit{positives}
(i.e., intra-class samples) within a local range to improve the deep embedding in the
context of large intra-class variations. Our method is capable of learning a
deep similarity metric adaptive to local sample structure by minimizing each
sample's local distances while propagating through the relationship between
samples to attain the whole intra-class minimization. To this end, a novel
objective function is proposed to jointly optimize similarity metric learning,
local positive mining and robust deep embedding. This yields local
discriminations by selecting local-ranged positive samples, and the learned
features are robust to dramatic intra-class variations. Experiments on
benchmarks show state-of-the-art results achieved by our method. Comment: Published in Pattern Recognition
What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification
Matching pedestrians across disjoint camera views, known as person
re-identification (re-id), is a challenging problem that is of importance to
visual recognition and surveillance. Most existing methods exploit local
regions within spatial manipulation to perform matching in local
correspondence. However, they essentially extract \emph{fixed} representations
from pre-divided regions for each image and perform matching based on the
extracted representation subsequently. Models in this pipeline cannot capture
the finer local patterns that are crucial to distinguishing positive pairs from
negative ones, and they underperform as a result. In this paper, we
propose a novel deep multiplicative integration gating function, which answers
the question of \emph{what-and-where to match} for effective person re-id. To
address \emph{what} to match, our deep network emphasizes common local patterns
by learning joint representations in a multiplicative way. The network
comprises two Convolutional Neural Networks (CNNs) to extract convolutional
activations and generates relevant descriptors for pedestrian matching. This
leads to flexible representations for pairwise images. To address
\emph{where} to match, we combat the spatial misalignment by performing
spatially recurrent pooling via a four-directional recurrent neural network to
impose spatial dependency over all positions with respect to the entire image.
The proposed network is designed to be end-to-end trainable to characterize
local pairwise feature interactions in a spatially aligned manner. To
demonstrate the superiority of our method, extensive experiments are conducted
over three benchmark data sets: VIPeR, CUHK03 and Market-1501. Comment: Published at Pattern Recognition, Elsevier
Locality Preserving Projections for Grassmann manifold
Learning on Grassmann manifolds has become popular in many computer vision
tasks, with the strong capability to extract discriminative information for
image sets and videos. However, such learning algorithms, particularly on
high-dimensional Grassmann manifolds, always involve significantly high
computational cost, which seriously limits the applicability of learning on
Grassmann manifolds in wider areas. In this research, we propose an
unsupervised dimensionality reduction algorithm on the Grassmann manifold based on
the Locality Preserving Projections (LPP) criterion. LPP is a commonly used
dimensionality reduction algorithm for vector-valued data, aiming to preserve
the local structure of data in the dimension-reduced space. Our strategy is to
construct a mapping from a higher-dimensional Grassmann manifold to a
relatively low-dimensional one with more discriminative capability. The proposed
method can be optimized as a basic eigenvalue problem. The performance of our
proposed method is assessed on several classification and clustering tasks and
the experimental results show its clear advantages over other Grassmann-based
algorithms. Comment: Accepted by IJCAI 201
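
As an illustration of the LPP criterion mentioned in the abstract above, the following is a minimal sketch of classical LPP for vector-valued data only. It is not the Grassmann-manifold extension the paper proposes. The function name `lpp`, the k-nearest-neighbour graph, the heat-kernel width `t`, and the small ridge term `reg` are all assumptions chosen for this example. The sketch shows how the problem reduces to an eigenvalue problem: the projection directions are the generalized eigenvectors of X^T L X a = lambda X^T D X a with the smallest eigenvalues.

```python
import numpy as np


def lpp(X, n_components=2, n_neighbors=5, t=1.0, reg=1e-6):
    """Classical LPP sketch (hypothetical helper, vector-valued data only).

    X has shape (n_samples, n_features). Returns a projection matrix of
    shape (n_features, n_components).
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # k-nearest-neighbour graph with heat-kernel weights (column 0 of the
    # sorted indices is assumed to be the sample itself and is skipped).
    idx = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-d2[rows, cols] / t)
    W = np.maximum(W, W.T)  # symmetrise the graph

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])  # ridge keeps B positive definite

    # Solve A a = lambda B a by whitening with the Cholesky factor of B.
    C = np.linalg.cholesky(B)  # B = C C^T
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0        # remove round-off asymmetry
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order

    # Directions with the smallest eigenvalues best preserve locality.
    return C_inv.T @ vecs[:, :n_components]


rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))  # synthetic data, for illustration only
P = lpp(X, n_components=2, n_neighbors=5)
Y = X @ P                         # 30 samples embedded in 2 dimensions
```

The Cholesky whitening step is used here only to keep the sketch dependent on NumPy alone; a generalized symmetric eigensolver would serve the same purpose.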
Matrix Neural Networks
Traditional neural networks assume vectorial inputs, as the network is arranged as layers of a single line of computing units called neurons. This special structure requires non-vectorial inputs such as matrices to be converted into vectors. This process can be problematic. Firstly, the spatial information among elements of the data may be lost during vectorisation. Secondly, the solution space becomes very large, which demands very special treatment of the network parameters and incurs high computational cost. To address these issues, we propose matrix neural networks (MatNet), which take matrices directly as inputs. Each neuron senses summarised information through a bilinear mapping from lower-layer units in exactly the same way as in classic feed-forward neural networks. Under this structure, a combination of backpropagation and gradient descent can be utilised to obtain the network parameters efficiently. Furthermore, it can be conveniently extended to multimodal inputs. We apply MatNet to MNIST handwritten digit classification and image super-resolution tasks to show its effectiveness. Without too much tweaking, MatNet achieves performance comparable to the state-of-the-art methods in both tasks with considerably reduced complexity.
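
To make the bilinear mapping in the abstract above concrete, here is a minimal sketch of one matrix-input layer. It assumes the mapping has the form Y = sigmoid(U X V^T + B); the function name `bilinear_layer`, the sigmoid activation, and the layer sizes are illustrative assumptions, not details confirmed by the abstract. The parameter counts at the end compare this layer with a fully connected layer applied to the vectorised input, which is one way to read the "reduced complexity" claim.

```python
import numpy as np


def bilinear_layer(X, U, V, B):
    """One matrix-input layer (hypothetical form): sigmoid(U @ X @ V.T + B)."""
    Z = U @ X @ V.T + B
    return 1.0 / (1.0 + np.exp(-Z))


rng = np.random.default_rng(0)
X = rng.standard_normal((28, 28))        # e.g. one MNIST-sized image (synthetic here)
U = rng.standard_normal((10, 28)) * 0.1  # left weights: mixes the rows of X
V = rng.standard_normal((8, 28)) * 0.1   # right weights: mixes the columns of X
B = np.zeros((10, 8))                    # bias matrix

Y = bilinear_layer(X, U, V, B)           # output is a 10 x 8 matrix

# Parameters of the bilinear layer versus a dense layer that maps the
# vectorised 784-dimensional input to the same 80 outputs.
n_bilinear = U.size + V.size + B.size        # 280 + 224 + 80
n_dense = (28 * 28) * (10 * 8) + (10 * 8)    # weights + biases
```

Because the input is never flattened, the row and column structure of X is kept, and under these assumed sizes the layer uses 584 parameters where the dense equivalent would use 62800.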