Cooperative Training of Deep Aggregation Networks for RGB-D Action Recognition
A novel deep neural network training paradigm that exploits the conjoint
information in multiple heterogeneous sources is proposed. Specifically, in an
RGB-D based action recognition task, it cooperatively trains a single
convolutional neural network (named c-ConvNet) on both RGB visual features and
depth features, and deeply aggregates the two kinds of features for action
recognition. Unlike the conventional ConvNet, which learns deep separable
features for homogeneous modality-based classification with only one softmax
loss function, the c-ConvNet enhances the discriminative power of the
deeply learned features and weakens the undesired modality discrepancy by
jointly optimizing a ranking loss and a softmax loss for both homogeneous and
heterogeneous modalities. The ranking loss consists of intra-modality and
cross-modality triplet losses, and it reduces both the intra-modality and
cross-modality feature variations. Furthermore, the correlations between RGB
and depth data are embedded in the c-ConvNet, can be retrieved from either
modality, and contribute to recognition even when only one of the modalities
is available. The proposed method was extensively evaluated on
two large RGB-D action recognition datasets, ChaLearn LAP IsoGD and NTU RGB+D
datasets, and one small dataset, SYSU 3D HOI, achieving state-of-the-art
results.
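The ranking loss above combines intra-modality and cross-modality triplet terms. A minimal NumPy sketch of that idea follows; the squared-Euclidean distance, exhaustive triplet enumeration over a toy batch, the margin value, and all function names are assumptions, since the abstract gives no formulas:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-form triplet loss: pull anchor toward positive, push from negative."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def ranking_loss(rgb, depth, labels, margin=1.0):
    """Average intra- and cross-modality triplet losses over a small batch.

    rgb, depth: (N, D) feature arrays for the two modalities;
    labels: (N,) integer class labels. Anchors take positives/negatives
    from the same modality (intra) or from the other modality (cross)."""
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i != j and labels[i] == labels[j] and labels[i] != labels[k]:
                    # intra-modality triplets (RGB-RGB and depth-depth)
                    total += triplet_loss(rgb[i], rgb[j], rgb[k], margin)
                    total += triplet_loss(depth[i], depth[j], depth[k], margin)
                    # cross-modality triplets (anchor in one modality,
                    # positive/negative in the other)
                    total += triplet_loss(rgb[i], depth[j], depth[k], margin)
                    total += triplet_loss(depth[i], rgb[j], rgb[k], margin)
                    count += 4
    return total / max(count, 1)
```

In the joint objective the abstract describes, a term like this would be weighted and summed with the softmax (cross-entropy) classification loss.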
Exponential Discriminative Metric Embedding in Deep Learning
With the remarkable success recently achieved by Convolutional Neural Networks
(CNNs) in object recognition, deep learning is now widely used in
the computer vision community. Deep Metric Learning (DML), integrating deep
learning with conventional metric learning, has set new records in many fields,
especially in classification tasks. In this paper, we propose a replicable DML
method, called the Include and Exclude (IE) loss, which pushes the distance
between a sample and its designated class center away from the sample's mean
distance to the other class centers by a large margin in the exponential
feature projection space. With the supervision of the IE loss, we can train
CNNs to enhance
the intra-class compactness and inter-class separability, leading to great
improvements on several public datasets ranging from object recognition to face
verification. We conduct a comparative study of our algorithm with several
typical DML methods on three kinds of networks with different capacities.
Extensive experiments on three object recognition datasets and two face
recognition datasets demonstrate that the IE loss consistently outperforms
other mainstream DML methods and approaches state-of-the-art results.
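The IE loss description can be illustrated with a small sketch. Assuming, since the abstract gives no formula, that "exponential feature projection space" means mapping squared distances through exp(-d), and using a hinge to enforce the margin between the own-center similarity and the mean similarity to the other centers:

```python
import numpy as np

def ie_loss(features, labels, centers, margin=0.5):
    """Hypothetical Include-and-Exclude (IE) loss sketch.

    features: (N, D) sample features; labels: length-N class indices;
    centers: (C, D) learnable class centers. The hinge form, exp(-d)
    mapping, and margin value are assumptions, not the paper's exact
    formulation."""
    total = 0.0
    for x, y in zip(features, labels):
        d_own = np.sum((x - centers[y]) ** 2)            # "include" distance
        d_rest = np.mean([np.sum((x - c) ** 2)           # "exclude" distances
                          for j, c in enumerate(centers) if j != y])
        # In the exponential projection space, require the own-center
        # similarity to exceed the mean other-center similarity by the margin.
        total += max(0.0, margin - (np.exp(-d_own) - np.exp(-d_rest)))
    return total / len(features)
```

When every sample sits on its own class center and far from the others, the hinge is inactive and the loss is zero; samples near a wrong center incur a penalty, which is the intra-class compactness / inter-class separability trade-off the abstract describes.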
