Deep Comprehensive Correlation Mining for Image Clustering
Recently developed deep unsupervised methods allow us to jointly learn
representation and cluster unlabelled data. These deep clustering methods
mainly focus on the correlation among samples, e.g., selecting high precision
pairs to gradually tune the feature representation, which neglects other useful
correlations. In this paper, we propose a novel clustering framework, named
deep comprehensive correlation mining (DCCM), for exploring and taking full
advantage of various kinds of correlations behind the unlabeled data from three
aspects: 1) Instead of only using pair-wise information, pseudo-label
supervision is proposed to investigate category information and learn
discriminative features. 2) The features' robustness to image transformation of
input space is fully explored, which benefits the network learning and
significantly improves the performance. 3) The triplet mutual information among
features is introduced for the clustering problem, lifting the recently discovered
instance-level deep mutual information to a triplet-level formulation, which
further helps to learn more discriminative features. Extensive experiments on
several challenging datasets show that our method achieves good performance,
e.g., attaining clustering accuracy on CIFAR-10, which is
higher than the state-of-the-art results.
Comment: Accepted to ICCV 201
Max-margin Deep Generative Models
Deep generative models (DGMs) are effective at learning multilayered
representations of complex data and performing inference of input data by
exploring the generative ability. However, little work has been done on
examining or empowering the discriminative ability of DGMs in making accurate
predictions. This paper presents max-margin deep generative models (mmDGMs),
which explore the strongly discriminative principle of max-margin learning to
improve the discriminative power of DGMs, while retaining the generative
capability. We develop an efficient doubly stochastic subgradient algorithm for
the piecewise linear objective. Empirical results on MNIST and SVHN datasets
demonstrate that (1) max-margin learning can significantly improve the
prediction performance of DGMs while retaining the generative ability; and
(2) mmDGMs are competitive with the state-of-the-art fully discriminative
networks by employing deep convolutional neural networks (CNNs) as both
recognition and generative models.
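A piecewise linear objective of the kind mentioned above is typical of hinge-type losses. The following is a minimal sketch, assuming a Crammer-Singer style multiclass hinge on class scores; the abstract does not state the exact form used in mmDGMs, and the function names and the `margin` parameter are illustrative only.

```python
def multiclass_hinge_loss(scores, label, margin=1.0):
    """Hinge loss: max(0, max over wrong classes of margin + s_wrong - s_true)."""
    true_score = scores[label]
    worst_violation = max(
        margin + s - true_score
        for y, s in enumerate(scores)
        if y != label
    )
    return max(0.0, worst_violation)


def hinge_subgradient(scores, label, margin=1.0):
    """One subgradient w.r.t. the scores.

    +1 on the highest-scoring wrong class and -1 on the true class when the
    margin is violated; all zeros otherwise (the flat piece of the loss).
    """
    grad = [0.0] * len(scores)
    wrong = [y for y in range(len(scores)) if y != label]
    violator = max(wrong, key=lambda y: scores[y])
    if margin + scores[violator] - scores[label] > 0:
        grad[violator] = 1.0
        grad[label] = -1.0
    return grad


if __name__ == "__main__":
    print(multiclass_hinge_loss([2.0, 0.5, 1.5], label=0))  # margin violated by class 2
    print(hinge_subgradient([2.0, 0.5, 1.5], label=0))
    print(multiclass_hinge_loss([3.0, 0.5, 1.0], label=0))  # margin satisfied
```

Because the loss is not differentiable at the kinks, training relies on subgradients rather than gradients, which is consistent with the subgradient algorithm the abstract refers to.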
Soft Proposal Networks for Weakly Supervised Object Localization
Weakly supervised object localization remains challenging, where only image
labels instead of bounding boxes are available during training. Object proposal
is an effective component in localization, but often computationally expensive
and incapable of joint optimization with some of the remaining modules. In this
paper, to the best of our knowledge, we are the first to integrate weakly
supervised object proposal into convolutional neural networks (CNNs) in an
end-to-end learning manner. We design a network component, Soft Proposal (SP),
to be plugged into any standard convolutional architecture to introduce
nearly cost-free object proposal, orders of magnitude faster than
state-of-the-art methods. In the SP-augmented CNNs, referred to as Soft
Proposal Networks (SPNs), iteratively evolved object proposals are generated
based on the deep feature maps, then projected back and further jointly
optimized with network parameters, with image-level supervision only. Through
the unified learning process, SPNs learn better object-centric filters,
discover more discriminative visual evidence, and suppress background
interference, significantly boosting both weakly supervised object localization
and classification performance. We report the best results on popular
benchmarks, including PASCAL VOC, MS COCO, and ImageNet.
Comment: ICCV 201
Semantic-Guided Multi-Attention Localization for Zero-Shot Learning
Zero-shot learning extends conventional object classification to
unseen-class recognition by introducing semantic representations of classes.
Existing approaches predominantly focus on learning the proper mapping function
for visual-semantic embedding, while neglecting the effect of learning
discriminative visual features. In this paper, we study the significance of the
discriminative region localization. We propose a semantic-guided
multi-attention localization model, which automatically discovers the most
discriminative parts of objects for zero-shot learning without any human
annotations. Our model jointly learns cooperative global and local features
from the whole object as well as the detected parts to categorize objects based
on semantic descriptions. Moreover, with the joint supervision of embedding
softmax loss and class-center triplet loss, the model is encouraged to learn
features with high inter-class dispersion and intra-class compactness. Through
comprehensive experiments on three widely used zero-shot learning benchmarks,
we show the efficacy of multi-attention localization, and our proposed
approach improves the state-of-the-art results by a considerable margin.
Comment: accepted to NeurIPS'1
Supervised COSMOS Autoencoder: Learning Beyond the Euclidean Loss!
Autoencoders are unsupervised deep learning models used for learning
representations. In the literature, autoencoders have been shown to perform well on a
variety of tasks spread across multiple domains, thereby establishing
widespread applicability. Typically, an autoencoder is trained to generate a
model that minimizes the reconstruction error between the input and the
reconstructed output, computed in terms of the Euclidean distance. While this
can be useful for applications related to unsupervised reconstruction, it may
not be optimal for classification. In this paper, we propose a novel Supervised
COSMOS Autoencoder which utilizes a multi-objective loss function to learn
representations that simultaneously encode the (i) "similarity" between the
input and reconstructed vectors in terms of their direction, (ii)
"distribution" of pixel values of the reconstruction with respect to the input
sample, while also incorporating (iii) "discriminability" in the feature
learning pipeline. The proposed autoencoder model incorporates a Cosine
similarity and Mahalanobis distance based loss function, along with supervision
via Mutual Information based loss. Detailed analysis of each component of the
proposed model motivates its applicability for feature learning in different
classification tasks. The efficacy of Supervised COSMOS autoencoder is
demonstrated via extensive experimental evaluations on different image
datasets. The proposed model outperforms existing algorithms on MNIST,
CIFAR-10, and SVHN databases. It also yields state-of-the-art results on
CelebA, LFWA, Adience, and IJB-A databases for attribute prediction and face
recognition, respectively.
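To illustrate why a direction-based similarity term differs from the usual Euclidean reconstruction error, here is a minimal sketch comparing the two on toy vectors. It assumes the cosine term takes the form 1 - cos(x, x_hat), which the abstract does not specify; the Mahalanobis and mutual-information terms are omitted, and all names are hypothetical.

```python
import math


def cosine_loss(x, x_hat, eps=1e-12):
    """1 - cosine similarity: zero when x_hat points in the same direction as x."""
    dot = sum(a * b for a, b in zip(x, x_hat))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_hat = math.sqrt(sum(b * b for b in x_hat))
    # eps guards against division by zero for all-zero vectors
    return 1.0 - dot / max(norm_x * norm_hat, eps)


def squared_euclidean(x, x_hat):
    """Standard reconstruction error: sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    scaled = [2.0, 4.0, 6.0]  # same direction, different magnitude
    print(squared_euclidean(x, scaled))  # large: magnitudes differ
    print(cosine_loss(x, scaled))        # approximately zero: directions agree
```

A reconstruction that is a rescaled copy of the input is penalized by the Euclidean term but not by the cosine term, which is the sense in which the two objectives capture different properties.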
Contrastive-center loss for deep neural networks
The deep convolutional neural network (CNN) has significantly raised the
performance of image classification and face recognition. Softmax is usually
used as supervision, but it only penalizes the classification loss. In this
paper, we propose a novel auxiliary supervision signal called contrastive-center
loss, which can further enhance the discriminative power of the features, for
it learns a class center for each class. The proposed contrastive-center loss
simultaneously considers intra-class compactness and inter-class separability,
by penalizing the contrastive values between: (1) the distances of training
samples to their corresponding class centers, and (2) the sum of the distances
of training samples to their non-corresponding class centers. Experiments on
different datasets demonstrate the effectiveness of contrastive-center loss.
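The two quantities contrasted above can be combined in several ways. The sketch below assumes a ratio of squared Euclidean distances with a small constant `delta` in the denominator to avoid division by zero; the abstract itself does not give the formula, so the exact form, the 1/2 factor, and all names here are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def contrastive_center_loss(features, labels, centers, delta=1.0):
    """Sum over samples of intra-class distance / (inter-class distances + delta)."""
    total = 0.0
    for x, y in zip(features, labels):
        # (1) distance to the sample's own class center
        intra = sq_dist(x, centers[y])
        # (2) sum of distances to all other class centers
        inter = sum(sq_dist(x, c) for k, c in enumerate(centers) if k != y)
        total += intra / (inter + delta)
    return 0.5 * total


if __name__ == "__main__":
    centers = [[0.0, 0.0], [4.0, 0.0]]
    # Sample near center 0: small loss when labeled 0, large when labeled 1.
    print(contrastive_center_loss([[1.0, 0.0]], [0], centers))
    print(contrastive_center_loss([[1.0, 0.0]], [1], centers))
```

Minimizing the ratio pulls a feature toward its own center while pushing it away from the others, which matches the stated goals of intra-class compactness and inter-class separability. In practice the centers would be learned jointly with the network and the term added to the softmax loss.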
Zero-Shot Fine-Grained Classification by Deep Feature Learning with Semantics
Fine-grained image classification, which aims to distinguish images with
subtle distinctions, is a challenging task due to two main issues: lack of
sufficient training data for every class and difficulty in learning
discriminative features for representation. In this paper, to address the two
issues, we propose a two-phase framework for recognizing images from unseen
fine-grained classes, i.e. zero-shot fine-grained classification. In the first
feature learning phase, we finetune deep convolutional neural networks using
hierarchical semantic structure among fine-grained classes to extract
discriminative deep visual features. Meanwhile, a domain adaptation structure
is induced into deep convolutional neural networks to avoid domain shift from
training data to test data. In the second label inference phase, a semantic
directed graph is constructed over attributes of fine-grained classes. Based on
this graph, we develop a label propagation algorithm to infer the labels of
images in the unseen classes. Experimental results on two benchmark datasets
demonstrate that our model outperforms the state-of-the-art zero-shot learning
models. In addition, the features obtained by our feature learning model also
yield significant gains when they are used by other zero-shot learning models,
which shows the flexibility of our model in zero-shot fine-grained
classification.
Comment: This paper has been submitted to IEEE TIP for peer-review
Spectral-Spatial Feature Extraction and Classification by ANN Supervised with Center Loss in Hyperspectral Imagery
In this paper, we propose a spectral-spatial feature extraction and
classification framework based on artificial neural network (ANN) in the
context of hyperspectral imagery. With limited labeled samples, only spectral
information is exploited for training and spatial context is integrated
posteriorly at the testing stage. Taking advantage of recent advances in face
recognition, a joint supervision signal that combines softmax loss and center
loss is adopted to train the proposed network, by which intra-class features
are gathered while inter-class variations are enlarged. Based on the learned
architecture, the extracted spectrum-based features are classified by a center
classifier. Moreover, to fuse the spectral and spatial information, an adaptive
spectral-spatial center classifier is developed, where multiscale neighborhoods
are considered simultaneously, and the final label is determined using an
adaptive voting strategy. Finally, experimental results on three well-known
datasets validate the effectiveness of the proposed methods compared with the
state-of-the-art approaches.
Comment: 17 pages, 10 figures
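The joint supervision and the center classifier described above can be sketched as follows. This assumes the commonly used form of center loss (half the squared distance to the class center, weighted by a factor `lam`) and interprets the center classifier as a nearest-center rule; neither detail is given in the abstract, the multiscale adaptive voting step is not shown, and all names are illustrative.

```python
import math


def softmax_cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, label, centers):
    """Half the squared distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, centers[label]))


def joint_loss(logits, feature, label, centers, lam=0.1):
    """Softmax loss plus weighted center loss (lam is an assumed weight)."""
    return softmax_cross_entropy(logits, label) + lam * center_loss(feature, label, centers)


def nearest_center(feature, centers):
    """Assign the label of the closest class center (assumed center classifier)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, centers[k])),
    )


if __name__ == "__main__":
    centers = [[0.0, 0.0], [3.0, 3.0]]
    print(joint_loss([0.0, 0.0], [1.0, 2.0], 0, centers))
    print(nearest_center([2.5, 3.0], centers))
```

The softmax term separates classes while the center term pulls same-class features together, so features cluster tightly enough around their centers for a distance-based classifier to be usable at test time.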
Fine-grained pose prediction, normalization, and recognition
Pose variation and subtle differences in appearance are key challenges to
fine-grained classification. While deep networks have markedly improved general
recognition, many approaches to fine-grained recognition rely on anchoring
networks to parts for better accuracy. Identifying parts to find correspondence
discounts pose variation so that features can be tuned to appearance. To this
end, previous methods have examined how to find parts and extract
pose-normalized features. These methods have generally separated fine-grained
recognition into stages which first localize parts using hand-engineered and
coarsely-localized proposal features, and then separately learn deep
descriptors centered on inferred part positions. We unify these steps in an
end-to-end trainable network supervised by keypoint locations and class labels
that localizes parts by a fully convolutional network to focus the learning of
feature representations for the fine-grained classification task. Experiments
on the popular CUB200 dataset show that our method is state-of-the-art and
suggest a continuing role for strong supervision.
Deep Discriminative Representation Learning with Attention Map for Scene Classification
Learning powerful discriminative features for remote sensing image scene
classification is a challenging computer vision problem. In the past, most
classification approaches were based on handcrafted features. However, most
recent approaches to remote sensing scene classification are based on
Convolutional Neural Networks (CNNs). The de facto practice when learning these
CNN models is only to use original RGB patches as input with training performed
on large amounts of labeled data (ImageNet). In this paper, we show that class
activation map (CAM) encoded CNN models, codenamed DDRL-AM, trained using
original RGB patches and attention-map-based class information, provide
complementary information to the standard RGB deep models. To the best of our
knowledge, we are the first to investigate attention information encoded CNNs.
Additionally, to enhance the discriminability, we further employ a recently
developed objective function called "center loss," which has proved to be very
useful in face recognition. Finally, our framework provides attention guidance
to the model in an end-to-end fashion. Extensive experiments on two benchmark
datasets show that our approach matches or exceeds the performance of other
methods.