Progressive Ensemble Networks for Zero-Shot Recognition
Despite the advancement of supervised image recognition algorithms, their
dependence on the availability of labeled data and the rapid expansion of image
categories raise the significant challenge of zero-shot learning. Zero-shot
learning (ZSL) aims to transfer knowledge from labeled classes into unlabeled
classes to reduce human labeling effort. In this paper, we propose a novel
progressive ensemble network model with multiple projected label embeddings to
address zero-shot image recognition. The ensemble network is built by learning
multiple image classification functions with a shared feature extraction
network but different label embedding representations, which enhance the
diversity of the classifiers and facilitate information transfer to unlabeled
classes. A progressive training framework is then deployed to gradually label
the most confident images in each unlabeled class with predicted pseudo-labels
and update the ensemble network with the training data augmented by the
pseudo-labels. The proposed model performs training on both labeled and
unlabeled data. It can naturally bridge the domain shift problem in visual
appearances and be extended to the generalized zero-shot learning scenario. We
conduct experiments on multiple ZSL datasets and the empirical results
demonstrate the efficacy of the proposed model. Comment: CVPR1
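
The progressive pseudo-labelling step described in this abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `ensemble_average` and `select_confident`, the plain averaging of ensemble outputs, and the `per_class` budget are all assumptions made for the example.

```python
def ensemble_average(member_probs):
    """Average class probabilities over ensemble members (assumed combination rule).

    member_probs: list over members of [n_samples][n_classes] probability lists.
    """
    n_members = len(member_probs)
    n_samples = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    return [
        [sum(m[i][c] for m in member_probs) / n_members for c in range(n_classes)]
        for i in range(n_samples)
    ]


def select_confident(probs, per_class):
    """Pick the `per_class` most confident unlabeled samples for each predicted class.

    probs: [n_samples][n_classes] probabilities over the unlabeled classes.
    Returns a dict {sample_index: pseudo_label}.
    """
    by_class = {}
    for idx, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # predicted class
        by_class.setdefault(label, []).append((p[label], idx))
    pseudo = {}
    for label, scored in by_class.items():
        scored.sort(reverse=True)  # highest confidence first
        for _, idx in scored[:per_class]:
            pseudo[idx] = label
    return pseudo
```

In a full training loop, the returned pseudo-labels would be added to the training set and the ensemble retrained, with the budget typically growing from round to round.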
Peer Collaborative Learning for Online Knowledge Distillation
Traditional knowledge distillation uses a two-stage training strategy to
transfer knowledge from a high-capacity teacher model to a compact student
model, which relies heavily on the pre-trained teacher. Recent online knowledge
distillation alleviates this limitation by collaborative learning, mutual
learning and online ensembling, following a one-stage end-to-end training
fashion. However, collaborative learning and mutual learning fail to construct
an online high-capacity teacher, whilst online ensembling ignores the
collaboration among branches and its logit summation impedes the further
optimisation of the ensemble teacher. In this work, we propose a novel Peer
Collaborative Learning method for online knowledge distillation, which
integrates online ensembling and network collaboration into a unified
framework. Specifically, given a target network, we construct a multi-branch
network for training, in which each branch is called a peer. We perform random
augmentation multiple times on the inputs to peers and assemble feature
representations output by the peers with an additional classifier as the peer
ensemble teacher. This helps to transfer knowledge from a high-capacity teacher
to peers, and in turn further optimises the ensemble teacher. Meanwhile, we
employ the temporal mean model of each peer as the peer mean teacher to
collaboratively transfer knowledge among peers, which helps each peer to learn
richer knowledge and facilitates the optimisation of a more stable model with better
generalisation. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet show
that the proposed method significantly improves the generalisation of various
backbone networks and outperforms the state-of-the-art methods.
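
The "temporal mean model" of a peer is commonly realised as an exponential moving average of its weights. The sketch below assumes that form; the dictionary-of-floats weight representation, the function name `update_mean_teacher`, and the `decay` value are illustrative assumptions rather than details taken from the paper.

```python
def update_mean_teacher(teacher, student, decay=0.999):
    """Update the mean-teacher weights in place with an exponential moving average.

    teacher, student: dicts mapping parameter names to float values
    (a stand-in for real tensors). `decay` is an assumed hyperparameter.
    """
    for name, weight in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * weight
    return teacher
```

Calling this once per training step keeps the teacher a smoothed copy of the peer, which is what makes its predictions more stable than those of the peer itself.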
DeepCluE: Enhanced Image Clustering via Multi-layer Ensembles in Deep Neural Networks
Deep clustering has recently emerged as a promising technique for complex
data clustering. Despite the considerable progress, previous deep clustering
works mostly build or learn the final clustering by only utilizing a single
layer of representation, e.g., by performing the K-means clustering on the last
fully-connected layer or by associating some clustering loss to a specific
layer, which neglects the possibility of jointly leveraging multi-layer
representations for enhancing the deep clustering performance. In view of this,
this paper presents a Deep Clustering via Ensembles (DeepCluE) approach, which
bridges the gap between deep clustering and ensemble clustering by harnessing
the power of multiple layers in deep neural networks. In particular, we utilize
a weight-sharing convolutional neural network as the backbone, which is trained
with both the instance-level contrastive learning (via an instance projector)
and the cluster-level contrastive learning (via a cluster projector) in an
unsupervised manner. Thereafter, multiple layers of feature representations are
extracted from the trained network, upon which the ensemble clustering process
is further conducted. Specifically, a set of diversified base clusterings is
generated from the multi-layer representations via a highly efficient
clusterer. Then the reliability of clusters in multiple base clusterings is
automatically estimated by exploiting an entropy-based criterion, based on
which the set of base clusterings is re-formulated into a weighted-cluster
bipartite graph. By partitioning this bipartite graph via transfer cut, the
final consensus clustering can be obtained. Experimental results on six image
datasets confirm the advantages of DeepCluE over the state-of-the-art deep
clustering approaches. Comment: To appear in IEEE Transactions on Emerging Topics in Computational
Intelligence
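
One way an entropy-based reliability score for a cluster could be computed is sketched below. The abstract does not give the formula, so everything here is an assumption: the entropy is taken over how a cluster's members are split by the clusters of each base clustering, and the mapping to a weight via `exp(-H / (theta * M))` with a hypothetical parameter `theta` is one plausible choice, not necessarily the one used in DeepCluE.

```python
import math


def cluster_entropy(cluster, clusterings):
    """Uncertainty of `cluster` with respect to a list of base clusterings.

    cluster: set of sample indices.
    clusterings: list of base clusterings, each a list of sets of indices.
    A cluster that is split across many clusters elsewhere gets high entropy.
    """
    total = 0.0
    for partition in clusterings:
        for other in partition:
            p = len(cluster & other) / len(cluster)
            if p > 0:
                total -= p * math.log2(p)
    return total


def cluster_reliability(cluster, clusterings, theta=0.5):
    """Map entropy to a weight in (0, 1]; `theta` is a hypothetical parameter."""
    h = cluster_entropy(cluster, clusterings)
    return math.exp(-h / (theta * len(clusterings)))
```

Clusters that the other base clusterings keep intact receive a weight of 1, and the weight decays as the cluster is fragmented; these weights could then label the edges of the cluster-sample bipartite graph.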
Unsupervised Learning of Visual Representations using Videos
Is strong supervision necessary for learning a good visual representation? Do
we really need millions of semantically-labeled images to train a Convolutional
Neural Network (CNN)? In this paper, we present a simple yet surprisingly
powerful approach for unsupervised learning of CNNs. Specifically, we use
hundreds of thousands of unlabeled videos from the web to learn visual
representations. Our key idea is that visual tracking provides the supervision.
That is, two patches connected by a track should have similar visual
representation in deep feature space since they probably belong to the same
object or object part. We design a Siamese-triplet network with a ranking loss
function to train this CNN representation. Without using a single image from
ImageNet, just using 100K unlabeled videos and the VOC 2012 dataset, we train
an ensemble of unsupervised networks that achieves 52% mAP (no bounding box
regression). This performance comes tantalizingly close to its
ImageNet-supervised counterpart, an ensemble which achieves a mAP of 54.4%. We
also show that our unsupervised network can perform competitively in other
tasks such as surface-normal estimation.
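
A ranking loss over triplets of patches, as mentioned in this abstract, might look like the sketch below. The use of cosine distance and the `margin` value are assumptions for illustration; the abstract only states that tracked patches should be close in feature space.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_ranking_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss that is zero once the tracked patch (positive) is closer to
    the anchor than the unrelated patch (negative) by at least `margin`."""
    return max(
        0.0,
        cosine_distance(anchor, positive)
        - cosine_distance(anchor, negative)
        + margin,
    )
```

Here `anchor` and `positive` would be features of the first and last patch of a track, and `negative` the feature of a patch drawn from elsewhere.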
Cross-stitch Networks for Multi-task Learning
Multi-task learning in Convolutional Networks has displayed remarkable
success in the field of recognition. This success can be largely attributed to
learning shared representations from multiple supervisory tasks. However,
existing multi-task approaches rely on enumerating multiple network
architectures specific to the tasks at hand that do not generalize. In this
paper, we propose a principled approach to learn shared representations in
ConvNets using multi-task learning. Specifically, we propose a new sharing
unit: the "cross-stitch" unit. These units combine the activations from multiple
networks and can be trained end-to-end. A network with cross-stitch units can
learn an optimal combination of shared and task-specific representations. Our
proposed method generalizes across multiple tasks and shows dramatically
improved performance over baseline methods for categories with few training
examples. Comment: To appear in CVPR 2016 (Spotlight)
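
A cross-stitch unit for two task networks can be sketched as a learned linear mixing of their activations. The 2x2 mixing matrix `alpha` and the flat-list activation format below are assumptions made for the example; in practice `alpha` would be a trainable parameter updated by backpropagation.

```python
def cross_stitch(x_a, x_b, alpha):
    """Mix the activations of two task networks.

    x_a, x_b: equal-length lists of activations from tasks A and B.
    alpha: 2x2 list of mixing weights (assumed learnable in a real model).
    Returns the new activations passed on to task A and task B.
    """
    out_a = [alpha[0][0] * a + alpha[0][1] * b for a, b in zip(x_a, x_b)]
    out_b = [alpha[1][0] * a + alpha[1][1] * b for a, b in zip(x_a, x_b)]
    return out_a, out_b
```

With `alpha` equal to the identity matrix the two networks stay fully task-specific; equal weights in every entry give fully shared representations, and training can settle anywhere in between.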
An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy
Etsy is a global marketplace where people across the world connect to make,
buy and sell unique goods. Sellers at Etsy can promote their product listings
via advertising campaigns similar to traditional sponsored search ads.
Click-Through Rate (CTR) prediction is an integral part of online search
advertising systems where it is utilized as an input to auctions which
determine the final ranking of promoted listings to a particular user for each
query. In this paper, we provide a holistic view of Etsy's promoted listings'
CTR prediction system and propose an ensemble learning approach which is based
on historical or behavioral signals for older listings as well as content-based
features for new listings. We obtain representations from texts and images by
utilizing state-of-the-art deep learning techniques and employ multimodal
learning to combine these different signals. We compare the system to
non-trivial baselines on a large-scale real world dataset from Etsy,
demonstrating the effectiveness of the model and strong correlations between
offline experiments and online performance. The paper is also the first
technical overview of this kind of product in an e-commerce context.
Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
This paper presents a new state-of-the-art for document image classification
and retrieval, using features learned by deep convolutional neural networks
(CNNs). In object and scene analysis, deep neural nets are capable of learning
a hierarchical chain of abstraction from pixel inputs to concise and
descriptive representations. The current work explores this capacity in the
realm of document analysis, and confirms that this representation strategy is
superior to a variety of popular hand-crafted alternatives. Experiments also
show that (i) features extracted from CNNs are robust to compression, (ii) CNNs
trained on non-document images transfer well to document analysis tasks, and
(iii) enforcing region-specific feature-learning is unnecessary given
sufficient training data. This work also makes available a new labelled subset
of the IIT-CDIP collection, containing 400,000 document images across 16
categories, useful for training new CNNs for document analysis.