Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination
Neural net classifiers trained on data with annotated class labels can also
capture apparent visual similarity among categories without being directed to
do so. We study whether this observation can be extended beyond the
conventional domain of supervised learning: Can we learn a good feature
representation that captures apparent similarity among instances, instead of
classes, by merely asking the feature to be discriminative of individual
instances? We formulate this intuition as a non-parametric classification
problem at the instance-level, and use noise-contrastive estimation to tackle
the computational challenges imposed by the large number of instance classes.
Our experimental results demonstrate that, under unsupervised learning
settings, our method surpasses the state-of-the-art on ImageNet classification
by a large margin. Our method is also remarkable for consistently improving
test performance with more training data and better network architectures. By
fine-tuning the learned feature, we further obtain competitive results for
semi-supervised learning and object detection tasks. Our non-parametric model
is highly compact: With 128 features per image, our method requires only 600MB
storage for a million images, enabling fast nearest neighbour retrieval at run time.
Comment: CVPR 2018 spotlight paper. Code: https://github.com/zhirongw/lemniscate.pytorc
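To make the instance-level, non-parametric formulation above concrete, here is a rough PyTorch-style sketch of instance discrimination with a memory bank and a softmax/NCE-style approximation; the sizes, temperature, momentum and function names are illustrative assumptions, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn.functional as F

# Sketch: every image is its own class; features are scored against a memory
# bank of per-instance embeddings rather than a parametric classifier.
N, D, TAU = 10000, 128, 0.07                            # illustrative sizes
memory_bank = F.normalize(torch.randn(N, D), dim=1)     # one 128-d vector per image

def instance_discrimination_loss(features, indices, num_noise=4096):
    """NCE-flavoured loss: pull each feature toward its own memory slot,
    push it away from randomly drawn 'noise' instances."""
    features = F.normalize(features, dim=1)                             # (B, D)
    positives = memory_bank[indices]                                    # (B, D)
    noise_idx = torch.randint(0, N, (features.size(0), num_noise))
    negatives = memory_bank[noise_idx]                                  # (B, K, D)
    pos_logit = (features * positives).sum(1, keepdim=True) / TAU       # (B, 1)
    neg_logit = torch.einsum('bd,bkd->bk', features, negatives) / TAU   # (B, K)
    logits = torch.cat([pos_logit, neg_logit], dim=1)
    # (K+1)-way classification with the positive at index 0: a common softmax
    # approximation of the noise-contrastive objective.
    targets = torch.zeros(features.size(0), dtype=torch.long)
    return F.cross_entropy(logits, targets)

def update_memory(features, indices, momentum=0.5):
    """Exponential-moving-average update of the visited memory slots."""
    features = F.normalize(features.detach(), dim=1)
    memory_bank[indices] = F.normalize(
        momentum * memory_bank[indices] + (1 - momentum) * features, dim=1)
```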
Unsupervised Semantic-based Aggregation of Deep Convolutional Features
In this paper, we propose a simple but effective semantic-based aggregation
(SBA) method. The proposed SBA utilizes the discriminative filters of deep
convolutional layers as semantic detectors. Moreover, we propose an effective
unsupervised strategy to select semantic detectors to generate the
"probabilistic proposals", which highlight discriminative patterns of
objects and suppress background noise. The final global SBA
representation can then be acquired by aggregating the regional
representations weighted by the selected "probabilistic proposals"
corresponding to various semantic content. Our unsupervised SBA is easy to
generalize and achieves excellent performance on various tasks. We conduct
comprehensive experiments and show that our unsupervised SBA outperforms the
state-of-the-art unsupervised and supervised aggregation methods on image
retrieval, place recognition and cloud classification.
Comment: 10 pages. arXiv admin note: text overlap with arXiv:1705.0124
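As a rough illustration of how selected "semantic detector" channels can yield probabilistic proposals that weight the aggregation, the hypothetical sketch below softmax-normalizes a detector's activation map and uses it to sum-pool a regional descriptor; the channel indices, dimensions and names are assumptions, not the paper's exact pipeline.

```python
import torch
import torch.nn.functional as F

def sba_descriptor(feature_map, detector_channels):
    """feature_map: (C, H, W) activations from a deep convolutional layer.
    detector_channels: channel indices chosen (unsupervised) as semantic detectors."""
    C, H, W = feature_map.shape
    descriptors = []
    for ch in detector_channels:
        # "Probabilistic proposal": softmax over the spatial locations of one
        # detector, highlighting a discriminative pattern, suppressing background.
        proposal = F.softmax(feature_map[ch].reshape(-1), dim=0).reshape(H, W)
        # Regional representation: sum-pool all channels weighted by the proposal.
        weighted = (feature_map * proposal.unsqueeze(0)).sum(dim=(1, 2))   # (C,)
        descriptors.append(F.normalize(weighted, dim=0))
    # Global SBA-style representation: concatenation of per-proposal descriptors.
    return F.normalize(torch.cat(descriptors), dim=0)

# Example with a 512-channel map and three arbitrarily chosen detector channels.
desc = sba_descriptor(torch.rand(512, 14, 14), detector_channels=[7, 120, 431])
```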
Prototypical Contrastive Learning of Unsupervised Representations
This paper presents Prototypical Contrastive Learning (PCL), an unsupervised
representation learning method that addresses the fundamental limitations of
instance-wise contrastive learning. PCL not only learns low-level features for
the task of instance discrimination, but more importantly, it implicitly
encodes semantic structures of the data into the learned embedding space.
Specifically, we introduce prototypes as latent variables to help find the
maximum-likelihood estimation of the network parameters in an
Expectation-Maximization framework. We iteratively perform an E-step, which finds
the distribution of prototypes via clustering, and an M-step, which optimizes the
network via contrastive learning. We propose the ProtoNCE loss, a generalized
version of the InfoNCE loss for contrastive learning, which encourages
representations to be closer to their assigned prototypes. PCL outperforms
state-of-the-art instance-wise contrastive learning methods on multiple
benchmarks with substantial improvement in low-resource transfer learning. Code
and pretrained models are available at https://github.com/salesforce/PCL
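For intuition, a minimal sketch of a ProtoNCE-style objective is given below: a standard InfoNCE term over instance pairs plus a prototype term that pulls each embedding toward its assigned cluster prototype, scaled by a per-cluster concentration. The shapes, names and the way the concentration enters are simplifying assumptions; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn.functional as F

def info_nce(query, key, queue, tau=0.07):
    """Standard InfoNCE: query vs. its positive key and a queue of negatives."""
    pos = (query * key).sum(dim=1, keepdim=True) / tau       # (B, 1)
    neg = query @ queue.t() / tau                             # (B, K)
    logits = torch.cat([pos, neg], dim=1)
    targets = torch.zeros(query.size(0), dtype=torch.long)
    return F.cross_entropy(logits, targets)

def proto_term(query, prototypes, assignments, phi):
    """Prototype term: classify each embedding among the cluster prototypes,
    with a per-cluster concentration phi scaling the similarities."""
    logits = (query @ prototypes.t()) / phi                   # (B, M)
    return F.cross_entropy(logits, assignments)

B, D, K, M = 32, 128, 1024, 100
q    = F.normalize(torch.randn(B, D), dim=1)   # encoder output
key  = F.normalize(torch.randn(B, D), dim=1)   # momentum-encoder output
que  = F.normalize(torch.randn(K, D), dim=1)   # queue of negatives
pro  = F.normalize(torch.randn(M, D), dim=1)   # prototypes from the E-step (clustering)
asg  = torch.randint(0, M, (B,))               # cluster assignments from the E-step
phi  = torch.full((M,), 0.1)                   # per-cluster concentration

loss = info_nce(q, key, que) + proto_term(q, pro, asg, phi)   # M-step objective
```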
Transfer Adaptation Learning: A Decade Survey
The world we see is ever-changing; it changes with people, things, and the
environment. A domain refers to the state of the world at a
certain moment. A research problem is characterized as transfer adaptation
learning (TAL) when it needs knowledge correspondence between different
moments/domains. Conventional machine learning aims to find a model with the
minimum expected risk on test data by minimizing the regularized empirical risk
on the training data, which, however, supposes that the training and test data
share a similar joint probability distribution. TAL aims to build models that
can perform tasks in a target domain by learning knowledge from a semantically
related but distributionally different source domain. It is an active research
field of increasing influence and importance, with a rapidly growing
publication trend. This paper surveys the advances in TAL methodologies over
the past decade and discusses the technical challenges and essential problems
of TAL with deep insights and new perspectives. Broad categories of transfer
adaptation learning solutions created by researchers are identified, i.e.,
instance re-weighting adaptation, feature adaptation, classifier adaptation,
deep network adaptation and adversarial adaptation, which go beyond the early
semi-supervised and unsupervised split. The survey helps researchers rapidly
but comprehensively understand and identify the research foundations, current
status, theoretical limitations, future challenges and under-studied issues
(universality, interpretability, and credibility) that must be addressed to
move the field toward universal representation and safe applications in
open-world scenarios.
Comment: 26 pages, 4 figures
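For reference, the conventional setting mentioned above can be written as minimizing a regularized empirical risk as a proxy for the expected risk, under the assumption that training and test data share the same joint distribution; TAL is the setting where that assumption breaks down. The notation below is generic, not taken from the survey.

```latex
% Expected risk on test data vs. regularized empirical risk on training data.
% Conventional learning assumes the same joint distribution P(x, y) for both;
% transfer adaptation learning is the setting where this assumption fails.
R(f) = \mathbb{E}_{(x,y)\sim P}\big[\ell(f(x), y)\big], \qquad
\hat{R}_{\lambda}(f) = \frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i), y_i) + \lambda\,\Omega(f)
```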
Local Aggregation for Unsupervised Learning of Visual Embeddings
Unsupervised approaches to learning in neural networks are of substantial
interest for furthering artificial intelligence, both because they would enable
the training of networks without the need for large numbers of expensive
annotations, and because they would be better models of the kind of
general-purpose learning deployed by humans. However, unsupervised networks
have long lagged behind the performance of their supervised counterparts,
especially in the domain of large-scale visual recognition. Recent developments
in training deep convolutional embeddings to maximize non-parametric instance
separation and clustering objectives have shown promise in closing this gap.
Here, we describe a method that trains an embedding function to maximize a
metric of local aggregation, causing similar data instances to move together in
the embedding space, while allowing dissimilar instances to separate. This
aggregation metric is dynamic, allowing soft clusters of different scales to
emerge. We evaluate our procedure on several large-scale visual recognition
datasets, achieving state-of-the-art unsupervised transfer learning performance
on object recognition in ImageNet, scene recognition in Places 205, and object
detection in PASCAL VOC.
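A loose sketch of a local-aggregation-style objective is given below: each embedded instance is pulled toward a set of "close" neighbours and contrasted against a larger set of "background" neighbours drawn from a memory of embeddings. How the neighbour sets are chosen, and the temperature, are assumptions rather than the paper's exact recipe.

```python
import torch

def local_aggregation_loss(embeddings, close_idx, background_idx, memory, tau=0.07):
    """embeddings: (B, D) L2-normalised batch features.
    memory: (N, D) L2-normalised embeddings of all instances.
    close_idx / background_idx: per-sample index tensors into the memory."""
    losses = []
    for i in range(embeddings.size(0)):
        sims = embeddings[i] @ memory.t() / tau                  # (N,) similarities
        close = torch.logsumexp(sims[close_idx[i]], dim=0)       # "close" neighbours
        background = torch.logsumexp(sims[background_idx[i]], dim=0)
        # Negative log-ratio of P(close | v) to P(background | v): similar
        # instances aggregate locally while dissimilar ones are free to separate.
        losses.append(background - close)
    return torch.stack(losses).mean()
```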
Improving Generalization via Scalable Neighborhood Component Analysis
Current major approaches to visual recognition follow an end-to-end
formulation that classifies an input image into one of a pre-determined set
of semantic categories. Parametric softmax classifiers are a common choice for
such a closed world with fixed categories, especially when large labeled
datasets are available during training. However, this becomes problematic for open-set
scenarios where new categories are encountered with very few examples for
learning a generalizable parametric classifier. We adopt a non-parametric
approach for visual recognition by optimizing feature embeddings instead of
parametric classifiers. We use a deep neural network to learn the visual
feature that preserves the neighborhood structure in the semantic space, based
on the Neighborhood Component Analysis (NCA) criterion. To overcome its
computational bottlenecks, we devise a mechanism that uses augmented memory to
scale NCA to large datasets and very deep networks. Our experiments deliver
not only remarkable performance on ImageNet classification for such a simple
non-parametric method, but, most importantly, a more generalizable feature
representation for sub-category discovery and few-shot recognition.
Comment: To appear in ECCV 201
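As a minimal sketch, an NCA-style criterion computed against an augmented memory of training embeddings can be written as below; the temperature and names are illustrative assumptions, and in practice the sample's own memory slot would be excluded from the softmax.

```python
import torch
import torch.nn.functional as F

def nca_loss(features, labels, memory_feats, memory_labels, tau=0.05):
    """Maximise the probability that each sample picks a same-class neighbour
    from the augmented memory under a softmax over similarities."""
    features = F.normalize(features, dim=1)
    sims = features @ memory_feats.t() / tau                          # (B, N)
    probs = F.softmax(sims, dim=1)
    same_class = (labels.unsqueeze(1) == memory_labels.unsqueeze(0)).float()
    p_correct = (probs * same_class).sum(dim=1).clamp_min(1e-12)
    return -p_correct.log().mean()
```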
Learning Spatiotemporal Features via Video and Text Pair Discrimination
Current video representations heavily rely on learning from manually
annotated video datasets, which are time-consuming and expensive to acquire. We
observe that videos are naturally accompanied by abundant text information such as
YouTube titles and Instagram captions. In this paper, we leverage this
visual-textual connection to learn spatiotemporal features in an efficient
weakly-supervised manner. We present a general cross-modal pair discrimination
(CPD) framework to capture this correlation between a video and its associated
text. Specifically, we adopt noise-contrastive estimation to tackle the
computational issue imposed by the huge amount of pair instance classes and
design a practical curriculum learning strategy. We train our CPD models on
both a standard video dataset (Kinetics-210k) and an uncurated web video dataset
(Instagram-300k) to demonstrate their effectiveness. Without further fine-tuning,
the learnt models obtain competitive results for action classification on
Kinetics under the linear classification protocol. Moreover, our visual model
provides an effective initialization to fine-tune on downstream tasks, which
yields a remarkable performance gain for action recognition on UCF101 and
HMDB51, compared with the existing state-of-the-art self-supervised training
methods. In addition, our CPD model yields a new state of the art for zero-shot
action recognition on UCF101 by directly utilizing the learnt visual-textual
embeddings. The code will be made available at
https://github.com/MCG-NJU/CPD-Video.
Comment: Technical Report
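A hypothetical sketch of the cross-modal pair discrimination idea is shown below: a clip and its accompanying text form a positive pair and the other texts in the batch serve as noise, scored with a symmetric softmax/NCE-style loss. The encoders, dimensions and temperature are placeholders, not the released CPD model.

```python
import torch
import torch.nn.functional as F

def cpd_loss(video_emb, text_emb, tau=0.07):
    """Symmetric contrastive loss over video-text pairs in a batch: the matching
    text is the positive, every other text in the batch acts as noise."""
    v = F.normalize(video_emb, dim=1)          # (B, D) from a video backbone
    t = F.normalize(text_emb, dim=1)           # (B, D) from a text encoder
    logits = v @ t.t() / tau                   # (B, B) pairwise similarities
    targets = torch.arange(v.size(0))
    # Video-to-text and text-to-video directions, averaged.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```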
Unsupervised Visual Domain Adaptation: A Deep Max-Margin Gaussian Process Approach
In unsupervised domain adaptation, it is widely known that the target domain
error can be provably reduced by having a shared input representation that
makes the source and target domains indistinguishable from each other. Very
recently it has been shown that not only matching the marginal input
distributions but also aligning the output (class) distributions is
critical. The latter can be achieved by minimizing the maximum discrepancy of
predictors (classifiers). In this paper, we adopt this principle, but propose a
more systematic and effective way to achieve hypothesis consistency via
Gaussian processes (GP). The GP allows us to define/induce a hypothesis space
of the classifiers from the posterior distribution of the latent random
functions, turning the learning into a simple large-margin posterior separation
problem, far easier to solve than previous approaches based on adversarial
minimax optimization. We formulate a learning objective that effectively pushes
the posterior to minimize the maximum discrepancy. This is further shown to be
equivalent to maximizing margins and minimizing uncertainty of the class
predictions in the target domain, a well-established principle in classical
(semi-)supervised learning. Empirical results demonstrate that our approach is
comparable or superior to the existing methods on several benchmark domain
adaptation datasets.
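To illustrate the hypothesis-consistency idea in code, the sketch below draws several classifier heads from an approximate posterior (a simple ensemble stands in for the paper's Gaussian-process posterior) and penalizes their maximum pairwise disagreement on unlabelled target data, to be added to the usual supervised source loss. Everything here is an illustrative assumption, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def consistency_penalty(classifier_heads, target_features):
    """classifier_heads: classifier modules sampled from an (approximate)
    posterior; an ensemble is used here as a stand-in for a GP posterior.
    Returns the maximum pairwise L1 discrepancy of their predictions on
    unlabelled target features (to be added to the supervised source loss)."""
    probs = [F.softmax(h(target_features), dim=1) for h in classifier_heads]
    max_disc = torch.tensor(0.0)
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            disc = (probs[i] - probs[j]).abs().mean()
            max_disc = torch.maximum(max_disc, disc)
    return max_disc
```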
Deep Learning with Nonparametric Clustering
Clustering is an essential problem in machine learning and data mining. One
vital factor that impacts clustering performance is how to learn or design the
data representation (or features). Fortunately, recent advances in deep
learning can learn unsupervised features effectively and have yielded
state-of-the-art performance in many classification problems, such as character
recognition, object recognition and document categorization. However, little
attention has been paid to the potential of deep learning for unsupervised
clustering problems. In this paper, we propose a deep belief network with
nonparametric clustering. As an unsupervised method, our model first leverages
the advantages of deep learning for feature representation and dimension
reduction. Then, it performs nonparametric clustering under a maximum-margin
framework -- a discriminative clustering model that can be trained online
efficiently in the code space. Lastly, the model parameters are refined in the
deep belief network. Thus, this model can learn features for clustering and
infer model complexity in a unified framework. The experimental results show
the advantage of our approach over competitive baselines.
Comment: 14 pages, 6 figures
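The alternation the abstract describes can be sketched roughly as follows, with DP-means standing in for the nonparametric clustering step and a generic unsupervised encoder standing in for the deep belief network; all components and the threshold lam are placeholders, not the paper's model.

```python
import torch

def dp_means(codes, lam):
    """Nonparametric clustering of code-space vectors: a new cluster is spawned
    whenever a point lies farther than lam from every existing centre, so the
    number of clusters (model complexity) is inferred from the data."""
    centres, assignments = [codes[0]], []
    for z in codes:
        dists = torch.stack([torch.dist(z, c) for c in centres])
        if dists.min() > lam:
            centres.append(z)
            assignments.append(len(centres) - 1)
        else:
            assignments.append(int(dists.argmin()))
    return torch.stack(centres), torch.tensor(assignments)

# Pipeline sketched by the abstract: (1) learn low-dimensional codes with an
# unsupervised deep model, (2) cluster the codes nonparametrically, (3) refine
# the deep model with a clustering-aware objective, and repeat.
codes = torch.randn(200, 32)                 # placeholder codes from step (1)
centres, assignments = dp_means(codes, lam=6.0)
```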
Local Label Propagation for Large-Scale Semi-Supervised Learning
A significant issue in training deep neural networks to solve supervised
learning tasks is the need for large numbers of labelled datapoints. The goal
of semi-supervised learning is to leverage ubiquitous unlabelled data, together
with small quantities of labelled data, to achieve high task performance.
Though substantial recent progress has been made in developing semi-supervised
algorithms that are effective for comparatively small datasets, many of these
techniques do not scale readily to the large (unlabelled) datasets
characteristic of real-world applications. In this paper we introduce a novel
approach to scalable semi-supervised learning, called Local Label Propagation
(LLP). Extending ideas from recent work on unsupervised embedding learning, LLP
first embeds datapoints, labelled and otherwise, in a common latent space using
a deep neural network. It then propagates pseudolabels from known to unknown
datapoints in a manner that depends on the local geometry of the embedding,
taking into account both inter-point distance and local data density as a
weighting on propagation likelihood. The parameters of the deep embedding are
then trained to simultaneously maximize pseudolabel categorization performance
as well as a metric of the clustering of datapoints within each pseudo-label
group, iteratively alternating stages of network training and label
propagation. We illustrate the utility of the LLP method on the ImageNet
dataset, achieving results that outperform previous state-of-the-art scalable
semi-supervised learning algorithms by large margins, consistently across a
wide variety of training regimes. We also show that the feature representation
learned with LLP transfers well to scene recognition in the Places 205 dataset.
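A rough sketch of geometry-aware pseudolabel propagation is given below: each unlabelled point receives a soft label from its nearest labelled neighbours, with votes weighted by embedding similarity and by a crude local-density proxy for the voter. The kernel, the density estimate, k and tau are assumptions, not the exact LLP procedure.

```python
import torch
import torch.nn.functional as F

def propagate_pseudolabels(unlab_emb, lab_emb, lab_onehot, k=50, tau=0.1):
    """unlab_emb: (U, D) unlabelled embeddings, lab_emb: (L, D) labelled
    embeddings, lab_onehot: (L, C) one-hot labels; requires k <= L."""
    u = F.normalize(unlab_emb, dim=1)
    lab = F.normalize(lab_emb, dim=1)
    # Crude local-density proxy for each labelled point: its mean similarity to
    # the other labelled points (denser neighbourhoods give larger values).
    density = (lab @ lab.t()).mean(dim=1)                   # (L,)
    sims = u @ lab.t()                                      # (U, L)
    topk_sims, topk_idx = sims.topk(k, dim=1)               # nearest labelled points
    # Vote weight combines embedding distance and the voter's local density.
    weights = F.softmax(topk_sims / tau, dim=1) * density[topk_idx]
    weights = weights / weights.sum(dim=1, keepdim=True)
    return torch.einsum('uk,ukc->uc', weights, lab_onehot[topk_idx])   # (U, C)
```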