36,278 research outputs found
Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data
for training powerful models when only limited labeled data is available. While
existing SSL methods assume that the labeled and unlabeled data share the same
set of classes, we address a more complex, novel scenario named open-set SSL,
where out-of-distribution (OOD) samples are contained in the unlabeled data.
Instead of training an OOD detector and an SSL model separately, we propose a
multi-task curriculum learning framework. First, to detect the OOD samples in
the unlabeled data, we estimate the probability of each sample being OOD, using
a joint optimization framework that updates the network parameters and the OOD
scores alternately. Simultaneously, to achieve high classification performance
on in-distribution (ID) data, we select the unlabeled samples with small OOD
scores and use them, together with the labeled data, to train deep neural
networks to classify ID samples in a semi-supervised manner. We conduct several
experiments, and our method achieves state-of-the-art results by successfully
eliminating the effect of OOD samples.

Comment: ECCV 2020
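As a rough illustration of the alternating scheme this abstract describes, here is a minimal PyTorch sketch (hypothetical names, and a simplified confidence-based OOD re-estimation as a stand-in for the paper's joint optimization; not the authors' implementation):

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the alternating joint optimization described above.
# Each unlabeled sample carries an OOD score s_i in [0, 1]; we alternate
# between a network update that down-weights likely-OOD samples and a
# re-estimation of the scores.

def update_network(model, optimizer, x_unlabeled, ood_score):
    logits = model(x_unlabeled)
    pseudo = logits.argmax(dim=1)                    # self-labels (SSL step)
    per_sample = F.cross_entropy(logits, pseudo, reduction="none")
    loss = ((1.0 - ood_score) * per_sample).mean()   # ID samples dominate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def update_ood_scores(model, x_unlabeled):
    with torch.no_grad():
        probs = F.softmax(model(x_unlabeled), dim=1)
        confidence, _ = probs.max(dim=1)
    return 1.0 - confidence                          # low confidence -> OOD
```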
Recent Advances in Zero-shot Recognition
With the recent renaissance of deep convolutional neural networks, encouraging
breakthroughs have been achieved on supervised recognition tasks, where each
class has sufficient and fully annotated training data. However, scaling
recognition to a large number of classes with few or no training samples per
class remains an unsolved problem. One approach to scaling up recognition is to
develop models capable of recognizing unseen categories without any training
instances, i.e., zero-shot recognition/learning. This article provides a
comprehensive review of existing zero-shot recognition techniques, covering
various aspects ranging from model representations to datasets and evaluation
settings. We also overview related recognition tasks, including one-shot and
open-set recognition, which can serve as natural extensions of zero-shot
recognition when a limited number of class samples becomes available or when
zero-shot recognition is deployed in a real-world setting. Importantly, we
highlight the limitations of existing approaches and point out future research
directions in this new research area.

Comment: accepted by IEEE Signal Processing Magazine
Unsupervised Person Re-identification: Clustering and Fine-tuning
The superiority of deeply learned pedestrian representations has been
reported in the very recent literature on person re-identification (re-ID). In this
paper, we consider the more pragmatic issue of learning a deep feature with no
or only a few labels. We propose a progressive unsupervised learning (PUL)
method to transfer pretrained deep representations to unseen domains. Our
method is easy to implement and can be viewed as an effective baseline for
unsupervised re-ID feature learning. Specifically, PUL iterates between 1)
pedestrian clustering and 2) fine-tuning of the convolutional neural network
(CNN) to improve the original model trained on the irrelevant labeled dataset.
Since the clustering results can be very noisy, we add a selection operation
between the clustering and fine-tuning steps. At the beginning, when the model
is weak, the CNN is fine-tuned on a small number of reliable examples that lie
close to cluster centroids in the feature space. As the model becomes stronger
in subsequent iterations, more images are adaptively selected as CNN training
samples. In this way, pedestrian clustering and the CNN model are progressively
improved until convergence. This process is naturally formulated as self-paced
learning. We then point out promising directions that may lead to further
improvement. Extensive experiments on three large-scale re-ID datasets
demonstrate that PUL outputs discriminative features that improve re-ID
accuracy.

Comment: Added more results, parameter analysis, and comparisons
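A minimal sketch of one PUL iteration, assuming scikit-learn and precomputed CNN embeddings (helper names are ours, not the authors'):

```python
import numpy as np
from sklearn.cluster import KMeans

# One PUL iteration as described above: cluster CNN features, keep only
# samples that lie near their centroid, then fine-tune the CNN on those
# pseudo-labels. keep_ratio grows across iterations (self-paced learning).

def pul_iteration(features, n_clusters, keep_ratio):
    """features: (N, D) array of CNN embeddings of unlabeled images."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    labels = km.labels_
    # Distance of each sample to its assigned centroid.
    dists = np.linalg.norm(features - km.cluster_centers_[labels], axis=1)
    # Self-paced selection: keep the closest keep_ratio fraction.
    threshold = np.quantile(dists, keep_ratio)
    selected = dists <= threshold
    return selected, labels   # fine-tune the CNN on the selected subset
```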
Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval
Cross-modal retrieval between visual data and natural language description
remains a long-standing challenge in multimedia. While recent image-text
retrieval methods offer great promise by learning deep representations aligned
across modalities, most of these methods are plagued by the issue of training
with small-scale datasets covering a limited number of images with ground-truth
sentences. Moreover, it is extremely expensive to create a larger dataset by
annotating millions of images with sentences, and doing so may still lead to a
biased model. Inspired by the recent success of webly supervised learning in
deep neural networks, we capitalize on readily available web images with noisy
annotations to learn a robust joint image-text representation. Specifically,
our main idea is to leverage web images and their corresponding tags, along
with fully annotated datasets, to learn the visual-semantic joint embedding. We
propose a two-stage approach for the task that can augment a typical supervised
pair-wise ranking loss based formulation with weakly-annotated web images to
learn a more robust visual-semantic embedding. Experiments on two standard
benchmark datasets demonstrate that our method achieves a significant
performance gain in image-text retrieval compared to state-of-the-art
approaches.

Comment: ACM Multimedia 2018
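The pair-wise ranking loss the abstract builds on can be sketched as a standard bidirectional hinge formulation; a minimal PyTorch version, with our own variable names, assuming L2-normalized embeddings:

```python
import torch

# img, txt: L2-normalized embeddings of matched image-text pairs, (B, D).
# Matched pairs sit on the diagonal of the similarity matrix; every
# off-diagonal entry is a mismatched pair penalized by a margin hinge.

def ranking_loss(img, txt, margin=0.2):
    scores = img @ txt.t()                   # (B, B) cosine similarities
    pos = scores.diag().view(-1, 1)          # matched-pair similarities
    cost_txt = (margin + scores - pos).clamp(min=0)      # image -> text
    cost_img = (margin + scores - pos.t()).clamp(min=0)  # text -> image
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    cost_txt = cost_txt.masked_fill(mask, 0)
    cost_img = cost_img.masked_fill(mask, 0)
    return cost_txt.mean() + cost_img.mean()
```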
Dynamic Curriculum Learning for Imbalanced Data Classification
Human attribute analysis is a challenging task in computer vision, since the
data is largely imbalanced in its distribution. Common techniques such as
re-sampling and cost-sensitive learning require prior knowledge to train the
system. To address this problem, we propose a unified framework called Dynamic
Curriculum Learning (DCL) that adaptively adjusts the sampling strategy and
loss weighting online, within a single batch, resulting in better
generalization and discrimination. Inspired by curriculum learning, DCL
consists of two curriculum schedulers: (1) the sampling scheduler manages the
data distribution not only from imbalanced to balanced but also from easy to
hard; (2) the loss scheduler controls the relative importance of the
classification loss and the metric learning loss. With these two schedulers,
our DCL framework achieves new state-of-the-art performance on the widely used
face attribute dataset CelebA and the pedestrian attribute dataset RAP.
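One plausible reading of the two schedulers, as a minimal NumPy sketch (our own simplification; the paper's actual scheduling functions may differ):

```python
import numpy as np

# A curriculum variable t runs from 0 to 1 over training: sampling moves
# from the imbalanced data distribution toward uniform, and the loss mix
# shifts between the classification and metric terms.

def sampling_weights(class_counts, t):
    counts = np.asarray(class_counts, dtype=float)
    imbalanced = counts / counts.sum()            # sample as the data falls
    balanced = np.full_like(imbalanced, 1.0 / len(counts))
    return (1.0 - t) * imbalanced + t * balanced

def total_loss(cls_loss, metric_loss, t):
    # Shift relative importance between the two losses over the curriculum.
    return (1.0 - t) * cls_loss + t * metric_loss
```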
Domain Adaptation for Semantic Segmentation via Class-Balanced Self-Training
Recent deep networks have achieved state-of-the-art performance on a variety
of semantic segmentation tasks. Despite such progress, these models often face
challenges in real-world `wild tasks' where a large difference exists between
labeled training/source data and unseen test/target data. This difference is
often referred to as the `domain gap', and can cause significantly decreased
performance that cannot be easily remedied by further increasing the
representation power. Unsupervised domain adaptation (UDA) seeks to overcome
this problem without target domain labels. In this paper, we propose a novel
UDA framework based on an iterative self-training procedure, where the problem
is formulated as latent variable loss minimization and solved by alternately
generating pseudo-labels on target data and re-training the model with these
labels. On top of self-training, we also propose a novel class-balanced
self-training framework to avoid the gradual dominance of large classes in
pseudo-label generation, and introduce spatial priors to refine the generated
labels. Comprehensive experiments show that the proposed methods achieve
state-of-the-art semantic segmentation performance under multiple major UDA
settings.

Comment: Accepted to ECCV 2018
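Class-balanced pseudo-label selection can be sketched as per-class confidence thresholds, so large easy classes cannot dominate; a minimal NumPy version under our own simplifications (not the authors' code):

```python
import numpy as np

# Instead of one global confidence threshold, each predicted class gets
# its own, keeping the most confident keep_ratio fraction per class.
# Unselected samples are marked -1 and ignored during re-training.

def class_balanced_pseudo_labels(probs, keep_ratio):
    """probs: (N, C) softmax outputs on target pixels/samples."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    labels = np.full(len(pred), -1)
    for c in np.unique(pred):
        idx = np.where(pred == c)[0]
        thresh = np.quantile(conf[idx], 1.0 - keep_ratio)
        labels[idx[conf[idx] >= thresh]] = c
    return labels   # re-train on labels != -1, then repeat
```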
Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation
Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned
from a labeled source dataset to solve similar tasks in a new unlabeled domain.
Prior UDA methods typically require access to the source data when adapting the
model, making them risky and inefficient for decentralized private data. This
work tackles a practical setting where only a trained source model is available
and investigates how we can effectively utilize such a model without source
data to solve UDA problems. We propose a simple yet generic representation
learning framework, named \emph{Source HypOthesis Transfer} (SHOT). SHOT
freezes the classifier module (hypothesis) of the source model and learns a
target-specific feature extraction module by exploiting both information
maximization and self-supervised pseudo-labeling to implicitly align
representations from the target domain to the source hypothesis. To verify its
versatility, we evaluate SHOT in a variety of adaptation cases, including
closed-set, partial-set, and open-set domain adaptation. Experiments indicate
that SHOT yields state-of-the-art results on multiple domain adaptation
benchmarks.

Comment: ICML 2020. Fixed typos for Digits. Code is available at
https://github.com/tim-learn/SHOT
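The information-maximization part of SHOT is a standard entropy-plus-diversity objective; a minimal PyTorch sketch (not the authors' code):

```python
import torch

# Make each target prediction confident (low per-sample entropy) while
# keeping predictions diverse across the batch (high marginal entropy).
# The frozen source classifier produces the logits; only the target
# feature extractor receives gradients.

def information_maximization(logits, eps=1e-6):
    probs = torch.softmax(logits, dim=1)
    ent = -(probs * (probs + eps).log()).sum(dim=1).mean()  # confidence
    marginal = probs.mean(dim=0)
    div = -(marginal * (marginal + eps).log()).sum()        # diversity
    return ent - div   # minimize entropy, maximize diversity
```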
Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition
Generative Adversarial Networks (GANs) have attracted much research attention
recently, leading to impressive results in natural image generation. However,
to date little success has been observed in using GAN-generated images to
improve classification tasks. Here we attempt to explore, in the context of car
license plate recognition, whether it is possible to generate synthetic
training data using GANs to improve recognition accuracy. With a
carefully designed pipeline, we show that the answer is affirmative. First, a
large-scale image set is generated by the generator of a GAN, without manual
annotation. Then, these images are fed to a deep convolutional neural network
(DCNN) followed by a bidirectional recurrent neural network (BRNN) with long
short-term memory (LSTM), which performs feature learning and sequence
labelling. Finally, the pre-trained model is fine-tuned on real images. Our
experimental results on a few datasets demonstrate the effectiveness of using
GAN images: an improvement of 7.5% over a strong baseline when only
moderate-sized real data is available. We show that the proposed framework
achieves competitive recognition accuracy on challenging test datasets. We also
leverage depthwise separable convolutions to construct a lightweight
convolutional RNN, which is about half the size and 2x faster on a CPU.
Combining this framework and the proposed pipeline, we make progress towards
accurate recognition on mobile and embedded devices.
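The DCNN + BRNN recognizer can be sketched as a generic CRNN; the following is a minimal PyTorch stand-in (our own layer sizes, not the paper's architecture), which would first be pretrained on GAN-generated plates and then fine-tuned on real images:

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Convolutional feature learning + bidirectional LSTM labelling."""
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                 # feature learning
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(128 * 8, hidden, bidirectional=True,
                           batch_first=True)      # sequence labelling
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                         # x: (B, 3, 32, W)
        f = self.cnn(x)                           # (B, 128, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)      # (B, W/4, 128*8)
        out, _ = self.rnn(f)
        return self.fc(out)                       # per-step class logits
```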
Weakly Supervised Adversarial Domain Adaptation for Semantic Segmentation in Urban Scenes
Semantic segmentation, a pixel-level vision task, has developed rapidly with
convolutional neural networks (CNNs). Training CNNs requires a large amount of
labeled data, but manual annotation is difficult. To reduce this manual effort,
several synthetic datasets have been released in recent years. However, they
still differ from real scenes, so a model trained on synthetic data (the source
domain) cannot achieve good performance on real urban scenes (the target
domain). In this paper, we propose a weakly supervised adversarial domain
adaptation method to improve segmentation performance from synthetic data to
real scenes, which consists of three deep neural networks. Specifically, a
detection and segmentation ("DS" for short) model focuses on detecting objects
and predicting the segmentation map; a pixel-level domain classifier ("PDC" for
short) tries to distinguish which domain image features come from; and an
object-level domain classifier ("ODC" for short) discriminates which domain
objects come from and predicts their classes. PDC and ODC are treated as the
discriminators, and DS is considered the generator. Through adversarial
learning, DS is expected to learn domain-invariant features. In experiments,
our proposed method sets a new record on the mIoU metric for this problem.

Comment: To appear at TIP
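A minimal sketch of the adversarial objective, simplified to a single pixel-level domain classifier (the paper also uses the object-level ODC); names are ours, not the authors':

```python
import torch
import torch.nn.functional as F

# The discriminator (PDC) learns to tell source features from target
# features; the segmentation network (DS, the "generator") is trained to
# fool it, which pushes it toward domain-invariant features.

def discriminator_loss(pdc, feat_src, feat_tgt):
    d_src = pdc(feat_src.detach())                # logits, source label 1
    d_tgt = pdc(feat_tgt.detach())                # target label 0
    return (F.binary_cross_entropy_with_logits(d_src, torch.ones_like(d_src))
            + F.binary_cross_entropy_with_logits(d_tgt, torch.zeros_like(d_tgt)))

def generator_adv_loss(pdc, feat_tgt):
    # Make target features look like source to the discriminator.
    d_tgt = pdc(feat_tgt)
    return F.binary_cross_entropy_with_logits(d_tgt, torch.ones_like(d_tgt))
```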
Adaptive Semantic Segmentation with a Strategic Curriculum of Proxy Labels
Training deep networks for semantic segmentation requires annotation of large
amounts of data, which can be time-consuming and expensive. Unfortunately,
these trained networks still generalize poorly when tested in domains not
consistent with the training data. In this paper, we show that by carefully
presenting a mixture of labeled source domain and proxy-labeled target domain
data to a network, we can achieve state-of-the-art unsupervised domain
adaptation results. With our design, the network progressively learns features
specific to the target domain using annotation from only the source domain. We
generate proxy labels for the target domain using the network's own
predictions. Our architecture then allows selective mining of easy samples from
this set of proxy labels, and hard samples from the annotated source domain. We
conduct a series of experiments with the GTA5, Cityscapes and BDD100k datasets
on synthetic-to-real domain adaptation and geographic domain adaptation,
showing the advantages of our method over baselines and existing approaches.
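The easy-target/hard-source mining step can be sketched as follows (hypothetical names, PyTorch, not the authors' code):

```python
import torch
import torch.nn.functional as F

# Keep "easy" target samples, where the network's own proxy labels are
# confident, and "hard" source samples, where the supervised loss is
# still high; train on the mixture of both.

def mine_batch(model, x_src, y_src, x_tgt, conf_thresh=0.9, hard_frac=0.5):
    with torch.no_grad():
        probs_tgt = torch.softmax(model(x_tgt), dim=1)
        conf, proxy = probs_tgt.max(dim=1)
        easy = conf >= conf_thresh                 # easy target samples
        src_loss = F.cross_entropy(model(x_src), y_src, reduction="none")
        k = max(1, int(hard_frac * len(x_src)))
        hard = src_loss.topk(k).indices            # hardest source samples
    return (x_src[hard], y_src[hard]), (x_tgt[easy], proxy[easy])
```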