Low-Shot Learning from Imaginary 3D Model
Since the advent of deep learning, neural networks have demonstrated
remarkable results in many visual recognition tasks, constantly pushing the
limits. However, the state-of-the-art approaches are largely unsuitable in
scarce data regimes. To address this shortcoming, this paper proposes employing
a 3D model, which is derived from training images. Such a model can then be
used to hallucinate novel viewpoints and poses for the scarce samples of the
few-shot learning scenario. A self-paced learning approach allows for the
selection of a diverse set of high-quality images, which facilitates the
training of a classifier. The performance of the proposed approach is showcased
on the fine-grained CUB-200-2011 dataset in a few-shot setting, where it
significantly improves our baseline accuracy.
Comment: To appear at WACV 2019. arXiv admin note: text overlap with arXiv:1811.0919
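The abstract invokes self-paced selection without showing it. Below is a minimal, hypothetical sketch of the generic idea it names (keep hallucinated samples whose current loss falls below a growing pace threshold); the names and the pace schedule are illustrative assumptions, not the authors' code.

```python
import numpy as np

def self_paced_select(losses, lam):
    """Binary self-paced weights: keep hallucinated samples whose
    current classification loss falls below the pace threshold lam."""
    return losses < lam

# Toy run: as lam grows, harder (higher-loss) samples are admitted.
rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 2.0, size=10)   # stand-in per-sample losses
lam = 0.5
for step in range(3):
    mask = self_paced_select(losses, lam)
    print(f"step {step}: lam={lam:.2f}, kept {int(mask.sum())}/{losses.size}")
    lam *= 1.5                            # relax the pace each round
```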
Self Paced Adversarial Training for Multimodal Few-shot Learning
State-of-the-art deep learning algorithms yield remarkable results in many
visual recognition tasks. However, they still fail to provide satisfactory
results in scarce data regimes. To a certain extent, this lack of data can be
compensated for by multimodal information. Missing information in one modality of a
single data point (e.g. an image) can be made up for in another modality (e.g.
a textual description). Therefore, we design a few-shot learning task that is
multimodal during training (i.e. image and text) and single-modal during test
time (i.e. image). In this regard, we propose a self-paced class-discriminative
generative adversarial network incorporating multimodality in the context of
few-shot learning. The proposed approach builds upon the idea of cross-modal
data generation in order to alleviate the data sparsity problem. We improve
few-shot learning accuracies on the fine-grained CUB and Oxford-102 datasets.
Comment: To appear at WACV 2019
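As a rough, hypothetical illustration of the cross-modal generation idea described above (not the paper's self-paced class-discriminative GAN), a small PyTorch module can map text embeddings plus noise to synthetic image features that augment the scarce real ones; all dimensions and names here are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the abstract does not specify any of these.
TXT_DIM, IMG_DIM, NOISE_DIM, N_CLASSES = 300, 512, 64, 200

class TextToImageFeatureGenerator(nn.Module):
    """Maps a text embedding plus noise to a synthetic image feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TXT_DIM + NOISE_DIM, 512), nn.ReLU(),
            nn.Linear(512, IMG_DIM),
        )

    def forward(self, txt_emb):
        z = torch.randn(txt_emb.size(0), NOISE_DIM)
        return self.net(torch.cat([txt_emb, z], dim=1))

# Synthetic image features augment the scarce real ones during training;
# at test time only the image branch is used.
gen = TextToImageFeatureGenerator()
classifier = nn.Linear(IMG_DIM, N_CLASSES)
txt_emb = torch.randn(8, TXT_DIM)        # stand-in text embeddings
logits = classifier(gen(txt_emb))        # trained jointly with real features
print(logits.shape)                      # torch.Size([8, 200])
```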
STC: A Simple to Complex Framework for Weakly-supervised Semantic Segmentation
Recently, significant improvement has been made on semantic object
segmentation due to the development of deep convolutional neural networks
(DCNNs). Training such a DCNN usually relies on a large number of images with
pixel-level segmentation masks, and annotating these images is very costly in
terms of both finance and human effort. In this paper, we propose a simple to
complex (STC) framework in which only image-level annotations are utilized to
learn DCNNs for semantic segmentation. Specifically, we first train an initial
segmentation network called Initial-DCNN with the saliency maps of simple
images (i.e., those with a single category of major object(s) and clean
background). These saliency maps can be automatically obtained by existing
bottom-up salient object detection techniques, where no supervision information
is needed. Then, a better network called Enhanced-DCNN is learned with
supervision from the predicted segmentation masks of simple images based on the
Initial-DCNN as well as the image-level annotations. Finally, more pixel-level
segmentation masks of complex images (two or more categories of objects with
cluttered background), which are inferred by using Enhanced-DCNN and
image-level annotations, are utilized as the supervision information to learn
the Powerful-DCNN for semantic segmentation. Our method utilizes 40K simple
images from Flickr.com and 10K complex images from PASCAL VOC to boost the
segmentation network step by step. Extensive experimental results on the PASCAL
VOC 2012 segmentation benchmark clearly demonstrate the superiority of the proposed
STC framework compared with other state-of-the-art methods.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence
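A schematic of the simple-to-complex pipeline may help. The sketch below uses toy stand-ins for the saliency detector and the DCNN training step so that the three-stage control flow actually runs; every function here is a placeholder, not the authors' implementation.

```python
import numpy as np

# Toy stand-ins so the three-stage control flow runs end to end.
def bottom_up_saliency(img):
    return (img > img.mean()).astype(np.uint8)      # crude foreground mask

def train_segmentation_net(images, masks):
    # "Trains" a model: picks the threshold that best separates the masks.
    fg = np.mean([im[m == 1].mean() for im, m in zip(images, masks)])
    bg = np.mean([im[m == 0].mean() for im, m in zip(images, masks)])
    thr = (fg + bg) / 2
    return lambda img: (img > thr).astype(np.uint8)

def refine_with_tags(pred_mask, tags):
    return pred_mask          # real version keeps only the tagged classes

simple = [np.random.rand(8, 8) for _ in range(4)]        # single-object images
complex_imgs = [np.random.rand(8, 8) for _ in range(4)]  # multi-object images
tags = None                                              # image-level labels

# Stage 1: Initial-DCNN trained on saliency masks of simple images.
init_net = train_segmentation_net(simple, [bottom_up_saliency(i) for i in simple])
# Stage 2: Enhanced-DCNN trained on Initial-DCNN's (tag-filtered) predictions.
enh_net = train_segmentation_net(simple, [refine_with_tags(init_net(i), tags) for i in simple])
# Stage 3: Powerful-DCNN trained on Enhanced-DCNN's predictions for complex images.
pow_net = train_segmentation_net(complex_imgs, [refine_with_tags(enh_net(i), tags) for i in complex_imgs])
print(pow_net(complex_imgs[0]).shape)                    # (8, 8) mask
```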
Unsupervised Person Re-identification: Clustering and Fine-tuning
The superiority of deeply learned pedestrian representations has been
reported in very recent literature on person re-identification (re-ID). In this
paper, we consider the more pragmatic issue of learning a deep feature with no
or only a few labels. We propose a progressive unsupervised learning (PUL)
method to transfer pretrained deep representations to unseen domains. Our
method is easy to implement and can be viewed as an effective baseline for
unsupervised re-ID feature learning. Specifically, PUL iterates between 1)
pedestrian clustering and 2) fine-tuning of the convolutional neural network
(CNN) to improve the original model, which was trained on an unrelated labeled dataset.
Since the clustering results can be very noisy, we add a selection operation
between the clustering and fine-tuning steps. In the beginning, when the model
is weak, the CNN is fine-tuned on a small number of reliable examples that lie
near the cluster centroids in the feature space. As the model becomes stronger
in subsequent iterations, more images are adaptively selected as CNN training
samples. In this way, pedestrian clustering and the CNN model are improved
simultaneously until the algorithm converges. This process is naturally
formulated as self-paced learning. We then point out promising directions that
may lead to further improvement. Extensive experiments on three large-scale
re-ID datasets demonstrate that PUL outputs discriminative features that
improve the re-ID accuracy.
Comment: Add more results, parameter analysis and comparison
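A minimal sketch of one PUL round, assuming k-means clustering and a Euclidean distance-to-centroid reliability criterion (the function and parameter names are illustrative, not the authors' code):

```python
import numpy as np
from sklearn.cluster import KMeans

def pul_iteration(features, n_ids, radius):
    """One PUL round: cluster, then keep reliable samples near centroids.

    features: CNN embeddings of unlabeled pedestrian images, shape (N, D)
    n_ids:    assumed number of identities (a hyperparameter)
    radius:   distance threshold; smaller means stricter selection
    Returns indices of reliable samples and their pseudo-labels, which
    are then used to fine-tune the CNN before the next round.
    """
    km = KMeans(n_clusters=n_ids, n_init=10).fit(features)
    dists = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)
    keep = np.where(dists < radius)[0]
    return keep, km.labels_[keep]

# Toy usage: in PUL the CNN is fine-tuned on these samples, features are
# re-extracted, and the loop repeats with a gradually looser selection.
feats = np.random.rand(200, 64).astype(np.float32)
idx, pseudo = pul_iteration(feats, n_ids=10, radius=0.9)
print(len(idx), "reliable samples selected")
```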
Improved Hard Example Mining by Discovering Attribute-based Hard Person Identity
In this paper, we propose Hard Person Identity Mining (HPIM), which attempts
to refine hard example mining and improve its exploration efficacy in person
re-identification. It is motivated by the following observation: the more
attributes two people share, the harder it is to separate their identities.
Based on this observation, we develop HPIM via a transferred attribute
describer, a deep multi-attribute classifier trained on noisy source person
attribute datasets. We encode each image in the target person re-ID dataset
into a probabilistic attribute description. In the attribute code space, we
then treat each person as a distribution that generates view-specific
attribute codes under different practical scenarios. We estimate the
person-specific statistical moments from the zeroth to higher orders, which
are further used to calculate the central moment discrepancies between
persons. These discrepancies provide a basis for choosing hard identities to
organize proper mini-batches, without being affected by the changing person
representation during metric learning. HPIM serves as a complementary tool to
hard example mining: it explores a global hard-example constraint instead of
the local one within mini-batches built from randomly sampled identities.
Extensive experiments on two person re-identification benchmarks validate the
effectiveness of the proposed algorithm.
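One common formulation of the central moment discrepancy between two identities' attribute-code distributions, offered here as a hypothetical sketch since the abstract does not fix the exact moment computation:

```python
import numpy as np

def central_moment_discrepancy(codes_a, codes_b, k_max=3):
    """CMD between two persons' attribute-code distributions.

    codes_a, codes_b: arrays of shape (n_views, n_attrs) holding the
    view-specific attribute probability codes of two identities.
    A small value means similar attribute statistics, i.e. a hard pair
    worth placing in the same mini-batch.
    """
    mu_a, mu_b = codes_a.mean(0), codes_b.mean(0)
    d = np.linalg.norm(mu_a - mu_b)              # first-order (mean) term
    for k in range(2, k_max + 1):
        ca = ((codes_a - mu_a) ** k).mean(0)     # k-th central moment
        cb = ((codes_b - mu_b) ** k).mean(0)
        d += np.linalg.norm(ca - cb)
    return d

a = np.random.rand(12, 30)   # 12 views, 30 attribute probabilities
b = np.random.rand(12, 30)
print(central_moment_discrepancy(a, b))
```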
Adaptive Semantic Segmentation with a Strategic Curriculum of Proxy Labels
Training deep networks for semantic segmentation requires annotation of large
amounts of data, which can be time-consuming and expensive. Unfortunately,
these trained networks still generalize poorly when tested in domains not
consistent with the training data. In this paper, we show that by carefully
presenting a mixture of labeled source domain and proxy-labeled target domain
data to a network, we can achieve state-of-the-art unsupervised domain
adaptation results. With our design, the network progressively learns features
specific to the target domain using annotation from only the source domain. We
generate proxy labels for the target domain using the network's own
predictions. Our architecture then allows selective mining of easy samples from
this set of proxy labels, and hard samples from the annotated source domain. We
conduct a series of experiments with the GTA5, Cityscapes and BDD100k datasets
on synthetic-to-real domain adaptation and geographic domain adaptation,
showing the advantages of our method over baselines and existing approaches.
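A minimal sketch of the proxy-label mining step described above, under the assumption that "easy" target samples are those the network itself predicts with high confidence (names and the threshold are illustrative):

```python
import numpy as np

def mine_easy_proxy_labels(target_probs, conf_thresh=0.9):
    """Keep only target samples the network itself is confident about.

    target_probs: softmax outputs on unlabeled target data, shape (N, C)
    Returns indices of easy samples and their proxy labels.
    """
    conf = target_probs.max(axis=1)
    labels = target_probs.argmax(axis=1)
    easy = np.where(conf >= conf_thresh)[0]
    return easy, labels[easy]

# A curriculum step then trains on annotated source data mixed with these
# easy proxy-labeled target samples (plus hard mining on the source side).
probs = np.random.dirichlet(np.ones(19), size=1000)  # e.g. 19 Cityscapes classes
idx, proxy = mine_easy_proxy_labels(probs, conf_thresh=0.5)
print(f"{len(idx)} easy target samples kept")
```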
Domain Adaptation for Semantic Segmentation via Class-Balanced Self-Training
Recent deep networks have achieved state-of-the-art performance on a variety of
semantic segmentation tasks. Despite such progress, these models often face
challenges in real-world `wild tasks' where a large difference exists between
labeled training/source data and unseen test/target data. This difference is
often referred to as the `domain gap', and it can cause significantly decreased
performance that cannot be easily remedied by further increasing
representation power. Unsupervised domain adaptation (UDA) seeks to overcome
this problem without target domain labels. In this paper, we propose a novel
UDA framework based on an iterative self-training procedure, where the problem
is formulated as latent variable loss minimization, and can be solved by
alternately generating pseudo-labels on target data and re-training the model
with these labels. On top of self-training, we also propose a novel
class-balanced self-training framework to avoid the gradual dominance of large
classes on pseudo-label generation, and introduce spatial priors to refine
generated labels. Comprehensive experiments show that the proposed methods
achieve state-of-the-art semantic segmentation performance under multiple major
UDA settings.
Comment: Accepted to ECCV 2018
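As a hedged sketch of the class-balancing idea (the paper's exact selection rule may differ), per-class confidence thresholds can be set so that every class, large or small, contributes roughly the same fraction of its predictions as pseudo-labels:

```python
import numpy as np

def class_balanced_thresholds(conf, preds, keep_frac=0.2):
    """Per-class confidence cutoffs so every class contributes roughly
    the same fraction of pseudo-labels, preventing large classes from
    dominating the generated supervision.

    conf:  max softmax confidence per pixel/sample, shape (N,)
    preds: argmax class per pixel/sample, shape (N,)
    """
    n_classes = int(preds.max()) + 1
    thr = np.zeros(n_classes)
    for c in range(n_classes):
        pc = conf[preds == c]
        if pc.size:
            thr[c] = np.quantile(pc, 1.0 - keep_frac)  # class-specific cutoff
    return thr

conf = np.random.rand(10000)
preds = np.random.randint(0, 19, size=10000)
thr = class_balanced_thresholds(conf, preds)
mask = conf >= thr[preds]               # pixels admitted as pseudo-labels
print(f"{mask.mean():.2f} fraction pseudo-labeled")
```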
Multimodal Co-Training for Selecting Good Examples from Webly Labeled Video
We tackle the problem of learning concept classifiers from videos on the web
without using manually labeled data. Although metadata attached to videos
(e.g., video titles, descriptions) can be of help collecting training data for
the target concept, the collected data is often very noisy. The main challenge
is therefore how to select good examples from noisy training data. Previous
approaches first learn easy examples that are unlikely to be noise and then
gradually move on to more complex examples. However, hard examples that differ
greatly from easy ones are never learned. In this paper, we propose an
approach called multimodal co-training (MMCo) for selecting good examples from
noisy training data. MMCo jointly learns classifiers for multiple modalities
that complement each other to select good examples. Since MMCo selects examples
by consensus of multimodal classifiers, a hard example for one modality can
still be used as a training example by exploiting the power of the other
modalities. The algorithm is very simple and easy to implement, yet it yields
consistent and significant boosts in example selection and classification
performance on the FCVID and YouTube8M benchmarks.
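A minimal sketch of consensus-based example selection, assuming one classifier score per modality and simple score averaging (the aggregation rule and threshold are illustrative, not the paper's exact procedure):

```python
import numpy as np

def consensus_select(scores_by_modality, thresh=0.6):
    """Select web videos whose target-concept score, averaged over the
    modality-specific classifiers, is confident enough.

    scores_by_modality: list of shape-(N,) score arrays, one per
    modality (e.g. visual, audio, metadata text).
    """
    consensus = np.mean(scores_by_modality, axis=0)
    return np.where(consensus >= thresh)[0]

# A video that looks ambiguous visually can still be kept when its audio
# and metadata classifiers agree that it depicts the concept.
vis, aud, txt = (np.random.rand(100) for _ in range(3))
keep = consensus_select([vis, aud, txt])
print(len(keep), "examples selected for the next co-training round")
```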
Self-Paced Multi-Task Clustering
Multi-task clustering (MTC) has attracted a lot of research attention in
machine learning due to its ability to utilize the relationships among
different tasks. Despite the success of traditional MTC models, they either
get stuck easily in local optima or are sensitive to outliers and noisy data. To
alleviate these problems, we propose a novel self-paced multi-task clustering
(SPMTC) paradigm. In detail, SPMTC progressively selects data examples to train
a series of MTC models with increasing complexity, thus greatly decreasing the
risk of being trapped in poor local optima. Furthermore, to reduce the negative
influence of outliers and noisy data, we design a soft version of SPMTC to
further improve the clustering performance. The corresponding SPMTC framework
can be easily solved by an alternating optimization method. The proposed model
is guaranteed to converge, and experiments on real data sets demonstrate
promising results compared with state-of-the-art multi-task clustering
methods.
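The abstract does not specify its soft weighting; a standard soft (linear) self-paced scheme, shown here as an assumed stand-in, replaces binary selection with weights that decay linearly in the loss:

```python
import numpy as np

def soft_self_paced_weights(losses, lam):
    """Soft (linear) self-paced weights: w_i = max(0, 1 - loss_i / lam).

    Easy examples (small loss) receive weights near 1, hard ones near 0,
    and raising lam gradually admits harder examples with partial weight,
    the soft alternative to binary select/reject.
    """
    return np.clip(1.0 - losses / lam, 0.0, 1.0)

losses = np.array([0.1, 0.5, 1.0, 2.0])
for lam in (0.8, 1.6, 3.2):
    print(lam, soft_self_paced_weights(losses, lam))
```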
Self-paced and self-consistent co-training for semi-supervised image segmentation
Deep co-training has recently been proposed as an effective approach for
image segmentation when annotated data is scarce. In this paper, we improve
existing approaches for semi-supervised segmentation with a self-paced and
self-consistent co-training method. To help distill information from
unlabeled images, we first design a self-paced learning strategy for
co-training that lets jointly trained neural networks focus on
easier-to-segment regions first and then gradually consider harder ones. This
is achieved via an end-to-end differentiable loss in the form of a generalized
Jensen-Shannon divergence (JSD). Moreover, to encourage predictions from
different networks to be both consistent and confident, we enhance this
generalized JSD loss with an uncertainty regularizer based on entropy. The
robustness of individual models is further improved using a self-ensembling
loss that enforces their predictions to be consistent across different training
iterations. We demonstrate the potential of our method on three challenging
image segmentation problems with different image modalities, using a small
fraction of labeled data. Results show clear advantages in terms of performance
compared to standard co-training baselines and recently proposed
state-of-the-art approaches for semi-supervised segmentation.
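A minimal sketch of the generalized JSD consistency loss with an entropy regularizer, using the standard identity that the JSD of K distributions equals the entropy of their mixture minus the mean of their entropies (the regularizer weight is illustrative, not the paper's value):

```python
import torch
import torch.nn.functional as F

def generalized_jsd(probs_list, eps=1e-8):
    """Generalized Jensen-Shannon divergence among K networks' predicted
    class distributions: entropy of the mean minus mean of the entropies.

    probs_list: list of (N, C) probability tensors, one per network.
    """
    m = torch.stack(probs_list).mean(0)                       # mixture
    h_m = -(m * (m + eps).log()).sum(1)                       # H(mean)
    h_each = torch.stack(
        [-(p * (p + eps).log()).sum(1) for p in probs_list])  # H(p_k)
    return (h_m - h_each.mean(0)).mean()

# Consistency (low JSD) alone is satisfied by uniform outputs, so an
# entropy term additionally pushes predictions to be confident.
p1 = F.softmax(torch.randn(16, 4), dim=1)   # network 1 predictions
p2 = F.softmax(torch.randn(16, 4), dim=1)   # network 2 predictions
m = torch.stack([p1, p2]).mean(0)
entropy_reg = -(m * (m + 1e-8).log()).sum(1).mean()
loss = generalized_jsd([p1, p2]) + 0.1 * entropy_reg  # 0.1 is illustrative
print(loss.item())
```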