Borrowing Treasures from the Wealthy: Deep Transfer Learning through Selective Joint Fine-tuning
Deep neural networks require a large amount of labeled training data during
supervised learning. However, collecting and labeling so much data might be
infeasible in many cases. In this paper, we introduce a source-target selective
joint fine-tuning scheme for improving the performance of deep learning tasks
with insufficient training data. In this scheme, a target learning task with
insufficient training data is carried out simultaneously with another source
learning task with abundant training data. However, the source learning task
does not use all existing training data. Our core idea is to identify and use a
subset of training images from the original source learning task whose
low-level characteristics are similar to those from the target learning task,
and jointly fine-tune shared convolutional layers for both tasks. Specifically,
we compute descriptors from linear or nonlinear filter bank responses on
training images from both tasks, and use such descriptors to search for a
desired subset of training samples for the source learning task.
Experiments demonstrate that our selective joint fine-tuning scheme achieves
state-of-the-art performance on multiple visual classification tasks with
insufficient training data for deep learning. Such tasks include Caltech 256,
MIT Indoor 67, Oxford Flowers 102 and Stanford Dogs 120. In comparison to
fine-tuning without a source domain, the proposed method can improve the
classification accuracy by 2% - 10% using a single model.
Comment: To appear in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017).
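The selection step described above lends itself to a small illustration: a low-level descriptor is computed from filter-bank responses on every image, and for each target image the source images with the most similar descriptors are kept, their union forming the auxiliary training set. The sketch below is a minimal Python version of that idea; the Gabor filter bank, the histogram descriptor, and the helper names (build_gabor_bank, descriptor, select_source_subset) are illustrative assumptions rather than the authors' exact implementation.

# Minimal sketch of descriptor-based source-sample selection. The filter bank,
# histogram bins, and k nearest neighbours are illustrative assumptions,
# not the paper's exact settings.
import numpy as np
from scipy.signal import fftconvolve

def build_gabor_bank(n_orientations=8, size=15, sigma=3.0, wavelength=6.0):
    """Return a small bank of oriented Gabor kernels (illustrative parameters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    bank = []
    for k in range(n_orientations):
        theta = np.pi * k / n_orientations
        xr = x * np.cos(theta) + y * np.sin(theta)
        kernel = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)
        bank.append(kernel - kernel.mean())   # zero-mean so flat regions respond weakly
    return bank

def descriptor(image, bank, bins=16):
    """Histogram of filter-bank response magnitudes, one histogram per filter."""
    feats = []
    for kernel in bank:
        resp = np.abs(fftconvolve(image, kernel, mode="same"))
        hist, _ = np.histogram(resp, bins=bins, range=(0.0, resp.max() + 1e-8), density=True)
        feats.append(hist)
    return np.concatenate(feats)

def select_source_subset(target_images, source_images, k=5):
    """For each target image, keep the k source images with the closest
    low-level descriptors; the union forms the auxiliary training set."""
    bank = build_gabor_bank()
    t = np.stack([descriptor(im, bank) for im in target_images])
    s = np.stack([descriptor(im, bank) for im in source_images])
    dists = np.linalg.norm(t[:, None, :] - s[None, :, :], axis=-1)   # (targets, sources)
    selected = set()
    for row in dists:
        selected.update(np.argsort(row)[:k].tolist())
    return sorted(selected)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = [rng.random((64, 64)) for _ in range(4)]     # toy grayscale images
    sources = [rng.random((64, 64)) for _ in range(20)]
    print(select_source_subset(targets, sources, k=3))     # indices of retained source images

In the full scheme, the source images selected this way and the target images would then share convolutional layers during joint fine-tuning.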
Fine-grained Discriminative Localization via Saliency-guided Faster R-CNN
Discriminative localization is essential for the fine-grained image classification task, which aims to recognize hundreds of subcategories within the same basic-level category. The key differences among subcategories are subtle and local, and are reflected in discriminative regions of objects. Existing methods generally adopt a two-stage learning framework: the first stage localizes the discriminative regions of objects, and the second encodes the discriminative features for training classifiers. However, these methods generally have two limitations: (1) the separation into two learning stages is time-consuming, and (2) dependence on object and part annotations for discriminative localization learning requires labor-intensive labeling.
It is highly challenging to address these two limitations simultaneously, and existing methods focus on only one of them. Therefore, this paper proposes a discriminative localization approach via saliency-guided Faster R-CNN that addresses both limitations at the same time. Our main novelties and advantages are: (1) an end-to-end network based on Faster R-CNN is designed to simultaneously localize discriminative regions and encode discriminative features, which accelerates classification; (2) saliency-guided localization learning is proposed to localize the discriminative region automatically, avoiding labor-intensive labeling. The two are jointly employed to both accelerate classification and eliminate the dependence on object and part annotations. Compared with state-of-the-art methods on the widely used CUB-200-2011 dataset, our approach
achieves both the best classification accuracy and the best efficiency.
Comment: 9 pages, to appear in ACM MM 2017.
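The annotation-free localization idea admits a small illustration, assuming a saliency map is already available for each training image: the map is thresholded at a fraction of its peak value, and the bounding box of the largest salient blob serves as a pseudo ground-truth box for the detector, so no manual object or part annotations are needed. The thresholding rule and the helper name saliency_to_box below are assumptions for illustration, not the paper's exact procedure.

# Minimal sketch: turn a saliency map into a pseudo bounding box that can
# supervise localization without manual annotations. The threshold and the
# helper name are illustrative assumptions.
import numpy as np
from scipy import ndimage

def saliency_to_box(saliency, rel_threshold=0.5):
    """Binarize the saliency map at a fraction of its peak value and return
    the bounding box (x1, y1, x2, y2) of the largest connected salient blob."""
    mask = saliency >= rel_threshold * saliency.max()
    labels, n = ndimage.label(mask)
    if n == 0:
        h, w = saliency.shape
        return (0, 0, w - 1, h - 1)                     # fall back to the whole image
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    largest = 1 + int(np.argmax(sizes))                 # label id of the biggest blob
    ys, xs = np.nonzero(labels == largest)
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

if __name__ == "__main__":
    # Synthetic saliency map with a bright blob roughly at (x=40, y=60).
    yy, xx = np.mgrid[0:100, 0:100]
    saliency = np.exp(-((yy - 60) ** 2 + (xx - 40) ** 2) / (2 * 12.0 ** 2))
    print(saliency_to_box(saliency))                    # pseudo ground-truth box

In a full system, boxes produced this way would stand in for ground-truth regions when training the localization branch of the detector.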
The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification
Fine-grained classification is challenging because categories can only be discriminated by subtle and local differences, and variations in pose, scale, or rotation usually make the problem even harder. Most fine-grained classification systems follow the pipeline of first finding the foreground object or object parts (where) and then extracting discriminative features from them (what).
In this paper, we propose to apply visual attention to the fine-grained classification task using deep neural networks. Our pipeline integrates three types of attention: bottom-up attention that proposes candidate patches, object-level top-down attention that selects the patches relevant to a certain object, and part-level top-down attention that localizes discriminative parts. We combine these attentions to train domain-specific deep nets, then use them to improve both the what and the where aspects. Importantly, we avoid using expensive annotations such as bounding boxes or part information anywhere in the pipeline. This weak-supervision constraint makes our work easier to generalize.
We have verified the effectiveness of the method on subsets of the ILSVRC2012 dataset and on the CUB200_2011 dataset. Our pipeline delivers significant improvements and achieves the best accuracy under the weakest supervision condition. Its performance is competitive with other methods that rely on additional annotations.
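How the three attention signals could interact at test time can be sketched as follows, with generic scoring callables standing in for the trained attention networks: bottom-up proposals are first filtered by an object-level relevance score, then re-ranked by a part-level score, and the surviving patches vote on the final label. The function name, thresholds, and averaging rule below are illustrative assumptions, not the authors' exact procedure.

# Minimal sketch of combining bottom-up proposals with object-level and
# part-level attention at inference time. Scoring callables, thresholds,
# and the voting rule are illustrative placeholders.
import numpy as np

def classify_with_two_level_attention(patches, object_score, part_score,
                                      classify, object_thresh=0.5, top_k=3):
    """patches: list of image patches from a bottom-up proposal method.
    object_score(p) -> relevance of patch p to the target object (0..1).
    part_score(p)   -> how discriminative p is as a part (higher is better).
    classify(p)     -> class-probability vector for patch p."""
    relevant = [p for p in patches if object_score(p) >= object_thresh]
    if not relevant:                                    # fall back to all proposals
        relevant = patches
    ranked = sorted(relevant, key=part_score, reverse=True)[:top_k]
    probs = np.mean([classify(p) for p in ranked], axis=0)   # average the patch votes
    return int(np.argmax(probs)), probs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patches = [rng.random((32, 32)) for _ in range(10)]
    # Toy stand-ins for the attention models and the patch classifier.
    obj = lambda p: float(p.mean())
    part = lambda p: float(p.std())
    clf = lambda p: np.array([p.mean(), 1.0 - p.mean()])
    print(classify_with_two_level_attention(patches, obj, part, clf))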