The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification
Fine-grained classification is challenging because categories can only be
discriminated by subtle, local differences. Variations in pose, scale, or
rotation make the problem even more difficult. Most fine-grained
classification systems follow the pipeline of finding the foreground object or
object parts (where) in order to extract discriminative features (what).
In this paper, we propose to apply visual attention to the fine-grained
classification task using deep neural networks. Our pipeline integrates three
types of attention: bottom-up attention that proposes candidate patches,
object-level top-down attention that selects patches relevant to a certain
object, and part-level top-down attention that localizes discriminative
parts. We combine these attentions to train domain-specific deep networks,
then use them to improve both the what and the where aspects. Importantly, we
avoid using expensive annotations such as bounding boxes or part labels
anywhere in the pipeline. This weak-supervision constraint makes our work
easier to generalize.
We have verified the effectiveness of the method on subsets of the ILSVRC2012
dataset and the CUB200_2011 dataset. Our pipeline delivers significant
improvements and achieves the best accuracy under the weakest supervision
condition; its performance is competitive with methods that rely on
additional annotations.
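The three attention stages the abstract describes can be sketched as the pipeline below. This is a minimal illustration only: the proposal generator and the object- and part-level scorers are hypothetical stand-ins (random values in place of real region proposals and CNN confidences), not the paper's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)

def bottom_up_proposals(image, n=50):
    # Hypothetical stand-in for a bottom-up region-proposal method:
    # returns n candidate patches as (x, y, w, h) boxes inside the image.
    h, w = image.shape[:2]
    xy = rng.integers(0, [w // 2, h // 2], size=(n, 2))
    wh = rng.integers(16, [w // 2, h // 2], size=(n, 2))
    return np.hstack([xy, wh])

def object_level_filter(boxes, object_scores, keep=10):
    # Object-level top-down attention: keep the patches the object
    # classifier is most confident are relevant to the object.
    order = np.argsort(object_scores)[::-1]
    return boxes[order[:keep]]

def part_level_select(boxes, part_scores):
    # Part-level top-down attention: pick the single most
    # discriminative patch for each part detector.
    return boxes[np.argmax(part_scores, axis=0)]

image = np.zeros((224, 224, 3))
boxes = bottom_up_proposals(image)
obj_scores = rng.random(len(boxes))           # stand-in for CNN confidences
relevant = object_level_filter(boxes, obj_scores, keep=10)
part_scores = rng.random((len(relevant), 2))  # scores for 2 hypothetical parts
parts = part_level_select(relevant, part_scores)
```

In the paper's setting, the filtered object patches and selected part patches would each feed a domain-specific classifier, and their predictions would be combined.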
PANDA: Pose Aligned Networks for Deep Attribute Modeling
We propose a method for inferring human attributes (such as gender, hair
style, clothing style, expression, and action) from images of people under
large variations in viewpoint, pose, appearance, articulation, and occlusion.
Convolutional Neural Nets (CNNs) have been shown to perform very well on
large-scale object recognition problems. In the context of attribute
classification, however, the signal is often subtle and may cover only a small
part of the image, while the image is dominated by the effects of pose and
viewpoint.
Discounting for pose variation would require training on very large labeled
datasets, which are not presently available. Part-based models such as poselets
and DPM have been shown to perform well on this problem, but they are limited
by their shallow, low-level features. We propose a new method that combines
part-based models and deep learning by training pose-normalized CNNs. We show
substantial improvement over state-of-the-art methods on challenging attribute
classification tasks in unconstrained settings. Experiments confirm that our
method outperforms both the best part-based methods on this problem and
conventional CNNs trained on the full bounding box of the person.