Search CORE

139,673 research outputs found

Residual Attention Network for Image Classification

Author: Jiang Mengqing
Li Cheng
Qian Chen
Tang Xiaoou
Wang Fei
Wang Xiaogang
Yang Shuo
Zhang Honggang
Publication venue
Publication date: 23/04/2017
Field of study

In this work, we propose "Residual Attention Network", a convolutional neural network using attention mechanism which can incorporate with state-of-art feed forward network architecture in an end-to-end training fashion. Our Residual Attention Network is built by stacking Attention Modules which generate attention-aware features. The attention-aware features from different modules change adaptively as layers going deeper. Inside each Attention Module, bottom-up top-down feedforward structure is used to unfold the feedforward and feedback attention process into a single feedforward process. Importantly, we propose attention residual learning to train very deep Residual Attention Networks which can be easily scaled up to hundreds of layers. Extensive analyses are conducted on CIFAR-10 and CIFAR-100 datasets to verify the effectiveness of every module mentioned above. Our Residual Attention Network achieves state-of-the-art object recognition performance on three benchmark datasets including CIFAR-10 (3.90% error), CIFAR-100 (20.45% error) and ImageNet (4.8% single model and single crop, top-5 error). Note that, our method achieves 0.6% top-1 accuracy improvement with 46% trunk depth and 69% forward FLOPs comparing to ResNet-200. The experiment also demonstrates that our network is robust against noisy labels.Comment: accepted to CVPR201

arXiv.org e-Print Archive

Crossref

Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection

Author: Bilen Hakan
Chen Shang-Fu
Chen Tianshui
Gong Yunchao
He Kaiming
Hu Hexiang
Lawrence Zitnick C.
Li Dong
Li Qiang
Lin Tsung-Yi
Liu Feng
Romero Adriana
Simonyan Karen
Tan Mingkui
Wang Jiang
Wang Zhouxia
Xavier Glorot
Xie Pengtao
Yang Hao
Yeh Chih-Kuan
Zagoruyko Sergey
Zhang Junjie
Zhu Feng
Zhu Yi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/02/2019
Field of study

Multi-label image classification is a fundamental but challenging task towards general visual understanding. Existing methods found the region-level cues (e.g., features from RoIs) can facilitate multi-label classification. Nevertheless, such methods usually require laborious object-level annotations (i.e., object labels and bounding boxes) for effective learning of the object-level visual features. In this paper, we propose a novel and efficient deep framework to boost multi-label classification by distilling knowledge from weakly-supervised detection task without bounding box annotations. Specifically, given the image-level annotations, (1) we first develop a weakly-supervised detection (WSD) model, and then (2) construct an end-to-end multi-label image classification framework augmented by a knowledge distillation module that guides the classification model by the WSD model according to the class-level predictions for the whole image and the object-level visual features for object RoIs. The WSD model is the teacher model and the classification model is the student model. After this cross-task knowledge distillation, the performance of the classification model is significantly improved and the efficiency is maintained since the WSD model can be safely discarded in the test phase. Extensive experiments on two large-scale datasets (MS-COCO and NUS-WIDE) show that our framework achieves superior performances over the state-of-the-art methods on both performance and efficiency.Comment: accepted by ACM Multimedia 2018, 9 pages, 4 figures, 5 table

arXiv.org e-Print Archive

Crossref

Hierarchy-based Image Embeddings for Semantic Image Retrieval

Author: Barz Björn
Denzler Joachim
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/03/2019
Field of study

Deep neural networks trained for classification have been found to learn powerful image representations, which are also often used for other tasks such as comparing images w.r.t. their visual similarity. However, visual similarity does not imply semantic similarity. In order to learn semantically discriminative features, we propose to map images onto class embeddings whose pair-wise dot products correspond to a measure of semantic similarity between classes. Such an embedding does not only improve image retrieval results, but could also facilitate integrating semantics for other tasks, e.g., novelty detection or few-shot learning. We introduce a deterministic algorithm for computing the class centroids directly based on prior world-knowledge encoded in a hierarchy of classes such as WordNet. Experiments on CIFAR-100, NABirds, and ImageNet show that our learned semantic image embeddings improve the semantic consistency of image retrieval results by a large margin.Comment: Accepted at WACV 2019. Source code: https://github.com/cvjena/semantic-embedding

arXiv.org e-Print Archive

Crossref