13 research outputs found

Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization

Author: Gao Zilin
Li Peihua
Wang Qilong
Xie Jiangtao
Publication date: 01/04/2018

Global covariance pooling in convolutional neural networks has achieved impressive improvement over the classical first-order pooling. Recent works have shown matrix square root normalization plays a central role in achieving state-of-the-art performance. However, existing methods depend heavily on eigendecomposition (EIG) or singular value decomposition (SVD), suffering from inefficient training due to limited support of EIG and SVD on GPU. Towards addressing this problem, we propose an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks. At the core of our method is a meta-layer designed with loop-embedded directed graph structure. The meta-layer consists of three consecutive nonlinear structured layers, which perform pre-normalization, coupled matrix iteration and post-compensation, respectively. Our method is much faster than EIG or SVD based ones, since it involves only matrix multiplications, suitable for parallel implementation on GPU. Moreover, the proposed network with ResNet architecture can converge in far fewer epochs, further accelerating network training. On large-scale ImageNet, we achieve competitive performance superior to existing counterparts. By finetuning our models pre-trained on ImageNet, we establish state-of-the-art results on three challenging fine-grained benchmarks. The source code and network models will be available at http://www.peihuali.org/iSQRT-COV
Comment: Accepted to CVPR 201
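
The three stages named in the abstract above (pre-normalization, coupled matrix iteration, post-compensation) can be illustrated with a minimal sketch. The abstract does not say which coupled iteration is used, so this sketch assumes a Newton-Schulz iteration with trace-based pre-normalization; the function names `matmul` and `isqrt_normalize` and the default iteration count are hypothetical choices for illustration, not the authors' released code.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def isqrt_normalize(cov, num_iters=5):
    """Approximate the square root of a symmetric positive definite matrix.

    Assumption: Newton-Schulz coupled iteration; the abstract only says
    "coupled matrix iteration". Only matrix multiplications are used.
    """
    n = len(cov)
    trace = sum(cov[i][i] for i in range(n))

    # Pre-normalization: divide by the trace so all eigenvalues lie in
    # (0, 1], which keeps the iteration inside its convergence region.
    y = [[v / trace for v in row] for row in cov]
    z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

    # Coupled iteration: y tends to sqrt(cov / trace), z to its inverse.
    for _ in range(num_iters):
        zy = matmul(z, y)
        t = [[0.5 * ((3.0 if i == j else 0.0) - zy[i][j]) for j in range(n)]
             for i in range(n)]
        y, z = matmul(y, t), matmul(t, z)

    # Post-compensation: undo the trace scaling applied before iterating.
    scale = trace ** 0.5
    return [[scale * v for v in row] for row in y]


if __name__ == "__main__":
    cov = [[4.0, 1.0], [1.0, 3.0]]
    root = isqrt_normalize(cov, num_iters=15)
    print(matmul(root, root))  # should be close to cov
```

Because every step is a matrix product or a scaling, the same loop maps directly onto batched GPU matrix multiplication, which is the efficiency argument the abstract makes against EIG and SVD.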

arXiv.org e-Print Archive

Crossref

OpenSceneVLAD: Appearance Invariant, Open Set Scene Classification

Author: Ehsan Shoaib
Fisher Robert B
McDonald-Maier Klaus D.
Milford Michael
Smith William H. B.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2022

Scene classification is a well-established area of computer vision research that aims to classify a scene image into pre-defined categories such as playground, beach and airport. Recent work has focused on increasing the variety of pre-defined categories for classification, but so far failed to consider two major challenges: changes in scene appearance due to lighting and open set classification (the ability to classify unknown scene data as not belonging to the trained classes). Our first contribution, SceneVLAD, fuses scene classification and visual place recognition CNNs for appearance invariant scene classification that outperforms state-of-the-art scene classification by a mean F1 score of up to 0.1. Our second contribution, OpenSceneVLAD, extends the first to an open set classification scenario using intra-class splitting to achieve a mean increase in F1 scores of up to 0.06 compared to using state-of-the-art openmax layer. We achieve these results on three scene class datasets extracted from large scale outdoor visual localisation datasets, one of which we collected ourselves.

Queensland University of Technology ePrints Archive

Edinburgh Research Explorer

Dictionary learning inspired deep network for scene recognition

Author: Chen Q
Chen W
Liu Y
Wassell I
Publication venue: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Publication date: 01/01/2018

Scene recognition remains one of the most challenging problems in image understanding. With the help of fully connected layers (FCL) and rectified linear units (ReLu), deep networks can extract the moderately sparse and discriminative feature representation required for scene recognition. However, few methods consider exploiting a sparsity model for learning the feature representation in order to provide enhanced discriminative capability. In this paper, we replace the conventional FCL and ReLu with a new dictionary learning layer, that is composed of a finite number of recurrent units to simultaneously enhance the sparse representation and discriminative abilities of features via the determination of optimal dictionaries. In addition, with the help of the structure of the dictionary, we propose a new label discriminative regressor to boost the discrimination ability. We also propose new constraints to prevent overfitting by incorporating the advantage of the Mahalanobis and Euclidean distances to balance the recognition accuracy and generalization performance. Our proposed approach is evaluated using various scene datasets and shows superior performance to many state-of-the-art approaches.

Apollo (Cambridge)