Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Global covariance pooling in convolutional neural networks has achieved
impressive improvement over the classical first-order pooling. Recent works
have shown matrix square root normalization plays a central role in achieving
state-of-the-art performance. However, existing methods depend heavily on
eigendecomposition (EIG) or singular value decomposition (SVD) and suffer from
inefficient training because EIG and SVD have limited GPU support. Towards
addressing this problem, we propose an iterative matrix square root
normalization method for fast end-to-end training of global covariance pooling
networks. At the core of our method is a meta-layer designed with loop-embedded
directed graph structure. The meta-layer consists of three consecutive
nonlinear structured layers, which perform pre-normalization, coupled matrix
iteration and post-compensation, respectively. Our method is much faster than
EIG- or SVD-based ones, since it involves only matrix multiplications, which
are well suited to parallel implementation on GPU. Moreover, the proposed
network with ResNet architecture converges in far fewer epochs, further
accelerating network training. On large-scale ImageNet, we achieve performance superior
to existing counterparts. By finetuning our models pre-trained on ImageNet, we
establish state-of-the-art results on three challenging fine-grained
benchmarks. The source code and network models will be available at
http://www.peihuali.org/iSQRT-COV
Comment: Accepted to CVPR 2018
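The coupled matrix iteration referred to above is the Newton-Schulz scheme for the matrix square root. As a rough illustration of how pre-normalization, the coupled iteration, and post-compensation fit together, here is a minimal NumPy sketch of the forward computation only; the function name, the trace-based normalization, and the iteration count are our assumptions for illustration, and the paper's meta-layer additionally defines the matching backward pass needed for end-to-end training.

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=5):
    """Approximate the matrix square root of an SPD matrix A via the
    coupled Newton-Schulz iteration: pre-normalize so the iteration
    converges, iterate using only matrix multiplications (the
    GPU-friendly part), then post-compensate to undo the scaling.

    Illustrative sketch; names and defaults are assumptions, not
    details taken verbatim from the paper.
    """
    n = A.shape[0]
    I = np.eye(n)
    # Pre-normalization: dividing by the trace puts the eigenvalues
    # of Y in (0, 1], which guarantees the iteration converges.
    norm = np.trace(A)
    Y = A / norm
    Z = I.copy()
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z  # Y -> sqrt(A/norm), Z -> its inverse
    # Post-compensation: undo the pre-normalization.
    return np.sqrt(norm) * Y

# Usage: approximate square root of a well-conditioned SPD matrix.
X = np.random.randn(8, 32)
C = X @ X.T / X.shape[1]                  # SPD covariance matrix
S = newton_schulz_sqrt(C, num_iters=10)
print(np.linalg.norm(S @ S - C) / np.linalg.norm(C))  # relative residual
```

The residual shrinks as num_iters grows; since the loop body is just a few matrix multiplies, each extra iteration is cheap compared to an EIG or SVD call.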
Improved scaling of Time-Evolving Block-Decimation algorithm through Reduced-Rank Randomized Singular Value Decomposition
When the amount of entanglement in a quantum system is limited, the relevant
dynamics of the system is restricted to a very small part of the state space.
When restricted to this subspace, the description of the system becomes
efficient in the system size. A class of algorithms, exemplified by the
Time-Evolving Block-Decimation (TEBD) algorithm, makes use of this observation
by selecting the relevant subspace through a decimation technique relying on
the Singular Value Decomposition (SVD). In these algorithms, the complexity of
each time-evolution step is dominated by the SVD. Here we show that, by
applying a randomized version of the SVD routine (RRSVD), the power law
governing the computational complexity of TEBD is lowered by one degree,
resulting in a considerable speed-up. We illustrate the potential efficiency
gains on some real-world examples to which TEBD can be successfully applied,
and demonstrate that for those systems RRSVD delivers results as accurate as
state-of-the-art deterministic SVD routines.
Comment: 14 pages, 5 figures
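As background, the sketch below shows the standard randomized range-finder construction (in the style of Halko, Martinsson, and Tropp) on which a reduced-rank randomized SVD routine is built: the matrix is first projected onto a small random subspace, so the expensive deterministic SVD only runs on a much smaller matrix. The function name and the oversampling and power-iteration defaults are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rrsvd(M, rank, n_oversamples=10, n_power_iters=2):
    """Reduced-rank randomized SVD of M, keeping `rank` singular
    triplets (as a TEBD truncation step would). Illustrative
    sketch, not the paper's exact routine.
    """
    m, n = M.shape
    k = min(rank + n_oversamples, min(m, n))
    # Sketch the column space of M with a Gaussian test matrix.
    Omega = np.random.randn(n, k)
    Q, _ = np.linalg.qr(M @ Omega)
    # Power iterations sharpen the captured subspace when the
    # singular values decay slowly.
    for _ in range(n_power_iters):
        Q, _ = np.linalg.qr(M.T @ Q)
        Q, _ = np.linalg.qr(M @ Q)
    # Deterministic SVD of the small (k x n) projection of M.
    B = Q.T @ M
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank, :]
```

The deterministic SVD now acts on a k x n matrix rather than the full one; the remaining work is matrix multiplication against a thin sketch, which is where the claimed reduction in the complexity power law comes from.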
Unfolding and Shrinking Neural Machine Translation Ensembles
Ensembling is a well-known technique in neural machine translation (NMT) to
improve system performance. Instead of a single neural net, multiple neural
nets with the same topology are trained separately, and the decoder generates
predictions by averaging over the individual models. Ensembling often improves
the quality of the generated translations drastically. However, it is not
suitable for production systems because it is cumbersome and slow. This work
aims to reduce the runtime to be on par with a single system without
compromising the translation quality. First, we show that the ensemble can be
unfolded into a single large neural network which imitates the output of the
ensemble system. We show that unfolding can already improve the runtime in
practice since more work can be done on the GPU. We proceed by describing a set
of techniques to shrink the unfolded network by reducing the dimensionality of
layers. On Japanese-English we report that the resulting network has the size
and decoding speed of a single NMT network but performs on the level of a
3-ensemble system.
Comment: Accepted at EMNLP 2017
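To make the two steps concrete, here is a minimal NumPy sketch of (a) unfolding one hidden linear layer from K ensemble members into a single block-diagonal layer, so one matrix multiply evaluates all members at once, and (b) shrinking the unfolded layer via truncated SVD, one plausible instance of reducing layer dimensionality. All names are ours; the actual system applies this to full NMT networks, handles the shared input and averaged output layers separately, and studies several shrinking techniques.

```python
import numpy as np

def unfold_linear(Ws, bs):
    """Stack one hidden layer's weights from K ensemble members into
    a block-diagonal matrix: the unfolded layer processes the
    concatenation of all members' hidden states in one multiply.
    """
    rows = sum(W.shape[0] for W in Ws)
    cols = sum(W.shape[1] for W in Ws)
    W_big = np.zeros((rows, cols))
    r = c = 0
    for W in Ws:
        W_big[r:r + W.shape[0], c:c + W.shape[1]] = W
        r, c = r + W.shape[0], c + W.shape[1]
    return W_big, np.concatenate(bs)

def shrink(W, keep):
    """Replace W by a rank-`keep` factorization (two thin layers),
    reducing the dimensionality of the unfolded network's layers.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :keep] * s[:keep], Vt[:keep, :]

# Usage: a 3-ensemble hidden layer, unfolded and then shrunk.
Ws = [np.random.randn(256, 256) for _ in range(3)]
bs = [np.random.randn(256) for _ in range(3)]
W_big, b_big = unfold_linear(Ws, bs)   # 768 x 768, block-diagonal
A, B = shrink(W_big, keep=256)         # 768x256 and 256x768 factors
```

After shrinking, the factored layer costs about as much as a single member's layer, which is how the unfolded network approaches single-system decoding speed.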