Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Global covariance pooling in convolutional neural networks has achieved
impressive improvement over the classical first-order pooling. Recent works
have shown matrix square root normalization plays a central role in achieving
state-of-the-art performance. However, existing methods depend heavily on
eigendecomposition (EIG) or singular value decomposition (SVD) and suffer from
inefficient training because EIG and SVD have limited GPU support. Towards
addressing this problem, we propose an iterative matrix square root
normalization method for fast end-to-end training of global covariance pooling
networks. At the core of our method is a meta-layer designed with loop-embedded
directed graph structure. The meta-layer consists of three consecutive
nonlinear structured layers, which perform pre-normalization, coupled matrix
iteration and post-compensation, respectively. Our method is much faster than
EIG- or SVD-based ones, since it involves only matrix multiplications, which
are well suited to parallel implementation on GPU. Moreover, the proposed
network with ResNet architecture converges in far fewer epochs, further
accelerating network training. On large-scale ImageNet, we achieve performance superior
to existing counterparts. By finetuning our models pre-trained on ImageNet, we
establish state-of-the-art results on three challenging fine-grained
benchmarks. The source code and network models will be available at
http://www.peihuali.org/iSQRT-COV
Comment: Accepted to CVPR 2018
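The coupled matrix iteration referred to above is the Newton-Schulz scheme for the matrix square root. As a rough illustration of how pre-normalization, the coupled iteration, and post-compensation fit together, here is a minimal NumPy sketch of the forward computation only; the function name, the trace-based normalization, and the iteration count are our assumptions for illustration, and the paper's meta-layer additionally defines the matching backward pass needed for end-to-end training.

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=5):
    """Approximate the matrix square root of an SPD matrix A via the
    coupled Newton-Schulz iteration: pre-normalize so the iteration
    converges, iterate using only matrix multiplications (the
    GPU-friendly part), then post-compensate to undo the scaling.

    Illustrative sketch; names and defaults are assumptions, not
    details taken verbatim from the paper.
    """
    n = A.shape[0]
    I = np.eye(n)
    # Pre-normalization: dividing by the trace puts the eigenvalues
    # of Y in (0, 1], which guarantees the iteration converges.
    norm = np.trace(A)
    Y = A / norm
    Z = I.copy()
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z  # Y -> sqrt(A/norm), Z -> its inverse
    # Post-compensation: undo the pre-normalization.
    return np.sqrt(norm) * Y

# Usage: approximate square root of a well-conditioned SPD matrix.
X = np.random.randn(8, 32)
C = X @ X.T / X.shape[1]                  # SPD covariance matrix
S = newton_schulz_sqrt(C, num_iters=10)
print(np.linalg.norm(S @ S - C) / np.linalg.norm(C))  # relative residual
```

The residual shrinks as num_iters grows; since the loop body is just a few matrix multiplies, each extra iteration is cheap compared to an EIG or SVD call.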
Improved scaling of Time-Evolving Block-Decimation algorithm through Reduced-Rank Randomized Singular Value Decomposition
When the amount of entanglement in a quantum system is limited, the relevant
dynamics of the system is restricted to a very small part of the state space.
When restricted to this subspace, the description of the system becomes
efficient in the system size. A class of algorithms, exemplified by the
Time-Evolving Block-Decimation (TEBD) algorithm, makes use of this observation
by selecting the relevant subspace through a decimation technique relying on
the Singular Value Decomposition (SVD). In these algorithms, the complexity of
each time-evolution step is dominated by the SVD. Here we show that, by
applying a randomized version of the SVD routine (RRSVD), the power law
governing the computational complexity of TEBD is lowered by one degree,
resulting in a considerable speed-up. We illustrate the potential efficiency
gains on some real-world examples to which TEBD can be successfully applied,
and demonstrate that for those systems RRSVD delivers results as accurate as
state-of-the-art deterministic SVD routines.
Comment: 14 pages, 5 figures
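As background, the sketch below shows the standard randomized range-finder construction (in the style of Halko, Martinsson, and Tropp) on which a reduced-rank randomized SVD routine is built: the matrix is first projected onto a small random subspace, so the expensive deterministic SVD only runs on a much smaller matrix. The function name and the oversampling and power-iteration defaults are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rrsvd(M, rank, n_oversamples=10, n_power_iters=2):
    """Reduced-rank randomized SVD of M, keeping `rank` singular
    triplets (as a TEBD truncation step would). Illustrative
    sketch, not the paper's exact routine.
    """
    m, n = M.shape
    k = min(rank + n_oversamples, min(m, n))
    # Sketch the column space of M with a Gaussian test matrix.
    Omega = np.random.randn(n, k)
    Q, _ = np.linalg.qr(M @ Omega)
    # Power iterations sharpen the captured subspace when the
    # singular values decay slowly.
    for _ in range(n_power_iters):
        Q, _ = np.linalg.qr(M.T @ Q)
        Q, _ = np.linalg.qr(M @ Q)
    # Deterministic SVD of the small (k x n) projection of M.
    B = Q.T @ M
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank, :]
```

The deterministic SVD now acts on a k x n matrix rather than the full one; the remaining work is matrix multiplication against a thin sketch, which is where the claimed reduction in the complexity power law comes from.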
Unfolding and Shrinking Neural Machine Translation Ensembles
Ensembling is a well-known technique in neural machine translation (NMT) to
improve system performance. Instead of a single neural net, multiple neural
nets with the same topology are trained separately, and the decoder generates
predictions by averaging over the individual models. Ensembling often improves
the quality of the generated translations drastically. However, it is not
suitable for production systems because it is cumbersome and slow. This work
aims to reduce the runtime to be on par with a single system without
compromising the translation quality. First, we show that the ensemble can be
unfolded into a single large neural network which imitates the output of the
ensemble system. We show that unfolding can already improve the runtime in
practice since more work can be done on the GPU. We proceed by describing a set
of techniques to shrink the unfolded network by reducing the dimensionality of
layers. On Japanese-English we report that the resulting network has the size
and decoding speed of a single NMT network but performs on the level of a
3-ensemble system.
Comment: Accepted at EMNLP 2017
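To make the two steps concrete, here is a minimal NumPy sketch of (a) unfolding one hidden linear layer from K ensemble members into a single block-diagonal layer, so one matrix multiply evaluates all members at once, and (b) shrinking the unfolded layer via truncated SVD, one plausible instance of reducing layer dimensionality. All names are ours; the actual system applies this to full NMT networks, handles the shared input and averaged output layers separately, and studies several shrinking techniques.

```python
import numpy as np

def unfold_linear(Ws, bs):
    """Stack one hidden layer's weights from K ensemble members into
    a block-diagonal matrix: the unfolded layer processes the
    concatenation of all members' hidden states in one multiply.
    """
    rows = sum(W.shape[0] for W in Ws)
    cols = sum(W.shape[1] for W in Ws)
    W_big = np.zeros((rows, cols))
    r = c = 0
    for W in Ws:
        W_big[r:r + W.shape[0], c:c + W.shape[1]] = W
        r, c = r + W.shape[0], c + W.shape[1]
    return W_big, np.concatenate(bs)

def shrink(W, keep):
    """Replace W by a rank-`keep` factorization (two thin layers),
    reducing the dimensionality of the unfolded network's layers.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :keep] * s[:keep], Vt[:keep, :]

# Usage: a 3-ensemble hidden layer, unfolded and then shrunk.
Ws = [np.random.randn(256, 256) for _ in range(3)]
bs = [np.random.randn(256) for _ in range(3)]
W_big, b_big = unfold_linear(Ws, bs)   # 768 x 768, block-diagonal
A, B = shrink(W_big, keep=256)         # 768x256 and 256x768 factors
```

After shrinking, the factored layer costs about as much as a single member's layer, which is how the unfolded network approaches single-system decoding speed.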