Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose
a "weight-sharing regularization" penalty on the weights wβRd
of a neural network, defined as R(w)=dβ11ββi>jdββ£wiββwjββ£. We study the proximal mapping of R and provide an
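As an illustrative sketch (not code from the paper; the function names are our own), the penalty can be evaluated straight from the definition in $O(d^2)$, or in $O(d \log d)$ via the standard identity $\sum_{i > j} |w_i - w_j| = \sum_{k=1}^{d} (2k - d - 1)\, w_{(k)}$ for the ascending sorted values $w_{(1)} \le \cdots \le w_{(d)}$:

```python
import numpy as np

def weight_sharing_penalty_naive(w: np.ndarray) -> float:
    """R(w) = 1/(d-1) * sum_{i>j} |w_i - w_j|, computed pairwise in O(d^2)."""
    d = w.size
    diffs = np.abs(w[:, None] - w[None, :])
    return float(diffs[np.triu_indices(d, k=1)].sum()) / (d - 1)

def weight_sharing_penalty(w: np.ndarray) -> float:
    """Same value in O(d log d): sort, then weight the k-th sorted value by (2k - d - 1)."""
    d = w.size
    w_sorted = np.sort(w)
    coeffs = 2 * np.arange(1, d + 1) - d - 1
    return float(coeffs @ w_sorted) / (d - 1)

w = np.random.randn(100)
assert np.isclose(weight_sharing_penalty_naive(w), weight_sharing_penalty(w))
```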
We study the proximal mapping of $\mathcal{R}$ and provide an intuitive
interpretation of it in terms of a physical system of interacting particles.
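For reference, the proximal mapping in question is the standard operator (this is the textbook definition, not notation specific to the paper):
\[
\operatorname{prox}_{\lambda\mathcal{R}}(v) \;=\; \arg\min_{x \in \mathbb{R}^d} \; \tfrac{1}{2}\|x - v\|_2^2 + \lambda\,\mathcal{R}(x).
\]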
We also parallelize existing algorithms for $\operatorname{prox}_{\mathcal{R}}$
(to run on GPU) and find that one of them is fast in practice but slow
($O(d)$) for worst-case inputs. Using the physical interpretation, we design a
novel parallel algorithm which runs in $O(\log^3 d)$ when sufficient processors
are available, thus guaranteeing fast training.
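As a hedged sketch of how an exact prox enables training, the generic proximal (sub)gradient update below alternates a gradient step on the smooth loss with a prox step on the penalty; `prox_R` is a hypothetical handle to any exact solver, such as the parallel algorithm described above:

```python
def proximal_sgd_step(w, grad, lr, lam, prox_R):
    """One proximal-gradient update for  loss(w) + lam * R(w).

    `prox_R(v, t)` is assumed (hypothetically) to return
    argmin_x 0.5 * ||x - v||^2 + t * R(x).
    """
    v = w - lr * grad           # gradient step on the data-fitting loss
    return prox_R(v, lr * lam)  # exact prox step on the non-smooth penalty
```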
Our experiments reveal that weight-sharing regularization enables fully
connected networks to learn convolution-like filters even when pixels have
been shuffled, while convolutional neural networks fail in this setting.
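A minimal sketch of the shuffled-pixels setup as we read it (assuming a single fixed random permutation shared across the whole dataset, which destroys the spatial locality a CNN relies on but is immaterial to a fully connected layer):

```python
import numpy as np

rng = np.random.default_rng(0)
perm = rng.permutation(28 * 28)  # one fixed pixel permutation for all images

def shuffle_pixels(images: np.ndarray) -> np.ndarray:
    """Apply the same fixed permutation to a batch of images of shape (N, 28, 28)."""
    flat = images.reshape(images.shape[0], -1)
    return flat[:, perm]
```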
Our code is available at
https://github.com/motahareh-sohrabi/weight-sharing-regularizatio