We introduce the Modular Autoencoder (MAE), which learns a set of diverse but complementary representations from unlabelled data that can later be used for supervised tasks. The learning of the representations is controlled by a trade-off parameter, and on six benchmark datasets we show that the optimum lies between two extremes: a set of small, independent, low-capacity autoencoders on one hand, and a single monolithic encoding on the other, with the optimal MAE outperforming an appropriate baseline. In the present paper we explore the special case of linear MAEs and derive an SVD-based training algorithm that converges several orders of magnitude faster than gradient descent.

Comment: 18 pages, 8 figures; to appear in a special issue of the Journal of Machine Learning Research (vol. 44, Dec 2015).
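To illustrate why a closed-form SVD solution can converge far faster than gradient descent in the linear case, the following is a minimal sketch of fitting a single linear autoencoder via truncated SVD, using the classical result that the squared-error optimum is spanned by the top-k principal directions. This shows the general mechanism only, not the paper's MAE training algorithm; the function name, shapes, and usage are illustrative assumptions.

```python
import numpy as np

def linear_autoencoder_svd(X, k):
    """Fit a single linear autoencoder with a k-dimensional code.

    For a linear autoencoder minimising squared reconstruction error,
    the optimum is spanned by the top-k principal directions, so a
    truncated SVD gives a closed-form solution with no iterative
    gradient descent. (Illustrative sketch, not the paper's MAE solver.)
    """
    mean = X.mean(axis=0)
    Xc = X - mean                            # centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                             # d x k tied encoder/decoder weights
    codes = Xc @ W                           # k-dimensional representations
    recon = codes @ W.T + mean               # reconstructions in input space
    return W, codes, recon

# Hypothetical usage: 500 samples, 20 features, 5-dimensional code.
X = np.random.randn(500, 20)
W, Z, X_hat = linear_autoencoder_svd(X, k=5)
print("reconstruction MSE:", np.mean((X - X_hat) ** 2))
```

The closed form above replaces many gradient steps with a single matrix factorisation, which is the intuition behind the speed-up reported for the linear MAE, although the modular case additionally balances the modules against one another via the trade-off parameter.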