While deep neural networks have been shown to perform remarkably well in many
machine learning tasks, labeling a large amount of ground truth data for
supervised training is usually very costly to scale. Therefore, learning robust
representations with unlabeled data is critical in relieving human effort and
vital for many downstream tasks. Recent advances in unsupervised and
self-supervised learning approaches for visual data have benefited greatly from
domain knowledge. Here we are interested in a more generic unsupervised
learning framework that can be easily generalized to other domains. In this
paper, we propose to learn data representations with a novel type of denoising
autoencoder, where the noisy input data is generated by corrupting latent clean
data in the gradient domain. This can be naturally generalized to span multiple
scales with a Laplacian pyramid representation of the input data. In this way,
the agent learns more robust representations that exploit the underlying data
structures across multiple scales. Experiments on several visual benchmarks
demonstrate that better representations can be learned with the proposed
approach, compared to its counterpart with single-scale corruption and other
approaches. Furthermore, we also demonstrate that the learned representations
perform well when transferring to other downstream vision tasks