Autoencoders Learn Generative Linear Models
We provide a series of results for unsupervised learning with autoencoders.
Specifically, we study shallow two-layer autoencoder architectures with shared
weights. We focus on three generative models for data that are common in
statistical machine learning: (i) the mixture-of-Gaussians model, (ii) the
sparse coding model, and (iii) the sparsity model with non-negative
coefficients. For each of these models, we prove that under suitable choices of
hyperparameters, architectures, and initialization, autoencoders learned by
gradient descent can successfully recover the parameters of the corresponding
model. To our knowledge, this is the first result that rigorously studies the
dynamics of gradient descent for weight-sharing autoencoders. Our analysis can
be viewed as theoretical evidence that shallow autoencoder modules indeed can
be used as feature learning mechanisms for a variety of data models, and may
shed insight on how to train larger stacked architectures with autoencoders as
basic building blocks.
Comment: Experimental study on synthetic data added. Typos fixed.
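A minimal sketch of the setup the abstract describes: a shallow weight-sharing (tied) ReLU autoencoder trained by plain gradient descent on synthetic mixture-of-Gaussians data. The dimensions, learning rate, and iteration count below are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic mixture-of-Gaussians data: K separated means in R^d,
# samples normalized to unit norm (a common simplification).
d, K, n = 20, 5, 2000
means = rng.normal(size=(K, d))
X = means[rng.integers(0, K, size=n)] + 0.1 * rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Shallow weight-sharing autoencoder: encode h = relu(W x), decode W^T h.
W = 0.1 * rng.normal(size=(K, d))
lr, iters = 0.05, 300

def loss_and_grad(W):
    H = np.maximum(W @ X.T, 0.0)                  # (K, n) hidden codes
    R = W.T @ H - X.T                             # (d, n) residuals
    loss = 0.5 * np.mean(np.sum(R ** 2, axis=0))
    mask = (H > 0).astype(float)                  # ReLU subgradient
    # Gradient accounts for BOTH uses of the tied weight matrix W.
    grad = (H @ R.T + (mask * (W @ R)) @ X) / n
    return loss, grad

loss_first, _ = loss_and_grad(W)
for _ in range(iters):
    _, g = loss_and_grad(W)
    W -= lr * g
loss_last, _ = loss_and_grad(W)
```

With suitable initialization, the rows of W move toward the component means; the paper's contribution is proving that such gradient dynamics provably recover the generative model's parameters.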
Walking the Tightrope: An Investigation of the Convolutional Autoencoder Bottleneck
In this paper, we present an in-depth investigation of the convolutional
autoencoder (CAE) bottleneck. Autoencoders (AE), and especially their
convolutional variants, play a vital role in the current deep learning toolbox.
Researchers and practitioners employ CAEs for a variety of tasks, ranging from
outlier detection and compression to transfer and representation learning.
Despite their widespread adoption, we have limited insight into how the
bottleneck shape impacts the emergent properties of the CAE. We demonstrate
that increased height and width of the bottleneck drastically improves
generalization, which in turn leads to better performance of the latent codes
in downstream transfer learning tasks. The number of channels in the
bottleneck, on the other hand, is secondary in importance. Furthermore, we show
empirically that, contrary to popular belief, CAEs do not learn to copy their
input, even when the bottleneck has the same number of neurons as there are
pixels in the input. Copying does not occur, despite training the CAE for 1,000
epochs on a tiny (~600 images) dataset. We believe that the findings
in this paper are directly applicable and will lead to improvements in models
that rely on CAEs.
Comment: code available at https://github.com/IljaManakov/WalkingTheTightrop
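The channels-versus-height/width trade-off in the bottleneck can be made concrete with standard convolution shape arithmetic. In the sketch below, the 32x32 input, kernel/stride/padding settings, and the two channel counts are hypothetical choices for illustration, not the paper's architecture: both bottlenecks hold the same number of neurons, but one is spatially large with few channels and the other spatially small with many.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a single strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32

# Bottleneck A: three stride-2 convolutions -> 4x4 spatial, 64 channels.
a = size
for _ in range(3):
    a = conv_out(a)
neurons_a = 64 * a * a        # 64 * 4 * 4 = 1024 neurons

# Bottleneck B: one stride-2 convolution -> 16x16 spatial, 4 channels.
b = conv_out(size)
neurons_b = 4 * b * b         # 4 * 16 * 16 = 1024 neurons
```

Both codes have 1,024 neurons, yet the abstract's finding is that B's larger height and width should generalize better than A's extra channels.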
Learning Distributions Generated by One-Layer ReLU Networks
We consider the problem of estimating the parameters of a d-dimensional
rectified Gaussian distribution from i.i.d. samples. A rectified Gaussian
distribution is defined by passing a standard Gaussian distribution through a
one-layer ReLU neural network. We give a simple algorithm to estimate the
parameters (i.e., the weight matrix and bias vector of the ReLU neural network)
up to an error ε using Õ(1/ε²) samples and Õ(d²/ε²) time (log factors are
ignored for simplicity). This implies that we can estimate the distribution up
to ε in total variation distance using Õ(κ²d²/ε²) samples, where κ
is the condition number of the covariance matrix. Our only assumption
is that the bias vector is non-negative. Without this non-negativity
assumption, we show that estimating the bias vector within any error requires
a number of samples at least exponential in the infinity norm of the bias
vector. Our algorithm is based on the key observation that vector norms and
pairwise angles can be estimated separately. We use a recent result on learning
from truncated samples. We also prove two sample complexity lower bounds:
Ω(1/ε²) samples are required to estimate the parameters up to
error ε, while Ω(d/ε²) samples are necessary to
estimate the distribution up to ε in total variation distance. The
first lower bound implies that our algorithm is optimal for parameter
estimation. Finally, we show an interesting connection between learning a
two-layer generative model and non-negative matrix factorization. Experimental
results are provided to support our analysis.
Comment: NeurIPS 2019
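The rectified Gaussian model described above is straightforward to sample from: draw z from a standard Gaussian and push it through a one-layer ReLU network, x = ReLU(Wz + b). A minimal sketch (the dimension, random weights, and sample count are arbitrary illustrative choices; the non-negative bias matches the paper's assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

d = 3
W = rng.normal(size=(d, d))
b = np.abs(rng.normal(size=d))   # non-negative bias vector, per the paper

def sample_rectified_gaussian(n):
    """Draw n samples x = ReLU(W z + b) with z ~ N(0, I_d)."""
    Z = rng.normal(size=(n, d))
    return np.maximum(Z @ W.T + b, 0.0)

X = sample_rectified_gaussian(10_000)
```

Because the ReLU clips negative pre-activations to exactly zero, the observed samples are censored, which is why the estimation algorithm must draw on results about learning from truncated samples.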