1,497 research outputs found
Implicit Density Estimation by Local Moment Matching to Sample from Auto-Encoders
Recent work suggests that some auto-encoder variants do a good job of
capturing the local manifold structure of the unknown data generating density.
This paper contributes to the mathematical understanding of this phenomenon and
helps define better justified sampling algorithms for deep learning based on
auto-encoder variants. We consider an MCMC where each step samples from a
Gaussian whose mean and covariance matrix depend on the previous state, defines
through its asymptotic distribution a target density. First, we show that good
choices (in the sense of consistency) for these mean and covariance functions
are the local expected value and local covariance under that target density.
Then we show that an auto-encoder with a contractive penalty captures
estimators of these local moments in its reconstruction function and its
Jacobian. A contribution of this work is thus a novel alternative to
maximum-likelihood density estimation, which we call local moment matching. It
also justifies a recently proposed sampling algorithm for the Contractive
Auto-Encoder and extends it to the Denoising Auto-Encoder
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning
Score Function Features for Discriminative Learning: Matrix and Tensor Framework
Feature learning forms the cornerstone for tackling challenging learning
problems in domains such as speech, computer vision and natural language
processing. In this paper, we consider a novel class of matrix and
tensor-valued features, which can be pre-trained using unlabeled samples. We
present efficient algorithms for extracting discriminative information, given
these pre-trained features and labeled samples for any related task. Our class
of features are based on higher-order score functions, which capture local
variations in the probability density function of the input. We establish a
theoretical framework to characterize the nature of discriminative information
that can be extracted from score-function features, when used in conjunction
with labeled samples. We employ efficient spectral decomposition algorithms (on
matrices and tensors) for extracting discriminative components. The advantage
of employing tensor-valued features is that we can extract richer
discriminative information in the form of an overcomplete representations.
Thus, we present a novel framework for employing generative models of the input
for discriminative learning.Comment: 29 page
Mode Regularized Generative Adversarial Networks
Although Generative Adversarial Networks achieve state-of-the-art results on
a variety of generative tasks, they are regarded as highly unstable and prone
to miss modes. We argue that these bad behaviors of GANs are due to the very
particular functional shape of the trained discriminators in high dimensional
spaces, which can easily make training stuck or push probability mass in the
wrong direction, towards that of higher concentration than that of the data
generating distribution. We introduce several ways of regularizing the
objective, which can dramatically stabilize the training of GAN models. We also
show that our regularizers can help the fair distribution of probability mass
across the modes of the data generating distribution, during the early phases
of training and thus providing a unified solution to the missing modes problem.Comment: Published as a conference paper at ICLR 201
- …