Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning.
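As a toy illustration of the kind of unsupervised feature learning surveyed above, the sketch below trains a one-hidden-layer auto-encoder by plain gradient descent. The data, dimensions, and hyperparameters are arbitrary assumptions and do not correspond to any specific model in the review.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20))          # toy data: 256 samples, 20 features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_hidden, lr, epochs = 5, 0.01, 200
W = rng.normal(scale=0.1, size=(20, n_hidden))   # encoder weights (decoder uses W.T)
b = np.zeros(n_hidden)                           # encoder bias
c = np.zeros(20)                                 # decoder bias

for _ in range(epochs):
    H = sigmoid(X @ W + b)          # hidden representation (the learned "code")
    X_hat = H @ W.T + c             # linear reconstruction of the input
    err = X_hat - X                 # reconstruction error
    # Gradients of the mean squared reconstruction loss w.r.t. the parameters.
    dH = (err @ W) * H * (1 - H)
    dW = X.T @ dH + err.T @ H       # tied weights: encoder and decoder contributions
    W -= lr * dW / len(X)
    b -= lr * dH.sum(axis=0) / len(X)
    c -= lr * err.sum(axis=0) / len(X)

codes = sigmoid(X @ W + b)          # learned representation of the data
```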
Weighted Contrastive Divergence
Learning algorithms for energy-based Boltzmann architectures that rely on
gradient descent are in general computationally prohibitive, typically due to
the exponential number of terms involved in computing the partition function.
One therefore has to resort to approximation schemes for evaluating the
gradient. This is the case for Restricted Boltzmann Machines (RBMs) and their
learning algorithm, Contrastive Divergence (CD). It is well known that CD's
approximation to the gradient has a number of shortcomings. Overcoming these
defects has been the basis of much research, and new algorithms have been
devised, such as persistent CD. In this manuscript we propose a new algorithm
that we call Weighted CD (WCD), built from small modifications of the negative
phase in standard CD. However small these modifications may be, the
experimental work reported in this paper suggests that WCD provides a
significant improvement over standard CD and persistent CD at a small
additional computational cost.
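The abstract does not spell out the exact re-weighting used in WCD's negative phase, so the sketch below shows only standard CD-1 for a binary RBM in NumPy, with a comment marking where the negative-phase modification would enter; the mini-batch size, dimensions, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(V, W, b, c, lr=0.05):
    """One CD-1 gradient step for a binary RBM on a mini-batch V of shape (n, n_vis).

    W: (n_vis, n_hid) weights, b: visible biases, c: hidden biases.
    """
    # Positive phase: hidden probabilities driven by the data.
    ph_pos = sigmoid(V @ W + c)
    h_samp = (rng.random(ph_pos.shape) < ph_pos).astype(float)

    # Negative phase: one step of Gibbs sampling (the CD-1 reconstruction).
    pv_neg = sigmoid(h_samp @ W.T + b)
    v_samp = (rng.random(pv_neg.shape) < pv_neg).astype(float)
    ph_neg = sigmoid(v_samp @ W + c)

    # NOTE: WCD modifies this negative phase (e.g. by re-weighting the negative
    # samples); the exact scheme is not given in the abstract, so plain CD-1 is
    # shown here.
    n = len(V)
    W += lr * (V.T @ ph_pos - v_samp.T @ ph_neg) / n
    b += lr * (V - v_samp).mean(axis=0)
    c += lr * (ph_pos - ph_neg).mean(axis=0)
    return W, b, c

# Toy usage on random binary data.
V = (rng.random((64, 12)) < 0.5).astype(float)
W = rng.normal(scale=0.01, size=(12, 8))
b, c = np.zeros(12), np.zeros(8)
for _ in range(100):
    W, b, c = cd1_update(V, W, b, c)
```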
Inducing Features of Random Fields
We present a technique for constructing random fields from a set of training
samples. The learning paradigm builds increasingly complex fields by allowing
potential functions, or features, that are supported by increasingly large
subgraphs. Each feature has a weight that is trained by minimizing the
Kullback-Leibler divergence between the model and the empirical distribution of
the training data. A greedy algorithm determines how features are incrementally
added to the field and an iterative scaling algorithm is used to estimate the
optimal values of the weights.
The statistical modeling techniques introduced in this paper differ from
those common to much of the natural language processing literature since there
is no probabilistic finite state or push-down automaton on which the model is
built. Our approach also differs from the techniques common to the computer
vision literature in that the underlying random fields are non-Markovian and
have a large number of parameters that must be estimated. Relations to other
learning approaches, including decision trees and Boltzmann machines, are given.
As a demonstration of the method, we describe its application to the problem of
automatic word classification in natural language processing.
Key words: random field, Kullback-Leibler divergence, iterative scaling,
divergence geometry, maximum entropy, EM algorithm, statistical learning,
clustering, word morphology, natural language processing
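As a hedged illustration of the weight-estimation step described above, the sketch below runs generalized iterative scaling for a small feature-based exponential model, driving the model's feature expectations toward the empirical ones (equivalently, reducing the Kullback-Leibler divergence to the empirical distribution). The toy domain, features, and empirical distribution are assumptions, and the paper's greedy feature-induction stage is omitted.

```python
import numpy as np

# Toy domain of 6 outcomes with two hand-picked binary features.
X = np.arange(6)
F = np.array([[1.0 if x % 2 == 0 else 0.0,     # feature 1: x is even
               1.0 if x >= 3 else 0.0]         # feature 2: x >= 3
              for x in X])                     # shape (6, 2)

# GIS requires the features of every outcome to sum to a constant C,
# so append a slack feature that pads each row up to C.
C = 2.0
F = np.column_stack([F, C - F.sum(axis=1)])

p_emp = np.array([0.30, 0.05, 0.25, 0.10, 0.10, 0.20])   # empirical distribution
emp_expect = p_emp @ F                                     # empirical feature expectations

w = np.zeros(F.shape[1])                                   # feature weights
for _ in range(500):
    logits = F @ w
    p_model = np.exp(logits - logits.max())
    p_model /= p_model.sum()                               # model distribution p(x) ∝ exp(w·f(x))
    model_expect = p_model @ F
    # Multiplicative GIS update: nudge each weight toward matching expectations.
    w += np.log(emp_expect / model_expect) / C

print(np.round(p_model, 3))   # model expectations now approximately match the data
```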