Learning neural trans-dimensional random field language models with noise-contrastive estimation
Trans-dimensional random field language models (TRF LMs), in which sentences are
modeled as a collection of random fields, have shown performance close to that
of LSTM LMs in speech recognition while being computationally more efficient at
inference. However, the training efficiency of neural TRF LMs is not
satisfactory, which limits their scalability to large training corpora.
In this paper, several techniques in both model formulation and parameter
estimation are proposed to improve the training efficiency and performance
of neural TRF LMs. First, TRFs are reformulated in the form of exponential tilting
tilting of a reference distribution. Second, noise-contrastive estimation (NCE)
is introduced to jointly estimate the model parameters and normalization
constants. Third, we extend neural TRF LMs by integrating a deep
convolutional neural network (CNN) and a bidirectional LSTM into the
potential function to extract deep hierarchical features and
bidirectional sequential features. Together, these techniques enable
the successful and efficient training of neural TRF LMs on a 40x larger
training set with only 1/3 of the training time, and further reduce the WER
by a relative 4.7% on top of a strong LSTM LM baseline.
Comment: 5 pages and 2 figures
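As a rough illustration of the NCE technique described in this abstract, here is a minimal PyTorch sketch (our own toy code, not the paper's implementation; the function name nce_loss and its arguments are hypothetical) of the binary-classification loss in which the log normalisation constant log_z is learned jointly with the model parameters:

```python
import math

import torch
import torch.nn.functional as F

def nce_loss(log_score_data, log_q_data, log_score_noise, log_q_noise,
             log_z, nu):
    """Noise-contrastive estimation loss for an unnormalised model.

    log_score_* : unnormalised log-scores s_theta(x) from the model
    log_q_*     : log-densities of the same samples under the noise dist. q
    log_z       : learned scalar parameter, the log normalisation constant
    nu          : number of noise samples drawn per data sample
    """
    # Model log-density, with the normalisation constant as a free parameter.
    log_p_data = log_score_data - log_z
    log_p_noise = log_score_noise - log_z
    # Log-odds that a sample came from the data rather than from the noise.
    logit_data = log_p_data - (log_q_data + math.log(nu))
    logit_noise = log_p_noise - (log_q_noise + math.log(nu))
    # Binary-classification objective: data labelled 1, noise labelled 0.
    return -(F.logsigmoid(logit_data).mean()
             + nu * F.logsigmoid(-logit_noise).mean())
```

Minimising this loss pushes both the model parameters and log_z toward values at which data and noise samples become indistinguishable to the classifier, which is how NCE estimates the normalisation constant jointly with the parameters.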
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation, and manifold learning.
The Poisson transform for unnormalised statistical models
Contrary to standard statistical models, unnormalised statistical models only
specify the likelihood function up to a constant. While such models are natural
and popular, the lack of normalisation makes inference much more difficult.
Here we show that inferring the parameters of an unnormalised model on some
space can be mapped onto an equivalent problem of estimating the intensity
of a Poisson point process on that same space. The unnormalised statistical
model now specifies an intensity function that does not need to be normalised.
Effectively, the normalisation constant may now be inferred as just another
parameter, at no loss of information. The result extends to
non-IID models, including, for example, unnormalised models for sequences of
graphs (dynamical graphs) or for sequences of binary vectors. As a
consequence, we prove that unnormalised parametric inference in non-IID models
can be turned into a semi-parametric estimation problem. Moreover, we show that
the noise-contrastive estimation method of Gutmann & Hyvärinen (2012) can be
understood as an approximation of the Poisson transform, and extended to
non-IID settings. We use our results to fit spatial Markov chain models of eye
movements, where the Poisson transform allows us to turn a highly non-standard
model into vanilla semi-parametric logistic regression.
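The final sentence can be made concrete with a small sketch. The following NumPy example is our own toy construction (the distributions, sample sizes, and variable names are assumptions, not taken from the paper): an unnormalised exponential-family model p(x) ∝ exp(theta · t(x)) is fitted by a logistic regression that discriminates observed data from samples of a known reference density q, with the log normalisation constant absorbed into the intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
x_data = rng.normal(1.0, 0.7, size=5000)   # "observed" data
x_ref = rng.normal(0.0, 2.0, size=5000)    # samples from a reference q

x = np.concatenate([x_data, x_ref])
y = np.concatenate([np.ones(5000), np.zeros(5000)])
t = np.column_stack([x, x ** 2])           # sufficient statistics t(x)
# log-density of the reference q = N(0, 2^2), entering the logit as an offset
log_q = -0.5 * (x / 2.0) ** 2 - np.log(2.0) - 0.5 * np.log(2.0 * np.pi)

theta = np.zeros(2)                        # natural parameters of the model
c = 0.0                                    # intercept; converges to -log Z
for _ in range(5000):                      # plain gradient ascent on the
    logit = t @ theta + c - log_q          # logistic log-likelihood
    p = 1.0 / (1.0 + np.exp(-logit))       # P(label = "data" | x)
    theta += 0.05 * (t.T @ (y - p)) / len(y)
    c += 0.05 * np.sum(y - p) / len(y)
```

With equal numbers of data and reference points, the fitted intercept c estimates -log Z, so the normalisation constant is recovered as just another parameter of a vanilla logistic regression.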