Speeding up Context-based Sentence Representation Learning with Non-autoregressive Convolutional Decoding
Context plays an important role in human language understanding, so it may
also be useful for machines that learn vector representations of language. In
this paper, we explore an asymmetric encoder-decoder structure for unsupervised
context-based sentence representation learning. We carefully designed
experiments to show that neither an autoregressive decoder nor an RNN decoder
is required. After that, we designed a model which still keeps an RNN as the
encoder, while using a non-autoregressive convolutional decoder. We further
combine a suite of effective designs to significantly improve model efficiency
while also achieving better performance. Our model is trained on two different
large unlabelled corpora, and in both cases the transferability is evaluated on
a set of downstream NLP tasks. We empirically show that our model is simple and
fast while producing rich sentence representations that excel in downstream
tasks.
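As a rough illustration of what "non-autoregressive" means here, the minimal pure-Python sketch below predicts a word distribution for every output position from the encoder's sentence vector alone, so no position conditions on another position's predicted token. It rests on stated assumptions: the sentence vector `z` is random rather than produced by a trained RNN encoder, and the position embeddings `pos_emb`, the single kernel-size-3 convolution with ReLU, the output projection `out_proj` and all sizes are hypothetical choices, not the paper's architecture.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def conv1d_same(seq, kernels):
    """1-D convolution over positions with zero padding, followed by ReLU.

    seq: T vectors of size d; kernels: k matrices of shape d x d (k odd).
    Output t mixes input positions t - k//2 .. t + k//2 (inputs, never predictions).
    """
    half = len(kernels) // 2
    d = len(seq[0])
    out = []
    for t in range(len(seq)):
        acc = [0.0] * d
        for j, kern in enumerate(kernels):
            src = t + j - half
            if 0 <= src < len(seq):
                acc = [a + x for a, x in zip(acc, matvec(kern, seq[src]))]
        out.append([max(0.0, a) for a in acc])
    return out


def decode_all_positions(z, pos_emb, kernels, out_proj):
    """Non-autoregressive decoding: each position's word distribution depends only
    on the sentence vector z plus a position embedding, so positions are independent
    of one another's outputs and could be computed in parallel."""
    seq = [[zi + pi for zi, pi in zip(z, p)] for p in pos_emb]
    hidden = conv1d_same(seq, kernels)
    return [softmax(matvec(out_proj, h)) for h in hidden]


def nll(dists, target_ids):
    """Sum of per-position cross-entropies against the observed context words."""
    return -sum(math.log(d[y]) for d, y in zip(dists, target_ids))


# Usage with random, untrained parameters (all sizes are hypothetical).
random.seed(0)
dim, vocab, T = 4, 6, 5


def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]


z = [random.gauss(0, 1) for _ in range(dim)]  # stand-in for the RNN encoder output
pos_emb = rand_matrix(T, dim)
kernels = [rand_matrix(dim, dim) for _ in range(3)]
out_proj = rand_matrix(vocab, dim)
dists = decode_all_positions(z, pos_emb, kernels, out_proj)
loss = nll(dists, [0, 1, 2, 3, 4])
print(len(dists), round(loss, 3))
```

The design choice the sketch highlights is that the loop over positions never feeds a prediction back in, which is what removes the sequential bottleneck of an autoregressive decoder.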
Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input
Non-autoregressive translation (NAT) models, which remove the dependence on
previous target tokens from the inputs of the decoder, achieve significant
inference speedup but at the cost of inferior accuracy compared to
autoregressive translation (AT) models. Previous work shows that the quality of
the inputs of the decoder is important and largely impacts the model accuracy.
In this paper, we propose two methods to enhance the decoder inputs so as to
improve NAT models. The first one directly leverages a phrase table generated
by conventional SMT approaches to translate source tokens to target tokens,
which are then fed into the decoder as inputs. The second one transforms
source-side word embeddings to target-side word embeddings through
sentence-level alignment and word-level adversary learning, and then feeds the
transformed word embeddings into the decoder as inputs. Experimental results
show that our method largely outperforms the NAT baseline (Gu et al., 2017) in
BLEU score on the WMT14 English-German task and the WMT16
English-Romanian task. Comment: AAAI 201
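The first method can be pictured as a lookup that turns source tokens into target-side tokens before decoding. The minimal sketch below assumes a toy, hand-written phrase table (`phrase_table` and its entries are hypothetical, not extracted by an SMT system), greedy longest-match segmentation of the source, and copying of tokens that have no entry; the paper's actual segmentation and fallback rules may differ.

```python
def phrase_table_decoder_inputs(source_tokens, phrase_table):
    """Translate source tokens into target tokens to use as NAT decoder inputs.

    Greedy longest-match over source n-grams; tokens with no entry are copied
    through unchanged (an assumption made for this sketch).
    """
    max_len = max((len(key) for key in phrase_table), default=1)
    out, i = [], 0
    while i < len(source_tokens):
        for n in range(min(max_len, len(source_tokens) - i), 0, -1):
            key = tuple(source_tokens[i:i + n])
            if key in phrase_table:
                out.extend(phrase_table[key])
                i += n
                break
        else:
            # no phrase starts at position i: copy the source token
            out.append(source_tokens[i])
            i += 1
    return out


# Hypothetical toy English-to-German phrase table.
phrase_table = {
    ("good", "morning"): ["guten", "Morgen"],
    ("good",): ["gut"],
    ("world",): ["Welt"],
}
decoder_inputs = phrase_table_decoder_inputs("good morning world !".split(), phrase_table)
print(decoder_inputs)  # ['guten', 'Morgen', 'Welt', '!']
```

Longest match is preferred here so that the two-word entry wins over the single-word `("good",)` entry.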
Boosting Monte Carlo simulations of spin glasses using autoregressive neural networks
Autoregressive neural networks are emerging as a powerful computational
tool to solve relevant problems in classical and quantum mechanics. One of
their appealing functionalities is that, after they have learned a probability
distribution from a dataset, they allow exact and efficient sampling of typical
system configurations. Here we employ a neural autoregressive distribution
estimator (NADE) to boost Markov chain Monte Carlo (MCMC) simulations of a
paradigmatic classical model of spin-glass theory, namely the two-dimensional
Edwards-Anderson Hamiltonian. We show that a NADE can be trained to accurately
mimic the Boltzmann distribution using unsupervised learning from system
configurations generated using standard MCMC algorithms. The trained NADE is
then employed as a smart proposal distribution for the Metropolis-Hastings
algorithm. This allows us to perform efficient MCMC simulations that provide
unbiased results even if the probability distribution learned by the NADE is
not exact. Notably, we implement a
sequential tempering procedure, whereby a NADE trained at a higher temperature
is iteratively employed as the proposal distribution in an MCMC simulation run at a
slightly lower temperature. This allows one to efficiently simulate the
spin-glass model even in the low-temperature regime, avoiding the divergent
correlation times that plague MCMC simulations driven by local-update
algorithms. Furthermore, we show that the NADE-driven simulations quickly
sample ground-state configurations, paving the way to their future utilization
to tackle binary optimization problems. Comment: 13 pages, 14 figures
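The sampling scheme can be sketched as an independence Metropolis-Hastings sampler: a NADE draws a whole configuration exactly by ancestral sampling and also returns its log-probability q, which enters the acceptance ratio min(1, exp(-βΔE)·q(x)/q(x')). In this pure-Python sketch the NADE parameters are random rather than trained on MCMC data, spins take values ±1, and the lattice size `L`, hidden size, couplings and β are hypothetical; the sequential tempering loop (retraining at each temperature) is omitted. Because the acceptance ratio corrects for q, the chain still targets the Boltzmann distribution with this untrained proposal, just with a lower acceptance rate.

```python
import math
import random


def sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class TinyNADE:
    """NADE over n spins in {+1, -1} with h hidden units.

    Parameters are random here; they stand in for a NADE trained on MCMC samples.
    """

    def __init__(self, n, h, rng):
        self.n = n
        self.W = [[rng.gauss(0, 0.1) for _ in range(n)] for _ in range(h)]  # h x n input weights
        self.c = [0.0] * h                                                   # hidden bias
        self.V = [[rng.gauss(0, 0.1) for _ in range(h)] for _ in range(n)]  # n x h output weights
        self.b = [0.0] * n                                                   # output bias

    def _walk(self, spins=None, rng=None):
        # One ancestral pass: sample spins (spins is None) or score the given spins.
        a = list(self.c)  # running pre-activation c + sum_{k<i} W[:, k] * s_k
        out, logq = [], 0.0
        for i in range(self.n):
            hid = [sigmoid(x) for x in a]
            p_up = sigmoid(self.b[i] + sum(v * hj for v, hj in zip(self.V[i], hid)))
            s = spins[i] if spins is not None else (1 if rng.random() < p_up else -1)
            logq += math.log(p_up if s == 1 else 1.0 - p_up)
            out.append(s)
            a = [aj + self.W[j][i] * s for j, aj in enumerate(a)]
        return out, logq

    def sample(self, rng):
        return self._walk(rng=rng)

    def log_prob(self, spins):
        return self._walk(spins=spins)[1]


def ea_energy(spins, L, Jh, Jv):
    """2-D Edwards-Anderson energy, periodic boundaries: E = -sum_<ij> J_ij s_i s_j."""
    e = 0.0
    for r in range(L):
        for c in range(L):
            s = spins[r * L + c]
            e -= Jh[r][c] * s * spins[r * L + (c + 1) % L]
            e -= Jv[r][c] * s * spins[((r + 1) % L) * L + c]
    return e


def nade_mh(nade, beta, L, Jh, Jv, steps, rng):
    """Independence Metropolis-Hastings with the NADE as proposal distribution."""
    x, logq_x = nade.sample(rng)
    e_x = ea_energy(x, L, Jh, Jv)
    accepted = 0
    for _ in range(steps):
        y, logq_y = nade.sample(rng)
        e_y = ea_energy(y, L, Jh, Jv)
        # log of [pi(y) q(x)] / [pi(x) q(y)] with pi proportional to exp(-beta E)
        log_a = -beta * (e_y - e_x) + logq_x - logq_y
        if log_a >= 0 or rng.random() < math.exp(log_a):
            x, logq_x, e_x = y, logq_y, e_y
            accepted += 1
    return x, e_x, accepted / steps


rng = random.Random(1)
L = 4  # hypothetical 4 x 4 lattice
Jh = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]  # bond (r,c)-(r,c+1)
Jv = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]  # bond (r,c)-(r+1,c)
nade = TinyNADE(L * L, 8, rng)
x, e, acc_rate = nade_mh(nade, beta=0.5, L=L, Jh=Jh, Jv=Jv, steps=200, rng=rng)
print(e, acc_rate)
```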
A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data
Topic modeling based on latent Dirichlet allocation (LDA) has been a
framework of choice for dealing with multimodal data, such as in image annotation
tasks. Another popular approach to model the multimodal data is through deep
neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type
of topic model called the Document Neural Autoregressive Distribution Estimator
(DocNADE) was proposed and demonstrated state-of-the-art performance for text
document modeling. In this work, we show how to successfully apply and extend
this model to multimodal data, such as simultaneous image classification and
annotation. First, we propose SupDocNADE, a supervised extension of DocNADE,
that increases the discriminative power of the learned hidden topic features
and show how to employ it to learn a joint representation from image visual
words, annotation words and class label information. We test our model on the
LabelMe and UIUC-Sports data sets and show that it compares favorably to other
topic models. Second, we propose a deep extension of our model and provide an
efficient way of training the deep model. Experimental results show that our
deep model outperforms its shallow version and reaches state-of-the-art
performance on the Multimedia Information Retrieval (MIR) Flickr data set. Comment: 24 pages, 10 figures. A version was accepted by TPAMI on Aug
4th, 2015. Added a footnote about how to train the model in practice in Section
5.1. arXiv admin note: substantial text overlap with arXiv:1305.530
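DocNADE models a document autoregressively, p(v) = prod_i p(v_i | v_<i), with a hidden vector h_i = sigmoid(c + sum_{k<i} W[:, v_k]) that can be updated incrementally word by word. The sketch below computes log p(v) for one document under assumptions: visual words and annotation words are taken to be already mapped into one vocabulary of integer ids, the parameters are random and untrained, the sizes are hypothetical, and a full softmax over the vocabulary is used as a simplification. SupDocNADE's supervised term and the deep extension are omitted.

```python
import math
import random


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def docnade_log_likelihood(doc, W, c, V, b):
    """log p(doc) = sum_i log p(v_i | v_<i) for a DocNADE-style model.

    doc: word ids in some order; W: H x K input weights (column W[:, w] per word);
    c: hidden bias (H); V: K x H output weights; b: output bias (K).
    A full softmax over the K words is used (a simplification).
    """
    acc = list(c)  # pre-activation c + sum_{k<i} W[:, v_k], updated word by word
    total = 0.0
    for w in doc:
        h = [sigmoid(a) for a in acc]
        logits = [bk + sum(vkj * hj for vkj, hj in zip(V[k], h)) for k, bk in enumerate(b)]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += logits[w] - log_z
        acc = [aj + W[j][w] for j, aj in enumerate(acc)]
    return total


rng = random.Random(0)
K, H = 10, 5  # hypothetical vocabulary size (visual + annotation words) and hidden size
W = [[rng.gauss(0, 0.1) for _ in range(K)] for _ in range(H)]
V = [[rng.gauss(0, 0.1) for _ in range(H)] for _ in range(K)]
c, b = [0.0] * H, [0.0] * K
doc = [3, 7, 7, 1]  # a document as a sequence of word ids
ll = docnade_log_likelihood(doc, W, c, V, b)
print(ll)
```

Because each conditional is a normalized softmax, the probabilities of all documents of a given length sum to one, which makes a handy sanity check.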