Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
Inspired by how humans summarize long documents, we propose an accurate and
fast summarization model that first selects salient sentences and then rewrites
them abstractively (i.e., compresses and paraphrases) to generate a concise
overall summary. We use a novel sentence-level policy gradient method to bridge
the non-differentiable computation between these two neural networks in a
hierarchical way, while maintaining language fluency. Empirically, we achieve
the new state-of-the-art on all metrics (including human evaluation) on the
CNN/Daily Mail dataset, as well as significantly higher abstractiveness scores.
Moreover, by first operating at the sentence-level and then the word-level, we
enable parallel decoding of our neural generative model that results in
substantially faster (10-20x) inference speed as well as 4x faster training
convergence than previous long-paragraph encoder-decoder models. We also
demonstrate the generalization of our model on the test-only DUC-2002 dataset,
where we achieve higher scores than a state-of-the-art model.
Comment: ACL 2018 (17 pages)
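As a minimal sketch of the sentence-level policy gradient described above (assuming a PyTorch setting; the extractor, reward function, and shapes are illustrative, not the authors' implementation), the following shows how a non-differentiable reward such as ROUGE can still update the sentence selector via REINFORCE with a baseline:

    import torch

    def reinforce_step(sentence_scores, reward_fn, baseline=0.0):
        # sentence_scores: unnormalized extractor scores, shape (num_sentences,)
        probs = torch.softmax(sentence_scores, dim=-1)
        dist = torch.distributions.Categorical(probs)
        idx = dist.sample()                              # sample a salient sentence
        reward = reward_fn(idx.item())                   # e.g. ROUGE of the abstractor's rewrite
        loss = -dist.log_prob(idx) * (reward - baseline)
        return loss, idx

    scores = torch.randn(5, requires_grad=True)          # toy document with 5 sentences
    loss, picked = reinforce_step(scores, lambda i: 0.4 if i == 2 else 0.1)
    loss.backward()                                      # gradients reach the extractor despite the discrete choice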
Towards Robust Neural Vocoding for Speech Generation: A Survey
Recently, neural vocoders have been widely used in speech synthesis tasks,
including text-to-speech and voice conversion. However, when encountering data
distribution mismatch between training and inference, neural vocoders trained
on real data often degrade in voice quality for unseen scenarios. In this
paper, we train four common neural vocoders, namely WaveNet, WaveRNN,
FFTNet, and Parallel WaveGAN, alternately on five different datasets. To study the
robustness of neural vocoders, we evaluate the models using acoustic features
from seen/unseen speakers, seen/unseen languages, a text-to-speech model, and a
voice conversion model. We find that speaker variety is much more important
than language variety for achieving a universal vocoder. Through our
experiments, we show that WaveNet and WaveRNN are more suitable for
text-to-speech models, while Parallel WaveGAN is more suitable for voice
conversion applications. A large set of subjective MOS naturalness results for
all vocoders is presented for future studies.
Comment: Submitted to INTERSPEECH 202
Pragmatic Neural Language Modelling in Machine Translation
This paper presents an in-depth investigation into integrating neural language
models in translation systems. Scaling neural language models is a difficult
task, but crucial for real-world applications. This paper evaluates the impact
on end-to-end MT quality of both new and existing scaling techniques. We show
when explicitly normalising neural models is necessary and what optimisation
tricks one should use in such scenarios. We also focus on scalable training
algorithms and investigate noise contrastive estimation and diagonal contexts
as sources for further speed improvements. We explore the trade-offs between
neural models and back-off n-gram models and find that neural models make
strong candidates for natural language applications in memory constrained
environments, yet still lag behind traditional models in raw translation
quality. We conclude with a set of recommendations one should follow to build a
scalable neural language model for MT.
Comment: NAACL 201
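The noise contrastive estimation objective mentioned above can be sketched roughly as follows (a simplified, self-contained version with a uniform noise distribution and scalar scores; not the paper's implementation): the true next word is scored against k sampled noise words, avoiding a softmax over the full vocabulary.

    import math
    import torch
    import torch.nn.functional as F

    def nce_loss(target_score, noise_scores, log_q_target, log_q_noise, k):
        # Scores are unnormalized model logits s(w | context).
        log_k = math.log(k)
        pos = F.logsigmoid(target_score - (log_k + log_q_target))          # true word labelled "data"
        neg = F.logsigmoid(-(noise_scores - (log_k + log_q_noise))).sum()  # noise words labelled "noise"
        return -(pos + neg)

    k, vocab = 10, 1000
    log_q = math.log(1.0 / vocab)                     # uniform noise distribution over the vocabulary
    loss = nce_loss(torch.tensor(2.3), torch.randn(k),
                    log_q, torch.full((k,), log_q), k)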
From English To Foreign Languages: Transferring Pre-trained Language Models
Pre-trained models have demonstrated their effectiveness in many downstream
natural language processing (NLP) tasks. The availability of multilingual
pre-trained models enables zero-shot transfer of NLP tasks from high resource
languages to low resource ones. However, recent research in improving
pre-trained models focuses heavily on English. While it is possible to train
the latest neural architectures for other languages from scratch, it is
undesirable due to the required amount of compute. In this work, we tackle the
problem of transferring an existing pre-trained model from English to other
languages under a limited computational budget. With a single GPU, our approach
can obtain a foreign BERT base model within a day and a foreign BERT large
within two days. Furthermore, evaluating our models on six languages, we
demonstrate that our models are better than multilingual BERT on two zero-shot
tasks: natural language inference and dependency parsing.
Discrete Flows: Invertible Generative Models of Discrete Data
While normalizing flows have led to significant advances in modeling
high-dimensional continuous distributions, their applicability to discrete
distributions remains unknown. In this paper, we show that flows can in fact be
extended to discrete events, using a simple change-of-variables formula that
requires no log-determinant-Jacobian computations. Discrete flows have
numerous applications. We consider two flow architectures: discrete
autoregressive flows that enable bidirectionality, allowing, for example,
tokens in text to depend on both left-to-right and right-to-left contexts in an
exact language model; and discrete bipartite flows that enable efficient
non-autoregressive generation as in RealNVP. Empirically, we find that discrete
autoregressive flows outperform autoregressive baselines on synthetic discrete
distributions, an addition task, and Potts models; and bipartite flows can
obtain competitive performance with autoregressive baselines on character-level
language modeling for Penn Treebank and text8.
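To make the bipartite construction concrete, here is a toy sketch under our own simplifying assumptions (a modular shift predicted by a small linear layer, vocabulary size K): the coupling layer below is exactly invertible over discrete symbols with no Jacobian term, which is the property the abstract relies on.

    import torch
    import torch.nn as nn

    K = 16  # vocabulary size (illustrative)

    class DiscreteCoupling(nn.Module):
        def __init__(self, dim_a, dim_b):
            super().__init__()
            self.shift_net = nn.Linear(dim_a, dim_b)      # predicts an integer shift per position

        def shift(self, conditioning):
            return self.shift_net(conditioning.float()).round().long() % K

        def forward(self, xa, xb):
            return xa, (xb + self.shift(xa)) % K          # y_b = (x_b + mu(x_a)) mod K

        def inverse(self, ya, yb):
            return ya, (yb - self.shift(ya)) % K          # exact inverse, no log-det-Jacobian needed

    flow = DiscreteCoupling(dim_a=4, dim_b=4)
    xa, xb = torch.randint(K, (2, 4)), torch.randint(K, (2, 4))
    ya, yb = flow(xa, xb)
    _, xb_rec = flow.inverse(ya, yb)
    assert torch.equal(xb_rec, xb)                        # invertibility check

The autoregressive variant described in the abstract conditions the shift on previously generated tokens rather than on a fixed split of the sequence.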
The ADAPT System Description for the IWSLT 2018 Basque to English Translation Task
In this paper we present the ADAPT system built for the Basque to English Low
Resource MT Evaluation Campaign. Basque is a low-resourced,
morphologically-rich language. This poses a challenge for Neural Machine
Translation models which usually achieve better performance when trained with
large sets of data.
Accordingly, we used synthetic data to improve the translation quality
produced by a model built using only authentic data. Our proposal uses
back-translated data to: (a) create new sentences, so the system can be trained
with more data; and (b) translate sentences that are close to the test set, so
the model can be fine-tuned to the document to be translated.
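A brief sketch of the back-translation recipe described above, with a placeholder reverse model (the translate callable is hypothetical, not the ADAPT system's interface): English monolingual sentences are translated into Basque by an English-to-Basque model, and the resulting synthetic pairs augment the authentic training data for the Basque-to-English system.

    def build_synthetic_corpus(english_monolingual, reverse_translate):
        # reverse_translate: an EN -> EU model; any callable works for this sketch.
        pairs = []
        for en_sentence in english_monolingual:
            eu_synthetic = reverse_translate(en_sentence)    # back-translate into Basque
            pairs.append((eu_synthetic, en_sentence))        # synthetic source, authentic target
        return pairs

    # Toy usage with a stub "model" so the sketch runs end to end.
    corpus = build_synthetic_corpus(["the house is red", "it rains a lot"],
                                    lambda s: "<eu> " + s)

For step (b), the same procedure restricted to sentences close to the test document would supply the data used for fine-tuning.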
An Analysis of Neural Language Modeling at Multiple Scales
Many of the leading approaches in language modeling introduce novel, complex
and specialized architectures. We take existing state-of-the-art word level
language models based on LSTMs and QRNNs and extend them to both larger
vocabularies and character-level granularity. When properly tuned, LSTMs
and QRNNs achieve state-of-the-art results on character-level (Penn Treebank,
enwik8) and word-level (WikiText-103) datasets, respectively. Results are
obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single
modern GPU.
A Stable and Effective Learning Strategy for Trainable Greedy Decoding
Beam search is a widely used approximate search strategy for neural network
decoders, and it generally outperforms simple greedy decoding on tasks like
machine translation. However, this improvement comes at substantial
computational cost. In this paper, we propose a flexible new method that allows
us to reap nearly the full benefits of beam search with nearly no additional
computational cost. The method revolves around a small neural network actor
that is trained to observe and manipulate the hidden state of a
previously-trained decoder. To train this actor network, we introduce the use
of a pseudo-parallel corpus built using the output of beam search on a base
model, ranked by a target quality metric like BLEU. Our method is inspired by
earlier work on this problem, but requires no reinforcement learning, and can
be trained reliably on a range of models. Experiments on three parallel corpora
and three architectures show that the method yields substantial improvements in
translation quality and speed over each base system.
Comment: Accepted by EMNLP 201
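The core mechanism summarised above can be sketched as follows, with made-up sizes and a GRU cell standing in for the frozen, previously trained decoder (training the actor on the beam-search pseudo-parallel corpus is omitted): a small actor network observes the decoder's hidden state and nudges it before each greedy step.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Actor(nn.Module):
        def __init__(self, hidden_size):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh(),
                                     nn.Linear(hidden_size, hidden_size))

        def forward(self, hidden):
            return hidden + self.net(hidden)               # small learned correction to the state

    hidden_size, vocab = 8, 20
    decoder_cell = nn.GRUCell(vocab, hidden_size)          # stand-in for the frozen base decoder
    out_proj = nn.Linear(hidden_size, vocab)
    actor = Actor(hidden_size)

    h = torch.zeros(1, hidden_size)
    tok = F.one_hot(torch.tensor([0]), vocab).float()      # <bos>, simplified to a one-hot input
    for _ in range(5):                                     # plain greedy decoding loop
        h = decoder_cell(tok, h)
        h = actor(h)                                       # actor manipulates the hidden state
        tok = F.one_hot(out_proj(h).argmax(-1), vocab).float()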
Rethinking Full Connectivity in Recurrent Neural Networks
Recurrent neural networks (RNNs) are omnipresent in sequence modeling tasks.
Practical models usually consist of several layers of hundreds or thousands of
neurons which are fully connected. This places a heavy computational and memory
burden on hardware, restricting adoption in practical low-cost and low-power
devices. Compared to fully convolutional models, the costly sequential
operation of RNNs severely hinders performance on parallel hardware. This paper
challenges the convention of full connectivity in RNNs. We study structurally
sparse RNNs, showing that they are well suited for acceleration on parallel
hardware, with a greatly reduced cost of the recurrent operations as well as
orders of magnitude fewer recurrent weights. Extensive experiments on
challenging tasks ranging from language modeling and speech recognition to
video action recognition reveal that structurally sparse RNNs achieve
performance competitive with fully connected networks. This allows
for using large sparse RNNs for a wide range of real-world tasks that
previously were too costly with fully connected networks.
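One simple instance of the structured sparsity discussed above (our own illustrative pattern, not necessarily the one used in the paper) is a fixed block-diagonal mask on the recurrent weight matrix, which removes most recurrent connections while keeping dense blocks that map well onto parallel hardware:

    import torch
    import torch.nn as nn

    hidden, blocks = 16, 4
    cell = nn.RNNCell(hidden, hidden)

    # Block-diagonal mask: only within-block recurrent connections survive.
    mask = torch.zeros(hidden, hidden)
    step = hidden // blocks
    for b in range(blocks):
        mask[b * step:(b + 1) * step, b * step:(b + 1) * step] = 1.0

    with torch.no_grad():
        cell.weight_hh.mul_(mask)                # structurally prune the recurrent weights

    x, h = torch.randn(2, hidden), torch.zeros(2, hidden)
    h = cell(x, h)                               # forward pass with sparse recurrence

In training, the mask would be re-applied (or the gradient masked) after every update so that pruned connections stay at zero.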
Unsupervised Neural Machine Translation Initialized by Unsupervised Statistical Machine Translation
Recent work achieved remarkable results in training neural machine
translation (NMT) systems in a fully unsupervised way, with new and dedicated
architectures that rely on monolingual corpora only. In this work, we propose
to define unsupervised NMT (UNMT) as NMT trained with the supervision of
synthetic bilingual data. Our approach straightforwardly enables the use of
state-of-the-art architectures proposed for supervised NMT by replacing
human-made bilingual data with synthetic bilingual data for training. We
propose to initialize the training of UNMT with synthetic bilingual data
generated by unsupervised statistical machine translation (USMT). The UNMT
system is then incrementally improved using back-translation. Our preliminary
experiments show that our approach achieves a new state-of-the-art for
unsupervised machine translation on the WMT16 German-English news translation
task, for both translation directions.
Comment: preliminary work
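A high-level sketch of the training schedule described above, with placeholder callables rather than real toolkit APIs (every "model" here is a stub): initialize the NMT systems on synthetic bilingual data from unsupervised SMT, then alternate back-translation rounds in the two directions.

    def train_unmt(mono_src, mono_tgt, usmt_translate, train_nmt, rounds=3):
        # Step 1: initialization with synthetic bilingual data from unsupervised SMT.
        synthetic = [(s, usmt_translate(s)) for s in mono_src]
        fwd = train_nmt(synthetic)                              # src -> tgt model
        bwd = train_nmt([(t, s) for s, t in synthetic])         # tgt -> src model
        # Step 2: incremental improvement with back-translation, alternating directions.
        for _ in range(rounds):
            fwd = train_nmt([(bwd(t), t) for t in mono_tgt])    # synthetic source, authentic target
            bwd = train_nmt([(fwd(s), s) for s in mono_src])    # synthetic target, authentic source
        return fwd, bwd

    # Toy usage: stub translator and trainer keep the sketch runnable.
    fwd, bwd = train_unmt(["ein haus"], ["a house"],
                          usmt_translate=lambda s: "~" + s,
                          train_nmt=lambda pairs: (lambda x: "~" + x))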