A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification
Text summarization and text simplification are two major ways to simplify the
text for poor readers, including children, non-native speakers, and the
functionally illiterate. Text summarization is to produce a brief summary of
the main ideas of the text, while text simplification aims to reduce the
linguistic complexity of the text and retain the original meaning. Recently,
most approaches for text summarization and text simplification are based on the
sequence-to-sequence model, which achieves much success in many text generation
tasks. However, although the generated simplified texts are similar to source
texts literally, they have low semantic relevance. In this work, our goal is to
improve semantic relevance between source texts and simplified texts for text
summarization and text simplification. We introduce a Semantic Relevance Based
neural model to encourage high semantic similarity between texts and summaries.
In our model, the source text is represented by a gated attention encoder,
while the summary representation is produced by the decoder. In addition, the
similarity score between the two representations is maximized during training. Our
experiments show that the proposed model outperforms state-of-the-art
systems on two benchmark corpora.
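As a concrete illustration of the similarity objective described above, here is a minimal PyTorch sketch (not the authors' code): the usual cross-entropy loss is combined with a cosine-similarity term between the source representation and the summary representation, so that maximizing similarity lowers the loss. The tensor shapes and the weight `lambda_sim` are assumptions made for the example.

```python
# Minimal sketch (assumption): combine sequence cross-entropy with a
# cosine-similarity term between source and summary representations.
import torch
import torch.nn.functional as F

def semantic_relevance_loss(logits, target_ids, src_repr, summary_repr,
                            pad_id=0, lambda_sim=0.5):
    """logits: (batch, tgt_len, vocab); target_ids: (batch, tgt_len);
    src_repr / summary_repr: (batch, hidden) sentence-level vectors."""
    # Standard word-level cross entropy over the reference summary.
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         target_ids.reshape(-1), ignore_index=pad_id)
    # Encourage high cosine similarity between the two representations.
    sim = F.cosine_similarity(src_repr, summary_repr, dim=-1).mean()
    # Maximizing similarity is expressed by subtracting it from the loss.
    return ce - lambda_sim * sim
```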
Lock-Free Parallel Perceptron for Graph-based Dependency Parsing
Dependency parsing is an important NLP task. A popular approach to
dependency parsing is the structured perceptron. However, graph-based dependency
parsing has a time complexity of $O(n^3)$, so it suffers from slow training.
To deal with this problem, we propose a parallel algorithm called parallel
perceptron. The parallel algorithm can make full use of a multi-core computer
which saves a lot of training time. In experiments, we observe that
dependency parsing with the parallel perceptron achieves an 8-fold training
speed-up over traditional structured perceptron methods when using 10 threads,
with no loss at all in accuracy.
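The lock-free training loop can be pictured with a short sketch. The following Python code is illustrative only: `decode` and `features` are placeholder stand-ins for the parser's real inference and feature extraction, and in CPython the threads mainly illustrate the algorithm rather than reproduce the reported speed-up (which needs an implementation without a global interpreter lock).

```python
# Hedged sketch of a lock-free (Hogwild-style) structured perceptron:
# several threads update one shared weight vector without locking.
import threading
import numpy as np

def features(x, y, dim):
    # Placeholder feature extractor: hash (input, part) pairs into indices.
    vec = np.zeros(dim)
    for part in y:
        vec[hash((x, part)) % dim] += 1.0
    return vec

def decode(x, w, candidates):
    # Placeholder decoder: pick the highest-scoring candidate structure.
    return max(candidates, key=lambda y: w @ features(x, y, w.size))

def worker(shard, w, candidates):
    for x, gold in shard:
        pred = decode(x, w, candidates)
        if pred != gold:
            # Lock-free update: occasional races are tolerated by design.
            w += features(x, gold, w.size) - features(x, pred, w.size)

def train_parallel(data, candidates, dim=2**16, n_threads=4):
    w = np.zeros(dim)
    shards = [data[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=worker, args=(s, w, candidates))
               for s in shards]
    for t in threads: t.start()
    for t in threads: t.join()
    return w
```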
A Generic Online Parallel Learning Framework for Large Margin Models
To speed up the training process, many existing systems use parallel
technology for online learning algorithms. However, most research focuses mainly
on stochastic gradient descent (SGD) rather than on other algorithms. We propose a
generic online parallel learning framework for large margin models, and also
analyze our framework on popular large margin algorithms, including MIRA and
Structured Perceptron. Our framework is lock-free and easy to implement on
existing systems. Experiments show that systems with our framework gain a
near-linear speed-up as the number of running threads increases, with no loss in
accuracy.
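For concreteness, the update each worker thread would apply in such a lock-free framework can be sketched as a single-instance 1-best MIRA step; the function below is an illustrative sketch under that assumption, not the framework's actual code, and the feature vectors and structured loss are assumed to be supplied by the task.

```python
# Sketch of a single-instance 1-best MIRA (passive-aggressive) update,
# the kind of step each worker thread could apply without locking.
import numpy as np

def mira_update(w, feat_gold, feat_pred, structured_loss, C=1.0):
    """One MIRA step moving the weights toward the gold structure."""
    delta = feat_gold - feat_pred          # difference of feature vectors
    margin = w @ delta                     # current score margin
    hinge = max(0.0, structured_loss - margin)
    denom = delta @ delta
    if denom > 0.0:
        tau = min(C, hinge / denom)        # clipped step size
        w += tau * delta                   # in-place, lock-free style update
    return w
```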
Decoding-History-Based Adaptive Control of Attention for Neural Machine Translation
The attention-based sequence-to-sequence model has proved successful in Neural
Machine Translation (NMT). However, attention that does not take the decoding
history into account, that is, the past information in the decoder and the
attention mechanism, often causes much repetition. To address this problem, we
propose the decoding-history-based Adaptive Control of Attention (ACA) for the
NMT model. ACA learns to control the attention by keeping track of the decoding
history and the current information with a memory vector, so that the model can
take the translated contents and the current information into consideration.
Experiments on Chinese-English and English-Vietnamese
translation demonstrate that our model significantly outperforms
strong baselines. The analysis shows that our model is capable of generating
translations with less repetition and higher accuracy. The code will be
available at https://github.com/lancopk
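One plausible way to realize history-controlled attention is sketched below for illustration only: a memory vector accumulates past attention context, and a gate computed from the memory and the current decoder state rescales the attention query. The paper's exact gating may differ; all module and variable names here are assumptions.

```python
# Illustrative sketch of attention gated by a memory of the decoding history.
import torch
import torch.nn as nn

class HistoryGatedAttention(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, dec_state, enc_outputs, memory):
        # dec_state: (batch, hidden); enc_outputs: (batch, src_len, hidden);
        # memory: (batch, hidden) running record of past attention context.
        control = torch.sigmoid(self.gate(torch.cat([dec_state, memory], dim=-1)))
        query = dec_state * control                        # history-controlled query
        scores = torch.bmm(enc_outputs, query.unsqueeze(-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)
        new_memory = memory + context                      # update decoding history
        return context, weights, new_memory
```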
meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
We propose a simple yet effective technique for neural network learning. The
forward propagation is computed as usual. In back propagation, only a small
subset of the full gradient is computed to update the model parameters. The
gradient vectors are sparsified in such a way that only the top-$k$ elements
(in terms of magnitude) are kept. As a result, only $k$ rows or columns
(depending on the layout) of the weight matrix are modified, leading to a
linear reduction ($k$ divided by the vector dimension) in the computational
cost. Surprisingly, experimental results demonstrate that we can update only
1-4% of the weights at each back propagation pass. This does not result in a
larger number of training iterations. More interestingly, the accuracy of the
resulting models is actually improved rather than degraded, and a detailed
analysis is given. The code is available at https://github.com/lancopku/meProp.
Comment: Accepted by the 34th International Conference on Machine Learning
(ICML 2017).
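The core of the technique can be pictured as a tiny autograd function that keeps only the top-k entries of the gradient flowing back through a layer's output. The code below is a generic PyTorch illustration rather than the released implementation (see the repository above for that); the value of k is an example setting.

```python
# Minimal sketch of top-k gradient sparsification in the spirit of meProp.
import torch

class TopKGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, k):
        ctx.k = k
        return x.clone()                   # forward pass is unchanged

    @staticmethod
    def backward(ctx, grad_out):
        k = ctx.k
        # Keep only the k largest-magnitude gradient entries per example.
        flat = grad_out.reshape(grad_out.size(0), -1)
        topk = flat.abs().topk(k, dim=1).indices
        mask = torch.zeros_like(flat).scatter_(1, topk, 1.0)
        return (mask * flat).reshape_as(grad_out), None

def meprop_linear(linear, x, k=8):
    # Only the gradient flowing back through the layer output is sparsified.
    return TopKGrad.apply(linear(x), k)
```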
Unsupervised Machine Commenting with Neural Variational Topic Model
Article comments can provide supplementary opinions and facts for readers,
thereby increasing the attractiveness and engagement of articles. Therefore,
automatic commenting is helpful in improving the activeness of
communities such as online forums and news websites. Previous work shows that
training an automatic commenting system requires large parallel corpora.
Although some articles are naturally paired with comments on some
websites, most articles and comments on the Internet are unpaired. To fully
exploit the unpaired data, we completely remove the need for parallel data and
propose a novel unsupervised approach to train an automatic article commenting
model, relying on nothing but unpaired articles and comments. Our model is
based on a retrieval-based commenting framework, which uses news to retrieve
comments based on the similarity of their topics. The topic representation is
obtained from a neural variational topic model, which is trained in an
unsupervised manner. We evaluate our model on a news comment dataset.
Experiments show that our proposed topic-based approach significantly
outperforms previous lexicon-based models. The model also benefits from paired
corpora and achieves state-of-the-art performance in semi-supervised
scenarios.
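The retrieval step itself is straightforward once topic vectors are available. The sketch below assumes the vectors have already been produced by a trained topic model and simply ranks candidate comments by cosine similarity to the article's topic; function and variable names are illustrative assumptions.

```python
# Sketch of topic-based comment retrieval: rank comments by cosine similarity.
import numpy as np

def retrieve_comments(article_topic, comment_topics, comments, top_n=5):
    """article_topic: (dim,); comment_topics: (num_comments, dim)."""
    a = article_topic / (np.linalg.norm(article_topic) + 1e-8)
    c = comment_topics / (np.linalg.norm(comment_topics, axis=1, keepdims=True) + 1e-8)
    sims = c @ a                                  # cosine similarity to the article
    best = np.argsort(-sims)[:top_n]              # indices of the most similar comments
    return [(comments[i], float(sims[i])) for i in best]
```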
Bag-of-Words as Target for Neural Machine Translation
A sentence can be translated into more than one correct sentence. However,
most existing neural machine translation models use only one of the
correct translations as the target, and the other correct sentences are
penalized as incorrect during training. Since most of the
correct translations of a sentence share a similar bag-of-words, it is
possible to distinguish the correct translations from the incorrect ones by the
bag-of-words. In this paper, we propose an approach that uses both the
sentences and the bag-of-words as targets in the training stage, in order to
encourage the model to generate potentially correct sentences that do not
appear in the training set. We evaluate our model on a Chinese-English
translation dataset, and experiments show that our model outperforms
strong baselines by 4.55 BLEU points. Comment: Accepted by ACL 2018.
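A hedged sketch of how a bag-of-words target can sit next to the usual sequence loss is given below. The way the step-wise probabilities are aggregated and the weight `lambda_bow` are assumptions for illustration, not necessarily the paper's exact formulation.

```python
# Sketch: sequence cross-entropy plus a bag-of-words term over the reference.
import torch
import torch.nn.functional as F

def seq_and_bow_loss(logits, target_ids, pad_id=0, lambda_bow=1.0):
    """logits: (batch, tgt_len, vocab); target_ids: (batch, tgt_len)."""
    vocab = logits.size(-1)
    # Ordinary sequence-level cross entropy over the reference translation.
    ce = F.cross_entropy(logits.reshape(-1, vocab),
                         target_ids.reshape(-1), ignore_index=pad_id)
    # Aggregate step-wise probabilities into a predicted bag of words.
    probs = torch.softmax(logits, dim=-1)
    pred_bow = probs.sum(dim=1).clamp(min=1e-8)            # (batch, vocab)
    # Reference bag of words as word counts (padding excluded).
    mask = (target_ids != pad_id).float()
    ref_bow = torch.zeros_like(pred_bow).scatter_add_(1, target_ids, mask)
    # Penalize reference words the model assigns little total probability to.
    bow_loss = -(ref_bow * torch.log(pred_bow)).sum(dim=1).mean()
    return ce + lambda_bow * bow_loss
```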
Automatic Academic Paper Rating Based on Modularized Hierarchical Convolutional Neural Network
As more and more academic papers are being submitted to conferences and
journals, evaluating all these papers by professionals is time-consuming and
can cause inequality due to the personal factors of the reviewers. In this
paper, in order to assist professionals in evaluating academic papers, we
propose a novel task: automatic academic paper rating (AAPR), which
automatically determines whether to accept an academic paper. We build a new
dataset for this task and propose a novel modularized hierarchical
convolutional neural network to achieve automatic academic paper rating.
Evaluation results show that the proposed model outperforms the baselines by a
large margin. The dataset and code are available at
\url{https://github.com/lancopku/AAPR}. Comment: Accepted by ACL 2018.
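A rough sketch of a modularized hierarchical CNN of this kind is shown below: each module (e.g. title, abstract, introduction) gets its own convolutional encoder, and the pooled module vectors are concatenated for the accept/reject decision. Module names, layer sizes, and pooling choices are assumptions, not the released model (see the repository above).

```python
# Illustrative modularized hierarchical CNN for paper rating.
import torch
import torch.nn as nn

class ModularHierCNN(nn.Module):
    def __init__(self, vocab_size, emb=128, channels=64, n_modules=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb, padding_idx=0)
        # One convolutional encoder per module (e.g. title, abstract, intro, body).
        self.encoders = nn.ModuleList(
            nn.Conv1d(emb, channels, kernel_size=3, padding=1)
            for _ in range(n_modules))
        self.classifier = nn.Linear(channels * n_modules, 2)   # accept / reject

    def forward(self, modules):
        # modules: list of n_modules tensors, each (batch, seq_len) of token ids.
        pooled = []
        for conv, ids in zip(self.encoders, modules):
            x = self.embed(ids).transpose(1, 2)        # (batch, emb, seq_len)
            h = torch.relu(conv(x))                    # (batch, channels, seq_len)
            pooled.append(h.max(dim=-1).values)        # max-pool over positions
        return self.classifier(torch.cat(pooled, dim=-1))
```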
Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization
Most of the current abstractive text summarization models are based on the
sequence-to-sequence model (Seq2Seq). The source content of social media is
long and noisy, so it is difficult for Seq2Seq to learn an accurate semantic
representation. Compared with the source content, the annotated summary is
short and well written. Moreover, it shares the same meaning as the source
content. In this work, we supervise the learning of the representation of the
source content with that of the summary. In implementation, we regard a summary
autoencoder as an assistant supervisor of Seq2Seq. Following previous work, we
evaluate our model on a popular Chinese social media dataset. Experimental
results show that our model achieves state-of-the-art performance on the
benchmark dataset. Comment: Accepted by ACL 2018.
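The supervision idea can be illustrated with a small loss sketch: the Seq2Seq encoder's representation of the noisy source is pulled toward the representation the summary autoencoder assigns to the gold summary. The distance function and weighting below are assumptions made for the example, not the paper's exact objective.

```python
# Sketch of an autoencoder acting as an assistant supervisor for Seq2Seq.
import torch
import torch.nn.functional as F

def assistant_supervised_loss(seq2seq_ce, autoencoder_ce,
                              src_repr, summary_repr, lambda_sup=0.5):
    """src_repr: Seq2Seq encoder vector of the source, (batch, hidden);
    summary_repr: autoencoder vector of the gold summary, (batch, hidden)."""
    # Pull the source representation toward the summary representation
    # (detached, so the autoencoder acts as the supervisor).
    consistency = F.mse_loss(src_repr, summary_repr.detach())
    return seq2seq_ce + autoencoder_ce + lambda_sup * consistency
```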
Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data?
Existing neural models usually predict the tag of the current token
independently of the neighboring tags. The popular LSTM-CRF model considers the
tag dependencies between every two consecutive tags. However, it is hard for
existing neural models to take longer distance dependencies of tags into
consideration. The scalability is mainly limited by the complex model
structures and the cost of dynamic programming during training. In our work, we
first design a new model called "high-order LSTM" to predict multiple tags for
the current token, which contain not only the current tag but also the previous
several tags. We call the number of tags in one prediction the "order". Then we
propose a new method called Multi-Order BiLSTM (MO-BiLSTM) which combines low
order and high order LSTMs together. MO-BiLSTM keeps the scalability to high
order models with a pruning technique. We evaluate MO-BiLSTM on all-phrase
chunking and NER datasets. Experimental results show that MO-BiLSTM achieves the
state-of-the-art result in chunking and highly competitive results on two NER
datasets. Comment: Accepted by COLING 2018.
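To make the notion of "order" concrete, the snippet below shows one way to build order-k labels from a gold tag sequence: each token's label is the tuple of its own tag and the previous k-1 tags, which is what a high-order model would predict. The padding symbol for positions before the sentence start is an assumption for the example.

```python
# Illustrative construction of order-k tag labels for sequence labeling.
def make_high_order_labels(tags, order=2, pad="<PAD>"):
    padded = [pad] * (order - 1) + list(tags)
    return [tuple(padded[i:i + order]) for i in range(len(tags))]

# Example: with order 2, tags B-NP, I-NP, O become
# ("<PAD>", "B-NP"), ("B-NP", "I-NP"), ("I-NP", "O").
print(make_high_order_labels(["B-NP", "I-NP", "O"], order=2))
```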