Abstractive Text Classification Using Sequence-to-convolution Neural Networks
We propose a new deep neural network model and its training scheme for text
classification. Our model, Sequence-to-Convolution Neural Networks (Seq2CNN),
consists of two blocks: a Sequential Block that summarizes input texts and a
Convolution Block that receives the summary and classifies it into a label.
Seq2CNN is trained end-to-end to classify variable-length texts without
preprocessing inputs into a fixed length. We also present the Gradual Weight
Shift (GWS) method, which stabilizes training; GWS is applied to our model's
loss function. We compared our model with word-based TextCNN trained with
different data preprocessing methods, and obtained significant improvements in
classification accuracy over word-based TextCNN without any ensemble or data
augmentation.
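The abstract only states that GWS is applied to the loss function; one plausible reading is a schedule that gradually shifts weight between two loss terms during training. A minimal sketch under that assumption (the linear schedule and the two-term blend are hypothetical, not the paper's stated formulation):

```python
# Hypothetical gradual-weight-shift schedule: blend two loss terms,
# linearly moving emphasis from term A to term B over training.
def gws_weight(step, total_steps):
    """Weight that shifts linearly from 0 to 1 as training proceeds."""
    return min(1.0, step / total_steps)

def combined_loss(loss_a, loss_b, step, total_steps):
    """Convex combination of two scalar losses under the shifting weight."""
    w = gws_weight(step, total_steps)
    return (1.0 - w) * loss_a + w * loss_b
```

Early in training the loss is dominated by the first term; by the end it is dominated by the second, which is one common way such schedules stabilize optimization.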
Learning to Extract Coherent Summary via Deep Reinforcement Learning
Coherence plays a critical role in producing a high-quality summary from a
document. In recent years, neural extractive summarization has become
increasingly attractive. However, most existing methods ignore the coherence of
summaries when extracting sentences. As an effort towards extracting coherent
summaries, we propose a neural coherence model to capture the cross-sentence
semantic and syntactic coherence patterns. The proposed neural coherence model
obviates the need for feature engineering and can be trained in an end-to-end
fashion using unlabeled data. Empirical results show that the proposed neural
coherence model can efficiently capture the cross-sentence coherence patterns.
Using the combined output of the neural coherence model and ROUGE package as
the reward, we design a reinforcement learning method to train a proposed
neural extractive summarizer which is named Reinforced Neural Extractive
Summarization (RNES) model. The RNES model learns to optimize coherence and
informative importance of the summary simultaneously. Experimental results show
that the proposed RNES outperforms existing baselines and achieves
state-of-the-art performance in terms of ROUGE on the CNN/Daily Mail dataset.
The qualitative evaluation indicates that summaries produced by RNES are more
coherent and readable.
Comment: 8 pages, 1 figure, presented at AAAI-201
Deconvolutional Paragraph Representation Learning
Learning latent representations from long text sequences is an important
first step in many natural language processing applications. Recurrent Neural
Networks (RNNs) have become a cornerstone for this challenging task. However,
the quality of sentences during RNN-based decoding (reconstruction) decreases
with the length of the text. We propose a sequence-to-sequence, purely
convolutional and deconvolutional autoencoding framework that is free of the
above issue, while also being computationally efficient. The proposed method is
simple, easy to implement and can be leveraged as a building block for many
applications. We show empirically that compared to RNNs, our framework is
better at reconstructing and correcting long paragraphs. Quantitative
evaluation on semi-supervised text classification and summarization tasks
demonstrates the potential for better utilization of long unlabeled text data.
Comment: Accepted by NIPS 201
Distilling Knowledge Learned in BERT for Text Generation
Large-scale pre-trained language models such as BERT have achieved great
success in language understanding tasks. However, it remains an open question
how to utilize BERT for language generation. In this paper, we present a novel
approach, Conditional Masked Language Modeling (C-MLM), to enable the
finetuning of BERT on target generation tasks. The finetuned BERT (teacher) is
exploited as extra supervision to improve conventional Seq2Seq models (student)
for better text generation performance. By leveraging BERT's idiosyncratic
bidirectional nature, distilling knowledge learned in BERT can encourage
auto-regressive Seq2Seq models to plan ahead, imposing global sequence-level
supervision for coherent text generation. Experiments show that the proposed
approach significantly outperforms strong Transformer baselines on multiple
language generation tasks such as machine translation and text summarization.
Our proposed model also achieves new state of the art on IWSLT German-English
and English-Vietnamese MT datasets. Code is available at
https://github.com/ChenRocks/Distill-BERT-Textgen.
Comment: ACL 202
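Distilling the finetuned BERT teacher into a Seq2Seq student amounts to training the student against the teacher's soft output distribution rather than one-hot labels. A generic sketch of that soft-target cross-entropy (this is the standard distillation objective, not necessarily the paper's exact loss):

```python
import math

def soft_cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy of the student's predicted distribution under the
    teacher's soft targets, for one token position over the vocabulary."""
    return -sum(t * math.log(s + eps)
                for t, s in zip(teacher_probs, student_probs))
```

Because the teacher sees both left and right context (bidirectional), its soft targets carry information about future tokens, which is how the distillation signal can encourage an auto-regressive student to "plan ahead."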
The Natural Language Decathlon: Multitask Learning as Question Answering
Deep learning has improved performance on many natural language processing
(NLP) tasks individually. However, general NLP models cannot emerge within a
paradigm that focuses on the particularities of a single metric, dataset, and
task. We introduce the Natural Language Decathlon (decaNLP), a challenge that
spans ten tasks: question answering, machine translation, summarization,
natural language inference, sentiment analysis, semantic role labeling,
zero-shot relation extraction, goal-oriented dialogue, semantic parsing, and
commonsense pronoun resolution. We cast all tasks as question answering over a
context. Furthermore, we present a new Multitask Question Answering Network
(MQAN) that jointly learns all tasks in decaNLP without any task-specific modules or
parameters in the multitask setting. MQAN shows improvements in transfer
learning for machine translation and named entity recognition, domain
adaptation for sentiment analysis and natural language inference, and zero-shot
capabilities for text classification. We demonstrate that the MQAN's
multi-pointer-generator decoder is key to this success and performance further
improves with an anti-curriculum training strategy. Though designed for
decaNLP, MQAN also achieves state of the art results on the WikiSQL semantic
parsing task in the single-task setting. We also release code for procuring and
processing data, training and evaluating models, and reproducing all
experiments for decaNLP.
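The unifying idea above is that every task is recast as question answering over a context. A small illustration of that shared format (the wordings and examples below are invented for illustration, not taken from the decaNLP data):

```python
# Hypothetical illustration of decaNLP's unified format: each task instance
# becomes a (question, context, answer) triple, so one QA model covers all.
def as_qa(question, context, answer):
    return {"question": question, "context": context, "answer": answer}

examples = [
    as_qa("What is the translation from English to German?",
          "The house is small.", "Das Haus ist klein."),
    as_qa("What is the summary?",
          "A long news article about a storm ...", "A storm hit the coast."),
    as_qa("Is this review positive or negative?",
          "I loved this film.", "positive"),
]
```

With every task in this shape, a single model with one input/output interface can be trained on all ten tasks without per-task heads.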
Dial2Desc: End-to-end Dialogue Description Generation
We first propose a new task named Dialogue Description (Dial2Desc). Unlike
other existing dialogue summarization tasks such as meeting summarization, we
do not maintain the natural flow of a conversation but describe an object or an
action of what people are talking about. The Dial2Desc system takes a dialogue
text as input, then outputs a concise description of the object or the action
involved in this conversation. After reading this short description, one can
quickly extract the main topic of a conversation and build a clear picture in
their mind, without reading or listening to the whole conversation. Based on the
existing dialogue dataset, we build a new dataset, which has more than one
hundred thousand dialogue-description pairs. As a step forward, we demonstrate
that one can get more accurate and descriptive results using a new neural
attentive model that exploits the interaction between utterances from different
speakers, compared with other baselines.
Abstractive and Extractive Text Summarization using Document Context Vector and Recurrent Neural Networks
Sequence to sequence (Seq2Seq) learning has recently been used for
abstractive and extractive summarization. In the current study, Seq2Seq models
have been used for eBay product description summarization. We propose novel
Document-Context based Seq2Seq models using RNNs for abstractive and extractive
summarization. Intuitively, this is similar to humans reading the title,
abstract or any other contextual information before reading the document. This
gives humans a high-level idea of what the document is about. We use this idea
and propose that Seq2Seq models should be started with contextual information
at the first time-step of the input to obtain better summaries. In this manner,
the output summaries are more document centric, than being generic, overcoming
one of the major hurdles of using generative models. We generate
document-context from user-behavior and seller provided information. We train
and evaluate our models on human-extracted golden summaries. The
document-contextual Seq2Seq models outperform standard Seq2Seq models.
Moreover, since generating human-extracted summaries is prohibitively expensive
to scale, we propose a semi-supervised technique for extracting approximate
summaries and using them for training Seq2Seq models at scale.
Semi-supervised models are evaluated against human extracted summaries and are
found to be of similar efficacy. We provide side by side comparison for
abstractive and extractive summarizers (contextual and non-contextual) on same
evaluation dataset. Overall, we provide methodologies to use and evaluate the
proposed techniques for large document summarization. Furthermore, we found
these techniques to be highly effective, which is not the case with existing
techniques.
Comment: ACM KDD 2018 Deep Learning Da
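The core idea above is simple at the input level: feed the document-context tokens at the first time-steps of the encoder, before the document itself. A minimal sketch of that input construction (the delimiter tokens are a hypothetical convention, not the paper's):

```python
# Sketch: prepend document-context tokens so the encoder sees the context
# first, mimicking a human reading the title/abstract before the document.
def build_input(context_tokens, document_tokens):
    return context_tokens + document_tokens

seq = build_input(["<ctx>", "phone", "case", "</ctx>"],
                  ["This", "case", "fits", "most", "phones"])
```

Everything downstream is an ordinary Seq2Seq encoder; only the input sequence changes, which is why the approach drops into existing models easily.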
Convolutional Sequence to Sequence Learning
The prevalent approach to sequence to sequence learning maps an input
sequence to a variable length output sequence via recurrent neural networks. We
introduce an architecture based entirely on convolutional neural networks.
Compared to recurrent models, computations over all elements can be fully
parallelized during training and optimization is easier since the number of
non-linearities is fixed and independent of the input length. Our use of gated
linear units eases gradient propagation and we equip each decoder layer with a
separate attention module. We outperform the accuracy of the deep LSTM setup of
Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French
translation at an order of magnitude faster speed, both on GPU and CPU.
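The gated linear unit mentioned above splits each layer's output in half and uses the sigmoid of one half to gate the other, which keeps a linear path for gradients. A scalar-level sketch of GLU on a plain list (real implementations operate on tensors along a channel dimension):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def glu(x):
    """Gated linear unit: split the input vector in half and gate the
    first half with the sigmoid of the second, GLU(a, b) = a * sigmoid(b)."""
    half = len(x) // 2
    a, b = x[:half], x[half:]
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]
```

Because the `a` path is multiplied, not squashed, gradients flow through it linearly, which is the "eases gradient propagation" property the abstract refers to.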
Neural Extractive Summarization with Side Information
Most extractive summarization methods focus on the main body of the document
from which sentences need to be extracted. However, the gist of the document
may lie in side information, such as the title and image captions which are
often available for newswire articles. We propose to explore side information
in the context of single-document extractive summarization. We develop a
framework for single-document summarization composed of a hierarchical document
encoder and an attention-based extractor with attention over side information.
We evaluate our model on a large scale news dataset. We show that extractive
summarization with side information consistently outperforms its counterpart
that does not use any side information, in terms of both informativeness and
fluency.
Comment: 9 page
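The extractor described above attends over encoded side information (title, image captions) when scoring sentences. A generic dot-product-attention sketch of that step, on plain lists (the actual model uses learned encoders; the vectors here are placeholders):

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend(query, side_vectors):
    """Dot-product attention over side-information encodings: score each
    side vector against the query, softmax the scores, return the
    weighted sum used to inform sentence extraction."""
    scores = [sum(q * k for q, k in zip(query, v)) for v in side_vectors]
    weights = softmax(scores)
    dim = len(side_vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, side_vectors))
            for i in range(dim)]
```

The attended vector summarizes whichever pieces of side information are most relevant to the current sentence, letting the title or a caption boost sentences that match the document's gist.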
Doc2Im: document to image conversion through self-attentive embedding
Text classification is a fundamental task in NLP applications. Recent
research in this field has largely been divided into two sub-fields: one
learns stronger representations; the other learns deeper models, both
sequential and convolutional, which in turn connect back to the
representation. We posit that the stronger the representation is,
the simpler the classifier needed to achieve higher performance. In this
paper we propose a completely novel direction to text classification research,
wherein we convert text to a representation very similar to images, such that
any deep network able to handle images is equally able to handle text. We take
a deeper look at the representation of documents as an image and subsequently
utilize very simple convolution based models taken as is from computer vision
domain. This image can be cropped, re-scaled, re-sampled and augmented just
like any other image to work with most of the state-of-the-art large
convolution based models which have been designed to handle large image
datasets. We show impressive results on some of the latest benchmarks in the
related fields. We perform transfer learning experiments, both from text to
text domain and also from image to text domain. We believe this is a paradigm
shift from the way document understanding and text classification has been
traditionally done, and will drive numerous novel research ideas in the
community.
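At its simplest, the text-to-image idea above can be pictured as mapping each token to an embedding row and stacking the rows into a 2-D grid that image models can consume. A toy sketch under that reading (the tiny embedding table and out-of-vocabulary handling are assumptions for illustration, not the paper's method):

```python
# Hypothetical document-to-image sketch: token embeddings become pixel rows,
# yielding a 2-D array that can be cropped/re-scaled like any image.
def doc_to_image(tokens, embed, width):
    rows = [embed.get(t, [0.0] * width) for t in tokens]
    return rows  # shape: (len(tokens), width)

embed = {"good": [0.9, 0.1], "movie": [0.2, 0.8]}
img = doc_to_image(["good", "movie", "unk"], embed, 2)
```

Once text sits in this grid form, standard image augmentations and large convolutional vision models apply without modification, which is the portability the abstract emphasizes.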