203 research outputs found
Title-Guided Encoding for Keyphrase Generation
Keyphrase generation (KG) aims to generate a set of keyphrases given a
document, which is a fundamental task in natural language processing (NLP).
Most previous methods solve this problem in an extractive manner, while
several recent attempts have been made under the generative setting using deep
neural networks. However, the state-of-the-art generative methods simply treat
the document title and the document main body equally, ignoring the leading
role of the title in the overall document. To solve this problem, we introduce
a new model called Title-Guided Network (TG-Net) for the automatic keyphrase
generation task, based on the encoder-decoder architecture with two new
features: (i) the title is additionally employed as a query-like input, and
(ii) a title-guided encoder gathers the relevant information from the title to
each word in the document. Experiments on a range of KG datasets demonstrate
that our model outperforms the state-of-the-art models by a large margin,
especially for documents with either very low or very high title length ratios.
Comment: AAAI 1
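The idea of gathering title information for each document word can be illustrated with a small attention sketch. This is not the TG-Net implementation: the dot-product scoring, the toy two-dimensional vectors, and the function names (`softmax`, `title_guided_context`) are assumptions made only for illustration, since the abstract does not specify the scoring function.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def title_guided_context(doc_vecs, title_vecs):
    """For each document word vector, return an attention-weighted
    sum of the title word vectors (assumed dot-product attention)."""
    dim = len(title_vecs[0])
    contexts = []
    for d in doc_vecs:
        weights = softmax([dot(d, t) for t in title_vecs])
        ctx = [sum(w * t[i] for w, t in zip(weights, title_vecs))
               for i in range(dim)]
        contexts.append(ctx)
    return contexts


# Toy inputs (hypothetical embeddings, not learned representations).
title = [[1.0, 0.0], [0.0, 1.0]]
doc = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
contexts = title_guided_context(doc, title)
# The third document word scores both title words equally,
# so its context is their plain average: [0.5, 0.5]
```

In a full model, a context vector of this kind would typically be merged with the word's own representation before decoding; how that merge is done is not stated in the abstract.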
Semi-Supervised Learning for Neural Keyphrase Generation
We study the problem of generating keyphrases that summarize the key points
for a given document. While sequence-to-sequence (seq2seq) models have achieved
remarkable performance on this task (Meng et al., 2017), model training often
relies on large amounts of labeled data, which is only applicable to
resource-rich domains. In this paper, we propose semi-supervised keyphrase
generation methods by leveraging both labeled data and large-scale unlabeled
samples for learning. Two strategies are proposed. First, unlabeled documents
are tagged with synthetic keyphrases obtained from unsupervised keyphrase
extraction methods or a self-learning algorithm, and then combined with labeled
samples for training. Furthermore, we investigate a multi-task learning
framework to jointly learn to generate keyphrases as well as the titles of the
articles. Experimental results show that our semi-supervised learning-based
methods outperform a state-of-the-art model trained with labeled data only.
Comment: To appear in EMNLP 2018 (12 pages, 7 figures, 6 tables)
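The first strategy, tagging unlabeled documents with synthetic keyphrases and mixing them with labeled samples, might be sketched as follows. The frequency-based extractor is only a stand-in for an unsupervised keyphrase extraction method, and the names `extract_synthetic_keyphrases`, `build_training_set`, and `STOPWORDS` are hypothetical; none of them come from the paper.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "of", "and", "for", "in", "to", "is", "on", "with"}


def extract_synthetic_keyphrases(text, top_k=3):
    """Stand-in unsupervised extractor: the most frequent non-stopword tokens."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    counts = Counter(t for t in tokens if t and t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]


def build_training_set(labeled, unlabeled, top_k=3):
    """Combine gold-labeled pairs with synthetically tagged documents.
    Each item is (document, keyphrases, source_tag)."""
    gold = [(doc, kps, "gold") for doc, kps in labeled]
    synthetic = [(doc, extract_synthetic_keyphrases(doc, top_k), "synthetic")
                 for doc in unlabeled]
    return gold + synthetic


labeled = [("neural keyphrase generation with attention",
            ["keyphrase generation"])]
unlabeled = ["Graph ranking for graph nodes: ranking nodes in a graph."]
train = build_training_set(labeled, unlabeled, top_k=1)
# train[1] pairs the unlabeled document with ["graph"], tagged "synthetic"
```

Keeping the source tag on each item makes it possible to weight or filter synthetic samples during training, which is a design choice of this sketch rather than something the abstract describes.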
Persian Keyphrase Generation Using Sequence-to-Sequence Models
Keyphrases are a very short summary of an input text and provide the main
subjects discussed in the text. Keyphrase extraction is a useful upstream task
and can be used in various natural language processing problems, for example,
text summarization and information retrieval, to name a few. However, not all
the keyphrases are explicitly mentioned in the body of the text. In real-world
examples there are always some topics that are discussed implicitly. Extracting
such keyphrases requires a generative approach, which is adopted here. In this
paper, we try to tackle the problem of keyphrase generation and extraction from
news articles using deep sequence-to-sequence models. These models
significantly outperform conventional methods such as Topic Rank, KPMiner,
and KEA in the task of keyphrase extraction.
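The distinction between keyphrases that appear in the text and those discussed only implicitly can be made concrete with a small check. This is a minimal sketch assuming case-insensitive substring matching with no stemming; the function name `split_present_absent` and the sample text are hypothetical.

```python
def split_present_absent(text, keyphrases):
    """Split keyphrases into those found verbatim in the text (present)
    and those that are not (absent). Matching is case-insensitive and
    whitespace-normalized; no stemming or token-boundary check is applied."""
    normalized = " ".join(text.lower().split())
    present, absent = [], []
    for kp in keyphrases:
        needle = " ".join(kp.lower().split())
        if needle in normalized:
            present.append(kp)
        else:
            absent.append(kp)
    return present, absent


text = "The central bank raised interest rates again this quarter."
present, absent = split_present_absent(
    text, ["interest rates", "monetary policy"])
# present == ["interest rates"], absent == ["monetary policy"]
```

An extractive method can only return phrases from the first group, which is why the second group calls for a generative model.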
Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction
In this paper we analyze the effectiveness of using linguistic knowledge from coreference and
anaphora resolution for improving the performance of supervised keyphrase extraction. In order
to verify the impact of these features, we define a baseline keyphrase extraction system and
evaluate its performance on a standard dataset using different machine learning algorithms. Then,
we consider new sets of features by adding combinations of the linguistic features we propose
and we evaluate the new performance of the system. We also use anaphora and coreference
resolution to transform the documents, trying to simulate the cohesion process performed by the
human mind. We found that our approach has a slightly positive impact on the performance of
automatic keyphrase extraction, in particular when considering the ranking of the results.
Efficient Keyphrase Generation with GANs
Keyphrase Generation is the task of predicting keyphrases: short text sequences that convey the main semantic meaning of a document. In this paper, we introduce a keyphrase generation approach that makes use of a Generative Adversarial Network (GAN) architecture. In our system, the Generator produces a sequence of keyphrases for an input document. The Discriminator, in turn, tries to distinguish between machine-generated and human-curated keyphrases. We propose a novel Discriminator architecture based on a BERT pretrained model fine-tuned for Sequence Classification. We train our proposed architecture using only a small subset of the standard available training dataset, amounting to less than 1% of the total, achieving a great level of data efficiency. The resulting model is evaluated on five public datasets, obtaining competitive and promising results with respect to four state-of-the-art generative models.