580 research outputs found
Semi-Supervised Learning for Neural Keyphrase Generation
We study the problem of generating keyphrases that summarize the key points
for a given document. While sequence-to-sequence (seq2seq) models have achieved
remarkable performance on this task (Meng et al., 2017), model training often
relies on large amounts of labeled data, which is only applicable to
resource-rich domains. In this paper, we propose semi-supervised keyphrase
generation methods by leveraging both labeled data and large-scale unlabeled
samples for learning. Two strategies are proposed. First, unlabeled documents
are first tagged with synthetic keyphrases obtained from unsupervised keyphrase
extraction methods or a selflearning algorithm, and then combined with labeled
samples for training. Furthermore, we investigate a multi-task learning
framework to jointly learn to generate keyphrases as well as the titles of the
articles. Experimental results show that our semi-supervised learning-based
methods outperform a state-of-the-art model trained with labeled data only.Comment: To appear in EMNLP 2018 (12 pages, 7 figures, 6 tables
DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases
Keyphrase extraction from documents is useful to a variety of applications
such as information retrieval and document summarization. This paper presents
an end-to-end method called DivGraphPointer for extracting a set of diversified
keyphrases from a document. DivGraphPointer combines the advantages of
traditional graph-based ranking methods and recent neural network-based
approaches. Specifically, given a document, a word graph is constructed from
the document based on word proximity and is encoded with graph convolutional
networks, which effectively capture document-level word salience by modeling
long-range dependency between words in the document and aggregating multiple
appearances of identical words into one node. Furthermore, we propose a
diversified point network to generate a set of diverse keyphrases out of the
word graph in the decoding process. Experimental results on five benchmark data
sets show that our proposed method significantly outperforms the existing
state-of-the-art approaches.Comment: Accepted to SIGIR 201
BIKE: Bilingual Keyphrase Experiments
This paper presents a novel strategy for translating lists
of keyphrases. Typical keyphrase lists appear in
scientific articles, information retrieval systems and
web page meta-data. Our system combines a statistical
translation model trained on a bilingual corpus of
scientific papers with sense-focused look-up in a large
bilingual terminological resource. For the latter,
we developed a novel technique that benefits from viewing
the keyphrase list as contextual help for sense
disambiguation. The optimal combination of modules was
discovered by a genetic algorithm. Our work applies to
the French / English language pair
Multi-Task Learning of Keyphrase Boundary Classification
Keyphrase boundary classification (KBC) is the task of detecting keyphrases
in scientific articles and labelling them with respect to predefined types.
Although important in practice, this task is so far underexplored, partly due
to the lack of labelled data. To overcome this, we explore several auxiliary
tasks, including semantic super-sense tagging and identification of multi-word
expressions, and cast the task as a multi-task learning problem with deep
recurrent neural networks. Our multi-task models perform significantly better
than previous state of the art approaches on two scientific KBC datasets,
particularly for long keyphrases.Comment: ACL 201
- …