SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications
We describe the SemEval task of extracting keyphrases and relations between
them from scientific documents, which is crucial for understanding which
publications describe which processes, tasks and materials. Although this was a
new task, we had a total of 26 submissions across 3 evaluation scenarios. We
expect the task and the findings reported in this paper to be relevant for
researchers working on understanding scientific content, as well as the broader
knowledge base population and information extraction communities.
Semi-Supervised Learning for Neural Keyphrase Generation
We study the problem of generating keyphrases that summarize the key points
for a given document. While sequence-to-sequence (seq2seq) models have achieved
remarkable performance on this task (Meng et al., 2017), model training often
relies on large amounts of labeled data, which is only applicable to
resource-rich domains. In this paper, we propose semi-supervised keyphrase
generation methods by leveraging both labeled data and large-scale unlabeled
samples for learning. Two strategies are proposed. First, unlabeled documents
are tagged with synthetic keyphrases obtained from unsupervised keyphrase
extraction methods or a self-learning algorithm, and then combined with labeled
samples for training. Second, we investigate a multi-task learning
framework to jointly learn to generate keyphrases as well as the titles of the
articles. Experimental results show that our semi-supervised learning-based
methods outperform a state-of-the-art model trained with labeled data only.
Comment: To appear in EMNLP 2018 (12 pages, 7 figures, 6 tables)
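The first strategy in the abstract above (tagging unlabeled documents with synthetic keyphrases, then mixing them with labeled samples) can be illustrated with a minimal sketch. The abstract does not name the unsupervised extraction methods used, so a plain word-frequency ranking stands in here as an assumption; the names `synthetic_keyphrases`, `build_training_set` and `STOPWORDS` are hypothetical and not taken from the paper.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "are", "we"}


def synthetic_keyphrases(text, k=3):
    """Stand-in unsupervised extractor: the k most frequent non-stopwords."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def build_training_set(labeled, unlabeled, k=3):
    """Combine gold (document, keyphrases) pairs with synthetically tagged documents."""
    synthetic = [(doc, synthetic_keyphrases(doc, k)) for doc in unlabeled]
    return list(labeled) + synthetic


labeled = [("We rank nodes in a graph.", ["graph ranking"])]
unlabeled = ["graph ranking graph ranking graph nodes"]
training_set = build_training_set(labeled, unlabeled, k=2)
```

The combined list could then be fed to any keyphrase generation model; how the paper weights or schedules synthetic versus labeled samples is not stated in the abstract and is left out of this sketch.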
Extraction of Keyphrases from Text: Evaluation of Four Algorithms
This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each algorithm was evaluated by the degree to which its keyphrases matched the manually generated keyphrases. The four algorithms were (1) the AutoSummarize feature in Microsoft's Word 97, (2) an algorithm based on Eric Brill's part-of-speech tagger, (3) the Summarize feature in Verity's Search 97, and (4) NRC's Extractor algorithm. For all five document collections, NRC's Extractor yields the best match with the manually generated keyphrases.
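The evaluation described above, scoring an algorithm by how well its keyphrases match a hand-generated target set, can be sketched as follows. The abstract does not specify the matching criterion or the score used in the report, so this sketch assumes exact matching after lowercasing and whitespace normalization, reported as precision, recall and F1; the function names `normalize` and `match_scores` are hypothetical.

```python
def normalize(phrase):
    """Lowercase and collapse whitespace so trivially different forms compare equal."""
    return " ".join(phrase.lower().split())


def match_scores(predicted, target):
    """Return (precision, recall, F1) of predicted keyphrases against the target set."""
    pred = {normalize(p) for p in predicted}
    gold = {normalize(t) for t in target}
    matched = pred & gold
    precision = len(matched) / len(pred) if pred else 0.0
    recall = len(matched) / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


predicted = ["Neural Networks", "keyphrase extraction", "text"]
target = ["neural networks", "keyphrase  extraction", "evaluation", "corpus"]
precision, recall, f1 = match_scores(predicted, target)
```

In this example two of the three predicted phrases appear in the four-phrase target set, giving a precision of 2/3 and a recall of 1/2.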
Multi-Task Learning of Keyphrase Boundary Classification
Keyphrase boundary classification (KBC) is the task of detecting keyphrases
in scientific articles and labelling them with respect to predefined types.
Although important in practice, this task is so far underexplored, partly due
to the lack of labelled data. To overcome this, we explore several auxiliary
tasks, including semantic super-sense tagging and identification of multi-word
expressions, and cast the task as a multi-task learning problem with deep
recurrent neural networks. Our multi-task models perform significantly better
than previous state of the art approaches on two scientific KBC datasets,
particularly for long keyphrases.
Comment: ACL 201