1,085 research outputs found

KERT: Automatic Extraction and Ranking of Topical Keyphrases from Content-Representative Document Titles

Author: Danilevsky Marina
Desai Nihit
Guo Jingyi
Han Jiawei
Wang Chi
Publication date: 02/06/2013

We introduce KERT (Keyphrase Extraction and Ranking by Topic), a framework for topical keyphrase generation and ranking. By shifting from the unigram-centric traditional methods of unsupervised keyphrase extraction to a phrase-centric approach, we are able to directly compare and rank phrases of different lengths. We construct a topical keyphrase ranking function that implements the four criteria that represent high-quality topical keyphrases (coverage, purity, phraseness, and completeness). The effectiveness of our approach is demonstrated on two collections of content-representative titles in the domains of Computer Science and Physics. Comment: 9 page

arXiv.org e-Print Archive

CiteSeerX

Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction

Author: Basaldella Marco
Chiaradia Giorgia
Tasso Carlo
Publication venue: country:JPN
Publication date: 01/01/2016

In this paper we analyze the effectiveness of using linguistic knowledge from coreference and anaphora resolution to improve the performance of supervised keyphrase extraction. To verify the impact of these features, we define a baseline keyphrase extraction system and evaluate its performance on a standard dataset using different machine learning algorithms. Then, we consider new sets of features by adding combinations of the linguistic features we propose, and we evaluate the new performance of the system. We also use anaphora and coreference resolution to transform the documents, trying to simulate the cohesion process performed by the human mind. We found that our approach has a slightly positive impact on the performance of automatic keyphrase extraction, in particular when considering the ranking of the results.

Archivio istituzionale della ricerca - Università degli Studi di Udine

Creation and evaluation of large keyphrase extraction collections with multiple opinions

Author: Deleu Johannes
Demeester Thomas
Develder Chris
Sterckx Lucas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

While several automatic keyphrase extraction (AKE) techniques have been developed and analyzed, there is little consensus on the definition of the task and a lack of overview of the effectiveness of different techniques. Proper evaluation of keyphrase extraction requires large test collections with multiple opinions, currently not available for research. In this paper, we (i) present a set of test collections derived from various sources with multiple annotations (which we also refer to as opinions in the remainder of the paper) for each document, (ii) systematically evaluate keyphrase extraction using several supervised and unsupervised AKE techniques, and (iii) experimentally analyze the effects of disagreement on AKE evaluation. Our newly created set of test collections spans different types of topical content from general news and magazines, and is annotated with multiple annotations per article by a large annotator panel. Our annotator study shows that for a given document there seems to be a large disagreement on the preferred keyphrases, suggesting the need for multiple opinions per document. A first systematic evaluation of ranking and classification of keyphrases using both unsupervised and supervised AKE techniques on the test collections shows a superior effectiveness of supervised models, even for a low annotation effort and with basic positional and frequency features, and highlights the importance of a suitable keyphrase candidate generation approach. We also study the influence of multiple opinions, training data and document length on the evaluation of keyphrase extraction. Our new test collection for keyphrase extraction is one of the largest of its kind and will be made available to stimulate future work to improve reliable evaluation of new keyphrase extractors.

Ghent University Academic Bibliography

DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases

Author: Bougouin Adrien
Frank Eibe
Glorot Xavier
Ioffe Sergey
Kim Su Nam
Kim Youngsam
Kingma Diederik
Mihalcea Rada
Mikolov Tomáš
Qazvinian Vahed
Wan Xiaojun
Wang Yining
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/05/2019

Keyphrase extraction from documents is useful to a variety of applications such as information retrieval and document summarization. This paper presents an end-to-end method called DivGraphPointer for extracting a set of diversified keyphrases from a document. DivGraphPointer combines the advantages of traditional graph-based ranking methods and recent neural network-based approaches. Specifically, given a document, a word graph is constructed from the document based on word proximity and is encoded with graph convolutional networks, which effectively capture document-level word salience by modeling long-range dependency between words in the document and aggregating multiple appearances of identical words into one node. Furthermore, we propose a diversified point network to generate a set of diverse keyphrases out of the word graph in the decoding process. Experimental results on five benchmark data sets show that our proposed method significantly outperforms the existing state-of-the-art approaches. Comment: Accepted to SIGIR 201
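
The word graph mentioned in this abstract can be illustrated with a minimal sketch. It covers only the graph construction step: each distinct word becomes one node, and words that appear close together are linked. The function name `build_word_graph`, the `window` parameter, and the use of raw co-occurrence counts as edge weights are assumptions made for illustration and are not taken from the paper; the graph convolutional encoder and the pointer decoder are not shown.

```python
from collections import defaultdict


def build_word_graph(tokens, window=2):
    """Build an undirected, weighted word graph from a token sequence.

    Every distinct lower-cased word is a single node, so repeated
    appearances of a word are merged. The weight of an edge counts how
    often two different words occur within `window` positions of each
    other (an illustrative choice, not the paper's weighting).
    """
    words = [t.lower() for t in tokens]
    graph = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        graph[w]  # touching the key guarantees a node even with no edges
        for j in range(i + 1, min(i + window + 1, len(words))):
            v = words[j]
            if v != w:  # skip self-loops
                graph[w][v] += 1
                graph[v][w] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {w: dict(neighbours) for w, neighbours in graph.items()}


if __name__ == "__main__":
    g = build_word_graph("graph based ranking of graph nodes".split(), window=1)
    # Both occurrences of "graph" share one node and pool their neighbours.
    print(g["graph"])
```

With `window=1`, only adjacent words are linked; a larger window would connect words that are further apart, at the cost of a denser graph.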

arXiv.org e-Print Archive

Crossref

Semi-Supervised Learning for Neural Keyphrase Generation

Author: Wang Lu
Ye Hai
Publication date: 01/01/2018

We study the problem of generating keyphrases that summarize the key points for a given document. While sequence-to-sequence (seq2seq) models have achieved remarkable performance on this task (Meng et al., 2017), model training often relies on large amounts of labeled data, which is only applicable to resource-rich domains. In this paper, we propose semi-supervised keyphrase generation methods by leveraging both labeled data and large-scale unlabeled samples for learning. Two strategies are proposed. First, unlabeled documents are tagged with synthetic keyphrases obtained from unsupervised keyphrase extraction methods or a self-learning algorithm, and then combined with labeled samples for training. Furthermore, we investigate a multi-task learning framework to jointly learn to generate keyphrases as well as the titles of the articles. Experimental results show that our semi-supervised learning-based methods outperform a state-of-the-art model trained with labeled data only. Comment: To appear in EMNLP 2018 (12 pages, 7 figures, 6 tables
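
The first strategy in this abstract, tagging unlabeled documents with synthetic keyphrases and merging them with labeled samples, can be sketched roughly as follows. The abstract does not say which unsupervised extractor is used, so a simple word-frequency tagger stands in here as an assumption; the function names, the stopword list, and `top_k` are all hypothetical, and the seq2seq model that would consume the combined data is not shown.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "for", "is"})


def tag_with_synthetic_keyphrases(document, top_k=3):
    """Return the top_k most frequent non-stopword words as stand-in keyphrases."""
    words = [
        w for w in document.lower().split()
        if w.isalpha() and w not in STOPWORDS
    ]
    return [word for word, _ in Counter(words).most_common(top_k)]


def build_training_set(labeled, unlabeled, top_k=3):
    """Combine gold (document, keyphrases) pairs with synthetically tagged ones."""
    synthetic = [
        (doc, tag_with_synthetic_keyphrases(doc, top_k)) for doc in unlabeled
    ]
    return list(labeled) + synthetic


if __name__ == "__main__":
    labeled = [("graph ranking for keyphrases", ["graph ranking"])]
    unlabeled = ["neural keyphrase generation for neural models"]
    for doc, keyphrases in build_training_set(labeled, unlabeled, top_k=1):
        print(doc, "->", keyphrases)
```

Keeping the tagger behind a single function makes it straightforward to swap in a stronger unsupervised extractor without touching the code that assembles the training set.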

arXiv.org e-Print Archive

Crossref