Search CORE

2,728 research outputs found

KERT: Automatic Extraction and Ranking of Topical Keyphrases from Content-Representative Document Titles

Author: Danilevsky Marina
Desai Nihit
Guo Jingyi
Han Jiawei
Wang Chi
Publication venue
Publication date: 02/06/2013
Field of study

We introduce KERT (Keyphrase Extraction and Ranking by Topic), a framework for topical keyphrase generation and ranking. By shifting from the unigram-centric traditional methods of unsupervised keyphrase extraction to a phrase-centric approach, we are able to directly compare and rank phrases of different lengths. We construct a topical keyphrase ranking function which implements the four criteria that represent high quality topical keyphrases (coverage, purity, phraseness, and completeness). The effectiveness of our approach is demonstrated on two collections of content-representative titles in the domains of Computer Science and Physics.Comment: 9 page

arXiv.org e-Print Archive

CiteSeerX

Thesaurus based automatic keyphrase indexing

Author: Medelyan Olena
Witten Ian H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006
Field of study

We propose a new method that enhances automatic keyphrase extraction by using semantic information on terms and phrases gleaned from a domain-specific thesaurus. We evaluate the results against keyphrase sets assigned by a state-of-the-art keyphrase extraction system and those assigned by six professional indexers

Crossref

Research Commons@Waikato

DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases

Author: Bougouin Adrien
Frank Eibe
Glorot Xavier
Ioffe Sergey
Kim Su Nam
Kim Youngsam
Kingma Diederik
Mihalcea Rada
Mikolov Tomáš
Qazvinian Vahed
Wan Xiaojun
Wang Yining
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 19/05/2019
Field of study

Keyphrase extraction from documents is useful to a variety of applications such as information retrieval and document summarization. This paper presents an end-to-end method called DivGraphPointer for extracting a set of diversified keyphrases from a document. DivGraphPointer combines the advantages of traditional graph-based ranking methods and recent neural network-based approaches. Specifically, given a document, a word graph is constructed from the document based on word proximity and is encoded with graph convolutional networks, which effectively capture document-level word salience by modeling long-range dependency between words in the document and aggregating multiple appearances of identical words into one node. Furthermore, we propose a diversified point network to generate a set of diverse keyphrases out of the word graph in the decoding process. Experimental results on five benchmark data sets show that our proposed method significantly outperforms the existing state-of-the-art approaches.Comment: Accepted to SIGIR 201

arXiv.org e-Print Archive

Crossref

Coherent Keyphrase Extraction via Web Mining

Author: Turney Peter D.
Publication venue
Publication date: 01/01/2003
Field of study

Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. A limitation of previous keyphrase extraction algorithms is that the selected keyphrases are occasionally incoherent. That is, the majority of the output keyphrases may fit together well, but there may be a minority that appear to be outliers, with no clear semantic relation to the majority or to each other. This paper presents enhancements to the Kea keyphrase extraction algorithm that are designed to increase the coherence of the extracted keyphrases. The approach is to use the degree of statistical association among candidate keyphrases as evidence that they may be semantically related. The statistical association is measured using web mining. Experiments demonstrate that the enhancements improve the quality of the extracted keyphrases. Furthermore, the enhancements are not domain-specific: the algorithm generalizes well when it is trained on one domain (computer science documents) and tested on another (physics documents).Comment: 6 pages, related work available at http://purl.org/peter.turney

arXiv.org e-Print Archive

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive

Semi-Supervised Learning for Neural Keyphrase Generation

Author: Wang Lu
Ye Hai
Publication venue
Publication date: 01/01/2018
Field of study

We study the problem of generating keyphrases that summarize the key points for a given document. While sequence-to-sequence (seq2seq) models have achieved remarkable performance on this task (Meng et al., 2017), model training often relies on large amounts of labeled data, which is only applicable to resource-rich domains. In this paper, we propose semi-supervised keyphrase generation methods by leveraging both labeled data and large-scale unlabeled samples for learning. Two strategies are proposed. First, unlabeled documents are first tagged with synthetic keyphrases obtained from unsupervised keyphrase extraction methods or a selflearning algorithm, and then combined with labeled samples for training. Furthermore, we investigate a multi-task learning framework to jointly learn to generate keyphrases as well as the titles of the articles. Experimental results show that our semi-supervised learning-based methods outperform a state-of-the-art model trained with labeled data only.Comment: To appear in EMNLP 2018 (12 pages, 7 figures, 6 tables

arXiv.org e-Print Archive

Crossref