Automatic Keyphrase Extraction

Mataśka, Katarzyna; Zakrzewska, Danuta

Publication date: 1 January 2006
Publisher: Uniwersytetu Marii Curie-Skłodowskiej w Lublinie

Abstract

The growing number of documents on the Web has increased the need for tools that support automatic search and classification of texts. Keywords are one of the characteristic features of documents and can serve as criteria in automatic document management. In this paper we describe a technique for automatic keyphrase extraction based on the KEA algorithm [1]. The main modifications are a change in the stemming method and a simplified discretization technique. In addition, in the presented algorithm the keyphrase list may include proper names, and the candidate phrase list may include number sequences. We describe experiments, carried out on a set of English-language documents available on the Internet, that allowed the extraction parameters to be optimized. Finally, we compare the efficiency of the algorithm with that of the KEA technique.
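The abstract does not spell out the modified pipeline. For orientation only, the sketch below computes the two features used by the original KEA algorithm [1], TF×IDF and the relative position of a phrase's first occurrence, for candidate phrases of up to three words. It is a minimal sketch under stated assumptions, not the authors' implementation. The stopword list, the crude stemmer `simple_stem`, the punctuation-based phrase boundaries, the +1 smoothing of document frequency, and the names `split_chunks`, `candidate_phrases` and `kea_style_features` are all hypothetical. Number tokens are kept as candidates to mirror the modification described above.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a full one.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is", "of",
             "on", "or", "the", "to", "with"}
# Tokens are runs of letters or runs of digits, so numbers survive as candidates.
TOKEN = re.compile(r"[a-z]+|\d+")


def simple_stem(word: str) -> str:
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and not word.endswith("ss") and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def split_chunks(text: str) -> list[list[str]]:
    """Lower-case the text and split it at punctuation into token lists."""
    return [TOKEN.findall(part) for part in re.split(r"[.,;:!?()\"]+", text.lower())]


def candidate_phrases(chunks, max_len=3):
    """Yield (stemmed phrase, index of its first word) for every candidate.

    Candidates are 1..max_len consecutive tokens inside one chunk that
    neither start nor end with a stopword; number tokens are allowed.
    """
    offset = 0
    for words in chunks:
        for i in range(len(words)):
            for n in range(1, max_len + 1):
                if i + n > len(words):
                    break
                phrase = words[i:i + n]
                if phrase[0] in STOPWORDS or phrase[-1] in STOPWORDS:
                    continue
                yield " ".join(simple_stem(w) for w in phrase), offset + i
        offset += len(words)


def kea_style_features(text, doc_freq, n_docs, max_len=3):
    """Map each candidate phrase to (TF x IDF, relative first occurrence)."""
    chunks = split_chunks(text)
    doc_size = sum(len(words) for words in chunks)
    freq = Counter()
    first = {}
    for phrase, pos in candidate_phrases(chunks, max_len):
        freq[phrase] += 1
        first.setdefault(phrase, pos)  # positions arrive in non-decreasing order
    features = {}
    for phrase, count in freq.items():
        # +1 smoothing keeps the IDF finite for phrases unseen in the corpus (assumption).
        idf = -math.log2((doc_freq.get(phrase, 0) + 1) / (n_docs + 1))
        features[phrase] = (count / doc_size * idf, first[phrase] / doc_size)
    return features


if __name__ == "__main__":
    feats = kea_style_features(
        "Keyphrase extraction is useful. Keyphrases support document search in 2006.",
        doc_freq={"keyphrase": 1}, n_docs=3)
    print(feats["keyphrase"])  # (0.2, 0.0)
    print(feats["2006"])       # (0.2, 0.9)
```

In KEA [1] these feature values are discretized and passed to a Naive Bayes classifier that ranks the candidates; the simplified discretization and the alternative stemmer described in this paper are not reproduced here.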