'Uniwersytetu Marii Curie-Sklodowskiej w Lublinie'
Abstract
The increasing number of documents on the Web has created a growing need for tools that support automatic search and classification of texts. Keywords are one of the characteristic features of documents that may be used as criteria in automatic document management. In this paper we describe a technique for automatic keyphrase extraction based on the KEA algorithm [1]. The main modifications are changes to the stemming method and a simplification of the discretization technique. In addition, in the presented algorithm the keyphrase list may contain proper names, and the candidate phrase list may contain number sequences. We describe experiments that were carried out on a set of English-language documents available on the Internet and that allow the extraction parameters to be optimized. A comparison of the efficiency of the algorithm with that of the KEA technique is presented.