Keyphrase extraction by synonym analysis of n-grams for e-journals categorisation

Hussey, Richard; Williams, Shirley; Mitchell, Richard

research

oai:centaur.reading.ac.uk:19584

Authors: Richard Hussey, Shirley Williams, Richard Mitchell
Publication date: 1 February 2011
Abstract

Automatic keyword or keyphrase extraction is concerned with assigning keyphrases to documents based on words from within the document. Previous studies have shown that in a significant number of cases author-supplied keywords are not appropriate for the document to which they are attached. This can be either because they represent what the author believes the paper is about, not what it actually is, or because they include keyphrases that are more classificatory than explanatory, e.g., “University of Poppleton” instead of “Knowledge Discovery in Databases”. Thus, there is a need for a system that can generate an appropriate and diverse range of keyphrases that reflect the document. This paper proposes a solution that examines the synonyms of words and phrases in the document to find the underlying themes, and presents these as appropriate keyphrases. The primary method explores taking n-grams of the source document phrases and examining the synonyms of these, while the secondary method considers grouping outputs by their synonyms. The experiments undertaken show that the primary method produces good results and that the secondary method produces both good results and potential for future work.
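
The general idea of counting n-grams after mapping words to a shared synonym label can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the `SYNONYM_GROUPS` table, the `STOP_WORDS` set, and the function names are hypothetical stand-ins, and the abstract does not specify which thesaurus or scoring the paper actually uses.

```python
import re
from collections import Counter

# Hypothetical toy thesaurus (an assumption for illustration):
# each word maps to a label shared by all of its synonyms.
SYNONYM_GROUPS = {
    "car": "vehicle", "automobile": "vehicle", "vehicle": "vehicle",
    "fast": "quick", "rapid": "quick", "quick": "quick",
}

# Hypothetical stop-word list; the abstract does not mention one.
STOP_WORDS = {"a", "an", "the", "of"}


def ngrams(tokens, n):
    """Return every contiguous n-gram of the token list as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def normalise(gram):
    """Replace each word with its synonym-group label when it has one."""
    return tuple(SYNONYM_GROUPS.get(word, word) for word in gram)


def keyphrase_candidates(text, n=2, top=3):
    """Count n-grams after synonym normalisation; return the most frequent."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOP_WORDS]
    counts = Counter(normalise(g) for g in ngrams(tokens, n))
    return [(" ".join(gram), count) for gram, count in counts.most_common(top)]
```

In this sketch, three differently worded phrases such as "fast car", "quick automobile", and "rapid vehicle" collapse into the single candidate "quick vehicle", which is the sense in which synonyms can expose an underlying theme that no single surface phrase repeats.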

Last updated on 01/07/2012.

This paper was published in Central Archive at the University of Reading.
