Keywords lie far from the mean of all words in local vector space
Keyword extraction is an important document-processing task that aims at finding a
small set of terms that concisely describe a document's topics. The most
popular state-of-the-art unsupervised approaches are graph-based methods that
build a graph-of-words and use various centrality
measures to score the nodes (candidate keywords). In this work, we follow a
different path to detect keywords in a text document by modeling the main
distribution of the document's words using local word vector representations.
Then, we rank the candidates based on their position in the text and the
distance between the corresponding local vectors and the main distribution's
center. Through an extended experimental study, we confirm the high performance
of our approach compared to strong baselines and state-of-the-art unsupervised
keyword extraction methods, and we investigate the properties of the local
representations.
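The ranking step described above (distance from the center of the document's local word vectors, combined with position in the text) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the local word vectors are taken as given (in practice they would be trained on the single document itself), the "main distribution" is summarized by the plain mean of the vectors of all distinct words, distance is Euclidean, and the position weight (the sum of inverse positions of a word's occurrences) is a hypothetical choice. The names `rank_keywords`, `mean_vector`, `euclidean` and `position_weight` are illustrative only.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two vectors (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def position_weight(positions):
    # Hypothetical weighting: earlier and more frequent occurrences count more.
    return sum(1.0 / p for p in positions)


def rank_keywords(tokens, vectors, candidates=None):
    """Rank candidate words by distance from the mean of all word vectors.

    tokens:     all words of the document, in order
    vectors:    word -> local vector (assumed trained on this document only)
    candidates: optional set of words allowed as keywords (e.g. no stopwords)
    """
    # Record 1-based positions of every word that has a vector.
    positions = defaultdict(list)
    for i, tok in enumerate(tokens, start=1):
        if tok in vectors:
            positions[tok].append(i)
    if not positions:
        return []

    # Center of the main distribution: mean over all distinct words.
    center = mean_vector([vectors[w] for w in positions])

    # Score = distance from the center, scaled by the position weight.
    scores = {
        w: euclidean(vectors[w], center) * position_weight(pos)
        for w, pos in positions.items()
        if candidates is None or w in candidates
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-d vectors, made up purely for illustration.
    toy_vectors = {"the": [0.0, 0.0], "of": [0.1, 0.0],
                   "graph": [1.0, 1.0], "keyword": [2.0, 2.0]}
    toy_tokens = ["the", "graph", "of", "keyword", "the", "graph"]
    # "keyword" lies farthest from the mean and is ranked first.
    print(rank_keywords(toy_tokens, toy_vectors, {"graph", "keyword"}))
```

Multiplying the distance by the position weight favors words that are both far from the center and appear early or often; the abstract does not specify how the two signals are combined, so a weighted sum or a different position scheme would be equally consistent with it.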