
Mining positive and negative patterns for relevance feature discovery

By Yuefeng Li, Abdulmohsen Algarni and Ning Zhong

Abstract

It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences, because of the large number of terms and patterns and the presence of noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they all suffer from the problems of polysemy and synonymy. Over the years, researchers have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in this paper makes a breakthrough in overcoming this difficulty. It discovers both positive and negative patterns in text documents as higher-level features and uses them to accurately weight low-level features (terms) according to the terms' specificity and their distributions in the higher-level features. Extensive experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio, or Support Vector Machines and pattern-based methods in precision, recall, and F measures.
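The abstract only outlines how higher-level patterns are turned into term weights. The Python sketch below illustrates the general idea under explicit assumptions, and it is not the authors' exact weighting formula, which is given in the full paper. It assumes patterns arrive as term sets with a support value and spreads each pattern's support evenly over its terms (a simple "deploying" step). A term's weight is then its deployed positive support minus its deployed negative support. A hypothetical specificity score, (positive occurrences minus negative occurrences) divided by total occurrences, sorts terms into positive-specific, general, and negative-specific groups using made-up thresholds `low` and `high`. All function names and parameters are illustrative.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a pattern is (set of terms, support value).
Pattern = Tuple[FrozenSet[str], float]


def deploy(patterns: List[Pattern]) -> Dict[str, float]:
    """Spread each pattern's support evenly over the terms it contains."""
    weights: Dict[str, float] = defaultdict(float)
    for terms, support in patterns:
        if not terms:
            continue  # an empty pattern carries no term information
        share = support / len(terms)
        for term in terms:
            weights[term] += share
    return dict(weights)


def specificity(term: str, positive: List[Pattern], negative: List[Pattern]) -> float:
    """Hypothetical score in [-1, 1]: +1 if the term occurs only in positive
    patterns, -1 if it occurs only in negative patterns."""
    n_pos = sum(1 for terms, _ in positive if term in terms)
    n_neg = sum(1 for terms, _ in negative if term in terms)
    total = n_pos + n_neg
    return (n_pos - n_neg) / total if total else 0.0


def weight_terms(
    positive: List[Pattern],
    negative: List[Pattern],
    low: float = -0.2,   # hypothetical threshold for "negative-specific"
    high: float = 0.2,   # hypothetical threshold for "positive-specific"
) -> Dict[str, Tuple[float, float, str]]:
    """Return term -> (weight, specificity, group)."""
    pos = deploy(positive)
    neg = deploy(negative)
    result: Dict[str, Tuple[float, float, str]] = {}
    for term in set(pos) | set(neg):
        spe = specificity(term, positive, negative)
        # Negative patterns pull a term's weight down; positive ones push it up.
        weight = pos.get(term, 0.0) - neg.get(term, 0.0)
        if spe > high:
            group = "positive"
        elif spe < low:
            group = "negative"
        else:
            group = "general"
        result[term] = (weight, spe, group)
    return result


if __name__ == "__main__":
    # Toy input, invented for illustration only.
    positive = [(frozenset({"mining", "pattern"}), 0.6), (frozenset({"pattern"}), 0.4)]
    negative = [(frozenset({"pattern", "noise"}), 0.5)]
    ranked = sorted(weight_terms(positive, negative).items(), key=lambda kv: -kv[1][0])
    for term, (w, spe, group) in ranked:
        print(f"{term:10s} weight={w:+.2f} spe={spe:+.2f} {group}")
```

Splitting support evenly means a term that appears in a long pattern receives a smaller share than one in a short pattern, and subtracting negative support lets terms that recur mainly in non-relevant feedback end up with negative weights.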

Topics: 080600 INFORMATION SYSTEMS, user preferences, text mining, polysemy, synonymy
Publisher: ACM
Year: 2010
DOI identifier: 10.1145/1835804.1835900
OAI identifier: oai:eprints.qut.edu.au:42068
The full text is not provided here, but it may be found at the following location(s):
  • http://purl.org/au-research/gr... (external link)
  • https://eprints.qut.edu.au/420... (external link)
  • https://eprints.qut.edu.au/420... (external link)