Skip to main content
Article thumbnail
Location of Repository

Building Bridges for Web Query Classification

By Dou Shen, Jian-Tao Sun, Qiang Yang and Zheng Chen

Abstract

Web query classification (QC) aims to classify Web users’ queries, which are often short and ambiguous, into a set of target categories. QC has many applications including page ranking in Web search, targeted advertisement in response to queries, and personalization. In this paper, we present a novel approach for QC that outperforms the winning solution of the ACM KDDCUP 2005 competition, whose objective is to classify 800,000 real user queries. In our approach, we first build a bridging classifier on an intermediate taxonomy in an offline mode. This classifier is then used in an online mode to map user queries to the target categories via the above intermediate taxonomy. A major innovation is that by leveraging the similarity distribution over the intermediate taxonomy, we do not need to retrain a new classifier for each new set of target categories, and therefore the bridging classifier needs to be trained only once. In addition, we introduce category selection as a new method for narrowing down the scope of the intermediate taxonomy based on which we classify the queries. Category selection can improve both efficiency and effectiveness of the online classification. By combining our algorithm with the winning solution of KDDCUP 2005, we made an improvement by 9.7 % and 3.8 % in terms of precision and F1 respectively compared with the best results of KDDCUP 2005

Topics: General Terms Algorithms, Experimentation Keywords Web Query Classification
Year: 2006
OAI identifier: oai:CiteSeerX.psu:10.1.1.135.6176
Provided by: CiteSeerX
Download PDF:
Sorry, we are unable to provide the full text but you may find it at the following location(s):
  • http://citeseerx.ist.psu.edu/v... (external link)
  • http://www.cs.ust.hk/~qyang/Do... (external link)
  • Suggested articles


    To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.