Web query classification (QC) aims to classify Web users’ queries, which are often short and ambiguous, into a set of target categories. QC has many applications including page ranking in Web search, targeted advertisement in response to queries, and personalization. In this paper, we present a novel approach for QC that outperforms the winning solution of the ACM KDDCUP 2005 competition, whose objective is to classify 800,000 real user queries. In our approach, we first build a bridging classifier on an intermediate taxonomy in an offline mode. This classifier is then used in an online mode to map user queries to the target categories via the above intermediate taxonomy. A major innovation is that by leveraging the similarity distribution over the intermediate taxonomy, we do not need to retrain a new classifier for each new set of target categories, and therefore the bridging classifier needs to be trained only once. In addition, we introduce category selection as a new method for narrowing down the scope of the intermediate taxonomy based on which we classify the queries. Category selection can improve both efficiency and effectiveness of the online classification. By combining our algorithm with the winning solution of KDDCUP 2005, we made an improvement by 9.7 % and 3.8 % in terms of precision and F1 respectively compared with the best results of KDDCUP 2005
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.