unknown

Systematic optimization of search engines for difficult queries

Abstract

With the advent of Web, text information is being generated across the globe at an unfathomable rate and covering countless topics. This dramatic growth in text information and the increasing number of ways people can utilize it has influenced our daily lives in fundamental and profound ways. The widespread and multi-purposed use of search engines is one example of how transformed our lives have become in relation to text information. Although the vast majority of people's information needs can be served very successfully by current search engines, there is still a considerable number of queries that even the best search engines perform poorly on. In this dissertation, we propose a method of optimizing a search engine to better handle such “difficult queries”, which addresses issues and opportunities at three different stages of an interactive search. Specifically, we propose to improve search quality for difficult queries by: (1) Bridging the vocabulary gap by defining a semantic smoothing language model. A query can be difficult because it does not contain the optimal choice of related terms, or lacks sufficient discriminative terms. In the pre-retrieval stage, using semantic smoothing during query formulation can mitigate the effect of those omissions. (2) Incorporating user negative feedback, i.e., improving a search engine by learning from user feedback in the post-retrieval stage. When a query is so difficult that all the top retrieved results (e.g., top 10) are completely irrelevant, the feedback that a user can provide is solely negative. We propose a generalized optimization framework to learn from feedback on non-relevant documents to prune extensively, but carefully, a large number of non-relevant documents from the top of the ranking list. (3) Balancing priorities when learning from user interaction in order to optimize the whole session. When presenting the search results to the user, there is a tradeoff between promoting those with the highest immediate utility and promoting those with the best potential for collecting feedback information, which can be used to better serve the user over the course of the session (interactive search stage). We frame this tradeoff as a problem of optimizing the diversification of search results, and we propose a machine learning approach that adaptively optimizes each individual user query such that we maximize the overall utility of the entire session. In summary, this dissertation is expected to advance the state of the art of search engines by providing a suite of novel search algorithms, which use the listed approaches to improve a search engine's ability to handle difficult queries

    Similar works