Cascaded cross entropy-based search result diversification
Ankara: The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2012.
Thesis (Master's) -- Bilkent University, 2012.
Includes bibliographical references (leaves 82-86).
Search engines are used to find information on the web. Retrieving relevant documents for ambiguous queries based on query-document similarity does not satisfy users, because such queries have more than one meaning. In this study, a new method, cascaded cross entropy-based search result diversification (CCED), is proposed to list the web pages corresponding to different meanings of the query in higher rank positions. It combines modified reciprocal rank and cross entropy measures to balance the trade-off between query-document relevancy and diversity among the retrieved documents. We use the Latent Dirichlet Allocation (LDA) algorithm to compute query-document relevancy scores. The number of different meanings of an ambiguous query is estimated by complete-link clustering. We construct the first Turkish test collection for result diversification, BILDIV-2012. The performance of CCED is compared with the Maximum Marginal Relevance (MMR) and IA-Select algorithms. In this comparison, the Ambient, TREC Diversity Track, and BILDIV-2012 test collections are used. We also compare the performance of these algorithms with that of Bing and Google. The results indicate that CCED is the most successful method in terms of satisfying users interested in different meanings of the query in higher rank positions of the result list.
Köroğlu, Bilge
M.S.
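As an illustration of the relevance-diversity trade-off mentioned in the abstract, the following is a minimal sketch of greedy MMR re-ranking, one of the baselines named above. It is not the CCED method, whose formulas the abstract does not give. The function name `mmr_rerank`, the parameter `lam`, and the toy scores are assumptions made for this example only.

```python
def mmr_rerank(relevance, similarity, k, lam=0.5):
    """Greedy MMR re-ranking (illustrative sketch, not the thesis code).

    relevance:  list of query-document scores, higher is better
    similarity: square matrix; similarity[i][j] is the similarity of docs i and j
    k:          number of documents to return
    lam:        weight on relevance; (1 - lam) weights the redundancy penalty
    """
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any already selected document.
            redundancy = max((similarity[i][j] for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: documents 0 and 1 are near-duplicates, document 2 differs.
relevance = [0.9, 0.85, 0.5]
similarity = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
diversified = mmr_rerank(relevance, similarity, k=3, lam=0.5)
relevance_only = mmr_rerank(relevance, similarity, k=3, lam=1.0)
```

With `lam=1.0` the ranking follows relevance alone; lowering `lam` moves the less relevant but dissimilar document ahead of the near-duplicate.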
Novelty and Diversity in Retrieval Evaluation
Queries submitted to search engines rarely provide a complete and precise
description of a user's information need.
Most queries are ambiguous to some extent, having multiple interpretations.
For example, the seemingly unambiguous query "tennis lessons" might be submitted
by a user interested in attending classes in her neighborhood, seeking lessons
for her child, looking for online video lessons, or planning to start a business
teaching tennis.
Search engines face the challenging task of satisfying different groups of users
having diverse information needs associated with a given query.
One solution is to optimize ranking functions to satisfy diverse sets of information
needs.
Unfortunately, existing evaluation frameworks do not support such optimization.
Instead, ranking functions are rewarded for satisfying the most likely intent
associated with a given query.
In this thesis, we propose a framework and associated evaluation metrics that are
capable of optimizing ranking functions to satisfy diverse information needs.
Our proposed measures explicitly reward those ranking functions capable of presenting
the user with information that is novel with respect to previously viewed
documents.
Our measures reflect the quality of a ranking function by taking into account its
ability to satisfy diverse users submitting a query.
Moreover, the task of identifying and establishing test frameworks to compare
ranking functions on a web-scale can be tedious.
One reason for this problem is the dynamic nature of the web, where documents
are constantly added and updated, making it necessary for search engine developers
to seek additional human assessments.
Along with issues of novelty and diversity, we explore an approximate
approach to comparing different ranking functions when complete human
assessments are not available.
We demonstrate that our approach is capable of accurately sorting ranking
functions based on their capability of satisfying diverse users, even in the
face of incomplete human assessments.
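The idea of rewarding information that is novel with respect to previously viewed documents can be illustrated with a small sketch. It follows the general shape of novelty-discounted cumulative gain measures from the literature and is not necessarily the exact definition used in the thesis. The function name, the `alpha` parameter, and the intent labels are assumptions made for this example.

```python
import math


def novelty_discounted_gain(ranking, alpha=0.5):
    """Score a ranking in which each document lists the query intents it covers.

    ranking: list of sets of intent labels, in rank order
    alpha:   penalty for repeating an intent; 0 disables the novelty reward
    """
    seen = {}  # intent -> number of earlier documents that covered it
    total = 0.0
    for rank, intents in enumerate(ranking, start=1):
        # Each repeat of an already covered intent is worth (1 - alpha) times less.
        gain = sum((1 - alpha) ** seen.get(i, 0) for i in intents)
        # Standard logarithmic rank discount.
        total += gain / math.log2(rank + 1)
        for i in intents:
            seen[i] = seen.get(i, 0) + 1
    return total


# Hypothetical intents for the "tennis lessons" query.
diverse = [{"local classes"}, {"online videos"}]
redundant = [{"local classes"}, {"local classes"}]
diverse_score = novelty_discounted_gain(diverse)
redundant_score = novelty_discounted_gain(redundant)
```

Under these assumptions the ranking that covers two different intents scores higher than the one that repeats the same intent, which is the behaviour a diversity-aware measure is meant to reward.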