5 research outputs found

    Efficient Diversification of Web Search Results

    Full text link
    In this paper we analyze the efficiency of various search results diversification methods. While efficacy of diversification approaches has been deeply investigated in the past, response time and scalability issues have been rarely addressed. A unified framework for studying performance and feasibility of result diversification solutions is thus proposed. First we define a new methodology for detecting when, and how, query results need to be diversified. To this purpose, we rely on the concept of "query refinement" to estimate the probability of a query to be ambiguous. Then, relying on this novel ambiguity detection method, we deploy and compare on a standard test set, three different diversification methods: IASelect, xQuAD, and OptSelect. While the first two are recent state-of-the-art proposals, the latter is an original algorithm introduced in this paper. We evaluate both the efficiency and the effectiveness of our approach against its competitors by using the standard TREC Web diversification track testbed. Results shown that OptSelect is able to run two orders of magnitude faster than the two other state-of-the-art approaches and to obtain comparable figures in diversification effectiveness.Comment: VLDB201

    The Impact of Novel Computing Architectures on Large-Scale Distributed Web Information Retrieval Systems

    Get PDF
    Web search engines are the most popular mean of interaction with the Web. Realizing a search engine which scales even to such issues presents many challenges. Fast crawling technology is needed to gather the Web documents. Indexing has to process hundreds of gigabytes of data efficiently. Queries have to be handled quickly, at a rate of thousands per second. As a solution, within a datacenter, services are built up from clusters of common homogeneous PCs. However, Information Retrieval (IR) has to face issues raised by the growing amount of Web data, as well as the number of new users. In response to these issues, cost-effective specialized hardware is available nowadays. In our opinion, this hardware is ideal for migrating distributed IR systems to computer clusters comprising heterogeneous processors in order to respond their need of computing power. Toward this end, we introduce K-model, a computational model to properly evaluate algorithms designed for such hardware. We study the impact of K-model rules on algorithm design. To evaluate the benefits of using K-model in evaluating algorithms, we compare the complexity of a solution built using our properly designed techniques, and the existing ones. Although in theory competitors are more efficient than us, empirically, K-model is able to prove because our solutions have been shown to be faster than the state-of-the-art implementations

    Efficient Diversification of Search Results using Query Logs

    No full text
    We study the problem of diversifying search results by exploiting the knowledge mined from query logs. Our proposal exploits the presence of different “specializations ” of queries in query logs to detect the submission of ambiguous/faceted queries, and manage them by diversifying the search results returned in order to cover the different possible interpretations of the query. We present an original formulation of the results diversification problem in terms of an objective function to be maximized that admits the finding of an optimal solution in linear time

    Efficient diversification of search results using query logs

    No full text
    We study the problem of diversifying search results by exploiting the knowledge mined from query logs. Our proposal exploits the presence of different "specializations" of queries in query logs to detect the submission of ambiguous/faceted queries, and manage them by diversifying the search results returned in order to cover the different possible interpretations of the query. We present an original formulation of the results diversification problem in terms of an objective function to be maximized that admits the finding of an optimal solution in linear time. © 2011 Authors

    Query Log Mining to Enhance User Experience in Search Engines

    Get PDF
    The Web is the biggest repository of documents humans have ever built. Even more, it is increasingly growing in size every day. Users rely on Web search engines (WSEs) for finding information on the Web. By submitting a textual query expressing their information need, WSE users obtain a list of documents that are highly relevant to the query. Moreover, WSEs tend to store such huge amount of users activities in "query logs". Query log mining is the set of techniques aiming at extracting valuable knowledge from query logs. This knowledge represents one of the most used ways of enhancing the users’ search experience. According to this vision, in this thesis we firstly prove that the knowledge extracted from query logs suffer aging effects and we thus propose a solution to this phenomenon. Secondly, we propose new algorithms for query recommendation that overcome the aging problem. Moreover, we study new query recommendation techniques for efficiently producing recommendations for rare queries. Finally, we study the problem of diversifying Web search engine results. We define a methodology based on the knowledge derived from query logs for detecting when and how query results need to be diversified and we develop an efficient algorithm for diversifying search results
    corecore