50 research outputs found

    Static index pruning in web search engines

    Static index pruning techniques permanently remove a presumably redundant part of an inverted file to reduce the file size and query processing time. These techniques differ in how they decide which parts of an index can be removed safely, that is, without changing the top-ranked query results. As defined in the literature, the query view of a document is the set of query terms that access this document, that is, the terms of the queries that retrieve the document among their top results. In this paper, we first propose using query views to improve the quality of the top results, as compared against the original results. We incorporate query views in a number of static pruning strategies, namely the term-centric, document-centric, term popularity based and document access popularity based approaches, and show that the new strategies considerably outperform their counterparts, especially at higher levels of pruning and for both disjunctive and conjunctive query processing. Additionally, we combine the notions of term and document access popularity to form new pruning strategies, and further extend these strategies with query views. The new strategies improve the result quality especially for conjunctive query processing, which is the default and most common search mode of a search engine.
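
    The query-view definition above can be illustrated with a minimal Python sketch. It assumes a query log that already maps each query to the document IDs in its top results; the names `build_query_views` and `is_protected`, and the idea of using the view as a "do not prune" test, are hypothetical illustrations and not the paper's actual procedure.

```python
from collections import defaultdict

def build_query_views(query_results):
    """Map each document ID to its query view.

    query_results: dict of query string -> list of doc IDs returned
    among that query's top results (assumed to be precomputed).
    """
    views = defaultdict(set)
    for query, top_docs in query_results.items():
        terms = query.split()  # naive whitespace tokenization (assumption)
        for doc_id in top_docs:
            views[doc_id].update(terms)
    return dict(views)

def is_protected(term, doc_id, views):
    """Hypothetical helper: True if the posting (term, doc_id) lies in the
    document's query view, so a pruning strategy could choose to keep it."""
    return term in views.get(doc_id, set())

# Toy query log: two queries and their top-ranked documents.
query_results = {
    "static index pruning": [1, 2],
    "index size": [2, 3],
}
views = build_query_views(query_results)
```

    In this toy log, document 2 is retrieved by both queries, so its view is the union of their terms, while document 3 is only reachable through "index" and "size".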

    Evolution of web search results within years

    We provide a first large-scale analysis of the evolution of query results obtained from a real search engine at two distant points in time, namely, in 2007 and 2010, for a set of 630,000 real queries

    HIV/AIDS, demography and development: individual choices versus public policies in SSA

    Despite the increasing rate of diffusion of effective therapies, the battle against HIV/AIDS in Sub-Saharan Africa (SSA) is far from being over. Three main challenges remain: the epidemics might paralyse or reverse the fertility transition, the resources needed to finance the fight against HIV must expand, and resistance to anti-retroviral treatments is emerging. This research proposes a UGT-like model showing the complexity of the interplay amongst the (macro)economy, the epidemics, their endogenous feedback on mortality and fertility, and the central role of policy actions aimed at fighting HIV. The disease-induced increase in adult mortality can hamper economic development through its upward pressure on the precautionary demand for children and its downward pressure on education. This can dramatically reduce physical and human capital accumulation.

    A Cost-Aware Strategy for Query Result Caching in Web Search Engines

    Search engines and large-scale IR systems need to cache query results for efficiency and scalability. In this study, we propose to explicitly incorporate query costs in the static caching policy. To this end, a query’s cost is represented by its execution time, which involves the CPU time needed to decompress the postings and compute the query-document similarities to obtain the final top-N answers. Simulation results using a large Web crawl dataset and a real query log reveal that the proposed strategy improves overall system performance in terms of the total query execution time.
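
    A minimal Python sketch of a cost-aware static cache follows. The abstract does not give the scoring formula, so ranking queries by frequency multiplied by execution time is an assumption made here for illustration; `build_static_cache`, `total_time` and the toy costs are likewise hypothetical.

```python
from collections import Counter

def build_static_cache(train_log, cost_ms, capacity):
    """Choose `capacity` queries for a static result cache.

    Queries are ranked by frequency * cost (an assumed scoring rule),
    with ties broken alphabetically so the result is deterministic.
    """
    freq = Counter(train_log)
    ranked = sorted(freq, key=lambda q: (-freq[q] * cost_ms[q], q))
    return set(ranked[:capacity])

def total_time(test_log, cost_ms, cache):
    """Total execution time of a log; cache hits are treated as free."""
    return sum(0.0 if q in cache else cost_ms[q] for q in test_log)

# Toy log: "a" is the most frequent query, "b" the most expensive one.
train_log = ["a", "a", "a", "b", "b", "c"]
cost_ms = {"a": 10.0, "b": 40.0, "c": 5.0}

cost_aware = build_static_cache(train_log, cost_ms, capacity=1)
frequency_only = {Counter(train_log).most_common(1)[0][0]}
```

    With room for a single entry, the frequency-only policy caches "a", while the cost-aware ranking caches "b" because its two expensive executions outweigh three cheap ones. Replaying the same log then takes less total time under the cost-aware choice.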

    Static query result caching revisited

    Query result caching is an important mechanism for search engine efficiency. In this study, we first review several query features that are used to determine the contents of a static result cache. Next, we introduce a new feature that more accurately represents the popularity of a query by measuring the stability of query frequency over a set of time intervals. Experimental results show that this new feature achieves hit ratios better than those of the previously proposed features

    Space efficient caching of query results in search engines

    Web search engines serve millions of query requests per day. Caching query results is one of the most crucial mechanisms to cope with such a demanding load. In this paper, we propose an efficient storage model to cache document identifiers of query results. Essentially, we first cluster queries that have common result documents. Next, for each cluster, we attempt to store those common document identifiers in a more compact manner. Experimental results reveal that the proposed storage model achieves a space reduction of up to 4%. The proposed model is envisioned to improve the cache hit rate and system throughput, as it allows storing more query results within a given cache space in return for a negligible increase in the cost of preparing the final query result page.

    A practitioner’s guide for static index pruning

    Abstract. We compare the term- and document-centric static index pruning approaches as described in the literature and investigate their sensitivity to the scoring functions employed during the pruning and actual retrieval stages.
    1 Static Inverted Index Pruning. Static index pruning permanently removes some information from the index in order to save disk space and improve query processing efficiency. Several approaches to static index pruning have been investigated in the literature. Among those methods, term-centric pruning (referred to as TCP hereafter), proposed in [3], is shown to be very successful at keeping the top-k (k≤30) answers almost unchanged for the queries while significantly reducing the index size. In a nutshell, TCP scores (using Smart’s TFIDF function) and sorts the postings of each term in the collection and removes the tail of the list according to some decision criteria. In [1], BM25 is employed instead of the TFIDF function during the pruning and retrieval stages. That study shows that, by tuning the pruning algorithm according to the score function, it is possible to further boost the performance.
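
    The score, sort and cut-the-tail idea behind TCP can be sketched in Python as below. This is only an illustration under stated assumptions: posting scores are taken as precomputed numbers (TFIDF, BM25 or anything else), and the decision criterion is a fixed `keep_fraction` per list, which is a stand-in chosen for brevity and not the criterion used in [3]. The function name `prune_term_centric` is hypothetical.

```python
import math

def prune_term_centric(index, keep_fraction):
    """Keep only the highest-scoring postings of every term.

    index: dict of term -> list of (doc_id, score) postings.
    keep_fraction: share of each list to keep (assumed criterion).
    """
    pruned = {}
    for term, postings in index.items():
        # Sort by descending score; break ties by doc ID for determinism.
        ranked = sorted(postings, key=lambda p: (-p[1], p[0]))
        # Keep at least one posting so no term disappears entirely.
        keep = max(1, math.ceil(len(ranked) * keep_fraction))
        # Restore doc-ID order, as is usual for inverted lists.
        pruned[term] = sorted(ranked[:keep])
    return pruned

# Toy index with precomputed posting scores.
index = {
    "pruning": [(1, 0.9), (2, 0.1), (3, 0.5), (4, 0.3)],
    "index": [(2, 0.7), (5, 0.2)],
}
pruned = prune_term_centric(index, keep_fraction=0.5)
```

    Because the scores are passed in rather than computed inside the function, the same sketch can be rerun with a different scoring function to observe how the surviving postings change, which is the kind of sensitivity the abstract refers to.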