Predictive caching and prefetching of query results in search engines
We study the caching of query result pages in Web search engines. Popular search engines receive millions of queries per day, and efficient policies for caching query results may enable them to lower their response time and reduce their hardware requirements. We present PDC (probability driven cache), a novel scheme tailored for caching search results, that is based on a probabilistic model of search engine users. We then use a trace of over seven million queries submitted to the search engine AltaVista to evaluate PDC, as well as traditional LRU and SLRU based caching schemes. The trace driven simulations show that PDC outperforms the other policies. We also examine the prefetching of search results, and demonstrate that prefetching can increase cache hit ratios by 50% for large caches, and can double the hit ratios of small caches. When integrating prefetching into PDC, we attain hit ratios of over 0.53.
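The baseline policies this abstract compares against can be sketched briefly. The code below is a minimal illustration of an LRU query-result cache combined with next-page prefetching; it is not the PDC algorithm itself, and all class, function, and parameter names here are illustrative assumptions.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry on overflow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # key -> cached result page

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

def fetch(query, page):
    """Stand-in for the backend that computes a result page."""
    return f"results:{query}:{page}"

def lookup(cache, query, page, prefetch=1):
    """Serve (query, page) from cache; on a miss, fetch it and also
    prefetch the next `prefetch` result pages, so a user paging forward
    hits the cache -- the effect the abstract measures."""
    hit = cache.get((query, page)) is not None
    if not hit:
        for p in range(page, page + 1 + prefetch):
            cache.put((query, p), fetch(query, p))
    return cache.get((query, page)), hit
```

In a trace-driven simulation, the hit ratio is simply the fraction of `lookup` calls that return `hit=True` over the query log.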
Query processing in search engines: design and evaluation in terms of energy consumption
Today, the data centers accessed by Web search engines, together with personal computers, consume 10% of the world's energy, and of that percentage approximately 2% is consumed by search engines and their data centers alone. However, these percentages are expected to increase by 30% to 40% in the coming years, because the size of the Web tends to double every eight months, the number of users connecting to it keeps growing, and search engines meet this growing demand by adding hardware.
This work presents the goals and challenges of a line of research covering the energy-consumption problems that large computing and data centers, and Web search engines in particular, must solve today. Track: Distributed and Parallel Processing. Red de Universidades con Carreras en Informática (RedUNCI).
The egalitarian effect of search engines
Search engines have become key media for our scientific, economic, and social
activities by enabling people to access information on the Web in spite of its
size and complexity. On the down side, search engines bias the traffic of users
according to their page-ranking strategies, and some have argued that they
create a vicious cycle that amplifies the dominance of established and already
popular sites. We show that, contrary to these prior claims and our own
intuition, the use of search engines actually has an egalitarian effect. We
reconcile theoretical arguments with empirical evidence showing that the
combination of retrieval by search engines and search behavior by users
mitigates the attraction of popular pages, directing more traffic toward less
popular sites, even in comparison to what would be expected from users randomly
surfing the Web.
Comment: 9 pages, 8 figures, 2 appendices. The final version of this e-print has been published in Proc. Natl. Acad. Sci. USA 103(34), 12684-12689 (2006), http://www.pnas.org/cgi/content/abstract/103/34/1268
Distribution and Use of Knowledge under the “Laws of the Web”
Empirical evidence shows that the perception of information is strongly concentrated in those environments in which a mass of producers and users of knowledge interact through a distribution medium. This paper considers the consequences of this fact for economic equilibrium analysis. In particular, it examines how the ranking schemes applied by the distribution technology affect the use of knowledge, and it then describes the characteristics of an optimal ranking scheme. The analysis is carried out using a model in which agents' productivity is based on the stock of knowledge used. The value of a piece of information is assessed in terms of its contribution to productivity.
Keywords: global rankings, information and internet services, limited attention, diversity, knowledge society
Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results
In-degree, PageRank, number of visits and other measures of Web page
popularity significantly influence the ranking of search results by modern
search engines. The assumption is that popularity is closely correlated with
quality, a more elusive concept that is difficult to measure directly.
Unfortunately, the correlation between popularity and quality is very weak for
newly-created pages that have yet to receive many visits and/or in-links.
Worse, since discovery of new content is largely done by querying search
engines, and because users usually focus their attention on the top few
results, newly-created but high-quality pages are effectively "shut out," and
it can take a very long time before they become popular.
We propose a simple and elegant solution to this problem: the introduction of
a controlled amount of randomness into search result ranking methods. Doing so
offers new pages a chance to prove their worth, although clearly using too much
randomness will degrade result quality and annul any benefits achieved. Hence
there is a tradeoff between exploration to estimate the quality of new pages
and exploitation of pages already known to be of high quality. We study this
tradeoff both analytically and via simulation, in the context of an economic
objective function based on aggregate result quality amortized over time. We
show that a modest amount of randomness leads to improved search results.
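The exploration/exploitation tradeoff described above can be illustrated with a toy partially randomized ranker: with a small probability, a result slot is filled by a random candidate rather than the top-scored one, giving new pages a chance to be seen. This is a hedged sketch under assumed names and parameters, not the paper's actual mechanism or analysis.

```python
import random

def randomized_rank(pages, score, k, eps=0.1, rng=random):
    """Return k pages: mostly in descending `score` order (exploitation),
    but each slot is filled uniformly at random from the remaining
    candidates with probability `eps` (exploration)."""
    remaining = sorted(pages, key=score, reverse=True)
    ranking = []
    while remaining and len(ranking) < k:
        if rng.random() < eps:
            pick = rng.randrange(len(remaining))  # explore: random candidate
        else:
            pick = 0  # exploit: current best-scored candidate
        ranking.append(remaining.pop(pick))
    return ranking
```

Setting `eps=0` recovers purely score-based ranking; raising `eps` trades short-term result quality for faster discovery of new pages, which is the tradeoff the paper studies analytically.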
Query-driven document partitioning and collection selection
Abstract — We present a novel strategy to partition a document collection onto several servers and to perform effective collection selection. The method is based on the analysis of query logs. We propose a novel document representation called the query-vectors model: each document is represented as a list recording the queries for which the document is a match, along with their ranks. To both partition the collection and build the collection selection function, we co-cluster queries and documents. The document clusters are then assigned to the underlying IR servers, while the query clusters represent queries that return similar results and are used for collection selection. We show that this document partitioning strategy greatly boosts the performance of standard collection selection algorithms, including CORI, w.r.t. a round-robin assignment. Second, we show that by performing collection selection through matching the query to the existing query clusters and then choosing only one server, we reach an average precision-at-5 of up to 1.74 and consistently improve CORI precision by between 11% and 15%. As a side result, we show a way to identify rarely asked-for documents. Separating these documents from the rest of the collection allows the indexer to produce a more compact index containing only relevant documents that are likely to be requested in the future. In our tests, around 52% of the documents (3,128,366) are not returned among the first 100 top-ranked results of any query.
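The query-vectors representation can be sketched as follows. The toy log format and the reciprocal-rank weighting below are illustrative assumptions, not the paper's exact definitions; the point is that each document's vector lives in query space rather than term space.

```python
def build_query_vectors(query_log):
    """Build query-vector representations from a query log.

    query_log: iterable of (query, ranked_doc_ids) pairs, where
    ranked_doc_ids lists the matching documents in rank order.
    Returns {doc_id: {query: weight}}, with weight = 1/(rank+1) so a
    document ranked higher for a query gets a larger component.
    Documents matched by similar queries end up with similar vectors,
    which is what makes co-clustering queries and documents possible."""
    vectors = {}
    for query, ranked_docs in query_log:
        for rank, doc in enumerate(ranked_docs):
            vectors.setdefault(doc, {})[query] = 1.0 / (rank + 1)
    return vectors
```

Documents that appear in no query's result list get no vector at all, which mirrors the abstract's side result: rarely asked-for documents can be split off from the main index.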
Diversity of Online Community Activities
Web sites where users create and rate content as well as form networks with
other users display long-tailed distributions in many aspects of behavior.
Using behavior on one such community site, Essembly, we propose and evaluate
plausible mechanisms to explain these behaviors. Unlike purely descriptive
models, these mechanisms rely on user behaviors based on information available
locally to each user. For Essembly, we find the long-tails arise from large
differences among user activity rates and qualities of the rated content, as
well as the extensive variability in the time users devote to the site. We show
that the models not only explain overall behavior but also allow estimating the
quality of content from its early behavior.
Comment: 14 pages