3,645 research outputs found

A financial cost metric for result caching

Author: Altingovde I.S.
Cambazoglu B.B.
Ozcan R.
Sazoglu F.B.
Ulusoy Ö.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

Web search engines cache results of frequent and/or recent queries. Result caching strategies can be evaluated using different metrics, hit rate being the best known. Recent works take the processing overhead of queries into account when evaluating the performance of result caching strategies and propose cost-aware caching strategies. In this paper, we propose a financial cost metric that goes one step further and also takes hourly electricity prices into account when computing the cost. We evaluate the best-known static, dynamic, and hybrid result caching strategies under this new metric. Moreover, we propose a financial-cost-aware version of the well-known LRU strategy and show that it outperforms the original LRU strategy in terms of the financial cost metric. Copyright © 2013 ACM
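
As a rough illustration of what such a metric could look like, the sketch below prices each cache miss by the electricity consumed while processing it. The function name, the constant power draw, and the tariff are assumptions made for illustration; the abstract does not give the paper's actual formula.

```python
from typing import Iterable, Sequence, Tuple


def financial_cost(
    misses: Iterable[Tuple[int, float]],
    hourly_price: Sequence[float],
    power_kw: float = 1.0,
) -> float:
    """Sum the electricity cost of processing every cache miss.

    misses       -- (hour_of_day, processing_seconds) for each missed query
    hourly_price -- 24 prices per kWh, indexed by hour of day (assumed tariff)
    power_kw     -- assumed constant power draw of the query processor
    """
    total = 0.0
    for hour, seconds in misses:
        energy_kwh = power_kw * seconds / 3600.0
        total += energy_kwh * hourly_price[hour % 24]
    return total


# Hypothetical tariff: cheaper at night, more expensive during the day.
prices = [0.10] * 8 + [0.25] * 12 + [0.10] * 4

# Two misses with identical processing time but different hours of day.
misses = [(3, 3600.0), (14, 3600.0)]
print(financial_cost(misses, prices))
```

Under a metric of this shape, two strategies with the same hit rate can differ in cost if one of them misses more often during expensive hours.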

An Optimal Trade-off between Content Freshness and Refresh Cost

Author: Cho
Cohen
Jie Mi
Lee
Notess
Ross
Wessels
Yibei Ling
Publication venue: 'Applied Probability Trust'
Publication date: 02/08/2010

Caching is an effective mechanism for reducing bandwidth usage and alleviating server load. However, the use of caching entails a compromise between content freshness and refresh cost. Refreshing too often yields a high degree of content freshness at a greater cost in system resources. Conversely, refreshing too rarely degrades content freshness but saves resources. To address the freshness-cost problem, we formulate the refresh scheduling problem with a generic cost model and use this cost model to determine an optimal refresh frequency that gives the best tradeoff between refresh cost and content freshness. We prove the existence and uniqueness of an optimal refresh frequency under the assumptions that content updates arrive as a Poisson process and the age-related cost monotonically increases with decreasing freshness. In addition, we provide an analytic comparison of system performance under fixed refresh scheduling and random refresh scheduling, showing that with the same average refresh frequency the two scheduling schemes are mathematically equivalent in terms of the long-run average cost.
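
The trade-off can be made concrete with a small numerical sketch. It assumes Poisson updates with rate `lam`, a fixed refresh interval `T`, and a cost that is linear in the average age of the cached copy, where age is zero while the copy is fresh and otherwise the time since the first missed update. This particular cost function and all names are illustrative assumptions, not the paper's generic model.

```python
import math


def average_age(T: float, lam: float) -> float:
    """Time-averaged age of a copy refreshed every T under Poisson(lam) updates.

    Closed form: T/2 - 1/lam + (1 - exp(-lam*T)) / (lam**2 * T).
    """
    return T / 2.0 - 1.0 / lam - math.expm1(-lam * T) / (lam * lam * T)


def cost_rate(T: float, lam: float, refresh_cost: float, age_cost: float) -> float:
    """Long-run cost per unit time: refresh cost plus (assumed linear) age cost."""
    return refresh_cost / T + age_cost * average_age(T, lam)


def optimal_interval(lam: float, refresh_cost: float, age_cost: float,
                     lo: float = 1e-3, hi: float = 1e6, iters: int = 200) -> float:
    """Ternary search for the minimizing T; valid because cost_rate is convex in T."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost_rate(m1, lam, refresh_cost, age_cost) < cost_rate(m2, lam, refresh_cost, age_cost):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


# Hypothetical parameters: one update per 100 time units on average.
T_star = optimal_interval(lam=0.01, refresh_cost=1.0, age_cost=1.0)
print(T_star, 1.0 / T_star)  # optimal interval and the matching refresh frequency
```

In this sketch the refresh term falls with `T` while the age term rises, so the total has a single minimum, which mirrors the existence and uniqueness result stated in the abstract.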

arXiv.org e-Print Archive

I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis

Author: Huang Ling
Joseph A. D.
Miller Brad
Tygar J. D.
Publication date: 01/01/2014

Revelations of large-scale electronic surveillance and data mining by governments and corporations have fueled increased adoption of HTTPS. We present a traffic analysis attack against over 6000 webpages spanning the HTTPS deployments of 10 widely used, industry-leading websites in areas such as healthcare, finance, legal services and streaming video. Our attack identifies individual pages in the same website with 89% accuracy, exposing personal details including medical conditions, financial and legal affairs and sexual orientation. We examine evaluation methodology and reveal accuracy variations as large as 18% caused by assumptions affecting caching and cookies. We present a novel defense reducing attack accuracy to 27% with a 9% traffic increase, and demonstrate significantly increased effectiveness of prior defenses in our evaluation context, inclusive of enabled caching, user-specific cookies and pages within the same website.

arXiv.org e-Print Archive

CiteSeerX

Second chance: A hybrid approach for dynamic result caching and prefetching in search engines

Author: Altingovde I. S.
Cambazoglu B. B.
Ozcan R.
Ulusoy O.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2013

Web search engines are known to cache the results of previously issued queries. The stored results typically contain the document summaries and some data that is used to construct the final search result page returned to the user. An alternative strategy is to store in the cache only the result document IDs, which take much less space, allowing results of more queries to be cached. These two strategies lead to an interesting trade-off between the hit rate and the average query response latency. In this work, in order to exploit this trade-off, we propose a hybrid result caching strategy where a dynamic result cache is split into two sections: an HTML cache and a docID cache. Moreover, using a realistic cost model, we evaluate the performance of different result prefetching strategies for the proposed hybrid cache and the baseline HTML-only cache. Finally, we propose a machine learning approach to predict singleton queries, which occur only once in the query stream. We show that when the proposed hybrid result caching strategy is coupled with the singleton query predictor, the hit rate is further improved. © 2013 ACM
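
A two-section cache of this kind might be sketched as follows. The class name, the entry-count capacities, the LRU policy in each section, and the rule that entries evicted from the HTML section are demoted to the docID section are all assumptions for illustration; the abstract does not specify these details.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class HybridResultCache:
    """Illustrative split cache: full result pages plus a cheaper docID-only section."""

    def __init__(self, html_capacity: int, docid_capacity: int) -> None:
        self.html_capacity = html_capacity
        self.docid_capacity = docid_capacity
        self.html: "OrderedDict[str, Tuple[str, List[int]]]" = OrderedDict()
        self.docids: "OrderedDict[str, List[int]]" = OrderedDict()

    def lookup(self, query: str) -> Tuple[str, Optional[object]]:
        """Return ("html", page), ("docid", ids), or ("miss", None)."""
        if query in self.html:
            self.html.move_to_end(query)  # mark as most recently used
            return "html", self.html[query][0]
        if query in self.docids:
            self.docids.move_to_end(query)
            return "docid", self.docids[query]  # snippets must still be rebuilt
        return "miss", None

    def insert(self, query: str, page: str, doc_ids: List[int]) -> None:
        """Store a freshly computed result in the HTML section."""
        self.docids.pop(query, None)  # avoid holding the query in both sections
        self.html[query] = (page, doc_ids)
        self.html.move_to_end(query)
        while len(self.html) > self.html_capacity:
            old_query, (_, old_ids) = self.html.popitem(last=False)
            self._demote(old_query, old_ids)

    def _demote(self, query: str, doc_ids: List[int]) -> None:
        """Assumed policy: keep only the docIDs of an entry evicted from the HTML section."""
        self.docids[query] = doc_ids
        self.docids.move_to_end(query)
        while len(self.docids) > self.docid_capacity:
            self.docids.popitem(last=False)


cache = HybridResultCache(html_capacity=1, docid_capacity=1)
cache.insert("q1", "<html>q1</html>", [1, 2])
cache.insert("q2", "<html>q2</html>", [3])
print(cache.lookup("q2")[0], cache.lookup("q1")[0], cache.lookup("q3")[0])
```

A docID hit avoids re-running the query but still requires generating the result page, which is the source of the trade-off between hit rate and response latency that the abstract describes.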

Web Replica Hosting Systems

Author: Pierre G.
Sivasubramanian S.
Szymaniak M.
Publication date: 01/01/2004

Replication for Web Hosting Systems

Replication is a well-known technique to improve the accessibility of Web sites. It generally offers reduced client latencies and increases a site’s availability. However, applying replication techniques is not trivial, and various Content Delivery Networks (CDNs) have been created to facilitate replication for digital content providers. Th

CiteSeerX