3,645 research outputs found
A financial cost metric for result caching
Web search engines cache results of frequent and/or recent queries. Result caching strategies can be evaluated using different metrics, hit rate being the most well-known. Recent works take the processing overhead of queries into account when evaluating the performance of result caching strategies and propose cost-aware caching strategies. In this paper, we propose a financial cost metric that goes one step beyond and takes also the hourly electricity prices into account when computing the cost. We evaluate the most well-known static, dynamic, and hybrid result caching strategies under this new metric. Moreover, we propose a financial-cost-aware version of the well-known LRU strategy and show that it outperforms the original LRU strategy in terms of the financial cost metric. Copyright © 2013 ACM
An Optimal Trade-off between Content Freshness and Refresh Cost
Caching is an effective mechanism for reducing bandwidth usage and
alleviating server load. However, the use of caching entails a compromise
between content freshness and refresh cost. An excessive refresh allows a high
degree of content freshness at a greater cost of system resource. Conversely, a
deficient refresh inhibits content freshness but saves the cost of resource
usages. To address the freshness-cost problem, we formulate the refresh
scheduling problem with a generic cost model and use this cost model to
determine an optimal refresh frequency that gives the best tradeoff between
refresh cost and content freshness. We prove the existence and uniqueness of an
optimal refresh frequency under the assumptions that the arrival of content
update is Poisson and the age-related cost monotonically increases with
decreasing freshness. In addition, we provide an analytic comparison of system
performance under fixed refresh scheduling and random refresh scheduling,
showing that with the same average refresh frequency two refresh schedulings
are mathematically equivalent in terms of the long-run average cost
I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis
Revelations of large scale electronic surveillance and data mining by
governments and corporations have fueled increased adoption of HTTPS. We
present a traffic analysis attack against over 6000 webpages spanning the HTTPS
deployments of 10 widely used, industry-leading websites in areas such as
healthcare, finance, legal services and streaming video. Our attack identifies
individual pages in the same website with 89% accuracy, exposing personal
details including medical conditions, financial and legal affairs and sexual
orientation. We examine evaluation methodology and reveal accuracy variations
as large as 18% caused by assumptions affecting caching and cookies. We present
a novel defense reducing attack accuracy to 27% with a 9% traffic increase, and
demonstrate significantly increased effectiveness of prior defenses in our
evaluation context, inclusive of enabled caching, user-specific cookies and
pages within the same website
Second chance: A hybrid approach for dynamic result caching and prefetching in search engines
Cataloged from PDF version of article.Web search engines are known to cache the results of previously issued queries. The stored results typically contain the document summaries and some data that is used to construct the final search result page returned to the user. An alternative strategy is to store in the cache only the result document IDs, which take much less space, allowing results of more queries to be cached. These two strategies lead to an interesting trade-off between the hit rate and the average query response latency. In this work, in order to exploit this trade-off, we propose a hybrid result caching strategy where a dynamic result cache is split into two sections: an HTML cache and a docID cache. Moreover, using a realistic cost model, we evaluate the performance of different result prefetching strategies for the proposed hybrid cache and the baseline HTML-only cache. Finally, we propose a machine learning approach to predict singleton queries, which occur only once in the query stream. We show that when the proposed hybrid result caching strategy is coupled with the singleton query predictor, the hit rate is further improved. © 2013 ACM
Replication for Web Hosting Systems
Replication is a well-known technique to improve the accessibility of Web sites. It generally offers reduced client latencies and increases a site’s availability. However, applying replication techniques is not trivial, and various Content Delivery Networks (CDNs) have been created to facilitate replication for digital content providers. Th
- …