Elastic Provisioning of Cloud Caches: a Cost-aware TTL Approach
We consider elastic resource provisioning in the cloud, focusing on in-memory
key-value stores used as caches. Our goal is to dynamically scale resources to
match the traffic pattern while minimizing the overall cost, which includes not
only the storage cost but also the cost due to misses. Indeed, a small
variation in the cache miss ratio may have a significant impact on
user-perceived performance in modern web services, which in turn affects the
overall revenues of the content provider that uses those services. We propose
and study a dynamic algorithm for TTL caches that is able to obtain
close-to-minimal costs. Since high-throughput caches require low-complexity
operations, we discuss a practical implementation of such a scheme requiring
constant overhead per request, independently of the cache size. We evaluate
our solution with real-world traces collected from Akamai, and show that we
obtain a 17% decrease in the overall cost compared to a baseline static
configuration.
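The abstract's core idea, adjusting a cache TTL online to trade storage cost against miss cost with constant work per request, can be illustrated with a minimal sketch. This is not the paper's algorithm: the class name, the additive update rule, and all cost parameters here are illustrative assumptions.

```python
import time

class CostAwareTTLCache:
    """Hedged sketch of a cost-aware TTL cache: each request nudges the TTL
    so that miss cost and storage cost pull in opposite directions. The
    update rule and parameters are illustrative, not the paper's."""

    def __init__(self, ttl=60.0, step=1.0, miss_cost=1.0, storage_cost_rate=0.01):
        self.ttl = ttl                              # current time-to-live (s)
        self.step = step                            # adjustment step size
        self.miss_cost = miss_cost                  # cost charged per miss
        self.storage_cost_rate = storage_cost_rate  # cost per object-second stored
        self.store = {}                             # key -> expiry timestamp

    def get(self, key, now=None):
        """Constant-overhead lookup: one dict probe plus one TTL update."""
        now = time.time() if now is None else now
        expiry = self.store.get(key)
        if expiry is not None and expiry > now:
            # Hit: holding objects longer costs storage, so shrink TTL a bit.
            self.ttl = max(1.0, self.ttl - self.step * self.storage_cost_rate)
            self.store[key] = now + self.ttl        # TTL refreshed on access
            return True
        # Miss: a longer TTL would have avoided the miss cost, so grow TTL.
        self.ttl += self.step * self.miss_cost
        self.store[key] = now + self.ttl
        return False
```

Because each `get` touches only the requested key and a scalar TTL, the per-request overhead is O(1) regardless of cache size, matching the property the abstract emphasizes.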
Optimizing Replacement Policies for Content Delivery Network Caching: Beyond Belady to Attain A Seemingly Unattainable Byte Miss Ratio
When facing objects/files of differing sizes in content delivery networks
(CDNs) caches, pursuing an optimal object miss ratio (OMR) by approximating
Belady no longer ensures an optimal byte miss ratio (BMR), creating confusion
about how to achieve a superior BMR in CDNs. To address this issue, we
experimentally observe that there exists a time window to delay the eviction of
the object with the longest reuse distance to improve BMR without increasing
OMR. As a result, we introduce a deep reinforcement learning (RL) model to
capture this time window by dynamically monitoring the changes in OMR and BMR,
and implementing a BMR-friendly policy in the time window. Based on this
policy, we propose a Belady and Size Eviction (LRU-BaSE) algorithm, reducing
BMR while maintaining OMR. To make LRU-BaSE efficient and practical, we address
the feedback delay problem of RL with a two-pronged approach. On the one hand,
our observation that a rear section of the LRU cache queue contains most of the
eviction candidates allows LRU-BaSE to shorten the decision region. On the
other hand, the request distribution on CDNs makes it feasible to divide the
learning region into multiple sub-regions that are each learned with reduced
time and increased accuracy. In real CDN systems, compared to LRU, LRU-BaSE can
reduce "backing to OS" traffic and access latency by 30.05% and 17.07%,
respectively, on average. Results on the simulator confirm that LRU-BaSE
outperforms state-of-the-art cache replacement policies: LRU-BaSE's BMR is
0.63% and 0.33% lower than that of Belady and Practical Flow-based Offline
Optimal (PFOO), respectively, on average. In addition, compared to Learning
Relaxed Belady (LRB), LRU-BaSE yields relatively stable performance when
facing workload drift.
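The abstract's two structural observations, that eviction candidates cluster in a rear section of the LRU queue and that evicting larger objects there helps BMR, can be sketched without the RL component. The sketch below replaces LRU-BaSE's learned in-window policy with a static "evict the largest object in the rear window" heuristic; the class name, window size, and heuristic are assumptions, not the paper's method.

```python
from collections import OrderedDict

class SizeAwareLRU:
    """Hedged sketch: an LRU variant that restricts eviction decisions to a
    small rear (least-recently-used) window and, within it, evicts the
    largest object first to favor byte miss ratio. A static stand-in for
    LRU-BaSE's learned policy."""

    def __init__(self, capacity_bytes, window=4):
        self.capacity = capacity_bytes
        self.window = window            # rear-of-queue decision region
        self.used = 0
        self.items = OrderedDict()      # key -> size, least recent first

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # refresh recency on a hit
            return True
        return False

    def put(self, key, size):
        if key in self.items:
            self.used -= self.items.pop(key)
        while self.used + size > self.capacity and self.items:
            # Evict the largest object among the oldest `window` entries.
            rear = list(self.items)[:self.window]
            victim = max(rear, key=lambda k: self.items[k])
            self.used -= self.items.pop(victim)
        self.items[key] = size
        self.used += size
```

Restricting the choice to the rear window keeps the decision cheap (the window is a small constant) while still allowing the size-aware deviation from strict LRU order that the abstract describes.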
Improved caching for HTTP-based video on demand using scalable video coding
HTTP-based delivery for Video on Demand (VoD) has been gaining popularity in recent years. Progressive Download over HTTP, typically used in VoD, takes advantage of widely deployed network caches to relieve video servers from sending the same content to a large number of users of the same VoD service. However, due to the inherent heterogeneity of user demands, which may result in requests for the same video content in different resolutions or qualities, caching efficiency is expected to decrease with the higher variety of requested media files. Scalable Video Coding allows different representations of the same content to be combined in a single file, whose parts, known as layers, are requested sequentially by a user up to the maximum desired quality. In this paper we show the benefits of using Scalable Video Coding to maintain the same set of possible video content representations while at the same time maximizing caching efficiency.
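The caching benefit of layered coding described above can be made concrete with a small model: a request for quality q fetches layers 0..q of one file, so lower layers cached for one user are reused by every other quality level. The class and method names below are illustrative assumptions, not from the paper.

```python
class LayeredVideoCache:
    """Hedged sketch: with scalable video coding, one cached file serves all
    quality levels, because a request for quality q needs exactly the layers
    0..q and lower layers are shared across requests."""

    def __init__(self):
        self.layers = {}   # video_id -> highest layer index cached so far

    def request(self, video_id, quality):
        """Return (layers served from cache, layers fetched from origin)."""
        cached_up_to = self.layers.get(video_id, -1)
        hit_layers = min(cached_up_to, quality) + 1    # reused from cache
        miss_layers = max(0, quality - cached_up_to)   # fetched upstream
        self.layers[video_id] = max(cached_up_to, quality)
        return hit_layers, miss_layers
```

With single-layer files, requests for three distinct qualities of one video would occupy three cache entries; in this layered model they share one entry, which is the efficiency gain the abstract claims.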
Optimization and Evaluation of Service Speed and Reliability in Modern Caching Applications
The performance of caching systems in general, and Internet caches in particular, is evaluated by means of the user-perceived service speed, reliability of downloaded content, and system scalability. In this dissertation, we focus on optimizing the speed of service, as well as on evaluating the reliability and quality of data sent to users.
In order to optimize the service speed, we seek optimal replacement policies in the first part of the dissertation, as it is well known that download delays are a direct product of document availability at the cache; in demand-driven caches, the cache content is completely determined by the cache replacement policy. In the literature, many ad-hoc policies that utilize document sizes, retrieval latency, probability of reference, and temporal locality of requests have been proposed. However, the problem of finding optimal policies under these factors has not been pursued in any systematic manner. Here, we take a step in that direction: under the Independent Reference Model, we show that a simple Markov stationary policy minimizes the long-run average metric induced by non-uniform documents under optional cache replacement. We then use this result to propose a framework for operating caches under multiple performance metrics, by solving a constrained caching problem with a single constraint.
The second part of the dissertation is devoted to studying data reliability and cache consistency issues: A cached object is termed consistent if it is identical to the master document at the origin server at the time it is served to users. Cached objects become stale after the master is modified, and stale copies continue to be served until the cache is refreshed, subject to network transmission delays. However, the performance of Internet consistency algorithms is typically evaluated through the cache hit rate and the network traffic load, metrics that say nothing about data staleness. To remedy this, we formalize a framework and a novel hit* rate measure, which captures consistent downloads from the cache. To demonstrate this new methodology, we calculate the hit and hit* rates produced by two TTL algorithms, under zero and non-zero delays, and evaluate the hit and hit* rates in applications.
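The distinction between the hit rate and the hit* rate can be sketched with a tiny simulation: a hit is any request served from the cache, while a hit* additionally requires that no master update occurred since the cached copy was fetched. This is an illustrative model under the abstract's zero-delay case; the function name and event-list interface are assumptions.

```python
import bisect

def hit_and_hitstar_rates(requests, updates, ttl):
    """Hedged sketch: simulate a single object in a TTL cache and count
    hits (served from cache) versus hit* events (served from cache AND
    still identical to the origin's master copy). `requests` and `updates`
    are sorted timestamp lists; zero network delay is assumed."""
    hits = hitstars = 0
    fetched_at = None                  # time the cached copy was fetched
    for t in requests:
        if fetched_at is not None and t - fetched_at < ttl:
            hits += 1
            # Consistent iff no master update landed between fetch and serve.
            if bisect.bisect_left(updates, fetched_at) == bisect.bisect_left(updates, t):
                hitstars += 1
        else:
            fetched_at = t             # miss: refresh from the origin
    n = len(requests)
    return hits / n, hitstars / n
```

In this model the hit* rate is never larger than the hit rate, and the gap between the two is exactly the fraction of requests served stale, which is the staleness information the conventional hit rate hides.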
Efficient Peer-to-Peer Namespace Searches
In this paper we describe new methods for efficient and exact search
(keyword and full-text) in distributed namespaces. Our methods can be
used in conjunction with existing distributed lookup schemes, such as
Distributed Hash Tables, and distributed directories. We describe how
indexes for implementing distributed searches can be efficiently
created, located, and stored. We describe techniques for creating
approximate indexes that can be used to bound the space requirement at
individual hosts; such techniques are particularly useful for full-text
searches that may require a very large number of individual indexes to
be created and maintained.
Our methods use a new distributed data structure called the view tree.
View trees can be used to efficiently cache and locate results from
prior queries. We describe how view trees are created and maintained.
We present experimental results, using large namespaces and realistic
data, showing that the techniques introduced in this paper can reduce
search overheads (both network and processing costs) by more than an
order of magnitude.
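The idea of reusing cached results from prior queries can be illustrated independently of the view-tree structure itself (which the abstract does not detail). The sketch below is a deliberately simplified stand-in: it caches conjunctive keyword-query results and answers a new query by narrowing the most specific cached sub-query; the class name, flat dictionary of views, and subset-matching strategy are all assumptions.

```python
class ViewCache:
    """Hedged sketch (not the paper's view tree): cache the results of
    prior conjunctive keyword queries and answer a new query by filtering
    a cached sub-query's result set, falling back to the inverted index."""

    def __init__(self, index):
        self.index = index          # keyword -> set of document ids
        self.views = {}             # frozenset(keywords) -> cached result set

    def query(self, keywords):
        keywords = frozenset(keywords)
        if keywords in self.views:
            return self.views[keywords]          # exact view already cached
        # Reuse the most specific cached view covering a sub-query of ours.
        subviews = [v for v in self.views if v < keywords]
        if subviews:
            base = max(subviews, key=len)
            docs = set(self.views[base])
            remaining = keywords - base
        else:
            docs = None
            remaining = keywords
        # Intersect postings for keywords not covered by the cached view.
        for k in remaining:
            posting = self.index.get(k, set())
            docs = set(posting) if docs is None else docs & posting
        docs = docs if docs is not None else set()
        self.views[keywords] = docs              # cache for future queries
        return docs
```

A view tree would organize these cached views hierarchically and distribute them across hosts; the sketch only shows the reuse principle, namely that a broader cached result can be refined locally instead of re-running the full distributed search.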
(UMIACS-TR-2004-13)