    Elastic Provisioning of Cloud Caches: a Cost-aware TTL Approach

    We consider elastic resource provisioning in the cloud, focusing on in-memory key-value stores used as caches. Our goal is to dynamically scale resources to match the traffic pattern, minimizing the overall cost, which includes not only the storage cost but also the cost due to misses. Indeed, a small variation in the cache miss ratio can have a significant impact on user-perceived performance in modern web services, which in turn affects the overall revenue of the content provider that uses those services. We propose and study a dynamic algorithm for TTL caches that achieves close-to-minimal cost. Since high-throughput caches require low-complexity operations, we discuss a practical implementation of the scheme requiring constant overhead per request, independent of the cache size. We evaluate our solution with real-world traces collected from Akamai and show a 17% decrease in the overall cost compared to a baseline static configuration.
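    To make the idea concrete, here is a minimal sketch of a cost-aware TTL cache in Python. The update rule, parameter names, and cost model are illustrative assumptions, not the paper's algorithm; the point is only that the TTL can be tuned online with an O(1) update per request that trades miss cost against storage cost.

```python
import time

class CostAwareTTLCache:
    """Sketch of a TTL cache whose TTL is tuned online. The gradient-style
    update below is an illustrative assumption, not the paper's rule."""

    def __init__(self, ttl=60.0, miss_cost=1.0, storage_cost=0.01, step=0.1):
        self.ttl = ttl                    # current TTL (seconds)
        self.miss_cost = miss_cost        # assumed cost charged per miss
        self.storage_cost = storage_cost  # assumed cost per object-second stored
        self.step = step                  # learning rate for the TTL update
        self.store = {}                   # key -> (value, expiry time)

    def get(self, key, fetch):
        now = time.time()
        entry = self.store.get(key)
        if entry is not None and entry[1] > now:
            # Hit: caching paid off, but storage is not free, so the
            # storage cost exerts a small downward pressure on the TTL.
            self.ttl = max(1.0, self.ttl - self.step * self.storage_cost * self.ttl)
            return entry[0]
        if entry is not None:
            # Miss on an expired object: a longer TTL might have kept it,
            # so the miss cost pushes the TTL up.
            self.ttl += self.step * self.miss_cost
        value = fetch(key)                # fetch from the origin on a miss
        self.store[key] = (value, now + self.ttl)  # expired entries overwritten lazily
        return value
```

    A caller wraps its backend fetch, e.g. `cache.get("user:42", backend.load)`; both the hit and miss paths perform a constant amount of work, matching the constant-overhead requirement mentioned above.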

    Optimizing Replacement Policies for Content Delivery Network Caching: Beyond Belady to Attain A Seemingly Unattainable Byte Miss Ratio

    When facing objects/files of differing sizes in content delivery network (CDN) caches, pursuing an optimal object miss ratio (OMR) by approximating Belady no longer ensures an optimal byte miss ratio (BMR), creating confusion about how to achieve a superior BMR in CDNs. To address this issue, we experimentally observe that there exists a time window in which delaying the eviction of the object with the longest reuse distance improves BMR without increasing OMR. We therefore introduce a deep reinforcement learning (RL) model that captures this time window by dynamically monitoring the changes in OMR and BMR, and that applies a BMR-friendly policy within the window. Based on this policy, we propose a Belady and Size Eviction (LRU-BaSE) algorithm that reduces BMR while maintaining OMR. To make LRU-BaSE efficient and practical, we address the feedback-delay problem of RL with a two-pronged approach. On the one hand, our observation that the rear section of the LRU cache queue contains most of the eviction candidates allows LRU-BaSE to shorten the decision region. On the other hand, the request distribution on CDNs makes it feasible to divide the learning region into multiple sub-regions, each learned with reduced time and increased accuracy. In real CDN systems, compared to LRU, LRU-BaSE reduces "backing to OS" traffic and access latency by 30.05% and 17.07%, respectively, on average. The results on the simulator confirm that LRU-BaSE outperforms state-of-the-art cache replacement policies: its BMR is 0.63% and 0.33% lower than that of Belady and Practical Flow-based Offline Optimal (PFOO), respectively, on average. In addition, compared to Learning Relaxed Belady (LRB), LRU-BaSE yields relatively stable performance when facing workload drift.
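    The decision-region idea can be illustrated with a short sketch. The paper drives the eviction choice with an RL model; here a plain largest-object heuristic stands in for it, and the class name, region size, and heuristic are all assumptions made for the example.

```python
from collections import OrderedDict

class SizeAwareLRU:
    """Illustrative sketch of a shortened decision region: instead of always
    evicting the strict LRU victim, examine only a small rear section of the
    LRU queue and pick a size-aware victim there. A simple size heuristic
    replaces the paper's learned policy."""

    def __init__(self, capacity_bytes, decision_region=8):
        self.capacity = capacity_bytes
        self.region = decision_region   # how deep into the LRU tail to look
        self.used = 0
        self.items = OrderedDict()      # key -> size, least recently used first

    def _evict(self):
        # Restrict the choice to the rear (least recently used) section,
        # which the paper observes holds most eviction candidates.
        tail = list(self.items.items())[: self.region]
        victim = max(tail, key=lambda kv: kv[1])[0]  # evict the largest object
        self.used -= self.items.pop(victim)

    def access(self, key, size):
        if key in self.items:
            self.items.move_to_end(key)  # hit: refresh recency
            return True
        while self.used + size > self.capacity and self.items:
            self._evict()
        self.items[key] = size
        self.used += size
        return False
```

    Restricting the search to a fixed-size tail keeps eviction cost bounded per request, which is why shortening the decision region also shortens the RL feedback loop in the paper's design.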

    Improved caching for HTTP-based video on demand using scalable video coding

    HTTP-based delivery for Video on Demand (VoD) has gained popularity in recent years. Progressive download over HTTP, typically used in VoD, takes advantage of widely deployed network caches to relieve video servers of sending the same content to large numbers of users of the same VoD service. However, because users may request the same video content in different resolutions or qualities, caching efficiency is expected to decrease with the greater variety of requested media files. Scalable Video Coding allows different representations of the same content to be combined in a single file, whose parts, known as layers, are requested sequentially by a user up to the maximum desired quality. In this paper we show the benefits of using Scalable Video Coding to maintain the same set of possible video content representations while maximizing caching efficiency.
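    The caching benefit comes from layer sharing, which a tiny sketch can show. The URL scheme and object naming below are hypothetical; the point is that with layered coding a low-quality request is a prefix of a high-quality one, so the cached objects overlap.

```python
def layer_requests(video_id, quality):
    """With Scalable Video Coding, a client wanting quality level q fetches
    the base layer plus enhancement layers up to q, so requests for
    different qualities share cache entries."""
    return [f"{video_id}/layer{l}" for l in range(quality + 1)]

def single_file_request(video_id, quality):
    """Without layered coding, each quality is a separate file: no sharing."""
    return [f"{video_id}@q{quality}"]

# Two users, same video, different qualities:
svc = set(layer_requests("movie42", 0)) | set(layer_requests("movie42", 2))
avc = set(single_file_request("movie42", 0)) | set(single_file_request("movie42", 2))
print(len(svc), "distinct cache objects with SVC")    # 3; the base layer is reused
print(len(avc), "distinct cache objects without SVC") # 2 full files, content duplicated
```

    With SVC the low-quality user's entire download is reused verbatim by the high-quality user, whereas separate single-file representations duplicate the same underlying content in the cache.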

    The multikey Web cache simulator: a platform for designing proxy cache management techniques

    Optimization and Evaluation of Service Speed and Reliability in Modern Caching Applications

    The performance of caching systems in general, and Internet caches in particular, is evaluated in terms of user-perceived service speed, reliability of downloaded content, and system scalability. In this dissertation we focus on optimizing the speed of service and on evaluating the reliability and quality of data sent to users. To optimize service speed, the first part of the dissertation seeks optimal replacement policies: download delays are a direct product of document availability at the cache, and in demand-driven caches the cache content is completely determined by the replacement policy. Many ad hoc policies that exploit document sizes, retrieval latency, reference probabilities, and temporal locality of requests have been proposed in the literature, but the problem of finding optimal policies under these factors has not been pursued in any systematic manner. Here we take a step in that direction: under the Independent Reference Model, we show that a simple Markov stationary policy minimizes the long-run average metric induced by non-uniform documents under optional cache replacement. We then use this result to propose a framework for operating caches under multiple performance metrics, by solving a constrained caching problem with a single constraint. The second part of the dissertation studies data reliability and cache consistency: a cached object is termed consistent if it is identical to the master document at the origin server at the time it is served to users. Cached objects become stale after the master is modified, and stale copies continue to be served until the cache is refreshed, subject to network transmit delays. The performance of Internet consistency algorithms, however, is usually evaluated through the cache hit rate and network traffic load, which say nothing about data staleness. To remedy this, we formalize a framework and the novel hit* rate measure, which captures consistent downloads from the cache. To demonstrate the methodology, we calculate the hit and hit* rates produced by two TTL algorithms, under zero and non-zero delays, and evaluate the hit and hit* rates in applications.
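    The hit versus hit* distinction is easy to see in a toy simulation. The sketch below tracks a single object under a TTL cache with zero fetch delay; the event encoding and version counter are assumptions made for the example, not the dissertation's formal model.

```python
def hit_and_hit_star(events, ttl):
    """Compute the hit rate and the hit* rate for one object.
    `events` is a time-ordered list of ("req", t) and ("update", t)
    tuples; each update bumps the origin's version, so a cache hit on
    an outdated version counts as a hit but not as a hit*."""
    origin_version = 0
    cached_version, expiry = None, float("-inf")
    hits = hits_star = reqs = 0
    for kind, t in events:
        if kind == "update":
            origin_version += 1
            continue
        reqs += 1
        if t < expiry:                    # served from cache: a hit
            hits += 1
            if cached_version == origin_version:
                hits_star += 1            # ...and the copy is fresh: a hit*
        else:                             # expired: refresh from the origin
            cached_version, expiry = origin_version, t + ttl
    return hits / reqs, hits_star / reqs

# The cached copy goes stale at t=2 but is still served at t=3:
trace = [("req", 0), ("req", 1), ("update", 2), ("req", 3), ("req", 6)]
print(hit_and_hit_star(trace, ttl=5))     # hit rate 0.5, hit* rate 0.25
```

    In the trace, the request at t=3 is a hit by the conventional measure, yet the user receives stale data; only the hit* rate records the difference.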

    Efficient Peer-to-Peer Namespace Searches

    In this paper we describe new methods for efficient and exact search (keyword and full-text) in distributed namespaces. Our methods can be used in conjunction with existing distributed lookup schemes, such as Distributed Hash Tables, and with distributed directories. We describe how indexes for implementing distributed searches can be efficiently created, located, and stored. We describe techniques for creating approximate indexes that bound the space requirement at individual hosts; such techniques are particularly useful for full-text searches, which may require a very large number of individual indexes to be created and maintained. Our methods use a new distributed data structure called the view tree. View trees can be used to efficiently cache and locate results from prior queries. We describe how view trees are created and maintained, and present experimental results, using large namespaces and realistic data, showing that the techniques introduced in this paper can reduce search overheads (both network and processing costs) by more than an order of magnitude. (UMIACS-TR-2004-13)
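    A rough intuition for result caching with a view tree can be given in a few lines. The sketch below is a centralized toy, not the paper's distributed structure: each node caches the result of a prior keyword query, a child refines its parent, and a new query descends to the most specific cached view it is contained in, so only that smaller result needs filtering. All names and the containment rule are assumptions for illustration.

```python
class ViewNode:
    """Toy view-tree node: `keywords` is the query this view answers and
    `result` its cached answer; a child's keyword set contains its
    parent's, so deeper views are more specific."""

    def __init__(self, keywords, result):
        self.keywords = frozenset(keywords)
        self.result = result            # documents matching `keywords`
        self.children = []

    def lookup(self, query):
        query = frozenset(query)
        if not self.keywords <= query:
            return None                 # this view does not cover the query
        for child in self.children:
            hit = child.lookup(query)
            if hit is not None:
                return hit              # a more specific cached view applies
        return self                     # most specific covering view found

# Cached views: everything, then docs mentioning "cache", then "cache"+"ttl".
root = ViewNode([], ["d1", "d2", "d3", "d4"])
v1 = ViewNode(["cache"], ["d1", "d3"]); root.children.append(v1)
v1.children.append(ViewNode(["cache", "ttl"], ["d3"]))

view = root.lookup(["cache", "ttl", "policy"])
print(sorted(view.keywords), view.result)  # ['cache', 'ttl'] ['d3']
```

    Instead of searching the full namespace for {"cache", "ttl", "policy"}, the query starts from the cached one-document result for {"cache", "ttl"}, which is the source of the order-of-magnitude overhead reductions the paper reports.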