
    Malleable coding for updatable cloud caching

    In software-as-a-service applications provisioned through cloud computing, locally cached data are often modified with updates from new versions. In some cases, with each edit, one may want to preserve both the original and new versions. In this paper, we focus on cases in which only the latest version must be preserved. Furthermore, it is desirable for the data to not only be compressed but to also be easily modified during updates, since representing information and modifying the representation both incur cost. We examine whether it is possible to have both compression efficiency and ease of alteration, in order to promote codeword reuse. In other words, we study the feasibility of a malleable and efficient coding scheme. The tradeoff between compression efficiency and malleability cost (the difficulty of synchronizing compressed versions) is measured as the length of a reused prefix portion. The region of achievable rates and malleability is found. Drawing from prior work on common information problems, we show that efficient data compression may not be the best engineering design principle when storing software-as-a-service data. In the general case, goals of efficiency and malleability are fundamentally in conflict. This work was supported in part by an NSF Graduate Research Fellowship (LRV), Grant CCR-0325774, and Grant CCF-0729069. This work was presented at the 2011 IEEE International Symposium on Information Theory [1] and the 2014 IEEE International Conference on Cloud Engineering [2]. The associate editor coordinating the review of this paper and approving it for publication was R. Thobaben.

    Fast and secure laptop backups with encrypted de-duplication

    Many people now store large quantities of personal and corporate data on laptops or home computers. These often have poor or intermittent connectivity, and are vulnerable to theft or hardware failure. Conventional backup solutions are not well suited to this environment, and backup regimes are frequently inadequate. This paper describes an algorithm which takes advantage of the data which is common between users to increase the speed of backups, and reduce the storage requirements. This algorithm supports client-end per-user encryption which is necessary for confidential personal data. It also supports a unique feature which allows immediate detection of common subtrees, avoiding the need to query the backup system for every file. We describe a prototype implementation of this algorithm for Apple OS X, and present an analysis of the potential effectiveness, using real data obtained from a set of typical users. Finally, we discuss the use of this prototype in conjunction with remote cloud storage, and present an analysis of the typical cost savings.
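    The subtree-detection idea in the abstract above can be illustrated with a minimal sketch. It assumes a Merkle-style scheme in which a directory's identifier is a hash over its children's identifiers, so a single lookup on a directory identifier can rule out uploading everything beneath it. The abstract does not give the paper's actual construction, and the per-user encryption it mentions is omitted here. All names (`blob_id`, `tree_id`, `identify`, `upload`, `backup`) and the in-memory `store` set are hypothetical.

```python
import hashlib


def blob_id(data):
    """Content hash of a file; identical files get identical ids."""
    return hashlib.sha256(b"blob\0" + data).hexdigest()


def tree_id(child_ids):
    """Hash of a directory, derived from its (name, child id) pairs."""
    h = hashlib.sha256(b"tree\0")
    for name in sorted(child_ids):
        h.update(name.encode() + b"\0" + child_ids[name].encode() + b"\n")
    return h.hexdigest()


def identify(node):
    """Compute ids locally. A node is bytes (file) or a dict (directory)."""
    if isinstance(node, bytes):
        return blob_id(node), None
    kids = {name: identify(child) for name, child in node.items()}
    return tree_id({name: kid[0] for name, kid in kids.items()}), kids


def upload(identified, store, stats):
    """Upload top-down, skipping any subtree whose id is already stored."""
    node_id, kids = identified
    stats["queries"] += 1
    if node_id in store:
        return node_id  # whole subtree already backed up: no further queries
    if kids is not None:
        for kid in kids.values():
            upload(kid, store, stats)
    store.add(node_id)
    stats["uploads"] += 1
    return node_id


def backup(node, store):
    """Back up a tree into the shared store; return (root id, stats)."""
    stats = {"queries": 0, "uploads": 0}
    root = upload(identify(node), store, stats)
    return root, stats


# Hypothetical shared store and two users with a common "docs" directory.
store = set()
docs = {"a.txt": b"hello", "b.txt": b"world"}
_, stats_a = backup({"docs": docs, "notes.txt": b"mine"}, store)
_, stats_b = backup({"docs": dict(docs), "other.txt": b"theirs"}, store)
print(stats_a, stats_b)
```

    In this toy run the second user issues one query for the shared `docs` directory and none for the files inside it, which is the saving the abstract attributes to common-subtree detection.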

    Cache Serializability: Reducing Inconsistency in Edge Transactions

    Read-only caches are widely used in cloud infrastructures to reduce access latency and load on backend databases. Operators view coherent caches as impractical at genuinely large scale and many client-facing caches are updated in an asynchronous manner with best-effort pipelines. Existing solutions that support cache consistency are inapplicable to this scenario since they require a round trip to the database on every cache transaction. Existing incoherent cache technologies are oblivious to transactional data access, even if the backend database supports transactions. We propose T-Cache, a novel caching policy for read-only transactions in which inconsistency is tolerable (won't cause safety violations) but undesirable (has a cost). T-Cache improves cache consistency despite asynchronous and unreliable communication between the cache and the database. We define cache-serializability, a variant of serializability that is suitable for incoherent caches, and prove that with unbounded resources T-Cache implements this new specification. With limited resources, T-Cache allows the system manager to choose a trade-off between performance and consistency. Our evaluation shows that T-Cache detects many inconsistencies with only nominal overhead. We use synthetic workloads to demonstrate the efficacy of T-Cache when data accesses are clustered and its adaptive reaction to workload changes. With workloads based on real-world topologies, T-Cache detects 43-70% of the inconsistencies and increases the rate of consistent transactions by 33-58%. Comment: Ittay Eyal, Ken Birman, Robbert van Renesse, "Cache Serializability: Reducing Inconsistency in Edge Transactions," Distributed Computing Systems (ICDCS), IEEE 35th International Conference on, June 29 2015–July 2 201

    ISP-friendly Peer-assisted On-demand Streaming of Long Duration Content in BBC iPlayer

    In search of scalable solutions, CDNs are exploring P2P support. However, the benefits of peer assistance can be limited by various obstacle factors such as ISP friendliness (requiring peers to be within the same ISP), bitrate stratification (the need to match peers with others needing similar bitrate), and partial participation (some peers choosing not to redistribute content). This work relates potential gains from peer assistance to the average number of users in a swarm and its capacity, and empirically studies the effects of these obstacle factors at scale, using a month-long trace of over 2 million users in London accessing BBC shows online. Results indicate that even when P2P swarms are localised within ISPs, up to 88% of traffic can be saved. Surprisingly, bitrate stratification results in 2 large sub-swarms and does not significantly affect savings. However, partial participation and the need for a minimum swarm size do affect gains. We investigate improvements to gain from increasing content availability through two well-studied techniques: content bundling (combining multiple items to increase availability) and historical caching of previously watched items. Bundling proves ineffective as increased server traffic from larger bundles outweighs benefits of availability, but simple caching can considerably boost traffic gains from peer assistance. Comment: In Proceedings of IEEE INFOCOM 201

    On I/O Performance and Cost Efficiency of Cloud Storage: A Client's Perspective

    Cloud storage has gained increasing popularity in the past few years. In cloud storage, data are stored in the service provider’s data centers; users access data via the network and pay fees based on service usage. For such a new storage model, prior wisdom and optimization schemes for conventional storage may be neither valid nor applicable. In this dissertation, we focus on understanding and optimizing the I/O performance and cost efficiency of cloud storage from a client’s perspective. We first conduct a comprehensive study to gain insight into the I/O performance behaviors of cloud storage from the client side. Through extensive experiments, we have obtained several critical findings and useful implications for system optimization. We then design a client cache framework, called Pacaca, to further improve end-to-end performance of cloud storage. Pacaca seamlessly integrates parallelized prefetching and cost-aware caching by utilizing the parallelism potential and object correlations of cloud storage. In addition to improving system performance, we have also made efforts to reduce the monetary cost of using cloud storage services by proposing a latency- and cost-aware client caching scheme, called GDS-LC, which can achieve two optimization goals for using cloud storage services: low access latency and low monetary cost. Our experimental results show that our proposed client-side solutions significantly outperform traditional methods. Our study contributes to inspiring the community to reconsider system optimization methods in the cloud environment, especially for the purpose of integrating cloud storage into the current storage stack as a primary storage layer.
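    To make the notion of cost-aware client caching concrete, here is a minimal sketch of the classic GreedyDual-Size replacement policy, in which each object's priority is a running inflation value plus its miss cost divided by its size. The abstract does not describe how GDS-LC works, so this should not be read as the dissertation's scheme; that GDS-LC relates to GreedyDual-Size at all is an assumption based only on its name. The class name, the `cost` parameter (a stand-in for some combination of fetch latency and monetary charge), and the example numbers are all hypothetical.

```python
class GreedyDualSizeCache:
    """Classic GreedyDual-Size: evict the entry with the lowest priority H,
    where H = inflation + cost / size at the time of the last access."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.inflation = 0.0  # the running "L" value of the algorithm
        self.entries = {}     # key -> (priority, size, cost)

    def access(self, key, size, cost):
        """Record an access; return True on a hit, False on a miss."""
        if key in self.entries:
            _, size, cost = self.entries[key]
            # A hit restores the priority relative to the current inflation.
            self.entries[key] = (self.inflation + cost / size, size, cost)
            return True
        if size > self.capacity:
            return False  # object can never fit; bypass the cache
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            priority, victim_size, _ = self.entries.pop(victim)
            self.inflation = priority  # future entries start above the victim
            self.used -= victim_size
        self.entries[key] = (self.inflation + cost / size, size, cost)
        self.used += size
        return False


# Hypothetical workload: the costly object is inserted first, so plain LRU
# would evict it, while the cost-aware policy evicts the cheap one instead.
cache = GreedyDualSizeCache(capacity=10)
cache.access("pricey", size=5, cost=10.0)
cache.access("cheap", size=5, cost=1.0)
cache.access("new", size=5, cost=2.0)
print(sorted(cache.entries))
```

    The design choice worth noting is the inflation value: raising it on every eviction ages older entries without touching them, so an expensive object that is never accessed again eventually becomes evictable.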