3,394 research outputs found
Malleable coding for updatable cloud caching
In software-as-a-service applications provisioned through cloud computing, locally cached data are often modified with updates from new versions. In some cases, with each edit, one may want to preserve both the original and new versions. In this paper, we focus on cases in which only the latest version must be preserved. Furthermore, it is desirable for the data to not only be compressed but to also be easily modified during updates, since representing information and modifying the representation both incur cost. We examine whether it is possible to have both compression efficiency and ease of alteration, in order to promote codeword reuse. In other words, we study the feasibility of a malleable and efficient coding scheme. The tradeoff between compression efficiency and malleability cost-the difficulty of synchronizing compressed versions-is measured as the length of a reused prefix portion. The region of achievable rates and malleability is found. Drawing from prior work on common information problems, we show that efficient data compression may not be the best engineering design principle when storing software-as-a-service data. In the general case, goals of efficiency and malleability are fundamentally in conflict.This work was supported in part by an NSF Graduate Research Fellowship (LRV), Grant CCR-0325774, and Grant CCF-0729069. This work was presented at the 2011 IEEE International Symposium on Information Theory [1] and the 2014 IEEE International Conference on Cloud Engineering [2]. The associate editor coordinating the review of this paper and approving it for publication was R. Thobaben. (CCR-0325774 - NSF Graduate Research Fellowship; CCF-0729069 - NSF Graduate Research Fellowship)Accepted manuscrip
Fast and secure laptop backups with encrypted de-duplication
Many people now store large quantities of personal and corporate data on laptops or home computers. These often have poor or intermittent connectivity, and are vulnerable to theft or hardware failure. Conventional backup solutions are not well suited to this environment, and backup regimes are frequently inadequate. This paper describes an algorithm which takes advantage of the data which is common between users to increase the speed of backups, and reduce the storage requirements. This algorithm supports client-end per-user encryption which is necessary for confidential personal data. It also supports a unique feature which allows immediate detection of common subtrees, avoiding the need to query the backup system for every file. We describe a prototype implementation of this algorithm for Apple OS X, and present an analysis of the potential effectiveness, using real data obtained from a set of typical users. Finally, we discuss the use of this prototype in conjunction with remote cloud storage, and present an analysis of the typical cost savings.
Cache Serializability: Reducing Inconsistency in Edge Transactions
Read-only caches are widely used in cloud infrastructures to reduce access
latency and load on backend databases. Operators view coherent caches as
impractical at genuinely large scale and many client-facing caches are updated
in an asynchronous manner with best-effort pipelines. Existing solutions that
support cache consistency are inapplicable to this scenario since they require
a round trip to the database on every cache transaction.
Existing incoherent cache technologies are oblivious to transactional data
access, even if the backend database supports transactions. We propose T-Cache,
a novel caching policy for read-only transactions in which inconsistency is
tolerable (won't cause safety violations) but undesirable (has a cost). T-Cache
improves cache consistency despite asynchronous and unreliable communication
between the cache and the database. We define cache-serializability, a variant
of serializability that is suitable for incoherent caches, and prove that with
unbounded resources T-Cache implements this new specification. With limited
resources, T-Cache allows the system manager to choose a trade-off between
performance and consistency.
Our evaluation shows that T-Cache detects many inconsistencies with only
nominal overhead. We use synthetic workloads to demonstrate the efficacy of
T-Cache when data accesses are clustered and its adaptive reaction to workload
changes. With workloads based on the real-world topologies, T-Cache detects
43-70% of the inconsistencies and increases the rate of consistent transactions
by 33-58%.Comment: Ittay Eyal, Ken Birman, Robbert van Renesse, "Cache Serializability:
Reducing Inconsistency in Edge Transactions," Distributed Computing Systems
(ICDCS), IEEE 35th International Conference on, June~29 2015--July~2 201
ISP-friendly Peer-assisted On-demand Streaming of Long Duration Content in BBC iPlayer
In search of scalable solutions, CDNs are exploring P2P support. However, the
benefits of peer assistance can be limited by various obstacle factors such as
ISP friendliness - requiring peers to be within the same ISP, bitrate
stratification - the need to match peers with others needing similar bitrate,
and partial participation - some peers choosing not to redistribute content.
This work relates potential gains from peer assistance to the average number
of users in a swarm, its capacity, and empirically studies the effects of these
obstacle factors at scale, using a month-long trace of over 2 million users in
London accessing BBC shows online. Results indicate that even when P2P swarms
are localised within ISPs, up to 88% of traffic can be saved. Surprisingly,
bitrate stratification results in 2 large sub-swarms and does not significantly
affect savings. However, partial participation, and the need for a minimum
swarm size do affect gains. We investigate improvements to gain from increasing
content availability through two well-studied techniques: content bundling -
combining multiple items to increase availability, and historical caching of
previously watched items. Bundling proves ineffective as increased server
traffic from larger bundles outweighs benefits of availability, but simple
caching can considerably boost traffic gains from peer assistance.Comment: In Proceedings of IEEE INFOCOM 201
On I/O Performance and Cost Efficiency of Cloud Storage: A Client\u27s Perspective
Cloud storage has gained increasing popularity in the past few years. In cloud storage, data are stored in the service provider’s data centers; users access data via the network and pay the fees based on the service usage. For such a new storage model, our prior wisdom and optimization schemes on conventional storage may not remain valid nor applicable to the emerging cloud storage.
In this dissertation, we focus on understanding and optimizing the I/O performance and cost efficiency of cloud storage from a client’s perspective. We first conduct a comprehensive study to gain insight into the I/O performance behaviors of cloud storage from the client side. Through extensive experiments, we have obtained several critical findings and useful implications for system optimization. We then design a client cache framework, called Pacaca, to further improve end-to-end performance of cloud storage. Pacaca seamlessly integrates parallelized prefetching and cost-aware caching by utilizing the parallelism potential and object correlations of cloud storage. In addition to improving system performance, we have also made efforts to reduce the monetary cost of using cloud storage services by proposing a latency- and cost-aware client caching scheme, called GDS-LC, which can achieve two optimization goals for using cloud storage services: low access latency and low monetary cost. Our experimental results show that our proposed client-side solutions significantly outperform traditional methods. Our study contributes to inspiring the community to reconsider system optimization methods in the cloud environment, especially for the purpose of integrating cloud storage into the current storage stack as a primary storage layer
- …