
    Taming tail latency for erasure-coded, distributed storage systems

    In distributed storage systems, long tails of response time are of particular concern: measurements from large providers such as Bing, Facebook, and Amazon Web Services show 99.9th-percentile response times that are orders of magnitude worse than the mean. Because it maintains high data reliability while ensuring space efficiency, erasure coding has become a popular storage method in distributed storage systems. However, due to the lack of mathematical models for analyzing erasure-coded distributed storage systems, taming tail latency is still an open problem. In this research, we quantify tail latency in such systems by deriving closed-form upper bounds on tail latency for general service-time distributions and heterogeneous files. We then specialize the service time to a shifted exponential distribution. Based on this model, we formulate an optimization problem that minimizes the weighted tail-latency probability of retrieving all files, and we propose an alternating minimization algorithm to solve it. Our simulation results show a significant reduction in the tail latency of erasure-coded distributed storage systems under realistic workloads.
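    For intuition, the sketch below computes the exact tail probability of a single coded read under simplifying assumptions that the paper does not make: an (n, k) MDS code, a read that completes once the fastest k of n servers respond, and i.i.d. shifted exponential service times with parameters `shift` and `rate`. It is a minimal illustration of why shifted exponential service times make tail analysis tractable, not the paper's general bound or its alternating minimization algorithm; all names and parameters are invented for the example.

```python
from math import comb, exp

def read_tail_probability(n, k, t, shift, rate):
    """Tail probability P(read latency > t) for an (n, k) MDS-coded read that
    completes once the fastest k of n servers respond, with i.i.d. shifted
    exponential service times (deterministic shift plus an Exp(rate) part)."""
    if t <= shift:
        return 1.0                          # no server can finish before the shift
    f = 1.0 - exp(-rate * (t - shift))      # per-server completion probability by t
    # The read misses the deadline exactly when fewer than k servers have
    # finished by time t; the number of completions by t is Binomial(n, f).
    return sum(comb(n, i) * (f ** i) * ((1.0 - f) ** (n - i)) for i in range(k))

# Example: a (9, 6) code, 2 ms fixed overhead, mean exponential part 1/0.2 = 5 ms
print(read_tail_probability(n=9, k=6, t=20.0, shift=2.0, rate=0.2))
```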

    TOFEC: Achieving Optimal Throughput-Delay Trade-off of Cloud Storage Using Erasure Codes

    Our paper presents solutions that combine erasure coding, parallel connections to cloud storage, and limited chunking (i.e., dividing an object into a few smaller segments) to significantly improve the delay performance of uploading and downloading data to and from cloud storage. TOFEC is a strategy that helps a front-end proxy adapt to the level of workload by treating scalable cloud storage (e.g., Amazon S3) as a shared resource requiring admission control. Under light workloads, TOFEC creates more, smaller chunks and uses more parallel connections per file, minimizing service delay. Under heavy workloads, TOFEC automatically reduces the level of chunking (fewer chunks of larger size) and uses fewer parallel connections to reduce overhead, resulting in higher throughput and avoiding queueing delay. Our trace-driven simulation results show that TOFEC's adaptation mechanism converges to an appropriate code that provides the optimal delay-throughput trade-off without reducing system capacity. Compared to a non-adaptive strategy optimized for throughput, TOFEC delivers 2.5x lower latency under light workloads; compared to a non-adaptive strategy optimized for latency, TOFEC can scale to support over 3x as many requests.
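    The toy sketch below illustrates the kind of backlog-driven adaptation described above: more chunks and connections when the proxy's queue is short, fewer when it is long. The function name `choose_code`, the thresholds, and the menu of (n, k) codes are all invented for illustration and do not reproduce TOFEC's actual adaptation algorithm.

```python
def choose_code(queue_length, options, light_threshold=4, heavy_threshold=16):
    """Pick a chunking level / erasure code from `options`, sorted from most
    aggressive chunking (many small chunks, many parallel connections, lowest
    delay) to least (few large chunks, least overhead, highest throughput),
    based on the proxy's current backlog."""
    if queue_length <= light_threshold:
        return options[0]                # light load: chase low latency
    if queue_length >= heavy_threshold:
        return options[-1]               # heavy load: protect throughput
    # In between, step through the available codes proportionally to backlog.
    span = heavy_threshold - light_threshold
    idx = (queue_length - light_threshold) * (len(options) - 1) // span
    return options[idx]

# Illustrative menu of (total chunks n, chunks needed to reconstruct k).
codes = [(12, 8), (6, 4), (3, 2), (1, 1)]
print(choose_code(queue_length=2, options=codes))    # light load -> (12, 8)
print(choose_code(queue_length=20, options=codes))   # heavy load -> (1, 1)
```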

    Fast Lean Erasure-Coded Atomic Memory Object

    In this work, we propose FLECKS, an algorithm that implements atomic memory objects in a multi-writer multi-reader (MWMR) setting under asynchronous networks and server failures. FLECKS substantially reduces storage and communication costs over its replication-based counterparts by employing erasure codes. FLECKS outperforms previously proposed algorithms on the metrics needed for good performance, such as storage cost per object, communication cost, fault tolerance of clients and servers, guaranteed liveness of operations, and the number of communication rounds per operation. We provide proofs of the liveness and atomicity properties of FLECKS and derive worst-case latency bounds for its operations. We implemented and deployed FLECKS in cloud-based clusters and demonstrate that it has substantially lower storage and bandwidth costs, and significantly lower operation latency, than replication-based mechanisms.
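    For background, here is a simplified sketch of the general pattern that tag-based, quorum-replicated, erasure-coded atomic registers build on in the MWMR setting: a write first queries a quorum for the highest tag, then pushes coded fragments carrying a larger tag. The classes, the placeholder `encode` function, and the majority-quorum rule are assumptions for illustration only and do not reproduce the FLECKS protocol.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Tag:
    """Logical timestamp ordering writes: (sequence number, writer id)."""
    seq: int
    writer: str

@dataclass
class Server:
    """In-memory stand-in for a storage server holding one coded fragment."""
    tag: Tag = field(default_factory=lambda: Tag(0, ""))
    fragment: bytes = b""

def encode(value: bytes, n: int, k: int):
    """Placeholder 'erasure code': split the value into k pieces and pad with
    n - k dummy parity slots.  A real deployment would use an MDS code such
    as Reed-Solomon here."""
    step = max(1, (len(value) + k - 1) // k)
    pieces = [value[i * step:(i + 1) * step] for i in range(k)]
    return pieces + [b"<parity>"] * (n - k)

def write(servers, writer_id, value, k):
    """Two-phase, tag-based write in the style of quorum-based erasure-coded
    registers (simplified; not the FLECKS protocol itself)."""
    n = len(servers)
    quorum = n // 2 + 1
    # Phase 1: read tags from a quorum and pick the next, strictly larger tag.
    max_tag = max(s.tag for s in servers[:quorum])
    new_tag = Tag(max_tag.seq + 1, writer_id)
    # Phase 2: push coded fragments labeled with the new tag to a quorum.
    fragments = encode(value, n, k)
    for server, frag in list(zip(servers, fragments))[:quorum]:
        if new_tag > server.tag:
            server.tag, server.fragment = new_tag, frag

servers = [Server() for _ in range(5)]
write(servers, "w1", b"hello erasure-coded world", k=3)
print([(s.tag, s.fragment) for s in servers])
```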