57 research outputs found

    Taming tail latency for erasure-coded, distributed storage systems

    In distributed storage systems, long tails of response time are of particular concern. Large services such as Bing, Facebook and Amazon Web Services report 99.9th percentile response times that are orders of magnitude worse than the mean. Because it maintains high data reliability while remaining space efficient, erasure coding has become a popular storage method in distributed storage systems. However, due to the lack of mathematical models for analyzing erasure-coded distributed storage systems, taming tail latency is still an open problem. In this research, we quantify tail latency in such systems by deriving closed-form upper bounds on tail latency for general service time distributions and heterogeneous files. We then specialize the service time to a shifted exponential distribution. Based on this model, we formulate an optimization problem that minimizes the weighted tail latency probability of all files, and we propose an alternating minimization algorithm to solve it. Our simulation results show a significant reduction in the tail latency of erasure-coded distributed storage systems under a realistic workload.
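As a rough illustration of the shifted exponential service time mentioned in this abstract, the sketch below computes its tail probability and percentiles. It covers only the single-server service time distribution, not the paper's upper bound for the whole system; the parameter names `alpha` (rate) and `beta` (shift) are assumptions chosen for this example.

```python
import math


def shifted_exp_tail(x, alpha, beta):
    """P(X > x) for X = beta + Exp(alpha): a fixed delay beta plus an
    exponentially distributed part with rate alpha (names are assumed)."""
    if x <= beta:
        return 1.0
    return math.exp(-alpha * (x - beta))


def shifted_exp_quantile(p, alpha, beta):
    """Value x such that P(X <= x) = p, for 0 <= p < 1."""
    return beta - math.log(1.0 - p) / alpha


def shifted_exp_mean(alpha, beta):
    """Mean of the shifted exponential: the shift plus 1 / rate."""
    return beta + 1.0 / alpha
```

With `alpha = 2.0` and `beta = 1.0`, the mean is 1.5 while the 99.9th percentile is about 4.45, which shows how far the tail can sit from the mean even for one well-behaved server.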

    TailX: Scheduling Heterogeneous Multiget Queries to Improve Tail Latencies in Key-Value Stores

    Users of interactive services such as e-commerce platforms have high expectations for the performance and responsiveness of these services. Tail latency, denoting the worst service times, contributes greatly to user dissatisfaction and should be minimized. Maintaining low tail latency for interactive services is challenging because a request is not complete until all its operations are completed. The challenge is to identify bottleneck operations and schedule them on uncoordinated backend servers with minimal overhead, when the durations of these operations are heterogeneous and unpredictable. In this paper, we focus on improving the latency of multiget operations in cloud data stores. We present TailX, a task-aware multiget scheduling algorithm that improves tail latencies under heterogeneous workloads. TailX schedules operations according to an estimate of the size of the corresponding data, and allows itself to procrastinate some operations to give way to higher-priority ones. We implement TailX in Cassandra, a widely used key-value store. The result is improved overall performance of cloud data stores for a wide variety of heterogeneous workloads. Specifically, our experiments under heterogeneous YCSB workloads show that TailX outperforms state-of-the-art solutions and reduces tail latencies by up to 70% and median latencies by up to 75%.
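The observation that a request is not complete until all its operations are completed can be made concrete with a small sketch. It is not TailX's scheduling algorithm. It only shows why fan-out amplifies tail latency, under the assumption (made for this example) that each operation is slow independently with the same probability; the function names are hypothetical.

```python
def multiget_latency(op_latencies):
    """A multiget finishes only when its slowest operation finishes."""
    return max(op_latencies)


def prob_request_hits_slow_op(p_slow, n_ops):
    """Probability that at least one of n_ops operations is slow,
    assuming each is slow independently with probability p_slow."""
    return 1.0 - (1.0 - p_slow) ** n_ops
```

For instance, if 1% of operations are slow and a multiget fans out to 100 operations, roughly 63% of requests include at least one slow operation under this independence assumption.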