
3 research outputs found

Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems (CMU-PDL-07-105)

Authors: Amar Phanishayee (5426672)
David G. Andersen (5356964)
Elie Krevat (5426669)
Garth A. Gibson (5419415)
Gregory R. Ganger (5409485)
Srinivasan Seshan (5416934)
Vijay Vasudevan (5416592)
Publication date: 30/06/2018

Cluster-based and iSCSI-based storage systems rely on standard TCP/IP-over-Ethernet for client access to data. Unfortunately, when data is striped over multiple networked storage nodes, a client can experience a TCP throughput collapse that results in much lower read bandwidth than should be provided by the available network links. Conceptually, this problem arises because the client simultaneously reads fragments of a data block from multiple sources that together send enough data to overload the switch buffers on the client’s link. This paper analyzes this Incast problem, explores its sensitivity to various system parameters, and examines the effectiveness of alternative TCP- and Ethernet-level strategies in mitigating the TCP throughput collapse.
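The mechanism in this abstract can be illustrated with a deliberately crude back-of-the-envelope model. This is a minimal sketch, not the paper's analysis: it assumes each server bursts a fixed `window_bytes` at once, that any burst larger than the switch port buffer causes a loss repaired only by a retransmission timeout, and that the client cannot request the next block until every fragment of the current one arrives. All function names and parameter values are hypothetical illustrations.

```python
# Toy model of incast for one synchronized, striped block read.
# Every parameter value used with it is a hypothetical illustration.


def block_read_time_s(n_servers: int, fragment_bytes: int,
                      window_bytes: int, buffer_bytes: int,
                      link_bps: float, rto_s: float) -> float:
    """Seconds to receive one block striped as n_servers fragments.

    Assumption: each server bursts up to window_bytes at once; if the
    combined burst exceeds the switch port buffer, some fragment loses
    packets that only a retransmission timeout repairs, so the whole
    block (and the client's next request) waits rto_s.
    """
    wire_s = n_servers * fragment_bytes * 8 / link_bps
    overflow = n_servers * window_bytes > buffer_bytes
    return wire_s + (rto_s if overflow else 0.0)


def goodput_mbps(n_servers: int, fragment_bytes: int, window_bytes: int,
                 buffer_bytes: int, link_bps: float, rto_s: float) -> float:
    """Application-level goodput in Mbit/s for back-to-back block reads."""
    t = block_read_time_s(n_servers, fragment_bytes, window_bytes,
                          buffer_bytes, link_bps, rto_s)
    return n_servers * fragment_bytes * 8 / t / 1e6


if __name__ == "__main__":
    # Hypothetical setup: 1 Gbit/s link, 256 KiB fragments, 16 KiB bursts,
    # 64 KiB port buffer, 200 ms retransmission timeout.
    for n in (1, 2, 4, 8, 16):
        print(n, round(goodput_mbps(n, 256 * 1024, 16 * 1024,
                                    64 * 1024, 1e9, 0.2), 1))
```

With these made-up numbers, goodput stays at line rate while the combined burst fits in the buffer and falls by more than an order of magnitude once it does not, because a single timeout idles the link for far longer than the block takes to transmit.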

A (In)Cast of Thousands: Scaling Datacenter TCP to Kiloservers and Gigabits (CMU-PDL-09-101)

Authors: Amar Phanishayee (5426672)
David G. Andersen (3360659)
Elie Krevat (5426669)
Garth A. Gibson (5419415)
Gregory R. Ganger (5409485)
Hiral Shah (5426675)
Vijay Vasudevan (5416592)
Publication date: 30/06/2018

This paper presents a practical solution to the problem of high-fan-in, high-bandwidth synchronized TCP workloads in datacenter Ethernets: the Incast problem. In these networks, receivers often experience a drastic reduction in throughput when simultaneously requesting data from many servers using TCP. Inbound data overfills small switch buffers, leading to TCP timeouts lasting hundreds of milliseconds. For many datacenter workloads that have a synchronization requirement (e.g., filesystem reads and parallel data-intensive queries), incast can reduce throughput by up to 90%. Our solution for incast uses high-resolution timers in TCP to allow for microsecond-granularity timeouts. We show that this technique is effective in avoiding incast using simulation and real-world experiments. Last, we show that eliminating the minimum retransmission timeout bound is safe for all environments, including the wide area.
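Why the minimum timeout bound matters can be seen from the standard retransmission timeout estimator. The sketch below follows the RFC 6298 update rules (SRTT, RTTVAR, RTO = SRTT + K·RTTVAR) but is not the paper's kernel implementation: it ignores clock granularity (the RFC's G term), i.e. it assumes the fine-grained timers the abstract describes, and exposes the lower bound as a hypothetical `rto_min_us` parameter. RFC 6298 recommends a 1 s floor and Linux uses 200 ms; the RTT samples in the usage block are hypothetical datacenter values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RtoEstimator:
    """RFC 6298-style RTO estimator with a configurable lower bound.

    All times are in microseconds. Clock granularity (the RFC's G term)
    is ignored, i.e. a fine-grained timer is assumed.
    """
    rto_min_us: float
    alpha: float = 1 / 8   # gain for the smoothed RTT
    beta: float = 1 / 4    # gain for the RTT variation
    k: float = 4.0         # variance multiplier
    srtt_us: Optional[float] = None
    rttvar_us: Optional[float] = None

    def rto_us(self) -> float:
        """Current timeout, clamped below by rto_min_us."""
        if self.srtt_us is None or self.rttvar_us is None:
            # Before the first sample RFC 6298 uses an initial RTO of 1 s.
            return max(self.rto_min_us, 1_000_000.0)
        return max(self.rto_min_us, self.srtt_us + self.k * self.rttvar_us)

    def on_rtt_sample(self, r_us: float) -> float:
        """Fold in one RTT measurement and return the updated RTO."""
        if self.srtt_us is None or self.rttvar_us is None:
            self.srtt_us = r_us
            self.rttvar_us = r_us / 2
        else:
            # RTTVAR is updated with the old SRTT, as the RFC specifies.
            self.rttvar_us = ((1 - self.beta) * self.rttvar_us
                              + self.beta * abs(self.srtt_us - r_us))
            self.srtt_us = (1 - self.alpha) * self.srtt_us + self.alpha * r_us
        return self.rto_us()


if __name__ == "__main__":
    coarse = RtoEstimator(rto_min_us=200_000.0)  # 200 ms floor
    fine = RtoEstimator(rto_min_us=0.0)          # floor removed
    for sample in (100.0, 120.0, 90.0, 110.0):   # hypothetical RTTs, in us
        print(coarse.on_rtt_sample(sample), fine.on_rtt_sample(sample))
```

With RTTs around 100 µs, the estimator alone would time out after a few hundred microseconds, but a 200 ms floor overrides it entirely; a lost fragment then stalls the read for roughly a thousand RTTs, which is the gap the abstract's microsecond-granularity timeouts aim to close.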

Tashi: Location-Aware Cluster Management

Authors: David O’Hallaron (5427068)
Elie Krevat (5426669)
Gregory R. Ganger (5409485)
James Cipar (5363999)
Julio Lopez (5357006)
Michael Stroucken (5356982)
Michael A. Kozuch (5358098)
Michael P. Ryan (3692182)
Richard Glass (5427071)
Steven W. Schlosser (5416214)
Publication date: 30/06/2018

Big Data applications, those that require large data corpora either for correctness or for fidelity, are becoming increasingly prevalent. Tashi is a cluster management system designed particularly for enabling cloud computing applications to operate on repositories of Big Data. These applications are extremely scalable but also have very high resource demands. A key technique for making such applications perform well is location awareness. This paper demonstrates that location-aware applications can outperform those that are not location aware by factors of 3 to 11 and describes two general services developed for Tashi to provide location awareness independently of the storage system.
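The general idea of location awareness can be sketched as a placement rule: run the task that reads a block on a machine that already stores a copy of it. The abstract does not describe Tashi's two services in detail, so this is only an illustration under stated assumptions: the scheduler can query a block-to-replica-hosts map (the hypothetical `replica_hosts`), each host runs one task at a time, and tasks are placed greedily. `place_task` and `schedule` are hypothetical names, not Tashi APIs.

```python
from typing import Dict, List, Optional, Set, Tuple


def place_task(block_id: str, replica_hosts: Dict[str, List[str]],
               free_hosts: Set[str]) -> Optional[str]:
    """Pick a host for the task that reads block_id.

    Prefer a free host that already stores a replica (local read);
    otherwise fall back to the lexically smallest free host (remote read).
    """
    for host in replica_hosts.get(block_id, []):
        if host in free_hosts:
            return host
    return min(free_hosts) if free_hosts else None


def schedule(blocks: List[str], replica_hosts: Dict[str, List[str]],
             hosts: List[str]) -> Tuple[Dict[str, str], int]:
    """Greedily assign one task per block, one task per host.

    Returns the block -> host plan and how many placements are local.
    """
    free = set(hosts)
    plan: Dict[str, str] = {}
    local = 0
    for block in blocks:
        host = place_task(block, replica_hosts, free)
        if host is None:
            break  # no capacity left; remaining blocks stay unscheduled
        plan[block] = host
        free.discard(host)
        if host in replica_hosts.get(block, []):
            local += 1
    return plan, local


if __name__ == "__main__":
    # Hypothetical replica locations for three blocks on a 3-node cluster.
    replicas = {"b1": ["n2", "n3"], "b2": ["n1"], "b3": ["n9"]}
    print(schedule(["b1", "b2", "b3"], replicas, ["n1", "n2", "n3"]))
```

Here two of the three tasks read locally; the third must fetch its block over the network because its only replica sits on a node outside the pool. A location-unaware scheduler would pick hosts without consulting `replica_hosts` at all, turning most reads into remote ones.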