    ATP: a Datacenter Approximate Transmission Protocol

    Many datacenter applications, such as machine learning and streaming systems, do not need the complete set of data to perform their computation. Current approximate applications in datacenters run on a reliable network layer such as TCP. To improve performance, they either let the sender select a subset of the data and transmit it to the receiver, or transmit all the data and let the receiver drop some of it. These approaches are network-oblivious and transmit more data than necessary, which hurts both application runtime and network bandwidth usage. On the other hand, running approximate applications over a lossy transport such as UDP cannot guarantee the accuracy of the computation. We propose to run approximate applications on a lossy network and to allow packet loss in a controlled manner. Specifically, we design a new network protocol, the Approximate Transmission Protocol (ATP), for datacenter approximate applications. ATP opportunistically exploits as much available network bandwidth as possible while using a loss-based rate control algorithm to avoid bandwidth waste and retransmission. It also ensures fair bandwidth sharing across flows and improves the performance of accurate applications by leaving more switch buffer space to accurate flows. We evaluated ATP in both simulation and a real implementation, using two macro-benchmarks and two real applications, Apache Kafka and Flink. Our results show that ATP reduces application runtime by 13.9% to 74.6% compared to a TCP-based solution that drops packets at the sender, and improves accuracy by up to 94.0% compared to UDP.
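    The abstract names a loss-based rate control algorithm but does not specify it. The sketch below is one way such a controller could look, assuming an AIMD-style rule; the function name, loss target, and rate constants are illustrative assumptions, not the paper's algorithm.

        # Illustrative loss-based rate controller in the spirit of ATP's description.
        # The AIMD rule, loss target, and constants are assumptions for illustration;
        # the abstract does not specify the actual algorithm.
        def update_rate(rate_mbps, pkts_sent, pkts_lost,
                        loss_target=0.01, incr_mbps=10.0, decay=0.7,
                        min_rate=1.0, max_rate=10_000.0):
            """Return a new sending rate given the last interval's send/loss counts."""
            loss_frac = pkts_lost / pkts_sent if pkts_sent else 0.0
            if loss_frac <= loss_target:
                rate_mbps += incr_mbps   # probe for spare bandwidth (additive increase)
            else:
                rate_mbps *= decay       # back off to avoid wasting bandwidth (multiplicative decrease)
            return max(min_rate, min(rate_mbps, max_rate))

    Under this kind of rule, lost packets are only counted, never retransmitted, which matches the abstract's goal of trading a controlled amount of loss for lower runtime.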

    Energy-aware dynamic virtual machine consolidation for cloud datacenters

    RepFlow: Minimizing Flow Completion Times with Replicated Flows in Data Centers

    Short TCP flows that are critical for many interactive applications in data centers are plagued by large flows and head-of-line blocking in switches. Hash-based load balancing schemes such as ECMP aggravate the matter and result in long-tailed flow completion times (FCT). Previous work on reducing FCT usually requires custom switch hardware and/or protocol changes. We propose RepFlow, a simple yet practically effective approach that replicates each short flow to reduce completion times, without any change to switches or host kernels. With ECMP, the original and replicated flows traverse distinct paths with different congestion levels, thereby reducing the probability of long queueing delays. We develop a simple analytical model to demonstrate the potential improvement of RepFlow. Extensive NS-3 simulations and a Mininet implementation show that RepFlow provides a 50%--70% speedup in both mean and 99th-percentile FCT for all loads, and offers near-optimal FCT when used with DCTCP.
    Comment: To appear in IEEE INFOCOM 201
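    Because replication needs no switch or kernel change, it can be done entirely at the application layer: open two TCP connections (whose distinct ephemeral source ports make ECMP likely to hash them onto different paths), send the same payload on both, and finish when either copy completes. The sketch below is a minimal illustration of that idea; the function names, the one-byte application-level acknowledgment, and the two-copy default are assumptions, not the paper's implementation.

        # Minimal sketch of application-level flow replication (not the paper's code).
        import socket
        import threading

        def _send_copy(dest, port, payload, done):
            # Each connection gets its own ephemeral source port, so ECMP is
            # likely to hash the two copies onto different network paths.
            try:
                with socket.create_connection((dest, port)) as s:
                    s.sendall(payload)
                    s.shutdown(socket.SHUT_WR)
                    s.recv(1)          # hypothetical 1-byte application-level ACK
                done.set()             # the first copy to finish completes the flow
            except OSError:
                pass                   # the other copy may still succeed

        def rep_send(dest, port, payload, copies=2, timeout=5.0):
            done = threading.Event()
            for _ in range(copies):
                threading.Thread(target=_send_copy,
                                 args=(dest, port, payload, done),
                                 daemon=True).start()
            return done.wait(timeout)  # True once any copy has been delivered

    The intuition behind the speedup: if the two paths see roughly independent queueing delays, the flow experiences the minimum of the two, so the probability of hitting a long queue falls from p to roughly p squared.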

    Towards Hybrid Cloud-assisted Crowdsourced Live Streaming: Measurement and Analysis

    Crowdsourced Live Streaming (CLS), most notably Twitch.tv, has seen explosive growth in popularity in the past few years. In such systems, any user can broadcast live video content of interest to others, e.g., from a game player to many online viewers. To meet the demands of both massive and heterogeneous broadcasters and viewers, expensive server clusters have been deployed to provide video ingesting and transcoding services. Despite the existence of highly popular channels, a significant portion of the channels is in fact unpopular. Yet, as our measurement shows, these broadcasters consume considerable system resources; in particular, 25% of the bandwidth and 30% of the computation resources are used by broadcasters who have no viewers at all. In this paper, we closely examine the challenge of handling unpopular live-broadcasting channels in CLS systems and present a comprehensive solution for service partitioning on a hybrid cloud. A trace-driven evaluation shows that our hybrid cloud-assisted design can smartly assign ingesting and transcoding tasks to elastic cloud virtual machines, providing flexible and cost-effective system deployment.
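    The abstract does not give the actual assignment policy. As a rough illustration of the kind of service partitioning it describes, the sketch below splits channels between dedicated servers and elastic cloud VMs by a simple viewer-count threshold; the Channel fields, the partition function, and the threshold value are hypothetical.

        # Rough illustration of popularity-based partitioning for a hybrid CLS
        # deployment; the data model and threshold are assumptions, not the paper's policy.
        from dataclasses import dataclass

        @dataclass
        class Channel:
            name: str
            viewers: int      # current concurrent viewers

        def partition(channels, popular_threshold=10):
            """Keep popular channels on dedicated ingest/transcode servers and
            pack unpopular ones (including zero-viewer broadcasters) onto
            elastic cloud VMs that can be scaled down when idle."""
            dedicated = [c for c in channels if c.viewers >= popular_threshold]
            cloud = [c for c in channels if c.viewers < popular_threshold]
            return dedicated, cloud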