ATP: a Datacenter Approximate Transmission Protocol

Authors: Liu Ke; Tsai Shin-Yeh; Zhang Yiying
Publication date: 06/01/2019

Many datacenter applications, such as machine learning and streaming systems, do not need the complete set of data to perform their computation. Current approximate applications in datacenters run on a reliable network layer like TCP. To improve performance, they either let the sender select a subset of the data and transmit it to the receiver, or transmit all the data and let the receiver drop some of it. These approaches are network oblivious and transmit more data than necessary, affecting both application runtime and network bandwidth usage. On the other hand, running approximate applications on a lossy network with UDP cannot guarantee the accuracy of application computation. We propose to run approximate applications on a lossy network and to allow packet loss in a controlled manner. Specifically, we designed a new network protocol called Approximate Transmission Protocol, or ATP, for datacenter approximate applications. ATP opportunistically exploits available network bandwidth as much as possible, while performing a loss-based rate control algorithm to avoid bandwidth waste and re-transmission. It also ensures fair bandwidth sharing across flows and improves accurate applications' performance by leaving more switch buffer space to accurate flows. We evaluated ATP with both simulation and real implementation using two macro-benchmarks and two real applications, Apache Kafka and Flink. Our evaluation results show that ATP reduces application runtime by 13.9% to 74.6% compared to a TCP-based solution that drops packets at the sender, and it improves accuracy by up to 94.0% compared to UDP.
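
The abstract does not spell out ATP's rate control algorithm, so the following is only an illustrative sketch of the general idea of loss-based rate control with a tolerated loss level. It swaps in a plain additive-increase/multiplicative-decrease rule, which is an assumption and not ATP's actual method; the function name `next_rate` and every parameter (`target_loss`, `step`, `backoff`, `min_rate`, `max_rate`) are hypothetical.

```python
def next_rate(rate, loss_fraction, target_loss=0.05, step=1.0,
              backoff=0.5, min_rate=1.0, max_rate=100.0):
    """Return the sending rate for the next interval.

    Hypothetical AIMD-style rule (not taken from the ATP paper):
    - if the measured loss exceeds the loss the application tolerates,
      cut the rate multiplicatively to stop wasting bandwidth;
    - otherwise probe for spare bandwidth by increasing additively.
    """
    if loss_fraction > target_loss:
        rate = rate * backoff
    else:
        rate = rate + step
    # Keep the rate inside the configured bounds.
    return max(min_rate, min(max_rate, rate))


if __name__ == "__main__":
    rate = 10.0
    # Made-up per-interval loss measurements, for illustration only.
    for loss in [0.0, 0.01, 0.20, 0.02]:
        rate = next_rate(rate, loss)
        print(f"loss={loss:.2f} -> rate={rate:.1f}")
```

In this sketch the sender keeps increasing its rate while losses stay at or below the tolerated level, and halves it once they exceed that level, which is one simple way to "allow packet loss in a controlled manner".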

arXiv.org e-Print Archive

PrismDB: Read-aware Log-structured Merge Trees for Heterogeneous Storage

Authors: Cidon Asaf; Freedman Michael J.; Jamieson Kyle; Raina Ashwini
Publication date: 24/09/2020

In recent years, emerging hardware storage technologies have focused on divergent goals: better performance or lower cost-per-bit of storage. Correspondingly, data systems that employ these new technologies are optimized either to be fast (but expensive) or cheap (but slow). We take a different approach: by combining multiple tiers of fast and low-cost storage technologies within the same system, we can achieve a Pareto-efficient balance between performance and cost-per-bit. This paper presents the design and implementation of PrismDB, a novel log-structured merge tree based key-value store that exploits a full spectrum of heterogeneous storage technologies (from 3D XPoint to QLC NAND). We introduce the notion of "read-awareness" to log-structured merge trees, which allows hot objects to be pinned to faster storage, achieving better tiering and hot-cold separation of objects. Compared to the standard use of RocksDB on flash in datacenters today, PrismDB's average throughput on heterogeneous storage is 2.3× faster and its tail latency is more than an order of magnitude better, using hardware that is half the cost.
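
As a rough illustration of the "read-awareness" idea (pinning frequently read objects to the faster tier), here is a minimal sketch. It is not PrismDB's implementation: the class `ReadAwarePlacement`, the tier labels `"fast"` and `"slow"`, and the top-k-by-read-count policy are all assumptions made for this example.

```python
from collections import Counter


class ReadAwarePlacement:
    """Toy hot/cold placement: the most-read keys go to the fast tier.

    Hypothetical policy for illustration; PrismDB's real mechanism is
    described in the paper, not in this abstract.
    """

    def __init__(self, fast_capacity):
        # Number of objects the fast tier can hold (assumed unit: objects).
        self.fast_capacity = fast_capacity
        self.reads = Counter()

    def record_read(self, key):
        # Count every read so that hot keys can be identified later.
        self.reads[key] += 1

    def tier_for(self, key):
        # Keys among the `fast_capacity` most-read ones are treated as hot.
        hot = {k for k, _ in self.reads.most_common(self.fast_capacity)}
        return "fast" if key in hot else "slow"


if __name__ == "__main__":
    placement = ReadAwarePlacement(fast_capacity=2)
    for key in ["a", "a", "a", "b", "c", "c"]:
        placement.record_read(key)
    for key in ["a", "b", "c"]:
        print(key, placement.tier_for(key))
```

A real key-value store would apply such a decision during compaction rather than per lookup; the sketch only shows how read counts can drive hot-cold separation.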

arXiv.org e-Print Archive

Datacenter Traffic Control: Understanding Techniques and Trade-offs

Authors: Noormohammadpour Mohammad; Raghavendra Cauligi S.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 10/12/2017

Datacenters provide cost-effective and flexible access to scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and maximize performance, it is necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements. This includes user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, general traffic control challenges in datacenters and general traffic control objectives. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters and not to survey all existing solutions (as it is virtually impossible due to the massive body of existing research). We hope to provide readers with a wide range of options and factors while considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks that connect geographically dispersed datacenters, which have been receiving increasing attention recently and pose interesting and novel research problems.

Comment: Accepted for publication in IEEE Communications Surveys and Tutorials.

arXiv.org e-Print Archive

ZENODO

FigShare

Control What You Include! Server-Side Protection against Third Party Web Tracking

Authors: Bielova Nataliia; Rezk Tamara; Somé Dolière Francis
Publication date: 22/03/2017

Third party tracking is the practice by which third parties recognize users across different websites as they browse the web. Recent studies show that 90% of websites contain third party content that tracks their users across the web. Website developers often need to include third party content in order to provide basic functionality. However, when a developer includes third party content, she cannot know whether it contains tracking mechanisms. If a website developer wants to protect her users from being tracked, the only solution is to exclude any third party content, thus trading functionality for privacy. We describe and implement a privacy-preserving web architecture that gives website developers control over third party tracking: developers are able to include functionally useful third party content while at the same time ensuring that the end users are not tracked by the third parties.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server