16 research outputs found

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Datacenters provide cost-effective and flexible access to the scalable compute and storage resources that today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and to maximize performance, it is necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements, including user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, and general traffic control challenges and objectives in datacenters. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters, not to survey all existing solutions (which is virtually impossible given the massive body of existing research). We hope to provide readers with a wide range of options and factors to consider when evaluating traffic control mechanisms. We discuss various characteristics of datacenter traffic control, including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of the paper, we briefly review inter-datacenter networks, which connect geographically dispersed datacenters, have been receiving increasing attention recently, and pose interesting and novel research problems.
    Comment: Accepted for publication in IEEE Communications Surveys and Tutorials
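Among the mechanisms the survey lists, traffic shaping is one of the most self-contained; a minimal token-bucket shaper can sketch the idea. The class name, rates, and the `allow` interface below are illustrative assumptions, not anything from the paper:

```python
class TokenBucket:
    """Minimal token-bucket shaper: a packet may be sent only if enough
    tokens (one per byte) have accumulated at the configured fill rate."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s  # token fill rate
        self.capacity = burst_bytes   # maximum burst size
        self.tokens = burst_bytes     # start with a full bucket
        self.last = 0.0               # time of last refill

    def allow(self, packet_bytes, now):
        # Refill for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True   # conforming: send now
        return False      # non-conforming: delay or drop

bucket = TokenBucket(rate_bytes_per_s=1_000_000, burst_bytes=1500)
print(bucket.allow(1500, now=0.0))    # True: burst allowance covers one MTU packet
print(bucket.allow(1500, now=0.0))    # False: bucket is now empty
print(bucket.allow(1500, now=0.002))  # True: 2 ms at 1 MB/s refills 1500 tokens
```

The burst parameter is what lets interactive traffic send a short burst immediately while long-running traffic is held to the average rate.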

    Saba: Rethinking Datacenter Network Allocation from Application's Perspective


    An adaptable workload-agnostic flow scheduling mechanism for Data Center Networks

    Cloud applications are an important phenomenon in the modern use of the Internet. Search engines, social networks, content delivery, and retail and e-commerce sites belong to this group of applications. These applications run in specialized facilities called data centers. An important element in the architecture of a data center is its communication infrastructure, commonly known as the data center network (DCN). One of the challenges a DCN has to address is satisfying the service requirements of applications, expressed in terms of high responsiveness and high performance. To address this challenge, the traffic associated with these applications needs special handling, since its properties make it essentially different from the traffic of other Internet applications such as mail or multimedia services. To contribute to these performance goals, the DCN should be able to prioritize short flows (a few KB) over long flows (several MB). However, given the variations the traffic presents over time and space, information about flow sizes is not available in advance for planning flow scheduling. In this thesis we present AWAFS, an adaptable workload-agnostic flow scheduling mechanism. It is an adaptable approach capable of agnostically adjusting the scheduling configuration within DCN switches. This agnostic adjustment helps reduce the Flow Completion Time (FCT) of the short flows that represent around 85% of the traffic handled by cloud applications. Our results show that AWAFS can reduce the average FCT of short flows by up to 24% when compared to an agnostic, non-adaptable state-of-the-art solution. Indeed, it can provide improvements of up to 60% for medium flows and 39% for long flows.
Also, AWAFS can improve the FCT of short flows in scenarios with high heterogeneity in the traffic present in the network, with a reduction of up to 35%.
Resumen (translated from Spanish): So-called cloud applications are an important phenomenon in the modern use of the Internet. Search engines, social networks, content distribution systems, and e-commerce sites, among others, belong to this type of application. They run in specialized facilities called data centers. An important element in the architecture of data centers is the communication infrastructure, known as the data center network. A critical challenge the data center network has to address is processing application traffic, which, because of its properties, is essentially different from that of other Internet applications such as e-mail or multimedia services. To contribute to the performance goals of high responsiveness and high performance, the data center network should be able to properly differentiate and prioritize small flows (a few kilobytes) over large flows (several megabytes). However, given the spatio-temporal variations of application traffic, information about flow sizes is not available in advance to schedule their transmission. In this thesis we present AWAFS, an adaptable flow scheduling mechanism that is agnostic with respect to the workloads present in the network. AWAFS proposes an adaptable approach, capable of agnostically adjusting the scheduling configuration inside the switches of the data center network. This agnostic adjustment helps reduce the completion time of small flows, which represent between 85% and 95% of the traffic handled by the applications.
Our results show that AWAFS can reduce the average flow completion time by up to 24% when compared with an agnostic, non-adaptable technique from the state of the art, without inducing starvation in large flows. Indeed, AWAFS can provide improvements of up to 60% for medium flows and 39% for large flows. Thanks to its adaptability, AWAFS also achieves this improvement in scenarios with high heterogeneity in the traffic present in the network, offering a reduction of up to 35% in the average completion time for small flows.
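The size-agnostic prioritization that AWAFS adapts can be sketched with a multi-level feedback idea: a flow is demoted to a lower priority as its transmitted byte count grows, so short flows finish at high priority without advance size information. The thresholds and names below are hypothetical, not AWAFS's actual (adapted) configuration:

```python
# Size-agnostic prioritization: a flow starts in the highest-priority queue
# and is demoted as its sent-byte count crosses thresholds, so short flows
# complete at high priority with no advance knowledge of flow sizes.
# These thresholds are hypothetical, not AWAFS's tuned values.
DEMOTION_THRESHOLDS = [10_000, 100_000, 1_000_000]  # bytes sent

def priority_for(bytes_sent):
    """Return the priority queue (0 = highest) for a flow that has
    already transmitted `bytes_sent` bytes."""
    for queue, threshold in enumerate(DEMOTION_THRESHOLDS):
        if bytes_sent < threshold:
            return queue
    return len(DEMOTION_THRESHOLDS)  # lowest queue for the longest flows

print(priority_for(4_000))      # 0: a short flow never leaves the top queue
print(priority_for(5_000_000))  # 3: a long flow ends in the bottom queue
```

An adaptable scheme in the spirit of AWAFS would additionally tune such thresholds at run time as the observed workload shifts, which is what makes it robust to traffic heterogeneity.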

    Homa: A Receiver-Driven Low-Latency Transport Protocol Using Network Priorities (Complete Version)

    Homa is a new transport protocol for datacenter networks. It provides exceptionally low latency, especially for workloads with a high volume of very short messages, and it also supports large messages and high network utilization. Homa uses in-network priority queues to ensure low latency for short messages; priority allocation is managed dynamically by each receiver and integrated with a receiver-driven flow control mechanism. Homa also uses controlled overcommitment of receiver downlinks to ensure efficient bandwidth utilization at high load. Our implementation of Homa delivers 99th-percentile round-trip times of less than 15 μs for short messages on a 10 Gbps network running at 80% load. These latencies are almost 100x lower than the best published measurements of an implementation. In simulations, Homa's latency is roughly equal to pFabric's and significantly better than pHost, PIAS, and NDP for almost all message sizes and workloads. Homa can also sustain higher network loads than pFabric, pHost, or PIAS.
    Comment: This paper is an extended version of the paper on Homa that was published in ACM SIGCOMM 2018. Material had to be removed from Sections 5.1 and 5.2 to meet the SIGCOMM page restrictions; this version restores the missing material. This paper is 18 pages, plus two pages of references.
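The receiver-driven idea in the abstract can be sketched roughly: the receiver ranks incoming messages by remaining bytes, grants a few senders at once (controlled overcommitment), and assigns the shortest message the highest priority. The function name, overcommitment degree, and priority count are illustrative assumptions, not Homa's actual parameters:

```python
# Sketch of receiver-driven grant allocation in the spirit of Homa: grant
# transmission credit to the senders with the fewest bytes remaining, and
# overcommit the downlink to several senders so bandwidth is not wasted
# when a granted sender stalls. Parameters here are hypothetical.

def allocate_grants(inbound, overcommit=2, priorities=4):
    """inbound: {sender: bytes_remaining}. Returns {sender: priority},
    granting the `overcommit` senders with the least remaining bytes;
    the shortest message gets the highest priority (0)."""
    ranked = sorted(inbound, key=inbound.get)[:overcommit]
    return {sender: min(rank, priorities - 1) for rank, sender in enumerate(ranked)}

grants = allocate_grants({"A": 1_000_000, "B": 2_000, "C": 40_000}, overcommit=2)
# B has the fewest remaining bytes -> highest priority; C gets the second grant.
print(grants)  # {'B': 0, 'C': 1}
```

Favoring the message with the fewest remaining bytes approximates shortest-remaining-processing-time scheduling, which is why short messages see such low tail latency.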

    Towards A Workload-Driven Flow Scheduler For Modern Datacenters

    Modern datacenters run different applications with various communication requirements in terms of bandwidth and deadlines. Of particular interest are the deadlines that drive web-search workloads, e.g., when submitting requests to the Bing search engine or loading the Facebook home page. Serving the submitted requests in a timely fashion relies on meeting the deadlines of the scatter/gather flows generated for each request. Current flow schedulers are deadline-unaware: they simply start flows as soon as they arrive, whenever bandwidth is available. In this thesis, we present Artemis: a workload-driven flow scheduler at the end hosts that learns via reinforcement how to schedule flows to meet their deadlines. The flow-scheduling policy in Artemis is not hard-coded; it is instead computed in real time by a reinforcement-learning control loop. In Artemis, we model flow scheduling as a deep reinforcement learning problem and use the actor-critic architecture to solve it. Flows in Artemis do not start as soon as they arrive; a source starts sending a particular flow only after requesting and acquiring a token from the destination node. The token request is issued by the source node and exposes the flow's requirements to the destination. At the destination side, the Artemis flow scheduler is a decision-making agent that learns how to serve the awaiting token requests based on their embedded requirements, using the deep reinforcement learning actor-critic model. We use two gather workloads to demonstrate (1) Artemis's ability to learn how to schedule deadline flows on its own and (2) its effectiveness at meeting flow deadlines. We compare the performance of Artemis against Earliest Deadline First (EDF) and two other rule-based flow-scheduling policies that, unlike EDF, are aware of both the sizes and the deadlines of the flows: Largest Size-Deadline ratio First (LSDF) and Smallest Size-Deadline ratio First (SSDF).
LSDF schedules arrived flows with the largest size-to-deadline ratio first, while SSDF applies the inverse logic. Our experimental results show that the Artemis flow scheduler is able to capture the structure of the gather workloads, map the requirements of arrived flows to the order in which they need to be served, and compute a flow-scheduling strategy accordingly. Using the first gather workload, which has an equal distribution of flows with (size, deadline) pairs of (350 KB, 40 ms) and (250 KB, 50 ms), Artemis met 35.58% more deadlines than EDF, 24.93% more than SSDF, and performed marginally better than LSDF, with 4.42% more. For the second workload, in which 60% of flows have a (size, deadline) pair of (350 KB, 40 ms) and 40% have (250 KB, 50 ms), Artemis outperformed all three flow schedulers, meeting 16.34% more deadlines than the second best, SSDF.
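The rule-based baselines can be illustrated on a toy version of the first workload: each policy is just a different ordering of the pending flows, served one at a time at an assumed link rate. The link rate and the serial-service model are simplifications for illustration, not the thesis's testbed:

```python
# Toy comparison of the rule-based baselines on flows like those in the
# abstract: (350 KB, 40 ms) and (250 KB, 50 ms). Flows are served serially
# at a hypothetical link rate; each policy is a different sort order.
LINK_BYTES_PER_MS = 25_000  # hypothetical 200 Mbps link

def deadlines_met(flows, key):
    """Serve (size_bytes, deadline_ms) flows in the order given by `key`
    and count how many finish before their deadline."""
    met, clock = 0, 0.0
    for size, deadline in sorted(flows, key=key):
        clock += size / LINK_BYTES_PER_MS  # transmission time of this flow
        if clock <= deadline:
            met += 1
    return met

flows = [(350_000, 40.0), (250_000, 50.0)] * 4  # equal mix of both classes
edf = deadlines_met(flows, key=lambda f: f[1])           # earliest deadline first
ssdf = deadlines_met(flows, key=lambda f: f[0] / f[1])   # smallest size/deadline ratio first
print(edf, ssdf)  # 2 4 -- SSDF meets twice as many deadlines as EDF here
```

Under contention, a ratio-based policy can beat plain EDF because it also accounts for how long each flow occupies the link, which is consistent with SSDF being the strongest rule-based baseline on the second workload.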