16 research outputs found

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Datacenters provide cost-effective and flexible access to the scalable compute and storage resources that today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and to maximize performance, it is necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements, including user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, and general traffic control challenges and objectives in datacenters. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters, not to survey all existing solutions (which is virtually impossible given the massive body of existing research). We hope to provide readers with a wide range of options and factors to consider when evaluating traffic control mechanisms. We discuss various characteristics of datacenter traffic control, including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of the paper, we briefly review inter-datacenter networks, which connect geographically dispersed datacenters, have been receiving increasing attention recently, and pose interesting and novel research problems.
    Comment: Accepted for publication in IEEE Communications Surveys and Tutorials
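Among the mechanisms the survey lists, traffic shaping is one of the most self-contained; a minimal token-bucket shaper can sketch the idea. The class name, rates, and the `allow` interface below are illustrative assumptions, not anything from the paper:

```python
class TokenBucket:
    """Minimal token-bucket shaper: a packet may be sent only if enough
    tokens (one per byte) have accumulated at the configured fill rate."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s  # token fill rate
        self.capacity = burst_bytes   # maximum burst size
        self.tokens = burst_bytes     # start with a full bucket
        self.last = 0.0               # time of last refill

    def allow(self, packet_bytes, now):
        # Refill for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True   # conforming: send now
        return False      # non-conforming: delay or drop

bucket = TokenBucket(rate_bytes_per_s=1_000_000, burst_bytes=1500)
print(bucket.allow(1500, now=0.0))    # True: burst allowance covers one MTU packet
print(bucket.allow(1500, now=0.0))    # False: bucket is now empty
print(bucket.allow(1500, now=0.002))  # True: 2 ms at 1 MB/s refills 1500 tokens
```

The burst parameter is what lets interactive traffic send a short burst immediately while long-running traffic is held to the average rate.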

    Saba: Rethinking Datacenter Network Allocation from Application's Perspective


    An adaptable workload-agnostic flow scheduling mechanism for Data Center Networks

    Cloud applications are an important phenomenon in the modern use of the Internet. Search engines, social networks, content delivery, and retail and e-commerce sites belong to this group of applications. These applications run in specialized facilities called data centers. An important element in the architecture of a data center is its communication infrastructure, commonly known as the data center network (DCN). One of the challenges a DCN has to address is satisfying the service requirements of applications, expressed in terms of high responsiveness and high performance. To address this challenge, the traffic associated with these applications needs special handling, since its properties make it essentially different from the traffic of other Internet applications such as mail or multimedia services. To contribute to these performance goals, the DCN should be able to prioritize short flows (a few KB) over long flows (several MB). However, given the variations the traffic presents over time and space, information about flow sizes is not available in advance for planning flow scheduling. In this thesis we present AWAFS, an adaptable workload-agnostic flow scheduling mechanism. It is an adaptable approach capable of agnostically adjusting the scheduling configuration within DCN switches. This agnostic adjustment helps reduce the Flow Completion Time (FCT) of the short flows that represent around 85% of the traffic handled by cloud applications. Our results show that AWAFS can reduce the average FCT of short flows by up to 24% when compared to an agnostic, non-adaptable state-of-the-art solution. Indeed, it can provide improvements of up to 60% for medium flows and 39% for long flows.
Also, AWAFS can improve the FCT of short flows in scenarios with high heterogeneity in the traffic present in the network, with a reduction of up to 35%.
Resumen (translated from Spanish): So-called cloud applications are an important phenomenon in the modern use of the Internet. Search engines, social networks, content distribution systems, and e-commerce sites, among others, belong to this type of application. They run in specialized facilities called data centers. An important element in the architecture of data centers is the communication infrastructure, known as the data center network. A critical challenge the data center network has to address is processing application traffic, which, because of its properties, is essentially different from that of other Internet applications such as e-mail or multimedia services. To contribute to the performance goals of high responsiveness and high performance, the data center network should be able to properly differentiate and prioritize small flows (a few kilobytes) over large flows (several megabytes). However, given the spatio-temporal variations of application traffic, information about flow sizes is not available in advance to schedule their transmission. In this thesis we present AWAFS, an adaptable flow scheduling mechanism that is agnostic with respect to the workloads present in the network. AWAFS proposes an adaptable approach, capable of agnostically adjusting the scheduling configuration inside the switches of the data center network. This agnostic adjustment helps reduce the completion time of small flows, which represent between 85% and 95% of the traffic handled by the applications.
Our results show that AWAFS can reduce the average flow completion time by up to 24% when compared with an agnostic, non-adaptable technique from the state of the art, without inducing starvation in large flows. Indeed, AWAFS can provide improvements of up to 60% for medium flows and 39% for large flows. Thanks to its adaptability, AWAFS also achieves this improvement in scenarios with high heterogeneity in the traffic present in the network, offering a reduction of up to 35% in the average completion time for small flows.
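The size-agnostic prioritization that AWAFS adapts can be sketched with a multi-level feedback idea: a flow is demoted to a lower priority as its transmitted byte count grows, so short flows finish at high priority without advance size information. The thresholds and names below are hypothetical, not AWAFS's actual (adapted) configuration:

```python
# Size-agnostic prioritization: a flow starts in the highest-priority queue
# and is demoted as its sent-byte count crosses thresholds, so short flows
# complete at high priority with no advance knowledge of flow sizes.
# These thresholds are hypothetical, not AWAFS's tuned values.
DEMOTION_THRESHOLDS = [10_000, 100_000, 1_000_000]  # bytes sent

def priority_for(bytes_sent):
    """Return the priority queue (0 = highest) for a flow that has
    already transmitted `bytes_sent` bytes."""
    for queue, threshold in enumerate(DEMOTION_THRESHOLDS):
        if bytes_sent < threshold:
            return queue
    return len(DEMOTION_THRESHOLDS)  # lowest queue for the longest flows

print(priority_for(4_000))      # 0: a short flow never leaves the top queue
print(priority_for(5_000_000))  # 3: a long flow ends in the bottom queue
```

An adaptable scheme in the spirit of AWAFS would additionally tune such thresholds at run time as the observed workload shifts, which is what makes it robust to traffic heterogeneity.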

    Homa: A Receiver-Driven Low-Latency Transport Protocol Using Network Priorities (Complete Version)

    Homa is a new transport protocol for datacenter networks. It provides exceptionally low latency, especially for workloads with a high volume of very short messages, and it also supports large messages and high network utilization. Homa uses in-network priority queues to ensure low latency for short messages; priority allocation is managed dynamically by each receiver and integrated with a receiver-driven flow control mechanism. Homa also uses controlled overcommitment of receiver downlinks to ensure efficient bandwidth utilization at high load. Our implementation of Homa delivers 99th-percentile round-trip times of less than 15 μs for short messages on a 10 Gbps network running at 80% load. These latencies are almost 100x lower than the best published measurements of an implementation. In simulations, Homa's latency is roughly equal to pFabric's and significantly better than pHost, PIAS, and NDP for almost all message sizes and workloads. Homa can also sustain higher network loads than pFabric, pHost, or PIAS.
    Comment: This paper is an extended version of the paper on Homa that was published in ACM SIGCOMM 2018. Material had to be removed from Sections 5.1 and 5.2 to meet the SIGCOMM page restrictions; this version restores the missing material. This paper is 18 pages, plus two pages of references.
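The receiver-driven idea in the abstract can be sketched roughly: the receiver ranks incoming messages by remaining bytes, grants a few senders at once (controlled overcommitment), and assigns the shortest message the highest priority. The function name, overcommitment degree, and priority count are illustrative assumptions, not Homa's actual parameters:

```python
# Sketch of receiver-driven grant allocation in the spirit of Homa: grant
# transmission credit to the senders with the fewest bytes remaining, and
# overcommit the downlink to several senders so bandwidth is not wasted
# when a granted sender stalls. Parameters here are hypothetical.

def allocate_grants(inbound, overcommit=2, priorities=4):
    """inbound: {sender: bytes_remaining}. Returns {sender: priority},
    granting the `overcommit` senders with the least remaining bytes;
    the shortest message gets the highest priority (0)."""
    ranked = sorted(inbound, key=inbound.get)[:overcommit]
    return {sender: min(rank, priorities - 1) for rank, sender in enumerate(ranked)}

grants = allocate_grants({"A": 1_000_000, "B": 2_000, "C": 40_000}, overcommit=2)
# B has the fewest remaining bytes -> highest priority; C gets the second grant.
print(grants)  # {'B': 0, 'C': 1}
```

Favoring the message with the fewest remaining bytes approximates shortest-remaining-processing-time scheduling, which is why short messages see such low tail latency.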

    Towards A Workload-Driven Flow Scheduler For Modern Datacenters

    Modern datacenters run different applications with various communication requirements in terms of bandwidth and deadlines. Of particular interest are the deadlines that drive web-search workloads, e.g., when submitting requests to the Bing search engine or loading the Facebook home page. Serving the submitted requests in a timely fashion relies on meeting the deadlines of the scatter/gather flows generated for each request. Current flow schedulers are deadline-unaware: they simply start flows as soon as they arrive, whenever bandwidth is available. In this thesis, we present Artemis: a workload-driven flow scheduler at the end hosts that learns via reinforcement how to schedule flows to meet their deadlines. The flow-scheduling policy in Artemis is not hard-coded; it is instead computed in real time by a reinforcement-learning control loop. In Artemis, we model flow scheduling as a deep reinforcement learning problem and use the actor-critic architecture to solve it. Flows in Artemis do not start as soon as they arrive; a source starts sending a particular flow only after requesting and acquiring a token from the destination node. The token request is issued by the source node and exposes the flow's requirements to the destination. At the destination side, the Artemis flow scheduler is a decision-making agent that learns how to serve the awaiting token requests based on their embedded requirements, using the deep reinforcement learning actor-critic model. We use two gather workloads to demonstrate (1) Artemis's ability to learn how to schedule deadline flows on its own and (2) its effectiveness at meeting flow deadlines. We compare the performance of Artemis against Earliest Deadline First (EDF) and two other rule-based flow-scheduling policies that, unlike EDF, are aware of both the sizes and the deadlines of the flows: Largest Size-Deadline ratio First (LSDF) and Smallest Size-Deadline ratio First (SSDF).
LSDF schedules arrived flows with the largest size-to-deadline ratio first, while SSDF applies the inverse logic. Our experimental results show that the Artemis flow scheduler is able to capture the structure of the gather workloads, map the requirements of arrived flows to the order in which they need to be served, and compute a flow-scheduling strategy accordingly. Using the first gather workload, which has an equal distribution of flows with (size, deadline) pairs of (350 KB, 40 ms) and (250 KB, 50 ms), Artemis met 35.58% more deadlines than EDF, 24.93% more than SSDF, and performed marginally better than LSDF, with 4.42% more. For the second workload, in which 60% of flows have a (size, deadline) pair of (350 KB, 40 ms) and 40% have (250 KB, 50 ms), Artemis outperformed all three flow schedulers, meeting 16.34% more deadlines than the second best, SSDF.
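The rule-based baselines can be illustrated on a toy version of the first workload: each policy is just a different ordering of the pending flows, served one at a time at an assumed link rate. The link rate and the serial-service model are simplifications for illustration, not the thesis's testbed:

```python
# Toy comparison of the rule-based baselines on flows like those in the
# abstract: (350 KB, 40 ms) and (250 KB, 50 ms). Flows are served serially
# at a hypothetical link rate; each policy is a different sort order.
LINK_BYTES_PER_MS = 25_000  # hypothetical 200 Mbps link

def deadlines_met(flows, key):
    """Serve (size_bytes, deadline_ms) flows in the order given by `key`
    and count how many finish before their deadline."""
    met, clock = 0, 0.0
    for size, deadline in sorted(flows, key=key):
        clock += size / LINK_BYTES_PER_MS  # transmission time of this flow
        if clock <= deadline:
            met += 1
    return met

flows = [(350_000, 40.0), (250_000, 50.0)] * 4  # equal mix of both classes
edf = deadlines_met(flows, key=lambda f: f[1])           # earliest deadline first
ssdf = deadlines_met(flows, key=lambda f: f[0] / f[1])   # smallest size/deadline ratio first
print(edf, ssdf)  # 2 4 -- SSDF meets twice as many deadlines as EDF here
```

Under contention, a ratio-based policy can beat plain EDF because it also accounts for how long each flow occupies the link, which is consistent with SSDF being the strongest rule-based baseline on the second workload.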