
    Fastpass: A Centralized “Zero-Queue” Datacenter Network

    An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender should delegate control to a centralized arbiter, which decides when each packet should be transmitted and what path it should follow. This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebook's datacenter network. Our results show that Fastpass achieves throughput comparable to current networks with a 240x reduction in queue lengths (from 4.35 Mbytes to 18 Kbytes), achieves much fairer and more consistent flow throughputs than baseline TCP (a 5200x reduction in the standard deviation of per-flow throughput with five concurrent connections), scales from 1 to 8 cores in the arbiter implementation with the ability to schedule 2.21 Terabits/s of traffic in software on eight cores, and yields a 2.5x reduction in the number of TCP retransmissions in a latency-sensitive service at Facebook. Funding: National Science Foundation (U.S.) (grant IIS-1065219); Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship; Hertz Foundation (Fellowship).
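    To make the arbiter's role concrete, here is a minimal sketch rather than the paper's actual algorithm (Fastpass uses a pipelined maximal-matching timeslot allocator): it greedily places each requested packet into the earliest timeslot in which both its source and destination are idle, which is the per-timeslot constraint a zero-queue arbiter must enforce. The host names and slot count below are made up.

        def allocate_timeslots(demands, num_slots):
            """Toy Fastpass-style arbiter: assign each requested packet to the
            earliest timeslot where its source and destination are both free."""
            busy_src = [set() for _ in range(num_slots)]   # sources sending in slot t
            busy_dst = [set() for _ in range(num_slots)]   # destinations receiving in slot t
            schedule = []                                  # (slot, src, dst) assignments
            for src, dst in demands:                       # one entry per requested packet
                for t in range(num_slots):
                    if src not in busy_src[t] and dst not in busy_dst[t]:
                        busy_src[t].add(src)
                        busy_dst[t].add(dst)
                        schedule.append((t, src, dst))
                        break
            return schedule

        # Two senders both want to reach host C; the arbiter serializes them.
        print(allocate_timeslots([("A", "C"), ("B", "C"), ("A", "D")], num_slots=4))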

    A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research

    With traditional networking, users can configure control plane protocols to match the specific network configuration, but without the ability to fundamentally change the underlying algorithms. With SDN, users may provide their own control plane, which controls network devices through their data plane APIs. Programmable data planes allow users to define their own data plane algorithms for network devices, including appropriate data plane APIs that may be leveraged by user-defined SDN control. Thus, programmable data planes and SDN offer great flexibility for network customization, be it for specialized, commercial appliances, e.g., in 5G or data center networks, or for rapid prototyping in industrial and academic research. Programming Protocol-independent Packet Processors (P4) has emerged as the currently most widespread abstraction, programming language, and concept for data plane programming. It is developed and standardized by an open community, and it is supported by various software and hardware platforms. In this paper, we survey the literature from 2015 to 2020 on data plane programming with P4. Our survey covers 497 references, of which 367 are scientific publications. We organize our work into two parts. In the first part, we give an overview of data plane programming models, the programming language, architectures, compilers, targets, and data plane APIs. We also consider research efforts to advance P4 technology. In the second part, we analyze a large body of literature considering P4-based applied research. We categorize 241 research papers into different application domains, summarize their contributions, and extract prototypes, target platforms, and source code availability. Comment: Submitted to IEEE Communications Surveys and Tutorials (COMS) on 2021-01-2
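    The central abstraction P4 builds on is the match-action table, which the data plane program declares and the control plane populates at runtime through the data plane API. The sketch below is a loose Python analogy only, not P4 syntax; the table, field, and action names are invented. It is meant to show the division of labor between installing entries (control plane) and applying them per packet (data plane).

        from dataclasses import dataclass, field

        @dataclass
        class Table:
            # One exact-match match-action table; entries are installed at runtime
            # by the control plane through the data plane API.
            key_field: str
            entries: dict = field(default_factory=dict)

            def apply(self, packet, actions):
                hit = self.entries.get(packet.get(self.key_field))
                if hit is None:
                    return packet                  # table miss: no-op default action
                action_name, params = hit
                return actions[action_name](packet, **params)

        def set_egress(packet, port):
            packet["egress_port"] = port
            return packet

        # "Control plane" installs a forwarding entry; the "data plane" applies it.
        ipv4_exact = Table(key_field="dst_addr")
        ipv4_exact.entries["10.0.0.2"] = ("set_egress", {"port": 3})
        print(ipv4_exact.apply({"dst_addr": "10.0.0.2"}, {"set_egress": set_egress}))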

    RDNA: A Residue-Defined Architecture for Data Center Networks

    "Recentemente, temos observado o crescente uso das tecnologias de informação e da comunicação. Instituições e usuários simplesmente necessitam de alta qualidade na conectividade de seus dados, com expectativa de acesso instantâneo a qualquer hora e em qualquer lugar. Um elemento essencial para garantir qualidade na conectividade da nuvem é a arquitetura da rede de comunicação no Data Center (DCNs - Data Center Networks). Isso ocorre porque uma parte significativa do tráfego da Internet é baseada na comunicação de dados e no processamento que acontece dentro da infraestrutura do Data Center (DC). No entanto, os protocolos de roteamento, a forma de encaminhamento e gerenciamento que são executados atualmente, se revelam insuficientes para atender as demandas atuais por conectividade na nuvem. Isto ocorre principalmente pela dependência da operação de busca nas tabelas de encaminhamento, levando à um incremento de latência fim a fim, ademais, mecanismos de recuperação tradicionais utilizam estados adicionais nas tabelas, aumentando a complexidade nas rotinas de gerenciamento, além de reduzir drasticamente a escalabilidade de proteção nas rotas. Outra dificuldade é a comunicação multicast dentro do DC, as soluções existentes são complexas de implementar e não suportam a configuração dos grupos nas taxas atuais requeridas. Neste contexto, essa tese explora o sistema numérico de resíduos centrado no Teorema Chinês do Resto (TCR) como fundamento, aplicado no projeto de um novo sistema de roteamento para DCN. Mais especificamente, introduzimos a arquitetura RDNA que avança o estado da arte a partir de uma simplificação do modelo de encaminhamento para o núcleo, baseado em uma operação de resíduo (resto da divisão). Nesse sentido, a rota é definida como resíduo entre um identificador de rota e identificadores locais (números primos) atribuídos aos switches de núcleo. Os switches de borda, recebem entradas configurando os fluxos de acordo com a política de rede definida pelo controlador. Cada fluxo é mapeado na borda, através de um identificador de rota principal e um emergencial. Essas operações de resíduos permitem encaminhar os pacotes pela respectiva porta de saída. Em situações de falha, o identificador de rota emergencial viabiliza rápida recuperação enviando os pacotes por uma porta de saída alternativa. A RDNA é escalável assumindo uma topologia 2-tier Clos Network amplamente utilizada em DCNs. Com o objetivo de confrontar a RDNA com outros trabalhos da literatura, analisamos a escalabilidade em termos de número de bits necessário para comunicação unicast e multicast. Na análise, variou-se o número de nós na rede, o grau dos nós e o número de hosts físicos para cada topologia. Na comunicação unicast, a RDNA reduziu em 4.5 vezes o tamanho do cabeçalho, comparada à proposta COXCast. Na comunicação multicast, um modelo de programação linear foi concebido para minimizar uma função polinomial. A RDNA reduziu em até 50% o tamanho do cabeçalho comparando com a mesma quantidade de membros por grupo. Como prova de conceito, dois protótipos foram implementados, um no ambiente emulado Mininet e outro na plataforma NetFPGA SUME. Os resultados mostraram que a RDNA alcança latência determinística no encaminhamento dos pacotes, 600 nanosegundos no tempo de comutação por elemento de núcleo, recuperação de falha ultra-rápida na ordem de microssegundos e sem variação de latência (jitter) no núcleo da rede.

    Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches

    Traditional networking devices support only fixed features and limited configurability. Network softwarization leverages programmable software and hardware platforms to remove those limitations. In this context, the concept of programmable data planes allows the packet processing pipeline of networking devices to be programmed directly and custom control plane algorithms to be created. This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet the high demands of next-generation networks like 5G, the Internet of Things, cloud computing, and Industry 4.0. P4 is the most popular technology for implementing programmable data planes. However, programmable data planes, and in particular the P4 technology, emerged only recently. Thus, P4 support for some well-established networking concepts is still lacking, and several issues remain unsolved due to the different characteristics of programmable data planes in comparison to traditional networking. The research of this thesis focuses on two open issues of programmable data planes. First, it develops resilient and efficient forwarding mechanisms for the P4 data plane, as there are no satisfying state-of-the-art best practices yet. Second, it enables BIER in high-performance P4 data planes. BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic that so far has only very limited support on high-performance forwarding platforms. The main results of this thesis are published as eight peer-reviewed publications and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and extensive evaluations of all developed and implemented mechanisms. Furthermore, the results contain a comprehensive P4 literature study. Two more peer-reviewed papers contain additional content that is not directly related to the main results: they implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts.
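    For context, BIER (Bit Index Explicit Replication, RFC 8279) encodes the set of multicast destinations as a bitstring carried in the packet, so core routers keep no per-group state. The Python sketch below illustrates only the generic BIER forwarding loop; the bit forwarding table contents and neighbor names are invented, and the thesis's contribution is implementing and protecting this kind of forwarding in P4, not the toy loop shown here.

        def bier_forward(bitstring, bift):
            """For every destination bit still set, emit one copy towards the
            neighbor that covers it, masked with that neighbor's forwarding
            bitmask so no destination receives duplicates."""
            copies = []
            remaining = bitstring
            bit = 1
            while remaining:
                if remaining & bit:
                    fbm, neighbor = bift[bit]          # forwarding bitmask + next hop
                    copies.append((neighbor, remaining & fbm))
                    remaining &= ~fbm                  # these destinations are now covered
                bit <<= 1
            return copies

        # Hypothetical domain with four destinations (bits 0..3) and two neighbors.
        BIFT = {
            1 << 0: (0b0011, "neighbor_A"),   # A reaches destinations 0 and 1
            1 << 1: (0b0011, "neighbor_A"),
            1 << 2: (0b1100, "neighbor_B"),   # B reaches destinations 2 and 3
            1 << 3: (0b1100, "neighbor_B"),
        }
        # Destinations 0, 1, and 3 -> one copy to A (bits 0,1), one to B (bit 3).
        print(bier_forward(0b1011, BIFT))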

    Doctor of Philosophy

    In the past few years, we have seen a tremendous increase in digital data being generated. By 2011, storage vendors had shipped 905 PB of purpose-built backup appliance capacity. By 2013, the number of objects stored in Amazon S3 had reached 2 trillion. Facebook had stored 20 PB of photos by 2010. All of these require an efficient storage solution. To improve space efficiency, compression and deduplication are widely used. Compression works by identifying repeated strings and replacing them with more compact encodings, while deduplication partitions data into fixed-size or variable-size chunks and removes duplicate blocks. While we have seen great improvements in space efficiency from these two approaches, there are still some limitations. First, traditional compressors are limited in their ability to detect redundancy across a large range because they search for redundant data at a fine-grained level (string level). For deduplication, metadata embedded in an input file changes more frequently than the data itself, and this introduces more unnecessary unique chunks, leading to poor deduplication. In addition, cloud storage systems suffer from unpredictable and inefficient performance because of interference among different types of workloads. This dissertation proposes techniques to improve the effectiveness of traditional compressors and deduplication in improving space efficiency, and a new IO scheduling algorithm to improve performance predictability and efficiency for cloud storage systems. The common idea is to utilize similarity. To improve the effectiveness of compression and deduplication, similarity in content is used to transform an input file into a compression- or deduplication-friendly format. We propose Migratory Compression, a generic data transformation that identifies similar data at a coarse-grained level (block level) and then groups similar blocks together. It can be used as a preprocessing stage for any traditional compressor. We find that metadata has a huge impact in reducing the benefit of deduplication. To isolate the impact of metadata, we propose to separate metadata from data. Three approaches are presented for use cases with different constraints. For the commonly used tar format, we propose Migratory Tar: a data transformation and also a new tar format that deduplicates better. We also present a case study in which we use deduplication to reduce storage consumption for storing disk images, while at the same time achieving high performance in image deployment. Finally, we apply the same principle of utilizing similarity in IO scheduling to prevent interference between random and sequential workloads, leading to efficient, consistent, and predictable performance for sequential workloads and high disk utilization.
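    As a rough illustration of the Migratory Compression idea only (not the dissertation's implementation), the sketch below clusters blocks by a single resemblance feature and hands the reordered stream to an off-the-shelf compressor. The block size, the feature definition, and the toy input are assumptions; a real system uses several features grouped into super-features and must also store the block permutation to restore the original order.

        import zlib

        def feature(block, shingle=8):
            # One resemblance feature: the minimum hash over all byte shingles,
            # so blocks sharing content tend to get the same feature value.
            n = max(1, len(block) - shingle + 1)
            return min(zlib.crc32(block[i:i + shingle]) for i in range(n))

        def migratory_compress(data, block_size=4096):
            # Split into blocks, move similar blocks next to each other, then run
            # an ordinary compressor over the reordered stream.
            blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
            order = sorted(range(len(blocks)), key=lambda i: feature(blocks[i]))
            migrated = b"".join(blocks[i] for i in order)
            return zlib.compress(migrated), order   # the permutation must be kept for restore

        data = (b"A" * 5000 + b"B" * 5000) * 4      # toy input with widely separated repeats
        packed, perm = migratory_compress(data)
        print(len(data), len(packed), len(zlib.compress(data)))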