
    Configurable data center switch architectures

    In this thesis, we explore alternative architectures for implementing configurable data center switches, along with the advantages such switches can provide. Our first contribution centers on determining switch architectures that can be implemented on a Field Programmable Gate Array (FPGA) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks for realistically evaluating the performance of switch architectures in data centers, and we contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small- to medium-sized switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study demonstrates the trade-offs of an FPGA implementation compared to an ASIC implementation. In addition to the hardware prototype, we contribute a lightweight load-balancing and congestion control protocol that leverages the unique communication patterns of data-parallel ML jobs to enable fair sharing of network resources across jobs. Our large-scale simulations demonstrate that our novel switch architecture and lightweight congestion control protocol both accelerate the training time of machine learning jobs by up to 1.34x and benefit other latency-sensitive applications by reducing their 99th-percentile completion time by up to 4.5x. As our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool for evaluating NoC-based switch architectures.
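
    To make the in-network acceleration idea concrete, the following minimal Python sketch mimics the aggregation primitive such an accelerator typically performs: the switch sums matching gradient chunks from all workers and releases one aggregated result, so each worker receives the aggregate in a single round trip. The class, packet format, and worker count are illustrative assumptions, not the thesis's actual design.

    from collections import defaultdict

    NUM_WORKERS = 4

    class AggregatingSwitch:
        def __init__(self, num_workers):
            self.num_workers = num_workers
            self.partial = {}                 # chunk_id -> running elementwise sum
            self.arrivals = defaultdict(int)  # chunk_id -> packets seen so far

        def on_packet(self, chunk_id, gradients):
            # Accumulate one worker's chunk; release the sum once all arrive.
            acc = self.partial.get(chunk_id)
            self.partial[chunk_id] = gradients if acc is None else [
                a + g for a, g in zip(acc, gradients)
            ]
            self.arrivals[chunk_id] += 1
            if self.arrivals[chunk_id] == self.num_workers:
                del self.arrivals[chunk_id]
                return self.partial.pop(chunk_id)  # multicast back to all workers
            return None                            # still waiting on other workers

    switch = AggregatingSwitch(NUM_WORKERS)
    for _ in range(NUM_WORKERS):
        out = switch.on_packet(chunk_id=0, gradients=[1.0, 2.0, 3.0])
    print(out)  # [4.0, 8.0, 12.0] once the last worker's packet arrives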

    Implémentation d'une mémoire cache supportant la recherche IP (Implementation of a cache memory supporting IP lookup)

    The explosive growth of Internet traffic has been accompanied by exponential growth in the bandwidth of the transmission links of data processing equipment, namely routers, made possible by the deployment of optical fiber. Routers have therefore become the main bottleneck in packet processing across the network. Hardware implementation allows routers to meet the performance requirements of a high-speed router by exploiting the abundant parallelism available in packet processing. ASICs (Application-Specific Integrated Circuits) are specialized integrated circuits used for their performance, but they come at a cost: because routers are built from specialized hardware, they are fixed-function devices that cannot be programmed. Much effort has gone into making ASICs more programmable; however, this is still insufficient to express many algorithms. Due to the need to better control network operations and the constant demand to support new features, router programmability has become as important a constraint as performance. Hardware implementation is the only way to meet the performance requirements of high-speed routers; however, some contexts involve high computing requirements with lower link speeds. Software routers are considered more appropriate where programmability takes precedence over performance, and they can still achieve interesting performance, as exemplified by network processors. Network processors use hardware accelerators to implement specific functions, such as TCAM (Ternary Content Addressable Memory) memories to perform the LPM (Longest Prefix Match) lookup required for packet forwarding. TCAM meets the throughput required by LPM, which, given its complexity, is the most limiting performance factor in packet forwarding, and it has become the de facto standard in industry. However, TCAMs have serious disadvantages: high power consumption, poor flexibility (slow update operations), and a higher cost per bit compared to other memory types. TCAM power consumption is critical in routers, which have limited power budgets. The motivation of our work is to explore the concept of a generalized cache memory that can support LPM. The on-chip memory of a network processor acts as a cache memory and is implemented in SRAM (Static Random-Access Memory), a much less expensive technology than TCAM. To speed up LPM lookup, the cache stores recently consulted prefixes, reducing the access time to the routing table. In this thesis, we propose an architecture that relies on associative memories implemented with hash functions.
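
    As a rough illustration of the proposed direction, the Python sketch below models hash-based associative memories as one hash table per prefix length, and the on-chip SRAM cache as a small LRU map of recently resolved addresses. It shows LPM-with-caching in general; the class name, cache size, and table layout are assumptions, not the thesis's specific architecture.

    from collections import OrderedDict
    import ipaddress

    class LpmCache:
        def __init__(self, routes, cache_size=4):
            # routes: {"10.0.0.0/8": "port1", ...}; one hash table per prefix length
            self.tables = {}
            for prefix, nexthop in routes.items():
                net = ipaddress.ip_network(prefix)
                self.tables.setdefault(net.prefixlen, {})[int(net.network_address)] = nexthop
            self.cache = OrderedDict()   # stands in for the on-chip SRAM cache
            self.cache_size = cache_size

        def lookup(self, addr):
            if addr in self.cache:       # fast path: recently consulted address
                self.cache.move_to_end(addr)
                return self.cache[addr]
            ip = int(ipaddress.ip_address(addr))
            for length in sorted(self.tables, reverse=True):  # longest match first
                key = ip & ((~0 << (32 - length)) & 0xFFFFFFFF)
                if key in self.tables[length]:
                    nexthop = self.tables[length][key]
                    self.cache[addr] = nexthop
                    if len(self.cache) > self.cache_size:
                        self.cache.popitem(last=False)  # evict least recently used
                    return nexthop
            return None                  # no matching route

    fib = LpmCache({"10.0.0.0/8": "port1", "10.1.0.0/16": "port2"})
    print(fib.lookup("10.1.2.3"))  # port2: the /16 wins over the /8
    print(fib.lookup("10.9.9.9"))  # port1: only the /8 matches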

    Flexible and intelligent network programming for cloud networks

    As modern online services evolve rapidly and involve larger amounts of data and computation than ever, demand for cloud networks keeps growing, which also brings new challenges to network programming. Network programming is a complicated and crucial task for building high-performance cloud networks. Current network programming has two main shortcomings: (1) it is inflexible, as adding new data-plane features usually takes several years; (2) it is unintelligent, as it depends heavily on human-designed heuristic algorithms to solve production-scale problems. To overcome these shortcomings, this dissertation realizes flexible and intelligent network programming by leveraging recent developments in both hardware and software. Specifically, it presents four systems with performance features that cannot be achieved by conventional network programming: (i) Harmonia: a replicated storage architecture that provides near-linear scalability without sacrificing consistency. By exploiting the programming flexibility of new-generation programmable switches, Harmonia checks read-write conflicts in the network to guarantee consistency, and enables any replica to serve reads for objects with no pending writes, yielding near-linear scalability. (ii) RackSched: a microsecond-scale scheduler for rack-scale computers. It proposes a two-layer scheduling framework that integrates an inter-server scheduler in the top-of-rack (ToR) switch with intra-server schedulers on each server. The in-network inter-server scheduler is programmed to realize power-of-k-choices, ensure request affinity, and track server loads accurately and efficiently. (iii) NetVRM: a network management system that supports dynamic register memory sharing in the network. It orchestrates register memory allocation among multiple concurrent network applications to maximize multiplexing benefits, using three major features: a virtual register memory abstraction, a dynamic memory allocation algorithm, and a domain-specific programming language extension. (iv) NeuroPlan: automated and efficient network planning with deep reinforcement learning (RL). It uses a two-stage hybrid approach that first applies deep RL to prune a large and complex search space, then uses an Integer Linear Programming (ILP) solver to find the final solution. This automated approach avoids manual design of heuristic algorithms and efficiently reduces network plan cost. We have carried out theoretical analysis, built testbeds, and evaluated these systems with prototype experiments and simulations under realistic setups from production networks.
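
    For a concrete view of the scheduling policy named above, here is a minimal Python sketch of power-of-k-choices as RackSched's inter-server scheduler is described: sample k servers at random and forward the request to the least loaded one. The load representation and the value of k here are illustrative assumptions, not details from the dissertation.

    import random

    def power_of_k_choices(loads, k=2):
        # Sample k distinct servers uniformly; pick the least-loaded one.
        candidates = random.sample(range(len(loads)), k)
        return min(candidates, key=lambda server: loads[server])

    loads = [5, 1, 7, 3]            # outstanding requests per server
    chosen = power_of_k_choices(loads, k=2)
    loads[chosen] += 1              # the chosen server takes the request
    print(chosen, loads)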