2,736 research outputs found

    High-radix Packet-Switching Architecture for Data Center Networks

    We propose a highly scalable packet-switching architecture suited to demanding Data Center Networks (DCNs). The design falls into the category of buffered multistage switches. It combines a three-stage Clos network with the Networks-on-Chip (NoC) paradigm. We also suggest a congestion-aware routing algorithm that shares the traffic load among the switch's central modules via interleaved connecting links. Unlike conventional switches, the current proposal provides better path diversity, simple scheduling, speedup and robustness to load variation. Simulation results show that the switch scales with the port count and traffic fluctuation, and that it outperforms different switches under many traffic patterns.

    High-Capacity Clos-Network Switch for Data Center Networks

    Scaling up Data Center Networks (DCNs) should be done at the network level as well as at the level of the switching elements. The main reason is that switches and routers deployed in the DCN can bound the network capacity and degrade its performance if improperly chosen. Many multistage switching architectures have been proposed to meet next-generation networking needs. However, all of them are either performance limited or too complex to implement. Targeting scalability and performance, we propose the design of a large-capacity switch in which we combine a multistage design with a Networks-on-Chip (NoC) design. The proposal falls into the category of buffered multistage switches. Still, it differs in its architecture and scheduling process. Unlike common point-to-point crossbars, the NoCs used at the heart of the three-stage Clos network allow multiple packets in the modules simultaneously, where they can be adaptively transported using a pipelined scheduling scheme. Our simulations show that the switch scales well with load and size variation. It outperforms a variety of architectures under a range of traffic arrivals.

    Multistage Switching Architectures for Software Routers

    Software routers based on personal computer (PC) architectures are becoming an important alternative to proprietary and expensive network devices. However, software routers suffer from many limitations of the PC architecture, including, among others, limited bus and central processing unit (CPU) bandwidth, high memory access latency, limited scalability in terms of number of network interface cards, and lack of resilience mechanisms. Multistage PC-based architectures can be an interesting alternative since they permit us to i) increase the performance of single software routers, ii) scale router size, iii) distribute packet manipulation and control functionality, iv) recover from single-component failures, and v) incrementally upgrade router performance. We propose a specific multistage architecture, exploiting PC-based routers as switching elements, to build a high-speed, large-size, scalable, and reliable software router. A small-scale prototype of the multistage router is currently up and running in our labs, and performance evaluation is under way.

    Configurable data center switch architectures

    In this thesis, we explore alternative architectures for implementing configurable Data Center Switches along with the advantages that such switches can provide. Our first contribution centers on determining switch architectures that can be implemented on a Field Programmable Gate Array (FPGA) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks to realistically evaluate the performance of switch architectures in data centers and contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small- to medium-range switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation compared with an ASIC implementation. In addition to the hardware prototype, we contribute a lightweight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate the ability of our novel switch architecture and lightweight congestion control protocol to both accelerate the training time of machine learning jobs by up to 1.34x and benefit other latency-sensitive applications by reducing their 99th-percentile completion time by up to 4.5x.
As for our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures.

    Fabric-on-a-Chip: Toward Consolidating Packet Switching Functions on Silicon

    The switching capacity of an Internet router is often dictated by the memory bandwidth required to buffer arriving packets. With the demand for greater capacity and improved service provisioning, inherent memory bandwidth limitations are encountered, rendering input-queued (IQ) switches and combined input and output queued (CIOQ) architectures more practical. Output-queued (OQ) switches, on the other hand, offer several highly desirable performance characteristics, including minimal average packet delay, controllable Quality of Service (QoS) provisioning and work-conservation under any admissible traffic conditions. However, the memory bandwidth requirement of such systems is O(NR), where N denotes the number of ports and R the data rate of each port. Clearly, for high port densities and data rates, this constraint dramatically limits the scalability of the switch. In an effort to retain the desirable attributes of output-queued switches while significantly reducing the memory bandwidth requirements, distributed shared memory architectures, such as the parallel shared memory (PSM) switch/router, have recently received much attention. The principal advantage of the PSM architecture is derived from the use of slow-running memory units operating in parallel to distribute the memory bandwidth requirement. At the core of the PSM architecture is a memory management algorithm that determines, for each arriving packet, the memory unit in which it will be placed. However, to date, the computational complexity of this algorithm is O(N), thereby limiting the scalability of PSM switches. To overcome these scalability limitations, the goal of this dissertation is to extend existing shared-memory architecture results while introducing the notion of Fabric on a Chip (FoC). Taking advantage of recent advancements in integrated circuit technologies, FoC aims to facilitate the consolidation of as many packet switching functions as possible on a single chip.
Accordingly, this dissertation introduces a novel pipelined memory management algorithm, which plays a key role in the context of on-chip output-queued switch emulation. We discuss in detail the fundamental properties of the proposed scheme, along with hardware-based implementation results that illustrate its scalability and performance attributes. To complement the main effort and further support the notion of FoC, we provide performance analysis of output-queued cell switches with heterogeneous traffic. The result is a flexible tool for obtaining bounds on the memory requirements in output-queued switches under a wide range of traffic scenarios. Additionally, we present a reconfigurable high-speed hardware architecture for real-time generation of packets for the various traffic scenarios. The work presented in this thesis aims at providing pragmatic foundations for designing next-generation, high-performance Internet switches and routers.

    A Scalable Packet-Switch Based on Output-Queued NoCs for Data Centre Networks

    The switch fabric in a Data-Center Network (DCN) handles constantly variable loads. This stresses the need for high-performance packet switches able to keep pace with climbing throughput while maintaining resiliency and scalability. Conventional multistage switches with their space-memory variants have proved to be performance limited, as they do not scale well with the proliferating DC requirements. Most proposals are either too complex to implement or not cost effective. In this paper, we present a highly scalable multistage switching architecture for DC switching fabrics. We describe a three-stage Clos packet-switch fabric with Output-Queued Unidirectional NoC (OQ-UDN) modules and a Round-Robin packet-dispatching scheme. The proposed OQ Clos-UDN architecture avoids the need for complex and costly input modules and simplifies the scheduling process. Thanks to dynamic packet dispatching and the multi-hop nature of the UDN modules, the switch provides load balancing and path diversity. We compared our proposed architecture to state-of-the-art architectures under extensive uniform and non-uniform DC traffic settings. Simulations of various switch settings have shown that the proposed OQ Clos-UDN outperforms previous proposals and maintains high throughput and latency performance.
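
As a rough illustration of the round-robin dispatching idea mentioned in this abstract (not the authors' implementation), the sketch below spreads packets from one input across the central modules in strict rotation. The function name `dispatch_round_robin`, its parameters, and the use of plain lists as per-module queues are all hypothetical choices made for this example.

```python
from itertools import cycle


def dispatch_round_robin(packets, num_central_modules):
    """Assign each packet to a central module in strict rotation.

    Hypothetical helper for illustration: real switches dispatch per
    input port and per time slot; here a single packet stream is used.
    """
    # One FIFO queue per central module (assumed representation).
    queues = [[] for _ in range(num_central_modules)]
    # The round-robin pointer visits modules 0, 1, ..., k-1, 0, 1, ...
    pointer = cycle(range(num_central_modules))
    for packet in packets:
        queues[next(pointer)].append(packet)
    return queues


# Ten packets (labelled 0..9) spread over four central modules.
queues = dispatch_round_robin(list(range(10)), 4)
print(queues)
```

Because the pointer advances by one module per packet, queue lengths never differ by more than one, which is the load-balancing property a round-robin dispatcher is typically chosen for.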

    Quarc: a high-efficiency network on-chip architecture

    The novel Quarc NoC architecture, inspired by the Spidergon scheme, is introduced as a NoC architecture that is highly efficient in performing collective communication operations including broadcast and multicast. The efficiency of the Quarc architecture is achieved through balancing the traffic, which is the result of the modifications applied to the topology and the routing elements of the Spidergon NoC. This paper provides an ASIC implementation of both architectures using UMC's 0.13 μm CMOS technology and presents an analysis and comparison of the cost and performance between the Quarc and the Spidergon NoCs.