10 research outputs found

    Feedback-based scheduling for load-balanced two-stage switches

    A framework for designing feedback-based scheduling algorithms is proposed for elegantly solving the notorious packet missequencing problem of a load-balanced switch. Unlike existing approaches, we show that the efforts made in load balancing and keeping packets in order can complement each other. Specifically, at each middle-stage port between the two switch fabrics of a load-balanced switch, only a single-packet buffer for each virtual output queueing (VOQ) is required. Although packets belonging to the same flow pass through different middle-stage VOQs, the delays they experience at different middle-stage ports will be identical. This is made possible by properly selecting and coordinating the two sequences of switch configurations to form a joint sequence with both staggered symmetry property and in-order packet delivery property. Based on the staggered symmetry property, an efficient feedback mechanism is designed to allow the right middle-stage port occupancy vector to be delivered to the right input port at the right time. As a result, the performance of load balancing as well as the switch throughput is significantly improved. We further extend this feedback mechanism to support the multicabinet implementation of a load-balanced switch, where the propagation delay between switch linecards and switch fabrics is nonnegligible. As compared to the existing load-balanced switch architectures and scheduling algorithms, our solutions impose a modest requirement on switch hardware, but consistently yield better delay-throughput performance. Last but not least, some extensions and refinements are made to address the scalability, implementation, and fairness issues of our solutions. © 2009 IEEE.
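As a rough illustration of the two-stage structure this abstract refers to, the sketch below models a generic load-balanced switch whose two fabrics each cycle through fixed round-robin configurations. The connection pattern `(port + t) % n` and all function names are assumptions chosen for illustration; the sketch does not reproduce the paper's joint sequence with staggered symmetry, nor its feedback mechanism.

```python
def first_stage(i: int, t: int, n: int) -> int:
    """Middle-stage port that input i is connected to in time slot t (assumed pattern)."""
    return (i + t) % n


def second_stage(j: int, t: int, n: int) -> int:
    """Output port that middle-stage port j is connected to in time slot t (assumed pattern)."""
    return (j + t) % n


def middle_ports_visited(i: int, n: int) -> list[int]:
    """Middle-stage ports reached by input i over one full cycle of n slots."""
    return [first_stage(i, t, n) for t in range(n)]


def slots_until_output(j: int, out: int, t: int, n: int) -> int:
    """Slots a packet stored at middle port j in slot t waits until j connects to `out`."""
    return (out - j - t) % n
```

Under this assumed pattern every input reaches each middle-stage port exactly once per cycle, which is what spreads the load evenly; the waiting time at a middle-stage port depends on which port the packet landed on, which is where missequencing can arise in a plain load-balanced switch.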

    Congestion-Aware Multistage Packet-Switch Architecture for Data Center Networks

    Data Center Networks (DCNs) have gone through major evolutionary changes over the past decades. Yet, it is still difficult to predict load fluctuations and congestion spikes in the network switching fabric. Conventional multistage switches/routers used in data center fabrics barely deal with load balancing. Congestion management is often processed at the edge modules. However, neither the architecture of switches/routers nor their inner routing algorithms tend to consider traffic balancing and congestion management. In this paper, we propose a flexible design of a scalable multistage switch with cross-connected UniDirectional Network-on-Chip based central blocks (UDNs). We also introduce a congestion-aware routing algorithm to forward packets adaptively. We compare the current switch architecture to previous state-of-the-art multistage switches under different traffic types. Simulations of various switch settings have shown that the proposed architecture maintains high throughput and low latency.

    Work-Conserving Distributed Schedulers

    Buffered multistage interconnection networks offer one of the most scalable and cost-effective approaches to building high capacity routers and switches. Unfortunately, the performance of such systems has been difficult to predict in the presence of the extreme traffic conditions that can arise in Internet routers. Recent work introduced the idea of distributed scheduling, to regulate the flow of traffic in such systems. This work demonstrated (using simulation and experimental measurements) that distributed scheduling can enable robust performance, even in the presence of adversarial traffic patterns. In this paper, we show that appropriately designed distributed scheduling algorithms are provably work-conserving for speedups of 2 or more. Two of the three algorithms presented were inspired by algorithms previously developed for crossbar scheduling. The third has no direct counterpart in the crossbar scheduling context. In our analysis, we show that distributed schedulers based on blocking flows in small-depth acyclic flow graphs can be work-conserving, just as certain crossbar schedulers based on maximal bipartite matchings have been shown to be work-conserving. We also study the performance of practical variants of the work-conserving algorithms with speedups less than 2, using simulation. These studies demonstrate that distributed scheduling ensures excellent performance under extreme traffic conditions for speedups of less than 1.5.
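The crossbar schedulers mentioned in this abstract rely on maximal bipartite matchings between inputs and outputs. The sketch below shows one simple way to obtain such a matching greedily; the function name and the request format are assumptions for illustration, and this is not one of the distributed scheduling algorithms from the paper.

```python
def greedy_maximal_matching(requests: dict[int, list[int]]) -> dict[int, int]:
    """Return a maximal (not necessarily maximum) input-to-output matching.

    `requests` maps each input port to the output ports it has packets for.
    Inputs are served in ascending order and take their first free output.
    """
    taken_outputs: set[int] = set()
    matching: dict[int, int] = {}
    for inp in sorted(requests):
        for out in requests[inp]:
            if out not in taken_outputs:
                matching[inp] = out
                taken_outputs.add(out)
                break
    return matching


if __name__ == "__main__":
    # Input 1 stays unmatched because its only requested output is already taken.
    print(greedy_maximal_matching({0: [0, 1], 1: [0], 2: [1, 2]}))
```

The result is maximal in the sense that no unmatched input still has a free requested output, even though a larger matching may exist for the same requests.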

    Multistage Packet-Switching Fabrics for Data Center Networks

    Recent applications have imposed stringent requirements on Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of packet switches is tackled in different ways to enable expansion in both the number of connected endpoints and traffic volume. A cost-effective Clos-network switch with partially buffered units is proposed and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second approach relies on a central arbiter to guarantee ordered packet delivery. For improved scalability, the Clos switch is built using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture, made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs), simplifies the input line cards and obviates the need for costly Virtual Output Queues (VOQs). It also avoids the need for complex and synchronized scheduling processes, and offers speedup, load balancing, and good path diversity. Under skewed traffic, reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules and enhances the switch performance under critical packet arrivals. The implementation of small on-chip buffers has been made perfectly feasible using current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design merges the advantages of output queuing and NoCs to provide high throughput and smooth latency variations. An approximate analytical model of the switch performance is also proposed.
To further exploit the potential of the NoC fabrics and their modularity features, a high capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch. It scales better and faster in port count and traffic load. Results achieved in this thesis demonstrate the high performance, expandability and programmability of the proposed packet switches, which make them promising candidates for the next-generation data center networking infrastructure.

    Load Balancing for the Agile All-Photonic Network

    The Agile All-Photonic Network (AAPN) uses Time Division Multiplexing (TDM) to better utilize the bandwidth of Wavelength Division Multiplexing (WDM) systems. It uses agile all-photonic switches as advances in the photonic switching technology made the design of all-photonic devices with switching latency in the sub-microseconds feasible. The network has a simplified overlaid star architecture that can be deployed in a Metropolitan Area Network (MAN) or a Wide Area Network (WAN) environment. This overlaid architecture, as opposed to general mesh architecture, scales network capacity to multiples of Tera bits per second, simplifies routing, increases reliability, eliminates wavelength conversion, and the need for accurate traffic engineering. The objective of this thesis is to propose and analyze different load balancing methods for the deployment of the AAPN network in a WAN environment. The analysis should provide interested Internet Service Providers (ISPs) with a comprehensive study of load balancing methods for using the AAPN network as their backbone network. The methods balance the load at the flow level to reduce packet reordering. The methods are stateless and can compute routes quickly based on the packet flow identifier. This is an important issue when deploying AAPN as an Internet backbone network where the number of flows is large and storing flow state in lookup tables can limit the network performance. The load balancing methods, deployed at the edge nodes, require reliable signaling with the bandwidth schedulers at the core nodes. To provide a reliable channel between the edge and core nodes, the Control Messages Delivery Protocol (CMDP) is proposed as part of this thesis work. The protocol is designed to work in environments where propagation delays are long and/or the error rates are high. It is used to deliver a burst of short messages in sequence and with no errors.
Combined with the reliable routing protocol proposed previously for the AAPN network, they form the control plane for the network. To extend the applicability of the load balancing methods to topologies beyond AAPN overlaid star topology, the Valiant Load Balancing (VLB) method is used to build an overlaid star topology on top of the physical network. The VLB method provides guaranteed performance for highly variable traffic matrices within the hose traffic model constraints. In addition to the guaranteed performance, deploying the VLB method in the AAPN network, eliminates signaling and replaces the dynamic core schedulers with static scheduler that can accommodate all traffic matrices within the hose traffic model boundaries.
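One common way to obtain stateless, flow-level load balancing of the kind this abstract describes is to hash the flow identifier and use the result to select a path. The sketch below is a minimal illustration under that assumption; the function name `pick_core`, the flow tuple layout, and the use of SHA-256 are all hypothetical and are not taken from the thesis.

```python
import hashlib


def pick_core(flow_id: tuple, num_cores: int) -> int:
    """Map a flow identifier to a core node index without storing per-flow state.

    All packets with the same flow_id get the same index, so a flow stays on
    one path and its packets are not reordered by the load balancer.
    """
    key = "|".join(map(str, flow_id)).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_cores


if __name__ == "__main__":
    # Hypothetical flow identifier: (source, destination, source port, destination port).
    flow = ("10.0.0.1", "10.0.1.7", 40000, 443)
    print(pick_core(flow, 4))
```

Because the mapping is a pure function of the flow identifier, no lookup table is needed at the edge node, which matches the scalability concern raised in the abstract for backbones carrying a large number of flows.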

    Load balanced Birkhoff-von Neumann switches with resequencing

    No full text