127 research outputs found

    Providing flow based performance guarantees for buffered crossbar switches

    Full text link
    Buffered crossbar switches are a special type of com-bined input-output queued switches with each crosspoint of the crossbar having small on-chip buffers. The introduc-tion of crosspoint buffers greatly simplifies the scheduling process of buffered crossbar switches, and furthermore en-ables buffered crossbar switches with speedup of two to eas-ily provide port based performance guarantees. However, recent research results have indicated that, in order to pro-vide flow based performance guarantees, buffered crossbar switches have to either increase the speedup of the cross-bar to three or greatly increase the total number of cross-point buffers, both adding significant hardware complexity. In this paper, we present scheduling algorithms for buffered crossbar switches to achieve flow based performance guar-antees with speedup of two and with only one or two buffers at each crosspoint. When there is no crosspoint blocking in a specific time slot, only the simple and distributed in-put scheduling and output scheduling are necessary. Other-wise, the special urgent matching is introduced to guarantee the on-time delivery of crosspoint blocked cells. With the proposed algorithms, buffered crossbar switches can pro-vide flow based performance guarantees by emulating push-in-first-out output queued switches, and we use the counting method to formally prove the perfect emulation. For the special urgent matching, we present sequential and paral-lel matching algorithms. Both algorithms converge with N iterations in the worst case, and the latter needs less itera-tions in the average case. Finally, we discuss an alternative backup-buffer implementation scheme to the bypass path, and compare our algorithms with existing algorithms in the literature

    A Clos-Network Switch Architecture based on Partially-Buffered Crossbar Fabrics

    Get PDF
    Modern Data Center Networks (DCNs) that scale to thousands of servers require high performance switches/routers to handle high traffic loads with minimum delays. Today’s switches need be scalable, have good performance and -more importantly- be cost-effective. This paper describes a novel threestage Clos-network switching fabric with partially-buffered crossbar modules and different scheduling algorithms. Compared to conventional fully buffered and buffer-less switches, the proposed architecture fits a nice model between both designs and takes the best of both: i) less hardware requirements which considerably reduces both the cost and the implementation complexity, ii) the existence of few internal buffers allows for simple and highperformance scheduling. Two alternative scheduling algorithms are presented. The first is scalable, it disperses the control function over multiple switching elements in the Clos-network. The second is simpler. It places some control on a central scheduler to ensure an ordered packets delivery. Simulations for various switch settings and traffic profiles have shown that the proposed architecture is scalable. It maintains high throughput, low latency performance for less hardware used

    Design and stability analysis of high performance packet switches

    Get PDF
    With the rapid development of optical interconnection technology, high-performance packet switches are required to resolve contentions in a fast manner to satisfy the demand for high throughput and high speed rates. Combined input-crosspoint buffered (CICB) switches are an alternative to input-buffered (IB) packet switches to provide high-performance switching and to relax arbitration timing for packet switches with high-speed ports. A maximum weight matching (MWM) scheme can provide 100% throughput under admissible traffic for lB switches. However, the high complexity of MWM prohibits its implementation in high-speed switches. In this dissertation, a feedback-based arbitration scheme for CICB switches is studied, where cell selection is based on the provided service to virtual output queues (VOQs). The feedback-based scheme is named round-robin with adaptable frame size (RR-AF) arbitration. The frame size in RR-AF is adaptably changed by the serviced and unserviced traffic. If a switch is stable, the switch provides 100% throughput. Here, it is proved that RR-AF can achieve 100% throughput under uniform admissible traffic. Switches with crosspoint buffers need to consider the transmission delays, or round-trip times to define the crosspoint buffer size. As the buffered crossbar switch can be physically located far from the input ports, actual round-trip times can be non-negligible. To support non-negligible round-trip times in a buffered crossbar switch, the crosspoint buffer size needs to be increased. To satisfy this demand, this dissertation investigates how to select the crosspoint buffer size under non-negligible round trip times and under uniform traffic. With the analysis of stability margin, the relationship between the crosspoint buffer size and round-trip time is derived. Considering that CICB switches deliver higher performance than lB switches and require no speedup, this dissertation investigates the maximum throughput performance that these switches can achieve. It is shown that CICB switches without speedup achieve 100% throughput under any admissible traffic through a fluid model. In addition, a new hybrid scheme, based on longest queue-first (as input arbitration) and longest column occupancy first (as output arbitration) is proposed, which achieves 100% throughput under uniform and non-uniform traffic patterns. In order to give a better insight of the feedback nature of arbitration scheme for CICB switches, a frame-based round-robin arbitration scheme with explicit feedback control (FRE) is introduced. FRE dynamically sets the frame size according to the input load and to the accumulation of cells in a VOQ. FRE is used as the input arbitration scheme and it is combined with RR, PRR, and FRE as output arbitration schemes. These combined schemes deliver high performance under uniform and nonuniform traffic models using a buffered crossbar with one-cell crosspoint buffers. The novelty of FRE lies in that each VOQ sets the frame size by an adjustable parameter, Δ(i,j) which indicates the degree of service needed by VOQ(i, j). This value is adjusted according to the input loading and the accumulation of cells experienced in previous service cycles. This dissertation also explores an analysis technique based on feedback control theory. This methodology is proposed to study the stability of arbitration and matching schemes for packet switches. A continuous system is used and a control model is used to emulate a queuing system. The technique is applied to a matching scheme. In addition, the study shows that the dwell time, which is defined as the time a queue receives service in a service opportunity, is a factor that affects the stability of a queuing system. This feedback control model is an alternative approach to evaluate the stability of arbitration and matching schemes

    Architecture design and performance analysis of practical buffered-crossbar packet switches

    Get PDF
    Combined input crosspoint buffered (CICB) packet switches were introduced to relax inputoutput arbitration timing and provide high throughput under admissible traffic. However, the amount of memory required in the crossbar of an N x N switch is N2x k x L, where k is the crosspoint buffer size and needs to be of size RTT in cells, L is the packet size. RTT is the round-trip time which is defined by the distance between line cards and switch fabric. When the switch size is large or RTT is not negligible, the memory amount required makes the implementation costly or infeasible for buffered crossbar switches. To reduce the required memory amount, a family of shared memory combined-input crosspoint-buffered (SMCB) packet switches, where the crosspoint buffers are shared among inputs, are introduced in this thesis. One of the proposed switches uses a memory speedup of in and dynamic memory allocation, and the other switch avoids speedup by arbitrating the access of inputs to the crosspoint buffers. These two switches reduce the required memory of the buffered crossbar by 50% or more and achieve equivalent throughput under independent and identical traffic with uniform distributions when using random selections. The proposed mSMCB switch is extended to support differentiated services and long RTT. To support P traffic classes with different priorities, CICB switches have been reported to use N2x k x L x P amount of memory to avoid blocking of high priority cells.The proposed SMCB switch with support for differentiated services requires 1/mP of the memory amount in the buffered crossbar and achieves similar throughput performance to that of a CICB switch with similar priority management, while using no speedup in the shared memory. The throughput performance of SMCB switch with crosspoint buffers shared by inputs (I-SMCB) is studied under multicast traffic. An output-based shared-memory crosspoint buffered (O-SMCB) packet switch is proposed where the crosspoint buffers are shared by two outputs and use no speedup. The proposed O-SMCB switch provides high performance under admissible uniform and nonuniform multicast traffic models while using 50% of the memory used in CICB switches. Furthermore, the O-SMCB switch provides higher throughput than the I-SMCB switch. As SMCB switches can efficiently support an RTT twice as long as that supported by CICB switches and as the performance of SMCB switches is bounded by a matching between inputs and crosspoint buffers, a new family of CICB switches with flexible access to crosspoint buffers are proposed to support longer RTTs than SMCB switches and to provide higher throughput under a wide variety of admissible traffic models. The CICB switches with flexible access allow an input to use any available crosspoint buffer at a given output. The proposed switches reduce the required crosspoint buffer size by a factor of N , keep the service of cells in sequence, and use no speedup. This new class of switches achieve higher throughput performance than CICB switches under a large variety of traffic models, while supporting long RTTs. Crosspoint buffered switches that are implemented in single chips have limited scalability. To support a large number of ports in crosspoint buffered switches, memory-memory-memory (MMM) Clos-network switches are an alternative. The MMM switches that use minimum memory amount at the central module is studied. Although, this switch can provide a moderate throughput, MMM switch may serve cells out of sequence. As keeping cells in sequence in an MMM switch may require buffers be distributed per flow, an MMM with extended memory in the switch modules is studied. To solve the out of sequence problem in MMM switches, a queuing architecture is proposed for an MMM switch. The service of cells in sequence is analyzed

    Multicast scheduling in feedback-based two-stage switch

    Get PDF
    Proceedings of the IEEE Workshop on High Performance Switching and Routing, 2009, p. 28-33Scalability is of paramount importance in high-speed switch design. Two limiting factors are the complexity of switch fabric and the need for a sophisticated central scheduler. In this paper, we focus on designing a scalable multicast switch. Given the fact that the majority traffic on the Internet is unicast, a cost-effective solution is to adopt a unicast switch fabric for handling both unicast and multicast traffic. Unlike existing approaches, we choose to base our multicast switch design on the load-balanced two-stage switch architecture because it does not require a central scheduler, and its unicast switch fabric only needs to realize N switch configurations. Specifically, we adopt the feedback-based two-stage switch architecture [10], because it elegantly solves the notorious packet mis-sequencing problem, and yet renders an excellent throughput-delay performance. By slightly modifying the operation of the original feedback-based two-stage switch, a simple distributed multicast scheduling algorithm is proposed. Simulation results show that with packet duplication at both input ports and middle-stage ports, the proposed multicast scheduling algorithm significantly cuts down the average packet delay and delay variation among different copies of the same multicast packet. Keywords-Feedback-based two-stage switch, scalable multicast switch, load-balanced switch. © 2009 IEEE.published_or_final_versio

    Size-Based Flow Scheduling in a CICQ Switch

    Get PDF
    In the context of flow-aware networking, size-based (SB) scheduling policies have been shown to improve response times of small flows, without degrading the performance of large flows. But these differentiating policies are designed for Output-queued (OQ) switch architecture, which is known to have scalability issues. On the other hand, the buffered-crossbar (BX) switch architecture is currently being pursued as a potential next-generation scalable switch architecture. This work looks into the problem of performing SB scheduling in BX switches. In particular, the design goals, with respect to each output port, are (i) to transmit high-priority packet(s) as long as there is at least one present, and (ii) to respect the FIFO order among high-priority packets. In this direction, we propose a CICQ switches using a single PIFO queue at each crosspoint to schedule packets according to the priority assigned. pCICQ-1 switch uses a simple design to guarantee that packet-priorities are respected once they are in the crosspoint queues. But it does not maintain the FIFO order of high-priority packets, besides letting a bounded number low-priority packets to depart through an output, when there are one or more high-priority packets for the same output. To solve this, we propose an enhancement in pCICQ-2 switch, that uses a sequence controller to respect packet-priorities as well as arrival order for high-priority packets

    Strong Performance Guarantees for Asynchronous Buffered Crossbar Schedulers

    Get PDF
    Crossbar-based switches are commonly used to implement routers with throughputs up to about 1 Tb/s. The advent of crossbar scheduling algorithms that provide strong performance guarantees now makes it possible to engineer systems that perform well, even under extreme traffic conditions. Until recently, such performance guarantees have only been developed for crossbars that switch cells rather than variable length packets. Cell-based crossbars incur a worst-case bandwidth penalty of up to a factor of two, since they must fragment variable length packets into fixed length cells. In addition, schedulers for cell-based crossbars may fail to deliver the expected performance guarantees when used in routers that forward packets. We show how to obtain performance guarantees for asynchronous crossbars that are directly comparable to those previously developed for synchronous, cell-based crossbars. In particular we define derivatives of the Group by Virtual Output Queue (GVOQ) scheduler of Chuang et al. and the Least Occupied Output First Scheduler of Krishna et al. and show that both can provide strong performance guarantees in systems with speedup 2. Specifically, we show that these schedulers are work-conserving and that they can emulate an output-queued switch using any queueing discipline in the class of restricted Push-In, First-Out queueing disciplines. We also show that there are schedulers for segment-based crossbars, (introduced recently by Katevenis and Passas) that can deliver strong performance guarantees with small buffer requirements and no bandwidth fragmentation
    • …
    corecore