199 research outputs found

    Strong Performance Guarantees for Asynchronous Buffered Crossbar Schedulers

    Crossbar-based switches are commonly used to implement routers with throughputs up to about 1 Tb/s. The advent of crossbar scheduling algorithms that provide strong performance guarantees now makes it possible to engineer systems that perform well, even under extreme traffic conditions. Until recently, such performance guarantees have only been developed for crossbars that switch cells rather than variable-length packets. Cell-based crossbars incur a worst-case bandwidth penalty of up to a factor of two, since they must fragment variable-length packets into fixed-length cells. In addition, schedulers for cell-based crossbars may fail to deliver the expected performance guarantees when used in routers that forward packets. We show how to obtain performance guarantees for asynchronous crossbars that are directly comparable to those previously developed for synchronous, cell-based crossbars. In particular, we define derivatives of the Group by Virtual Output Queue (GVOQ) scheduler of Chuang et al. and the Least Occupied Output First Scheduler of Krishna et al. and show that both can provide strong performance guarantees in systems with speedup 2. Specifically, we show that these schedulers are work-conserving and that they can emulate an output-queued switch using any queueing discipline in the class of restricted Push-In, First-Out queueing disciplines. We also show that there are schedulers for segment-based crossbars (introduced recently by Katevenis and Passas) that can deliver strong performance guarantees with small buffer requirements and no bandwidth fragmentation.
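
The factor-of-two fragmentation penalty mentioned in this abstract can be illustrated with a small calculation. The sketch below is not from the paper: the function name `cell_efficiency` and the 64-byte cell size are illustrative assumptions, and it assumes that no packet is shorter than one cell.

```python
def cell_efficiency(packet_bytes: int, cell_bytes: int) -> float:
    """Fraction of crossbar bandwidth that carries packet data when a
    packet is fragmented into fixed-length cells (last cell is padded).

    Hypothetical helper for illustration; assumes packet_bytes >= cell_bytes.
    """
    # Integer ceiling division: number of cells needed for the packet.
    cells = -(-packet_bytes // cell_bytes)
    return packet_bytes / (cells * cell_bytes)


if __name__ == "__main__":
    cell = 64  # assumed cell size in bytes
    for size in (64, 65, 128, 129, 1500):
        print(size, round(cell_efficiency(size, cell), 3))
```

A packet one byte longer than a cell is the worst case: it occupies two cells, so with 64-byte cells a 65-byte packet uses 65/128, or about 51%, of the bandwidth it consumes. As the cell size grows, this ratio approaches one half, which is the factor-of-two penalty.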

    Network Coding in a Multicast Switch

    We consider the problem of serving multicast flows in a crossbar switch. We show that linear network coding across packets of a flow can sustain traffic patterns that cannot be served if network coding is not allowed. Thus, network coding leads to a larger rate region in a multicast crossbar switch. We demonstrate a traffic pattern that requires a switch speedup if coding is not allowed, whereas with coding, the speedup requirement is eliminated completely. In addition to throughput benefits, coding simplifies the characterization of the rate region. We give a graph-theoretic characterization of the rate region with fanout splitting and intra-flow coding, in terms of the stable set polytope of the 'enhanced conflict graph' of the traffic pattern. Such a formulation is not known in the case of fanout splitting without coding. We show that computing the offline schedule (i.e. using prior knowledge of the flow arrival rates) can be reduced to certain graph coloring problems. Finally, we propose online algorithms (i.e. using only the current queue occupancy information) for multicast scheduling based on our graph-theoretic formulation. In particular, we show that a maximum weighted stable set algorithm stabilizes the queues for all rates within the rate region. Comment: 9 pages, submitted to IEEE INFOCOM 200

    Scheduling algorithms for high-speed switches

    The virtual output queued (VOQ) switching architecture was adopted for high-speed switch implementation owing to its scalability and high throughput. An ideal VOQ algorithm should provide Quality of Service (QoS) with low complexity. However, none of the existing algorithms can meet these requirements. Several algorithms for VOQ switches are introduced in this dissertation in order to improve upon existing algorithms in terms of implementation or QoS features. Initially, the earliest due date first matching (EDDFM) algorithm, which is stable for both uniform and non-uniform traffic patterns, is proposed. EDDFM has a lower probability of cell overdue than other existing maximum weight matching algorithms. Then, the shadow departure time algorithm (SDTA) and iterative SDTA (ISDTA) are introduced. The QoS features of SDTA and ISDTA are better than those of other existing algorithms with the same computational complexity. Simulations show that the performance of a VOQ switch using ISDTA with a speedup of 1.5 is similar to that of an output queued (OQ) switch in terms of cell delay and throughput. Later, the enhanced Birkhoff-von Neumann decomposition (EBVND) algorithm, which is based on the Birkhoff-von Neumann decomposition (BVND) algorithm and can provide rate and cell delay guarantees, is introduced. Theoretical analysis shows that the performance of EBVND is better than that of BVND in terms of throughput and cell delay. Finally, the maximum credit first (MCF), the enhanced MCF (EMCF), and the iterative MCF (IMCF) algorithms are presented. These new algorithms perform similarly to BVND, yet are easier to implement in practice.
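
For readers unfamiliar with the Birkhoff-von Neumann decomposition that EBVND builds on, the following is a minimal sketch of the basic textbook procedure, not the dissertation's EBVND or its implementation. It assumes the rate matrix is doubly stochastic (every row and column sums to 1) and is given as exact fractions; the names `perfect_matching` and `bvn_decompose` are illustrative.

```python
from fractions import Fraction


def perfect_matching(support):
    """Augmenting-path bipartite matching (Kuhn's algorithm).

    support[i][j] is True if input i may be connected to output j.
    Returns perm with perm[i] = j, or None if no perfect matching exists.
    """
    n = len(support)
    owner = [-1] * n  # owner[j] = input currently matched to output j

    def try_assign(i, seen):
        for j in range(n):
            if support[i][j] and not seen[j]:
                seen[j] = True
                if owner[j] == -1 or try_assign(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    for i in range(n):
        if not try_assign(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(owner):
        perm[i] = j
    return perm


def bvn_decompose(rates):
    """Split a doubly stochastic matrix into (weight, permutation) terms.

    Each permutation is a crossbar configuration; its weight is the
    fraction of time slots in which that configuration is used.
    """
    n = len(rates)
    m = [[Fraction(x) for x in row] for row in rates]
    terms = []
    while any(x > 0 for row in m for x in row):
        perm = perfect_matching([[x > 0 for x in row] for row in m])
        if perm is None:
            raise ValueError("matrix is not doubly stochastic")
        # Largest weight that keeps every entry non-negative.
        w = min(m[i][perm[i]] for i in range(n))
        for i in range(n):
            m[i][perm[i]] -= w
        terms.append((w, perm))
    return terms


if __name__ == "__main__":
    h, q = Fraction(1, 2), Fraction(1, 4)
    for weight, perm in bvn_decompose([[h, q, q], [q, h, q], [q, q, h]]):
        print(weight, perm)
```

Each pass removes at least one positive entry, so the loop terminates. Exact fractions are used here because floating-point rates may not sum to exactly 1, which would break the perfect-matching step.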

    A Multi-Stage Packet-Switch Based on NoC Fabrics for Data Center Networks

    Bandwidth-hungry applications such as cloud computing, video sharing and social networking drive the creation of more powerful Data Centers (DCs) to manage the large amount of packetized traffic. Data center network (DCN) topologies rely on thousands of servers that exchange data via the switching backbone. Cluster switches and routers are employed to provide interconnectivity both within a DC and between DCs, and must be able to handle continuously variable loads. Hence, robust and scalable switching modules are needed. Conventional DCN switches adopt crossbars and/or blocks of memories in multistage interconnection architectures (commonly 2-tier or 3-tier). However, current multistage packet switch architectures, with their space-memory variants, are either too complex to implement, perform poorly, or are not cost effective. In this paper, we propose a novel and highly scalable multistage packet-switch design based on Networks-on-Chip (NoC) fabrics for DCNs. In particular, we describe a novel three-stage packet-switch fabric with a round-robin packet dispatching scheme where each central stage module is based on a Unidirectional NoC (UDN), instead of a conventional single-hop crossbar fabric. The proposed design, referred to as Clos-UDN, overcomes all the shortcomings of conventional multistage architectures. In particular, as we shall demonstrate, the proposed Clos-UDN architecture: (i) Obviates the need for complex and costly input modules, by means of a few simple input FIFO queues. (ii) Avoids the need for a complex and synchronized scheduling process over a high number of input-output modules and/or port pairs. (iii) Provides speedup, load balancing and path diversity thanks to a dynamic dispatching scheme as well as the nature of the NoC-based fabric. Extensive simulation studies are conducted to compare the proposed Clos-UDN switch to conventional multistage switches. Simulation results show that the Clos-UDN outperforms conventional designs under a wide range of input traffic scenarios, making it highly appealing for ultra-high capacity DC networks.

    Design and stability analysis of high performance packet switches

    With the rapid development of optical interconnection technology, high-performance packet switches are required to resolve contentions quickly to satisfy the demand for high throughput and high speed rates. Combined input-crosspoint buffered (CICB) switches are an alternative to input-buffered (IB) packet switches to provide high-performance switching and to relax arbitration timing for packet switches with high-speed ports. A maximum weight matching (MWM) scheme can provide 100% throughput under admissible traffic for IB switches. However, the high complexity of MWM prohibits its implementation in high-speed switches. In this dissertation, a feedback-based arbitration scheme for CICB switches is studied, where cell selection is based on the service provided to virtual output queues (VOQs). The feedback-based scheme is named round-robin with adaptable frame size (RR-AF) arbitration. The frame size in RR-AF is adapted according to the serviced and unserviced traffic. If a switch is stable, the switch provides 100% throughput. Here, it is proved that RR-AF can achieve 100% throughput under uniform admissible traffic. Switches with crosspoint buffers need to consider the transmission delays, or round-trip times, to define the crosspoint buffer size. As the buffered crossbar switch can be physically located far from the input ports, actual round-trip times can be non-negligible. To support non-negligible round-trip times in a buffered crossbar switch, the crosspoint buffer size needs to be increased. To satisfy this demand, this dissertation investigates how to select the crosspoint buffer size under non-negligible round-trip times and under uniform traffic. With the analysis of stability margin, the relationship between the crosspoint buffer size and round-trip time is derived.
    Considering that CICB switches deliver higher performance than IB switches and require no speedup, this dissertation investigates the maximum throughput performance that these switches can achieve. It is shown, through a fluid model, that CICB switches without speedup achieve 100% throughput under any admissible traffic. In addition, a new hybrid scheme, based on longest queue-first (as input arbitration) and longest column occupancy first (as output arbitration), is proposed, which achieves 100% throughput under uniform and non-uniform traffic patterns. In order to give better insight into the feedback nature of arbitration schemes for CICB switches, a frame-based round-robin arbitration scheme with explicit feedback control (FRE) is introduced. FRE dynamically sets the frame size according to the input load and to the accumulation of cells in a VOQ. FRE is used as the input arbitration scheme and it is combined with RR, PRR, and FRE as output arbitration schemes. These combined schemes deliver high performance under uniform and non-uniform traffic models using a buffered crossbar with one-cell crosspoint buffers. The novelty of FRE lies in that each VOQ sets the frame size by an adjustable parameter, Δ(i,j), which indicates the degree of service needed by VOQ(i, j). This value is adjusted according to the input loading and the accumulation of cells experienced in previous service cycles. This dissertation also explores an analysis technique based on feedback control theory. This methodology is proposed to study the stability of arbitration and matching schemes for packet switches. A continuous system is adopted, and a control model is used to emulate a queuing system. The technique is applied to a matching scheme. In addition, the study shows that the dwell time, which is defined as the time a queue receives service in a service opportunity, is a factor that affects the stability of a queuing system. This feedback control model is an alternative approach to evaluate the stability of arbitration and matching schemes.

    On the Stability of Isolated and Interconnected Input-Queued Switches under Multiclass Traffic

    In this correspondence, we discuss the stability of scheduling algorithms for input-queueing (IQ) and combined input/output queueing (CIOQ) packet switches. First, we show that a wide class of IQ schedulers operating on multiple traffic classes can achieve 100% throughput. Then, we address the problem of the maximum throughput achievable in a network of interconnected IQ switches and CIOQ switches loaded by multiclass traffic, and we devise some simple scheduling policies that guarantee 100% throughput. Both the Lyapunov function methodology and the fluid modeling approach are used to obtain our results.

    Providing flow based performance guarantees for buffered crossbar switches

    Buffered crossbar switches are a special type of combined input-output queued switches with each crosspoint of the crossbar having small on-chip buffers. The introduction of crosspoint buffers greatly simplifies the scheduling process of buffered crossbar switches, and furthermore enables buffered crossbar switches with speedup of two to easily provide port based performance guarantees. However, recent research results have indicated that, in order to provide flow based performance guarantees, buffered crossbar switches have to either increase the speedup of the crossbar to three or greatly increase the total number of crosspoint buffers, both adding significant hardware complexity. In this paper, we present scheduling algorithms for buffered crossbar switches to achieve flow based performance guarantees with speedup of two and with only one or two buffers at each crosspoint. When there is no crosspoint blocking in a specific time slot, only the simple and distributed input scheduling and output scheduling are necessary. Otherwise, the special urgent matching is introduced to guarantee the on-time delivery of crosspoint blocked cells. With the proposed algorithms, buffered crossbar switches can provide flow based performance guarantees by emulating push-in-first-out output queued switches, and we use the counting method to formally prove the perfect emulation. For the special urgent matching, we present sequential and parallel matching algorithms. Both algorithms converge with N iterations in the worst case, and the latter needs fewer iterations in the average case. Finally, we discuss an alternative backup-buffer implementation scheme to the bypass path, and compare our algorithms with existing algorithms in the literature.