1,263 research outputs found

    On scheduling input queued cell switches

    Get PDF
    Output-queued switching, though is able to offer high throughput, guaranteed delay and fairness, lacks scalability owing to the speed up problem. Input-queued switching, on the other hand, is scalable, and is thus becoming an attractive alternative. This dissertation presents three approaches toward resolving the major problem encountered in input-queued switching that has prohibited the provision of quality of service guarantees. First, we proposed a maximum size matching based algorithm, referred to as min-max fair input queueing (MFIQ), which minimizes the additional delay caused by back pressure, and at the same time provides fair service among competing sessions. Like any maximum size matching algorithm, MFIQ performs well for uniform traffic, in which the destinations of the incoming cells are uniformly distributed over all the outputs, but is not stable for non-uniform traffic. Subse-quently, we proposed two maximum weight matching based algorithms, longest normalized queue first (LNQF) and earliest due date first matching (EDDFM), which are stable for both uniform and non-uniform traffic. LNQF provides fairer service than longest queue first (LQF) and better traffic shaping than oldest cell first (OCF), and EDDEM has lower probability of delay overdue than LQF, LNQF, and OCF. Our third approach, referred to as store-sort-and-forward (SSF), is a frame based scheduling algorithm. SSF is proved to be able to achieve strict sense 100% throughput, and provide bounded delay and delay jitter for input-queued switches if the traffic conforms to the (r, T) model

    On packet switch design

    Get PDF

    Strong Performance Guarantees for Asynchronous Buffered Crossbar Schedulers

    Get PDF
    Crossbar-based switches are commonly used to implement routers with throughputs up to about 1 Tb/s. The advent of crossbar scheduling algorithms that provide strong performance guarantees now makes it possible to engineer systems that perform well, even under extreme traffic conditions. Until recently, such performance guarantees have only been developed for crossbars that switch cells rather than variable length packets. Cell-based crossbars incur a worst-case bandwidth penalty of up to a factor of two, since they must fragment variable length packets into fixed length cells. In addition, schedulers for cell-based crossbars may fail to deliver the expected performance guarantees when used in routers that forward packets. We show how to obtain performance guarantees for asynchronous crossbars that are directly comparable to those previously developed for synchronous, cell-based crossbars. In particular we define derivatives of the Group by Virtual Output Queue (GVOQ) scheduler of Chuang et al. and the Least Occupied Output First Scheduler of Krishna et al. and show that both can provide strong performance guarantees in systems with speedup 2. Specifically, we show that these schedulers are work-conserving and that they can emulate an output-queued switch using any queueing discipline in the class of restricted Push-In, First-Out queueing disciplines. We also show that there are schedulers for segment-based crossbars, (introduced recently by Katevenis and Passas) that can deliver strong performance guarantees with small buffer requirements and no bandwidth fragmentation

    1 Asynchronous vs Synchronous Input-Queued Switches

    Get PDF
    Abstract—Input-queued (IQ) switches are one of the reference architectures for the design of high-speed packet switches. Classical results in this field refer to the scenario in which the whole switch transfers the packets in a synchronous fashion, in phase with a sequence of fixedsize timeslots, tailored to transport a minimum-size packet. However, for switches with large number of ports and high bandwidth, maintaining an accurate global synchronization and transferring all the packets in a synchronous fashion is becoming more and more challenging. Furthermore, variable size packets (as in the traffic present in the Internet) require rather complex segmentation and reassembly processes and some switching capacity is wasted due to partial filling of timeslots. Thus, in this work we consider a switch able to natively transfer packets in an asynchronous fashion thanks to a simple and distributed packet scheduler. We investigate the performance of asynchronous IQ switches and show that, despite their simplicity, their performance is comparable or even better than those of synchronous switches. These results highlight the great potential of the asynchronous approach for the design of high-performance switches.

    An occam Style Communications System for UNIX Networks

    Get PDF
    This document describes the design of a communications system which provides occam style communications primitives under a Unix environment, using TCP/IP protocols, and any number of other protocols deemed suitable as underlying transport layers. The system will integrate with a low overhead scheduler/kernel without incurring significant costs to the execution of processes within the run time environment. A survey of relevant occam and occam3 features and related research is followed by a look at the Unix and TCP/IP facilities which determine our working constraints, and a description of the T9000 transputer's Virtual Channel Processor, which was instrumental in our formulation. Drawing from the information presented here, a design for the communications system is subsequently proposed. Finally, a preliminary investigation of methods for lightweight access control to shared resources in an environment which does not provide support for critical sections, semaphores, or busy waiting, is made. This is presented with relevance to mutual exclusion problems which arise within the proposed design. Future directions for the evolution of this project are discussed in conclusion

    Scheduling algorithms for high-speed switches

    Get PDF
    The virtual output queued (VOQ) switching architecture was adopted for high speed switch implementation owing to its scalability and high throughput. An ideal VOQ algorithm should provide Quality of Service (QoS) with low complexity. However, none of the existing algorithms can meet these requirements. Several algorithms for VOQ switches are introduced in this dissertation in order to improve upon existing algorithms in terms of implementation or QoS features. Initially, the earliest due date first matching (EDDFM) algorithm, which is stable for both uniform and non-uniform traffic patterns, is proposed. EDDFM has lower probability of cell overdue than other existing maximum weight matching algorithms. Then, the shadow departure time algorithm (SDTA) and iterative SDTA (ISDTA) are introduced. The QoS features of SDTA and ISDTA are better than other existing algorithms with the same computational complexity. Simulations show that the performance of a VOQ switch using ISDTA with a speedup of 1.5 is similar to that of an output queued (OQ) switch in terms of cell delay and throughput. Later, the enhanced Birkhoff-von Neumann decomposition (EBVND) algorithm based on the Birkhoff-von Neumann decomposition (BVND) algorithm, which can provide rate and cell delay guarantees, is introduced. Theoretical analysis shows that the performance of EBVND is better than BVND in terms of throughput and cell delay. Finally, the maximum credit first (MCF), the Enhanced MCF (EMCF), and the iterative MCF (IMCF) algorithms are presented. These new algorithms have the similar performance as BNVD, yet are easier to implement in practice

    Design and stability analysis of high performance packet switches

    Get PDF
    With the rapid development of optical interconnection technology, high-performance packet switches are required to resolve contentions in a fast manner to satisfy the demand for high throughput and high speed rates. Combined input-crosspoint buffered (CICB) switches are an alternative to input-buffered (IB) packet switches to provide high-performance switching and to relax arbitration timing for packet switches with high-speed ports. A maximum weight matching (MWM) scheme can provide 100% throughput under admissible traffic for lB switches. However, the high complexity of MWM prohibits its implementation in high-speed switches. In this dissertation, a feedback-based arbitration scheme for CICB switches is studied, where cell selection is based on the provided service to virtual output queues (VOQs). The feedback-based scheme is named round-robin with adaptable frame size (RR-AF) arbitration. The frame size in RR-AF is adaptably changed by the serviced and unserviced traffic. If a switch is stable, the switch provides 100% throughput. Here, it is proved that RR-AF can achieve 100% throughput under uniform admissible traffic. Switches with crosspoint buffers need to consider the transmission delays, or round-trip times to define the crosspoint buffer size. As the buffered crossbar switch can be physically located far from the input ports, actual round-trip times can be non-negligible. To support non-negligible round-trip times in a buffered crossbar switch, the crosspoint buffer size needs to be increased. To satisfy this demand, this dissertation investigates how to select the crosspoint buffer size under non-negligible round trip times and under uniform traffic. With the analysis of stability margin, the relationship between the crosspoint buffer size and round-trip time is derived. Considering that CICB switches deliver higher performance than lB switches and require no speedup, this dissertation investigates the maximum throughput performance that these switches can achieve. It is shown that CICB switches without speedup achieve 100% throughput under any admissible traffic through a fluid model. In addition, a new hybrid scheme, based on longest queue-first (as input arbitration) and longest column occupancy first (as output arbitration) is proposed, which achieves 100% throughput under uniform and non-uniform traffic patterns. In order to give a better insight of the feedback nature of arbitration scheme for CICB switches, a frame-based round-robin arbitration scheme with explicit feedback control (FRE) is introduced. FRE dynamically sets the frame size according to the input load and to the accumulation of cells in a VOQ. FRE is used as the input arbitration scheme and it is combined with RR, PRR, and FRE as output arbitration schemes. These combined schemes deliver high performance under uniform and nonuniform traffic models using a buffered crossbar with one-cell crosspoint buffers. The novelty of FRE lies in that each VOQ sets the frame size by an adjustable parameter, Δ(i,j) which indicates the degree of service needed by VOQ(i, j). This value is adjusted according to the input loading and the accumulation of cells experienced in previous service cycles. This dissertation also explores an analysis technique based on feedback control theory. This methodology is proposed to study the stability of arbitration and matching schemes for packet switches. A continuous system is used and a control model is used to emulate a queuing system. The technique is applied to a matching scheme. In addition, the study shows that the dwell time, which is defined as the time a queue receives service in a service opportunity, is a factor that affects the stability of a queuing system. This feedback control model is an alternative approach to evaluate the stability of arbitration and matching schemes

    Characterization of the Burst Stabilization Protocol for the RR/RR CICQ Switch

    Full text link
    Input buffered switches with Virtual Output Queueing (VOQ) can be unstable when presented with unbalanced loads. Existing scheduling algorithms, including iSLIP for Input Queued (IQ) switches and Round Robin (RR) for Combined Input and Crossbar Queued (CICQ) switches, exhibit instability for some schedulable loads. We investigate the use of a queue length threshold and bursting mechanism to achieve stability without requiring internal speed-up. An analytical model is developed to prove that the burst stabilization protocol achieves stability and to predict the minimum burst value needed as a function of offered load. The analytical model is shown to have very good agreement with simulation results. These results show the advantage of the RR/RR CICQ switch as a contender for the next generation of high-speed switches.Comment: Presented at the 28th Annual IEEE Conference on Local Computer Networks (LCN), Bonn/Konigswinter, Germany, Oct 20-24, 200

    Configurable data center switch architectures

    Get PDF
    In this thesis, we explore alternative architectures for implementing con_gurable Data Center Switches along with the advantages that can be provided by such switches. Our first contribution centers around determining switch architectures that can be implemented on Field Programmable Gate Array (FPGA) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks to realistically evaluate the performance of switch architectures in data centers and contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of currently existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small-medium range switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation when compared to an ASIC implementation. In addition to the hardware prototype, we contribute a light weight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate the ability of our novel switch architecture and light weight congestion control protocol to both accelerate the training time of machine learning jobs by up to 1.34x and benefit other latency-sensitive applications by reducing their 99%-tile completion time by up to 4.5x. As for our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures.Open Acces
    • …
    corecore