291 research outputs found

    A Multi-Stage Packet-Switch Based on NoC Fabrics for Data Center Networks

    Get PDF
    Bandwidth-hungry applications such as Cloud computing, video sharing and social networking drive the creation of more powerful Data Centers (DCs) to manage the large amount of packetized traffic. Data center network (DCN) topologies rely on thousands of servers that exchange data via the switching backbone. Cluster switches and routers are employed to provide interconnectivity between elements of the same DC and inter DCs and must be able to handle the continuously variable loads. Hence, robust and scalable switching modules are needed. Conventional DCN switches adopt crossbars or/and blocks of memories in multistage interconnection architectures (commonly 2-Tiers or 3-Tiers). However, current multistage packet switch architectures, with their space-memory variants, are either too complex to implement, have poor performance, or not cost effective. In this paper, we propose a novel and highly scalable multistage packet-switch design based on Networks-on-Chip (NoC) fabrics for DCNs. In particular, we describe a novel three-stage packet-switch fabric with a Round-Robin packets dispatching scheme where each central stage module is based on a Unidirectional NoC (UDN), instead of a conventional single hop crossbar fabric. The proposed design, referred to as Clos- UDN, overcomes all the shortcomings of conventional multistage architectures. In particular, as we shall demonstrate, the proposed Clos-UDN architecture: (i) Obviates the need for a complex and costly input modules, by means of few, yet simple, input FIFO queues. (ii) Avoids the need for a complex and synchronized scheduling process over a high number of input-output modules and/or port pairs. (iii) Provides speedup, load balancing and path-diversity thanks to a dynamic dispatching scheme as well as the NoC based fabric nature. Extensive simulation studies are conducted to compare the proposed Clos-UDN switch to conventional multistage switches. Simulation results show that the Clos-UDN outperforms conventional design under a wide range of input traffic scenarios, making it highly appealing for ultra-high capacity DC networks

    Efficient queue-balancing switch for FPGAs

    Get PDF
    This paper presents a novel FPGA-based switch design that achieves high algorithmic performance and an efficient FPGA implementation. Crossbar switches based on virtual output queues (VOQs) and variations have been rather popular for implementing switches on FPGAs, with applications to network-on-chip (NoC) routers and network switches. The efficiency of VOQs is well-documented on ASICs, though we show that their disadvantages can outweigh their advantages on FPGAs. Our proposed design uses an output-queued switch internally for simplifying scheduling, and a queue balancing technique to avoid queue fragmentation and reduce the need for memory-sharing VOQs. Our implementation approaches the scheduling performance of the state-of-the-art, while requiring considerably fewer FPGA resources

    Architecture design and performance analysis of practical buffered-crossbar packet switches

    Get PDF
    Combined input crosspoint buffered (CICB) packet switches were introduced to relax inputoutput arbitration timing and provide high throughput under admissible traffic. However, the amount of memory required in the crossbar of an N x N switch is N2x k x L, where k is the crosspoint buffer size and needs to be of size RTT in cells, L is the packet size. RTT is the round-trip time which is defined by the distance between line cards and switch fabric. When the switch size is large or RTT is not negligible, the memory amount required makes the implementation costly or infeasible for buffered crossbar switches. To reduce the required memory amount, a family of shared memory combined-input crosspoint-buffered (SMCB) packet switches, where the crosspoint buffers are shared among inputs, are introduced in this thesis. One of the proposed switches uses a memory speedup of in and dynamic memory allocation, and the other switch avoids speedup by arbitrating the access of inputs to the crosspoint buffers. These two switches reduce the required memory of the buffered crossbar by 50% or more and achieve equivalent throughput under independent and identical traffic with uniform distributions when using random selections. The proposed mSMCB switch is extended to support differentiated services and long RTT. To support P traffic classes with different priorities, CICB switches have been reported to use N2x k x L x P amount of memory to avoid blocking of high priority cells.The proposed SMCB switch with support for differentiated services requires 1/mP of the memory amount in the buffered crossbar and achieves similar throughput performance to that of a CICB switch with similar priority management, while using no speedup in the shared memory. The throughput performance of SMCB switch with crosspoint buffers shared by inputs (I-SMCB) is studied under multicast traffic. An output-based shared-memory crosspoint buffered (O-SMCB) packet switch is proposed where the crosspoint buffers are shared by two outputs and use no speedup. The proposed O-SMCB switch provides high performance under admissible uniform and nonuniform multicast traffic models while using 50% of the memory used in CICB switches. Furthermore, the O-SMCB switch provides higher throughput than the I-SMCB switch. As SMCB switches can efficiently support an RTT twice as long as that supported by CICB switches and as the performance of SMCB switches is bounded by a matching between inputs and crosspoint buffers, a new family of CICB switches with flexible access to crosspoint buffers are proposed to support longer RTTs than SMCB switches and to provide higher throughput under a wide variety of admissible traffic models. The CICB switches with flexible access allow an input to use any available crosspoint buffer at a given output. The proposed switches reduce the required crosspoint buffer size by a factor of N , keep the service of cells in sequence, and use no speedup. This new class of switches achieve higher throughput performance than CICB switches under a large variety of traffic models, while supporting long RTTs. Crosspoint buffered switches that are implemented in single chips have limited scalability. To support a large number of ports in crosspoint buffered switches, memory-memory-memory (MMM) Clos-network switches are an alternative. The MMM switches that use minimum memory amount at the central module is studied. Although, this switch can provide a moderate throughput, MMM switch may serve cells out of sequence. As keeping cells in sequence in an MMM switch may require buffers be distributed per flow, an MMM with extended memory in the switch modules is studied. To solve the out of sequence problem in MMM switches, a queuing architecture is proposed for an MMM switch. The service of cells in sequence is analyzed

    Scheduling in switches with small internal buffers

    Full text link

    Design and analysis of a scalable terabit multicast packet switch : architecture and scheduling algorithms

    Get PDF
    Internet growth and success not only open a primary route of information exchange for millions of people around the world, but also create unprecedented demand for core network capacity. Existing switches/routers, due to the bottleneck from either switch architecture or arbitration complexity, can reach a capacity on the order of gigabits per second, but few of them are scalable to large capacity of terabits per second. In this dissertation, we propose three novel switch architectures with cooperated scheduling algorithms to design a terabit backbone switch/router which is able to deliver large capacity, multicasting, and high performance along with Quality of Service (QoS). Our switch designs benefit from unique features of modular switch architecture and distributed resource allocation scheme. Switch I is a unique and modular design characterized by input and output link sharing. Link sharing resolves output contention and eliminates speedup requirement for central switch fabric. Hence, the switch architecture is scalable to any large size. We propose a distributed round robin (RR) scheduling algorithm which provides fairness and has very low arbitration complexity. Switch I can achieve good performance under uniform traffic. However, Switch I does not perform well for non-uniform traffic. Switch II, as a modified switch design, employs link sharing as well as a token ring to pursue a solution to overcome the drawback of Switch 1. We propose a round robin prioritized link reservation (RR+POLR) algorithm which results in an improved performance especially under non-uniform traffic. However, RR+POLR algorithm is not flexible enough to adapt to the input traffic. In Switch II, the link reservation rate has a great impact on switch performance. Finally, Switch III is proposed as an enhanced switch design using link sharing and dual round robin rings. Packet forwarding is based on link reservation. We propose a queue occupancy based dynamic link reservation (QOBDLR) algorithm which can adapt to the input traffic to provide a fast and fair link resource allocation. QOBDLR algorithm is a distributed resource allocation scheme in the sense that dynamic link reservation is carried out according to local available information. Arbitration complexity is very low. Compared to the output queued (OQ) switch which is known to offer the best performance under any traffic pattern, Switch III not only achieves performance as good as the OQ switch, but also overcomes speedup problem which seriously limits the OQ switch to be a scalable switch design. Hence, Switch III would be a good choice for high performance, scalable, large-capacity core switches

    Experimental survey of FPGA-based monolithic switches and a novel queue balancer

    Get PDF
    This paper studies small to medium-sized monolithic switches for FPGA implementation and presents a novel switch design that achieves high algorithmic performance and FPGA implementation efficiency. Crossbar switches based on virtual output queues (VOQs) and variations have been rather popular for implementing switches on FPGAs, with applications in network switches, memory interconnects, network-on-chip (NoC) routers etc. The implementation efficiency of crossbar-based switches is well-documented on ASICs, though we show that their disadvantages can outweigh their advantages on FPGAs. One of the most important challenges in such input-queued switches is the requirement for iterative scheduling algorithms. In contrast to ASICs, this is more harmful on FPGAs, as the reduced operating frequency and narrower packets cannot “hide” multiple iterations of scheduling that are required to achieve a modest scheduling performance.Our proposed design uses an output-queued switch internally for simplifying scheduling, and a queue balancing technique to avoid queue fragmentation and reduce the need for memory-sharing VOQs. Its implementation approaches the scheduling performance of a state-of-the-art FPGA-based switch, while requiring considerably fewer resources
    corecore