115 research outputs found

    Implementation aspects of ATM switches


    Work-Conserving Distributed Schedulers

    Buffered multistage interconnection networks offer one of the most scalable and cost-effective approaches to building high-capacity routers and switches. Unfortunately, the performance of such systems has been difficult to predict in the presence of the extreme traffic conditions that can arise in Internet routers. Recent work introduced the idea of distributed scheduling to regulate the flow of traffic in such systems. This work demonstrated (using simulation and experimental measurements) that distributed scheduling can enable robust performance, even in the presence of adversarial traffic patterns. In this paper, we show that appropriately designed distributed scheduling algorithms are provably work-conserving for speedups of 2 or more. Two of the three algorithms presented were inspired by algorithms previously developed for crossbar scheduling. The third has no direct counterpart in the crossbar scheduling context. In our analysis, we show that distributed schedulers based on blocking flows in small-depth acyclic flow graphs can be work-conserving, just as certain crossbar schedulers based on maximal bipartite matchings have been shown to be work-conserving. We also study the performance of practical variants of the work-conserving algorithms with speedups less than 2, using simulation. These studies demonstrate that distributed scheduling ensures excellent performance under extreme traffic conditions for speedups of less than 1.5.
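
The abstract above compares its distributed schedulers to crossbar schedulers based on maximal bipartite matchings. As a rough illustration of that crossbar-side notion only, the following sketch builds a maximal matching greedily. The function name and the request format (a list of input/output pairs) are assumptions made for this example; it is not one of the paper's three algorithms.

```python
def greedy_maximal_matching(requests):
    """Return a maximal matching over (input_port, output_port) requests.

    Hypothetical helper: scans requests in the given order and keeps a
    request whenever both its input and its output are still unmatched.
    The result is maximal (no remaining request could be added), though
    not necessarily maximum.
    """
    matched_inputs = set()
    matched_outputs = set()
    matching = []
    for input_port, output_port in requests:
        if input_port not in matched_inputs and output_port not in matched_outputs:
            matching.append((input_port, output_port))
            matched_inputs.add(input_port)
            matched_outputs.add(output_port)
    return matching


# Inputs 0, 1, 2 request outputs 0 and 1; at most two cells can move per slot.
demo_requests = [(0, 0), (0, 1), (1, 0), (2, 1)]
demo_matching = greedy_maximal_matching(demo_requests)
```

A greedy scan is the simplest way to obtain maximality; practical crossbar schedulers typically use iterative, parallel request-grant schemes instead.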

    Resilient Cell Resequencing in Terabit Routers

    Multistage interconnection networks with internal cell buffering and dynamic routing are among the most cost-effective architectures for multi-terabit Internet routers. One of the key design issues for such systems is maintaining cell ordering, since cells are subject to varying delays as they pass through the interconnection network. The most flexible and scalable approach to cell resequencing uses timestamps and a time-ordered resequencing buffer at each router output port. Conventional, fixed-threshold resequencers can perform poorly in the presence of extreme traffic conditions. This paper explores alternative resequencer designs that are more tolerant of such traffic. These alternatives include a novel adaptive resequencer that adjusts the time cells spend waiting in the resequencing buffer, based on the recent history of the interconnection network delay. The design is straightforward to implement and requires only constant time per cell, making it suitable for systems with link speeds of up to 40 Gb/s. We show that the combination of adaptive resequencing and appropriately designed interconnection networks can limit resequencing errors to negligible levels without requiring large resequencing latencies.
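
The timestamp-based resequencing described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's design: the class name, the `window` and `slack` parameters, and the rule "threshold = largest recently observed delay plus slack" are all hypothetical, and the heap used here costs logarithmic rather than constant time per cell.

```python
import heapq
from collections import deque


class AdaptiveResequencer:
    """Time-ordered resequencing buffer with an adaptive release threshold.

    Assumed rule (not taken from the paper): a cell is held until
    timestamp + threshold, where threshold tracks the largest network
    delay seen among the last `window` arrivals, plus `slack`.
    """

    def __init__(self, window=4, slack=1):
        self.buffer = []                           # min-heap of (timestamp, payload)
        self.recent_delays = deque(maxlen=window)  # recent network delays
        self.slack = slack
        self.threshold = slack

    def arrive(self, timestamp, payload, now):
        # Record the delay this cell experienced and adapt the threshold.
        self.recent_delays.append(now - timestamp)
        self.threshold = max(self.recent_delays) + self.slack
        heapq.heappush(self.buffer, (timestamp, payload))

    def release(self, now):
        # Emit, in timestamp order, every cell whose waiting time has expired.
        released = []
        while self.buffer and self.buffer[0][0] + self.threshold <= now:
            released.append(heapq.heappop(self.buffer)[1])
        return released


# Cell "b" (timestamp 2) overtakes cell "a" (timestamp 1) inside the network.
resequencer = AdaptiveResequencer(window=4, slack=1)
resequencer.arrive(timestamp=2, payload="b", now=3)
resequencer.arrive(timestamp=1, payload="a", now=5)
out_at_5 = resequencer.release(now=5)
out_at_6 = resequencer.release(now=6)
out_at_7 = resequencer.release(now=7)
```

A fixed-threshold resequencer corresponds to keeping `threshold` constant; the adaptive variant lengthens the wait only when recent delays suggest that late cells are likely.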

    A Multi-Stage Packet-Switch Based on NoC Fabrics for Data Center Networks

    Bandwidth-hungry applications such as cloud computing, video sharing and social networking drive the creation of more powerful Data Centers (DCs) to manage the large amount of packetized traffic. Data center network (DCN) topologies rely on thousands of servers that exchange data via the switching backbone. Cluster switches and routers are employed to provide interconnectivity between elements of the same DC and between DCs, and must be able to handle continuously variable loads. Hence, robust and scalable switching modules are needed. Conventional DCN switches adopt crossbars and/or blocks of memories in multistage interconnection architectures (commonly 2-tier or 3-tier). However, current multistage packet-switch architectures, with their space-memory variants, are too complex to implement, perform poorly, or are not cost-effective. In this paper, we propose a novel and highly scalable multistage packet-switch design based on Networks-on-Chip (NoC) fabrics for DCNs. In particular, we describe a novel three-stage packet-switch fabric with a Round-Robin packet dispatching scheme where each central stage module is based on a Unidirectional NoC (UDN), instead of a conventional single-hop crossbar fabric. The proposed design, referred to as Clos-UDN, overcomes all the shortcomings of conventional multistage architectures. In particular, as we shall demonstrate, the proposed Clos-UDN architecture: (i) Obviates the need for complex and costly input modules, by means of a few simple input FIFO queues. (ii) Avoids the need for a complex and synchronized scheduling process over a high number of input-output modules and/or port pairs. (iii) Provides speedup, load balancing and path diversity thanks to a dynamic dispatching scheme as well as the NoC-based nature of the fabric. Extensive simulation studies are conducted to compare the proposed Clos-UDN switch to conventional multistage switches. Simulation results show that the Clos-UDN outperforms conventional designs under a wide range of input traffic scenarios, making it highly appealing for ultra-high capacity DC networks.
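
The Round-Robin dispatching mentioned in this abstract can be pictured with a small sketch. It assumes the simplest possible form, in which an input module sends successive packets to successive central-stage modules; the class and parameter names are hypothetical, and the dispatching rule actually used in Clos-UDN may differ.

```python
class RoundRobinDispatcher:
    """Cycle packets from one input module over the central-stage modules."""

    def __init__(self, num_central_modules):
        self.num_central_modules = num_central_modules
        self.pointer = 0  # index of the central module that gets the next packet

    def dispatch(self, packet):
        # Pick the module under the pointer, then advance the pointer cyclically.
        # The packet contents do not influence the choice in this sketch.
        chosen = self.pointer
        self.pointer = (self.pointer + 1) % self.num_central_modules
        return chosen


dispatcher = RoundRobinDispatcher(num_central_modules=3)
assignments = [dispatcher.dispatch(packet) for packet in ["p0", "p1", "p2", "p3", "p4"]]
```

Because the pointer advances on every packet regardless of destination, load is spread evenly over the central modules without any coordination between input modules.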

    Experimental Evaluation of a Coarse-Grained Switch Scheduler

    Modern high-performance routers rely on sophisticated interconnection networks to meet ever-increasing demands on capacity. Regulating the flow of packets through these interconnects is critical to providing good performance, particularly in the presence of extreme traffic patterns that result in sustained overload at output ports. Previous studies have used a combination of analysis and idealized simulations to show that coarse-grained scheduling of traffic flows can be effective in preventing congestion, while ensuring high utilization. In this paper, we study the performance of a coarse-grained scheduler in a real router with a scalable architecture similar to those found in high-performance commercial systems. Our results are obtained by taking fine-grained measurements of an operating router that provide a detailed picture of how the scheduling algorithm behaves under a variety of conditions, giving a more complete and realistic understanding of the short time-scale dynamics than previous studies could provide. We also examine computation and communication overheads of our scheduler implementation to assess its resource usage and to provide the basis for an analysis of how the resource usage scales with system size.

    Hypergraph-Based Interconnection Networks for Large Multicomputers

    This thesis deals with issues pertaining to multicomputer interconnection networks, namely topology, technology, switching method, and routing algorithm. It argues that a new class of regular low-dimensional hypergraph networks, the distributed crossbar switch hypermesh (DCSH), represents a promising high-performance interconnection network for future large multicomputers, as an alternative to graph networks such as meshes, tori, and binary n-cubes, which have been widely used in current multicomputers. Channels in existing hypergraph and graph structures suffer from bandwidth limitations imposed by implementation technology. The first part of the thesis shows how the low-dimensional DCSH can use an innovative implementation scheme to alleviate this problem. It relies on the separation of processing and communication functions by physical layering in order to accommodate high wiring density and necessary message buffering, improving performance considerably. Various mathematical models of the DCSH, validated through discrete-event simulation, are then introduced. Effects of different switching methods (e.g., wormhole routing, virtual cut-through, and message switching), routing algorithms (e.g., restricted and random), and different switching element designs are investigated. Further, the impact on performance of different communication patterns, such as those including locality and hot-spots, is assessed. The remainder of the thesis compares the DCSH to other common hypergraph and graph networks assuming different implementation technologies, such as VLSI, multiple-chip technology, and the new layered implementation scheme. More realistic assumptions are introduced, such as pipeline-bit transmission and non-zero delays through switching elements. The results show that the proposed structure has superior characteristics assuming equal implementation cost in both VLSI and multiple-chip technology. Furthermore, optimal performance is offered by the new layered implementation.