156 research outputs found

    Network Coding in a Multicast Switch

    Full text link
    We consider the problem of serving multicast flows in a crossbar switch. We show that linear network coding across packets of a flow can sustain traffic patterns that cannot be served if network coding were not allowed. Thus, network coding leads to a larger rate region in a multicast crossbar switch. We demonstrate a traffic pattern which requires a switch speedup if coding is not allowed, whereas, with coding the speedup requirement is eliminated completely. In addition to throughput benefits, coding simplifies the characterization of the rate region. We give a graph-theoretic characterization of the rate region with fanout splitting and intra-flow coding, in terms of the stable set polytope of the 'enhanced conflict graph' of the traffic pattern. Such a formulation is not known in the case of fanout splitting without coding. We show that computing the offline schedule (i.e. using prior knowledge of the flow arrival rates) can be reduced to certain graph coloring problems. Finally, we propose online algorithms (i.e. using only the current queue occupancy information) for multicast scheduling based on our graph-theoretic formulation. In particular, we show that a maximum weighted stable set algorithm stabilizes the queues for all rates within the rate region.Comment: 9 pages, submitted to IEEE INFOCOM 200

    Architecture design and performance analysis of practical buffered-crossbar packet switches

    Get PDF
    Combined input crosspoint buffered (CICB) packet switches were introduced to relax inputoutput arbitration timing and provide high throughput under admissible traffic. However, the amount of memory required in the crossbar of an N x N switch is N2x k x L, where k is the crosspoint buffer size and needs to be of size RTT in cells, L is the packet size. RTT is the round-trip time which is defined by the distance between line cards and switch fabric. When the switch size is large or RTT is not negligible, the memory amount required makes the implementation costly or infeasible for buffered crossbar switches. To reduce the required memory amount, a family of shared memory combined-input crosspoint-buffered (SMCB) packet switches, where the crosspoint buffers are shared among inputs, are introduced in this thesis. One of the proposed switches uses a memory speedup of in and dynamic memory allocation, and the other switch avoids speedup by arbitrating the access of inputs to the crosspoint buffers. These two switches reduce the required memory of the buffered crossbar by 50% or more and achieve equivalent throughput under independent and identical traffic with uniform distributions when using random selections. The proposed mSMCB switch is extended to support differentiated services and long RTT. To support P traffic classes with different priorities, CICB switches have been reported to use N2x k x L x P amount of memory to avoid blocking of high priority cells.The proposed SMCB switch with support for differentiated services requires 1/mP of the memory amount in the buffered crossbar and achieves similar throughput performance to that of a CICB switch with similar priority management, while using no speedup in the shared memory. The throughput performance of SMCB switch with crosspoint buffers shared by inputs (I-SMCB) is studied under multicast traffic. An output-based shared-memory crosspoint buffered (O-SMCB) packet switch is proposed where the crosspoint buffers are shared by two outputs and use no speedup. The proposed O-SMCB switch provides high performance under admissible uniform and nonuniform multicast traffic models while using 50% of the memory used in CICB switches. Furthermore, the O-SMCB switch provides higher throughput than the I-SMCB switch. As SMCB switches can efficiently support an RTT twice as long as that supported by CICB switches and as the performance of SMCB switches is bounded by a matching between inputs and crosspoint buffers, a new family of CICB switches with flexible access to crosspoint buffers are proposed to support longer RTTs than SMCB switches and to provide higher throughput under a wide variety of admissible traffic models. The CICB switches with flexible access allow an input to use any available crosspoint buffer at a given output. The proposed switches reduce the required crosspoint buffer size by a factor of N , keep the service of cells in sequence, and use no speedup. This new class of switches achieve higher throughput performance than CICB switches under a large variety of traffic models, while supporting long RTTs. Crosspoint buffered switches that are implemented in single chips have limited scalability. To support a large number of ports in crosspoint buffered switches, memory-memory-memory (MMM) Clos-network switches are an alternative. The MMM switches that use minimum memory amount at the central module is studied. Although, this switch can provide a moderate throughput, MMM switch may serve cells out of sequence. As keeping cells in sequence in an MMM switch may require buffers be distributed per flow, an MMM with extended memory in the switch modules is studied. To solve the out of sequence problem in MMM switches, a queuing architecture is proposed for an MMM switch. The service of cells in sequence is analyzed

    On-board B-ISDN fast packet switching architectures. Phase 2: Development. Proof-of-concept architecture definition report

    Get PDF
    For the next-generation packet switched communications satellite system with onboard processing and spot-beam operation, a reliable onboard fast packet switch is essential to route packets from different uplink beams to different downlink beams. The rapid emergence of point-to-point services such as video distribution, and the large demand for video conference, distributed data processing, and network management makes the multicast function essential to a fast packet switch (FPS). The satellite's inherent broadcast features gives the satellite network an advantage over the terrestrial network in providing multicast services. This report evaluates alternate multicast FPS architectures for onboard baseband switching applications and selects a candidate for subsequent breadboard development. Architecture evaluation and selection will be based on the study performed in phase 1, 'Onboard B-ISDN Fast Packet Switching Architectures', and other switch architectures which have become commercially available as large scale integration (LSI) devices

    Fabric-on-a-Chip: Toward Consolidating Packet Switching Functions on Silicon

    Get PDF
    The switching capacity of an Internet router is often dictated by the memory bandwidth required to bu¤er arriving packets. With the demand for greater capacity and improved service provisioning, inherent memory bandwidth limitations are encountered rendering input queued (IQ) switches and combined input and output queued (CIOQ) architectures more practical. Output-queued (OQ) switches, on the other hand, offer several highly desirable performance characteristics, including minimal average packet delay, controllable Quality of Service (QoS) provisioning and work-conservation under any admissible traffic conditions. However, the memory bandwidth requirements of such systems is O(NR), where N denotes the number of ports and R the data rate of each port. Clearly, for high port densities and data rates, this constraint dramatically limits the scalability of the switch. In an effort to retain the desirable attributes of output-queued switches, while significantly reducing the memory bandwidth requirements, distributed shared memory architectures, such as the parallel shared memory (PSM) switch/router, have recently received much attention. The principle advantage of the PSM architecture is derived from the use of slow-running memory units operating in parallel to distribute the memory bandwidth requirement. At the core of the PSM architecture is a memory management algorithm that determines, for each arriving packet, the memory unit in which it will be placed. However, to date, the computational complexity of this algorithm is O(N), thereby limiting the scalability of PSM switches. In an effort to overcome the scalability limitations, it is the goal of this dissertation to extend existing shared-memory architecture results while introducing the notion of Fabric on a Chip (FoC). In taking advantage of recent advancements in integrated circuit technologies, FoC aims to facilitate the consolidation of as many packet switching functions as possible on a single chip. Accordingly, this dissertation introduces a novel pipelined memory management algorithm, which plays a key role in the context of on-chip output- queued switch emulation. We discuss in detail the fundamental properties of the proposed scheme, along with hardware-based implementation results that illustrate its scalability and performance attributes. To complement the main effort and further support the notion of FoC, we provide performance analysis of output queued cell switches with heterogeneous traffic. The result is a flexible tool for obtaining bounds on the memory requirements in output queued switches under a wide range of tra¢ c scenarios. Additionally, we present a reconfigurable high-speed hardware architecture for real-time generation of packets for the various traffic scenarios. The work presented in this thesis aims at providing pragmatic foundations for designing next-generation, high-performance Internet switches and routers

    Traffic Management for Next Generation Transport Networks

    Get PDF

    Design and analysis of a scalable terabit multicast packet switch : architecture and scheduling algorithms

    Get PDF
    Internet growth and success not only open a primary route of information exchange for millions of people around the world, but also create unprecedented demand for core network capacity. Existing switches/routers, due to the bottleneck from either switch architecture or arbitration complexity, can reach a capacity on the order of gigabits per second, but few of them are scalable to large capacity of terabits per second. In this dissertation, we propose three novel switch architectures with cooperated scheduling algorithms to design a terabit backbone switch/router which is able to deliver large capacity, multicasting, and high performance along with Quality of Service (QoS). Our switch designs benefit from unique features of modular switch architecture and distributed resource allocation scheme. Switch I is a unique and modular design characterized by input and output link sharing. Link sharing resolves output contention and eliminates speedup requirement for central switch fabric. Hence, the switch architecture is scalable to any large size. We propose a distributed round robin (RR) scheduling algorithm which provides fairness and has very low arbitration complexity. Switch I can achieve good performance under uniform traffic. However, Switch I does not perform well for non-uniform traffic. Switch II, as a modified switch design, employs link sharing as well as a token ring to pursue a solution to overcome the drawback of Switch 1. We propose a round robin prioritized link reservation (RR+POLR) algorithm which results in an improved performance especially under non-uniform traffic. However, RR+POLR algorithm is not flexible enough to adapt to the input traffic. In Switch II, the link reservation rate has a great impact on switch performance. Finally, Switch III is proposed as an enhanced switch design using link sharing and dual round robin rings. Packet forwarding is based on link reservation. We propose a queue occupancy based dynamic link reservation (QOBDLR) algorithm which can adapt to the input traffic to provide a fast and fair link resource allocation. QOBDLR algorithm is a distributed resource allocation scheme in the sense that dynamic link reservation is carried out according to local available information. Arbitration complexity is very low. Compared to the output queued (OQ) switch which is known to offer the best performance under any traffic pattern, Switch III not only achieves performance as good as the OQ switch, but also overcomes speedup problem which seriously limits the OQ switch to be a scalable switch design. Hence, Switch III would be a good choice for high performance, scalable, large-capacity core switches

    Scheduling and reconfiguration of interconnection network switches

    Get PDF
    Interconnection networks are important parts of modern computing systems, facilitating communication between a system\u27s components. Switches connecting various nodes of an interconnection network serve to move data in the network. The switch\u27s delay and throughput impact the overall performance of the network and thus the system. Scheduling efficient movement of data through a switch and configuring the switch to realize a schedule are the main themes of this research. We consider various interconnection network switches including (i) crossbar-based switches, (ii) circuit-switched tree switches, and (iii) fat-tree switches. For crossbar-based input-queued switches, a recent result established that logarithmic packet delay is possible. However, this result assumes that packet transmission time through the switch is no less than schedule-generation time. We prove that without this assumption (as is the case in practice) packet delay becomes linear. We also report results of simulations that bear out our result for practical switch sizes and indicate that a fast scheduling algorithm reduces not only packet delay but also buffer size. We also propose a fast mesh-of-trees based distributed switch scheduling (maximal-matching based) algorithm that has polylog complexity. A circuit-switched tree (CST) can serve as an interconnect structure for various computing architectures and models such as the self-reconfigurable gate array and the reconfigurable mesh. A CST is a tree structure with source and destination processing elements as leaves and switches as internal nodes. We design several scheduling and configuration algorithms that distributedly partition a given set of communications into non-conflicting subsets and then establish switch settings and paths on the CST corresponding to the communications. A fat-tree is another widely used interconnection structure in many of today\u27s high-performance clusters. We embed a reconfigurable mesh inside a fat-tree switch to generate efficient connections. We present an R-Mesh-based algorithm for a fat-tree switch that creates buses connecting input and output ports corresponding to various communications using that switch
    • …
    corecore