2,646 research outputs found

    A performance model of multicast communication in wormhole-routed networks on-chip

    Get PDF
    Collective communication operations form a part of overall traffic in most applications running on platforms employing direct interconnection networks. This paper presents a novel analytical model to compute communication latency of multicast as a widely used collective communication operation. The novelty of the model lies in its ability to predict the latency of the multicast communication in wormhole-routed architectures employing asynchronous multi-port routers scheme. The model is applied to the Quarc NoC and its validity is verified by comparing the model predictions against the results obtained from a discrete-event simulator developed using OMNET++

    A communication model of broadcast in wormhole-routed networks on-chip

    Get PDF
    This paper presents a novel analytical model to compute communication latency of broadcast as the most fundamental collective communication operation. The novelty of the model lies in its ability to predict the broadcast communication latency in wormhole-routed architectures employing asynchronous multi-port routers scheme. The model is applied to the Quarc NoC and its validity is verified by comparing the model predictions against the results obtained from a discrete-event simulator developed using OMNET++

    High Performance Queueing and Scheduling in Support of Multicasting in Input-Queued Switches

    Get PDF
    Due to its mild requirement on the bandwidth of switching fabric and internal memory, the input-queued architecture is a practical solution for today\u27s very high-speed switches. One of the notoriously difficult problems in the design of input-queued switches with very high link rates is the high performance queueing and scheduling of multicast traffic. This dissertation focuses on proposing novel solutions for this problem. The design challenge stems from the nature of multicast traffic, i.e., a multicast packet typically has multiple destinations. On the one hand, this nature makes queueing and scheduling of multicast traffic much more difficult than that of unicast traffic. For example, virtual output queueing is widely used to completely avoid the head-of-line blocking and achieve 100% throughput for unicast traffic. Nevertheless, the exhaustive, multicast virtual output queueing is impractical and results in out-of-order delivery. On the other hand, in spite of extensive studies in the context of either pure unicast traffic or pure multicast traffic, the results from a study in one context are not applicable to the other context due to the difference between the natures of unicast and multicast traffic. The design of integrated scheduling for both types of traffic remains an open issue. The main contribution of this dissertation is twofold: firstly, the performance of an interesting approach to efficiently mitigate head-of-line blocking for multicast traffic is theoretically analyzed; secondly, two novel algorithms are proposed to efficiently integrate unicast and multicast scheduling within one switching fabric. The research work presented in this dissertation concludes that (1) a small number of queues are sufficient to maximize the saturation throughput and delay performances of a large multicast switch with multiple first-in-first-out queues per input port; (2) the theoretical analysis results are indeed valid for practical large-sized switches; (3) for a large M × N multicast switch, the final achievable saturation throughput decreases as the ratio of M/N decreases; (4) and the two proposed integration algorithms exhibit promising performances in terms of saturation throughput, delay, and packet loss ratio under both uniform Bernoulli and uniform bursty traffic

    Random Linear Network Coding for 5G Mobile Video Delivery

    Get PDF
    An exponential increase in mobile video delivery will continue with the demand for higher resolution, multi-view and large-scale multicast video services. Novel fifth generation (5G) 3GPP New Radio (NR) standard will bring a number of new opportunities for optimizing video delivery across both 5G core and radio access networks. One of the promising approaches for video quality adaptation, throughput enhancement and erasure protection is the use of packet-level random linear network coding (RLNC). In this review paper, we discuss the integration of RLNC into the 5G NR standard, building upon the ideas and opportunities identified in 4G LTE. We explicitly identify and discuss in detail novel 5G NR features that provide support for RLNC-based video delivery in 5G, thus pointing out to the promising avenues for future research.Comment: Invited paper for Special Issue "Network and Rateless Coding for Video Streaming" - MDPI Informatio

    Architecture design and performance analysis of practical buffered-crossbar packet switches

    Get PDF
    Combined input crosspoint buffered (CICB) packet switches were introduced to relax inputoutput arbitration timing and provide high throughput under admissible traffic. However, the amount of memory required in the crossbar of an N x N switch is N2x k x L, where k is the crosspoint buffer size and needs to be of size RTT in cells, L is the packet size. RTT is the round-trip time which is defined by the distance between line cards and switch fabric. When the switch size is large or RTT is not negligible, the memory amount required makes the implementation costly or infeasible for buffered crossbar switches. To reduce the required memory amount, a family of shared memory combined-input crosspoint-buffered (SMCB) packet switches, where the crosspoint buffers are shared among inputs, are introduced in this thesis. One of the proposed switches uses a memory speedup of in and dynamic memory allocation, and the other switch avoids speedup by arbitrating the access of inputs to the crosspoint buffers. These two switches reduce the required memory of the buffered crossbar by 50% or more and achieve equivalent throughput under independent and identical traffic with uniform distributions when using random selections. The proposed mSMCB switch is extended to support differentiated services and long RTT. To support P traffic classes with different priorities, CICB switches have been reported to use N2x k x L x P amount of memory to avoid blocking of high priority cells.The proposed SMCB switch with support for differentiated services requires 1/mP of the memory amount in the buffered crossbar and achieves similar throughput performance to that of a CICB switch with similar priority management, while using no speedup in the shared memory. The throughput performance of SMCB switch with crosspoint buffers shared by inputs (I-SMCB) is studied under multicast traffic. An output-based shared-memory crosspoint buffered (O-SMCB) packet switch is proposed where the crosspoint buffers are shared by two outputs and use no speedup. The proposed O-SMCB switch provides high performance under admissible uniform and nonuniform multicast traffic models while using 50% of the memory used in CICB switches. Furthermore, the O-SMCB switch provides higher throughput than the I-SMCB switch. As SMCB switches can efficiently support an RTT twice as long as that supported by CICB switches and as the performance of SMCB switches is bounded by a matching between inputs and crosspoint buffers, a new family of CICB switches with flexible access to crosspoint buffers are proposed to support longer RTTs than SMCB switches and to provide higher throughput under a wide variety of admissible traffic models. The CICB switches with flexible access allow an input to use any available crosspoint buffer at a given output. The proposed switches reduce the required crosspoint buffer size by a factor of N , keep the service of cells in sequence, and use no speedup. This new class of switches achieve higher throughput performance than CICB switches under a large variety of traffic models, while supporting long RTTs. Crosspoint buffered switches that are implemented in single chips have limited scalability. To support a large number of ports in crosspoint buffered switches, memory-memory-memory (MMM) Clos-network switches are an alternative. The MMM switches that use minimum memory amount at the central module is studied. Although, this switch can provide a moderate throughput, MMM switch may serve cells out of sequence. As keeping cells in sequence in an MMM switch may require buffers be distributed per flow, an MMM with extended memory in the switch modules is studied. To solve the out of sequence problem in MMM switches, a queuing architecture is proposed for an MMM switch. The service of cells in sequence is analyzed

    On packet switch design

    Get PDF
    corecore