2,277 research outputs found

    Network Coding for Speedup in Switches

    Get PDF
    We present a graph theoretic upper bound on speedup needed to achieve 100% throughput in a multicast switch using network coding. By bounding speedup, we show the equivalence between network coding and speedup in multicast switches - i.e. network coding, which is usually implemented using software, can in many cases substitute speedup, which is often achieved by adding extra switch fabrics. This bound is based on an approach to network coding problems called the "enhanced conflict graph". We show that the "imperfection ratio" of the enhanced conflict graph gives an upper bound on speedup. In particular, we apply this result to K-by-N switches with traffic patterns consisting of unicasts and broadcasts only to obtain an upper bound of min{(2K-1)/K, 2N/(N+1)}.Comment: 5 pages, 4 figures, IEEE ISIT 200

    Network Coding in a Multicast Switch

    Full text link
    We consider the problem of serving multicast flows in a crossbar switch. We show that linear network coding across packets of a flow can sustain traffic patterns that cannot be served if network coding were not allowed. Thus, network coding leads to a larger rate region in a multicast crossbar switch. We demonstrate a traffic pattern which requires a switch speedup if coding is not allowed, whereas, with coding the speedup requirement is eliminated completely. In addition to throughput benefits, coding simplifies the characterization of the rate region. We give a graph-theoretic characterization of the rate region with fanout splitting and intra-flow coding, in terms of the stable set polytope of the 'enhanced conflict graph' of the traffic pattern. Such a formulation is not known in the case of fanout splitting without coding. We show that computing the offline schedule (i.e. using prior knowledge of the flow arrival rates) can be reduced to certain graph coloring problems. Finally, we propose online algorithms (i.e. using only the current queue occupancy information) for multicast scheduling based on our graph-theoretic formulation. In particular, we show that a maximum weighted stable set algorithm stabilizes the queues for all rates within the rate region.Comment: 9 pages, submitted to IEEE INFOCOM 200

    On the Stability of Isolated and Interconnected Input-Queued Switches under Multiclass Traffic

    Get PDF
    In this correspondence, we discuss the stability of scheduling algorithms for input-queueing (IQ) and combined input/output queueing (CIOQ) packet switches. First, we show that a wide class of IQ schedulers operating on multiple traffic classes can achieve 100 % throughput. Then, we address the problem of the maximum throughput achievable in a network of interconnected IQ switches and CIOQ switches loaded by multiclass traffic, and we devise some simple scheduling policies that guarantee 100 % throughput. Both the Lyapunov function methodology and the fluid modeling approach are used to obtain our results

    Achieving 100% throughput for multicast traffic in input-queued switches

    Get PDF
    A general approach of designing input-queued multicast switch is to employ multicast switch fabric, where packets can be replicated inside the switch fabric. As compared with unicast switch fabric, the achievable traffic rate region of a switch can be increased, but it is still less than the admissible traffic rate region. In other words, achieving 100% throughput for any admissible multicast traffic pattern is not possible. In this paper, we first revisit the fundamental problems faced by input-queued switch in supporting multicast traffic. We then argue that multicast switch fabric is not necessary if a load-balanced approach is followed. Accordingly, an existing load-balanced two-stage switch architecture [12], consisting of unicast switch fabrics, can be adopted to provide 100% throughput for any admissible multicast traffic pattern. Since the two-stage switch requires no speedup in both switch fabric and packet buffers, we consider it a two-stage input-queued switch. It can be seen that its implementation complexity is much lower than conventional (single-stage) input-queued multicast switches. As compared with the work in [12], our approach is more systematic and we propose a more effective load balancing mechanism. © 2011 IEEE.link_to_subscribed_fulltextProceedings of the IEEE Global Telecommunications Conference (GLOBECOM 2011), Houston, TX, USA, 5-9 December 201

    Natural Compression for Distributed Deep Learning

    Full text link
    Modern deep learning models are often trained in parallel over a collection of distributed machines to reduce training time. In such settings, communication of model updates among machines becomes a significant performance bottleneck and various lossy update compression techniques have been proposed to alleviate this problem. In this work, we introduce a new, simple yet theoretically and practically effective compression technique: {\em natural compression (NC)}. Our technique is applied individually to all entries of the to-be-compressed update vector and works by randomized rounding to the nearest (negative or positive) power of two, which can be computed in a "natural" way by ignoring the mantissa. We show that compared to no compression, NC increases the second moment of the compressed vector by not more than the tiny factor \nicefrac{9}{8}, which means that the effect of NC on the convergence speed of popular training algorithms, such as distributed SGD, is negligible. However, the communications savings enabled by NC are substantial, leading to {\em 33-4×4\times improvement in overall theoretical running time}. For applications requiring more aggressive compression, we generalize NC to {\em natural dithering}, which we prove is {\em exponentially better} than the common random dithering technique. Our compression operators can be used on their own or in combination with existing operators for a more aggressive combined effect, and offer new state-of-the-art both in theory and practice.Comment: 8 pages, 20 pages of Appendix, 6 Tables, 14 Figure
    • …
    corecore