220 research outputs found

    Measuring and Understanding Throughput of Network Topologies

    Full text link
    High throughput is of particular interest in data center and HPC networks. Although myriad network topologies have been proposed, a broad head-to-head comparison across topologies and across traffic patterns is absent, and the right way to compare worst-case throughput performance is a subtle problem. In this paper, we develop a framework to benchmark the throughput of network topologies, using a two-pronged approach. First, we study performance on a variety of synthetic and experimentally-measured traffic matrices (TMs). Second, we show how to measure worst-case throughput by generating a near-worst-case TM for any given topology. We apply the framework to study the performance of these TMs in a wide range of network topologies, revealing insights into the performance of topologies with scaling, robustness of performance across TMs, and the effect of scattered workload placement. Our evaluation code is freely available

    Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies

    Full text link
    The all-to-all collective communications primitive is widely used in machine learning (ML) and high performance computing (HPC) workloads, and optimizing its performance is of interest to both ML and HPC communities. All-to-all is a particularly challenging workload that can severely strain the underlying interconnect bandwidth at scale. This is mainly because of the quadratic scaling in the number of messages that must be simultaneously serviced combined with large message sizes. This paper takes a holistic approach to optimize the performance of all-to-all collective communications on supercomputer-scale direct-connect interconnects. We address several algorithmic and practical challenges in developing efficient and bandwidth-optimal all-to-all schedules for any topology, lowering the schedules to various backends and fabrics that may or may not expose additional forwarding bandwidth, establishing an upper bound on all-to-all throughput, and exploring novel topologies that deliver near-optimal all-to-all performance

    Unconstraining Graph-Constrained Group Testing

    Get PDF
    In network tomography, one goal is to identify a small set of failed links in a network using as little information as possible. One way of setting up this problem is called graph-constrained group testing. Graph-constrained group testing is a variant of the classical combinatorial group testing problem, where the tests that one is allowed are additionally constrained by a graph. In this case, the graph is given by the underlying network topology. The main contribution of this work is to show that for most graphs, the constraints imposed by the graph are no constraint at all. That is, the number of tests required to identify the failed links in graph-constrained group testing is near-optimal even for the corresponding group testing problem with no graph constraints. Our approach is based on a simple randomized construction of tests. To analyze our construction, we prove new results about the size of giant components in randomly sparsified graphs. Finally, we provide empirical results which suggest that our connected-subgraph tests perform better not just in theory but also in practice, and in particular perform better on a real-world network topology

    Shortest Path versus Multi-Hub Routing in Networks with Uncertain Demand

    Full text link
    We study a class of robust network design problems motivated by the need to scale core networks to meet increasingly dynamic capacity demands. Past work has focused on designing the network to support all hose matrices (all matrices not exceeding marginal bounds at the nodes). This model may be too conservative if additional information on traffic patterns is available. Another extreme is the fixed demand model, where one designs the network to support peak point-to-point demands. We introduce a capped hose model to explore a broader range of traffic matrices which includes the above two as special cases. It is known that optimal designs for the hose model are always determined by single-hub routing, and for the fixed- demand model are based on shortest-path routing. We shed light on the wider space of capped hose matrices in order to see which traffic models are more shortest path-like as opposed to hub-like. To address the space in between, we use hierarchical multi-hub routing templates, a generalization of hub and tree routing. In particular, we show that by adding peak capacities into the hose model, the single-hub tree-routing template is no longer cost-effective. This initiates the study of a class of robust network design (RND) problems restricted to these templates. Our empirical analysis is based on a heuristic for this new hierarchical RND problem. We also propose that it is possible to define a routing indicator that accounts for the strengths of the marginals and peak demands and use this information to choose the appropriate routing template. We benchmark our approach against other well-known routing templates, using representative carrier networks and a variety of different capped hose traffic demands, parameterized by the relative importance of their marginals as opposed to their point-to-point peak demands
    • …
    corecore