Measuring and Understanding Throughput of Network Topologies
High throughput is of particular interest in data center and HPC networks.
Although myriad network topologies have been proposed, a broad head-to-head
comparison across topologies and across traffic patterns is absent, and the
right way to compare worst-case throughput performance is a subtle problem.
In this paper, we develop a framework to benchmark the throughput of network
topologies, using a two-pronged approach. First, we study performance on a
variety of synthetic and experimentally-measured traffic matrices (TMs).
Second, we show how to measure worst-case throughput by generating a
near-worst-case TM for any given topology. We apply the framework to study the
performance of these TMs in a wide range of network topologies, revealing
insights into the performance of topologies with scaling, robustness of
performance across TMs, and the effect of scattered workload placement. Our
evaluation code is freely available.
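To make "throughput of a topology under a TM" concrete, here is a minimal sketch of a standard capacity-counting upper bound. It is not the benchmarking framework described in the abstract. It assumes throughput means the largest factor t such that t times the TM is routable, that the graph is connected and unweighted, and that each undirected link acts as two directed arcs of equal capacity. The names `hop_distances`, `throughput_upper_bound`, and `link_capacity` are hypothetical.

```python
from collections import deque

def hop_distances(adj, src):
    """BFS hop counts from src in an unweighted graph {node: [neighbors]}."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def throughput_upper_bound(adj, tm, link_capacity=1.0):
    """Upper bound on t such that t * tm is routable (assumed definition).

    Each undirected link is counted as two directed arcs of `link_capacity`.
    One unit of demand from s to d consumes at least dist(s, d) units of
    arc capacity, so t * sum(demand * dist) cannot exceed total capacity.
    """
    total_capacity = link_capacity * sum(len(nbrs) for nbrs in adj.values())
    consumed = 0.0
    for s, row in tm.items():
        dist = hop_distances(adj, s)
        for d, demand in row.items():
            if demand > 0:
                consumed += demand * dist[d]
    return total_capacity / consumed

# Example: 4-node ring with unit demand between every ordered pair.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
all_to_all = {s: {d: 1.0 for d in ring if d != s} for s in ring}
bound = throughput_upper_bound(ring, all_to_all)
```

The bound ignores congestion on individual links, so the true throughput can be lower; computing it exactly generally requires solving a multi-commodity flow problem.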
Efficient All-to-All Collective Communication Schedules for Direct-Connect Topologies
The all-to-all collective communications primitive is widely used in machine
learning (ML) and high performance computing (HPC) workloads, and optimizing
its performance is of interest to both ML and HPC communities. All-to-all is a
particularly challenging workload that can severely strain the underlying
interconnect bandwidth at scale. This is mainly because of the quadratic
scaling in the number of messages that must be simultaneously serviced, combined
with large message sizes. This paper takes a holistic approach to optimize the
performance of all-to-all collective communications on supercomputer-scale
direct-connect interconnects. We address several algorithmic and practical
challenges in developing efficient and bandwidth-optimal all-to-all schedules
for any topology, lowering the schedules to various backends and fabrics that
may or may not expose additional forwarding bandwidth, establishing an upper
bound on all-to-all throughput, and exploring novel topologies that deliver
near-optimal all-to-all performance.
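The quadratic message count mentioned above can be illustrated with a generic textbook rotation schedule. This is a sketch of the logical communication pattern only, not the bandwidth-optimal schedules the abstract describes, and it ignores the physical topology entirely. The function name `rotation_all_to_all` is hypothetical.

```python
def rotation_all_to_all(n):
    """Return n-1 steps; in step k every node src sends to (src + k) % n.

    Each step is a permutation, so every node sends exactly one message
    and receives exactly one message per step.
    """
    return [[(src, (src + k) % n) for src in range(n)] for k in range(1, n)]

# Example: 4 nodes need 4 * 3 = 12 point-to-point messages in 3 steps.
schedule = rotation_all_to_all(4)
total_messages = sum(len(step) for step in schedule)
```

With n nodes the schedule carries n(n-1) messages in total, which is the quadratic growth that strains interconnect bandwidth at scale.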
Unconstraining Graph-Constrained Group Testing
In network tomography, one goal is to identify a small set of failed links in a network using as little information as possible. One way of setting up this problem is called graph-constrained group testing. Graph-constrained group testing is a variant of the classical combinatorial group testing problem, where the tests that one is allowed are additionally constrained by a graph. In this case, the graph is given by the underlying network topology.
The main contribution of this work is to show that for most graphs, the constraints imposed by the graph are no constraint at all. That is, the number of tests required to identify the failed links in graph-constrained group testing is near-optimal even for the corresponding group testing problem with no graph constraints. Our approach is based on a simple randomized construction of tests. To analyze our construction, we prove new results about the size of giant components in randomly sparsified graphs.
Finally, we provide empirical results suggesting that our connected-subgraph tests perform better not just in theory but also in practice, and in particular on a real-world network topology.
Shortest Path versus Multi-Hub Routing in Networks with Uncertain Demand
We study a class of robust network design problems motivated by the need to
scale core networks to meet increasingly dynamic capacity demands. Past work
has focused on designing the network to support all hose matrices (all matrices
not exceeding marginal bounds at the nodes). This model may be too conservative
if additional information on traffic patterns is available. Another extreme is
the fixed demand model, where one designs the network to support peak
point-to-point demands. We introduce a capped hose model to explore a broader
range of traffic matrices which includes the above two as special cases. It is
known that optimal designs for the hose model are always determined by
single-hub routing, and for the fixed-demand model are based on shortest-path
routing. We shed light on the wider space of capped hose matrices in order to
see which traffic models are more shortest path-like as opposed to hub-like. To
address the space in between, we use hierarchical multi-hub routing templates,
a generalization of hub and tree routing. In particular, we show that by adding
peak capacities into the hose model, the single-hub tree-routing template is no
longer cost-effective. This initiates the study of a class of robust network
design (RND) problems restricted to these templates. Our empirical analysis is
based on a heuristic for this new hierarchical RND problem. We also propose
that it is possible to define a routing indicator that accounts for the
strengths of the marginals and peak demands and use this information to choose
the appropriate routing template. We benchmark our approach against other
well-known routing templates, using representative carrier networks and a
variety of different capped hose traffic demands, parameterized by the relative
importance of their marginals as opposed to their point-to-point peak demands.
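The capped hose model can be read as two sets of constraints on a demand matrix: per-node marginal bounds (the hose part) and per-pair peak caps. The sketch below checks membership under the assumption of a symmetric model with a single marginal per node, where the traffic at a node is the sum of its row. The paper's formal definition may differ, and the names `in_capped_hose`, `marginals`, and `peaks` are hypothetical.

```python
import math

def in_capped_hose(demand, marginals, peaks, tol=1e-9):
    """Check whether a demand matrix satisfies marginal bounds and peak caps.

    demand[i][j] : traffic between nodes i and j (diagonal ignored)
    marginals[i] : bound on the total traffic at node i (hose constraint)
    peaks[i][j]  : cap on the point-to-point demand between i and j
    """
    n = len(demand)
    for i in range(n):
        row_total = sum(demand[i][j] for j in range(n) if j != i)
        if row_total > marginals[i] + tol:
            return False
        for j in range(n):
            if i != j and (demand[i][j] < -tol or demand[i][j] > peaks[i][j] + tol):
                return False
    return True

# Example: three nodes, first with no peak caps (plain hose model).
inf = math.inf
demand = [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
no_caps = [[inf] * 3 for _ in range(3)]
ok_hose = in_capped_hose(demand, [3, 3, 2], no_caps)

# Capping the (0, 1) pair at 1.5 excludes the same matrix.
capped = [[inf, 1.5, inf], [1.5, inf, inf], [inf, inf, inf]]
ok_capped = in_capped_hose(demand, [3, 3, 2], capped)
```

Setting every peak to infinity recovers the hose model, while setting every marginal to infinity leaves only the point-to-point peak bounds, which matches the two special cases named in the abstract.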
- …