19,787 research outputs found

    Contention-based Nonminimal Adaptive Routing in High-radix Networks

    Get PDF
    Adaptive routing is an efficient congestion avoidance mechanism for modern Datacenter and HPC networks. Congestion detection traditionally relies on the occupancy of the router queues. However, this approach can hinder performance due to coarse-grain measurements with small buffers, and potential routing oscillations with large buffers. We introduce an alternative mechanism, labelled Contention-Based Adaptive Routing. Our mechanism adapts routing based on an estimation of “network contention”, the simultaneity of traffic flows contending for a network port. Our system employs a set of counters which track the demand for each output port. This exploits path diversity thanks to earlier detection of adversarial traffic patterns, and decouples buffer size and queue occupancy from contention detection. We evaluate our mechanism in a Dragonfly network. Our evaluations show this mechanism achieves optimal latency under uniform traffic and similar to best previous routing mechanisms under adversarial patterns, with immediate adaptation to traffic pattern changes

    OFAR-CM: Efficient Dragonfly networks with simple congestion management

    Get PDF
    Dragonfly networks are appealing topologies for large-scale Data center and HPC networks, that provide high throughput with low diameter and moderate cost. However, they are prone to congestion under certain frequent traffic patterns that saturate specific network links. Adaptive non-minimal routing can be used to avoid such congestion. That kind of routing employs longer paths to circumvent local or global congested links. However, if a distance-based deadlock avoidance mechanism is employed, more Virtual Channels (VCs) are required, what increases design complexity and cost. OFAR (On-the-Fly Adaptive Routing) is a previously proposed routing that decouples VCs from deadlock avoidance, making local and global misrouting affordable. However, the severity of congestion with OFAR is higher, as it relies on an escape sub network with low bisection bandwidth. Additionally, OFAR allows for unlimited misroutings on the escape sub network, leading to unbounded paths in the network and long latencies. In this paper we propose and evaluate OFAR-CM, a variant of OFAR combined with a simple congestion management (CM) mechanism which only relies on local information, specifically the credit count of the output ports in the local router. With simple escape sub networks such as a Hamiltonian ring or a tree, OFAR outperforms former proposals with distance-based deadlock avoidance. Additionally, although long paths are allowed in theory, in practice packets arrive at their destination in a small number of hops. Altogether, OFAR-CM constitutes the first practicable mechanism to the date that supports both local and global misrouting in Dragonfly networks.The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. ERC-2012-Adg-321253- RoMoL, the Spanish Ministry of Science under contracts TIN2010-21291-C02-02, TIN2012-34557, and by the European HiPEAC Network of Excellence. M. García participated in this work while affiliated with the University of Cantabria.Peer ReviewedPostprint (author's final draft

    Modeling the interaction between TCP and Rate Adaptation

    Get PDF
    In this paper, we model and investigate the interaction between the TCP protocol and rate adaptation at intermediate routers. Rate adaptation aims at saving energy by controlling the offered capacity of links and adapting it to the amount of traffic. However, when TCP is used at the transport layer, the control loop of rate adaptation and one of the TCP congestion control mechanism might interact and disturb each other, compromising throughput and Quality of Service (QoS). Our investigation is lead through mathematical modeling consisting in depicting the behavior of TCP and of rate adaption through a set of Delay Differential Equations (DDEs). The model is validated against simulation results and it is shown to be accurate. The results of the sensitivity analysis of the system performance to control parameters show that rate adaptation can be effective but a careful parameter setting is needed to avoid undesired disruptive interaction among controllers at different levels, that impair QoS
    corecore