27 research outputs found

    FlexVC: Flexible virtual channel management in low-diameter networks

    Deadlock avoidance mechanisms for lossless low-distance networks typically increase the order of the virtual channel (VC) index with each hop. This restricts the number of buffer resources that can be used, depending on the routing mechanism, and limits performance because the buffers are used inefficiently. Dynamic buffer organizations increase implementation complexity and only provide small gains in this context, because a significant amount of buffering needs to be allocated statically to avoid congestion. We introduce FlexVC, a simple buffer management mechanism which permits a more flexible use of VCs. It combines statically partitioned buffers, opportunistic routing and a relaxed distance-based deadlock avoidance policy. FlexVC mitigates Head-of-Line blocking and reduces memory requirements by up to 50%. Simulation results in a Dragonfly network show congestion reduction and up to 37.8% throughput improvement, outperforming more complex dynamic approaches. FlexVC merges different flows of traffic in the same buffers, which in some cases makes it more difficult to identify the traffic pattern in order to support nonminimal adaptive routing. An alternative denoted FlexVCminCred improves congestion sensing for adaptive routing by separately tracking packets routed minimally and nonminimally, raising throughput by up to 20.4% with 25% savings in buffer area.
    This work has been supported by the Spanish Government (grant SEV2015-0493 of the Severo Ochoa Program), the Spanish Ministry of Economy, Industry and Competitiveness (contracts TIN2015-65316), the Spanish Research Agency (AEI/FEDER, UE - TIN2016-76635-C2-2-R), the Spanish Ministry of Education (FPU grant FPU13/00337), the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the European Union FP7 programme (RoMoL ERC Advanced Grant GA 321253), the European HiPEAC Network of Excellence and the European Union’s Horizon 2020 research and innovation programme (Mont-Blanc project under grant agreement No 671697).
    Peer reviewed. Postprint (author's final draft).

    Characterizing the Communication Demands of the Graph500 Benchmark on a Commodity Cluster

    Big Data applications have gained importance over the last few years. Such applications focus on the analysis of huge amounts of unstructured information and differ in several ways from traditional High Performance Computing (HPC) applications. To illustrate these differences, this paper analyzes the behavior of the most scalable version of the Graph500 benchmark when run on a state-of-the-art commodity cluster facility. Our work shows that this new computation paradigm stresses the interconnection subsystem. In this work, we provide both analytical and empirical characterizations of the Graph500 benchmark, showing that its communication needs bound the achieved performance on a cluster facility. To the best of our knowledge, our evaluation is the first to consider the impact of message aggregation on the communication overhead and to explore a tradeoff that reduces benchmark execution time, increasing system performance.

    Contention-based Nonminimal Adaptive Routing in High-radix Networks

    Adaptive routing is an efficient congestion avoidance mechanism for modern Datacenter and HPC networks. Congestion detection traditionally relies on the occupancy of the router queues. However, this approach can hinder performance due to coarse-grain measurements with small buffers, and potential routing oscillations with large buffers. We introduce an alternative mechanism, labelled Contention-Based Adaptive Routing. Our mechanism adapts routing based on an estimation of “network contention”, the simultaneity of traffic flows contending for a network port. Our system employs a set of counters which track the demand for each output port. This exploits path diversity thanks to earlier detection of adversarial traffic patterns, and decouples buffer size and queue occupancy from contention detection. We evaluate our mechanism in a Dragonfly network. Our evaluations show that this mechanism achieves optimal latency under uniform traffic and latency similar to the best previous routing mechanisms under adversarial patterns, with immediate adaptation to traffic pattern changes.
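The counter-based idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the class name `ContentionRouter`, the `threshold` parameter, and the rule for picking a nonminimal port are all assumptions made for illustration. The only point it shows is that the routing decision depends on how many packets currently demand a port, not on how full any queue is.

```python
class ContentionRouter:
    """Toy model of one router with a contention counter per output port.

    Hypothetical sketch: names, threshold rule and tie-breaking are
    illustrative assumptions, not taken from the paper.
    """

    def __init__(self, num_ports, threshold):
        self.threshold = threshold
        self.counters = [0] * num_ports  # demand per output port

    def packet_arrives(self, minimal_port):
        # A packet reaching the head of an input queue registers demand
        # for the output port of its minimal path.
        self.counters[minimal_port] += 1

    def packet_leaves(self, minimal_port):
        # Demand is released once the packet has been forwarded.
        self.counters[minimal_port] -= 1

    def choose_port(self, minimal_port, nonminimal_ports):
        # Route minimally while contention stays at or below the threshold.
        if self.counters[minimal_port] <= self.threshold:
            return minimal_port
        # Otherwise misroute through the least contended candidate port.
        return min(nonminimal_ports, key=lambda p: self.counters[p])


router = ContentionRouter(num_ports=4, threshold=2)
for _ in range(3):
    router.packet_arrives(0)      # three packets all want port 0
print(router.choose_port(0, [1, 2]))
```

Because the counters are updated when packets arrive at the head of a queue, the decision does not depend on buffer depth, which is the decoupling the abstract describes.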

    On-the-Fly Adaptive Routing for dragonfly interconnection networks

    Adaptive deadlock-free routing mechanisms are required to handle variable traffic patterns in dragonfly networks. However, distance-based deadlock avoidance mechanisms typically employed in dragonflies increase the router cost and complexity as a function of the maximum allowed path length. This paper presents on-the-fly adaptive routing (OFAR), a routing/flow-control scheme that decouples the routing and the deadlock avoidance mechanisms. OFAR allows for in-transit adaptive routing with local and global misrouting, without imposing dependencies between virtual channels, and relies on a deadlock-free escape subnetwork to avoid deadlock. This model lowers latency, increases throughput, and adapts faster to transient traffic than previously proposed mechanisms. The low capacity of the escape subnetwork makes it prone to congestion. A simple congestion management mechanism based on injection restriction is considered to avoid such issues. Finally, reliability is considered by introducing mechanisms to find multiple edge-disjoint Hamiltonian rings embedded in the dragonfly, which allows the use of multiple escape subnetworks.

    Discrete Gabor Transformatie


    Fair Integrated Scheduling of Unicast and Multicast Traffic in an Input-Queued Switch

    We present a scheme to concurrently schedule unicast and multicast traffic in an input-queued switch. It aims at providing high performance under any mix of the two traffic types as well as at avoiding starvation of any connection. The key idea is to schedule the two traffic types independently and in parallel, and then arbitrate among them for access to the switching fabric. The unicast and multicast matchings are combined into a single integrated matching. Edges that are excluded from the integrated matching are guaranteed to receive service at a later time, thus preventing starvation. We use simulation to evaluate the performance of a system employing the proposed scheme and show that, despite its simplicity, the scheme achieves the intended goals. We also design an enhanced remainder-service policy to achieve better integration and further improve performance.
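The merging step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function name `integrate`, the data layout, and the choice to give multicast edges precedence in the merge are hypothetical. The sketch only shows how two independently computed matchings can be combined while the excluded edges are kept as a remainder to be served later.

```python
def integrate(unicast, multicast):
    """Merge two independently computed matchings for one time slot.

    unicast:   dict mapping input port -> output port (a matching)
    multicast: dict mapping input port -> set of output ports
    Returns (integrated, remainder), both lists of (input, output) edges.

    Assumption for illustration: multicast edges are placed first and
    conflicting unicast edges are deferred. The paper's actual
    arbitration and remainder-service policy may differ.
    """
    integrated = [(i, o) for i, outs in multicast.items() for o in sorted(outs)]
    busy_inputs = set(multicast)
    busy_outputs = {o for outs in multicast.values() for o in outs}

    remainder = []
    for i, o in sorted(unicast.items()):
        if i in busy_inputs or o in busy_outputs:
            # Excluded now, kept so it can be served in a later slot.
            remainder.append((i, o))
        else:
            integrated.append((i, o))
            busy_inputs.add(i)
            busy_outputs.add(o)
    return integrated, remainder


integrated, remainder = integrate({0: 1, 1: 2, 2: 0}, {3: {2, 3}})
print(integrated)  # edges sent through the fabric in this slot
print(remainder)   # edges deferred to a later slot
```

Carrying the remainder into the next slot with priority is one way to obtain the starvation-freedom property the abstract mentions; how that priority is assigned is the job of the remainder-service policy.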

    Speculative Flow Control for High-Radix Datacenter Interconnect Routers

    High-radix switches are desirable building blocks for large computer interconnection networks, because they are more suitable to convert chip I/O bandwidth into low latency and low cost than low-radix switches [10]. Unfortunately, most existing switch architectures do not scale well to a large number of ports. For example, the complexity of the buffered crossbar architecture scales quadratically with the number of ports. Compounded with support for long round-trip times and many virtual channels, the overall buffer requirements limit the feasibility of such switches to modest port counts. Compromising on the buffer sizing leads to a drastic increase in latency and reduction in throughput, as long as traditional credit flow control is employed at the link level. We propose a novel link-level flow control protocol that enables high-performance scalable routers based on the increasingly popular buffered crossbar architecture to scale to higher port counts without sacrificing performance. By combining credited and speculative transmission, this scheme achieves reliable delivery, low latency, and high throughput, even with crosspoint buffers that are significantly smaller than the round-trip time.

    Performance evaluation of the Data Vortex photonic switch
