2 research outputs found

    A Fast and Verified Algorithm for Proving Store-and-Forward Networks Deadlock-Free

    Contains fulltext: 91481.pdf (author's version) (Open Access). 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP), Ayia Napa, Cyprus, 9-11 Feb. 2011, 9 February 201

    RDMA over Commodity Ethernet at Scale

    ABSTRACT Over the past one and a half years, we have been using RDMA over commodity Ethernet (RoCEv2) to support some of Microsoft's highly reliable, latency-sensitive services. This paper describes the challenges we encountered during the process and the solutions we devised to address them. In order to scale RoCEv2 beyond VLAN, we have designed a DSCP-based priority flow-control (PFC) mechanism to ensure large-scale deployment. We have addressed the safety challenges brought by PFC-induced deadlock (yes, it happened!), RDMA transport livelock, and the NIC PFC pause frame storm problem. We have also built the monitoring and management systems to make sure RDMA works as expected. Our experiences show that the safety and scalability issues of running RoCEv2 at scale can all be addressed, and RDMA can replace TCP for intra-data-center communications and achieve low latency, low CPU overhead, and high throughput.