15 research outputs found

    OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs

    Full text link
    Multi-tenancy is essential for unleashing SmartNIC's potential in datacenters. Our systematic analysis in this work shows that existing on-path SmartNICs have resource multiplexing limitations. For example, existing solutions lack multi-tenancy capabilities such as performance isolation and QoS provisioning for compute and IO resources. Compared to standard NIC data paths with a well-defined set of offloaded functions, unpredictable execution times of SmartNIC kernels make conventional approaches for multi-tenancy and QoS insufficient. We fill this gap with OSMOSIS, a SmartNICs resource manager co-design. OSMOSIS extends existing OS mechanisms to enable dynamic hardware resource multiplexing on top of the on-path packet processing data plane. We implement OSMOSIS within an open-source RISC-V-based 400Gbit/s SmartNIC. Our performance results demonstrate that OSMOSIS fully supports multi-tenancy and enables broader adoption of SmartNICs in datacenters with low overhead.Comment: 12 pages, 14 figures, 103 reference

    A Comprehensive Study on Off-path SmartNIC

    Full text link
    SmartNIC has recently emerged as an attractive device to accelerate distributed systems. However, there has been no comprehensive characterization of SmartNIC especially on the network part. This paper presents the first comprehensive study of off-path SmartNIC. Our experimental study uncovers the key performance characteristics of the communication among the client, SmartNIC SoC, and the host. We find without considering SmartNIC hardware architecture, communications with it can cause up to 48% bandwidth degradation due to performance anomalies. We also propose implications to address the anomalies.Comment: This is the short version. Full version will appear at OSDI2

    RDMA is Turing complete, we just did not know it yet!

    Full text link
    It is becoming increasingly popular for distributed systems to exploit offload to reduce load on the CPU. Remote Direct Memory Access (RDMA) offload, in particular, has become popular. However, RDMA still requires CPU intervention for complex offloads that go beyond simple remote memory access. As such, the offload potential is limited and RDMA-based systems usually have to work around such limitations. We present RedN, a principled, practical approach to implementing complex RDMA offloads, without requiring any hardware modifications. Using self-modifying RDMA chains, we lift the existing RDMA verbs interface to a Turing complete set of programming abstractions. We explore what is possible in terms of offload complexity and performance with a commodity RDMA NIC. We show how to integrate these RDMA chains into applications, such as the Memcached key-value store, allowing us to offload complex tasks such as key lookups. RedN can reduce the latency of key-value get operations by up to 2.6x compared to state-of-the-art KV designs that use one-sided RDMA primitives (e.g., FaRM-KV), as well as traditional RPC-over-RDMA approaches. Moreover, compared to these baselines, RedN provides performance isolation and, in the presence of contention, can reduce latency by up to 35x while providing applications with failure resiliency to OS and process crashes.Comment: Updated to NSDI 2022 versio

    MEC-Intelligent Agent Support for Low-Latency Data Plane in Private NextG Core

    Full text link
    Private 5G networks will soon be ubiquitous across the future-generation smart wireless access infrastructures hosting a wide range of performance-critical applications. A high-performing User Plane Function (UPF) in the data plane is critical to achieving such stringent performance goals, as it governs fast packet processing and supports several key control-plane operations. Based on a private 5G prototype implementation and analysis, it is imperative to perform dynamic resource management and orchestration at the UPF. This paper leverages Mobile Edge Cloud-Intelligent Agent (MEC-IA), a logically centralized entity that proactively distributes resources at UPF for various service types, significantly reducing the tail latency experienced by the user requests while maximizing resource utilization. Extending the MEC-IA functionality to MEC layers further incurs data plane latency reduction. Based on our extensive simulations, under skewed uRLLC traffic arrival, the MEC-IA assisted bestfit UPF-MEC scheme reduces the worst-case latency of UE requests by up to 77.8% w.r.t. baseline. Additionally, the system can increase uRLLC connectivity gain by 2.40x while obtaining 40% CapEx savings

    Revisiting the Classics: Online RL in the Programmable Dataplane

    Get PDF
    Data-driven networking is becoming more capable and widely researched, partly driven by the efficacy of Deep Reinforcement Learning (DRL) algorithms. Yet the complexity of both DRL inference and learning force these tasks to be pushed away from the dataplane to hosts, harming latency-sensitive applications. Online learning of such policies cannot occur in the dataplane, despite being useful techniques when problems evolve or are hard to model.We present OPaL—On Path Learning—the first work to bring online reinforcement learning to the dataplane. OPaL makes online learning possible in constrained SmartNIC hardware by returning to classical RL techniques—avoiding neural networks. Our design allows weak yet highly parallel SmartNIC NPUs to be competitive against commodity x86 hosts, despite having fewer features and slower cores. Compared to hosts, we achieve a 21 × reduction in 99.99th tail inference times to 34 µs, and 9.9 × improvement in online throughput for real-world policy designs. In-NIC execution eliminates PCIe transfers, and our asynchronous compute model ensures minimal impact on traffic carried by a co-hosted P4 dataplane. OPaL’s design scales with additional resources at compile-time to improve upon both decision latency and throughput, and is quickly reconfigurable at runtime compared to reinstalling device firmware

    High-performance software packet processing

    Full text link
    In today’s Internet, it is highly desirable to have fast and scalable software packet processing solutions for network applications that run on commodity hardware. The advent of cloud computing drives the continued rapid growth of Internet traffic. Moreover, the development of emerging networking techniques, such as Network Function Virtualization, significantly shapes the need for implementing the network functions in software. Finally, with the advancement of modern platforms as well as software frameworks for packet processing, network applications have potential to process 100+ Gbps network traffic on a single commodity server. Representative frameworks include the Click modular router, the RouteBricks scalable routing architecture, and BUFFALO, the software-based Ethernet switch. Beneath this general-purpose routing and switching functionality lie a broad set of network applications, many of which are handled with custom methods to provide cost-effectiveness and flexibility. This thesis considers two long-standing networking applications, IP lookup and distributed denial-of-service (DDoS) mitigation, and proposes efficient software-based methods drawing from this new perspective. In this thesis, we first introduce several optimization techniques to accelerate network applications by taking advantage of modern CPU features. Then, we explore the IP lookup problem to find the longest matching prefix of an IP address in a set of prefixes. An ideal IP lookup algorithm should achieve small constant IP lookup time, and on-chip memory usage. However, no prior IP lookup algorithm achieves both requirements at the same time. We propose SAIL, a splitting approach to IP lookup, and a suite of algorithms for IP lookup based on SAIL framework. We conducted extensive experiments to evaluate our algorithms, and experimental results show that our SAIL algorithms are much faster than well-known IP lookup algorithms. Next, we switch our focus to DDoS, an attempt to disrupt the legitimate traffic of a victim by sending a flood of Internet traffic from different sources. Our solution is Gatekeeper, the first open-source and deployable DDoS mitigation system. We present a series of optimization techniques, including use of modern platforms, group prefetching, coroutines, and hashing, to accelerate Gatekeeper. Experimental results show that these optimization techniques significantly improve its performance over alternative baseline solutions.2022-01-30T00:00:00

    NATRA: Network ACK-Based Traffic Reduction Algorithm

    Get PDF
    Traffic monitoring involves packet capturing and processing at a very high rate of packets per second. Typically, flow records are generated from the packet traffic, such as TCP flow records that feature the number of bytes and packets in each direction, flow duration, number of different ports, and other metrics. Delivering such flow records, about network traffic flowing at tens of Gbps is rather challenging in terms of processing power. To address this problem, traffic thinning can be applied to reduce the input load, by swiftly discarding useless packets at the sniffer NIC or driver level, which effectively reduces the load on software layers that handle traffic processing. This work proposes an algorithm that drops empty ACK packets from TCP traffic, thus achieving a significant reduction in the packets per second that must be handled by each traffic module. The tests discussed below show that the algorithm achieves a 25% decrease in the packets per second rate with minimal information loss.This work was supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund through the projects TRAFICA under Grant MINECO/FEDER TEC2015-69417-C2-1-R and Procesado Inteligente de Tráfico under Grant MINECO/FEDER TEC2015-69417-C2-2-R
    corecore