Revisiting the Classics: Online RL in the Programmable Dataplane

Abstract

Data-driven networking is becoming more capable and widely researched, driven in part by the efficacy of Deep Reinforcement Learning (DRL) algorithms. Yet the complexity of both DRL inference and learning forces these tasks to be pushed out of the dataplane and onto hosts, harming latency-sensitive applications. Online learning of such policies cannot occur in the dataplane, despite being a useful technique when problems evolve or are hard to model. We present OPaL—On Path Learning—the first work to bring online reinforcement learning to the dataplane. OPaL makes online learning possible on constrained SmartNIC hardware by returning to classical RL techniques—avoiding neural networks. Our design allows weak yet highly parallel SmartNIC NPUs to be competitive with commodity x86 hosts, despite having fewer features and slower cores. Compared to hosts, we achieve a 21× reduction in 99.99th-percentile tail inference times to 34 µs, and a 9.9× improvement in online throughput for real-world policy designs. In-NIC execution eliminates PCIe transfers, and our asynchronous compute model ensures minimal impact on traffic carried by a co-hosted P4 dataplane. OPaL's design scales with additional resources at compile time to improve both decision latency and throughput, and is quickly reconfigurable at runtime compared to reinstalling device firmware.
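To make concrete what "classical RL techniques—avoiding neural networks" can look like in this setting, the sketch below shows a generic tabular SARSA update in C. This is an illustration only: the abstract does not specify OPaL's exact algorithm, state encoding, or action set, so all names, table sizes, and constants here are hypothetical. The point is that constant-time table lookups and a single multiply-add per decision are the kind of operations that fit weak, highly parallel NPU cores.

```c
/* Hypothetical sketch of a classical (non-neural) online RL update.
 * Illustrative only: OPaL's actual algorithm, state/action encoding,
 * and arithmetic (e.g. fixed-point on NPUs) are not given in the abstract. */
#include <stdio.h>

#define N_STATES  64   /* e.g. a quantised congestion/queue signal      */
#define N_ACTIONS 4    /* e.g. a small set of candidate rate choices    */

static float Q[N_STATES][N_ACTIONS];   /* value table small enough for NIC memory */

/* One online SARSA step: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a)). */
static void sarsa_update(int s, int a, float r, int s2, int a2,
                         float alpha, float gamma)
{
    float td_error = r + gamma * Q[s2][a2] - Q[s][a];
    Q[s][a] += alpha * td_error;
}

/* Greedy action selection over the small, fixed action set. */
static int greedy_action(int s)
{
    int best = 0;
    for (int a = 1; a < N_ACTIONS; a++)
        if (Q[s][a] > Q[s][best])
            best = a;
    return best;
}

int main(void)
{
    /* Toy step: act in state 3, observe reward 1.0, land in state 5. */
    int s = 3, a = greedy_action(s);
    int s2 = 5, a2 = greedy_action(s2);
    sarsa_update(s, a, 1.0f, s2, a2, 0.1f, 0.9f);
    printf("Q[3][%d] = %f\n", a, Q[3][a]);
    return 0;
}
```

Because both inference (a table lookup plus an argmax over a few actions) and learning (one temporal-difference update) are branch-light and bounded, such updates can plausibly run per-decision on NIC cores without the neural-network inference that normally forces RL off the fast path.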
