7 research outputs found
P4-CoDel: Experiences on Programmable Data Plane Hardware
Fixed buffer sizing in computer networks, especially the Internet, is a
compromise between latency and bandwidth. A decision in favor of high
bandwidth implies larger buffers and sacrifices latency, since the buffers
remain constantly filled. This phenomenon is called Bufferbloat. Active
Queue Management (AQM) algorithms such as CoDel or PIE, designed for use on
software-based hosts, offer a flow-agnostic remedy to Bufferbloat by
controlling the queue fill level, and hence the latency, through subtle packet
drops. In previous work, we have shown that the data plane programming language
P4 is powerful enough to implement the CoDel algorithm. While legacy software
algorithms can be easily compiled onto almost any processing architecture, this
is not generally true for AQM on programmable data plane hardware, i.e.,
programmable packet processors. In this work, we highlight the corresponding
challenges, demonstrate how to tackle them, and provide techniques enabling the
implementation of such AQM algorithms on different high-speed P4-programmable
data plane hardware targets. In addition, we provide measurement results
obtained on different P4-programmable data plane targets. The resulting latency
measurements reveal the feasibility of, and the constraints to be considered
when, performing Active Queue Management within these devices. Finally, we
release the source code and instructions to reproduce the results in this paper
as open source to the research community.
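The CoDel control law the abstract refers to can be illustrated in a few lines. The sketch below follows the well-known reference pseudocode from RFC 8289 (target sojourn time, sliding interval, drop spacing shrinking with the square root of the drop count); it is a software illustration of the algorithm, not the paper's P4 implementation, and the constants are the RFC defaults rather than anything target-specific.

```python
TARGET = 0.005    # target sojourn time: 5 ms (RFC 8289 default)
INTERVAL = 0.100  # sliding window: 100 ms (RFC 8289 default)

class CoDel:
    def __init__(self):
        self.first_above_time = 0.0  # when delay first stayed above TARGET
        self.drop_next = 0.0         # time of the next scheduled drop
        self.count = 0               # drops in the current dropping phase
        self.dropping = False

    def control_law(self, t):
        # Drop spacing shrinks with the square root of the drop count,
        # pushing the queue delay back toward TARGET.
        return t + INTERVAL / (self.count ** 0.5)

    def should_drop(self, sojourn_time, now):
        """Decide, per dequeued packet, whether to drop it."""
        if sojourn_time < TARGET:
            # Queue delay is acceptable; leave the dropping state.
            self.first_above_time = 0.0
            self.dropping = False
            return False
        if self.first_above_time == 0.0:
            # Delay just crossed TARGET; arm a one-INTERVAL grace period.
            self.first_above_time = now + INTERVAL
            return False
        if not self.dropping and now >= self.first_above_time:
            # Delay stayed above TARGET for a full INTERVAL: start dropping.
            self.dropping = True
            self.count = 1
            self.drop_next = self.control_law(now)
            return True
        if self.dropping and now >= self.drop_next:
            self.count += 1
            self.drop_next = self.control_law(self.drop_next)
            return True
        return False
```

A key challenge the paper addresses is that this state machine assumes per-packet sojourn timestamps and mutable state at dequeue time, which programmable hardware targets do not always expose in this form.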
RIFO: Pushing the Efficiency of Programmable Packet Schedulers
Packet scheduling is a fundamental networking task that has recently received
renewed attention in the context of programmable data planes. Programmable
packet scheduling systems, such as those based on the Push-In First-Out (PIFO)
abstraction, enable flexible scheduling policies but are too
resource-expensive for large-scale line-rate operation. This prompted research
into practical programmable schedulers (e.g., SP-PIFO, AIFO) that approximate
PIFO behavior on regular hardware. Yet their scalability remains limited by
the extensive number of memory operations they require. To address this, we
design an effective yet resource-efficient packet scheduler, Range-In
First-Out (RIFO), which uses only three mutable memory cells and one FIFO
queue per PIFO queue. RIFO is based on multi-criteria decision-making
principles and uses small guaranteed admission buffers. Our large-scale
simulations in Netbench demonstrate that, despite using fewer resources, RIFO
generally achieves competitive flow completion times across all studied
workloads, and is especially effective in workloads with a significant share
of large flows, reducing flow completion time by up to 2.9x in Datamining
workloads compared to state-of-the-art solutions. Our prototype implementation
in P4 on Tofino switches requires only 650 lines of code, is scalable, and
runs at line rate.
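To make the "three mutable memory cells and one FIFO" idea concrete, here is a heavily hedged sketch of a range-based admission check in that spirit. The three cells (minimum rank seen, maximum rank seen, and a queue-occupancy counter) come from the abstract; the specific decision function and the weight `W` below are illustrative assumptions, not RIFO's published formula.

```python
W = 1.0  # assumed weight trading rank position against queue occupancy

class RangeAdmission:
    """Illustrative range-based admission in front of a single FIFO."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.min_rank = None  # cell 1: smallest rank observed
        self.max_rank = None  # cell 2: largest rank observed
        self.occupancy = 0    # cell 3: current FIFO occupancy

    def admit(self, rank):
        # Update the observed rank range (two of the three memory cells).
        if self.min_rank is None:
            self.min_rank, self.max_rank = rank, rank
        self.min_rank = min(self.min_rank, rank)
        self.max_rank = max(self.max_rank, rank)
        if self.occupancy >= self.capacity:
            return False  # FIFO full: drop
        span = self.max_rank - self.min_rank
        relative = 0.0 if span == 0 else (rank - self.min_rank) / span
        # Admit low (high-priority) relative ranks; the threshold tightens
        # as the queue fills, so high-rank packets are dropped early.
        threshold = 1.0 - W * (self.occupancy / self.capacity)
        if relative <= threshold:
            self.occupancy += 1  # enqueue into the single FIFO
            return True
        return False
```

The appeal of this shape of design on hardware is that each packet touches only a constant number of register cells, whereas a true PIFO needs priority-ordered insertion.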
Enhancing User Experience by Extracting Application Intelligence from Network Traffic
Internet Service Providers (ISPs) continue to receive complaints from users about poor experience with diverse Internet applications, ranging from video streaming and gaming to social media and teleconferencing. Identifying and rectifying the root cause of these poor-experience events requires the ISP to know more than coarse-grained measures like link utilization and packet loss. Application classification and experience measurement using traditional deep packet inspection (DPI) techniques is starting to fail with the increasing adoption of traffic encryption, and is not cost-effective given the explosive growth in traffic rates. This thesis leverages the emerging paradigms of machine learning and programmable networks to design and develop systems that deliver application-level intelligence to ISPs at a scale, cost, and accuracy not achieved before.
This thesis makes four new contributions. Our first contribution develops a novel transformer-based neural network model that classifies applications based on their traffic shape, agnostic to encryption. We show that this approach achieves an F1-score above 97% for diverse application classes such as video streaming and gaming. Our second contribution builds and validates algorithmic and machine learning models to estimate user experience metrics for on-demand and live video streaming applications, such as bitrate, resolution, buffer states, and stalls. For our third contribution, we analyse ten popular latency-sensitive online multiplayer games and develop data structures and algorithms to rapidly and accurately detect each game using automatically generated signatures. By combining this with active latency measurement and geolocation analysis of the game servers, we help ISPs determine better routing paths to reduce game latency. Our fourth and final contribution develops a prototype of a self-driving network that autonomously intervenes just in time to relieve applications being impacted by transient congestion. We design and build a complete system that extracts application-aware network telemetry from programmable switches and dynamically adapts QoS policies to manage bottleneck resources in an application-fair manner. We show that it outperforms known queue management techniques in various traffic scenarios. Taken together, our contributions allow ISPs to measure and tune their networks in an application-aware manner to offer their users the best possible experience.
Empowering Cloud Data Centers with Network Programmability
Cloud data centers are a critical infrastructure for modern Internet services such as web search, social networking, and e-commerce. However, the gradual slow-down of Moore's law has put a burden on the growth of data centers' performance and energy efficiency. In addition, the increasing prevalence of millisecond-scale and microsecond-scale tasks places higher throughput and latency requirements on cloud applications. Today's server-based solutions struggle to meet these performance requirements in many scenarios, such as resource management, scheduling, and high-speed traffic monitoring and testing.
In this dissertation, we study these problems from a network perspective. We investigate a new architecture that leverages the programmability of new-generation network switches to improve the performance and reliability of clouds. As programmable switches provide only very limited memory and functionality, we exploit compact data structures and deeply co-design software and hardware to make the best use of these resources. More specifically, this dissertation presents four systems:
(i) NetLock: A new centralized lock management architecture that co-designs programmable switches and servers to simultaneously achieve high performance and rich policy support. It provides orders-of-magnitude higher throughput than existing systems with microsecond-level latency, and supports many commonly-used policies such as performance isolation.
(ii) HCSFQ: A scalable and practical solution to implement hierarchical fair queueing on commodity hardware at line rate. Instead of relying on a hierarchy of queues with complex queue management, HCSFQ does not keep per-flow states and uses only one queue to achieve hierarchical fair queueing.
(iii) AIFO: A new approach to programmable packet scheduling that uses only a single FIFO queue. AIFO utilizes an admission control mechanism to approximate PIFO, which is theoretically ideal but hard to implement on commodity devices.
(iv) Lumina: A tool that enables fine-grained analysis of hardware network stacks. By exploiting network programmability to emulate various network scenarios, Lumina helps users understand the micro-behaviors of hardware network stacks.
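The AIFO idea in (iii) — approximating PIFO order with one FIFO plus admission control — can be sketched compactly. The sliding-window quantile and the headroom comparison below follow the high-level description above; the window size and the exact threshold shape are illustrative assumptions, not the dissertation's tuned design.

```python
from collections import deque

class AifoLikeQueue:
    """Single FIFO with quantile-based admission, in the spirit of AIFO."""

    def __init__(self, capacity, window=16):
        self.capacity = capacity
        self.fifo = deque()
        self.recent = deque(maxlen=window)  # sliding window of recent ranks

    def enqueue(self, rank, packet):
        if len(self.fifo) >= self.capacity:
            return False  # queue full: drop
        # Quantile of this packet's rank among recently observed ranks
        # (lower rank = higher priority).
        if self.recent:
            quantile = sum(r <= rank for r in self.recent) / len(self.recent)
        else:
            quantile = 0.0
        self.recent.append(rank)
        # Admit only if the quantile fits the remaining headroom: a nearly
        # full queue accepts only the lowest-rank packets, so the FIFO's
        # contents approximate what a PIFO would have kept.
        headroom = 1.0 - len(self.fifo) / self.capacity
        if quantile <= headroom:
            self.fifo.append(packet)
            return True
        return False
```

The design choice this illustrates is shared with RIFO above: replacing PIFO's sorted insertion with a constant-time admission test is what makes line-rate operation on commodity switch pipelines feasible.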
Improving Resource Efficiency in Cloud Computing
Customers in the cloud computing market are heterogeneous in several aspects, e.g., willingness to pay and performance requirements. By taking advantage of the trade-offs created by these heterogeneities, the service provider can realize a more efficient system. This thesis is concerned with methods to improve the utilization of cloud infrastructure resources, and with the role of pricing in realizing those improvements and leveraging heterogeneity. Towards improving utilization, we explore methods to optimize network usage through traffic engineering. In particular, we introduce a novel optimization framework to decrease the bandwidth required by inter-data center networks through traffic scheduling and shaping, and then propose algorithms to improve network utilization based on the analytical results derived from the optimization. When considering pricing, we focus on elucidating conditions under which providing a mix of services can increase a service provider's revenue. Specifically, we characterize the conditions under which providing a "delayed" service can result in higher revenue for the service provider, and then offer guidelines for both users and providers.