    P4-CoDel: Experiences on Programmable Data Plane Hardware

    Fixed buffer sizing in computer networks, especially the Internet, is a compromise between latency and bandwidth. A decision in favor of high bandwidth implies larger buffers and sacrifices latency, since those buffers tend to stay full; this phenomenon is called Bufferbloat. Active Queue Management (AQM) algorithms such as CoDel or PIE, designed for use on software-based hosts, offer a flow-agnostic remedy to Bufferbloat by controlling the queue fill level, and hence the latency, through subtle packet drops. In previous work, we have shown that the data plane programming language P4 is powerful enough to implement the CoDel algorithm. While legacy software algorithms can easily be compiled onto almost any processing architecture, this is not generally true for AQM on programmable data plane hardware, i.e., programmable packet processors. In this work, we highlight the corresponding challenges, demonstrate how to tackle them, and provide techniques enabling the implementation of such AQM algorithms on different high-speed P4-programmable data plane hardware targets. In addition, we provide measurement results obtained on different P4-programmable data plane targets. The resulting latency measurements reveal the feasibility of Active Queue Management within these devices, as well as the constraints that must be considered. Finally, we release the source code and instructions to reproduce the results in this paper as open source to the research community.
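
    For context, the drop law that CoDel applies is compact, which is what makes a hardware data plane port plausible in the first place. Below is a minimal, simplified Python sketch of the standard CoDel drop decision (per RFC 8289); it is illustrative only, not the paper's P4 implementation, and the constants are the common defaults rather than values taken from the paper.

        from math import sqrt

        TARGET = 0.005      # 5 ms acceptable standing queue delay
        INTERVAL = 0.100    # 100 ms measurement window

        class CoDelDropper:
            # Minimal, simplified sketch of the CoDel drop decision (RFC 8289).
            # Illustrative only; not the paper's P4 implementation.
            def __init__(self):
                self.first_above_time = 0.0   # when the delay first exceeded TARGET
                self.drop_next = 0.0          # time of the next scheduled drop
                self.count = 0                # drops since entering the dropping state
                self.dropping = False

            def should_drop(self, sojourn_time, now):
                if sojourn_time < TARGET:
                    # Queue delay is acceptable again: leave the dropping state.
                    self.first_above_time = 0.0
                    self.dropping = False
                    return False
                if self.first_above_time == 0.0:
                    # Delay just went above TARGET: wait one INTERVAL before reacting.
                    self.first_above_time = now + INTERVAL
                    return False
                if not self.dropping and now >= self.first_above_time:
                    # Delay stayed above TARGET for a full INTERVAL: start dropping.
                    self.dropping = True
                    self.count = 1
                    self.drop_next = now + INTERVAL / sqrt(self.count)
                    return True
                if self.dropping and now >= self.drop_next:
                    # Persistent congestion: shorten the gap between drops.
                    self.count += 1
                    self.drop_next = now + INTERVAL / sqrt(self.count)
                    return True
                return False

    The decision is driven by each packet's sojourn time (dequeue time minus enqueue timestamp); obtaining such timestamps and computing the square-rooted drop interval per packet are the kinds of operations that are trivial on a software host but constrained on programmable packet processors, which is the gap the paper addresses.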

    RIFO: Pushing the Efficiency of Programmable Packet Schedulers

    Packet scheduling is a fundamental networking task that has recently received renewed attention in the context of programmable data planes. Programmable packet scheduling systems such as those based on the Push-In First-Out (PIFO) abstraction enable flexible scheduling policies, but are too resource-expensive for large-scale line-rate operation. This prompted research into practical programmable schedulers (e.g., SP-PIFO, AIFO) that approximate PIFO behavior on regular hardware. Yet, their scalability remains limited due to the extensive number of memory operations they require. To address this, we design an effective yet resource-efficient packet scheduler, Range-In First-Out (RIFO), which uses only three mutable memory cells and one FIFO queue per PIFO queue. RIFO is based on multi-criteria decision-making principles and uses small guaranteed admission buffers. Our large-scale simulations in Netbench demonstrate that, despite using fewer resources, RIFO generally achieves competitive flow completion times across all studied workloads, and is especially effective in workloads with a significant share of large flows, reducing flow completion time by up to 2.9x in Datamining workloads compared to state-of-the-art solutions. Our prototype implementation using P4 on Tofino switches requires only 650 lines of code, is scalable, and runs at line rate.
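
    The abstract fixes only the resource budget (three mutable memory cells and one FIFO queue per PIFO queue), not the admission rule itself. The Python sketch below is therefore a hypothetical rank-range admission filter in that spirit: the three cells hold the observed rank range and the queue occupancy, a small buffer is always admitted into, and otherwise a packet is admitted only if its rank sits low enough in the observed range relative to the remaining queue space. It illustrates the general single-FIFO idea and is not RIFO's actual algorithm.

        class RangeAdmissionSketch:
            # Hypothetical rank-range admission filter; illustrates a single-FIFO
            # scheduler with a tiny mutable state budget. Not the RIFO algorithm
            # from the paper.
            def __init__(self, capacity, guaranteed=8):
                self.capacity = capacity          # FIFO capacity in packets
                self.guaranteed = guaranteed      # small guaranteed admission buffer
                # The "three mutable cells": rank range bounds and queue occupancy.
                self.min_rank = float("inf")
                self.max_rank = float("-inf")
                self.occupancy = 0

            def admit(self, rank):
                # Update the observed rank range (lower rank = higher priority).
                self.min_rank = min(self.min_rank, rank)
                self.max_rank = max(self.max_rank, rank)
                if self.occupancy >= self.capacity:
                    return False                  # FIFO full: drop regardless of rank
                if self.occupancy < self.guaranteed:
                    self.occupancy += 1
                    return True                   # always admit into the small buffer
                span = max(self.max_rank - self.min_rank, 1e-9)
                relative_rank = (rank - self.min_rank) / span   # 0 = best, 1 = worst
                free_fraction = 1.0 - self.occupancy / self.capacity
                if relative_rank <= free_fraction:
                    self.occupancy += 1           # the fuller the queue, the pickier the filter
                    return True
                return False

            def dequeue(self):
                if self.occupancy > 0:
                    self.occupancy -= 1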

    Enhancing User Experience by Extracting Application Intelligence from Network Traffic

    Internet Service Providers (ISPs) continue to receive complaints from users about poor experience for diverse Internet applications ranging from video streaming and gaming to social media and teleconferencing. Identifying and rectifying the root cause of these experience events requires the ISP to know more than coarse-grained measures like link utilizations and packet losses. Application classification and experience measurement using traditional deep packet inspection (DPI) techniques is starting to fail with the increasing adoption of traffic encryption and is not cost-effective given the explosive growth in traffic rates. This thesis leverages the emerging paradigms of machine learning and programmable networks to design and develop systems that can deliver application-level intelligence to ISPs at a scale, cost, and accuracy not achieved before. This thesis makes four new contributions. Our first contribution develops a novel transformer-based neural network model that classifies applications based on their traffic shape, agnostic to encryption. We show that this approach achieves an F1-score of over 97% for diverse application classes such as video streaming and gaming. Our second contribution builds and validates algorithmic and machine learning models to estimate user experience metrics for on-demand and live video streaming applications, such as bitrate, resolution, buffer states, and stalls. For our third contribution, we analyse ten popular latency-sensitive online multiplayer games and develop data structures and algorithms to rapidly and accurately detect each game using automatically generated signatures. By combining this with active latency measurement and geolocation analysis of the game servers, we help ISPs determine better routing paths to reduce game latency. Our fourth and final contribution develops a prototype of a self-driving network that autonomously intervenes just in time to alleviate the suffering of applications impacted by transient congestion. We design and build a complete system that extracts application-aware network telemetry from programmable switches and dynamically adapts QoS policies to manage the bottleneck resources in an application-fair manner. We show that it outperforms known queue management techniques in various traffic scenarios. Taken together, our contributions allow ISPs to measure and tune their networks in an application-aware manner to offer their users the best possible experience.
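
    To make the first contribution concrete: "traffic shape" here means payload-agnostic per-packet features such as sizes, directions, and timing, which survive encryption. The Python snippet below is a generic sketch of that kind of flow representation, the sort of input a sequence model (transformer or otherwise) can classify; it is not the thesis's actual feature pipeline, and the field names and sequence length are illustrative assumptions.

        from dataclasses import dataclass
        from typing import List, Tuple

        @dataclass
        class Packet:
            timestamp: float   # seconds
            size: int          # bytes on the wire
            outbound: bool     # True if client -> server

        def traffic_shape(packets: List[Packet], max_len: int = 64) -> List[Tuple[float, float, float]]:
            # Encode a flow as a fixed-length sequence of
            # (signed packet size, inter-arrival time, direction) tuples,
            # a payload-agnostic "shape" a sequence model can classify.
            seq = []
            prev_ts = packets[0].timestamp if packets else 0.0
            for p in packets[:max_len]:
                direction = 1.0 if p.outbound else -1.0
                seq.append((direction * p.size, p.timestamp - prev_ts, direction))
                prev_ts = p.timestamp
            # Pad short flows so every sample has the same length.
            seq += [(0.0, 0.0, 0.0)] * (max_len - len(seq))
            return seq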

    Empowering Cloud Data Centers with Network Programmability

    Cloud data centers are a critical infrastructure for modern Internet services such as web search, social networking, and e-commerce. However, the gradual slow-down of Moore's law has put a burden on the growth of data centers' performance and energy efficiency. In addition, the rise of millisecond-scale and microsecond-scale tasks places higher throughput and latency requirements on cloud applications. Today's server-based solutions struggle to meet these performance requirements in many scenarios, such as resource management, scheduling, high-speed traffic monitoring, and testing. In this dissertation, we study these problems from a network perspective. We investigate a new architecture that leverages the programmability of new-generation network switches to improve the performance and reliability of clouds. As programmable switches provide only very limited memory and functionality, we exploit compact data structures and deeply co-design software and hardware to make the best use of the available resources. More specifically, this dissertation presents four systems: (i) NetLock: a new centralized lock management architecture that co-designs programmable switches and servers to simultaneously achieve high performance and rich policy support. It provides orders-of-magnitude higher throughput than existing systems with microsecond-level latency, and supports many commonly used policies such as performance isolation. (ii) HCSFQ: a scalable and practical solution to implement hierarchical fair queueing on commodity hardware at line rate. Instead of relying on a hierarchy of queues with complex queue management, HCSFQ does not keep per-flow state and uses only one queue to achieve hierarchical fair queueing. (iii) AIFO: a new approach to programmable packet scheduling that uses only a single FIFO queue. AIFO utilizes an admission control mechanism to approximate PIFO, which is theoretically ideal but hard to implement with commodity devices. (iv) Lumina: a tool that enables fine-grained analysis of hardware network stacks. By exploiting network programmability to emulate various network scenarios, Lumina helps users understand the micro-behaviors of hardware network stacks.
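
    Of the four systems, AIFO is the easiest to sketch from its one-line description: approximate PIFO with a single FIFO by gating admission on a packet's rank. The Python sketch below shows one simplified way to do that, comparing a packet's rank quantile among recently seen ranks against the queue's free space; the window size and threshold shape are illustrative assumptions rather than the paper's exact mechanism.

        from collections import deque

        class AIFOSketch:
            # Simplified sketch of AIFO-style admission control: keep one FIFO and
            # admit a packet only if its rank is low enough, judged against a
            # sliding window of recent ranks and the queue's remaining space.
            # Parameters are illustrative, not the paper's.
            def __init__(self, capacity, window=32):
                self.capacity = capacity
                self.window = deque(maxlen=window)   # recent ranks (smaller = higher priority)
                self.occupancy = 0

            def admit(self, rank):
                self.window.append(rank)
                if self.occupancy >= self.capacity:
                    return False
                # Fraction of recently seen ranks that are better than this packet's.
                quantile = sum(r < rank for r in self.window) / len(self.window)
                free_fraction = (self.capacity - self.occupancy) / self.capacity
                if quantile <= free_fraction:
                    self.occupancy += 1              # admit into the single FIFO
                    return True
                return False                         # drop: rank too high for remaining space

            def dequeue(self):
                if self.occupancy > 0:
                    self.occupancy -= 1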

    Improving Resource Efficiency in Cloud Computing

    Customers in the cloud computing market are heterogeneous in several aspects, e.g., willingness to pay and performance requirements. By taking advantage of the trade-offs created by these heterogeneities, the service provider can realize a more efficient system. This thesis is concerned with methods to improve the utilization of cloud infrastructure resources, and with the role of pricing in realizing those improvements and leveraging heterogeneity. Towards improving utilization, we explore methods to optimize network usage through traffic engineering. In particular, we introduce a novel optimization framework to decrease the bandwidth required by inter-data-center networks through traffic scheduling and shaping, and then propose algorithms to improve network utilization based on the analytical results derived from the optimization. When considering pricing, we focus on elucidating conditions under which providing a mix of services can increase a service provider's revenue. Specifically, we characterize the conditions under which providing a "delayed" service can result in higher revenue for the service provider, and then offer guidelines for both users and providers.