112 research outputs found

    RIFO: Pushing the Efficiency of Programmable Packet Schedulers

    Full text link
    Packet scheduling is a fundamental networking task that recently received renewed attention in the context of programmable data planes. Programmable packet scheduling systems such as those based on Push-In First-Out (PIFO) abstraction enabled flexible scheduling policies, but are too resource-expensive for large-scale line rate operation. This prompted research into practical programmable schedulers (e.g., SP-PIFO, AIFO) approximating PIFO behavior on regular hardware. Yet, their scalability remains limited due to extensive number of memory operations. To address this, we design an effective yet resource-efficient packet scheduler, Range-In First-Out (RIFO), which uses only three mutable memory cells and one FIFO queue per PIFO queue. RIFO is based on multi-criteria decision-making principles and uses small guaranteed admission buffers. Our large-scale simulations in Netbench demonstrate that despite using fewer resources, RIFO generally achieves competitive flow completion times across all studied workloads, and is especially effective in workloads with a significant share of large flows, reducing flow completion time up to 2.9x in Datamining workloads compared to state-of-the-art solutions. Our prototype implementation using P4 on Tofino switches requires only 650 lines of code, is scalable, and runs at line rate

    P4-CoDel: Experiences on Programmable Data Plane Hardware

    Full text link
    Fixed buffer sizing in computer networks, especially the Internet, is a compromise between latency and bandwidth. A decision in favor of high bandwidth, implying larger buffers, subordinates the latency as a consequence of constantly filled buffers. This phenomenon is called Bufferbloat. Active Queue Management (AQM) algorithms such as CoDel or PIE, designed for the use on software based hosts, offer a flow agnostic remedy to Bufferbloat by controlling the queue filling and hence the latency through subtle packet drops. In previous work, we have shown that the data plane programming language P4 is powerful enough to implement the CoDel algorithm. While legacy software algorithms can be easily compiled onto almost any processing architecture, this is not generally true for AQM on programmable data plane hardware, i.e., programmable packet processors. In this work, we highlight corresponding challenges, demonstrate how to tackle them, and provide techniques enabling the implementation of such AQM algorithms on different high speed P4-programmable data plane hardware targets. In addition, we provide measurement results created on different P4-programmable data plane targets. The resulting latency measurements reveal the feasibility and the constraints to be considered to perform Active Queue Management within these devices. Finally, we release the source code and instructions to reproduce the results in this paper as open source to the research community

    Everything Matters in Programmable Packet Scheduling

    Full text link
    Programmable packet scheduling allows the deployment of scheduling algorithms into existing switches without need for hardware redesign. Scheduling algorithms are programmed by tagging packets with ranks, indicating their desired priority. Programmable schedulers then execute these algorithms by serving packets in the order described in their ranks. The ideal programmable scheduler is a Push-In First-Out (PIFO) queue, which achieves perfect packet sorting by pushing packets into arbitrary positions in the queue, while only draining packets from the head. Unfortunately, implementing PIFO queues in hardware is challenging due to the need to arbitrarily sort packets at line rate based on their ranks. In the last years, various techniques have been proposed, approximating PIFO behaviors using the available resources of existing data planes. While promising, approaches to date only approximate one of the characteristic behaviors of PIFO queues (i.e., its scheduling behavior, or its admission control). We propose PACKS, the first programmable scheduler that fully approximates PIFO queues on all their behaviors. PACKS does so by smartly using a set of strict-priority queues. It uses packet-rank information and queue-occupancy levels at enqueue to decide: whether to admit packets to the scheduler, and how to map admitted packets to the different queues. We fully implement PACKS in P4 and evaluate it on real workloads. We show that PACKS: better-approximates PIFO than state-of-the-art approaches and scales. We also show that PACKS runs at line rate on existing hardware (Intel Tofino).Comment: 12 pages, 12 figures (without references and appendices

    Design and implementation of a belief-propagation scheduler for multicast traffic in input-queued switches

    Get PDF
    Scheduling multicast traffic in input-queued switches to maximize throughput requires solving a hard combinatorial optimization problem in a very short time. This task advocates the design of algorithms that are simple to implement and efficient in terms of performance. We propose a new scheduling algorithm, based on message passing and inspired by the belief propagation paradigm, meant to approximate the provably-optimal scheduling policy for multicast traffic. We design and implement both a software and a hardware version of the algorithm, the latter running on a NetFPGA. We compare the performance and the power consumption of the two versions when integrated in a software router. Our main findings are that our algorithm outperforms other centralized greedy scheduling policies, achieving a better tradeoff between complexity and performance, and it is amenable to practical high-performance implementations

    Formal Abstractions for Packet Scheduling

    Get PDF
    This paper studies PIFO trees from a programming language perspective. PIFO trees are a recently proposed model for programmable packet schedulers. They can express a wide range of scheduling algorithms including strict priority, weighted fair queueing, hierarchical schemes, and more. However, their semantic properties are not well understood. We formalize the syntax and semantics of PIFO trees in terms of an operational model. We also develop an alternate semantics in terms of permutations on lists of packets, prove theorems characterizing expressiveness, and develop an embedding algorithm for replicating the behavior of one with another. We present a prototype implementation of PIFO trees in OCaml and relate its behavior to a hardware switch on a variety of standard and novel scheduling algorithms.Comment: 25 pages, 12 figure

    Empowering Cloud Data Centers with Network Programmability

    Get PDF
    Cloud data centers are a critical infrastructure for modern Internet services such as web search, social networking and e-commerce. However, the gradual slow-down of Moore’s law has put a burden on the growth of data centers’ performance and energy efficiency. In addition, the increasing of millisecond-scale and microsecond-scale tasks also bring higher requirements to the throughput and latency for the cloud applications. Today’s server-based solutions are hard to meet the performance requirements in many scenarios like resource management, scheduling, high-speed traffic monitoring and testing. In this dissertation, we study these problems from a network perspective. We investigate a new architecture that leverages the programmability of new-generation network switches to improve the performance and reliability of clouds. As programmable switches only provide very limited memory and functionalities, we exploit compact data structures and deeply co-design software and hardware to best utilize the resource. More specifically, this dissertation presents four systems: (i) NetLock: A new centralized lock management architecture that co-designs programmable switches and servers to simultaneously achieve high performance and rich policy support. It provides orders-of-magnitude higher throughput than existing systems with microsecond-level latency, and supports many commonly-used policies such as performance isolation. (ii) HCSFQ: A scalable and practical solution to implement hierarchical fair queueing on commodity hardware at line rate. Instead of relying on a hierarchy of queues with complex queue management, HCSFQ does not keep per-flow states and uses only one queue to achieve hierarchical fair queueing. (iii) AIFO: A new approach for programmable packet scheduling that only uses a single FIFO queue. AIFO utilizes an admission control mechanism to approximate PIFO which is theoretically ideal but hard to implement with commodity devices. (iv) Lumina: A tool that enables fine-grained analysis of hardware network stack. By exploiting network programmability to emulate various network scenarios, Lumina is able to help users understand the micro-behaviors of hardware network stacks

    Mitigating the Performance Impact of Network Failures in Public Clouds

    Full text link
    Some faults in data center networks require hours to days to repair because they may need reboots, re-imaging, or manual work by technicians. To reduce traffic impact, cloud providers \textit{mitigate} the effect of faults, for example, by steering traffic to alternate paths. The state-of-art in automatic network mitigations uses simple safety checks and proxy metrics to determine mitigations. SWARM, the approach described in this paper, can pick orders of magnitude better mitigations by estimating end-to-end connection-level performance (CLP) metrics. At its core is a scalable CLP estimator that quickly ranks mitigations with high fidelity and, on failures observed at a large cloud provider, outperforms the state-of-the-art by over 700Ă—\times in some cases

    Performance Analysis of RR and FQ Algorithms in Reconfigurable Routers

    Get PDF
    Currently, we are witnessing a trend in network routers to include reconfigurable hardware structures to provide flexibility at improved performance levels when compared to software-only implementations. This permits the run-time reconfiguration of the hardware resources, i.e., to change their functionality (for example, from one scheduling algorithm to another), to adapt to changing network scenarios. In particular, different scheduling algorithms are more efficient in handling a specific mix of incoming packet traffic in terms of various criteria (e.g., delay, jitter, throughput, and packet loss). Therefore, reconfigurable hardware is able to provide improved performance levels and to allow more efficient algorithms to be utilized when different incoming packet traffic patterns are encountered. This project investigates the possibilities to improve upon end-to-end delays, jitter, throughput, and packet loss by exploiting the availability of a flexible hardware structure such as an field-programmable gate array (FPGA). The aim of the project is to provide an overview on adaptive scheduling using reconfigurable hardware. Consequently, we investigate different scheduling algorithms that provide QoS provisioning for traffic streams that are sensitive to packet delay and jitter, e.g., mpeg video traffic. The investigation utilizes the NS-2 simulator for which we generate realistic network scenarios. Our approach is based on understanding which kind of traffic is passing in the network, and subsequently change the scheduling algorithm accordingly in the core router to meet specific performance requirements. The investigated scheduling algorithms are taken from two well-known families, i.e., Round Robin (RR) and Fair Queuing (FQ). Our investigation confirmed the idea on the behavior of the two investigated scheduling algorithm: WFQ outperforms WRR in terms of end-to-end delay, jitter and throughput but it is more expensive than it at a computational level. Nonetheless, it is possible to find a tradeoff between the required area in FPGA and the level of performance desired for a kind of stream

    Self-Evaluation Applied Mathematics 2003-2008 University of Twente

    Get PDF
    This report contains the self-study for the research assessment of the Department of Applied Mathematics (AM) of the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) at the University of Twente (UT). The report provides the information for the Research Assessment Committee for Applied Mathematics, dealing with mathematical sciences at the three universities of technology in the Netherlands. It describes the state of affairs pertaining to the period 1 January 2003 to 31 December 2008
    • …
    corecore