RIFO: Pushing the Efficiency of Programmable Packet Schedulers
Packet scheduling is a fundamental networking task that recently received
renewed attention in the context of programmable data planes. Programmable
packet scheduling systems, such as those based on the Push-In First-Out (PIFO)
abstraction, enable flexible scheduling policies but are too
resource-expensive for large-scale line-rate operation. This prompted research
into practical programmable schedulers (e.g., SP-PIFO, AIFO) approximating PIFO
behavior on regular hardware. Yet their scalability remains limited by the
large number of memory operations they require. To address this, we design an effective
yet resource-efficient packet scheduler, Range-In First-Out (RIFO), which uses
only three mutable memory cells and one FIFO queue per PIFO queue. RIFO is
based on multi-criteria decision-making principles and uses small guaranteed
admission buffers. Our large-scale simulations in Netbench demonstrate that
despite using fewer resources, RIFO generally achieves competitive flow
completion times across all studied workloads, and is especially effective in
workloads with a significant share of large flows, reducing flow completion
time by up to 2.9x in Datamining workloads compared to state-of-the-art solutions.
Our prototype implementation using P4 on Tofino switches requires only 650
lines of code, is scalable, and runs at line rate.
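The range idea behind RIFO can be illustrated with a minimal Python sketch of admission control in front of a single FIFO. It assumes, for illustration only, that the three mutable cells hold the smallest rank seen, the largest rank seen, and a packet counter that periodically resets the tracked range, and that once the guaranteed admission buffer is full, a packet is admitted only if its normalized position within the rank range is no worse than the free share of the remaining buffer. The class name `RangeAdmission` and the parameters `capacity`, `guaranteed`, and `reset_every` are hypothetical, and the published RIFO rule may differ in its details.

```python
from typing import Optional


class RangeAdmission:
    """Hypothetical RIFO-style admission for one FIFO queue.

    Only three mutable values are kept: the minimum rank, the maximum rank,
    and a packet counter used to reset the tracked range periodically.
    """

    def __init__(self, capacity: int, guaranteed: int, reset_every: int) -> None:
        self.capacity = capacity        # FIFO capacity in packets
        self.guaranteed = guaranteed    # below this occupancy every packet is admitted
        self.reset_every = reset_every  # packets between range resets (assumed policy)
        self.min_rank: Optional[int] = None
        self.max_rank: Optional[int] = None
        self.count = 0

    def _track(self, rank: int) -> None:
        # Start a fresh range every `reset_every` packets so it can follow rank drift.
        if self.count >= self.reset_every:
            self.min_rank, self.max_rank, self.count = rank, rank, 0
        self.count += 1
        self.min_rank = rank if self.min_rank is None else min(self.min_rank, rank)
        self.max_rank = rank if self.max_rank is None else max(self.max_rank, rank)

    def admit(self, rank: int, occupancy: int) -> bool:
        self._track(rank)               # every arrival updates the range, even if dropped
        if occupancy >= self.capacity:
            return False                # queue full
        if occupancy < self.guaranteed:
            return True                 # guaranteed admission buffer
        span = self.max_rank - self.min_rank
        # Relative position of the rank in [min, max]: 0 is best, 1 is worst.
        position = 0.0 if span == 0 else (rank - self.min_rank) / span
        # Share of the non-guaranteed buffer that is still free.
        free = (self.capacity - occupancy) / (self.capacity - self.guaranteed)
        return position <= free
```

With `capacity=10` and `guaranteed=2`, a rank near the top of the observed range is rejected at occupancy 8, while a rank near the bottom is still admitted. Only comparisons and one division per packet are needed, which is the kind of small, fixed state a switch pipeline can hold.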
P4-CoDel: Experiences on Programmable Data Plane Hardware
Fixed buffer sizing in computer networks, especially the Internet, is a
compromise between latency and bandwidth. A decision in favor of high
bandwidth, implying larger buffers, sacrifices latency because the buffers stay
constantly filled. This phenomenon is called Bufferbloat. Active
Queue Management (AQM) algorithms such as CoDel or PIE, designed for use on
software-based hosts, offer a flow-agnostic remedy to Bufferbloat by
controlling the queue filling and hence the latency through subtle packet
drops. In previous work, we have shown that the data plane programming language
P4 is powerful enough to implement the CoDel algorithm. While legacy software
algorithms can be easily compiled onto almost any processing architecture, this
is not generally true for AQM on programmable data plane hardware, i.e.,
programmable packet processors. In this work, we highlight corresponding
challenges, demonstrate how to tackle them, and provide techniques enabling the
implementation of such AQM algorithms on different high speed P4-programmable
data plane hardware targets. In addition, we provide measurement results
obtained on different P4-programmable data plane targets. The resulting latency
measurements demonstrate the feasibility of Active Queue Management within these
devices and reveal the constraints to be considered. Finally, we release the
source code and instructions to reproduce the results in this paper as open
source to the research community.
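For readers unfamiliar with CoDel, the Python sketch below models its core drop decision as specified in RFC 8289: a packet becomes eligible for dropping once the queueing (sojourn) delay has stayed at or above `target` for at least `interval`, and successive drops are then spaced by `interval / sqrt(count)`. It is a simplified software model evaluated once per dequeued packet, not the P4 implementation described above; the class name is hypothetical, and RFC details such as the byte-based MTU check and reuse of `count` when re-entering the drop state are deliberately omitted.

```python
import math


class CoDelState:
    """Simplified CoDel drop decision, evaluated once per dequeued packet.

    Uses the RFC 8289 control law (target, interval, interval / sqrt(count)),
    but omits the MTU check and the count reuse on re-entering the drop state.
    """

    def __init__(self, target: float = 0.005, interval: float = 0.100) -> None:
        self.target = target          # acceptable standing queue delay (s)
        self.interval = interval      # time the delay must stay high before dropping (s)
        self.first_above_time = 0.0   # 0.0 means "delay currently below target"
        self.dropping = False
        self.drop_next = 0.0
        self.count = 0

    def _control_law(self, t: float) -> float:
        # Spacing between drops shrinks with the square root of the drop count.
        return t + self.interval / math.sqrt(self.count)

    def on_dequeue(self, now: float, sojourn: float) -> bool:
        """Return True if the packet leaving the queue at `now` should be dropped."""
        ok_to_drop = False
        if sojourn < self.target:
            self.first_above_time = 0.0
        elif self.first_above_time == 0.0:
            self.first_above_time = now + self.interval
        elif now >= self.first_above_time:
            ok_to_drop = True

        if self.dropping:
            if not ok_to_drop:
                self.dropping = False     # delay recovered: leave the drop state
                return False
            if now >= self.drop_next:
                self.count += 1
                self.drop_next = self._control_law(self.drop_next)
                return True
            return False

        if ok_to_drop:
            self.dropping = True
            self.count = 1
            self.drop_next = self._control_law(now)
            return True
        return False
```

The defaults of 5 ms and 100 ms are the values recommended in RFC 8289. The square-root spacing is what makes the drops "subtle": the drop rate rises slowly while the standing delay persists.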
Everything Matters in Programmable Packet Scheduling
Programmable packet scheduling allows the deployment of scheduling algorithms
into existing switches without the need for hardware redesign. Scheduling
algorithms are programmed by tagging packets with ranks, indicating their
desired priority. Programmable schedulers then execute these algorithms by
serving packets in the order given by their ranks.
The ideal programmable scheduler is a Push-In First-Out (PIFO) queue, which
achieves perfect packet sorting by pushing packets into arbitrary positions in
the queue, while only draining packets from the head. Unfortunately,
implementing PIFO queues in hardware is challenging due to the need to
arbitrarily sort packets at line rate based on their ranks.
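A software model makes the PIFO abstraction concrete. The hypothetical `PIFO` class below keeps packets sorted by rank with a binary heap, so `push` can place a packet anywhere in the order while `pop` only ever removes the head. Lower rank means higher priority here, which is a convention assumed for the example. A heap is trivial in software, but this arbitrary sorted insertion is exactly what is hard to perform at line rate in switch hardware.

```python
import heapq
import itertools
from typing import Any


class PIFO:
    """Push-In First-Out queue: push at a rank-determined position, pop from the head.

    A binary heap stands in for the sorted hardware structure; the insertion
    counter keeps equal-rank packets in FIFO order.
    """

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, Any]] = []
        self._seq = itertools.count()

    def push(self, rank: int, packet: Any) -> None:
        heapq.heappush(self._heap, (rank, next(self._seq), packet))

    def pop(self) -> Any:
        # Only the head (lowest rank) can ever leave the queue.
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Pushing packets with ranks 3, 1, 3, 2 and popping four times returns them in rank order, with the two rank-3 packets leaving in arrival order.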
In recent years, various techniques have been proposed that approximate PIFO
behaviors using the available resources of existing data planes. While
promising, approaches to date only approximate one of the characteristic
behaviors of PIFO queues (i.e., its scheduling behavior, or its admission
control).
We propose PACKS, the first programmable scheduler that fully approximates
PIFO queues in all their behaviors. PACKS does so by smartly using a set of
strict-priority queues. It uses packet-rank information and queue-occupancy
levels at enqueue to decide whether to admit packets to the scheduler and how
to map admitted packets to the different queues.
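The enqueue logic described above can be sketched in Python. This is a hypothetical, simplified rule, not the exact PACKS algorithm. It assumes a sliding window of recent ranks is used to estimate where an arriving rank falls in the recent rank distribution (its quantile). The packet is dropped if that quantile exceeds the free fraction of the total buffer. Otherwise it goes to the highest-priority queue whose cumulative free space covers the quantile. `RankWindow`, `enqueue_decision`, and the uniform per-queue `capacity` are illustrative names and assumptions.

```python
from collections import deque
from typing import Optional


class RankWindow:
    """Sliding window of recent ranks used to estimate a rank's quantile."""

    def __init__(self, size: int) -> None:
        self.ranks: deque[int] = deque(maxlen=size)

    def quantile(self, rank: int) -> float:
        # Fraction of recent packets with a strictly smaller (better) rank.
        if not self.ranks:
            return 0.0
        return sum(r < rank for r in self.ranks) / len(self.ranks)

    def add(self, rank: int) -> None:
        self.ranks.append(rank)


def enqueue_decision(rank: int, window: RankWindow,
                     occupancy: list[int], capacity: int) -> Optional[int]:
    """Return the strict-priority queue index to use (0 = highest), or None to drop.

    A packet whose quantile exceeds the free fraction of the whole buffer ends
    up with no eligible queue; otherwise it goes to the first queue whose
    cumulative free space (as a fraction of the total buffer) covers its quantile.
    """
    q = window.quantile(rank)
    window.add(rank)
    total = capacity * len(occupancy)
    free_so_far = 0
    for i, used in enumerate(occupancy):
        free_so_far += capacity - used
        if used < capacity and q <= free_so_far / total:
            return i
    return None
```

With two queues of 10 packets and the first queue holding 8, a packet in the 70th rank percentile is dropped, while one in the 20th percentile skips the nearly full high-priority queue and lands in queue 1. Admission and mapping therefore both use the same two inputs, rank information and occupancy.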
We fully implement PACKS in P4 and evaluate it on real workloads. We show
that PACKS approximates PIFO better than state-of-the-art approaches and that it
scales. We also show that PACKS runs at line rate on existing hardware (Intel
Tofino).
Comment: 12 pages, 12 figures (without references and appendices)
Design and implementation of a belief-propagation scheduler for multicast traffic in input-queued switches
Scheduling multicast traffic in input-queued switches to maximize throughput requires solving a hard combinatorial optimization problem in a very short time. This calls for algorithms that are simple to implement and efficient in terms of performance. We propose a new scheduling algorithm, based on message passing and inspired by the belief propagation paradigm, meant to approximate the provably optimal scheduling policy for multicast traffic. We design and implement both a software and a hardware version of the algorithm, the latter running on a NetFPGA. We compare the performance and the power consumption of the two versions when integrated in a software router. Our main findings are that our algorithm outperforms other centralized greedy scheduling policies, achieving a better tradeoff between complexity and performance, and that it is amenable to practical high-performance implementations.
Formal Abstractions for Packet Scheduling
This paper studies PIFO trees from a programming language perspective. PIFO
trees are a recently proposed model for programmable packet schedulers. They
can express a wide range of scheduling algorithms including strict priority,
weighted fair queueing, hierarchical schemes, and more. However, their semantic
properties are not well understood. We formalize the syntax and semantics of
PIFO trees in terms of an operational model. We also develop an alternate
semantics in terms of permutations on lists of packets, prove theorems
characterizing expressiveness, and develop an embedding algorithm for
replicating the behavior of one PIFO tree with another. We present a prototype
implementation of PIFO trees in OCaml and relate its behavior to a hardware
switch on a variety of standard and novel scheduling algorithms.
Comment: 25 pages, 12 figures
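To make the PIFO-tree model concrete, here is a small Python sketch of a two-level tree. The prototype mentioned above is written in OCaml; this is an independent illustration, not that code. Each push places the packet in a leaf PIFO with a leaf rank and pushes a reference to that leaf into the root PIFO with a root rank. Each pop dequeues a leaf reference from the root and then the head packet of that leaf. The class names and the `(root_rank, leaf_rank)` interface are assumptions made for the example.

```python
import heapq
import itertools
from typing import Any


class PIFO:
    """Minimal PIFO: ordered by rank, FIFO among equal ranks, pop from the head."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, Any]] = []
        self._seq = itertools.count()

    def push(self, rank: int, item: Any) -> None:
        heapq.heappush(self._heap, (rank, next(self._seq), item))

    def pop(self) -> Any:
        return heapq.heappop(self._heap)[2]


class TwoLevelPIFOTree:
    """A root PIFO scheduling among leaf PIFOs that hold the packets.

    The root stores one leaf name per enqueued packet. Popping the root decides
    which leaf to serve, and that leaf's own PIFO decides which packet leaves.
    """

    def __init__(self, leaves: list[str]) -> None:
        self.root = PIFO()
        self.leaves = {name: PIFO() for name in leaves}

    def push(self, packet: Any, leaf: str, root_rank: int, leaf_rank: int) -> None:
        self.leaves[leaf].push(leaf_rank, packet)
        self.root.push(root_rank, leaf)

    def pop(self) -> Any:
        return self.leaves[self.root.pop()].pop()
```

Giving every `hi` packet root rank 0 and every `lo` packet root rank 1 yields strict priority between the two classes, while leaf ranks order packets within each class. Swapping in different rank functions at either level expresses other policies without changing the tree structure.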
Empowering Cloud Data Centers with Network Programmability
Cloud data centers are critical infrastructure for modern Internet services such as web search, social networking, and e-commerce. However, the gradual slowdown of Moore’s law has constrained improvements in data center performance and energy efficiency. In addition, the growing number of millisecond-scale and microsecond-scale tasks places higher throughput and latency demands on cloud applications. Today’s server-based solutions struggle to meet these performance requirements in many scenarios, such as resource management, scheduling, and high-speed traffic monitoring and testing.
In this dissertation, we study these problems from a network perspective. We investigate a new architecture that leverages the programmability of new-generation network switches to improve the performance and reliability of clouds. As programmable switches provide only very limited memory and functionality, we exploit compact data structures and deeply co-design software and hardware to make the best use of these resources. More specifically, this dissertation presents four systems:
(i) NetLock: A new centralized lock management architecture that co-designs programmable switches and servers to simultaneously achieve high performance and rich policy support. It provides orders-of-magnitude higher throughput than existing systems with microsecond-level latency, and supports many commonly-used policies such as performance isolation.
(ii) HCSFQ: A scalable and practical solution to implement hierarchical fair queueing on commodity hardware at line rate. Instead of relying on a hierarchy of queues with complex queue management, HCSFQ keeps no per-flow state and uses only one queue to achieve hierarchical fair queueing.
(iii) AIFO: A new approach for programmable packet scheduling that uses only a single FIFO queue. AIFO uses an admission control mechanism to approximate PIFO, which is theoretically ideal but hard to implement with commodity devices.
(iv) Lumina: A tool that enables fine-grained analysis of hardware network stacks. By exploiting network programmability to emulate various network scenarios, Lumina helps users understand the micro-behaviors of hardware network stacks.
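The AIFO idea in item (iii) can be sketched as an admission test in front of one FIFO. The rule below is a simplified reading of that design, offered as an assumption rather than the dissertation's exact formulation. A packet is admitted if the queue is below a small headroom fraction `k` of its capacity, or if the quantile of its rank among a sliding window of recent ranks is at most the free fraction of the queue scaled by `1/(1 - k)`. The class name, the window size, and the default `k` are illustrative.

```python
from collections import deque


class AIFOAdmission:
    """Admission control for a single FIFO, approximating PIFO's drop behavior.

    A packet is admitted while the queue is nearly empty (headroom k), or when
    its rank quantile over a window of recent ranks is small relative to the
    remaining free space in the queue.
    """

    def __init__(self, capacity: int, window: int, k: float = 0.1) -> None:
        self.capacity = capacity
        self.k = k
        self.recent: deque[int] = deque(maxlen=window)

    def admit(self, rank: int, occupancy: int) -> bool:
        # Quantile = fraction of recent packets with a strictly better (smaller) rank.
        if self.recent:
            quantile = sum(r < rank for r in self.recent) / len(self.recent)
        else:
            quantile = 0.0
        self.recent.append(rank)        # the window sees every arrival
        if occupancy >= self.capacity:
            return False                # no room at all
        if occupancy <= self.k * self.capacity:
            return True                 # headroom: admit regardless of rank
        free_fraction = (self.capacity - occupancy) / self.capacity
        return quantile <= free_fraction / (1 - self.k)
```

As the FIFO fills, the admissible quantile shrinks, so only packets that rank well relative to recent traffic get in. The FIFO itself never reorders anything, and admission alone shapes which packets are served.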
Mitigating the Performance Impact of Network Failures in Public Clouds
Some faults in data center networks require hours to days to repair because
they may need reboots, re-imaging, or manual work by technicians. To reduce
traffic impact, cloud providers mitigate the effect of faults, for
example, by steering traffic to alternate paths. The state of the art in automatic
network mitigations uses simple safety checks and proxy metrics to determine
mitigations. SWARM, the approach described in this paper, can pick orders of
magnitude better mitigations by estimating end-to-end connection-level
performance (CLP) metrics. At its core is a scalable CLP estimator that quickly
ranks mitigations with high fidelity and, on failures observed at a large cloud
provider, outperforms the state of the art by over 700× in some cases.
Performance Analysis of RR and FQ Algorithms in Reconfigurable Routers
Currently, we are witnessing a trend in network routers to include reconfigurable hardware structures that provide flexibility at improved performance levels compared to software-only implementations. This permits run-time reconfiguration of the hardware resources, i.e., changing their functionality (for example, from one scheduling algorithm to another) to adapt to changing network scenarios. In particular, different scheduling algorithms are more efficient at handling a specific mix of incoming packet traffic in terms of various criteria (e.g., delay, jitter, throughput, and packet loss). Therefore, reconfigurable hardware can provide improved performance levels and allow more efficient algorithms to be used when different incoming traffic patterns are encountered. This project investigates the possibilities to improve end-to-end delay, jitter, throughput, and packet loss by exploiting the availability of a flexible hardware structure such as a field-programmable gate array (FPGA). The aim of the project is to provide an overview of adaptive scheduling using reconfigurable hardware. Consequently, we investigate different scheduling algorithms that provide QoS provisioning for traffic streams that are sensitive to packet delay and jitter, e.g., MPEG video traffic. The investigation uses the NS-2 simulator, for which we generate realistic network scenarios. Our approach is based on understanding which kind of traffic is passing through the network and subsequently changing the scheduling algorithm in the core router accordingly to meet specific performance requirements. The investigated scheduling algorithms are taken from two well-known families, namely Round Robin (RR) and Fair Queuing (FQ). Our investigation confirmed the expected behavior of the two investigated scheduling algorithms: WFQ outperforms WRR in terms of end-to-end delay, jitter, and throughput, but it is computationally more expensive.
Nonetheless, it is possible to find a tradeoff between the required FPGA area and the level of performance desired for a given kind of stream.
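The difference between the two families can be illustrated with a small Python comparison. It is not the NS-2 or FPGA setup of the project. `wrr_order` serves a fixed number of packets per flow in each round, whereas `wfq_order` sorts packets by virtual finish tags. To keep the sketch short, it assumes every packet is already queued at time 0. In that case, a flow's finish tag is simply its cumulative packet length divided by its weight. Both function names are hypothetical.

```python
from collections import deque


def wrr_order(queues: dict[str, list[int]], weights: dict[str, int]) -> list[tuple[str, int]]:
    """Weighted round robin: each round serves up to `weight` packets per flow."""
    pending = {flow: deque(lengths) for flow, lengths in queues.items()}
    order: list[tuple[str, int]] = []
    while any(pending.values()):
        for flow, q in pending.items():
            for _ in range(weights[flow]):
                if q:
                    order.append((flow, q.popleft()))
    return order


def wfq_order(queues: dict[str, list[int]], weights: dict[str, int]) -> list[tuple[str, int]]:
    """WFQ-style order by virtual finish tag, assuming every packet is queued at time 0.

    With all flows backlogged from time 0, a packet's finish tag is the previous
    tag of its flow plus length / weight; packets leave in increasing tag order.
    """
    tagged: list[tuple[float, str, int]] = []
    for flow, lengths in queues.items():
        finish = 0.0
        for length in lengths:
            finish += length / weights[flow]
            tagged.append((finish, flow, length))
    tagged.sort()  # per-packet tagging and sorting is the extra work WRR avoids
    return [(flow, length) for _, flow, length in tagged]
```

With equal weights, flow `a` sending 1500-byte packets and flow `b` sending 100-byte packets, WRR alternates packets and so gives `a` far more bytes, while the WFQ order sends `b`'s short packets first. The per-packet tag computation and sorting is where WFQ's higher computational cost comes from.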
Self-Evaluation Applied Mathematics 2003-2008 University of Twente
This report contains the self-study for the research assessment of the Department of Applied Mathematics (AM) of the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) at the University of Twente (UT). The report provides the information for the Research Assessment Committee for Applied Mathematics, dealing with mathematical sciences at the three universities of technology in the Netherlands. It describes the state of affairs pertaining to the period 1 January 2003 to 31 December 2008.