136 research outputs found
Lightweight Implementation of Per-packet Service Protection in eBPF/XDP
Deterministic communication means reliable packet forwarding with close to
zero packet loss and bounded latency. Packet loss or delay above a threshold
caused by, e.g., equipment failure or malfunction could be catastrophic for
applications that require deterministic communication. To meet loss related
targets, per-packet service protection has been introduced by deterministic
communications standards; it is provided by Frame Replication and Elimination
for Reliability (FRER) for Layer 2 Ethernet networks and by Packet Replication,
Elimination, and Ordering Functions (PREOF) for Layer 3 IP/MPLS networks.
We have implemented FRER with two conceptually different methods: (1) in
eBPF/XDP as a lightweight software implementation; and (2) in userspace. We
evaluate our XDP FRER via an experimental analysis and compare the two FRER
implementations. Comment: Paper submission for the talk with the same title at the netdev 0x17
conference:
https://netdevconf.info/0x17/sessions/talk/lightweight-implementation-of-per-packet-service-protection-in-ebpfxdp.htm
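The elimination step that per-packet service protection relies on can be illustrated with a short sketch. This is not the authors' XDP code and not the exact recovery algorithm of the FRER standard. It is a simplified sliding-window duplicate filter, assuming 16-bit sequence numbers and a hypothetical window size `history_len`. The names `SequenceRecovery`, `accept`, and `deliver` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

SEQ_SPACE = 1 << 16  # assumption: 16-bit sequence numbers that wrap around


@dataclass
class SequenceRecovery:
    """Simplified duplicate elimination over a sliding window (illustrative)."""

    history_len: int = 32            # hypothetical window size
    head: Optional[int] = None       # newest sequence number accepted so far
    seen: Set[int] = field(default_factory=set)

    def accept(self, seq: int) -> bool:
        """Return True to forward the packet, False to drop it."""
        if self.head is None:
            self.head, self.seen = seq, {seq}
            return True
        # Signed distance from head, folded into [-SEQ_SPACE/2, SEQ_SPACE/2).
        half = SEQ_SPACE // 2
        delta = (seq - self.head + half) % SEQ_SPACE - half
        if delta <= -self.history_len:
            return False             # older than the window: drop
        if delta <= 0:
            if seq in self.seen:
                return False         # duplicate from another path: drop
            self.seen.add(seq)       # late but new: forward
            return True
        # Newer than head: advance the window and forget entries that fell out.
        self.head = seq
        self.seen.add(seq)
        self.seen = {s for s in self.seen
                     if (self.head - s) % SEQ_SPACE < self.history_len}
        return True


def deliver(arrivals: List[int]) -> List[int]:
    """Feed merged arrivals from all paths through one recovery function."""
    rec = SequenceRecovery()
    return [seq for seq in arrivals if rec.accept(seq)]
```

For example, if one path loses packet 3 and the other loses packet 2, the merged arrival stream `[0, 0, 1, 1, 2, 3, 4, 4]` is reduced to a single copy of each packet, so neither loss is visible to the receiver.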
Orchestrating Edge Computing Services with Efficient Data Planes
The abstract is in the attachment
UPIR: Toward the Design of Unified Parallel Intermediate Representation for Parallel Programming Models
The complexity of heterogeneous computing architectures, as well as the
demand for productive and portable parallel application development, have
driven the evolution of parallel programming models to become more
comprehensive and complex than before. Enhancing the conventional compilation
technologies and software infrastructure to be parallelism-aware has become one
of the main goals of recent compiler development. In this paper, we propose the
design of unified parallel intermediate representation (UPIR) for multiple
parallel programming models and for enabling unified compiler transformation
for the models. UPIR specifies three commonly used parallelism patterns (SPMD,
data and task parallelism), data attributes and explicit data movement and
memory management, and synchronization operations used in parallel programming.
We demonstrate UPIR via a prototype implementation in the ROSE compiler for
unifying IR for both OpenMP and OpenACC and in both C/C++ and Fortran, for
unifying the transformation that lowers both OpenMP and OpenACC code to LLVM
runtime, and for exporting UPIR to LLVM MLIR dialect. Comment: Typos corrected. Format update
A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research
With traditional networking, users can configure control plane protocols to
match the specific network configuration, but without the ability to
fundamentally change the underlying algorithms. With SDN, users may provide
their own control plane, which can control network devices through their data
plane APIs. Programmable data planes allow users to define their own data plane
algorithms for network devices including appropriate data plane APIs which may
be leveraged by user-defined SDN control. Thus, programmable data planes and
SDN offer great flexibility for network customization, be it for specialized,
commercial appliances, e.g., in 5G or data center networks, or for rapid
prototyping in industrial and academic research. Programming
protocol-independent packet processors (P4) has emerged as the currently most
widespread abstraction, programming language, and concept for data plane
programming. It is developed and standardized by an open community and it is
supported by various software and hardware platforms. In this paper, we survey
the literature from 2015 to 2020 on data plane programming with P4. Our survey
covers 497 references of which 367 are scientific publications. We organize our
work into two parts. In the first part, we give an overview of data plane
programming models, the programming language, architectures, compilers,
targets, and data plane APIs. We also consider research efforts to advance P4
technology. In the second part, we analyze a large body of literature
considering P4-based applied research. We categorize 241 research papers into
different application domains, summarize their contributions, and extract
prototypes, target platforms, and source code availability. Comment: Submitted to IEEE Communications Surveys and Tutorials (COMS) on
2021-01-2
Caladan: a distributed meta-OS for data center disaggregation
Data center resource disaggregation promises cost savings by pooling compute, storage and memory resources into separate, networked nodes. The benefits of this model are clear, but a closer look shows that its full performance and efficiency potential cannot be easily realized. Existing systems use CPUs pervasively to interface arbitrary devices with the network and to orchestrate communication among them, reducing the benefits of disaggregation. In this paper we present Caladan, a novel system with a trusted universal resource fabric that interconnects all resources and efficiently offloads the system and application control planes to SmartNICs, freeing server CPUs to execute application logic. Caladan offers three core services: capability-driven distributed name space, virtual devices, and direct inter-device communications. These services are implemented in a trusted meta-kernel that executes in per-node SmartNICs. Low-level device drivers running on the commodity host OS are used for setting up accelerators and I/O devices, and exposing them to Caladan. Applications run in a distributed fashion across CPUs and multiple accelerators, which in turn can directly perform I/O, i.e., access files, other accelerators or host services. Our distributed dataflow runtime runs on top of this substrate. It orchestrates the distributed execution, connecting disaggregated resources using data transfers and inter-device communication, while eliminating the performance bottlenecks of the traditional CPU-centric design
Generating Permutations with Restricted Containers
We investigate a generalization of stacks that we call
$\mathcal{C}$-machines. We show how this viewpoint rapidly leads to functional
equations for the classes of permutations that $\mathcal{C}$-machines generate,
and how these systems of functional equations can frequently be solved by
either the kernel method or, much more easily, by guessing and checking.
General results about the rationality, algebraicity, and the existence of
Wilfian formulas for some classes generated by $\mathcal{C}$-machines are
given. We also draw attention to some relatively small permutation classes
which, although we can generate thousands of terms of their enumerations, seem
to not have D-finite generating functions
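As a point of reference for the machines being generalized, the sketch below enumerates the permutations that a single ordinary stack can generate from the input 1, 2, ..., n. It covers only the classical stack, not the machines defined in the paper, and the function names `stack_permutations` and `count_stack_permutations` are invented for illustration. At each step the search either pushes the next input entry or pops the top of the stack to the output.

```python
from typing import List, Set, Tuple


def stack_permutations(n: int) -> Set[Tuple[int, ...]]:
    """All output orders obtainable by passing 1..n through a single stack."""
    results: Set[Tuple[int, ...]] = set()

    def explore(next_in: int, stack: List[int], out: List[int]) -> None:
        if len(out) == n:
            results.add(tuple(out))
            return
        if next_in <= n:
            # Option 1: push the next input entry onto the stack.
            stack.append(next_in)
            explore(next_in + 1, stack, out)
            stack.pop()
        if stack:
            # Option 2: pop the top of the stack to the output.
            top = stack.pop()
            out.append(top)
            explore(next_in, stack, out)
            out.pop()
            stack.append(top)

    explore(1, [], [])
    return results


def count_stack_permutations(n: int) -> int:
    """Number of permutations of length n that a single stack can generate."""
    return len(stack_permutations(n))
```

For lengths 1 through 5 the counts are the Catalan numbers 1, 2, 5, 14, 42. The only permutation of length 3 that a stack cannot generate is 3 1 2: once 3 has been output, 2 sits above 1 on the stack and must leave first.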
Kernel- vs. User-Level Networking: A Ballad of Interrupts and How to Mitigate Them
Networking performance has become especially important in the current age with growing demands on services over the Internet. Recent advances in network controllers have exposed bottlenecks in various parts of network processing. User-level networking, which bypasses the operating system's network stack and replaces it with one re-implemented in userspace, is often framed as a silver bullet to mitigate any performance issues arising in the kernel network stack. However, there is often no comprehensive study on where this performance increase ultimately comes from.
This work aims to explore potential areas from which improvements in overall performance can arise. Most importantly, it identifies asynchronous interrupts and their handling as a major source of overhead associated with the kernel network stack. Several proposals are presented with the goal of reducing the need for interrupts in the kernel network stack, simulating the execution model of user-level networking. It is shown that a small kernel modification of around 30 changed lines of code results in a substantial performance increase without the need to replace the kernel network stack in its entirety