13 research outputs found
P4CEP: Towards In-Network Complex Event Processing
In-network computing using programmable networking hardware is a strong trend
in networking that promises to reduce latency and consumption of server
resources through offloading to network elements (programmable switches and
smart NICs). In particular, the data plane programming language P4 together
with powerful P4 networking hardware has spawned projects offloading services
into the network, e.g., consensus services or caching services. In this paper,
we present a novel case for in-network computing, namely, Complex Event
Processing (CEP). CEP processes streams of basic events, e.g., stemming from
networked sensors, into meaningful complex events. Traditionally, CEP
processing has been performed on servers or overlay networks. However, we argue
in this paper that CEP is a good candidate for in-network computing along the
communication path avoiding detouring streams to distant servers to minimize
communication latency while also exploiting processing capabilities of novel
networking hardware. We show that it is feasible to express CEP operations in
P4 and also present a tool to compile CEP operations, formulated in our P4CEP
rule specification language, to P4 code. Moreover, we identify challenges and
problems that we have encountered to show future research directions for
implementing full-fledged in-network CEP systems.Comment: 6 pages. Author's versio
Packet Transactions: High-level Programming for Line-Rate Switches
Many algorithms for congestion control, scheduling, network measurement,
active queue management, security, and load balancing require custom processing
of packets as they traverse the data plane of a network switch. To run at line
rate, these data-plane algorithms must be in hardware. With today's switch
hardware, algorithms cannot be changed, nor new algorithms installed, after a
switch has been built.
This paper shows how to program data-plane algorithms in a high-level
language and compile those programs into low-level microcode that can run on
emerging programmable line-rate switching chipsets. The key challenge is that
these algorithms create and modify algorithmic state. The key idea to achieve
line-rate programmability for stateful algorithms is the notion of a packet
transaction : a sequential code block that is atomic and isolated from other
such code blocks. We have developed this idea in Domino, a C-like imperative
language to express data-plane algorithms. We show with many examples that
Domino provides a convenient and natural way to express sophisticated
data-plane algorithms, and show that these algorithms can be run at line rate
with modest estimated die-area overhead.Comment: 16 page
Fast ReRoute on Programmable Switches
Highly dependable communication networks usually rely on some kind of Fast Re-Route (FRR) mechanism which allows to quickly re-route traffic upon failures, entirely in the data plane. This paper studies the design of FRR mechanisms for emerging reconfigurable switches. Our main contribution is an FRR primitive for programmable data planes, PURR, which provides low failover latency and high switch throughput, by avoiding packet recirculation. PURR tolerates multiple concurrent failures and comes with minimal memory requirements, ensuring compact forwarding tables, by unveiling an intriguing connection to classic ``string theory'' (i.e., stringology), and in particular, the shortest common supersequence problem. PURR is well-suited for high-speed match-action forwarding architectures (e.g., PISA) and supports the implementation of a broad variety of FRR mechanisms. Our simulations and prototype implementation (on an FPGA and a Tofino switch) show that PURR improves TCAM memory occupancy by a factor of 1.5x-10.8x compared to a naïve encoding when implementing state-of-the-art FRR mechanisms. PURR also improves the latency and throughput of datacenter traffic up to a factor of 2.8x-5.5x and 1.2x-2x, respectively, compared to approaches based on recirculating packets
Partitioned Paxos via the Network Data Plane
Consensus protocols are the foundation for building fault-tolerant,
distributed systems, and services. They are also widely acknowledged as
performance bottlenecks. Several recent systems have proposed accelerating
these protocols using the network data plane. But, while network-accelerated
consensus shows great promise, current systems suffer from an important
limitation: they assume that the network hardware also accelerates the
application itself. Consequently, they provide a specialized replicated
service, rather than providing a general-purpose high-performance consensus
that fits any off-the-shelf application.
To address this problem, this paper proposes Partitioned Paxos, a novel
approach to network-accelerated consensus. The key insight behind Partitioned
Paxos is to separate the two aspects of Paxos, agreement, and execution, and
optimize them separately. First, Partitioned Paxos uses the network forwarding
plane to accelerate agreement. Then, it uses state partitioning and
parallelization to accelerate execution at the replicas. Our experiments show
that using this combination of data plane acceleration and parallelization,
Partitioned Paxos is able to provide at least x3 latency improvement and x11
throughput improvement for a replicated instance of a RocksDB key-value store
P4Bricks: Enabling multiprocessing using Linker-based network data plane architecture
Packet-level programming languages such as P4 usually require to describe all packet processing functionalities for a given programmable network device within a single program. However, this approach monopolizes the device by a single large network application program, which prevents possible addition of new functionalities by other independently written network applications. We propose P4Bricks, a system which aims to deploy and execute multiple independently developed and compiled P4 programs on the same reconfigurable hardware device. P4Bricks is based on a Linker component that merges the pro-grammable parsers/deparsers and restructures the logical pipeline of P4 programs by refactoring, decomposing and scheduling the pipelines' tables. It merges P4 programs according to packet processing semantics (parallel or sequential) specified by the network operator and runs the programs on the stages of the same hardware pipeline, thereby enabling multiprocessing. This paper presents the initial design of our system with an ongoing implementation and studies P4 language's fundamental constructs facilitating merging of independently written programs
A Survey on Data Plane Programming with P4: Fundamentals, Advances, and Applied Research
With traditional networking, users can configure control plane protocols to
match the specific network configuration, but without the ability to
fundamentally change the underlying algorithms. With SDN, the users may provide
their own control plane, that can control network devices through their data
plane APIs. Programmable data planes allow users to define their own data plane
algorithms for network devices including appropriate data plane APIs which may
be leveraged by user-defined SDN control. Thus, programmable data planes and
SDN offer great flexibility for network customization, be it for specialized,
commercial appliances, e.g., in 5G or data center networks, or for rapid
prototyping in industrial and academic research. Programming
protocol-independent packet processors (P4) has emerged as the currently most
widespread abstraction, programming language, and concept for data plane
programming. It is developed and standardized by an open community and it is
supported by various software and hardware platforms. In this paper, we survey
the literature from 2015 to 2020 on data plane programming with P4. Our survey
covers 497 references of which 367 are scientific publications. We organize our
work into two parts. In the first part, we give an overview of data plane
programming models, the programming language, architectures, compilers,
targets, and data plane APIs. We also consider research efforts to advance P4
technology. In the second part, we analyze a large body of literature
considering P4-based applied research. We categorize 241 research papers into
different application domains, summarize their contributions, and extract
prototypes, target platforms, and source code availability.Comment: Submitted to IEEE Communications Surveys and Tutorials (COMS) on
2021-01-2