Emerging network applications (augmented reality, industrial Internet, etc.) introduce stringent new requirements on the performance, dependability, and adaptability of communication networks. Programmable data planes (e.g., based on P4) provide new opportunities to meet these requirements by enabling adaptive network reconfigurations. However, ensuring consistency during such reconfigurations remains challenging. This paper makes a first step toward more automated state management of adaptive data planes. In particular, we present an efficient P4 state management framework, P4State, which quickly identifies, from the source code, the network states that are critical for data plane reconfigurations (e.g., due to scaling or failure recovery). We report on first promising evaluation results of our prototype implementation in terms of correctness and efficiency, also considering two case studies using HULA (load balancing in data centers) and HashPipe (line-rate measurement in the data plane).
toward efficient and consistent network reconfigurations remains challenging. In particular, many PDP operations are stateful: the forwarding devices maintain states for the network services, e.g., registers for forwarding port mapping and meters for traffic rate control. In order to provide predictable data plane behavior, we need to ensure state consistency at any time. For example, in an autonomous driving context, network services provisioned by the vehicles should be migrated together with their states (e.g., service ports and IP addresses), to guarantee connectivity and QoS. This paper is motivated by the observation that a prerequisite for preserving state consistency is to automatically recognize all the states in a PDP program (e.g., P4) that should be maintained during reconfiguration: an aspect which, to the best of our knowledge, has not been addressed in the P4 literature yet. State recognition, however, is challenging because of the wide range of possible notions of states (and access methods) in P4-based PDPs.
We present P4State, a first step toward automated P4 state management. Our framework includes an analyzer that takes a P4 program as input, collects state accesses along the packet processing pipelines, and visualizes the analysis output for P4 programmers and operators. To recognize the states, we apply control-flow analysis [21] on top of the Control Flow Graph (CFG), which is built from the program. We prune the original CFG and only keep the nodes with state access, i.e., read or write, to make it more accessible. Finally, all stateful paths within the CFG, together with the involved state entities, are available as output and can therefore help to maintain consistency during network reconfiguration.
Contributions. Succinctly, our contributions are:
• We characterize the states of P4 data planes and provide a taxonomy of the usage of registers (which store states) in open-source P4 programs.
• We propose a suite of algorithms to analyze P4 programs and to identify the states that need to be maintained during reconfiguration.
• We report on the implementation and evaluation of the program analyzer, considering synthetic and real programs (namely HULA and HashPipe), in terms of correctness and efficiency.

Organization. The remainder of this paper is organized as follows. Section 2 characterizes states in P4 data planes. Section 3 describes the algorithms to analyze the register accesses within a P4 program. Section 4 introduces the prototype of P4State and two case studies. We discuss related work in Section 5. In Section 6, we conclude the paper and discuss possible extensions of P4State.
CHARACTERIZING P4 PDP STATES
P4 is a domain-specific language that describes the packet processing behavior of a programmable data plane [7]. The programmability exists in the parser, the match-action pipeline, and the deparser. The parser can extract any customized fields from the header, whereas the deparser inserts values back into the fields at the end. The match-action pipeline consists of multiple stages, each with a matching table and customized action(s) for packet processing. P4 programs are supported on various data plane targets, such as BMv2 [3], SmartNIC [2], and NetFPGA [31].
The definition of states in a P4 data plane is quite broad [6, 22]. States include (i) table entries, (ii) stateful variables defined in the P4 specification, and (iii) (some) temporary variables defined in a program. We include the temporary variables only if they act like pointers that refer to stateful variables (details in Sec. 3.2). Since the table entries can be recognized and maintained by the control plane during data plane reconfiguration without much effort, we do not consider them in this paper.
The P4 specification defines three types of stateful variables: register, meter, and counter. These variables need to be persistent, i.e., their values should persist beyond a single iteration of the packet processing loop [21]. In this paper, we focus on the register variable, as it is commonly adopted in real P4 programs. Regarding meters and counters, we briefly discuss the mechanism to maintain their consistency in Sec. 6.
Register Usage. We identify the following scenarios when registers can be declared:
• a value that the processing of the following packets would access, e.g., a packet counter [29].
• a value that the control plane can access for making control decisions, e.g., the status of a port [27] .
• a value that controls the packet processing in a P4 node, e.g., a flag enabling on-demand functionality [14].

The usage of a register is one of the factors indicating whether it should be transferred during data plane reconfigurations. As an example, Figure 1 shows the declaration (line 2) and read access (line 5) of the content of register flag_reg. Note that in this example, we denote flag_reg as a register type and the content as a register entry. Table 1 gives an overview of the P4 programs that leverage registers in their implementation.
Register access within a P4 program can be either read or write (both directly and indirectly). In Figure 1, the binary register value decides the following packet processing path: either line 6 or line 7. This is an indirect access in an if-conditional, i.e., a temporary variable that refers to the value of a register entry is evaluated. Register entries can only be directly accessed in actions. Figure 2 demonstrates an example, where a register entry is read and copied to one field in the user-defined custom metadata. Actions are always associated with tables; an action is called either based on the matching result of a table or when a packet processing path traverses a table. Meanwhile, the value of a register entry can impact the decision of an if-conditional.
Register Classification. We classify registers into two categories: flow-based and device-based. A flow-based register saves per-flow state and typically instantiates a large number of entries (e.g., 65536), which can be migrated on the data plane and control plane. A device-based register saves state that is device-specific. It has fewer entries, but may need to be migrated with the help of the controller. The classification helps to coordinate the maintenance of various registers at runtime, i.e., to decide how to migrate them.
Luo et al. [22] advocate that it is not necessary to migrate a flow-based register that is computed from the events of arriving packets, e.g., the flowlet_id, which denotes a flowlet and is calculated from the header field tuple. However, we argue that in order to satisfy the consistency requirement, we have to migrate those flow-based registers. For example, when the flowlet_id determines the egress port, the loss of its values might cause the following packets of the same flowlet to be forwarded on a different path; this can induce jitter and delays.
Register Migration. States can be migrated either through the data plane [22] , where register values are carried as specific header fields, or migrated through the controller [14] , which performs direct register read/write. For the first approach, a flow-based register entry needs an exact flow to piggyback its value, which can lead to long migration latencies when flow patterns change quickly and the expected flows do not show up on time. For the second approach, since the state values need to be copied first from the source node to the controller and then copied to the destination node, we need to take into account the extra forwarding time from the nodes to the controller. Note that we assume the controller integrates the functionalities of both the control plane and the management plane.
Regardless of which approach is applied, we can assume that the total migration time increases with the number of register entries. Here, the migration time is defined as the overall time spent migrating all necessary states before the new node can work correctly. For the flow-based registers, i.e., the ones with indices calculated with hash functions, we may potentially create many (more than 2^16) entries, depending on an initial guess of how many flows can show up in the network. However, not all entries are filled with effective values that need migration. In other words, a straightforward approach that strives to migrate all register values would induce a long total migration time. Therefore, we propose P4State, which recognizes only the necessary register values.
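To illustrate why skipping unused entries shortens migration, the following minimal sketch copies only populated entries from a source register array to a destination array, as a controller-based migration might do. The function name `migrate_valid_entries`, the list-based register model, and the assumption that an entry equal to the default value 0 carries no state are all hypothetical and not part of P4State.

```python
def migrate_valid_entries(src, dst, default=0):
    """Copy only entries that differ from the default value.

    Returns the number of entries that were actually transferred.
    Assumption: an entry equal to `default` holds no effective state.
    """
    copied = 0
    for index, value in enumerate(src):
        if value != default:
            dst[index] = value
            copied += 1
    return copied


# A flow-based register with 2^16 entries, of which only two are in use.
source = [0] * 65536
source[7] = 3
source[4242] = 9
destination = [0] * 65536

transferred = migrate_valid_entries(source, destination)
```

In this toy setting, two transfers replace 65536, which is the kind of saving that motivates recognizing only the necessary register values.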
P4STATE: DESIGN AND ALGORITHMS
In this section, we describe the set of algorithms that analyze the register accesses of a P4 program. Instead of directly parsing the P4 code, we leverage the P4 compiler to produce a compiled .json file and feed it to the algorithm suite [23]. The basic idea is to identify state accesses in tables and conditionals (Sec. 3.1 and 3.2), translate the P4 program into a Control Flow Graph (CFG) (Sec. 3.3), delete all nodes in the CFG without references to registers (Sec. 3.4), traverse all paths in the CFG to collect register accesses, and determine the registers that need to be migrated (Sec. 3.5). The CFG represents all paths that might be traversed in a program during its execution.
Identifying States
As a first step, we collect all the declared registers (including their depths and widths) and classify them as flow-based or device-based. The classification criterion is the width of the register. As a common practice, a flow is identified by a hash value with width 16 or 32 [1], which corresponds to the hashing algorithms defined in the v1 model [5] (CRC16 or CRC32). Therefore, a register whose width is greater than or equal to 16 is classified as a flow-based register. In order to be comprehensive, we also classify the registers whose indexes are calculated with hash functions as flow-based registers. Those that are not classified as flow-based registers are treated as device-based registers.
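The classification rule above can be sketched as follows. This is a minimal illustration, not the P4State implementation: the function name `classify_registers`, the dictionary keys `name`, `size`, and `bitwidth` (assumed to mirror the register declarations in the compiled .json file), and the register `valid_bit` are assumptions made for this example.

```python
def classify_registers(register_arrays, hash_indexed=frozenset(), min_flow_width=16):
    """Map each register name to "flow-based" or "device-based".

    A register is treated as flow-based if its width reaches the threshold
    or if its index is known to be computed with a hash function.
    """
    roles = {}
    for reg in register_arrays:
        is_flow = reg["bitwidth"] >= min_flow_width or reg["name"] in hash_indexed
        roles[reg["name"]] = "flow-based" if is_flow else "device-based"
    return roles


# Hypothetical register declarations.
registers = [
    {"name": "flowlet_id", "size": 65536, "bitwidth": 16},
    {"name": "flag_reg", "size": 1, "bitwidth": 1},
    {"name": "valid_bit", "size": 1024, "bitwidth": 1},
]

# valid_bit is narrow, but we assume its index comes from a hash function.
roles = classify_registers(registers, hash_indexed={"valid_bit"})
```

The `hash_indexed` set would have to be derived separately, for example by inspecting how each register index is computed in the actions.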
Table/Conditional Register-Binding
Algorithm 1 traverses all tables and if-conditionals defined in the .json file and associates them with the registers that they access. In order to do this, it first collects all defined registers, headers, actions, tables, and conditionals in the pipelines. Since direct register access inside conditionals is not possible, it is non-trivial to associate registers with conditionals. We first collect all header fields (including the temporary variables), e.g., flag in Figure 1 and reg_index in Figure 2. Afterwards, we traverse all statements in actions and associate the header field with the register when there is a register access. Note that the statements in the apply block are translated into an action associated with a table. The temporary variable flag is declared as a custom header field scaler_flag.
Besides the set of all registers R, we also fill the set of accessed registers for each table t and conditional c. For a table t, we check all associated actions and place every accessed register in the set R_t (see lines 5-8). Similarly, for a conditional c, we check all associated headers and place every accessed register in the set R_c.
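The binding step can be sketched as below. This is a simplified illustration under stated assumptions, not Algorithm 1 itself: the input schema (actions as lists of statements with `op`, `register`, and `fields` keys; tables as lists of action names; conditionals as lists of referenced header fields) and all node names except flag_reg and scaler_flag are hypothetical.

```python
REGISTER_OPS = ("register_read", "register_write")


def bind_registers(actions, tables, conditionals):
    """Return (R_t per table, R_c per conditional) as dictionaries of sets."""
    # Associate each header field with the registers it is read from or written to.
    field_regs = {}
    for statements in actions.values():
        for stmt in statements:
            if stmt["op"] in REGISTER_OPS:
                for field in stmt["fields"]:
                    field_regs.setdefault(field, set()).add(stmt["register"])

    # A table accesses every register touched by one of its actions.
    table_regs = {}
    for table, action_names in tables.items():
        accessed = set()
        for name in action_names:
            for stmt in actions[name]:
                if stmt["op"] in REGISTER_OPS:
                    accessed.add(stmt["register"])
        table_regs[table] = accessed

    # A conditional accesses a register indirectly, through header fields.
    cond_regs = {}
    for cond, fields in conditionals.items():
        accessed = set()
        for field in fields:
            accessed |= field_regs.get(field, set())
        cond_regs[cond] = accessed
    return table_regs, cond_regs


actions = {
    "read_flag": [{"op": "register_read", "register": "flag_reg", "fields": ["scaler_flag"]}],
    "set_port": [{"op": "assign", "fields": ["egress_port"]}],
}
tables = {"tbl_read_flag": ["read_flag"], "forward": ["set_port"]}
conditionals = {"node_3": ["scaler_flag"], "node_5": ["egress_port"]}

table_regs, cond_regs = bind_registers(actions, tables, conditionals)
```

Here the conditional node_3 is bound to flag_reg only because the field scaler_flag was filled by a register read, which mirrors the indirect access described above.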
Stateful CFG Construction
As mentioned before, the packet processing pipeline (i.e., the CFG) described by P4 can be decomposed into a set of basic entities (nodes), namely tables and conditionals. A table has only one egress, whereas a conditional has two, each associated with a decision result (True/False).
We construct a CFG of the P4 program under analysis with Algorithm 2. Following the pattern of p4c-graphs [4], the processing path of each packet always starts from "START", traverses different sets of tables and conditionals, and terminates at "EXIT". The introduction of START and EXIT (as dummy tables) provides the two anchor points for all possible processing paths. After adding the first edge between START and the initial node (table or conditional), it calls ExpNext to further explore the path until EXIT, which is described in Algorithm 3.
Algorithm 3 works in a recursive manner. It stops calling itself only if (i) there is no node after the current table (lines 3-5), or (ii) there are no entities on the true or false branch of the current conditional (lines 11-13). In this case, it calls DrawPath to draw a full path from START to EXIT.
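The recursive exploration can be sketched as follows. This is a minimal sketch and not Algorithms 2 and 3 themselves: the node dictionary layout (`kind`, `next`, `true`, `false`), the function names, and the example node names are assumptions, and the pipeline is assumed to be acyclic.

```python
def explore_paths(nodes, init):
    """Enumerate all processing paths from START to EXIT."""
    paths = []

    def exp_next(name, path):
        # No further node: the path is complete and ends at EXIT.
        if name is None:
            paths.append(path + ["EXIT"])
            return
        node = nodes[name]
        path = path + [name]
        if node["kind"] == "table":
            # A table has exactly one egress.
            exp_next(node["next"], path)
        else:
            # A conditional has one egress per decision result.
            exp_next(node["true"], path)
            exp_next(node["false"], path)

    exp_next(init, ["START"])
    return paths


nodes = {
    "check_flag": {"kind": "conditional", "true": "scale_out", "false": "forward"},
    "scale_out": {"kind": "table", "next": "forward"},
    "forward": {"kind": "table", "next": None},
}
paths = explore_paths(nodes, "check_flag")
```

Each completed path is stored as a list of node names, which plays the role that DrawPath has in the description above.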
CFG Pruning
The stateful CFG supports the subsequent analysis of register accesses. In our design, the analysis should be able to return all paths with state access. However, the original CFG and the paths inside it can be very large, which makes them hard for humans to read. Inspired by the idea of Thin Slicing [30], we exclude all stateless nodes from the CFG in order to produce a human-friendly (pruned) version. The pruned CFG provides a clear view of all stateful operations.
The pruning process consists of two steps (described in Algorithms 4 and 5). The first step detects all nodes that do not have access to registers, i.e., R_n == ∅ in line 5, and removes these nodes. The predecessors and successors of each removed node are reconnected to ensure complete path(s) from START to EXIT. The second step merges consecutive tables on a single path if they access the same registers, i.e., if their register sets are equal. Only the first table stays: each following table is removed with RemoveNode, edges are added with AddEdge from the first table to the successors of the removed table, and UpdateNeighbourNodes is then called.

[Algorithm 4: Pruning - Stateless Node Elimination]
As utility functions, the method UpdateNeighbourNodes updates the previous and subsequent nodes of each node, given the current status of the CFG. The method RemoveNode removes one node and all edges connected to it.
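The two pruning steps can be sketched on a successor-list graph as below. This is an illustrative sketch, not Algorithms 4 and 5: the helper `splice_out` combines node removal and reconnection in one function, the merge step additionally assumes that the following table has exactly one predecessor, and all node and register names except flag_reg are hypothetical.

```python
def splice_out(succ, n):
    """Remove node n and reconnect its predecessors to its successors."""
    nexts = succ.pop(n)
    for node, outs in succ.items():
        if n in outs:
            merged = []
            for out in outs:
                for target in (nexts if out == n else [out]):
                    if target not in merged:
                        merged.append(target)
            succ[node] = merged


def prune(succ, regs, kinds):
    """Return a pruned copy of the CFG given per-node register sets."""
    succ = {n: list(outs) for n, outs in succ.items()}
    # Step 1: eliminate stateless nodes, keeping the START and EXIT anchors.
    for n in list(succ):
        if n not in ("START", "EXIT") and not regs.get(n):
            splice_out(succ, n)
    # Step 2: merge consecutive tables that access the same registers.
    changed = True
    while changed:
        changed = False
        for n in list(succ):
            outs = succ[n]
            if kinds.get(n) != "table" or len(outs) != 1:
                continue
            m = outs[0]
            preds_of_m = sum(m in o for o in succ.values())
            if kinds.get(m) == "table" and preds_of_m == 1 and regs[m] == regs[n]:
                splice_out(succ, m)
                changed = True
                break
    return succ


cfg = {
    "START": ["t_parse"], "t_parse": ["c_flag"], "c_flag": ["t_read", "t_fwd"],
    "t_read": ["t_update"], "t_update": ["t_fwd"], "t_fwd": ["EXIT"], "EXIT": [],
}
regs = {"c_flag": {"flag_reg"}, "t_read": {"cnt_reg"}, "t_update": {"cnt_reg"}}
kinds = {"t_parse": "table", "c_flag": "conditional", "t_read": "table",
         "t_update": "table", "t_fwd": "table"}

pruned = prune(cfg, regs, kinds)
```

In this example, the stateless tables t_parse and t_fwd disappear, and t_update is merged into t_read because both access the same register.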
Path & Role Identification
Finally, the analyzer recognizes all paths with state access and generates the pruned CFG as well as a report listing all stateful paths and their respective associated state sets. When a P4 program consists of multiple functions, which are enabled/disabled upon startup (e.g., HULA [1]), the analyzer can also infer such information and report the enabled functionalities. For this, it leverages both the pruned CFG and the controller rules such as register initializations (typically specified in a file). If the controller maintains the state consistency, such information can also assist with deciding the order in which different types of states should be transferred.
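A report of stateful paths and their associated state sets could be assembled as in the following sketch. The function names, the path representation as lists of node names, and the example nodes and registers (other than flag_reg) are assumptions for illustration only.

```python
def stateful_paths(paths, regs):
    """Return (path, accessed registers) for every path that touches state."""
    report = []
    for path in paths:
        accessed = set()
        for node in path:
            accessed |= regs.get(node, set())
        if accessed:
            report.append((path, accessed))
    return report


def registers_to_migrate(report):
    """Union of all registers that appear on at least one stateful path."""
    needed = set()
    for _, accessed in report:
        needed |= accessed
    return needed


example_paths = [
    ["START", "c_flag", "t_read", "EXIT"],
    ["START", "c_flag", "EXIT"],
    ["START", "t_fwd", "EXIT"],
]
example_regs = {"c_flag": {"flag_reg"}, "t_read": {"cnt_reg"}}

report = stateful_paths(example_paths, example_regs)
needed = registers_to_migrate(report)
```

The third path is dropped from the report because none of its nodes is bound to a register.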
PROTOTYPE AND EVALUATION
The prototype of P4State mainly includes a code analyzer and the utilities of the P4C compiler [4]. We implement the analyzer in Python with 500 LoC (footnote 3). P4State takes a P4 program as input, analyzes its compiled .json format, and outputs the paths with state accesses. We provide a first impression of P4State's practicability with case studies on two real P4 programs. Afterwards, we evaluate the efficiency of our proposed algorithms with both real and synthetic programs.
Case Study
HULA. HULA addresses congestion in data center networks. For this, HULA switches run two functions: probing and forwarding. Figure 3 depicts an exemplary data center topology with HULA. Probing is deployed on the ToR switches for finding the best paths in the core. The probing updates the forwarding function, which then forwards all data plane traffic. hula.p4 [1] is a simplified version of HULA with four types of registers: (1) srcindex_qdepth_reg, (2) srcindex_digest_reg, (3) dstindex_nhop_reg, and (4) flow_port. Types (1) and (2) store the queue length and the digest of the best path from each ToR. Type (3) keeps the next hop to reach each ToR, and type (4) keeps the next hop for each flow.
hula.p4 merges the probing pipeline and the forwarding pipeline in one program. To decide which pipeline applies to a given packet, the program checks the hula header field of the received packet. For the ToR switches, all four types of registers are needed to enable HULA updates and normal packet switching. For the non-ToR (i.e., core) switches, only types (3) and (4) are needed. P4State successfully recognizes the above two functions and outputs the pruned CFG with 12 nodes (the original CFG has 25 nodes), which is shown in Figure 4. Such knowledge can help to maintain state consistency during data plane reconfiguration. For example, when a core switch is about to fail, migrating the states of types (3) and (4) to a backup switch would be sufficient to ensure that all current best paths in the core are preserved.
HashPipe. To perform line-rate measurements in the data plane, HashPipe [29] implements a pipeline of hash tables to record heavy flows, i.e., flows with a large number of packets. There are 8 types of registers with 224 entries in total. The flows are tracked and the tracking information is maintained within three types of registers (two stages, in total six). One type is used to track the flow identifiers (source IPv4 addresses), and another type is used to store the packet counts corresponding to the identifiers. The last type shows whether each table entry is valid, i.e., whether there are non-zero values for the previous two types of registers. P4State recognizes only one function, i.e., counter update, and correctly identifies all registers that need to be migrated.

Footnote 3: We plan to make the code public in an extended version of this paper.
General Performance Measurement
To evaluate the efficiency of P4State, we measure the runtime of the CFG construction module and the CFG pruning module, which together account for most of the algorithmic execution time. The measurement is performed on both synthetic and real P4 programs. We use Whippersnapper [9] to generate synthetic programs having 20 to 300 tables. The realistic programs are selected from Table 1. Figure 5a presents the runtimes when analyzing synthetic programs; each data point is the average of 30 measurement runs. It shows comparable runtimes of CFG construction and pruning: they increase exponentially with the number of tables. Nevertheless, a program with 300 tables (which is more than in any of the realistic P4 programs that we have collected) can be analyzed in around 100 ms. Figure 5b presents the results for real programs, which we sort according to the program's complexity (represented as LoC) along the x-axis. We observe that the code analysis takes up to 15 ms for the CFG construction and pruning (the case of dapper.p4). Even though linearroad.p4 has the highest LoC, its analysis time is not necessarily the highest, due to its simpler pipeline structure. In conclusion, the analysis completes in around 100 ms even for the largest synthetic program and within 15 ms for the real programs considered.
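Averaging a module's runtime over repeated runs, as done for each data point above, can be sketched with the Python standard library as follows. The helper name `mean_runtime_ms` and the measured dummy workload are assumptions; this is not the measurement harness used in the evaluation.

```python
import time


def mean_runtime_ms(func, *args, runs=30):
    """Average wall-clock runtime of func(*args) over `runs` executions, in ms."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        total += time.perf_counter() - start
    return 1000.0 * total / runs


# Dummy workload standing in for CFG construction or pruning.
calls = []
avg_ms = mean_runtime_ms(calls.append, 0)
```

In practice, `func` would be the construction or pruning routine and `args` the parsed program under analysis.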
RELATED WORK
To the best of our knowledge, we are the first to study consistent state management for P4 data planes in an elaborate manner. However, there is much interesting previous work on state management in the context of general NFV and P4, as well as on analyzing P4 code.

Data Plane State Management. Research on data plane state management is abundant. Split/Merge [26] requires middleboxes to allocate and access all states through a customized shared library. OpenNF [10], in contrast, directly transfers the serialized states between different middleboxes. From a different perspective, StatelessNF [19] requires middleboxes to create/read/update states in a central data store, which allows any middlebox to access any state at any time. SNAP [6] considers state allocation in a static scenario; the whole network is considered as a single switch, and the location of states in the form of forwarding rules is optimized to enforce policy.
State Management of P4. SwingState [22] initiates the study of state transfer during reconfiguration by piggybacking states on the data plane packets. P4NFV [14] recommends managing the state with the controller, which has a holistic view of the data plane states and can perform operations such as merging on the states during reconfiguration.
NFV Program Analyzer. Many tools have been proposed to analyze data plane programs for performance or security. CASTAN [25] and BOLT [16] discover execution paths within the code of an NF and recognize potentially large resource consumption, e.g., CPU cycles and memory accesses. P4pktgen [24] analyzes a P4 program and generates input packets and table entries that cover all execution paths. Assert-P4 [23] leverages assertions and symbolic execution to validate general network correctness properties. However, analysis for data plane reconfiguration is still missing in the literature.
CONCLUSION & DISCUSSION
Motivated by the need for more adaptive networks and the challenge of consistent reconfigurations of stateful data planes, we presented, implemented, and evaluated P4State, an automated mechanism to recognize states in P4. P4State is able to quickly analyze programs and successfully recognizes the register types that need migration during data plane reconfiguration. With synthetic and real programs, we show the efficiency of our proposed algorithms in terms of runtime.
We understand P4State as a very first step, but believe that it readily provides many interesting and promising extensions, which we discuss as follows.
Line-rate Processing and Verification. P4State outputs an overview of multiple register accesses in a P4 program. With that, we can identify when a read or a write operation is performed more than once on the same register type, which can lead to longer processing times [29]. Moreover, the analyzer can automatically detect race conditions in register access and write-before-read errors.
Group Transfer. Currently, the controller only accesses one register entry at a time. If multiple entries can be transferred simultaneously, the forwarding latency can be greatly reduced. In that case, P4State can be extended to detect the valid entries that will be transferred all at once.
Counters & Meters. Since the data plane cannot read counters, the control plane reads and stores all values before reconfiguration and, if possible, updates the counted values afterwards. For the meters, the control plane is always in charge of their configurations; therefore, the controller only needs to configure the previous meter settings upon startup of a new data plane entity. P4State can be extended to recognize counters and meters and to facilitate their maintenance during runtime.
Consistent Network Update. Updating a network policy can involve the reconfiguration of multiple P4 nodes, which is not trivial. It would be interesting to investigate the order of state updates in a multi-node P4 environment, to avoid data plane misbehaviors (e.g., routing loops and black holes) during policy updates.
