Abstract: In this paper, we propose P4-IPsec, which follows the software-defined networking (SDN) paradigm. It comprises a P4-based implementation of an IPsec gateway, a client agent, and controller-based, IKE-less signalling between them. P4-IPsec features the Encapsulating Security Payload (ESP) protocol, tunnel mode, and various cipher suites for host-to-site virtual private networks (VPNs). We consider the use case of a roadwarrior and multiple IPsec gateways steered by the same controller. P4-IPsec supports on-demand VPN, which sets up tunnels to appropriate resources within these sites when requested by applications. To validate the P4-based approach for IPsec gateways, we provide three prototypes leveraging the software switch BMv2, the NetFPGA SUME card, and the Edgecore Wedge 100BF-32X switch as P4 targets. For the latter, we perform a performance evaluation giving experimental results on throughput and delay.
I. INTRODUCTION
Internet Protocol Security (IPsec) is a widespread IETF standard for virtual private networks (VPNs). It protects Layer 3 data that is transmitted over insecure networks such as the Internet by message authentication and optional encryption. IPsec supports VPNs between two hosts, between hosts and networks, and between networks. IPsec clients are part of all modern operating systems, and a large selection of open-source and proprietary IPsec gateways is available. IPsec is criticized for its complexity in architecture, protocols, implementation, and configuration. However, it is still one of the most widely used VPN technologies.
Software-defined networking (SDN) breaks the tight coupling between data and control plane and simplifies the deployment of distributed technologies such as IPsec. However, the currently widespread SDN switches, e.g., OpenFlow switches, have a fixed-function data plane with a limited set of functions. This set can be extended by outsourcing functions to the control plane, but the resulting performance overhead is critical. Programmable data planes, e.g., as offered by P4, are a game changer as data plane behaviour can be described in a high-level programming language. Thereby, new functionality can be implemented and deployed to a large number of software and hardware switches. In previous work [1], we implemented MACsec and secure link discovery in P4 and demonstrated how this new flexibility can help to re-implement common concepts and protocols in a novel fashion including new features. Instead of complex configuration on a large number of network devices, MACsec is automatically deployed on all links between P4 switches.
In this paper, we propose a novel concept and implementation of IPsec in P4 and call it P4-IPsec. It features the Encapsulating Security Payload (ESP) protocol in tunnel mode with different cipher suites. It comprises a Linux-based client module and a P4-based IPsec gateway implementation to support host-to-site scenarios. As these components are steered by a centralized control plane through an authenticated and encrypted control connection, complex IKE-based key exchange protocols are substituted by simple setup, re-keying, and teardown procedures for IPsec tunnels. IPsec's security policy database (SPD) and security association database (SAD) are converted to appropriate match-and-action tables of the P4-based IPsec gateway. Converting P4 switches into IPsec gateways provides the opportunity to operate several of them close to protected resources that are otherwise shielded from external access. This limits the size of the perimeter and improves security through better isolation compared to today's solutions with a single VPN concentrator for a larger perimeter. Our second contribution, on-demand VPN, is tailored to a use case with multiple IPsec gateways steered by a single controller. It may be cumbersome for users to set up the appropriate VPN connection before utilizing a desired resource. With on-demand VPN, the client agent detects whether a protected resource is requested and automatically opens a VPN connection to the appropriate IPsec gateway. A third contribution of this work is the validation of P4-IPsec through prototypical implementations on the BMv2 P4 software switch and two P4 hardware platforms, the NetFPGA SUME board and the Edgecore Wedge 100BF-32X switch that features a Tofino ASIC. Furthermore, we perform an extensive evaluation of the implementation for the Edgecore Wedge switch with two alternative implementation variants regarding throughput and delay. The rest of the paper is structured as follows.
We first review technical background on IPsec in Section II and on data plane programming with P4 in Section III. We then give an overview and categorization of existing controller-based operation and data plane implementations of IPsec in Section IV and Section V. We describe the data plane implementation of our proposed P4-based IPsec gateway in Section VI. The control plane concept comprising controller and client module for on-demand VPN is presented in Section VII. For validation purposes, we realize these concepts for P4-IPsec in a Mininet environment with the BMv2 P4 software switch in Section VIII. In addition, we provide data plane implementations of P4-IPsec on the NetFPGA SUME platform in Section IX and on the Edgecore Wedge 100BF-32X switch that features a Tofino ASIC in Section X. For the latter, we evaluate throughput and delay for two alternative implementation variants. Finally, Section XI concludes this work.
II. VPN WITH IPSEC: FOUNDATIONS
We give an introduction to virtual private networks (VPNs) and provide an overview of IPsec. We discuss advantages and disadvantages of IPsec and briefly introduce alternative VPN technologies.
A. Virtual Private Networks (VPNs)
VPNs extend private networks across a public network such as the Internet. To protect the private network access and transmitted data, VPNs leverage methods for authentication, authorization, encryption, and integrity validation. Figure 1 depicts three usage scenarios for VPNs. In the host-to-host (1) scenario, two remote peers are connected by a VPN that is set up directly between both hosts. In the host-to-site (2) scenario, a remote peer connects to an internal network. The VPN is set up between the remote host and a VPN gateway that connects the internal network to the Internet. The remote host receives an IP address from the internal network so that it can send and receive network packets as if it is part of the internal network. In the site-to-site (3) scenario, two internal networks are connected by a VPN that is set up between two VPN gateways that connect both internal networks to the Internet. 
B. Overview of IPsec
Internet Protocol Security (IPsec) introduces authentication and encryption on Layer 3. It features host-to-host, host-to-site, and site-to-site scenarios and is standardized by the IETF. Its architecture and functionality are described in RFC 4301 [2].
1) Protocols: IPsec includes two protocols. The Authentication Header protocol (AH, specified in RFC 4302 [3]) ensures sender authenticity and packet integrity. Hash functions and a shared key are used to calculate an integrity check value (ICV) through keyed-hash message authentication codes (HMACs) such as HMAC-SHA256. AH also leverages packet sequence numbers to provide protection against replay attacks. The Encapsulating Security Payload (ESP, specified in RFC 4303 [4]) protocol ensures packet confidentiality through symmetric encryption. Like AH, ESP also ensures sender authenticity, integrity, and protection against replay attacks. ESP supports various symmetric ciphers such as 3DES, Blowfish, and AES. If pure encryption ciphers such as AES in cipher block chaining mode or counter mode are used, additional integrity algorithms such as HMAC-SHA are also applied. As an alternative, authenticated encryption (AE) ciphers may be used, which provide both confidentiality and authenticity protection. AES in Galois/Counter Mode (GCM) is the most common AE cipher and its application in IPsec is described in RFC 4106 [5].
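The ICV calculation described above can be illustrated with a minimal Python sketch. It assumes HMAC-SHA256 with truncation to 128 bits; the function names `compute_icv` and `verify_icv` as well as the example key and data are hypothetical and not part of P4-IPsec.

```python
import hashlib
import hmac

ICV_LEN = 16  # bytes; assumed truncation of the HMAC-SHA256 output to 128 bits


def compute_icv(auth_key: bytes, protected_data: bytes) -> bytes:
    """Return the truncated HMAC-SHA256 over the protected packet data."""
    return hmac.new(auth_key, protected_data, hashlib.sha256).digest()[:ICV_LEN]


def verify_icv(auth_key: bytes, protected_data: bytes, icv: bytes) -> bool:
    """Recompute the ICV and compare it in constant time."""
    return hmac.compare_digest(compute_icv(auth_key, protected_data), icv)


# Hypothetical shared key and packet bytes, for illustration only.
example_key = bytes(range(32))
example_data = b"ESP header and encrypted payload"
example_icv = compute_icv(example_key, example_data)
```

A receiver that holds the same key accepts the packet only if `verify_icv` returns `True`; any modification of the protected bytes changes the recomputed ICV.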
2) Operation Modes: Figure 2 depicts the two operation modes of IPsec. In transport mode (1), the IPsec header is inserted between the IP header and IP payload. Transport mode provides end-to-end protection and introduces little overhead but can be applied only in host-to-host scenarios. In tunnel mode (2), IP packets are encapsulated in IP packets with an IPsec header. The outer IP header holds the IP addresses of the hosts or gateways that perform IPsec, the inner IP packet may be addressed to a host behind the IPsec gateway. Tunnel mode is required for host-to-site and site-to-site scenarios.
[Figure 2: IPsec operation modes. (1) Transport mode: IP header, IPsec header, IP payload. (2) Tunnel mode: outer IP header, IPsec header, IP header, IP payload.]
3) Packet Processing: Figure 3 depicts the base components of IPsec on a host or VPN gateway. IPsec clearly separates connection management, key management, and packet processing. First, IP packets are processed by a Security Policy Database (SPD). It is configured with a list of security policies that define whether an IP packet needs to be protected by IPsec, processed by IP forwarding without IPsec, or dropped. The SPD is configured by the control plane, e.g., by an administrator or network management tool. SPD entries for IPsec connections point to the protocol (AH or ESP), the operation mode (transport or tunnel), and the cipher suite to be used. They refer to a Security Association (SA) that contains all required data for IPsec processing, e.g., cipher keys, valid sequence numbers, and an SA lifetime. SAs are part of the SA Database (SAD) that is either configured manually or managed by an Internet Key Exchange (IKE) daemon that runs on the control plane. The IKE protocol (specified in RFC 2409 [6]) was introduced with IPsec. It authenticates both peers, sets up a secure channel for key exchange, and negotiates SAs. Today, its successor IKEv2 (specified in RFC 7296 [7]) is typically used and solves complexity and incompatibility issues of IKE.
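The SPD decision described above (protect, bypass, or drop) can be sketched in Python as follows. The sketch assumes an ordered policy list with first-match semantics and a default of discarding unmatched traffic; the names `Policy`, `Action`, and `spd_lookup` and all addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    PROTECT = "protect"   # hand over to IPsec processing
    BYPASS = "bypass"     # plain IP forwarding without IPsec
    DISCARD = "discard"   # drop the packet


@dataclass(frozen=True)
class Policy:
    src: ipaddress.IPv4Network
    dst: ipaddress.IPv4Network
    action: Action


def make_policy(src: str, dst: str, action: Action) -> Policy:
    return Policy(ipaddress.ip_network(src), ipaddress.ip_network(dst), action)


def spd_lookup(spd: List[Policy], src_ip: str, dst_ip: str) -> Action:
    """Return the action of the first matching policy (assumed ordering)."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for policy in spd:
        if src in policy.src and dst in policy.dst:
            return policy.action
    return Action.DISCARD  # assumed default for traffic without a policy


# Hypothetical example policies.
example_spd = [
    make_policy("10.0.0.0/24", "198.51.100.7/32", Action.PROTECT),
    make_policy("10.0.0.0/24", "0.0.0.0/0", Action.BYPASS),
]
```

In this example, traffic from 10.0.0.0/24 to the host 198.51.100.7 is protected, other traffic from that network bypasses IPsec, and everything else is discarded.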
C. Evaluation
IPsec is probably the most widespread VPN technology today. IPsec clients are part of all current operating systems for computers and mobile devices. IPsec gateway functionality is part of server operating systems, network hardware appliances such as firewalls or routers, or dedicated boxes. However, IPsec has been heavily criticized for its complexity for many years. The most encompassing analysis was performed by Ferguson and Schneier [8] in 2003. The authors especially criticize the redundancy of functionality caused by AH, ESP, and the two operation modes, the complex key exchange with IKE, and the complex configuration caused by the SPD and SAD. However, those issues can be easily solved by omitting transport mode and AH, using AE ciphers in ESP with tunnel mode, and applying a less complex protocol for key exchange.
D. Alternative VPN Technologies
Application-layer VPNs that run on top of transport protocols such as TCP or UDP are promoted as less complex alternatives to IPsec. OpenVPN [9] is a popular open-source VPN that features a custom security protocol transmitted via TCP or UDP. It leverages RSA for authentication and a custom protocol to secure the VPN connection. Session keys for encryption are either defined statically or generated in a Diffie-Hellman (DH) key exchange. OpenVPN supports VPNs on Layer 2 and Layer 3 networks. Just like IPsec, it supports host-to-host, host-to-site, and site-to-site scenarios. However, its throughput performance is far behind that of IPsec. The Secure Socket Tunneling Protocol (SSTP) is a host-to-host VPN introduced by Microsoft. It leverages TLS to secure the VPN connection and provides support for several authentication methods, e.g., X.509 certificates and EAP methods. Due to its restriction to host-to-host scenarios, its applicability is highly limited. WireGuard [10] is the most recent open-source VPN alternative to IPsec. It features host-to-host, host-to-site, and site-to-site VPNs with a custom security protocol transmitted via UDP. WireGuard has been designed to be less complex than IPsec. It implements only a fixed set of cryptography mechanisms for authentication, key exchange, encryption, and integrity checks. Its throughput performance and latency are similar to those of IPsec, but it lacks any management functions to build host-to-site or site-to-site setups. However, we see similar advantages and a similar level of complexity when using IPsec with a reduced set of functionality as proposed by Ferguson and Schneier.
III. P4: A PROGRAMMABLE DATA PLANE
We give a short introduction to programmable data planes in general. Then, we give an overview of P4 programming and deployment, major P4 language components, and the P4 runtime control.
A. Programmable Network Data Planes
Programmable network data planes facilitate reconfiguration of the packet processing functionality of a network device. Bifulco et al. [11] provide an encompassing overview. Hardware platforms that support programmability are typically field-programmable gate arrays (FPGAs), i.e., integrated circuits that can be programmed after manufacturing, or network processing units (NPUs) with physical network ports.
B. P4
P4 is a domain-specific language that allows the specification of packet forwarding behaviour of programmable network data planes. P4 was first published as a research paper in 2014 [12]. Today, P4 is a project of the Open Networking Foundation (ONF). Its latest specification is P4_16 [13]. We describe P4's core concepts and components as visualized in Figure 4.
1) P4 Programming & Deployment: P4 programs include the forwarding behaviour of a switch described with P4 language components. P4 programs refer to a particular P4 architecture that represents the programming model of a switch. P4 architectures are implemented by software or hardware switches called P4 targets. Target-specific compilers then translate P4 programs into code that can be executed on the P4 target.
2) P4 Language Components: P4_16 features six core language components for describing packet forwarding behaviour. Header types describe packet header formats by an ordered collection of base types. For instance, an IPv4 header is described by bit vectors, e.g., for the source and destination address, and protocol field. The parser identifies and extracts packet data by applying predefined sequences based on header types. For instance, the value of the protocol field in an IP packet determines the parsing sequence for the following packet header format, e.g., IPsec, TCP, or UDP. Match-and-action tables hold a list of user-defined keys that refer to particular actions that modify packets. Externs are functions with a clearly defined interface provided by a P4 target that can be used within P4 programs. An example is a function that encrypts given data using AES-GCM. The deparser assembles the headers back into a well-formed network packet that can be sent out via an egress port of the switch.
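To illustrate how a parser selects the next header based on the IPv4 protocol field, the following Python sketch mimics the behaviour of a P4 parser for IPv4 and ESP. It is not P4 code; the function `parse_headers` and the dictionary layout are hypothetical, while the protocol numbers (6 for TCP, 17 for UDP, 50 for ESP) are the IANA-assigned values.

```python
import struct

IPV4_FORMAT = "!BBHHHBBH4s4s"  # fixed 20-byte IPv4 header without options
ESP_FORMAT = "!II"             # ESP header: SPI and sequence number
NEXT_HEADER = {6: "tcp", 17: "udp", 50: "esp"}


def parse_headers(packet: bytes) -> dict:
    """Extract the IPv4 header and, if present, the ESP header."""
    fields = struct.unpack_from(IPV4_FORMAT, packet, 0)
    ver_ihl, ttl, protocol, src, dst = (
        fields[0], fields[5], fields[6], fields[8], fields[9])
    headers = {"ipv4": {"ttl": ttl, "protocol": protocol,
                        "src": src, "dst": dst}}
    offset = (ver_ihl & 0x0F) * 4  # IPv4 header length in bytes
    # The protocol field selects the next parser state.
    if NEXT_HEADER.get(protocol) == "esp":
        spi, seq = struct.unpack_from(ESP_FORMAT, packet, offset)
        headers["esp"] = {"spi": spi, "seq": seq}
    return headers


# Hypothetical packet: IPv4 header announcing ESP, followed by an ESP header.
example_ipv4 = struct.pack(IPV4_FORMAT, 0x45, 0, 28, 0, 0, 64, 50, 0,
                           bytes([203, 0, 113, 1]), bytes([198, 51, 100, 7]))
example_packet = example_ipv4 + struct.pack(ESP_FORMAT, 0x1000, 7)
```

A packet without an ESP header yields only the `ipv4` entry, which corresponds to the branch that is handed to security policy matching in the pipeline described later.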
3) P4 Runtime Control: The forwarding behaviour of P4 programs can be controlled at runtime by modifying the match-and-action tables via a control plane interface. Basic examples are command-line interfaces (CLIs) or custom APIs as part of a P4 target. As an alternative, the P4 Runtime API introduces a standardized control-plane interface for table manipulations and packet exchange between the data and control plane. It relies on gRPC [14], protocol buffer data structures [15], and connection security with TLS using certificate-based mutual authentication.
IV. CONTROLLER-BASED OPERATION OF IPSEC: FOUNDATIONS & RELATED WORK
We first describe foundations and use cases of controller-based operation of IPsec. Afterwards, we review and categorize related work by its operation modes and southbound protocols for data plane management.
A. Overview
Controller-based operation of IPsec applies the principle of SDN to IPsec deployment. Functions for connection and key management, e.g., SPD and SAD maintenance, are outsourced to a control plane. This introduces several advantages over traditional deployment. First, the control plane has an encompassing view of the network topology with all devices. It can monitor utilization and detect outages for reliable operation. Second, the centralized control plane features northbound interfaces for management applications and southbound interfaces for controlling data plane devices. Instead of manual per-device configuration, VPN management can be performed on a high abstraction layer with policy languages that allow rule validation. Last, the centralized control plane offers flexibility so that VPN operation can be extended by other mechanisms, e.g., user authentication with 802.1X [16].
B. Use Cases
In the following, we describe three use cases for controller-based operation of IPsec.
1) SD-WAN: Large organizations with distributed locations require network connectivity between the different sites. As dedicated links are expensive, site-to-site IPsec-VPNs over provider networks are increasingly used. However, manually setting up VPN connections between all branches is time-intensive and complex. SD-WAN [16]-[18] proposes IPsec data plane functionality as part of hardware appliances or software modules at the perimeter of the different sites of the organization. Then, a centralized controller automatically sets up and maintains IPsec-VPN connections.
2) Cloud Provider Networks: In many cases, internal services offered by a public or private cloud provider need to be accessed from within networks of an organization. Again, site-to-site IPsec-VPN tunnels are a cost-efficient alternative to dedicated links. Administrators define IPsec-VPN gateways via a cloud management interface. Then, the cloud orchestrator deploys IPsec-VPN gateways as virtual network functions on the cloud provider's infrastructure. Their run-time operation is managed by a controller. In addition, controller-based operation of IPsec can also be used to dynamically connect different cloud networks by a multi-cloud orchestrator [19].
3) Dynamic VPN Setup: Managing many IPsec-VPN connections to different hosts or services on a client host can be cumbersome. Dynamic VPN setup performed by a controller takes over the tasks of tunnel setup and management. In [20] , users request VPN access to a particular network device from the controller. It then automatically sets up a VPN tunnel to the remote domain. The authors of [21] combine dynamic VPN setup with authentication and authorization to automatically deploy IPsec-VPN tunnels between IoT network devices.
C. Operation Modes for Data Plane Management
We describe three operation modes for controller-based data plane management of IPsec as visualized in Figure 5 .
1) IKE on the Data Plane: IPsec data plane nodes still feature an IKE daemon. To reduce the message exchanges in an IKE process, the daemon is preconfigured by the controller. In [21], [22], the controller preconfigures authentication keys; in [23], the controller distributes Diffie-Hellman public values to all associated IPsec data plane nodes. An IETF draft [24] describes the same mechanism as one of two alternative mechanisms for controller-based operation of IPsec. The authors of [18] propose a similar approach that is compatible with older IKE daemons that only support IKEv1.
2) IKE on the Control Plane: The authors of [25] relocate the IKE daemon to the control plane. It performs key exchange with peers and manages the SAD of the IPsec data plane nodes. This approach even supports migration schemes so that the SA can be transferred to other IPsec data plane nodes, e.g., in case of fail-over or load-balancing operations.
3) IKE-less Operation: The authors of [21] , [24] , [26] , and [27] propose SA management without IKE. The controller generates keying material and sets up SAs in the SAD of associated IPsec data plane nodes. 
D. Southbound Protocols for Data Plane Management
On legacy IPsec devices, e.g., [28], SNMP is used for basic configuration and monitoring. The authors of [18] extend this usage by making an IKE daemon manageable by SNMP as well. In [17], SSH is used as southbound interface to manage and monitor IPsec data plane nodes. The work in [24] uses NETCONF with YANG configuration models. In addition to the southbound protocol, they consider east-/westbound interfaces for controller-to-controller communication across different domains. Aragon et al. [21] use OAuth 2.0 to deliver configuration data within authorization messages. In [26], OpenFlow is extended using Experimenter messages. The work in [27] leverages BGP. Li and Mao [16] use a custom southbound protocol to interface an IPsec module on an Open vSwitch. The authors of [29] propose a custom southbound protocol with notification, configuration, and query messages that are transmitted via TCP or TLS.
E. Positioning of Own Work
P4-IPsec features controller-based operation without IKE. The SPD and SAD as part of the data plane are managed by a controller via the P4 Runtime API. We introduce a new approach for dynamic VPN setup that establishes on-demand IPsec tunnels for requested network resources without user interaction.
V. DATA PLANE IMPLEMENTATION OF IPSEC: RELATED WORK
In the following, we review related work on data plane implementations for IPsec gateways. We describe IPsec software implementations, hardware acceleration techniques, and hardware implementations. In the end, we position our work.
A. IPsec Software Implementations
IPsec has been part of the system kernel of Linux and BSD for many years. Windows Server has supported IPsec since Version 2000 [30]. The performance of pure software implementations depends on the hardware, the used cryptographic algorithms, and the average packet size.
Many optimization techniques aim at improving the packet processing of operating systems. The Data Plane Development Kit (DPDK) [31] , Netmap [32] , and PF_RING [33] optimize the Linux network stack to improve packet I/O rates. The authors of [34] and [35] describe mechanisms to distribute IPsec processing on multiple CPU cores while PacketShader [36] proposes to improve IPsec throughput by using the GPU. Gallenmüller et al. [37] compare packet I/O improvement mechanisms in an extensive study. Most of the described optimization mechanisms are only applicable to Linux operating systems.
B. Hardware Acceleration for Software Implementations
The performance of IPsec software implementations can be improved by offloading functionality to hardware components.
1) Cryptography Hardware Acceleration: IPsec throughput can be improved by offloading complex encryption and decryption mechanisms to cryptographic hardware acceleration units. Current CPU architectures from Intel, AMD, and ARM include functions for encryption and decryption. AES-NI [38] and the ARMv8 Cryptographic Extension [39] are examples of AES instruction sets that can be used instead of AES software implementations. Besides, system-on-a-chip (SoC) platforms and circuit boards may contain chips for offloading cryptographic processing. The Marvell Cryptographic Engines and Security Accelerator (CESA) is an example of a cryptographic processor that is part of SoC platforms such as the Marvell ARMADA 38x family [40]. Intel QuickAssist [41] is a cryptographic acceleration processor that can be embedded into a mainboard's chipset. Processors for offloading cryptographic processing can also be part of an extension circuit board that is connected to the mainboard, e.g., via PCIe. For example, the Intel QuickAssist Server Acceleration Card [41] embeds the QuickAssist processor on a PCIe extension board. Several vendors [42], [43] supply implementations of cryptographic algorithms as intellectual property (IP) cores that can be executed on FPGAs. For cryptography hardware acceleration, an FPGA running the IP core is part of a PCIe extension board in a computer.
2) IPsec Hardware Acceleration: IPsec throughput can also be improved by offloading IPsec processing or parts of it to IPsec hardware acceleration units. Application-Specific Integrated Circuits (ASICs) are integrated circuits that are designed and manufactured to execute specific functions. For example, [44] and [45] describe IPsec hardware accelerators that are built as ASICs to implement particular IPsec crypto suites. Besides, NPUs are programmable integrated circuits that are optimized for networking functions. The authors of [46] and [47] describe implementations of IPsec functions on multi-core NPUs. Accelerated Processing Units (APUs) combine a CPU and GPU with physically shared memory. PIPSEA [48] is an IPsec implementation for an APU that leverages OpenCL for IPsec processing. In addition, several works [49], [50] implemented parts of the IPsec data plane processing on FPGAs. The functions are then used by a software implementation that interfaces the FPGAs via PCIe. However, FPGA development requires expensive software environments as well as knowledge of and experience in hardware programming.
C. IPsec Hardware Implementations
In contrast to the previous mechanisms, IPsec processing may be completely implemented on hardware platforms. Proprietary VPN gateways, e.g., as sold by Cisco or Juniper, mostly feature high-performance implementations of IPsec. As their architectures are not public, we cannot gain insight into technical details. The authors of [51] and [52] describe IPsec data plane implementations for FPGAs. IPsec processing is performed on the FPGA, while its SPD and SAD are maintained by an external control plane. In 2016, a Xilinx employee reported on the P4-Development mailing list [53] that IPsec was successfully implemented in PX [54], a high-level domain-specific programming language for programmable data planes.
D. Positioning of Own Work
We propose an implementation of IPsec in P4 that features ESP in tunnel mode with support for different cipher suites. For the BMv2 and NetFPGA SUME board, we implement cipher suites as P4 externs. For the Edgecore Wedge 100BF-32X with a Tofino ASIC, we implement the cipher suites in software on the main CPU module and provide an interface to the processing pipeline. The implementation is managed by a control plane that maintains the SPD and SAD.
VI. P4-IPSEC: DATA PLANE IMPLEMENTATION
In this section, we describe our data plane implementation of IPsec in P4. We provide an overview of the proposed P4 processing pipeline and describe its components.
A. Overview
The proposed data plane implementation of P4-IPsec features ESP in tunnel mode with support for different cipher suites. This simplification of the original specification corresponds to today's applications of IPsec and was also proposed by [8] (see Section II-C). We adopt IPsec components such as the SPD and SAD and implement IPv4 packet forwarding with longest-prefix matching. Figure 7 depicts packet processing in the proposed processing pipeline of a P4-IPsec switch. For ease of understanding, we have grouped together functionalities in function blocks (FBs). Runtime behaviour of the data plane can be managed by manipulating the four match-and-action tables (MATs), e.g., with a CLI on the P4 switch or an interface to a control plane.
B. P4 Processing Pipeline
We now give an overview of packet processing in the proposed P4 processing pipeline. When a packet arrives via the ingress, the P4 parser first extracts the packet headers (1). If the packet has an IPv4 header but no ESP header, it is forwarded to the security policy (SP) matching (2) function block. It determines whether a particular IPv4 packet should be dropped (2a), further processed by IP forwarding (2b), or sent to the IPsec encryption (3) function block. In the IP forwarding FB, the packet is either dropped due to a missing entry in the routing table (2c) or sent out (2d). The deparser (4) reassembles all headers and re-calculates the IPv4 checksum as some fields, e.g., the TTL, are changed. If the packet has an ESP header, it is forwarded to the IPsec decryption (3) function block. It validates the packet's authenticity, decrypts the ESP message, and extracts the original IPv4 packet. As before, the IPv4 packet is either dropped (2c) or sent out (2d) by the IP forwarding function block.
2) IP Forwarding: Packets received from the SP matching (1c) or IPsec encryption (2) FB are further processed by a longest-prefix MAT. It implements IP packet forwarding to the next hop via a particular output port of the switch. The longest-prefix MAT maps IPv4 destination addresses to a particular output port of the switch and the MAC address of the next hop. If it yields no match for the given IPv4 destination address, the packet is dropped (3a). If it yields a match for the given IPv4 destination address (3b), the following steps are applied. First, the packet's source MAC address is set to the MAC address of the P4 switch. Second, the packet's destination MAC address is set to the MAC address of the next hop. Third, the IPv4 TTL of the packet is decreased by 1. Last, the packet is passed to the deparser.
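The IP forwarding steps above can be sketched in Python as a longest-prefix lookup followed by the MAC rewrite and TTL decrement. This is an illustration of the table semantics, not the P4 implementation; the route entries, MAC addresses, and the names `Packet`, `NextHop`, `lpm_lookup`, and `forward` are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    src_mac: str
    dst_mac: str
    dst_ip: str
    ttl: int


@dataclass(frozen=True)
class NextHop:
    port: int
    mac: str


SWITCH_MAC = "02:00:00:00:00:01"  # hypothetical MAC address of the P4 switch

# Hypothetical longest-prefix MAT: prefix -> (output port, next-hop MAC).
ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): NextHop(1, "02:00:00:00:00:0a"),
    ipaddress.ip_network("10.1.0.0/16"): NextHop(2, "02:00:00:00:00:0b"),
}


def lpm_lookup(dst_ip: str) -> Optional[NextHop]:
    """Return the entry with the longest matching prefix, if any."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


def forward(pkt: Packet) -> Optional[int]:
    """Apply the forwarding action; return the output port or None on drop."""
    hop = lpm_lookup(pkt.dst_ip)
    if hop is None:
        return None               # no routing entry: drop the packet
    pkt.src_mac = SWITCH_MAC      # first: source MAC of the switch
    pkt.dst_mac = hop.mac         # second: destination MAC of the next hop
    pkt.ttl -= 1                  # third: decrease the IPv4 TTL by 1
    return hop.port               # last: hand over to the deparser
```

With the example routes, a packet to 10.1.2.3 matches both prefixes and is sent out on port 2 because /16 is the longer prefix.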
D. Function Blocks: IPsec Encryption and Decryption
The FBs of IPsec encryption and IPsec decryption consist of cipher suite externs and security association (SA) MATs.
1) Cipher Suite Externs: P4 does not feature cryptographic functions for encryption, decryption, or message authentication that might be used to implement IPsec. Therefore, we leverage P4 externs (see Section III) to implement IPsec cipher suites that consist of a particular set of cryptographic algorithms. Each cipher suite is implemented by two P4 externs, one for encryption and one for decryption. Figure 9 depicts IPsec encryption and decryption using a cipher suite of AES-CTR for encryption and HMAC-MD5 for message authentication. In the IPsec encryption (1) FB, the cipher suite extern receives the original IP packet, the outer IP source and destination addresses, the SPI, and the keys for AES-CTR and HMAC-MD5. The encryption cipher suite extern encrypts the original IP packet, calculates the message authentication code, and creates an IPsec packet. In the IPsec decryption (2) FB, the cipher suite extern receives the IPsec packet and the keys for AES-CTR and HMAC-MD5. The decryption cipher suite extern decrypts the IPsec packet, validates the message authentication code, and returns the original IP packet.
2) SA Match-and-Action Tables: Cipher suite externs receive the required parameters and keying material from SA MATs. That functionality corresponds to the SAD as introduced in Section II-B3. As SAs are unidirectional, we introduce an encryption SA MAT and a decryption SA MAT. Figure 10 depicts IPsec encryption and decryption using the previously described cipher suite and the SA MAT. In the IPsec encryption (1) FB, packets are matched using the IP source and destination address. The SA entry then invokes the particular cipher suite extern with the required data and keying material. If a packet does not match an SA entry, it is directly discarded. In the IPsec decryption (2) FB, packets are matched using the SPI. Again, the SA entry invokes the particular cipher suite extern with the required data and keying material.
If it yields no match, the packet is discarded due to a missing SA.
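The two SA MATs can be modelled in Python as two lookup tables with different match keys. The sketch below assumes exact matching on the address pair for encryption and on the SPI for decryption; the class and field names are hypothetical, and a lookup that returns `None` stands for a discarded packet.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SecurityAssociation:
    spi: int
    cipher_suite: str   # e.g., the AES-CTR / HMAC-MD5 suite described above
    enc_key: bytes
    auth_key: bytes
    tunnel_src: str     # outer IP source address
    tunnel_dst: str     # outer IP destination address


class SaTables:
    """Encryption SA MAT keyed by (src, dst); decryption SA MAT keyed by SPI."""

    def __init__(self) -> None:
        self.encrypt: Dict[Tuple[str, str], SecurityAssociation] = {}
        self.decrypt: Dict[int, SecurityAssociation] = {}

    def lookup_encrypt(self, src_ip: str,
                       dst_ip: str) -> Optional[SecurityAssociation]:
        # No entry means the packet is discarded instead of being encrypted.
        return self.encrypt.get((src_ip, dst_ip))

    def lookup_decrypt(self, spi: int) -> Optional[SecurityAssociation]:
        # No entry means the packet is discarded due to a missing SA.
        return self.decrypt.get(spi)
```

Because the decryption table is keyed by the SPI, several SAs for the same peer can coexist there, whereas the encryption table can hold only one SA per address pair.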
VII. P4-IPSEC: CONTROL PLANE IMPLEMENTATION
In this section, we describe the control plane of P4-IPsec. We describe host-to-site standard operation and our novel approach of on-demand VPN setup.
A. Regular Host-to-Site Operation
Our control plane implementation focuses on the host-to-site usage scenario as described in Section II-A. An overview is depicted in the accompanying figure. The P4 switch implements IPsec and connects an internal network to the Internet. It is managed via the P4 Runtime API by a controller that holds a gateway profile for each P4 switch that acts as IPsec gateway. Gateway (GW) profiles contain the following information. IPsec properties include the public IP address of the P4 switch, which serves as IPsec gateway address to the shared network resource, and the private IPv4 network. In addition, each profile holds a list of user identities that are permitted to access the VPN. More sophisticated ways of managing access permissions could be realized by using LDAP or RADIUS. The roadwarrior host runs an IPsec agent for interacting with the controller. Following Figure 12, we describe the steps performed by the IPsec agent to set up, renew, and tear down a VPN tunnel with the help of the controller.
1) Controller Connection Setup: After its start, the IPsec agent establishes a connection to the controller via its IPv4 address or FQDN (1). To prevent unauthorized access, the IPsec agent and controller perform mutual authentication. To this end, the IPsec agent validates the server certificate of the controller and provides its own certificate that includes its identity. The controller checks whether the identity is part of the list of authorized users in the requested GW profile and, if so, accepts the connection attempt. To protect against eavesdropping, packet manipulation, and replay attacks, the communication between IPsec agent and controller is protected with encryption and message authentication.
2) Tunnel Setup: To set up an IPsec tunnel to a particular P4 switch, the IPsec agent sends a tunnel setup request with the public IPv4 address of the target P4 switch to the controller (2a). If the user is permitted to set up an IPsec tunnel to the particular P4 switch, the controller creates and installs the IPsec tunnel. As a first step, it creates keying material for the chosen cipher suite and generates the SAs. To set up the new IPsec tunnel on the P4 switch, the controller adds entries for the new tunnel to the encryption SA MAT, the decryption SA MAT, and the SP MAT (2b). Once the P4 switch has confirmed the table modifications, the controller sends the configuration parameters to the IPsec agent (2c).
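The controller-side logic of steps 2a to 2c can be sketched as follows. This is an illustration under assumptions, not the controller's actual code: `GatewayProfile`, `RecordingSwitch`, the table names, and the match keys are hypothetical, the key sizes are placeholders, and the switch is replaced by an object that merely records table writes instead of using the P4 Runtime API.

```python
import secrets
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass
class GatewayProfile:
    public_ip: str
    internal_network: str
    authorized_users: Set[str]


@dataclass
class RecordingSwitch:
    """Stand-in for a P4 Runtime client: it only records table writes."""
    writes: List[Tuple[str, Any]] = field(default_factory=list)

    def insert(self, table: str, key: Any) -> bool:
        self.writes.append((table, key))
        return True  # a real switch would confirm or reject the write


def new_spi() -> int:
    # SPI values 0-255 are reserved, so draw from the remaining 32-bit range.
    return 256 + secrets.randbelow(2**32 - 256)


def new_sa() -> Dict[str, Any]:
    # Key lengths are illustrative; they depend on the chosen cipher suite.
    return {"spi": new_spi(),
            "enc_key": secrets.token_bytes(16),
            "auth_key": secrets.token_bytes(16)}


def setup_tunnel(user: str, client_ip: str, profile: GatewayProfile,
                 switch: RecordingSwitch) -> Optional[Dict[str, Any]]:
    """Handle a tunnel setup request (2a); return the agent configuration (2c)."""
    if user not in profile.authorized_users:
        return None  # user is not permitted to use this gateway
    to_client, to_gateway = new_sa(), new_sa()  # SAs are unidirectional
    confirmed = all([  # (2b) install entries for the new tunnel
        switch.insert("encryption_sa", (profile.internal_network, client_ip)),
        switch.insert("decryption_sa", to_gateway["spi"]),
        switch.insert("sp", client_ip),
    ])
    if not confirmed:
        return None
    return {"gateway": profile.public_ip, "network": profile.internal_network,
            "sa_out": to_gateway, "sa_in": to_client}
```

The configuration is returned to the agent only after all table writes have been confirmed, which mirrors the ordering of steps 2b and 2c.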
3) Tunnel Renewal: IPsec SAs have a limited lifetime, i.e., keying material needs to be renewed on a regular basis. Therefore, the IPsec agent continuously monitors packet number and timeout limits. Once a threshold has been exceeded, the IPsec agent sends a tunnel renewal request to the controller (3a). The controller creates new SAs with a new SPI and installs them on the P4 switch. First, it updates the decryption SA MAT (3b). As entries in the decryption SA MAT are identified by the SPI, both old and new SA entries can coexist. Second, the controller replaces the current SA in the encryption SA MAT by the new SA (3c). A direct replacement is necessary as a MAT cannot contain two entries with the same match key. Once the P4 switch has confirmed the MAT modifications, the controller delivers the new SAs for encryption and decryption to the IPsec agent as well (3d). Last, the controller deletes the old SA in the decryption SA MAT once it is ensured that no more packets encrypted with the old SA can reach the P4 switch (3e).
B. On-Demand Host-to-Site Operation
P4-IPsec introduces IPsec functionality on P4 switches. Thus, IPsec connections may terminate close to the protected resources that should be accessible via VPN tunnels. This distributes the high processing load that current deployments place on VPN concentrators at the perimeter across multiple P4 switches that are steered by a controller. In addition, it improves security through better isolation.
1) Use Case Description: As configuring many VPN tunnels for different protected resources may be cumbersome for users, we introduce on-demand VPN. Figure 13 depicts an overview. An internal server is connected to a P4 switch that has a public IPv4 address reachable from the Internet. The switches and routers in between forward IPsec traffic addressed to that public IPv4 address to the P4 switch. We extend the IPsec agent on the roadwarrior host by a DNS-based tunnel setup detection mechanism. We extend the GW profiles from regular host-to-site operation to resource profiles by adding an FQDN of the protected resource and its private IPv4 address in the internal network. If the IPsec agent detects that a VPN tunnel needs to be established, it sends an IPsec tunnel setup request to the controller that performs IPsec tunnel setup as described before.
2) DNS-Based Lookup of VPN Profiles: We propose the following mechanism for DNS-based tunnel setup detection on the IPsec client. We install text (TXT) records on a DNS server that indicate whether an FQDN is only reachable via an IPsec tunnel. This principle of storing metadata relating to an FQDN in TXT records is used by several applications, e.g., DNS-based Service Discovery [55], DNS-AS [56], and SPF [57]. Figure 14 depicts the operation principle. The controller continuously updates the DNS records for all resource profiles on the DNS server (1). It sets up the private IPv4 address as address (A) record and installs a TXT record set to ipsec-tunneling:yes to indicate that the particular network resource is only reachable via a VPN tunnel. When a user tries to access the shared network resource, e.g., by entering the FQDN in the browser, a DNS A record request is dispatched. We extend this by requesting the TXT record in parallel (2a) so that the DNS server responds with the DNS A and optionally the TXT record (2b). If the IPsec agent receives a TXT record, it requests IPsec tunnel setup for the particular FQDN via the controller (3).
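The decision taken by the IPsec agent in step (3) can be sketched in a few lines of Python. The sketch deliberately leaves out the DNS queries themselves and works on already received record values; the function names and the callback are hypothetical, and only the TXT value `ipsec-tunneling:yes` is taken from the description above.

```python
from typing import Callable, Iterable

TUNNEL_MARKER = "ipsec-tunneling:yes"  # TXT value installed by the controller


def requires_tunnel(txt_records: Iterable[str]) -> bool:
    """Check whether any TXT record marks the FQDN as reachable via VPN only."""
    # Resolver libraries often return TXT strings wrapped in double quotes.
    return any(record.strip().strip('"') == TUNNEL_MARKER for record in txt_records)


def handle_dns_answer(fqdn: str, txt_records: Iterable[str],
                      request_tunnel: Callable[[str], None]) -> bool:
    """Request tunnel setup for the FQDN if its TXT records demand it."""
    if requires_tunnel(txt_records):
        request_tunnel(fqdn)  # hypothetical hook towards the controller
        return True
    return False


requested = []
handle_dns_answer("files.example.org", ['"ipsec-tunneling:yes"'], requested.append)
handle_dns_answer("www.example.org", ["v=spf1 -all"], requested.append)
```

Unrelated TXT records, such as the SPF entry in the second call, do not trigger tunnel setup, so ordinary name resolution is unaffected.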
3) P4-IPsec Support for On-Demand VPN: Figure 15 visualizes the steps performed by the IPsec agent that implements this operation principle. Whenever a program on the user host 
VIII. P4-IPSEC CONTROL AND DATA PLANE IMPLEMENTATION FOR BMV2 AND MININET
We describe the prototypical implementation of P4-IPsec in a Mininet testbed environment. We provide an overview of the platform, describe the P4-IPsec data plane implementation for the Behavioral Model version 2 (BMv2) P4 software switch [58], the implementation of the IPsec agent, and the implementation of the controller.
A. Platform Overview and Development Process
We use Mininet [59] to build an emulated network environment that consists of a Linux host that runs the IPsec agent, a P4 switch, and a Linux server. The testbed runs inside of a KVM/QEMU virtual machine with Ubuntu 16.04.
B. P4-IPsec Data Plane: BMv2 P4 Software Switch
We implement the data plane functions of P4-IPsec on a BMv2 P4 software switch. BMv2 consists of multiple P4 targets that represent different types of P4 switches. We use the BMv2 P4 software switch in the version from February 2018 [60]. It uses the PI library for implementing a P4 Runtime server in a version from April 2018 [61]. For implementing P4-IPsec, we extend the most commonly used P4 target simple_switch as follows. First, we implement the cipher suite externs for AES-CTR with HMAC-MD5 for IPsec encryption and decryption as part of the simple_switch P4 target. We program the extensions in C++ and leverage OpenSSL to apply AES-CTR for encryption and decryption, and HMAC-MD5 for packet authentication. Both functions can be used as P4 externs within the P4 processing pipeline. The functions of the P4 processing pipeline are implemented as a P4_16 program using known P4 constructs. We run the P4 program on our modified simple_switch P4 target within the simple_switch_grpc target. It encapsulates our modified simple_switch P4 target and provides the P4 Runtime API that is used by the control plane to modify match-and-action tables.
C. P4-IPsec Control Plane: IPsec Agent
We implement the IPsec agent for Linux hosts as a command-line tool in Python 3.6 [62]. As interface to the controller, we implement a gRPC client using the gRPC library [14]. At startup, users are required to specify the network interface and an FQDN or IP address of the controller. Afterwards, the connection to the controller is established. For monitoring DNS queries triggered by applications, we use the Scapy library [63]. It listens on the loopback network interface where the libresolv library on Linux systems sends DNS queries to caching DNS resolvers like dnsmasq or systemd-resolved. When a potential tunnel endpoint has been identified in a DNS query, the agent requests the TXT records using the dnspython library [64]. For IPsec tunnel setup, the IPsec agent translates received configuration data into particular ip xfrm commands from the iproute2 tool to configure IPsec on the host system. In addition, it sets up IP routes for routing IP traffic through the IPsec tunnel. The IPsec agent saves the received and applied configuration data because the configuration needs to be reverted after the tunnel has been shut down or when rekeying is performed. In case of rekeying, the old tunnel is deleted and a tunnel with the new parameters is set up. We implement rekeying with the help of Netlink [65]. The agent listens on the corresponding Netlink socket and subscribes to the XFRMNLGRP_EXPIRE multicast group so that XFRM Expire messages can be received. When receiving an XFRM Expire message, it extracts parameters such as the SPI and the IP addresses of the tunnel endpoints. To initiate rekeying, the tunnel source and destination addresses and the SPI are put into a queue for processing in the main class.
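The translation of received configuration data into ip xfrm commands might look roughly like the following sketch. It only builds the argument lists and does not execute them. The `SaConfig` structure and function names are hypothetical, the kernel algorithm names `rfc3686(ctr(aes))` and `hmac(md5)` are assumed to correspond to the AES-CTR and HMAC-MD5 cipher suite, and the key material shown in the usage note is a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SaConfig:
    """One unidirectional SA as received from the controller (hypothetical layout)."""
    src: str           # outer tunnel source address
    dst: str           # outer tunnel destination address
    spi: int
    enc_key_hex: str   # for rfc3686(ctr(aes)) this includes the 4-byte nonce
    auth_key_hex: str


ENC_ALGO = "rfc3686(ctr(aes))"  # assumed kernel name for AES-CTR
AUTH_ALGO = "hmac(md5)"         # assumed kernel name for HMAC-MD5


def state_cmd(sa: SaConfig) -> List[str]:
    """Build 'ip xfrm state add' for one SA in ESP tunnel mode."""
    return ["ip", "xfrm", "state", "add", "src", sa.src, "dst", sa.dst,
            "proto", "esp", "spi", hex(sa.spi), "mode", "tunnel",
            "enc", ENC_ALGO, "0x" + sa.enc_key_hex,
            "auth", AUTH_ALGO, "0x" + sa.auth_key_hex]


def policy_cmd(sa: SaConfig, sel_src: str, sel_dst: str, direction: str) -> List[str]:
    """Build 'ip xfrm policy add' that binds a traffic selector to the tunnel."""
    return ["ip", "xfrm", "policy", "add", "src", sel_src, "dst", sel_dst,
            "dir", direction,
            "tmpl", "src", sa.src, "dst", sa.dst, "proto", "esp", "mode", "tunnel"]


def tunnel_cmds(out_sa: SaConfig, in_sa: SaConfig,
                local_sel: str, remote_net: str) -> List[List[str]]:
    """Return all commands for one bidirectional tunnel, in installation order."""
    return [state_cmd(out_sa), state_cmd(in_sa),
            policy_cmd(out_sa, local_sel, remote_net, "out"),
            policy_cmd(in_sa, remote_net, local_sel, "in")]
```

Each list can be passed to `subprocess.run(cmd, check=True)` with root privileges, for example `tunnel_cmds(SaConfig("203.0.113.7", "198.51.100.1", 0x1001, "00" * 20, "11" * 16), SaConfig("198.51.100.1", "203.0.113.7", 0x2002, "22" * 20, "33" * 16), "203.0.113.7/32", "10.0.0.0/24")`. Storing the generated lists also gives the agent what it needs to revert the configuration later.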
D. P4-IPsec Control Plane: Controller
We implement the controller as a Python 2.7 application. We leverage the P4 Runtime library [66] to program the interface to the P4 switches and the gRPC library [14] for communication with the IPsec agent. The controller features a simple command-line interface (CLI) for development and testing purposes that displays information about all active IPsec tunnels.
IX. P4-IPSEC DATA PLANE IMPLEMENTATION FOR P4 HARDWARE SWITCHES: NETFPGA SUME
In the following, we describe our experiences with the NetFPGA SUME platform. We briefly review the platform and development process and report on implementing P4-IPsec. Due to severe limitations, we did not conduct a performance evaluation.
A. Platform Overview and Development Process
The NetFPGA SUME board is an open-source platform for rapid prototyping of network applications with throughput rates up to 100 Gb/s. It is based on the Xilinx Virtex-7 690T field-programmable gate array (FPGA), an integrated circuit that can be programmed after its production. The board features four SFP+ interfaces along with a PCIe interface to a computer host system. A user guide [68] provided by Xilinx describes all specifics for writing P4_16 programs for the NetFPGA SUME board. It supports three P4 architectures, namely the XilinxEngineOnly, XilinxStreamSwitch, and XilinxSwitch architecture. The latter is the most common architecture for this P4 target and consists of a parser, a single match-and-action pipeline, and a deparser. The user guide describes several specific examples, e.g., an implementation of IP checksum calculation as a P4 extern.
Figure 16 depicts the P4-NetFPGA toolchain [69] that was developed by Xilinx to run P4 programs on the NetFPGA SUME board. First, the Xilinx P4_16 compiler translates P4 programs that are written for a Xilinx P4 architecture into a program for the Software Defined Specification Environment for Networking (SDNet) [70] from Xilinx. SDNet is a high-level design environment that was created by Xilinx prior to P4. It aims to simplify the design of packet processing data planes that target FPGA hardware. Although SDNet and P4 share some design goals, SDNet puts a stronger focus on custom architectures, whereas P4 abstracts the hardware on a high level. Second, the SDNet compiler generates hardware descriptions in Verilog. To validate the function of the hardware description language (HDL) descriptions, generic HDL simulations and platform-specific FPGA simulations are performed. Third, the Vivado HLx suite performs hardware synthesis and implementation. It transforms the HDL representation into a hardware design using lookup tables (LUTs) and 1-bit registers (flip-flops) to program the FPGA. Last, the resulting bitfile is used to program the FPGA.
B. Prototypical Implementation
We managed to implement only a very limited prototype. It merely supports applying the NULL cipher to fixed-length packets that do not exceed a total length of 140 bytes. In the following, we describe the encountered problems and limitations in detail.
First, P4-SDNet is limited to packet header manipulation. Therefore, we parsed payload fields of packets as additional header fields. Besides the performance downsides of this approach, P4-SDNet does not support parsing variable-length header fields. This restricts the implementation to packets with a fixed length.
Second, P4-SDNet has severe limitations on the implementation of P4 externs. They are programmed as hardware definitions in Verilog that run separately from the rest of the P4 program. In contrast to BMv2, a data structure that contains the parsed header fields, the payload, and a field for the return value needs to be passed to the P4 extern. As we experienced major performance drawbacks in implementing cryptographic functions within P4 externs in our previous work [1], we only implemented the IPsec NULL cipher. For IPsec protection, it encapsulates the plain-text IP header and payload within an ESP packet. For IPsec decryption, it extracts the original IP header and payload from the ESP packet. Again, P4-SDNet turned out not to be designed for packet payload manipulations. Data transmission between the P4 extern and the P4 pipeline is limited to 10 kbit per function call, which limits the maximum packet size that can be processed through a P4 extern to approximately 600 bytes. In addition, data transfer to the P4 extern needs to be executed within one clock cycle of the FPGA. During synthesis, the Vivado suite optimizes the hardware implementation through several algorithms. Various experiments have shown a practical upper bound of 140 bytes for packets. Either the hardware implementation used more resources than offered by the FPGA, or data transfer and calculation within the P4 extern exceeded one clock cycle.
Last, we encountered several more general problems with P4-SDNet and the NetFPGA. Probably due to a bug, we were not able to access the values of an LPM table for IP routing with our SDN controller. We solved that problem by using exact matching tables instead, an approach that is not acceptable for a production implementation. In addition, we experienced several stability problems. No matches in match-and-action tables were found when data was written to hardware registers. Finally, we missed many important details in the documentation.
X. P4-IPSEC DATA PLANE IMPLEMENTATION FOR P4 HARDWARE SWITCHES: EDGECORE WEDGE WITH TOFINO
In the following, we describe our experiences with the Edgecore Wedge 100BF-32X switch that features a Tofino P4 ASIC from Barefoot Networks. We provide an overview of the platform, describe the development process, and report on the implementation of P4-IPsec.
A. Platform Overview and Development Process
The Edgecore Wedge 100BF-32X [71] switch is a top-of-rack (ToR) switch for data center networks. It features 32 QSFP network ports that can be configured to support throughput rates up to 100 Gbit/s. Packet switching is performed by a Tofino ASIC [72] from Barefoot Networks. It connects to a main CPU module via PCIe. The main CPU module is an off-the-shelf CPU board that features an Intel Pentium D1517 processor with 4 cores at 1.6 GHz, 8 GB RAM, and a 32 GB SSD. Unfortunately, the main CPU module only supports PCIe Gen 2 [73] while the Tofino ASIC can support Gen 3.
Figure 17 depicts the architecture and process of development for switches with a Tofino ASIC. The main CPU module runs the Barefoot P4 software development environment (SDE) [74] on top of a Linux-based operating system such as Open Network Linux (ONL) or SONiC [75]. The SDE is responsible for programming and managing the Tofino ASIC and contains interfaces for loading, configuring, and managing P4 programs during execution. It also exposes management operations to APIs, e.g., the P4 Runtime API, that may be used by a control plane to modify match-and-action tables. The SDE also exposes a PCIe CPU port that enables network packet exchange between the P4 processing pipeline and the main CPU module. A Linux kernel module supplied with the SDE provides a virtual network interface for communication with the PCIe CPU port. P4 programs are written for the Tofino native architecture (TNA), compiled by the Barefoot P4 compiler, and loaded on the Tofino ASIC by the SDE.
B. Prototypical Implementation
The Tofino ASIC is optimized for high-speed packet processing and bandwidths up to 6.5 Tbit/s in data center or core networks. The second version of the Tofino ASIC even supports bandwidths up to 12.8 Tbit/s. User-defined P4 externs that may contain computation-intensive functions are not supported on high-speed switching silicon such as the Tofino ASIC. To investigate the feasibility of implementing P4-IPsec on that hardware, we relocate the IPsec functions that were implemented as P4 externs in our BMv2 prototype (see Section VIII) to the main CPU module of the Wedge switch. Figure 18 depicts this concept. We leverage the PCIe CPU port to exchange packets between the Tofino ASIC and the main CPU module. To this end, we replace all calls to P4 externs in the P4 processing pipeline by forwarding the packet to the CPU port. We use the IPsec kernel functions of the Linux operating system running on the main CPU module for IPsec processing. We implement an IPsec crypto manager program that configures the IPsec kernel functions with data received from the controller. It is implemented in Python 3 [76] and uses iproute2 [77] commands for managing the SPD and SAD. After finishing IPsec processing, the main CPU module transfers the packets back to the Tofino ASIC via the CPU port. The Tofino ASIC then performs IP routing to forward the packets received from the CPU port. Besides programming the SAD and SPD via the IPsec crypto manager, the control plane programs IP forwarding and the SPD on the P4 processing pipeline.
C. Evaluation
We evaluate the prototypical implementation in experiments on latency and TCP throughput.
1) Experiment Setup: We attach two physical hosts running Ubuntu 16.04 LTS via 10 Gbit/s links to the front ports of the Wedge switch and perform the following experiments. The link between the first client and the switch is secured using IPsec while the link between the switch and the second client is not secured. The main CPU module performs decryption for packets coming from the first client and encryption for the packets coming from the second client.
2) Latency: We investigate the latency when using the CPU port and IPsec processing on the main CPU module. We send 100 ICMP echo requests from the first client to the second client and measure an average round-trip time of approximately 1.5 ms.
3) TCP Throughput: We investigate the maximum TCP throughput when using the CPU port with IPsec processing on the main CPU module. Figure 19 depicts the results of three experiments, each performed with five runs and a duration of 30 s. We generate TCP transmissions between both clients with iperf3 [78] and measure the throughput. For a single IPsec tunnel with AES-GCM-256, we measure an average TCP throughput of approximately 1.4 Gbit/s. When using the NULL encryption and authentication cipher, i.e., when not encrypting and authenticating packets, the average TCP throughput rises to approximately 2 Gbit/s. Throughput is not only determined by IPsec processing. Re-keying during the experiments also affects the performance, as some packets are lost between the deletion of the old SA and the setup of the new SA. We also performed measurements for up to 16 concurrent IPsec flows and calculated the average of 10 runs with a duration of 300 s each. However, the maximum TCP throughput remains at 1.4 Gbit/s for IPsec with AES-GCM-256 and 2 Gbit/s for IPsec with the NULL cipher.
To provide a reference measurement, we investigate the maximum TCP throughput of the main CPU module. We assign an IP address to the virtual network interface of the CPU port on the main CPU module and perform iperf3 measurements between the first client and the main CPU module. We measure an average TCP throughput of approximately 3.3 Gbit/s. That is the upper bound for TCP throughput that can be handled by the main CPU module of the Wedge switch in our setup. We attribute the large differences in TCP throughput to the rather slow CPU that is used in the Edgecore Wedge 100BF-32X. The Intel Pentium D1517 processor has a base frequency of 1.6 GHz. We consider this a reasonable performance that might be sufficient for scenarios where only a few shared network resources are sporadically accessed by roadwarrior hosts.
Fig. 19: Maximum throughput for a single IPsec connection using the PCIe CPU port and different encryption algorithms. The uppermost bar shows the maximum throughput that the PCIe CPU port is capable of.
D. Improved Throughput through External Crypto Host
As the presented throughput was limited by the performance of the switch CPU and the CPU port's bandwidth, we suggest the use of an external Linux-based crypto host as a means to improve throughput. Figure 20 depicts the concept: IPsec processing is offloaded from the switch to the external crypto host, which works as a network function. As a consequence, en-/decryption capacity could be scaled up by increasing the number of crypto hosts connected to the switch. The controller is extended to instruct the switch to forward IPsec traffic via a particular front port instead of the CPU port. Furthermore, the IPsec crypto manager program from the previous approach (see Section X-B) constitutes the heart of the crypto host implementation. IP and IPsec processing and communication with SAD and SPD are carried out in kernel space while other functions are executed in user space. For the evaluation of this approach, we utilize a crypto host with an 8-core, 16-thread Intel Xeon Gold 6134 CPU, 128 GB RAM, a 240 GB SSD, and Ubuntu 18.04 LTS as operating system. We perform the same experiments as in Section X-C. The round-trip time is approximately 2 ms, which is slightly larger than in the previous approach. The TCP throughput for IPsec traffic is shown in Figure 21. For a single IPsec tunnel with AES-GCM-256, we measure an average TCP throughput of approximately 4 Gbit/s. It can be increased by running multiple connections over the same crypto host. For 16 parallel IPsec flows, we measure an overall average TCP throughput of approximately 24 Gbit/s. This effect can be attributed to receive-side scaling (RSS) of the network interface card, which can leverage multiple cores, but only one per flow. In case of multiple flows, RSS increases the overall throughput by leveraging the processing power of more than a single core.
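The one-core-per-flow property of RSS can be illustrated with a strongly simplified model. This is not how a particular network interface card works: real implementations typically use a Toeplitz hash over selected header fields together with an indirection table, whereas the sketch below uses CRC32 and a modulo operation, and the flow key (addresses plus SPI) is a hypothetical choice.

```python
import zlib
from typing import Iterable, Set, Tuple

Flow = Tuple[str, str, int]  # (source IP, destination IP, SPI), hypothetical hash input


def rss_core(flow: Flow, num_cores: int) -> int:
    """Map a flow to one core; all packets of the flow share the same key."""
    key = "|".join(str(part) for part in flow).encode()
    return zlib.crc32(key) % num_cores


def cores_used(flows: Iterable[Flow], num_cores: int) -> Set[int]:
    """Return the set of cores that receive traffic for the given flows."""
    return {rss_core(flow, num_cores) for flow in flows}


single = [("203.0.113.7", "198.51.100.1", 0x1000)]
many = [("203.0.113.7", "198.51.100.1", 0x1000 + i) for i in range(16)]
print(len(cores_used(single, 8)), len(cores_used(many, 8)))
```

A single flow always maps to exactly one core, so its throughput is bounded by that core, while several flows can be spread over up to all available cores.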
The TCP throughput can be further improved by optimization techniques as presented in Section V.
XI. CONCLUSION
In this work, we provided an extensive review and categorization of IPsec-related work. We proposed P4-IPsec, a P4-based implementation of an IPsec gateway supporting ESP in tunnel mode and different cipher suites. To the best of our knowledge, it is the first data plane implementation of IPsec. This allows turning any switch of a P4 network into a VPN concentrator so that access to critical resources can be restricted very close to the target instead of having a single VPN concentrator for a large network. The runtime operation of P4-IPsec is managed by a control plane, avoiding complex key exchange protocols such as IKE. It further supports the new use case "on-demand VPN" where IPsec tunnels are automatically set up to appropriate IPsec gateways when a specific resource behind them is requested. We demonstrated the feasibility of P4-IPsec in prototypical implementations for the BMv2 P4 software switch, the NetFPGA SUME platform, and the Edgecore Wedge 100BF-32X switch that features a Tofino ASIC. We performed performance evaluation experiments on the Wedge switch with two different variants for encryption and decryption. They revealed an acceptable throughput rate for a platform without specialized encryption/decryption support, but clearly leave room for encryption/decryption acceleration. The experiments show that P4 is suitable for use cases that target security functions with medium data rates, e.g., at the edge of enterprise or campus networks.