Programming Protocol-Independent Packet Processors
P4 is a high-level language for programming protocol-independent packet
processors. P4 works in conjunction with SDN control protocols like OpenFlow.
In its current form, OpenFlow explicitly specifies protocol headers on which it
operates. This set has grown from 12 to 41 fields in a few years, increasing
the complexity of the specification while still not providing the flexibility
to add new headers. In this paper we propose P4 as a strawman proposal for how
OpenFlow should evolve in the future. We have three goals: (1)
Reconfigurability in the field: Programmers should be able to change the way
switches process packets once they are deployed. (2) Protocol independence:
Switches should not be tied to any specific network protocols. (3) Target
independence: Programmers should be able to describe packet-processing
functionality independently of the specifics of the underlying hardware. As an
example, we describe how to use P4 to configure a switch to add a new
hierarchical label
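Protocol independence means the switch learns header formats, and the order in which to parse them, from the program instead of having them hard-coded. The Python sketch below (not P4 syntax, and not the paper's own example) models that idea under simple assumptions: headers are lists of byte-aligned fields, and a parse graph picks the next header from a field value. The header names, the `parse` function, and the EtherType value 0x88B5 used for a hypothetical new label header are illustrative assumptions only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HeaderType:
    """A header format: ordered (field name, width in bytes) pairs."""
    name: str
    fields: tuple[tuple[str, int], ...]


@dataclass
class ParseState:
    """One node of the parse graph (hypothetical model, not P4's)."""
    header: HeaderType
    select_field: Optional[str]   # field whose value selects the next state
    transitions: dict[int, str]   # field value -> next state name


def parse(packet: bytes, states: dict[str, ParseState], start: str) -> dict[str, dict[str, int]]:
    """Walk the parse graph from `start`, extracting each header's fields."""
    parsed: dict[str, dict[str, int]] = {}
    offset = 0
    state_name: Optional[str] = start
    while state_name is not None:
        state = states[state_name]
        values: dict[str, int] = {}
        for field_name, width in state.header.fields:
            if offset + width > len(packet):
                raise ValueError(f"packet too short for {state.header.name}.{field_name}")
            values[field_name] = int.from_bytes(packet[offset:offset + width], "big")
            offset += width
        parsed[state.header.name] = values
        # Stop when the state selects nothing or the value has no outgoing edge.
        if state.select_field is None:
            state_name = None
        else:
            state_name = state.transitions.get(values[state.select_field])
    return parsed


# Hypothetical configuration: Ethernet, then a new 4-byte label header.
ETHERNET = HeaderType("ethernet", (("dst", 6), ("src", 6), ("ethertype", 2)))
MY_LABEL = HeaderType("my_label", (("label", 4),))
STATES = {
    "start": ParseState(ETHERNET, "ethertype", {0x88B5: "parse_label"}),
    "parse_label": ParseState(MY_LABEL, None, {}),
}
```

In this model, supporting another header means adding a `HeaderType` and a transition entry, without touching `parse` itself.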
Network Virtual Machine (NetVM): A New Architecture for Efficient and Portable Packet Processing Applications
A challenge facing network device designers, besides increasing the speed of network gear, is improving its programmability in order to simplify the implementation of new applications (see, for example, active networks and content networking). This paper presents our work on designing and implementing a virtual network processor, called NetVM, which has an instruction set optimized for packet processing applications, i.e., for handling network traffic. Just as a Java Virtual Machine virtualizes a CPU, a NetVM virtualizes a network processor. The NetVM is expected to provide a compatibility layer for networking tasks (e.g., packet filtering, packet counting, string matching) performed by various packet processing applications (firewalls, network monitors, intrusion detectors) so that they can be executed on any network device, ranging from expensive routers to small appliances (e.g., smartphones). Moreover, the NetVM will provide efficient mapping of the elementary functions used to realize the above-mentioned networking tasks onto specific hardware functional units (e.g., ASICs, FPGAs, and network processing elements) included in special-purpose hardware systems that may be deployed to implement network devices.
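To make the idea of a packet-processing instruction set concrete, here is a minimal Python sketch of a stack-based filter interpreter. The opcodes (`LDB`, `LDH`, `PUSH`, `EQ`, `AND`, `RET`) and the `run_filter` function are hypothetical and are not NetVM's actual instruction set; the sketch only shows how a portable bytecode program can express a packet filter that any host able to run the interpreter can execute.

```python
def _load(packet: bytes, offset: int, width: int):
    """Return the big-endian value at offset, or None if the packet is too short."""
    if offset < 0 or offset + width > len(packet):
        return None
    return int.from_bytes(packet[offset:offset + width], "big")


def run_filter(program, packet: bytes) -> bool:
    """Interpret a hypothetical filter program; True means accept, False means reject."""
    stack = []
    for op, *args in program:
        if op in ("LDB", "LDH"):           # load 1 or 2 bytes at a fixed offset
            value = _load(packet, args[0], 1 if op == "LDB" else 2)
            if value is None:
                return False               # reject truncated packets
            stack.append(value)
        elif op == "PUSH":                 # push an immediate constant
            stack.append(args[0])
        elif op == "EQ":
            b, a = stack.pop(), stack.pop()
            stack.append(int(a == b))
        elif op == "AND":
            b, a = stack.pop(), stack.pop()
            stack.append(int(bool(a) and bool(b)))
        elif op == "RET":                  # top of stack decides the verdict
            return bool(stack.pop())
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise ValueError("program ended without RET")


# Accept IPv4 (EtherType 0x0800 at offset 12) carrying UDP (protocol 17 at offset 23),
# assuming untagged Ethernet followed directly by an IPv4 header.
IPV4_UDP = [
    ("LDH", 12), ("PUSH", 0x0800), ("EQ",),
    ("LDB", 23), ("PUSH", 17), ("EQ",),
    ("AND",), ("RET",),
]
```

The same `IPV4_UDP` program runs unchanged wherever the interpreter runs; mapping such programs onto dedicated hardware units is the part a real virtual network processor would add.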
Hardware Acceleration of the Robust Header Compression (RoHC) Algorithm
With the proliferation of Long Term Evolution (LTE) networks, many cellular carriers are embracing the emerging field of mobile Voice over Internet Protocol (VoIP). The Robust Header Compression (RoHC) framework was introduced as part of the LTE Layer 2 stack to compress the large headers of VoIP packets before they are transmitted over LTE IP-based architectures. These headers, consisting of the encapsulated Real-time Transport Protocol (RTP)/User Datagram Protocol (UDP)/Internet Protocol (IP) stack, are large compared to the small payload. This header-compression scheme is therefore especially useful for efficient utilization of radio bandwidth and network resources. In an LTE base-station implementation, RoHC is a processing-intensive algorithm that may become the system bottleneck and thus limit the number of users served. In this thesis, a hardware-software (HW-SW) solution and a full-hardware (Full-HW) solution are proposed, targeting LTE base stations, to accelerate this computationally intensive algorithm and enhance the throughput and capacity of the system. The results of both solutions are discussed and compared with respect to design metrics such as throughput, capacity, power consumption, chip area, and flexibility. This comparison is instrumental in making architecture-level trade-off decisions in order to meet present-day requirements while remaining ready to support future evolution. In terms of throughput, the HW-SW solution achieves a 20% gain over the SW-Only solution (6250 packets/s processed at 150 MHz) by implementing the Cyclic Redundancy Check (CRC) and Least Significant Bit (LSB) encoding blocks as hardware accelerators. The Full-HW implementation, in contrast, reaches 45 times the throughput of the SW-Only solution (244000 packets/s processed at 100 MHz).
However, the Full-HW solution uses more Lookup Tables (LUTs) than the HW-SW solution when synthesized on a Field-Programmable Gate Array (FPGA) platform. On the Arria II GX, the HW-SW and Full-HW solutions use 2578 and 7477 LUTs and consume 1.5 and 0.9 W, respectively. Finally, both solutions are synthesized and verified on Altera's Arria II GX FPGA.
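The LSB encoding block mentioned above sends only the k low-order bits of a field (for example, an RTP sequence number) and lets the decompressor recover the full value from a reference value. The Python sketch below follows the interpretation-interval idea used in RoHC, f(v_ref, k) = [v_ref - p, v_ref + 2^k - 1 - p], as a minimal software illustration; it is not the thesis's hardware design. It assumes unbounded integers, so the wraparound of fixed-width fields such as 16-bit sequence numbers is deliberately ignored, and the function names and the default offset p = 0 are assumptions.

```python
def interval(v_ref: int, k: int, p: int = 0) -> tuple:
    """Interpretation interval f(v_ref, k) = [v_ref - p, v_ref + 2**k - 1 - p]."""
    return v_ref - p, v_ref + (1 << k) - 1 - p


def lsb_encode(value: int, v_ref: int, p: int = 0, max_bits: int = 32):
    """Return (k, bits): the smallest k such that value lies in f(v_ref, k),
    together with the k least significant bits of value."""
    for k in range(max_bits + 1):
        lo, hi = interval(v_ref, k, p)
        if lo <= value <= hi:
            return k, value & ((1 << k) - 1)
    raise ValueError("value not encodable within max_bits of v_ref")


def lsb_decode(bits: int, k: int, v_ref: int, p: int = 0) -> int:
    """Return the unique value in f(v_ref, k) whose k low-order bits equal bits.
    No wraparound handling: integers are treated as unbounded (assumption)."""
    lo, _ = interval(v_ref, k, p)
    mask = (1 << k) - 1
    # The interval holds 2**k consecutive integers, so exactly one matches bits.
    return lo + ((bits - lo) & mask)
```

For example, with v_ref = 100 and p = 0, the value 103 needs k = 2 (interval [100, 103]), so only the two bits 0b11 are sent, and `lsb_decode(3, 2, 100)` recovers 103. The per-packet work is a few comparisons, masks, and additions, which suggests why this block is a natural candidate for a small hardware accelerator.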
PSi: A Silicon Compiler for Very Fast Protocol Processing
Conventional protocol implementations typically fall short, by a few orders of magnitude, of supporting the speeds afforded by high-speed optical transmission media. This protocol-processing bottleneck is a key hurdle in taking advantage of the opportunities presented by high-speed communications. This paper describes PSi, a silicon compiler that transforms formal protocol specifications into efficient VLSI implementations. PSi exploits the parallelism intrinsic to a given protocol to achieve very high-speed implementations. Initial application of PSi to IEEE 802.2 (Logical Link Control) yields processing rates on the order of 10^6 packets per second (p/s). IEEE 802.2 was selected as a complexity benchmark; lightweight protocols can achieve even higher processing rates, reaching the limits set by chip clock rates (i.e., one packet per cycle). These speeds significantly exceed those typical of software implementations (up to a few hundred p/s) or special hardware-assisted implementations (up to a few thousand p/s). More importantly, at these rates, with packet sizes of 10^3 to 10^4 bits, the protocol throughput of 10^9 to 10^10 bits/s reaches the limiting throughput afforded by memory technology. Thus, the protocol-processing bottleneck is pushed to the ultimate bounds set by VLSI technologies.
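The throughput figure in the abstract follows from multiplying the packet rate by the packet size; written out with the abstract's own numbers:

```latex
\text{throughput} = \text{packet rate} \times \text{packet size}
  \approx 10^{6}\ \tfrac{\text{packets}}{\text{s}} \times \left(10^{3} \text{ to } 10^{4}\right)\ \tfrac{\text{bits}}{\text{packet}}
  = 10^{9} \text{ to } 10^{10}\ \tfrac{\text{bits}}{\text{s}}
```

The same product shows the gap for software implementations: a few hundred p/s at 10^4 bits per packet gives only a few times 10^6 bits/s.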
A Modular Approach to Adaptive Reactive Streaming Systems
The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, the designer faces increasing challenges, not just because of pure size and complexity, but also in effectively harnessing the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a "plug and play" look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and on abstracting the communication patterns that are typical between modules (for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules). ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view via metadata associated with individual modules. ReShape allows system reconfiguration at the module level by supporting type checking of replacement modules and by managing the overall system implementation via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting (networking systems) and have been validated on real telecommunications design projects.
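Type checking of replacement modules relies on per-module metadata. As a rough illustration only, the Python sketch below assumes a hypothetical metadata schema in which each port has a name, a kind (such as a streaming input or a memory read interface), and a data width, and accepts a replacement module only if its ports match those of the module it replaces. ReShape's actual metadata format and checks are not described in the abstract, so every name here (`Port`, `ModuleMeta`, `check_replacement`, the port kinds) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    kind: str   # hypothetical kinds, e.g. "stream_in", "stream_out", "mem_read"
    width: int  # data width in bits


@dataclass(frozen=True)
class ModuleMeta:
    name: str
    ports: tuple[Port, ...]


def check_replacement(current: ModuleMeta, replacement: ModuleMeta) -> list[str]:
    """Return a list of interface mismatches; an empty list means the swap type-checks."""
    have = {p.name: p for p in current.ports}
    new = {p.name: p for p in replacement.ports}
    problems = []
    for name, port in have.items():
        candidate = new.get(name)
        if candidate is None:
            problems.append(f"missing port {name}")
        elif (candidate.kind, candidate.width) != (port.kind, port.width):
            problems.append(
                f"port {name}: expected {port.kind}/{port.width}, "
                f"got {candidate.kind}/{candidate.width}"
            )
    # Extra ports would be left unconnected in the existing system.
    for name in sorted(new.keys() - have.keys()):
        problems.append(f"unexpected port {name}")
    return problems
```

Checking interfaces rather than internals is what lets a module from a different source be dropped into an existing slot; a fuller model would presumably also check that the replacement fits its reconfigurable region of the floorplan.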
P4CEP: Towards In-Network Complex Event Processing
In-network computing using programmable networking hardware is a strong trend
in networking that promises to reduce latency and consumption of server
resources through offloading to network elements (programmable switches and
smart NICs). In particular, the data plane programming language P4 together
with powerful P4 networking hardware has spawned projects offloading services
into the network, e.g., consensus services or caching services. In this paper,
we present a novel case for in-network computing, namely, Complex Event
Processing (CEP). CEP processes streams of basic events, e.g., stemming from
networked sensors, into meaningful complex events. Traditionally, CEP
processing has been performed on servers or overlay networks. However, we argue
in this paper that CEP is a good candidate for in-network computing along the
communication path: it avoids detouring streams to distant servers, minimizing
communication latency, while also exploiting the processing capabilities of novel
networking hardware. We show that it is feasible to express CEP operations in
P4 and also present a tool to compile CEP operations, formulated in our P4CEP
rule specification language, to P4 code. Moreover, we identify challenges and
problems that we have encountered to show future research directions for
implementing full-fledged in-network CEP systems.
Comment: 6 pages. Author's version
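A typical CEP operation is a sequence pattern: raise a complex event when one kind of basic event is followed by another within a time window. The Python sketch below illustrates that semantics only; it is not the P4CEP rule language or generated P4 code. The event kinds, threshold, window, and the `followed_by` function are hypothetical, and the operator keeps just one timestamp of state, loosely in the spirit of the small per-pattern state a switch data plane can hold.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Event:
    kind: str     # e.g. "temp" or "smoke" (hypothetical sensor event types)
    value: float
    ts: float     # timestamp in seconds


@dataclass(frozen=True)
class ComplexEvent:
    name: str
    ts: float


def followed_by(events: Iterable[Event], first: str, threshold: float,
                second: str, window: float, name: str) -> Iterator[ComplexEvent]:
    """Emit `name` when a `first` event with value >= threshold is followed
    by a `second` event no more than `window` seconds later."""
    armed_at: Optional[float] = None  # timestamp of the pending `first` event
    for ev in events:
        if ev.kind == first and ev.value >= threshold:
            armed_at = ev.ts           # (re)arm on the most recent qualifying event
        elif ev.kind == second and armed_at is not None:
            if ev.ts - armed_at <= window:
                yield ComplexEvent(name, ev.ts)
            armed_at = None            # consume the pending match either way
```

Because the operator inspects each event once and keeps constant state, it can run on events as they pass through a node on the communication path instead of after they reach a server.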