67 research outputs found

    P4-enabled Smart NIC:Enabling Sliceable and Service-Driven Optical Data Centres

    Get PDF

    Programmable Smart NIC

    Get PDF

    The future roadmap of in-vehicle network processing: a HW-centric (R-)evolution

    Get PDF
    © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.The automotive industry is undergoing a deep revolution. With the race towards autonomous driving, the amount of technologies, sensors and actuators that need to be integrated in the vehicle increases exponentially. This imposes new great challenges in the vehicle electric/electronic (E/E) architecture and, especially, in the In-Vehicle Network (IVN). In this work, we analyze the evolution of IVNs, and focus on the main network processing platform integrated in them: the Gateway (GW). We derive the requirements of Network Processing Platforms that need to be fulfilled by future GW controllers focusing on two perspectives: functional requirements and structural requirements. Functional requirements refer to the functionalities that need to be delivered by these network processing platforms. Structural requirements refer to design aspects which ensure the feasibility, usability and future evolution of the design. By focusing on the Network Processing architecture, we review the available options in the state of the art, both in industry and academia. We evaluate the strengths and weaknesses of each architecture in terms of the coverage provided for the functional and structural requirements. In our analysis, we detect a gap in this area: there is currently no architecture fulfilling all the requirements of future automotive GW controllers. In light of the available network processing architectures and the current technology landscape, we identify Hardware (HW) accelerators and custom processor design as a key differentiation factor which boosts the devices performance. From our perspective, this points to a need - and a research opportunity - to explore network processing architectures with a strong HW focus, unleashing the potential of next-generation network processors and supporting the demanding requirements of future autonomous and connected vehicles.Peer ReviewedPostprint (published version

    High-performance software packet processing

    Full text link
    In today’s Internet, it is highly desirable to have fast and scalable software packet processing solutions for network applications that run on commodity hardware. The advent of cloud computing drives the continued rapid growth of Internet traffic. Moreover, the development of emerging networking techniques, such as Network Function Virtualization, significantly shapes the need for implementing the network functions in software. Finally, with the advancement of modern platforms as well as software frameworks for packet processing, network applications have potential to process 100+ Gbps network traffic on a single commodity server. Representative frameworks include the Click modular router, the RouteBricks scalable routing architecture, and BUFFALO, the software-based Ethernet switch. Beneath this general-purpose routing and switching functionality lie a broad set of network applications, many of which are handled with custom methods to provide cost-effectiveness and flexibility. This thesis considers two long-standing networking applications, IP lookup and distributed denial-of-service (DDoS) mitigation, and proposes efficient software-based methods drawing from this new perspective. In this thesis, we first introduce several optimization techniques to accelerate network applications by taking advantage of modern CPU features. Then, we explore the IP lookup problem to find the longest matching prefix of an IP address in a set of prefixes. An ideal IP lookup algorithm should achieve small constant IP lookup time, and on-chip memory usage. However, no prior IP lookup algorithm achieves both requirements at the same time. We propose SAIL, a splitting approach to IP lookup, and a suite of algorithms for IP lookup based on SAIL framework. We conducted extensive experiments to evaluate our algorithms, and experimental results show that our SAIL algorithms are much faster than well-known IP lookup algorithms. Next, we switch our focus to DDoS, an attempt to disrupt the legitimate traffic of a victim by sending a flood of Internet traffic from different sources. Our solution is Gatekeeper, the first open-source and deployable DDoS mitigation system. We present a series of optimization techniques, including use of modern platforms, group prefetching, coroutines, and hashing, to accelerate Gatekeeper. Experimental results show that these optimization techniques significantly improve its performance over alternative baseline solutions.2022-01-30T00:00:00

    Framework for Hardware Acceleration of 400Gb Networks

    Get PDF
    Platforma NetCOPE již prokázala svou životaschopnost jako framework pro rychlý vývoj hardwarově akcelerovaných síťových aplikací. Tyto aplikace využívají virtualizované síťové funkce (NFV). Aby platforma v nejbližších letech nezastarala, tak se musí přizpůsobit požadavkům souvisejících s příchodem 400 Gigabitového ethernetu. Příchod 400 Gigabitového ethernetu sebou přináší velké množství výzev, které vyžadují nutnost kompletně změnit dosavadní myšlení. Během jediného hodinového cyklu musí být zpracováno několik síťových paketů, což vyžaduje nový koncept zpracování. Je použita pokročilá správa paměti, aby byla dosažena konstantní paměťová složitost vzhledem k počtu DMA kanálů. Díky tomu je možné realizovat se současnou technologií i více než 256 kompletně nezávislých DMA kanálů. Mnoho úsilí bylo kladeno na vytvoření co nejobecnějšího frameworku i pro vyšší rychlosti přesahující 400 Gb/s. Práce se zaměřuje na komunikaci mezi frameworkem a hostitelským počítačem skrze PCI Express rozhraní. Byla vzata do úvahy i varianta s více síťovými rozhraními. Navržený systém je připraven na nasazení na kartách rodiny COMBO, které jsou použity jako referenční platforma.The NetCOPE framework has proven itself as a viable framework for rapid development of hardware accelerated wire-speed network applications using Network Functions Virtualization (NFV). To meet the current and future requirements of such applications the NetCOPE platform has to catch up with upcoming 400 Gigabit Ethernet. Otherwise, it may become deprecated in following years. Catching up with 400 Gigabit Ethernet brings many challenges bringing necessity of completely different way of thinking. Multiple network packets have to be processed each clock cycle requiring a new concept of processing. Advanced memory management is used to ensure constant memory complexity with respect to the number of DMA channels without any impact on performance. Thanks to that, even more than 256 completely independent DMA channels are feasible with current technology. A lot of effort was made to create the framework as generic as possible allowing deployment of 400 Gigabit Ethernet and beyond. Emphasis is put on communication between the framework and host computer via PCI Express technology. Multiple Ethernet ports are also considered. The proposed system is prepared to be deployed on the family of COMBO cards, used as a reference platform.

    A Quantitative Analysis and Guideline of Data Streaming Accelerator in Intel 4th Gen Xeon Scalable Processors

    Full text link
    As semiconductor power density is no longer constant with the technology process scaling down, modern CPUs are integrating capable data accelerators on chip, aiming to improve performance and efficiency for a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA) introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). DSA targets data movement operations in memory that are common sources of overhead in datacenter workloads and infrastructure. In addition, it becomes much more versatile by supporting a wider range of operations on streaming data, such as CRC32 calculations, delta record creation/merging, and data integrity field (DIF) operations. This paper sets out to introduce the latest features supported by DSA, deep-dive into its versatility, and analyze its throughput benefits through a comprehensive evaluation. Along with the analysis of its characteristics, and the rich software ecosystem of DSA, we summarize several insights and guidelines for the programmer to make the most out of DSA, and use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application

    Consensus protocols exploiting network programmability

    Get PDF
    Services rely on replication mechanisms to be available at all time. The service demanding high availability is replicated on a set of machines called replicas. To maintain the consistency of replicas, a consensus protocol such as Paxos or Raft is used to synchronize the replicas' state. As a result, failures of a minority of replicas will not affect the service as other non-faulty replicas continue serving requests. A consensus protocol is a procedure to achieve an agreement among processors in a distributed system involving unreliable processors. Unfortunately, achieving such an agreement involves extra processing on every request, imposing a substantial performance degradation. Consequently, performance has long been a concern for consensus protocols. Although many efforts have been made to improve consensus performance, it continues to be an important problem for researchers. This dissertation presents a novel approach to improving consensus performance. Essentially, it exploits the programmability of a new breed of network devices to accelerate consensus protocols that traditionally run on commodity servers. The benefits of using programmable network devices to run consensus protocols are twofold: The network switches process packets faster than commodity servers and consensus messages travel fewer hops in the network. It means that the system throughput is increased and the latency of requests is reduced. The evaluation of our network-accelerated consensus approach shows promising results. Individual components of our FPGA- based and switch-based consensus implementations can process 10 million and 2.5 billion consensus messages per second, respectively. Our FPGA-based system as a whole delivers 4.3 times performance of a traditional software consensus implementation. The latency is also better for our system and is only one third of the latency of the software consensus implementation when both systems are under half of their maximum throughputs. In order to drive even higher performance, we apply a partition mechanism to our switch-based system, leading to 11 times better throughput and 5 times better latency. By dynamically switching between software-based and network-based implementations, our consensus systems not only improve performance but also use energy more efficiently. Encouraged by those benefits, we developed a fault-tolerant non-volatile memory system. A prototype using software memory controller demonstrated reasonable overhead over local memory access, showing great promise as scalable main memory. Our network-based consensus approach would have a great impact in data centers. It not only improves performance of replication mechanisms which relied on consensus, but also enhances performance of services built on top of those replication mechanisms. Our approach also motivates others to move new functionalities into the network, such as, key-value store and stream processing. We expect that in the near future, applications that typically run on traditional servers will be folded into networks for performance
    corecore