395 research outputs found

    OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs

    Full text link
    Multi-tenancy is essential for unleashing SmartNIC's potential in datacenters. Our systematic analysis in this work shows that existing on-path SmartNICs have resource multiplexing limitations. For example, existing solutions lack multi-tenancy capabilities such as performance isolation and QoS provisioning for compute and IO resources. Compared to standard NIC data paths with a well-defined set of offloaded functions, unpredictable execution times of SmartNIC kernels make conventional approaches for multi-tenancy and QoS insufficient. We fill this gap with OSMOSIS, a SmartNICs resource manager co-design. OSMOSIS extends existing OS mechanisms to enable dynamic hardware resource multiplexing on top of the on-path packet processing data plane. We implement OSMOSIS within an open-source RISC-V-based 400Gbit/s SmartNIC. Our performance results demonstrate that OSMOSIS fully supports multi-tenancy and enables broader adoption of SmartNICs in datacenters with low overhead.Comment: 12 pages, 14 figures, 103 reference

    Reconfigurable network systems and software-defined networking

    Get PDF
    Modern high-speed networks have evolved from relatively static networks to highly adaptive networks facilitating dynamic reconfiguration. This evolution has influenced all levels of network design and management, introducing increased programmability and configuration flexibility. This influence has extended from the lowest level of physical hardware interfaces to the highest level of network management by software. A key representative of this evolution is the emergence of softwaredefined networking (SDN). In this paper, we review the current state of the art in reconfigurable network systems, covering hardware reconfiguration, SDN, and the interplay between them. We take a top-down approach, starting with a tutorial on software-defined networks. We then continue to discuss programming languages as the linking element between different levels of software and hardware in the network. We review electronic switching systems, highlighting programmability and reconfiguration aspects, and describe the trends in reconfigurable network elements. Finally, we describe the state of the art in the integration of photonic transceiver and switching elements with electronic technologies, and consider the implications for SDN and reconfigurable network systems.This work was jointly supported by the UKs Engineering and Physical Sciences Research Council (EPSRC) Internet Project EP/H040536/1, an EPSRC Research Fellowship grant to Philip Watts (EP/I004157/2), and DARPA and AFRL under contract FA8750-11-C-0249.This is the final version of the article. It first appeared from IEEE via http://dx.doi.org/10.1109/JPROC.2015.243573

    A novel dynamic software-defined networking approach to neutralize traffic burst

    Get PDF
    Software-defined networks (SDN) has a holistic view of the network. It is highly suitable for handling dynamic loads in the traditional network with a minimal update in the network infrastructure. However, the standard SDN architecture control plane has been designed for single or multiple distributed SDN controllers facing severe bottleneck issues. Our initial research created a reference model for the traditional network, using the standard SDN (referred to as SDN hereafter) in a network simulator called NetSim. Based on the network traffic, the reference models consisted of light, modest and heavy networks depending on the number of connected IoT devices. Furthermore, a priority scheduling and congestion control algorithm is proposed in the standard SDN, named extended SDN (eSDN), which minimises congestion and performs better than the standard SDN. However, the enhancement was suitable only for the small-scale network because, in a large-scale network, the eSDN does not support dynamic SDN controller mapping. Often, the same SDN controller gets overloaded, leading to a single point of failure. Our literature review shows that most proposed solutions are based on static SDN controller deployment without considering flow fluctuations and traffic bursts that lead to a lack of load balancing among the SDN controllers in real-time, eventually increasing the network latency. Therefore, to maintain the Quality of Service (QoS) in the network, it becomes imperative for the static SDN controller to neutralise the on-the-fly traffic burst. Thus, our novel dynamic controller mapping algorithm with multiple-controller placement in the SDN is critical to solving the identified issues. In dSDN, the SDN controllers are mapped dynamically with the load fluctuation. If any SDN controller reaches its maximum threshold, the rest of the traffic will be diverted to another controller, significantly reducing delay and enhancing the overall performance. Our technique considers the latency and load fluctuation in the network and manages the situations where static mapping is ineffective in dealing with the dynamic flow variation. © 2023 by the authors

    VEGa : a high performance vehicular Ethernet gateway on hybrid FPGA

    Get PDF
    Modern vehicles employ a large amount of distributed computation and require the underlying communication scheme to provide high bandwidth and low latency. Existing communication protocols like Controller Area Network (CAN) and FlexRay do not provide the required bandwidth, paving the way for adoption of Ethernet as the next generation network backbone for in-vehicle systems. Ethernet would co-exist with safety-critical communication on legacy networks, providing a scalable platform for evolving vehicular systems. This requires a high-performance network gateway that can simultaneously handle high bandwidth, low latency, and isolation; features that are not achievable with traditional processor based gateway implementations. We present VEGa, a configurable vehicular Ethernet gateway architecture utilising a hybrid FPGA to closely couple software control on a processor with dedicated switching circuit on the reconfigurable fabric. The fabric implements isolated interface ports and an accelerated routing mechanism, which can be controlled and monitored from software. Further, reconfigurability enables the switching behaviour to be altered at run-time under software control, while the configurable architecture allows easy adaptation to different vehicular architectures using high-level parameter settings. We demonstrate the architecture on the Xilinx Zynq platform and evaluate the bandwidth, latency, and isolation using extensive tests in hardware

    COREC: Concurrent Non-Blocking Single-Queue Receive Driver for Low Latency Networking

    Full text link
    Existing network stacks tackle performance and scalability aspects by relying on multiple receive queues. However, at software level, each queue is processed by a single thread, which prevents simultaneous work on the same queue and limits performance in terms of tail latency. To overcome this limitation, we introduce COREC, the first software implementation of a concurrent non-blocking single-queue receive driver. By sharing a single queue among multiple threads, workload distribution is improved, leading to a work-conserving policy for network stacks. On the technical side, instead of relying on traditional critical sections - which would sequentialize the operations by threads - COREC coordinates the threads that concurrently access the same receive queue in non-blocking manner via atomic machine instructions from the Read-Modify-Write (RMW) class. These instructions allow threads to access and update memory locations atomically, based on specific conditions, such as the matching of a target value selected by the thread. Also, they enable making any update globally visible in the memory hierarchy, bypassing interference on memory consistency caused by the CPU store buffers. Extensive evaluation results demonstrate that the possible additional reordering, which our approach may occasionally cause, is non-critical and has minimal impact on performance, even in the worst-case scenario of a single large TCP flow, with performance impairments accounting to at most 2-3 percent. Conversely, substantial latency gains are achieved when handling UDP traffic, real-world traffic mix, and multiple shorter TCP flows

    Accurate and Resource-Efficient Monitoring for Future Networks

    Get PDF
    Monitoring functionality is a key component of any network management system. It is essential for profiling network resource usage, detecting attacks, and capturing the performance of a multitude of services using the network. Traditional monitoring solutions operate on long timescales producing periodic reports, which are mostly used for manual and infrequent network management tasks. However, these practices have been recently questioned by the advent of Software Defined Networking (SDN). By empowering management applications with the right tools to perform automatic, frequent, and fine-grained network reconfigurations, SDN has made these applications more dependent than before on the accuracy and timeliness of monitoring reports. As a result, monitoring systems are required to collect considerable amounts of heterogeneous measurement data, process them in real-time, and expose the resulting knowledge in short timescales to network decision-making processes. Satisfying these requirements is extremely challenging given today’s larger network scales, massive and dynamic traffic volumes, and the stringent constraints on time availability and hardware resources. This PhD thesis tackles this important challenge by investigating how an accurate and resource-efficient monitoring function can be realised in the context of future, software-defined networks. Novel monitoring methodologies, designs, and frameworks are provided in this thesis, which scale with increasing network sizes and automatically adjust to changes in the operating conditions. These achieve the goal of efficient measurement collection and reporting, lightweight measurement- data processing, and timely monitoring knowledge delivery

    A Framework for eBPF-Based Network Functions in an Era of Microservices

    Get PDF
    By moving network functionality from dedicated hardware to software running on end-hosts, Network Functions Virtualization (NFV) pledges the benefits of cloud computing to packet processing. While most of the NFV frameworks today rely on kernel-bypass approaches, no attention has been given to kernel packet processing, which has always proved hard to evolve and to program. In this article, we present Polycube, a software framework whose main goal is to bring the power of NFV to in-kernel packet processing applications, enabling a level of flexibility and customization that was unthinkable before. Polycube enables the creation of arbitrary and complex network function chains, where each function can include an efficient in-kernel data plane and a flexible user-space control plane with strong characteristics of isolation, persistence, and composability. Polycube network functions, called Cubes, can be dynamically generated and injected into the kernel networking stack, without requiring custom kernels or specific kernel modules, simplifying the debugging and introspection, which are two fundamental properties in recent cloud environments. We validate the framework by showing significant improvements over existing applications, and we prove the generality of the Polycube programming model through the implementation of complex use cases such as a network provider for Kubernetes

    Reconfigurable-Hardware Accelerated Stream Aggregation

    Get PDF
    High throughput and low latency stream aggregation is essential for many applications that analyze massive volumes of data in real-time. Incoming data need to be stored in a single sliding-window before processing, in cases where incremental aggregations are wasteful or not possible at all. However, storing all incoming values in a single-window puts tremendous pressure on the memory bandwidth and capacity. GPU and CPU memory management is inefficient for this task as it introduces unnecessary data movement that wastes bandwidth. FPGAs can make more efficient use of their memory but existing approaches employ only on-chip memory and therefore, can only support small problem sizes (i.e. small sliding windows and number of keys) due to the limited capacity. This thesis addresses the above limitations of stream processing systems by proposing techniques for accelerating single sliding-window stream aggregation using FPGAs to achieve line-rate processing throughput and ultra low latency. It does so first by building accelerators using FPGAs and second, by alleviating the memory pressure posed by single-window stream aggregation. The initial part of this thesis presents the accelerators for both windowing policies, namely, tuple- and time-\ua0based, using Maxeler\u27s DataFlow Engines\ua0(DFEs) which have a direct feed of incoming data from the network as well as direct access to off-chip DRAM. Compared to state-of-the-art stream processing software system, the DFEs offer 1-2 orders of magnitude higher processing throughput and 4 orders of magnitude lower latency. The later part of this thesis focuses on alleviating the memory pressure due to the various steps in single-window stream aggregation. Updating the window with new incoming values and reading it to feed the aggregation functions are the two primary steps in stream aggregation. The high on-chip SRAM bandwidth enables line-rate processing, but only for small problem sizes due to the limited capacity. The larger off-chip DRAM size supports larger problems, but falls short on performance due to lower bandwidth. In order to bridge this gap, this thesis introduces a specialized memory hierarchy for stream aggregation. It employs Multi-Level Queues (MLQs) spanning across multiple memory levels with different characteristics to offer both high bandwidth and capacity. In doing so, larger stream aggregation problems can be supported at line-rate performance, outperforming existing competing solutions. Compared to designs with only on-chip memory, our approach supports 4 orders of magnitude larger problems. Compared to designs that use only DRAM, our design achieves up to 8x higher throughput. Finally, this thesis aims to alleviate the memory pressure due to the window-aggregation step. Although window-updates can be supported efficiently using MLQs, frequent window-aggregations remain a performance bottleneck. This thesis addresses this problem by introducing StreamZip, a dataflow stream aggregation engine that is able to compress the sliding-windows. StreamZip deals with a number of data and control dependency challenges to integrate a compressor in the stream aggregation pipeline and alleviate the memory pressure posed by frequent aggregations. In doing so, StreamZip offers higher throughput as well as larger effective window capacity to support larger problems. StreamZip supports diverse compression algorithms offering both lossless and lossy compression to fixed- as well as floating- point numbers. Compared to designs using MLQs, StreamZip lossless and lossy designs achieve up to 7.5x and 22x higher throughput, while improving the effective memory capacity by up to 5x and 23x, respectively
    • …
    corecore