
    Accelerating Network Functions using Reconfigurable Hardware. Design and Validation of High Throughput and Low Latency Network Functions at the Access Edge

    Providing Internet access to billions of people worldwide is one of the main technical challenges of the current decade. The Internet access edge connects each residential and mobile subscriber to the Internet and ensures a certain Quality of Service (QoS). However, implementing access edge functionality challenges Internet service providers in two ways: first, good QoS must be provided to subscribers, for example high throughput and low latency; second, the quick rollout of new technologies and functionality demands flexible configuration and programming of the network components, for example support for novel, use-case-specific network protocols. The functionality scope of an Internet access edge requires programmable concepts such as Network Functions Virtualization (NFV). The drawback of NFV-based network functions is significantly lower resource efficiency due to execution in software, commonly resulting in lower QoS than rigid hardware solutions. The use of programmable hardware accelerators, known as NFV offloading, helps to improve the QoS and flexibility of network function implementations. In this thesis, we design network functions on programmable hardware to improve both QoS and flexibility. First, we introduce the host bypassing concept for better integration of hardware accelerators into computer systems, for example in 5G radio access networks. This novel concept bypasses the system's main memory and enables direct connectivity between the accelerator and the network interface card. Our evaluations show improved throughput and significantly reduced latency jitter for this approach. Second, we analyze different programmable hardware technologies for hardware-accelerated Internet subscriber handling, including three P4-programmable platforms and FPGAs. Our results demonstrate that all approaches perform excellently and are suitable for Internet access creation. As part of this contribution, we present a fully fledged User Plane Function (UPF) built on these concepts and test it in an end-to-end 5G standalone network. Third, we analyze and demonstrate the usability of Active Queue Management (AQM) algorithms on programmable hardware as an extension of the access edge. We show the feasibility of the CoDel AQM algorithm and discuss the challenges and constraints that arise when limited hardware is used. The results show significant QoS improvements when the AQM algorithm is deployed in hardware. Last, we focus on network function benchmarking, which is crucial for understanding and optimizing implementations such as Internet access creation. For this, we introduce the load generation and measurement framework P4STA, which combines flexible software-based load generation with hardware-assisted measurement. Using programmable network switches, we achieve nanosecond timing accuracy while generating test loads up to the available Ethernet link speed.
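    The third contribution centers on the CoDel algorithm, so a small illustration may help. The Python sketch below shows CoDel's per-packet dropping decision with the constants and square-root control law from RFC 8289; it is a simplified, generic rendering, not the thesis's P4/FPGA design, and it omits refinements such as count decay between dropping episodes. On limited hardware, the division and square root would typically have to be replaced by lookup tables or approximations, which is the kind of constraint the abstract alludes to.

        # Minimal sketch of the CoDel per-packet dropping decision, following the
        # control law in RFC 8289. Generic illustration only; not the thesis's
        # hardware implementation, and simplified (no count decay across episodes).
        import math

        TARGET = 0.005    # 5 ms acceptable standing queue delay
        INTERVAL = 0.100  # 100 ms window the delay must exceed TARGET for

        class CoDel:
            def __init__(self):
                self.first_above_time = 0.0  # when the delay first exceeded TARGET
                self.dropping = False        # currently in the dropping state?
                self.drop_next = 0.0         # time of the next scheduled drop
                self.count = 0               # drops in the current dropping state

            def should_drop(self, sojourn_time, now):
                """Return True if the packet dequeued at time `now` should be dropped."""
                if sojourn_time < TARGET:
                    # Queue delay is acceptable again: leave the dropping state.
                    self.first_above_time = 0.0
                    self.dropping = False
                    return False
                if self.first_above_time == 0.0:
                    # Delay just crossed TARGET: tolerate it for one INTERVAL.
                    self.first_above_time = now + INTERVAL
                    return False
                if not self.dropping and now >= self.first_above_time:
                    # Delay stayed above TARGET for a full INTERVAL: start dropping.
                    self.dropping = True
                    self.count = 1
                    self.drop_next = now + INTERVAL / math.sqrt(self.count)
                    return True
                if self.dropping and now >= self.drop_next:
                    # Square-root control law: drop more often while delay persists.
                    self.count += 1
                    self.drop_next = now + INTERVAL / math.sqrt(self.count)
                    return True
                return False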

    neuroAIx-Framework: design of future neuroscience simulation systems exhibiting execution of the cortical microcircuit model 20× faster than biological real-time

    Introduction: Research in the field of computational neuroscience relies on highly capable simulation platforms. With real-time capabilities surpassed for established models like the cortical microcircuit, it is time to conceive next-generation systems: neuroscience simulators providing significant acceleration, even for larger networks with natural density, biologically plausible multi-compartment models, and the modeling of long-term and structural plasticity. Methods: Stressing the need for agility to adapt to new concepts or findings in the domain of neuroscience, we have developed the neuroAIx-Framework consisting of an empirical modeling tool, a virtual prototype, and a cluster of FPGA boards. This framework is designed to support and accelerate the continuous development of such platforms driven by new insights in neuroscience. Results: Based on design space explorations using this framework, we devised and realized an FPGA cluster consisting of 35 NetFPGA SUME boards. Discussion: This system functions as an evaluation platform for our framework. At the same time, it resulted in a fully deterministic neuroscience simulation system surpassing the state of the art in both performance and energy efficiency. It is capable of simulating the microcircuit with 20× acceleration compared to biological real-time and achieves an energy efficiency of 48 nJ per synaptic event.
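    To put the reported figures in perspective, the short calculation below spells out what 20× acceleration and 48 nJ per synaptic event imply. The synapse count and mean firing rate used here are illustrative assumptions typical of the cortical microcircuit model, not numbers taken from this abstract.

        # Back-of-envelope reading of the reported figures (20x acceleration,
        # 48 nJ per synaptic event). The synapse count and firing rate below are
        # illustrative assumptions, not values stated in the abstract.
        accel = 20.0                 # reported acceleration vs. biological real-time
        energy_per_event = 48e-9     # reported energy per synaptic event (J)

        synapses = 0.3e9             # assumed synapse count of the microcircuit model
        mean_rate_hz = 3.0           # assumed mean firing rate per neuron (Hz)

        wallclock_per_bio_s = 1.0 / accel           # seconds of wall-clock per biological second
        events_per_bio_s = synapses * mean_rate_hz  # synaptic events per biological second
        energy_per_bio_s = events_per_bio_s * energy_per_event
        avg_power = energy_per_bio_s / wallclock_per_bio_s

        print(f"{wallclock_per_bio_s * 1e3:.0f} ms wall-clock per biological second")
        print(f"{energy_per_bio_s:.1f} J per biological second, "
              f"~{avg_power:.0f} W average power under these assumptions")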

    Improving Data Management and Data Movement Efficiency in Hybrid Storage Systems

    University of Minnesota Ph.D. dissertation. July 2017. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); ix, 116 pages. In the big data era, the large volumes of data being continuously generated drive the emergence of high-performance, large-capacity storage systems. To reduce the total cost of ownership, storage systems are built in a more composite way from many different types of emerging storage technologies and devices, including Storage Class Memory (SCM), Solid State Drives (SSD), Shingled Magnetic Recording (SMR), Hard Disk Drives (HDD), and even off-premise cloud storage. To make better use of each type of storage, industry provides multi-tier storage that dynamically places hot data in the faster tiers and cold data in the slower tiers. Data movement happens between devices within a single system as well as between devices connected via various networks. Toward improving data management and data movement efficiency in such hybrid storage systems, this work makes the following contributions: To bridge the large semantic gap between applications and modern storage systems, passing a small piece of useful information (I/O access hints) from the upper layers to the block storage layer can greatly improve application performance and ease data management in heterogeneous storage systems. We present and develop a generic and flexible framework, called HintStor, to execute and evaluate various I/O access hints on heterogeneous storage systems with minor modifications to the kernel and applications. The design of HintStor contains a new application/user-level interface, a file system plugin, and a block storage data manager. With HintStor, storage systems composed of various storage devices can perform pre-devised data placement, space reallocation, and data migration policies assisted by the added access hints. Each storage device/technology has its own price-performance trade-offs and idiosyncrasies with respect to the workload characteristics it best supports. To explore the internal access patterns and thus efficiently place data on storage systems with fully connected differential pools (data can move from one device to any other device instead of moving tier by tier; each pool consists of storage devices of a particular type), we propose a chunk-level, storage-aware workload analyzer framework called ChewAnalyzer. With ChewAnalyzer, the storage manager can adequately distribute and move data chunks across the different storage pools. To reduce the duplicate content transferred between local storage devices and devices in remote data centers, an inline Network Redundancy Elimination (NRE) process with a Content-Defined Chunking (CDC) policy can achieve a higher Redundancy Elimination (RE) ratio but may incur considerably higher computational cost than fixed-size chunking. We build an inline NRE appliance that incorporates an improved FPGA-based scheme to speed up CDC processing. To efficiently utilize hardware resources, the whole NRE process is handled by a Virtualized NRE (VNRE) controller. The uniqueness of the VNRE we developed lies in its ability to exploit the redundancy patterns of different TCP flows and customize the chunking process to achieve a higher RE ratio.
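    Because the last contribution hinges on the cost difference between content-defined and fixed-size chunking, a minimal sketch of CDC may be useful. The code below uses a simplified Gear-style rolling hash and generic chunk-size parameters; it is a generic illustration of the technique, not the FPGA scheme or VNRE controller described in the dissertation.

        # Minimal sketch of Content-Defined Chunking (CDC) with a simplified
        # Gear-style rolling hash: one hash update and boundary test per byte,
        # which is why CDC costs more compute than fixed-size chunking.
        # Generic illustration only; not the dissertation's FPGA scheme or VNRE.
        import hashlib

        MASK = (1 << 13) - 1   # expected average chunk size around 8 KiB
        MIN_CHUNK = 2 * 1024   # suppress very small chunks
        MAX_CHUNK = 64 * 1024  # force a boundary eventually

        def chunk_boundaries(data: bytes):
            """Yield (start, end) byte offsets of content-defined chunks."""
            h = 0
            start = 0
            for i, byte in enumerate(data):
                # Real implementations use a random byte-to-word lookup table
                # (Gear/Rabin fingerprints); the raw byte keeps the sketch short.
                h = ((h << 1) + byte) & 0xFFFFFFFF
                length = i - start + 1
                if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
                    yield start, i + 1
                    start, h = i + 1, 0
            if start < len(data):
                yield start, len(data)

        def chunk_digests(data: bytes):
            """Fingerprint each chunk; chunks whose digest the receiver already
            holds need not be transferred again."""
            return [hashlib.sha1(data[s:e]).digest() for s, e in chunk_boundaries(data)]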