101 research outputs found

    VNF-AAPC : accelerator-aware VNF placement and chaining

    Get PDF
    In recent years, telecom operators have been migrating towards network architectures based on Network Function Virtualization in order to reduce their high Capital Expenditure (CAPEX) and Operational Expenditure (OPEX). However, virtualization of some network functions is accompanied by a significant degradation of Virtual Network Function (VNF) performance in terms of their throughput or energy consumption. To address these challenges, use of hardware-accelerators, e.g. FPGAs, GPUs, to offload CPU-intensive operations from performance-critical VNFs has been proposed. Allocation of NFV infrastructure (NFVi) resources for VNF placement and chaining (VNF-PC) has been a major area of research recently. A variety of resources allocation models have been proposed to achieve various operator's objectives i.e. minimizing CAPEX, OPEX, latency, etc. However, the VNF-PC resource allocation problem for the case when NFVi incorporates hardware-accelerators remains unaddressed. Ignoring hardware-accelerators in NFVi while performing resource allocation for VNF-chains can nullify the advantages resulting from the use of hardware-accelerators. Therefore, accurate models and techniques for the accelerator-aware VNF-PC (VNF-AAPC) are needed in order to achieve the overall efficient utilization of all NFVi resources including hardware-accelerators. This paper investigates the problem of VNF-AAPC, i.e., how to allocate usual NFVi resources along-with hardware-accelerators to VNF-chains in a cost-efficient manner. Particularly, we propose two methods to tackle the VNF-AAPC problem. The first approach is based on Integer Linear Programming (ILP) which jointly optimizes VNF placement, chaining and accelerator allocation while concurring to all NFVi constraints. The second approach is a heuristic-based method that addresses the scalability issue of the ILP approach. The heuristic addresses the VNF-AAPC problem by following a two-step algorithm. The experimental evaluations indicate that incorporating accelerator-awareness in VNF-PC strategies can help operators to achieve additional cost-savings from the efficient allocation of hardware-accelerator resources

    An Energy-Efficient Reconfigurable DTLS Cryptographic Engine for Securing Internet-of-Things Applications

    Full text link
    This paper presents the first hardware implementation of the Datagram Transport Layer Security (DTLS) protocol to enable end-to-end security for the Internet of Things (IoT). A key component of this design is a reconfigurable prime field elliptic curve cryptography (ECC) accelerator, which is 238x and 9x more energy-efficient compared to software and state-of-the-art hardware respectively. Our full hardware implementation of the DTLS 1.3 protocol provides 438x improvement in energy-efficiency over software, along with code size and data memory usage as low as 8 KB and 3 KB respectively. The cryptographic accelerators are coupled with an on-chip low-power RISC-V processor to benchmark applications beyond DTLS with up to two orders of magnitude energy savings. The test chip, fabricated in 65 nm CMOS, demonstrates hardware-accelerated DTLS sessions while consuming 44.08 uJ per handshake, and 0.89 nJ per byte of encrypted data at 16 MHz and 0.8 V.Comment: Published in IEEE Journal of Solid-State Circuits (JSSC

    NASA SpaceCube Intelligent Multi-Purpose System for Enabling Remote Sensing, Communication, and Navigation in Mission Architectures

    Get PDF
    New, innovative CubeSat mission concepts demand modern capabilities such as artificial intelligence and autonomy, constellation coordination, fault mitigation, and robotic servicing – all of which require vastly more processing resources than legacy systems are capable of providing. Enabling these domains within a scalable, configurable processing architecture is advantageous because it also allows for the flexibility to address varying mission roles, such as a command and data-handling system, a high-performance application processor extension, a guidance and navigation solution, or an instrument/sensor interface. This paper describes the NASA SpaceCube Intelligent Multi-Purpose System (IMPS), which allows mission developers to mix-and-match 1U (10 cm × 10 cm) CubeSat payloads configured for mission-specific needs. The central enabling component of the system architecture to address these concerns is the SpaceCube v3.0 Mini Processor. This single-board computer features the 20nm Xilinx Kintex UltraScale FPGA combined with a radiation-hardened FPGA monitor, and extensive IO to integrate and interconnect varying cards within the system. To unify the re-usable designs within this architecture, the CubeSat Card Standard was developed to guide design of 1U cards. This standard defines pinout configurations, mechanical, and electrical specifications for 1U CubeSat cards, allowing the backplane and mechanical enclosure to be easily extended. NASA has developed several cards adhering to the standard (System-on-Chip, power card, etc.), which allows the flexibility to configure a payload from a common catalog of cards

    Studies on high-speed hardware implementation of cryptographic algorithms

    Get PDF
    Cryptographic algorithms are ubiquitous in modern communication systems where they have a central role in ensuring information security. This thesis studies efficient implementation of certain widely-used cryptographic algorithms. Cryptographic algorithms are computationally demanding and software-based implementations are often too slow or power consuming which yields a need for hardware implementation. Field Programmable Gate Arrays (FPGAs) are programmable logic devices which have proven to be highly feasible implementation platforms for cryptographic algorithms because they provide both speed and programmability. Hence, the use of FPGAs for cryptography has been intensively studied in the research community and FPGAs are also the primary implementation platforms in this thesis. This thesis presents techniques allowing faster implementations than existing ones. Such techniques are necessary in order to use high-security cryptographic algorithms in applications requiring high data rates, for example, in heavily loaded network servers. The focus is on Advanced Encryption Standard (AES), the most commonly used secret-key cryptographic algorithm, and Elliptic Curve Cryptography (ECC), public-key cryptographic algorithms which have gained popularity in the recent years and are replacing traditional public-key cryptosystems, such as RSA. Because these algorithms are well-defined and widely-used, the results of this thesis can be directly applied in practice. The contributions of this thesis include improvements to both algorithms and techniques for implementing them. Algorithms are modified in order to make them more suitable for hardware implementation, especially, focusing on increasing parallelism. Several FPGA implementations exploiting these modifications are presented in the thesis including some of the fastest implementations available in the literature. The most important contributions of this thesis relate to ECC and, specifically, to a family of elliptic curves providing faster computations called Koblitz curves. The results of this thesis can, in their part, enable increasing use of cryptographic algorithms in various practical applications where high computation speed is an issue

    Techniques for Processing TCP/IP Flow Content in Network Switches at Gigabit Line Rates

    Get PDF
    The growth of the Internet has enabled it to become a critical component used by businesses, governments and individuals. While most of the traffic on the Internet is legitimate, a proportion of the traffic includes worms, computer viruses, network intrusions, computer espionage, security breaches and illegal behavior. This rogue traffic causes computer and network outages, reduces network throughput, and costs governments and companies billions of dollars each year. This dissertation investigates the problems associated with TCP stream processing in high-speed networks. It describes an architecture that simplifies the processing of TCP data streams in these environments and presents a hardware circuit capable of TCP stream processing on multi-gigabit networks for millions of simultaneous network connections. Live Internet traffic is analyzed using this new TCP processing circuit

    Doctor of Philosophy

    Get PDF
    dissertationAs the base of the software stack, system-level software is expected to provide ecient and scalable storage, communication, security and resource management functionalities. However, there are many computationally expensive functionalities at the system level, such as encryption, packet inspection, and error correction. All of these require substantial computing power. What's more, today's application workloads have entered gigabyte and terabyte scales, which demand even more computing power. To solve the rapidly increased computing power demand at the system level, this dissertation proposes using parallel graphics pro- cessing units (GPUs) in system software. GPUs excel at parallel computing, and also have a much faster development trend in parallel performance than central processing units (CPUs). However, system-level software has been originally designed to be latency-oriented. GPUs are designed for long-running computation and large-scale data processing, which are throughput-oriented. Such mismatch makes it dicult to t the system-level software with the GPUs. This dissertation presents generic principles of system-level GPU computing developed during the process of creating our two general frameworks for integrating GPU computing in storage and network packet processing. The principles are generic design techniques and abstractions to deal with common system-level GPU computing challenges. Those principles have been evaluated in concrete cases including storage and network packet processing applications that have been augmented with GPU computing. The signicant performance improvement found in the evaluation shows the eectiveness and eciency of the proposed techniques and abstractions. This dissertation also presents a literature survey of the relatively young system-level GPU computing area, to introduce the state of the art in both applications and techniques, and also their future potentials

    The P4->NetFPGA Workflow for Line-Rate Packet Processing

    Get PDF
    P4 has emerged as the de facto standard language for describing how network packets should be processed, and is becoming widely used by network owners, systems developers, researchers and in the classroom. The goal of the work presented here is to make it easier for engineers, researchers and students to learn how to program using P4, and to build prototypes running on real hardware. Our target is the NetFPGA SUME platform, a 4x10 Gb/s PCIe card designed for use in universities for teaching and research. Until now, NetFPGA users have needed to learn an HDL such as Verilog or VHDL, making it off limits to many software developers and students. Therefore, we developed the P4->NetFPGA workflow, allowing developers to describe how packets are to be processed in the high-level P4 language, then compile their P4 programs to run at line rate on the NetFPGA SUME board. The P4->NetFPGA workflow is built upon the Xilinx P4-SDNet compiler and the NetFPGA SUME open source code base. In this tutorial paper, we provide an overview of the P4 programming language and describe the P4->NetFPGA workflow. We also describe how the workflow is being used by the P4 community to build research prototypes, and to teach how network systems are built by providing students with hands-on experience working with real hardware.Leverhulme Trust Isaac Newton Trust TBD other

    High-speed Side-channel-protected Encryption and Authentication in Hardware

    Get PDF
    This paper describes two FPGA implementations for the encryption and authentication of data, based on the AES algorithm running in Galois/Counter mode (AES-GCM). Both architectures are protected against side-channel analysis attacks through the use of a threshold implementation (TI). The first architecture is fully unrolled and optimized for throughput. The second architecture uses a round-based structure, fits on a relatively small FPGA board, and is evaluated for side-channel attack resistance. We perform a Test Vector Leakage Assessment (TVLA), which shows no first-order leakage in the power consumption of the FPGA. To the best of our knowledge, our work is (1) the first to describe a throughput-optimized FPGA architecture of AES-GCM, protected against first-order side-channel information leakage, and (2) the first to evaluate the side-channel attack resistance of a TI-protected AES-GCM implementation
    corecore