6 research outputs found

    FPGA based technical solutions for high throughput data processing and encryption for 5G communication: A review

    Get PDF
    The field programmable gate array (FPGA) devices are ideal solutions for high-speed processing applications, given their flexibility, parallel processing capability, and power efficiency. In this review paper, at first, an overview of the key applications of FPGA-based platforms in 5G networks/systems is presented, exploiting the improved performances offered by such devices. FPGA-based implementations of cloud radio access network (C-RAN) accelerators, network function virtualization (NFV)-based network slicers, cognitive radio systems, and multiple input multiple output (MIMO) channel characterizers are the main considered applications that can benefit from the high processing rate, power efficiency and flexibility of FPGAs. Furthermore, the implementations of encryption/decryption algorithms by employing the Xilinx Zynq Ultrascale+MPSoC ZCU102 FPGA platform are discussed, and then we introduce our high-speed and lightweight implementation of the well-known AES-128 algorithm, developed on the same FPGA platform, and comparing it with similar solutions already published in the literature. The comparison results indicate that our AES-128 implementation enables efficient hardware usage for a given data-rate (up to 28.16 Gbit/s), resulting in higher efficiency (8.64 Mbps/slice) than other considered solutions. Finally, the applications of the ZCU102 platform for high-speed processing are explored, such as image and signal processing, visual recognition, and hardware resource management

    Low-Cost Concurrent Error Detection for GCM and CCM

    Get PDF
    In many applications, encryption alone does not provide enough security. To enhance security, dedicated authenticated encryption (AE) mode are invented. Galios Counter Mode (GCM) and Counter with CBC-MAC mode (CCM) are the AE modes recommended by the National Institute of Standards and Technology. To support high data rates, AE modes are usually implemented in hardware. However, natural faults reduce its reliability and may undermine both its encryption and authentication capability. We present a low-cost concurrent error detection (CED) scheme for 7 AE architectures. The proposed technique explores idle cycles of the AE mode architectures. Experimental results shows that the performance overhead can be lower than 100% for all architectures depending on the workload. FPGA implementation results show that the hardware overhead in the 0.1-23.3% range and the power overhead is in the 0.2-23.2% range. ASIC implementation results show that the hardware overhead in the 0.1-22.8% range and the power overhead is in the 0.3-12.6% range. The underlying block cipher and hash module need not have CED built in. Thus, it allows system designers to integrate block cipher and hash function intellectual property from different vendors

    Chiffrement authentifié sur FPGAs de la partie reconfigurable à la partie static

    Get PDF
    Communication systems need to access, store, manipulate, or communicate sensitive information. Therefore, cryptographic primitives such as hash functions and block ciphers are deployed to provide encryption and authentication. Recently, techniques have been invented to combine encryption and authentication into a single algorithm which is called Authenticated Encryption (AE). Combining these two security services in hardware produces better performance compared to two separated algorithms since authentication and encryption can share a part of the computation. Because of combining the programmability with the performance ofcustom hardware, FPGAs become more common as an implementation target for such algorithms. The first part of this thesis is devoted to efficient and high-speed FPGA-based architectures of AE algorithms, AES-GCM and AEGIS-128, in order to be used in the reconfigurable part of FPGAs to support security services of communication systems. Our focus on the state of the art leads to the introduction of high-speed architectures for slow changing keys applications like Virtual Private Networks (VPNs). Furthermore, we present an efficient method for implementing the GF(2¹²⁸) multiplier, which is responsible for the authentication task in AES-GCM, to support high-speed applications. Additionally, an efficient AEGIS-128is also implemented using only five AES rounds. Our hardware implementations were evaluated using Virtex-5 and Virtex-4 FPGAs. The performance of the presented architectures (Thr./Slices) outperforms the previously reported ones.The second part of the thesis presents techniques for low cost solutions in order to secure the reconfiguration of FPGAs. We present different ranges of low cost implementations of AES-GCM, AES-CCM, and AEGIS-128, which are used in the static part of the FPGA in order to decrypt and authenticate the FPGA bitstream. Presented ASIC architectures were evaluated using 90 and 65 nm technologies and they present better performance compared to the previous work.Les systèmes de communication ont besoin d'accéder, stocker, manipuler, ou de communiquer des informations sensibles. Par conséquent, les primitives cryptographiques tels que les fonctions de hachage et le chiffrement par blocs sont déployés pour fournir le cryptage et l'authentification. Récemment, des techniques ont été inventés pour combiner cryptage et d'authentification en un seul algorithme qui est appelé authentifiés Encryption (AE). La combinaison de ces deux services de sécurité dans le matériel de meilleures performances par rapport aux deux algorithmes séparés puisque l'authentification et le cryptage peuvent partager une partie du calcul. En raison de la combinaison de la programmation de l'exécution de matériel personnalisé, FPGA deviennent plus communs comme cible d'une mise en œuvre de ces algorithmes. La première partie de cette thèse est consacrée aux architectures d'algorithmes AE, AES-GCM et AEGIS-128 à base de FPGA efficaces et à grande vitesse, afin d'être utilisé dans la partie reconfigurable FPGA pour soutenir les services de sécurité des systèmes de communication. Notre focalisation sur l'état de l'art conduit à la mise en place d'architectures à haute vitesse pour les applications lentes touches changeantes comme les réseaux privés virtuels (VPN). En outre, nous présentons un procédé efficace pour mettre en œuvre le GF(2¹²⁸) multiplicateur, qui est responsable de la tâche d'authentification en AES-GCM, pour supporter les applications à grande vitesse. En outre, un système efficace AEGIS-128 est également mis en œuvre en utilisant seulement cinq tours AES. Nos réalisations matérielles ont été évaluées à l'aide Virtex-5 et Virtex-4 FPGA. La performance des architectures présentées (Thr. / Parts) surpasse ceux signalés précédemment. La deuxième partie de la thèse présente des techniques pour des solutions à faible coût afin de garantir la reconfiguration du FPGA. Nous présentons différentes gammes de mises en œuvre à faible coût de AES-GCM, AES-CCM, et AEGIS-128, qui sont utilisés dans la partie statique du FPGA afin de décrypter et authentifier le bitstream FPGA. Architectures ASIC présentées ont été évaluées à l'aide de 90 et 65 technologies nm et présentent de meilleures performances par rapport aux travaux antérieurs

    New primitives of controlled elements F2/4 for block ciphers

    Get PDF
    This paper develops the cipher design approach based on the use of data-dependent operations (DDOs). A new class of DDO based on the advanced controlled elements (CEs) is introduced, which is proven well suited to hardware implementations for FPGA devices. To increase the hardware implementation efficiency of block ciphers, while using contemporary FPGA devices there is proposed an approach to synthesis of fast block ciphers, which uses the substitution-permutation network constructed on the basis of the controlled elements F2/4 implementing the 2 x 2 substitutions under control of the four-bit vector. There are proposed criteria for selecting elements F2/4 and results on investigating their main cryptographic properties. It is designed a new fast 128-bit block cipher MM-128 that uses the elements F2/4 as elementary building block. The cipher possesses higher performance and requires less hardware resources for its implementation on the bases of FPGA devices than the known block ciphers. There are presented result on differential analysis of the cipher MM-12

    A >100 Gbps Inline AES-GCM Hardware Engine and Protected DMA Transfers between SGX Enclave and FPGA Accelerator Device

    Get PDF
    This paper proposes a method to protect DMA data transfer that can be used to offload computation to an accelerator. The proposal minimizes changes in the hardware platform and to the application and SW stack. The paper de-scribes the end-to-end scheme to protect communication between an appli-cation running inside a SGX enclave and a FPGA accelerator optimized for bandwidth and latency and details the implementation of AES-GCM hard-ware engines with high bandwidth and low latency

    Towards Intelligent Data Acquisition Systems with Embedded Deep Learning on MPSoC

    Get PDF
    Large-scale scientific experiments rely on dedicated high-performance data-acquisition systems to sample, readout, analyse, and store experimental data. However, with the rapid development in detector technology in various fields, the number of channels and the data rate are increasing. For trigger and control tasks data acquisition systems needs to satisfy real-time constraints, enable short-time latency and provide the possibility to integrate intelligent data processing. During recent years machine learning approaches have been used successfully in many applications. This dissertation will study how machine learning techniques can be integrated already in the data acquisition of large-scale experiments. A universal data acquisition platform for multiple data channels has been developed. Different machine learning implementation methods and application have been realized using this system. On the hardware side, recent FPGAs do not only provide high-performance parallel logic but more and more additional features, like ultra-fast transceivers and embedded ARM processors. TSMC\u27s 16nm FinFET Plus (16FF+) 3D transistor technology enables Xilinx in the Zynq UltraScale+ FPGA devices to increase the performance/watt ratio by 2 to 5 times compared to their previous generation. The selected main processor ZU11EG owns 32 GTH transceivers where each one could operate up to 16.316.3 Gb/s and 16 GTY transceivers where each of them could operate up to 32.7532.75 Gb/s. These transceivers are routed to x16 lanes Gen 33/44 PCIe, 1212 lanes full-duplex FireFly electrical/optical data link and VITA 57.4 FMC+ connector. The new Zynq UltraScale+ device provides at least three major advantages for advanced data acquisition systems: First, the 16nm FinFET+ programmable logic (PL) provides high-speed readout capabilities by high-speed transceivers; second, built-in quad-core 64-bit ARM Cortex-A53 processor enable host embedded Linux system. Thus, webservers, slow control and monitoring application could be realized in a embedded processor environment; third, the Zynq Multiprocessor System-on-Chip technology connects programmable logic and microprocessors. In this thesis, the benefits of such architectures for the integration of machine learning algorithms in data acquisition systems and control application are demonstrated. On the algorithm side, there have been many achievements in the field of machine learning over the last decades. Existing machine learning algorithms split into several categories depending on how the learning phase is organized: Supervised Learning, Unsupervised Learning, Semi-Supervised Learning and Reinforcement Learning. Most commonly used in scientific applications are supervised learning and reinforcement learning. Supervised learning learns from the labelled input and output, and generates a function that could predict the future different input to the appropriate output. A common application instance is a classification. They have a wide difference in basic math theory, training, inference, and their implementation. One of the natural solutions is Application Specific Integrated Circuit (ASIC) Artificial Intelligence (AI) chips. A typical example is the Google Tensor Processing Unit (TPU), it could cover the training and inference for both supervised learning and reinforcement learning. One of the major issues is that such chip could not provide high data transferring bandwidth other than high compute power. As a comparison, the Xilinx UltraScale+ FPGA could also provide raw compute power and efficiency for all different data types down to a single bit. From a deployment point of view, the training part of supervised learning is typically performed by CPU/GPU/TPU on a fixed dataset. For reinforcement learning, the training phase is more complex. The algorithm needs to periodically interact with the controlled system and execute a Markov Decision Process (MDP). There is no static training dataset, but it is obtained in real-time. The time slot between each step depends on the dynamics of the controlled system. The inference is also bound to this sampling time because the algorithm needs to interact with the environment and decide the appropriate action for a response, then a higher demand on time is proposed. This thesis gives solutions for both training and inference of reinforcement learning. At first, the requirements are analyzed, then the algorithm is deduced from scratch, and training on the PS part of Zynq device is implemented, meanwhile the inference at FPGA side is proposed which is similar solution compared with supervised learning. The results for Policy Gradient show a lot of improvement over a CPU/GPU-based machine learning framework. The Deep Deterministic Policy Gradient also has improvement regarding both training latency and stability. This implementation method provides a low-latency approach for reinforcement learning on-field training process
    corecore