112 research outputs found

    Improving Hardware Implementation of Cryptographic AES Algorithm and the Block Cipher Modes of Operation

    Get PDF
    With ever increasing Internet traffic, more business and financial transactions are being conducted online. This is even more so during these days of COVID-19 pandemic when traditional businesses such as traditional face to face educational systems have gone online requiring huge amount of data being exchanged over Internet. Increase in the volume of data sent over the Internet has also increased the security vulnerabilities such as challenging the confidentiality of data being sent over the Internet. Due to sheer volume, all data will need to be effectively encrypted. Due to increase in the volume of data, it is also important to have encryption/decryption functions to work at a higher speed to maintain the confidentiality of sensitive data. In this thesis, our goal is to enhance the hardware speed of encryption process of the standard AES scheme and its four variants such as AES-128, AES-192, AES-256 and new AES-512 and implement such functions on an FPGA. We also consider the FPGA implementation of different modes of AES operation. By employing parallelism and pipelining approach, we attempt to speed up various computational components of AES implementations using the Quartus II onto Intel’s FPGA. This approach shows improvement in the response speed, data throughput and latency

    CUDA capable GPU as an efficient co-processor

    Get PDF

    High throughput FPGA Implementation of Advanced Encryption Standard Algorithm

    Get PDF
     The growth of computer systems and electronic communications and transactions has meant that the need for effective security and reliability of data communication, processing and storage is more important than ever. In this context, cryptography is a high priority research area in engineering. The Advanced Encryption Standard (AES) is a symmetric-key criptographic algorithm for protecting sensitive information and is one of the most widely secure and used algorithm today. High-throughput, low power and compactness have always been topic of interest for implementing this type of algorithm. In this paper, we are interested on the development of high throughput architecture and implementation of AES algorithm, using the least amount of hardware possible. We have adopted a pipeline approach in order to reduce the critical path and achieve competitive performances in terms of throughput and efficiency. This approach is effectively tested on the AES S-Box substitution. The latter is a complex transformation and the key point to improve architecture performances. Considering the high delay and hardware required for this transformation, we proposed 7-stage pipelined S-box by using composite field in order to deal with the critical path and the occupied area resources. In addition, efficient AES key expansion architecture suitable for our proposed pipelined AES is presented. The implementation had been successfully done on Virtex-5 XC5VLX85 and Virtex-6 XC6VLX75T Field Programmable Gate Array (FPGA) devices using Xilinx ISE v14.7. Our AES design achieved a data encryption rate of 108.69 Gbps and used only 6361 slices ressource. Compared to the best previous work, this implementation improves data throughput by 5.6% and reduces the used slices to 77.69%

    Parallel arithmetic encryption for high-bandwidth communications on multicore/GPGPU platforms.

    Get PDF
    International audienceIn this work we study the feasibility of high-bandwidth, secure communications on generic machines equipped with the latest CPUs and General-Purpose Graphical Processing Units (GPGPU). We first analyze the suitability of current Nehalem CPU architectures. We show in particular that high performance CPUs are not sufficient by themselves to reach our performance objectives, and that encryption is the main bottleneck. Therefore we also consider the use of GPGPU, and more particularly we measure the bandwidth of the AES ciphering on CUDA. These tests lead us to the conclusion that finding an appropriate solution is extremely difficult

    Cryptographic algorithm acceleration using CUDA enabled GPUs in typical system configurations

    Get PDF
    The need to encrypt data is becoming more and more necessary. As the size of datasets continues to grow, the speed of encryption must increase to keep up or it will become a bottleneck. CUDA GPUs have been shown to offer performance improvements versus conventional CPUs for some data-intensive problems. This thesis evaluates the applicability of CUDA GPUs in accelerating the execution of cryptographic algorithms, which are increasingly used for growing amounts of data and thus will require significantly faster encryption and hashing throughput. Specifically, the CUDA environment was used to implement and experiment with three distinct cryptographic algorithms -- AES, SHA-2, and Keccak -- in order to show the applicability for various cryptographic algorithm classes. They were implemented in a system that emulates the conditions present in a real world environment, and the effects of offloading these tasks from the CPU to the GPU were assessed. Speedups up to 2.6x relative to the CPU were seen for single-kernel AES, but SHA-2 and Keccak did not perform as well as on the GPU as on the CPU. Multi-kernel AES saw speedups over single-kernel AES up to 1.4x, 1.65x, and 1.8x for two, three, and four kernels, respectively. This translates to speedups between 3.6x and 4.7x over CPU implementations of AES. Introducing a CPU load had a minimal effect on throughput whereas a GPU load was seen to decrease throughput by as much as 4%. Overall, CUDA GPUs appear to have potential for improving encryption throughputs if a parallelizable algorithm is selected

    High-performance FPGA architecture for data streams processing on example of IPsec gateway

    Get PDF
    In modern digital world, there is a strong demand for efficient data streams processing methods. One of application areas is cybersecurity — IPsec is a suite of protocol that adds security to communications at the IP level. This paper presents principles of high-performance FPGA architecture for data streams processing on example of IPsec gateway implementation. Efficiency of the proposed solution allows to use it in networks with data rates of several Gbit/s

    GPU-based Private Information Retrieval for On-Device Machine Learning Inference

    Full text link
    On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing them to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables each on the order of 1-10 GBs of data, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to directly use for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20×20 \times over an optimized CPU PIR implementation, and our PIR-ML co-design provides an over 5×5 \times additional throughput improvement at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000100,000 queries per second -- a >100×>100 \times throughput improvement over a CPU-based baseline -- while maintaining model accuracy

    Efficient Design Strategies Based on the AES Round Function

    Get PDF
    We show several constructions based on the AES round function that can be used as building blocks for MACs and authenticated encryption schemes. They are found by a search of the space of all secure constructions based on an efficient design strategy that has been shown to be one of the most optimal among all the considered. We implement the constructions on the latest Intel\u27s processors. Our benchmarks show that on Intel Skylake the smallest construction runs at 0.188 c/B, while the fastest at only 0.125 c/B, i.e. five times faster than AES-128
    • …
    corecore