769 research outputs found

    Cryptographic algorithm acceleration using CUDA enabled GPUs in typical system configurations

    Demand for data encryption keeps growing. As datasets grow, encryption speed must increase to keep up or it will become a bottleneck. CUDA GPUs have been shown to offer performance improvements over conventional CPUs for some data-intensive problems. This thesis evaluates the applicability of CUDA GPUs to accelerating cryptographic algorithms, which are applied to ever larger amounts of data and will therefore require significantly higher encryption and hashing throughput. Specifically, the CUDA environment was used to implement and experiment with three distinct cryptographic algorithms (AES, SHA-2, and Keccak) in order to show the applicability to various classes of cryptographic algorithm. They were implemented in a system that emulates real-world conditions, and the effects of offloading these tasks from the CPU to the GPU were assessed. Speedups of up to 2.6x relative to the CPU were seen for single-kernel AES, but SHA-2 and Keccak did not perform as well on the GPU as on the CPU. Multi-kernel AES saw speedups over single-kernel AES of up to 1.4x, 1.65x, and 1.8x for two, three, and four kernels, respectively. This translates to speedups between 3.6x and 4.7x over CPU implementations of AES. Introducing a CPU load had a minimal effect on throughput, whereas a GPU load decreased throughput by as much as 4%. Overall, CUDA GPUs appear to have potential for improving encryption throughput if a parallelizable algorithm is selected.
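The combined figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes the peak multi-kernel gains simply multiply the peak single-kernel gain, which is an interpretation of the abstract rather than something the thesis states; the constant and function names are illustrative only.

```python
# Peak speedups quoted in the abstract above.
SINGLE_KERNEL_VS_CPU = 2.6                          # single-kernel AES vs. CPU
MULTI_KERNEL_VS_SINGLE = {2: 1.4, 3: 1.65, 4: 1.8}  # kernels -> gain over single kernel


def combined_speedup(kernels: int) -> float:
    """Speedup over the CPU, assuming the two peak gains compound (assumption)."""
    return SINGLE_KERNEL_VS_CPU * MULTI_KERNEL_VS_SINGLE[kernels]


if __name__ == "__main__":
    for k in sorted(MULTI_KERNEL_VS_SINGLE):
        print(f"{k} kernels: {combined_speedup(k):.2f}x over CPU")
```

Under that assumption the two- and four-kernel products round to the 3.6x and 4.7x bounds the abstract reports.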

    Improving Hardware Implementation of Cryptographic AES Algorithm and the Block Cipher Modes of Operation

    With ever-increasing Internet traffic, more business and financial transactions are being conducted online. This is even more so during the COVID-19 pandemic, when traditionally face-to-face activities such as education have moved online, requiring huge amounts of data to be exchanged over the Internet. The increase in the volume of data sent over the Internet has also increased security vulnerabilities, such as threats to the confidentiality of the data in transit. Due to the sheer volume, all data will need to be effectively encrypted, and encryption/decryption functions must work at higher speed to maintain the confidentiality of sensitive data. In this thesis, our goal is to enhance the hardware speed of the encryption process of the standard AES scheme in four variants (AES-128, AES-192, AES-256, and a new AES-512) and to implement these functions on an FPGA. We also consider the FPGA implementation of different AES modes of operation. By employing parallelism and pipelining, we attempt to speed up various computational components of the AES implementations using Quartus II on an Intel FPGA. This approach shows improvement in response speed, data throughput, and latency.

    Hardware Mechanisms for Efficient Memory System Security

    The security of a computer system hinges on the trustworthiness of the operating system and the hardware, as applications rely on them to protect code and data. As a result, multiple protections for safeguarding the hardware and OS from attacks are being continuously proposed and deployed. These defenses, however, are far from ideal as they only provide partial protection, require complex hardware and software stacks, or incur high overheads. This dissertation presents hardware mechanisms for efficiently providing strong protections against an array of attacks on the memory hardware and the operating system’s code and data. In the first part of this dissertation, we analyze and optimize protections targeted at defending memory hardware from physical attacks. We begin by showing that, contrary to popular belief, current DDR3 and DDR4 memory systems that employ memory scrambling are still susceptible to cold boot attacks (where the DRAM is frozen to give it sufficient retention time and is then re-read by an attacker after reboot to extract sensitive data). We then describe how memory scramblers in modern memory controllers can be transparently replaced by strong stream ciphers without impacting performance. We also demonstrate how the large storage overheads associated with authenticated memory encryption schemes (which enable tamper-proof storage in off-chip memories) can be reduced by leveraging compact integer encodings and error-correcting code (ECC) DRAMs – without forgoing the error detection and correction capabilities of ECC DRAMs. The second part of this dissertation presents Neverland: a low-overhead, hardware-assisted, memory protection scheme that safeguards the operating system from rootkits and kernel-mode malware. 
Once the system is done booting, Neverland’s hardware takes away the operating system’s ability to overwrite certain configuration registers, as well as portions of its own physical address space that contain kernel code and security-critical data. Furthermore, it prohibits the CPU from fetching privileged code from any memory region lying outside the physical addresses assigned to the OS kernel and drivers. This combination of protections makes it extremely hard for an attacker to tamper with the kernel or introduce new privileged code into the system, even in the presence of software vulnerabilities. Neverland enables operating systems to reduce their attack surface without having to rely on complex integrity monitoring software or hardware. The hardware mechanisms we present in this dissertation provide building blocks for constructing a secure computing base while incurring lower overheads than existing protections. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147604/1/salessaf_1.pd

    High throughput FPGA Implementation of Advanced Encryption Standard Algorithm

    The growth of computer systems and electronic communications and transactions has meant that the need for effective security and reliability of data communication, processing and storage is more important than ever. In this context, cryptography is a high-priority research area in engineering. The Advanced Encryption Standard (AES) is a symmetric-key cryptographic algorithm for protecting sensitive information and is one of the most secure and widely used algorithms today. High throughput, low power and compactness have always been topics of interest for implementing this type of algorithm. In this paper, we are interested in the development of a high-throughput architecture and implementation of the AES algorithm, using the least amount of hardware possible. We have adopted a pipeline approach in order to reduce the critical path and achieve competitive performance in terms of throughput and efficiency. This approach is effectively tested on the AES S-box substitution. The latter is a complex transformation and the key point for improving architecture performance. Considering the high delay and hardware required for this transformation, we propose a 7-stage pipelined S-box using composite fields in order to deal with the critical path and the occupied area resources. In addition, an efficient AES key expansion architecture suitable for our proposed pipelined AES is presented. The design was successfully implemented on Virtex-5 XC5VLX85 and Virtex-6 XC6VLX75T Field Programmable Gate Array (FPGA) devices using Xilinx ISE v14.7. Our AES design achieved a data encryption rate of 108.69 Gbps and used only 6361 slices. Compared to the best previous work, this implementation improves data throughput by 5.6% and reduces the used slices to 77.69%.

    FPGA-Augmented Secure Crash-Consistent Non-Volatile Memory

    Emerging byte-addressable Non-Volatile Memory (NVM) technology, although promising superior memory density and ultra-low energy consumption, poses unique challenges to achieving persistent data privacy and computing security, both of which are critically important to embedded and IoT applications. Specifically, to successfully restore NVMs to their working states after unexpected system crashes or power failures, maintaining and recovering all the necessary security-related metadata can severely increase memory traffic, degrade runtime performance, exacerbate the write endurance problem, and demand costly hardware changes to off-the-shelf processors. In this thesis, we summarize and expand upon two of our innovative works, ARES and HERMES, to design a new FPGA-assisted, processor-transparent security mechanism aimed at efficiently and effectively achieving all three aspects of a security triad (confidentiality, integrity, and recoverability) in modern embedded computing. Given the growing prominence of CPU-FPGA heterogeneous computing architectures, ARES leverages the FPGA's hardware reconfigurability to offload performance-critical and security-related functions to the programmable hardware without the microprocessor's involvement. In particular, recognizing that the traditional Merkle tree caching scheme cannot fully exploit the FPGA's parallelism due to its sequential and recursive function calls, ARES proposed a new Merkle tree cache architecture and a novel Merkle tree scheme that flattens and reorganizes the computation in the traditional Merkle tree verification and update processes to fully exploit the parallel cache ports and to fully pipeline time-consuming hashing operations. To further optimize the throughput of BMT operations, HERMES proposed an optimally efficient dataflow architecture that processes multiple outstanding counter requests simultaneously.
Specifically, HERMES explored and addressed three technical challenges in exploiting the task-level parallelism of BMT and proposed a speculative execution approach with both low latency and high throughput.
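To illustrate what a "flattened" (non-recursive) Merkle tree computation can look like, here is a minimal Python sketch that builds the tree one level at a time. It is a generic binary hash tree, not the ARES or HERMES architecture: the use of SHA-256, the function names, and the rule of duplicating the last node on odd-sized levels are all assumptions made for illustration.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(blocks):
    """Return every tree level, leaves first and root last, without recursion."""
    if not blocks:
        raise ValueError("need at least one data block")
    level = [sha256(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Assumed padding rule: duplicate the last node on odd-sized levels.
            level = level + [level[-1]]
        # Each parent depends only on its own two children, so all hashes
        # within one level are independent of each other.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(blocks):
    return merkle_levels(blocks)[-1][0]


if __name__ == "__main__":
    data = [b"block-0", b"block-1", b"block-2", b"block-3"]
    print(merkle_root(data).hex())
```

Because the hashes within a level do not depend on one another, a level-wise layout of this kind is what lets hardware issue them in parallel or feed them through a pipeline, whereas a recursive traversal serializes them.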

    A parallel block-based encryption schema for digital images using reversible cellular automata

    We propose a novel image encryption scheme based on reversible one-dimensional cellular automata (RCAs). In contrast to the sequential operating mode of several existing approaches, the proposed scheme is fully parallelizable, since the encryption/decryption tasks can be executed by multiple processes running independently on the same image. The parallelization is made possible by defining a new RCA-based construction of an extended pseudorandom permutation (PRP) that takes a nonce as a supplementary parameter. The defined PRP exploits the chaotic behavior of RCAs and their high sensitivity to initial conditions to ensure perfect cryptographic security properties. Results of various experiments and analyses show that high security and execution performance can be achieved using the approach. Furthermore, it enables selective area decryption, since any part of the ciphered image can be deciphered independently of the others, which is very useful for real-time applications.
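As background on how a one-dimensional cellular automaton can be made reversible, the sketch below uses the generic second-order construction, in which the next state is the local rule applied to the current state, XORed with the previous state. This is not the paper's PRP construction and it provides no security by itself; the choice of elementary rule 30 as the local rule, the wrap-around boundary, and all function names are assumptions for illustration.

```python
def local_rule(left: int, center: int, right: int) -> int:
    # Hypothetical local rule (elementary rule 30): left XOR (center OR right).
    return left ^ (center | right)


def step(prev, curr):
    """One second-order step: next[i] = rule(neighborhood of curr at i) XOR prev[i]."""
    n = len(curr)
    nxt = [
        local_rule(curr[(i - 1) % n], curr[i], curr[(i + 1) % n]) ^ prev[i]
        for i in range(n)
    ]
    return curr, nxt


def run(prev, curr, steps):
    for _ in range(steps):
        prev, curr = step(prev, curr)
    return prev, curr


def reverse(prev, curr, steps):
    """Undo `steps` steps by swapping the two states and running the same rule."""
    curr, prev = run(curr, prev, steps)
    return prev, curr


if __name__ == "__main__":
    s0 = [0, 1, 1, 0, 1, 0, 0, 1]
    s1 = [1, 0, 0, 1, 1, 1, 0, 0]
    later = run(s0, s1, 16)
    print(reverse(*later, 16) == (s0, s1))
```

The XOR with the previous state is what makes every step invertible regardless of the local rule, so running the same rule on the swapped state pair walks the automaton backwards.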