31 research outputs found

    Hardware Implementations for Symmetric Key Cryptosystems

    Get PDF
    The utilization of global communications network for supporting new electronic applications is growing. Many applications provided over the global communications network involve exchange of security-sensitive information between different entities. Often, communicating entities are located at different locations around the globe. This demands deployment of certain mechanisms for providing secure communications channels between these entities. For this purpose, cryptographic algorithms are used by many of today\u27s electronic applications to maintain security. Cryptographic algorithms provide set of primitives for achieving different security goals such as: confidentiality, data integrity, authenticity, and non-repudiation. In general, two main categories of cryptographic algorithms can be used to accomplish any of these security goals, namely, asymmetric key algorithms and symmetric key algorithms. The security of asymmetric key algorithms is based on the hardness of the underlying computational problems, which usually require large overhead of space and time complexities. On the other hand, the security of symmetric key algorithms is based on non-linear transformations and permutations, which provide efficient implementations compared to the asymmetric key ones. Therefore, it is common to use asymmetric key algorithms for key exchange, while symmetric key counterparts are deployed in securing the communications sessions. This thesis focuses on finding efficient hardware implementations for symmetric key cryptosystems targeting mobile communications and resource constrained applications. First, efficient lightweight hardware implementations of two members of the Welch-Gong (WG) family of stream ciphers, the WG(29,11)\left(29,11\right) and WG-1616, are considered for the mobile communications domain. Optimizations in the WG(29,11)\left(29,11\right) stream cipher are considered when the GF(229)GF\left(2^{29}\right) elements are represented in either the Optimal normal basis type-II (ONB-II) or the Polynomial basis (PB). For WG-1616, optimizations are considered only for PB representations of the GF(216)GF\left(2^{16}\right) elements. In this regard, optimizations for both ciphers are accomplished mainly at the arithmetic level through reducing the number of field multipliers, based on novel trace properties. In addition, other optimization techniques such as serialization and pipelining, are also considered. After this, the thesis explores efficient hardware implementations for digit-level multiplication over binary extension fields GF(2m)GF\left(2^{m}\right). Efficient digit-level GF(2m)GF\left(2^{m}\right) multiplications are advantageous for ultra-lightweight implementations, not only in symmetric key algorithms, but also in asymmetric key algorithms. The thesis introduces new architectures for digit-level GF(2m)GF\left(2^{m}\right) multipliers considering the Gaussian normal basis (GNB) and PB representations of the field elements. The new digit-level GF(2m)GF\left(2^{m}\right) single multipliers do not require loading of the two input field elements in advance to computations. This feature results in high throughput fast multiplication in resource constrained applications with limited capacity of input data-paths. The new digit-level GF(2m)GF\left(2^{m}\right) single multipliers are considered for both the GNB and PB. In addition, for the GNB representation, new architectures for digit-level GF(2m)GF\left(2^{m}\right) hybrid-double and hybrid-triple multipliers are introduced. The new digit-level GF(2m)GF\left(2^{m}\right) hybrid-double and hybrid-triple GNB multipliers, respectively, accomplish the multiplication of three and four field elements using the latency required for multiplying two field elements. Furthermore, a new hardware architecture for the eight-ary exponentiation scheme is proposed by utilizing the new digit-level GF(2m)GF\left(2^{m}\right) hybrid-triple GNB multipliers

    Hardware Implementations of Scalable and Unified Elliptic Curve Cryptosystem Processors

    Get PDF
    As the amount of information exchanged through the network grows, so does the demand for increased security over the transmission of this information. As the growth of computers increased in the past few decades, more sophisticated methods of cryptography have been developed. One method of transmitting data securely over the network is by using symmetric-key cryptography. However, a drawback of symmetric-key cryptography is the need to exchange the shared key securely. One of the solutions is to use public-key cryptography. One of the modern public-key cryptography algorithms is called Elliptic Curve Cryptography (ECC). The advantage of ECC over some older algorithms is the smaller number of key sizes to provide a similar level of security. As a result, implementations of ECC are much faster and consume fewer resources. In order to achieve better performance, ECC operations are often offloaded onto hardware to alleviate the workload from the servers' processors. The most important and complex operation in ECC schemes is the elliptic curve point multiplication (ECPM). This thesis explores the implementation of hardware accelerators that offload the ECPM operation to hardware. These processors are referred to as ECC processors, or simply ECPs. This thesis targets the efficient hardware implementation of ECPs specifically for the 15 elliptic curves recommended by the National Institute of Standards and Technology (NIST). The main contribution of this thesis is the implementation of highly efficient hardware for scalable and unified finite field arithmetic units that are used in the design of ECPs. In this thesis, scalability refers to the processor's ability to support multiple key sizes without the need to reconfigure the hardware. By doing so, the hardware does not need to be redesigned for the server to handle different levels of security. Unified refers to the ability of the ECP to handle both prime and binary fields. The resultant designs are valuable to the research community and industry, as a single hardware device is able to handle a wide range of ECC operations efficiently and at high speeds. Thus, improving the ability of network servers to handle secure transaction more quickly and improve productivity at lower costs

    A Brand-New, Area - Efficient Architecture for the FFT Algorithm Designed for Implementation of FPGAs

    Get PDF
    Elliptic curve cryptography, which is more commonly referred to by its acronym ECC, is widely regarded as one of the most effective new forms of cryptography developed in recent times. This is primarily due to the fact that elliptic curve cryptography utilises excellent performance across a wide range of hardware configurations in addition to having shorter key lengths. A High Throughput Multiplier design was described for Elliptic Cryptographic applications that are dependent on concurrent computations. A Proposed (Carry-Select) Division Architecture is explained and proposed throughout the whole of this work. Because of the carry-select architecture that was discussed in this article, the functionality of the divider has been significantly enhanced. The adder carry chain is reduced in length by this design by a factor of two, however this comes at the expense of additional adders and control. When it comes to designs for high throughput FFT, the total number of butterfly units that are implemented is what determines the amount of space that is needed by an FFT processor. In addition to blocks that may either add or subtract numbers, each butterfly unit also features blocks that can multiply numbers. The size of the region that is covered by these dual mathematical blocks is decided by the bit resolution of the models. When the bit resolution is increased, the area will also increase. The standard FFT approach requires that each stage contain  times as many butterfly units as the stage before it. This requirement must be met before moving on to the next stage

    Studies on high-speed hardware implementation of cryptographic algorithms

    Get PDF
    Cryptographic algorithms are ubiquitous in modern communication systems where they have a central role in ensuring information security. This thesis studies efficient implementation of certain widely-used cryptographic algorithms. Cryptographic algorithms are computationally demanding and software-based implementations are often too slow or power consuming which yields a need for hardware implementation. Field Programmable Gate Arrays (FPGAs) are programmable logic devices which have proven to be highly feasible implementation platforms for cryptographic algorithms because they provide both speed and programmability. Hence, the use of FPGAs for cryptography has been intensively studied in the research community and FPGAs are also the primary implementation platforms in this thesis. This thesis presents techniques allowing faster implementations than existing ones. Such techniques are necessary in order to use high-security cryptographic algorithms in applications requiring high data rates, for example, in heavily loaded network servers. The focus is on Advanced Encryption Standard (AES), the most commonly used secret-key cryptographic algorithm, and Elliptic Curve Cryptography (ECC), public-key cryptographic algorithms which have gained popularity in the recent years and are replacing traditional public-key cryptosystems, such as RSA. Because these algorithms are well-defined and widely-used, the results of this thesis can be directly applied in practice. The contributions of this thesis include improvements to both algorithms and techniques for implementing them. Algorithms are modified in order to make them more suitable for hardware implementation, especially, focusing on increasing parallelism. Several FPGA implementations exploiting these modifications are presented in the thesis including some of the fastest implementations available in the literature. The most important contributions of this thesis relate to ECC and, specifically, to a family of elliptic curves providing faster computations called Koblitz curves. The results of this thesis can, in their part, enable increasing use of cryptographic algorithms in various practical applications where high computation speed is an issue

    High-Performance Polynomial Multiplication Hardware Accelerators for KEM Saber and NTRU

    Get PDF
    Along the rapid development in building large-scale quantum computers, post-quantum cryptography (PQC) has drawn significant attention from research community recently as it is proven that the existing public-key cryptosystems are vulnerable to the quantum attacks. Following this direction, this paper presents a novel implementation of high-performance polynomial multiplication hardware accelerators for key encapsulation mechanism (KEM) Saber and NTRU, two PQC algorithms that are currently under the consideration by the National Institute of Standards and Technology (NIST) PQC standardization process. In total, we have carried out three layers of efforts to obtain the proposed work. First of all, we have proposed a new Dual Cyclic-Row Oriented Processing (Dual-CROP) technique to build a high-performance polynomial multiplication hardware accelerator for KEM Saber. Then, we have extended this hardware accelerator to NTRU with proper innovation and adjustment. Finally, through a series of complexity analysis and implementation based comparison, we have shown that the proposed hardware accelerators obtain better area-time complexities than known existing ones. It is expected that the outcome of this work can impact the ongoing NIST PQC standardization process and can be deployed further to construct efficient cryptoprocessors

    Hardware Architectures for Post-Quantum Cryptography

    Get PDF
    The rapid development of quantum computers poses severe threats to many commonly-used cryptographic algorithms that are embedded in different hardware devices to ensure the security and privacy of data and communication. Seeking for new solutions that are potentially resistant against attacks from quantum computers, a new research field called Post-Quantum Cryptography (PQC) has emerged, that is, cryptosystems deployed in classical computers conjectured to be secure against attacks utilizing large-scale quantum computers. In order to secure data during storage or communication, and many other applications in the future, this dissertation focuses on the design, implementation, and evaluation of efficient PQC schemes in hardware. Four PQC algorithms, each from a different family, are studied in this dissertation. The first hardware architecture presented in this dissertation is focused on the code-based scheme Classic McEliece. The research presented in this dissertation is the first that builds the hardware architecture for the Classic McEliece cryptosystem. This research successfully demonstrated that complex code-based PQC algorithm can be run efficiently on hardware. Furthermore, this dissertation shows that implementation of this scheme on hardware can be easily tuned to different configurations by implementing support for flexible choices of security parameters as well as configurable hardware performance parameters. The successful prototype of the Classic McEliece scheme on hardware increased confidence in this scheme, and helped Classic McEliece to get recognized as one of seven finalists in the third round of the NIST PQC standardization process. While Classic McEliece serves as a ready-to-use candidate for many high-end applications, PQC solutions are also needed for low-end embedded devices. Embedded devices play an important role in our daily life. Despite their typically constrained resources, these devices require strong security measures to protect them against cyber attacks. Towards securing this type of devices, the second research presented in this dissertation focuses on the hash-based digital signature scheme XMSS. This research is the first that explores and presents practical hardware based XMSS solution for low-end embedded devices. In the design of XMSS hardware, a heterogenous software-hardware co-design approach was adopted, which combined the flexibility of the soft core with the acceleration from the hard core. The practicability and efficiency of the XMSS software-hardware co-design is further demonstrated by providing a hardware prototype on an open-source RISC-V based System-on-a-Chip (SoC) platform. The third research direction covered in this dissertation focuses on lattice-based cryptography, which represents one of the most promising and popular alternatives to today\u27s widely adopted public key solutions. Prior research has presented hardware designs targeting the computing blocks that are necessary for the implementation of lattice-based systems. However, a recurrent issue in most existing designs is that these hardware designs are not fully scalable or parameterized, hence limited to specific cryptographic primitives and security parameter sets. The research presented in this dissertation is the first that develops hardware accelerators that are designed to be fully parameterized to support different lattice-based schemes and parameters. Further, these accelerators are utilized to realize the first software-harware co-design of provably-secure instances of qTESLA, which is a lattice-based digital signature scheme. This dissertation demonstrates that even demanding, provably-secure schemes can be realized efficiently with proper use of software-hardware co-design. The final research presented in this dissertation is focused on the isogeny-based scheme SIKE, which recently made it to the final round of the PQC standardization process. This research shows that hardware accelerators can be designed to offload compute-intensive elliptic curve and isogeny computations to hardware in a versatile fashion. These hardware accelerators are designed to be fully parameterized to support different security parameter sets of SIKE as well as flexible hardware configurations targeting different user applications. This research is the first that presents versatile hardware accelerators for SIKE that can be mapped efficiently to both FPGA and ASIC platforms. Based on these accelerators, an efficient software-hardwareco-design is constructed for speeding up SIKE. In the end, this dissertation demonstrates that, despite being embedded with expensive arithmetic, the isogeny-based SIKE scheme can be run efficiently by exploiting specialized hardware. These four research directions combined demonstrate the practicability of building efficient hardware architectures for complex PQC algorithms. The exploration of efficient PQC solutions for different hardware platforms will eventually help migrate high-end servers and low-end embedded devices towards the post-quantum era

    The Cost to Break SIKE: A Comparative Hardware-Based Analysis with AES and SHA-3

    Get PDF
    This work presents a detailed study of the classical security of the post-quantum supersingular isogeny key encapsulation (SIKE) protocol using a realistic budget-based cost model that considers the actual computing and memory costs that are needed for cryptanalysis. In this effort, we design especially-tailored hardware accelerators for the time-critical multiplication and isogeny computations that we use to model an ASIC-powered instance of the van Oorschot-Wiener (vOW) parallel collision search algorithm. We then extend the analysis to AES and SHA-3 in the context of the NIST post-quantum cryptography standardization process to carry out a parameter analysis based on our cost model. This analysis, together with the state-of-the-art quantum security analysis of SIKE, indicates that the current SIKE parameters offer higher practical security than currently believed, closing an open issue on the suitability of the parameters to match NIST\u27s security levels. In addition, we explore the possibility of using significantly smaller primes to enable more efficient and compact implementations with reduced bandwidth. Our improved cost model and analysis can be applied to other cryptographic settings and primitives, and can have implications for other post-quantum candidates in the NIST process

    Cryptography for Ultra-Low Power Devices

    Get PDF
    Ubiquitous computing describes the notion that computing devices will be everywhere: clothing, walls and floors of buildings, cars, forests, deserts, etc. Ubiquitous computing is becoming a reality: RFIDs are currently being introduced into the supply chain. Wireless distributed sensor networks (WSN) are already being used to monitor wildlife and to track military targets. Many more applications are being envisioned. For most of these applications some level of security is of utmost importance. Common to WSN and RFIDs are their severely limited power resources, which classify them as ultra-low power devices. Early sensor nodes used simple 8-bit microprocessors to implement basic communication, sensing and computing services. Security was an afterthought. The main power consumer is the RF-transceiver, or radio for short. In the past years specialized hardware for low-data rate and low-power radios has been developed. The new bottleneck are security services which employ computationally intensive cryptographic operations. Customized hardware implementations hold the promise of enabling security for severely power constrained devices. Most research groups are concerned with developing secure wireless communication protocols, others with designing efficient software implementations of cryptographic algorithms. There has not been a comprehensive study on hardware implementations of cryptographic algorithms tailored for ultra-low power applications. The goal of this dissertation is to develop a suite of cryptographic functions for authentication, encryption and integrity that is specifically fashioned to the needs of ultra-low power devices. This dissertation gives an introduction to the specific problems that security engineers face when they try to solve the seemingly contradictory challenge of providing lightweight cryptographic services that can perform on ultra-low power devices and shows an overview of our current work and its future direction
    corecore