Search CORE

14 research outputs found

Single-Precision and Double-Precision Merged Floating-Point Multiplication and Addition Units on FPGA

Author: Zhang Hao
Publication venue: 'University of Saskatchewan Library'
Publication date: 11/02/2020
Field of study

Floating-point (FP) operations defined in IEEE 754-2008 Standard for Floating-Point Arithmetic can provide wider dynamic range and higher precision than fixed-point operations. Many scientific computations and multimedia applications adopt FP operations. Among all the FP operations, addition and multiplication are the most frequent operations. In this thesis, the single-precision (SP) and double-precision (DP) merged FP multiplier and FP adder architectures are proposed. The proposed efficient iterative FP multiplier is designed based on the Karatsuba algorithm and implemented with the pipelined architecture. It can accomplish two parallel SP multiplication operations in one iteration with a latency of 6 clock cycles or one DP multiplication operation in two iterations with a latency of 9 clock cycles. Implemented on Xilinx Virtex-5 (xc5vlx155ff1760-3) FPGA device, the proposed multiplier runs at 348 MHz using 6 DSP48E blocks, 1117 LUTs, and 1370 FFs. Compared to previous FPGA based multiple-precision FP multiplier, the proposed designs runs at 4% faster clock frequency with reduction of 33% of DSP blocks, 17% latency for SP multiplication, and 28% latency for DP multiplication. The proposed high performance FP adder is designed based one the two-path FP addition algorithm. With fully pipelined architecture, the proposed adder can accomplish one DP or two parallel SP addition/subtraction operations in 6 clock cycles. The proposed adder architecture is implemented on both Altera and Xilinx 65nm process FPGA devices. The proposed adder can run up to 336 MHz with 1694 FFs, 1420 LUTs on Xilinx Virtex-5 (xc5vlx155ff1760-3) FPGA device. Compared to the combination of one DP and two SP architecture built with Xilinx FP operator, the proposed adder has 11.3% faster clock frequency. On Altera Stratix-III (EP3SL340F1760C2) FPGA device, the maximum clock frequency of the proposed adder can reach 358 MHz and 1686 ALUTs and 1556 registers are occupied. The proposed adder is 11.6% faster than the combination of one DP and two SP architecture built with Altera FP megafunction. For the reference of other researchers, the implementation results of the proposed FP multiplier and FP adder on the latest Xilinx Virtex-7 device and Altera Arria 10 device are also provided

University of Saskatchewan Research Archive

Efficient Implementation of Elliptic Curve Cryptography on FPGAs

Author: Shokrollahi Jamshid
Publication venue: Universitäts- und Landesbibliothek Bonn
Publication date
Field of study

This work presents the design strategies of an FPGA-based elliptic curve co-processor. Elliptic curve cryptography is an important topic in cryptography due to its relatively short key length and higher efficiency as compared to other well-known public key crypto-systems like RSA. The most important contributions of this work are: - Analyzing how different representations of finite fields and points on elliptic curves effect the performance of an elliptic curve co-processor and implementing a high performance co-processor. - Proposing a novel dynamic programming approach to find the optimum combination of different recursive polynomial multiplication methods. Here optimum means the method which has the smallest number of bit operations. - Designing a new normal-basis multiplier which is based on polynomial multipliers. The most important part of this multiplier is a circuit of size

O(n \log n)

for changing the representation between polynomial and normal basis

bonndoc – Der Publikationsserver der Universität Bonn

Chiffrement authentifié sur FPGAs de la partie reconfigurable à la partie static

Author: Moussa Ali Abdellatif Karim
Publication venue: HAL CCSD
Publication date: 07/10/2014
Field of study

Communication systems need to access, store, manipulate, or communicate sensitive information. Therefore, cryptographic primitives such as hash functions and block ciphers are deployed to provide encryption and authentication. Recently, techniques have been invented to combine encryption and authentication into a single algorithm which is called Authenticated Encryption (AE). Combining these two security services in hardware produces better performance compared to two separated algorithms since authentication and encryption can share a part of the computation. Because of combining the programmability with the performance ofcustom hardware, FPGAs become more common as an implementation target for such algorithms. The first part of this thesis is devoted to efficient and high-speed FPGA-based architectures of AE algorithms, AES-GCM and AEGIS-128, in order to be used in the reconfigurable part of FPGAs to support security services of communication systems. Our focus on the state of the art leads to the introduction of high-speed architectures for slow changing keys applications like Virtual Private Networks (VPNs). Furthermore, we present an efficient method for implementing the GF(2¹²⁸) multiplier, which is responsible for the authentication task in AES-GCM, to support high-speed applications. Additionally, an efficient AEGIS-128is also implemented using only five AES rounds. Our hardware implementations were evaluated using Virtex-5 and Virtex-4 FPGAs. The performance of the presented architectures (Thr./Slices) outperforms the previously reported ones.The second part of the thesis presents techniques for low cost solutions in order to secure the reconfiguration of FPGAs. We present different ranges of low cost implementations of AES-GCM, AES-CCM, and AEGIS-128, which are used in the static part of the FPGA in order to decrypt and authenticate the FPGA bitstream. Presented ASIC architectures were evaluated using 90 and 65 nm technologies and they present better performance compared to the previous work.Les systèmes de communication ont besoin d'accéder, stocker, manipuler, ou de communiquer des informations sensibles. Par conséquent, les primitives cryptographiques tels que les fonctions de hachage et le chiffrement par blocs sont déployés pour fournir le cryptage et l'authentification. Récemment, des techniques ont été inventés pour combiner cryptage et d'authentification en un seul algorithme qui est appelé authentifiés Encryption (AE). La combinaison de ces deux services de sécurité dans le matériel de meilleures performances par rapport aux deux algorithmes séparés puisque l'authentification et le cryptage peuvent partager une partie du calcul. En raison de la combinaison de la programmation de l'exécution de matériel personnalisé, FPGA deviennent plus communs comme cible d'une mise en œuvre de ces algorithmes. La première partie de cette thèse est consacrée aux architectures d'algorithmes AE, AES-GCM et AEGIS-128 à base de FPGA efficaces et à grande vitesse, afin d'être utilisé dans la partie reconfigurable FPGA pour soutenir les services de sécurité des systèmes de communication. Notre focalisation sur l'état de l'art conduit à la mise en place d'architectures à haute vitesse pour les applications lentes touches changeantes comme les réseaux privés virtuels (VPN). En outre, nous présentons un procédé efficace pour mettre en œuvre le GF(2¹²⁸) multiplicateur, qui est responsable de la tâche d'authentification en AES-GCM, pour supporter les applications à grande vitesse. En outre, un système efficace AEGIS-128 est également mis en œuvre en utilisant seulement cinq tours AES. Nos réalisations matérielles ont été évaluées à l'aide Virtex-5 et Virtex-4 FPGA. La performance des architectures présentées (Thr. / Parts) surpasse ceux signalés précédemment. La deuxième partie de la thèse présente des techniques pour des solutions à faible coût afin de garantir la reconfiguration du FPGA. Nous présentons différentes gammes de mises en œuvre à faible coût de AES-GCM, AES-CCM, et AEGIS-128, qui sont utilisés dans la partie statique du FPGA afin de décrypter et authentifier le bitstream FPGA. Architectures ASIC présentées ont été évaluées à l'aide de 90 et 65 technologies nm et présentent de meilleures performances par rapport aux travaux antérieurs

Thèses en Ligne

Design And Implementation Of Rsa Cryptosystem Using Partially Interleaved Modular Karatsuba-ofman Multiplier

Author: Arış Ahmet
Publication venue: 'Nara Institute of Science and Technology'
Publication date: 06/01/2013
Field of study

Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2012Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2012Kriptografinin, yani şifreleme biliminin önemi gün geçtikçe artmaktadır. Kullanılan teknoloji ne olursa olsun güvenli iletişim her zaman en başta gelen ihtiyaçlardan birisi olacaktır. Günümüzde şifreleme, sistemlere kullanıcı hesabıyla giriş yapılıyorken, internetten herhangi bir hizmet ya da ürün satın alınıyorken, iletişim araçları kullanılıyorken, araçların kapıları uzaktan kilitlenip açılıyorken ve daha birçok yerde kullanılmaktadır. Kullanım alanına ve güvenlik ihtiyacının niteliğine göre değişik şifreleme algoritmaları farklı teknolojilerle karşımıza çıkmaktadır. Bu algoritmaların bir tanesi de Rivest-Shamir-Adleman(RSA) şifreleme sistemidir. RSA bankacılık başta olmak üzere birçok sektörde şifreleme ve sayısal imza işlemleri için sıklıkla kullanılmaktadır. RSA algoritması gücünü çok büyük sayıların asal çarpanlarına ayrılmalarındaki zorluktan almaktadır. Özetle çok büyük sayılarla yapılan modüler üs alma işlemlerinden oluşmaktadır. Modüler üs alma işlemleri de özünde modüler çarpma işlemlerinden ibaret olduğu için hızlı bir RSA gerçeklemesi ancak hızlı modüler çarpma işlemi yapan bir tasarımla mümkün olmaktadır. Güvenlik ve hız gibi sebeplerden ötürü RSA kriptosistemi genellikle modüler üs alma işleminin yazılımda gerçeklenmesi ve modüler çarpma işlemlerinin de donanımda tasarlanan özel bloklarla yapılması yoluyla gerçeklenir. Modüler üs alma işlemi için birçok yöntem mevcuttur. Bu yöntemler modüler üs alma işlemi esnasında yapılan modüler çarpma işlemi sayısını değişik yöntemlerle en aza indirmeye çalışırlar. Modüler çarpma işlemi için de bilim dünyasında epeyi çalışmalar yapılmıştır. Modüler çarpma, önce çarpıp sonra indirgeme yapma ya da çarpma ve indirgeme işlemlerini iç-içe yapma gibi iki yöntemle mümkündür. Alan kısıtlamaları sebebiyle çoğunlukla ikinci metot tercih edilmektedir. İndirgeme işleminin uygulanma yönüne göre modüler çarpma algoritmaları soldan sağa doğru işleyenler ve sağdan sola doğru işleyenler olmak üzere ikiye ayrılır. Soldan sağa doğru işleyen yöntemlerin en bilindik örnekleri Blakley ve Barrett algoritmalarıdır. Sağdan sola doğru indirgeme yapan sınıfa ise tek örnek Montgomery yöntemidir. Hızlı çarpma yapan Karatsuba-Ofman, Schönhage-Strassen gibi yöntemler olsa da bu hızlı çarpıcılar indirgeme algoritmalarıyla birleştirilememektedirler. Bu sorun paralel çalışmaya izin vermeyen indirgeme yöntemlerinden kaynaklanmaktadır. Ancak Kaihara ve Takagi tarafından bilim dünyasına sunulan İki Parçalı Modüler çarpma yöntemi bu soruna bir nebze de olsa çözüm bulmaktadır. Bu indirgeme metodu Montgomery algoritmasının sahip olduğu bir özelliği kullanarak modüler çarpmadaki çarpanı ikiye ayırmakta ve böylelikle Blakley ve Montgomery paralel olarak çalışabilmektedir. Hızlı çarpma algoritmalarından birisi olan Karatsuba-Ofman yöntemiyle İki Parçalı indirgemeyi birleştiren ilk çalışma Gökay Saldamlı tarafından ortaya atılmıştır. İki Parçalı Örgü Karatsuba-Ofman çarpıcısı iki parçalı indirgemeyi Karatsuba-Ofman rekürsif çarpma yönteminin en üst katmanında birleştirmektedir. Bu yeni yöntemin gerçeklenmesinde Montgomery çarpıcısına, Blakley modüler çarpıcısına ve standart çarpma yapan bloklara ihtiyaç vardır. Ancak bu algoritma Montgomery ve Blakley modüllerinde literatürdeki daha önceki gerçeklemelerde bulunmayan değerlerin de hesaplanmasına gereksinim duymaktadır. Bu tezde İki Parçalı Modüler Örgü Karatsuba-Ofman çarpıcısının donanımda gerçeklenmesine ait iki çalışma ve bu tasarımlardan birisi kullanılarak gerçeklenen RSA kriptosistemi anlatılmaktadır. Bu çalışmalardan ilki çarpanı birer bit işleyen gerçeklemedir. FPGA teknolojisinde gerçeklenmiştir. İkinci gerçekleme ise ilk tasarımdaki eksikleri kapatan ve daha hızlı bir modüler çarpma için kodlama yöntemleri, daha fazla sayıda bit işleme, donanımın çalışma frekansını arttırmak için en büyük gecikmeye sahip yolu kontrol sinyallerini kullanarak optimize etmeye çalışan bir ASIC gerçeklemesidir. Bu tezdeki donanım tasarımları İki Parçalı Örgü Karatsuba-Ofman çarpma yönteminin ilk gerçeklemeleridirler. Montgomery ve Blakley algoritmaları bu yeni yöntem için yeniden düzenlenmiştir. Tasarımların ikisi için de Maple’da kütüphaneler oluşturulmuş ve iki tasarım da Maple ortamında donanımla aynı yapıda gerçeklenmiş, gerekli testler yapılmış ve simülatörlerden gelen sonuçlarla yazılım gerçeklemesinden gelen sonuçlar karşılaştırılarak tasarımların doğru çalıştığı kanıtlanmıştır. İkinci modüler çarpma gerçeklemesi RSA kriptosistemi içinde modüler çarpıcı olarak kullanılmış ve piyasadaki RSA kırmıklarıyla karşılaştırılabilir sonuçlara ulaşılmıştır.Importance of cryptography is becoming more and more important day by day. Secure communication will always be a crucial need independent from the technology in use. Applications of cryptography can be seen in online banking, purchases of goods and services by means of credit cards, ID cards, remote lock and start systems for cars, telecommunication, and in many more places. Different cryptosystems are employed according to different needs and different security levels. Rivest-Shamir-Adleman(RSA) Algorithm is one of the most popular cryptosystem that is used in many sectors like banking. It takes its strength from factorization of very large integers. Briefly, it consists of modular exponentiations with large integers which are 1024-bit or 2048-bit numbers. As modular exponentiation operations are fundamentally composed of modular multiplications, designing a fast RSA cryptosystem becomes possible only with a fast modular multiplier implementation. Due to security and speed issues, modular exponentiation in RSA is implemented in software and modular multiplications are carried out in modular multipliers which are implemented in hardware. Many methods exist in the literature in order to perform modular exponentiation. These methods try to reduce the total number of multiplications that are required for a modular exponetiation operation. In addition to modular exponentiation algorithms, there are several techniques to do modular multiplication as well. Modular multiplication may be carried out either by multiplying first and performing reduction later on, or by interleaving multiplication and reduction stages. In order to reach compact hardware designs the latter approach is utilized. According to the direction of reduction operation, modular multiplication algorithms may be classified into two groups, which are algorithms reducing from left-to-right and from right-to-left. The most well known examples of left-to-right approach are Blakley and Barrett algorithms, whereas Montgomery multiplication is the only member of the right-to-left modular multiplication. Although fast multiplication algorithms, such as Karatsuba-Ofman(KO), Schönhage-Strassen exist, these multiplication methods can not be interleaved with reduction algorithms. This is caused from the reduction approaches not allowing parallel processing. However, Bipartite Modular Multiplication(BMM) method introduced by Kaihara and Takagi proposes a partial solution to the parallel reduction problem. This reduction method modifies a feature of Montgomery algorithm. This modification separates the multiplier bits into two halves so that a product can be reduced from left and right simultaneously without a dependency issue. The first method which combines Karatsuba-Ofman multiplication with bipartite reduction was proposed by Gökay Saldamlı. His algoritm, namely Partially Interleaved Modular Karatsuba-Ofman Multiplication, interleaves KO multiplier with the bipartite reduction on the uppermost layer of KO’s recursion. Implementation of this new method requires Montgomery multiplier, Blakley multiplier and standard integer multipliers. However, for this approach, both Montgomery and Blakley multipliers are needed to be designed in a way that, they compute not only the modular multiplication result, but also quotient values, what is somewhat different than existing implementations. In this thesis, two hardware implementations of Partially Interleaved Modular KO Multiplication and an RSA implementation utilizing the designed multiplier are proposed. The first design is Radix-2 implementation on FPGA technology. The second hardware implementation explores the design methodologies in order to reach a faster modular multiplier. These design methodologies include employing high radices, optimization of critical path according to the effect of control signals and analyzing parameter dependencies in Partially Interleaved Modular KO Multiplier and scheduling jobs effectively. Improved design was implemented on ASIC technology using 90 nm TSMC standard cell libraries in Design Compiler. Hardware implementations proposed in this thesis are the first hardware implementations of Partially Interleaved Modular KO Multiplication method. Montgomery and Blakley multiplication algorithms were modified in order to produce desired results. Maple libraries which emulate the operation of hardware building blocks were written and both hardware implementations were firstly implemented in Maple. Tests were run with random input vectors and when correct operation of Maple implementations were verified, hardwares were described in VHDL and implemented using FPGA and ASIC design tools respectively. The second implementation of Partially Interleaved Modular KO multiplier was used as a modular multiplier for RSA cryptosystem and very promising results which are comparable with commercial RSA chips were achieved.Yüksek LisansM.Sc

Ulusal Üniversitelerarası Açık Erişim Sistemi - İstanbul Teknik Üniversitesi

Secure architectures for pairing based public key cryptography

Author: Pan Weibo
Publication venue: 'University College Cork'
Publication date: 01/01/2013
Field of study

Along with the growing demand for cryptosystems in systems ranging from large servers to mobile devices, suitable cryptogrophic protocols for use under certain constraints are becoming more and more important. Constraints such as calculation time, area, efficiency and security, must be considered by the designer. Elliptic curves, since their introduction to public key cryptography in 1985 have challenged established public key and signature generation schemes such as RSA, offering more security per bit. Amongst Elliptic curve based systems, pairing based cryptographies are thoroughly researched and can be used in many public key protocols such as identity based schemes. For hardware implementions of pairing based protocols, all components which calculate operations over Elliptic curves can be considered. Designers of the pairing algorithms must choose calculation blocks and arrange the basic operations carefully so that the implementation can meet the constraints of time and hardware resource area. This thesis deals with different hardware architectures to accelerate the pairing based cryptosystems in the field of characteristic two. Using different top-level architectures the hardware efficiency of operations that run at different times is first considered in this thesis. Security is another important aspect of pairing based cryptography to be considered in practically Side Channel Analysis (SCA) attacks. The naively implemented hardware accelerators for pairing based cryptographies can be vulnerable when taking the physical analysis attacks into consideration. This thesis considered the weaknesses in pairing based public key cryptography and addresses the particular calculations in the systems that are insecure. In this case, countermeasures should be applied to protect the weak link of the implementation to improve and perfect the pairing based algorithms. Some important rules that the designers must obey to improve the security of the cryptosystems are proposed. According to these rules, three countermeasures that protect the pairing based cryptosystems against SCA attacks are applied. The implementations of the countermeasures are presented and their performances are investigated

Irish Universities

Cork Open Research Archive

Hardware Implementation of Barrett Reduction Exploiting Constant Multiplication

Author: Roma Crystal Andrea
Publication venue: 'University of Waterloo'
Publication date: 25/09/2019
Field of study

The efficient realization of an Elliptic Curve Cryptosystem is contingent on the efficiency of scalar multiplication. These systems can be improved by optimizing the underlying finite field arithmetic operations which are the most costly such as modular reduction. There are elliptic curves over prime fields for which very efficient reduction formulas are possible due to the special structure of the moduli. For prime moduli of arbitrary form, however, use of general reduction formulas, such as Barrett's reduction algorithm, are necessary. Barrett's algorithm performs modular reduction efficiently by using multiplication as opposed to division, an operation which is generally expensive to realize in hardware. We note, however, that when an Elliptic Curve Cryptosystem is defined over a fixed prime field, all multiplication steps in Barrett's scheme can be realized through constant multiplications; this allows for further optimization. In this thesis, we study the influence using constant multipliers has on four different Barrett reduction variants targeting the Virtex-7 (xc7vx485tffg1157-1). We use the FloPoCo core generator to construct constant multiplier implementations for the different multiplication steps required in each scheme. Then, we create a hybrid constant multiplier circuit based on Karatsuba multiplication which uses smaller FloPoCo-generated base multipliers. It is shown that for certain multiplication steps, the hybrid design provides an improvement in the resource utilization of the constant multiplier circuit at the cost of an increase in the critical path delay. A performance comparison of different Barrett reduction circuits using different combinations of constant multiplier architectures is presented. Additionally, a fully pipelined implementation of each Barrett reduction variant is also designed capable of achieving operational frequencies in the range of 496-504MHz depending on the Barrett scheme considered. With the addition of a 256-bit pipelined Karatsuba multiplier circuit, we also present a compact and fully pipelined modular multiplier based on these Barrett architectures capable of achieving very high throughput compared to others in the literature without the use of embedded multipliers

University of Waterloo's Institutional Repository

Design and analysis of an FPGA-based, multi-processor HW-SW system for SCC applications

Author: Fitzgerald Andrew
Publication venue: RIT Scholar Works
Publication date: 01/01/2010
Field of study

The last 30 years have seen an increase in the complexity of embedded systems from a collection of simple circuits to systems consisting of multiple processors managing a wide variety of devices. This ever increasing complexity frequently requires that high assurance, fail-safe and secure design techniques be applied to protect against possible failures and breaches. To facilitate the implementation of these embedded systems in an efficient way, the FPGA industry recently created new families of devices. New features added to these devices include anti-tamper monitoring, bit stream encryption, and optimized routing architectures for physical and functional logic partition isolation. These devices have high capacities and are capable of implementing processors using their reprogrammable logic structures. This allows for an unprecedented level of hardware and software interaction within a single FPGA chip. High assurance and fail-safe systems can now be implemented within the reconfigurable hardware fabric of an FPGA, enabling these systems to maintain flexibility and achieve high performance while providing a high level of data security. The objective of this thesis was to design and analyze an FPGA-based system containing two isolated, softcore Nios processors that share data through two crypto-engines. FPGA-based single-chip cryptographic (SCC) techniques were employed to ensure proper component isolation when the design is placed on a device supporting the appropriate security primitives. Each crypto-engine is an implementation of the Advanced Encryption Standard (AES), operating in Galois/Counter Mode (GCM) for both encryption and authentication. The features of the microprocessors and architectures of the AES crypto-engines were varied with the goal of determining combinations which best target high performance, minimal hardware usage, or a combination of the two

RIT Scholar Works

Low-Latency ECDSA Signature Verification - A Road Towards Safer Traffic -

Author: Miroslav Knezevic
Peter Rombouts
Ventzislav Nikov
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 22/10/2014
Field of study

Car-to-car and Car-to-Infrastructure messages exchanged in Intelligent Transportation Systems can reach reception rates up to and over 1000 messages per second. As these messages contain ECDSA signatures this puts a very heavy load onto the verification hardware. In fact the load is so high that currently it can only be achieved by implementations running on high end CPUs and FPGAs. These implementations are far from cost-effective nor energy efficient. In this paper we present an ASIC implementation of a dedicated ECDSA verification engine that can reach verification rates of up to 27.000 verifications per second using only 1.034 kGE

Cryptology ePrint Archive

Reliable and High-Performance Hardware Architectures for the Advanced Encryption Standard/Galois Counter Mode

Author: Mozaffari-Kermani Mehran
Publication venue: Scholarship@Western
Publication date: 27/06/2011
Field of study

The high level of security and the fast hardware and software implementations of the Advanced Encryption Standard (AES) have made it the first choice for many critical applications. Since its acceptance as the adopted symmetric-key algorithm, the AES has been utilized in various security-constrained applications, many of which are power and resource constrained and require reliable and efficient hardware implementations. In this thesis, first, we investigate the AES algorithm from the concurrent fault detection point of view. We note that in addition to the efficiency requirements of the AES, it must be reliable against transient and permanent internal faults or malicious faults aiming at revealing the secret key. This reliability analysis and proposing efficient and effective fault detection schemes are essential because fault attacks have become a serious concern in cryptographic applications. Therefore, we propose, design, and implement various novel concurrent fault detection schemes for different AES hardware architectures. These include different structure-dependent and independent approaches for detecting single and multiple stuck-at faults using single and multi-bit signatures. The recently standardized authentication mode of the AES, i.e., Galois/Counter Mode (GCM), is also considered in this thesis. We propose efficient architectures for the AES-GCM algorithm. In this regard, we investigate the AES algorithm and we propose low-complexity and low-power hardware implementations for it, emphasizing on its nonlinear transformation, i.e., SubByes (S-boxes). We present new formulations for this transformation and through exhaustive hardware implementations, we show that the proposed architectures outperform their counterparts in terms of efficiency. Moreover, we present parallel, high-performance new schemes for the hardware implementations of the GCM to improve its throughput and reduce its latency. The performance of the proposed efficient architectures for the AES-GCM and their fault detection approaches are benchmarked using application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) hardware platforms. Our comparison results show that the proposed hardware architectures outperform their existing counterparts in terms of efficiency and fault detection capability

Scholarship@Western

Optimized hardware implementations of cryptography algorithms for resource-constraint IoT devices and high-speed applications

Author: Shahbazi Karim
Publication venue: 'University of Saskatchewan Library'
Publication date: 02/09/2022
Field of study

The advent of technologies, including the Internet and smartphones, has made people’s lives easier. Nowadays, people get used to digital applications for e-business, communicating with others, and sending or receiving sensitive messages. Sending secure data across the private network or the Internet is an open concern for every person. Cryptography plays an important role in privacy, security, and confidentiality against adversaries. Public-key cryptography (PKC) is one of the cryptography techniques that provides security over a large network, such as the Internet of Things (IoT). The classical PKCs, such as Elliptic Curve Cryptography (ECC) and Rivest-Shamir-Adleman (RSA), are based on the hardness of certain number theoretic problems. According to Shor’s algorithm, these algorithms can be solved very efficiently on a quantum computer, and cryptography algorithms will be insecure and weak as quantum computers increase in number. Based on NIST, Lattice-based cryptography (LBC) is one of the accepted quantum-resistant public-key cryptography. Different variants of LBC include Learning With Error (LWE), Ring Learning With Error (Ring-LWE), Binary Ring Learning with Error (Ring-Bin LWE), and etc. AES is also one of the secure cryptography algorithm that has been widely used in different applications and platforms. Also, AES-256 is secure against quantum attack. It is very important to design a crypto-system based on the need and application. In general, each network has three different layers; cloud, edge, and end-node. The cloud and edge layer require to have a high-speed crypto-system, as it is used in high-traffic application to encrypt and decrypt data. Unfortunately, most of the end-node devices are resource-constraint and do not have enough area for security guard. Providing end-to-end security is vital for every network. To mitigate this issue, designing and implementing a lightweight cryto-system for resource-constraint devices is necessary. In this thesis, a high-throughput FPGA implementation of AES algorithm for high-traffic edge applications is introduced. To achieve this goal, some part of the algorithm has been modified to balance the latency. Inner and outer pipelining techniques and loop-unrolling have been employed. The proposed high-speed implementation of AES achieves a throughput of 79.7Gbps, FPGA efficiency of 13.3 Mbps/slice, and frequency of 622.4MHz. Compared to the state-of-the-art work, the proposed design has improved data throughput by 8.02% and FPGA-Eff by 22.63%. Moreover, a lightweight architecture of AES for resource-constraint devices is designed and implemented on FPGA and ASIC. Each module of the architecture is specified in which occupied less area; and some units are shared among different phases. To reduce the power consumption clock gating technique is applied. Application-specific integrated circuit (ASIC) implementation results show a respective improvement in the area over the previous similar works from 35% to 2.4%. Based on the results and NIST report, the proposed design is a suitable crypto-system for tiny devices and can be supplied by low-power devices. Furthermore, two lightweight crypto-systems based on Binary Ring-LWE are presented for IoT end-node devices. For one of them, a novel column-based multiplication is introduced. To execute the column-based multiplication only one register is employed to store the intermediate results. The multiplication unit for the other Binary Ring-LWE design is optimized in which the multiplication is executed in less clock cycles. Moreover, to increase the security for end-node devices, the fault resiliency architecture has been designed and applied to the architecture of Binary Ring-LWE. Based on the implementation results and NIST report, the proposed Binary Ring-LWE designs is a suitable crypto-system form resource-constraint devices

University of Saskatchewan Research Archive