    Design and analysis of an FPGA-based, multi-processor HW-SW system for SCC applications

    The last 30 years have seen the complexity of embedded systems grow from collections of simple circuits to systems of multiple processors managing a wide variety of devices. This ever-increasing complexity frequently requires high-assurance, fail-safe, and secure design techniques to protect against possible failures and breaches. To support efficient implementation of such embedded systems, the FPGA industry recently created new families of devices. New features added to these devices include anti-tamper monitoring, bitstream encryption, and routing architectures optimized for physical and functional isolation of logic partitions. These devices have high capacities and can implement processors in their reprogrammable logic structures, which allows an unprecedented level of hardware and software interaction within a single FPGA chip. High-assurance and fail-safe systems can now be implemented within the reconfigurable hardware fabric of an FPGA, enabling these systems to maintain flexibility and achieve high performance while providing a high level of data security. The objective of this thesis was to design and analyze an FPGA-based system containing two isolated, soft-core Nios processors that share data through two crypto-engines. FPGA-based single-chip cryptographic (SCC) techniques were employed to ensure proper component isolation when the design is placed on a device supporting the appropriate security primitives. Each crypto-engine is an implementation of the Advanced Encryption Standard (AES), operating in Galois/Counter Mode (GCM) for both encryption and authentication. The features of the microprocessors and the architectures of the AES crypto-engines were varied to determine which combinations best target high performance, minimal hardware usage, or a balance of the two.

    Parallel Multiplier Designs for the Galois/Counter Mode of Operation

    The Galois/Counter Mode of Operation (GCM), recently standardized by NIST, simultaneously authenticates and encrypts data at speeds not previously possible for both software and hardware implementations. In GCM, data integrity is achieved by chaining Galois field multiplication operations, while a symmetric-key block cipher such as the Advanced Encryption Standard (AES) is used to meet the goal of confidentiality. Area optimization in a number of proposed high-throughput GCM designs has been approached by implementing efficient composite S-boxes for AES. Less work has been done on reducing the area of the Galois multiplication operation in GCM, which accounts for up to 30% of the overall area when a brute-force approach is used. Current pipelined implementations of GCM also have large key-change latencies, which potentially reduce the average throughput expected under traditional internet traffic conditions. This thesis addresses these issues by presenting area-efficient parallel multiplier designs for GCM and an approach for achieving low-latency key changes. The widely known Karatsuba parallel multiplier (KA) and the recently proposed Fan-Hasan multiplier (FH) were designed for GCM and implemented on ASIC and FPGA architectures. This is the first time these multipliers have been compared in a practical implementation, and the FH multiplier showed noteworthy improvements in delay over the KA multiplier with similar area requirements. Using the composite S-box, ASIC designs of GCM implemented with subquadratic multipliers are shown to save up to 18% in area, without affecting throughput, compared with designs using the brute-force Mastrovito multiplier. For low-delay LUT S-box designs in GCM, although the subquadratic multipliers are part of the critical path, implementations with the FH multiplier showed the highest efficiency in terms of area and throughput of all designs. FPGA results similarly showed a significant reduction in the number of slices when using subquadratic multipliers, and the highest throughput to date for FPGA implementations of GCM was also achieved. The proposed reduced-latency key-change design, which supports all AES key sizes, showed a 20% improvement in average throughput over other GCM designs that do not use the same techniques. The GCM implementations presented in this thesis are among the most area-efficient, yet high-throughput, designs to date.
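The subquadratic idea behind the Karatsuba multiplier mentioned in the abstract can be illustrated in software. The sketch below is not the thesis's hardware design: it is a minimal Python model, with hypothetical function names, that multiplies two binary polynomials (stored as integers, bit i holding the coefficient of x^i) using three half-size products instead of four, and omits the modular reduction that a full GF(2^128) multiplier would also need.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less product of two GF(2)[x] polynomials by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a: int, b: int, n: int = 128, threshold: int = 8) -> int:
    """Carry-less product of two n-bit polynomials using Karatsuba splitting.

    n is the operand width in bits (assumed to be a power of two here);
    below `threshold` bits the schoolbook method is used directly.
    """
    if n <= threshold:
        return clmul_schoolbook(a, b)
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul_karatsuba(a_lo, b_lo, half, threshold)
    hi = clmul_karatsuba(a_hi, b_hi, half, threshold)
    # One extra product replaces the two cross products a_lo*b_hi and a_hi*b_lo.
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, half, threshold) ^ lo ^ hi
    return (hi << (2 * half)) ^ (mid << half) ^ lo


if __name__ == "__main__":
    x = 0x0123456789ABCDEF0FEDCBA987654321
    y = 0xFFEEDDCCBBAA99887766554433221100
    print(clmul_karatsuba(x, y) == clmul_schoolbook(x, y))
```

In hardware, the same recursion trades AND gates for XOR gates, which is where the area savings relative to a brute-force multiplier would come from; the exact savings depend on the design and are not modeled here.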

    Studies on high-speed hardware implementation of cryptographic algorithms

    Cryptographic algorithms are ubiquitous in modern communication systems, where they play a central role in ensuring information security. This thesis studies efficient implementation of certain widely used cryptographic algorithms. Cryptographic algorithms are computationally demanding, and software-based implementations are often too slow or consume too much power, which creates a need for hardware implementation. Field Programmable Gate Arrays (FPGAs) are programmable logic devices that have proven to be well-suited implementation platforms for cryptographic algorithms because they provide both speed and programmability. Hence, the use of FPGAs for cryptography has been intensively studied in the research community, and FPGAs are also the primary implementation platforms in this thesis. This thesis presents techniques that allow faster implementations than existing ones. Such techniques are necessary in order to use high-security cryptographic algorithms in applications requiring high data rates, for example, in heavily loaded network servers. The focus is on the Advanced Encryption Standard (AES), the most commonly used secret-key cryptographic algorithm, and Elliptic Curve Cryptography (ECC), a family of public-key cryptographic algorithms that has gained popularity in recent years and is replacing traditional public-key cryptosystems such as RSA. Because these algorithms are well defined and widely used, the results of this thesis can be directly applied in practice. The contributions of this thesis include improvements to both the algorithms and the techniques for implementing them. Algorithms are modified to make them more suitable for hardware implementation, with a particular focus on increasing parallelism. Several FPGA implementations exploiting these modifications are presented in the thesis, including some of the fastest implementations available in the literature. The most important contributions of this thesis relate to ECC and, specifically, to Koblitz curves, a family of elliptic curves that allows faster computations. The results of this thesis can, for their part, enable increasing use of cryptographic algorithms in practical applications where computation speed is a concern.

    Reliable and High-Performance Hardware Architectures for the Advanced Encryption Standard/Galois Counter Mode

    The high level of security and the fast hardware and software implementations of the Advanced Encryption Standard (AES) have made it the first choice for many critical applications. Since its adoption as the standard symmetric-key algorithm, the AES has been used in various security-constrained applications, many of which are power and resource constrained and require reliable and efficient hardware implementations. In this thesis, we first investigate the AES algorithm from the concurrent fault detection point of view. We note that, in addition to meeting its efficiency requirements, the AES must be reliable against transient and permanent internal faults as well as malicious faults aimed at revealing the secret key. Analyzing this reliability and proposing efficient and effective fault detection schemes are essential because fault attacks have become a serious concern in cryptographic applications. Therefore, we propose, design, and implement various novel concurrent fault detection schemes for different AES hardware architectures. These include structure-dependent and structure-independent approaches for detecting single and multiple stuck-at faults using single-bit and multi-bit signatures. The recently standardized authentication mode of the AES, i.e., Galois/Counter Mode (GCM), is also considered in this thesis. We propose efficient architectures for the AES-GCM algorithm. In this regard, we investigate the AES algorithm and propose low-complexity and low-power hardware implementations for it, with emphasis on its nonlinear transformation, i.e., SubBytes (S-boxes). We present new formulations for this transformation and, through exhaustive hardware implementations, show that the proposed architectures outperform their counterparts in terms of efficiency. Moreover, we present new parallel, high-performance schemes for hardware implementations of the GCM that improve its throughput and reduce its latency. The performance of the proposed architectures for the AES-GCM and of their fault detection approaches is benchmarked on application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) hardware platforms. Our comparison results show that the proposed hardware architectures outperform their existing counterparts in terms of efficiency and fault detection capability.
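The abstract does not describe the fault detection schemes themselves, so the following is only a generic illustration of what single-bit and multi-bit signatures mean, not the thesis's method. It applies parity prediction to AES AddRoundKey (a plain XOR): the parity of the output can be predicted from the parities of the inputs and compared with the parity actually observed. All function names are hypothetical, and a fault is modeled simply as an XOR mask on the output, standing in for a stuck-at fault that happens to change the affected bits.

```python
def parity(x: int) -> int:
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(x).count("1") & 1


def byte_parities(x: int, n_bytes: int = 16) -> tuple:
    """One parity bit per byte of a 128-bit word: a 16-bit signature."""
    return tuple(parity((x >> (8 * i)) & 0xFF) for i in range(n_bytes))


def add_round_key(state: int, key: int) -> int:
    """AES AddRoundKey on 128-bit words held as integers."""
    return state ^ key


def check_single(state: int, key: int, fault_mask: int = 0) -> bool:
    """True if the (possibly faulty) output passes a 1-bit parity check."""
    out = add_round_key(state, key) ^ fault_mask   # inject the modeled fault
    predicted = parity(state) ^ parity(key)        # parity is linear over XOR
    return parity(out) == predicted


def check_multi(state: int, key: int, fault_mask: int = 0) -> bool:
    """True if the (possibly faulty) output passes a per-byte parity check."""
    out = add_round_key(state, key) ^ fault_mask
    predicted = tuple(a ^ b for a, b in zip(byte_parities(state), byte_parities(key)))
    return byte_parities(out) == predicted


if __name__ == "__main__":
    s = 0x00112233445566778899AABBCCDDEEFF
    k = 0x000102030405060708090A0B0C0D0E0F
    print(check_single(s, k, 0x001))   # one flipped bit: check fails
    print(check_single(s, k, 0x101))   # two flipped bits: 1-bit signature misses it
    print(check_multi(s, k, 0x101))    # per-byte signature catches it
```

The example shows the usual trade-off: a single parity bit is cheap but misses any even number of flipped bits, while a wider signature costs more logic and catches more fault patterns, though still not two flips within the same byte.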

    Virtualized Reconfigurable Resources and Their Secured Provision in an Untrusted Cloud Environment

    The cloud computing business grows year after year. To keep up with increasing demand and to offer more services, data center providers are always searching for novel architectures. One of these is the FPGA, reconfigurable hardware with high compute power and energy efficiency. However, some clients cannot make use of these remote processing capabilities: not every involved party is trustworthy, and the complex management software has potential security flaws. Hence, clients' sensitive data or algorithms cannot be sufficiently protected. In this thesis, state-of-the-art hardware, cloud, and security concepts are analyzed and combined. On one side are reconfigurable virtual FPGAs. They are a flexible resource and fulfill the cloud characteristics, but at the price of security. On the other side is a strong requirement for that security. To provide it, an immutable controller is embedded that enables a direct, confidential, and secure transfer of clients' configurations. This establishes a trustworthy compute space inside an untrusted cloud environment. Clients can securely transfer their sensitive data and algorithms without involving vulnerable software or the data center provider. This concept is implemented as a prototype. Based on it, necessary changes to current FPGAs are analyzed. To fully enable reconfigurable yet secure hardware in the cloud, a new hybrid architecture is required.

    Authenticated encryption on FPGAs from the reconfigurable part to the static part (Chiffrement authentifié sur FPGAs de la partie reconfigurable à la partie static)

    Communication systems need to access, store, manipulate, or communicate sensitive information. Therefore, cryptographic primitives such as hash functions and block ciphers are deployed to provide encryption and authentication. Recently, techniques have been invented to combine encryption and authentication into a single algorithm, which is called Authenticated Encryption (AE). Combining these two security services in hardware produces better performance than two separate algorithms, since authentication and encryption can share part of the computation. Because they combine programmability with the performance of custom hardware, FPGAs are becoming more common as an implementation target for such algorithms. The first part of this thesis is devoted to efficient and high-speed FPGA-based architectures of the AE algorithms AES-GCM and AEGIS-128, intended for use in the reconfigurable part of FPGAs to support the security services of communication systems. Our study of the state of the art leads to the introduction of high-speed architectures for applications with slowly changing keys, such as Virtual Private Networks (VPNs). Furthermore, we present an efficient method for implementing the GF(2¹²⁸) multiplier, which is responsible for the authentication task in AES-GCM, to support high-speed applications. Additionally, an efficient AEGIS-128 is implemented using only five AES rounds. Our hardware implementations were evaluated using Virtex-5 and Virtex-4 FPGAs. The performance of the presented architectures (throughput per slice) exceeds that of previously reported ones. The second part of the thesis presents techniques for low-cost solutions that secure the reconfiguration of FPGAs. We present different ranges of low-cost implementations of AES-GCM, AES-CCM, and AEGIS-128, which are used in the static part of the FPGA to decrypt and authenticate the FPGA bitstream. The presented ASIC architectures were evaluated using 90 nm and 65 nm technologies, and they show better performance than previous work.
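For readers unfamiliar with the GF(2¹²⁸) multiplier that performs authentication in AES-GCM, the following is a bit-serial software sketch of the field multiplication and the GHASH chaining as specified in NIST SP 800-38D. It is a reference model only, not the efficient hardware method the thesis proposes; the names `gf128_mul` and `ghash` are chosen here for illustration. Blocks are held as 128-bit integers with the leftmost bit of the block as the most significant bit.

```python
R = 0xE1 << 120  # reduction constant 11100001 || 0^120 from SP 800-38D


def gf128_mul(x: int, y: int) -> int:
    """Multiply two 128-bit blocks in the GCM field (bit-serial algorithm)."""
    z = 0
    v = y
    for i in range(128):
        # Bit i of x, counting from the leftmost (most significant) bit.
        if (x >> (127 - i)) & 1:
            z ^= v
        # Multiply v by the field element "x" and reduce if a bit falls off.
        if v & 1:
            v = (v >> 1) ^ R
        else:
            v >>= 1
    return z


def ghash(h: int, blocks: list) -> int:
    """Chain field multiplications: Y_i = (Y_{i-1} XOR X_i) * H."""
    y = 0
    for block in blocks:
        y = gf128_mul(y ^ block, h)
    return y


if __name__ == "__main__":
    ONE = 1 << 127  # multiplicative identity in GCM's bit ordering
    a = 0x66E94BD4EF8A2C3B884CFA59CA342B2E
    print(gf128_mul(a, ONE) == a)
```

The loop makes the cost visible: 128 conditional XORs and shifts per block. A parallel hardware multiplier unrolls this into combinational logic, which is why its area and delay matter for overall throughput per slice.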

    Optimized hardware implementations of cryptography algorithms for resource-constraint IoT devices and high-speed applications

    The advent of technologies such as the Internet and smartphones has made people's lives easier. Nowadays, people are used to digital applications for e-business, for communicating with others, and for sending or receiving sensitive messages. Sending data securely across a private network or the Internet is an open concern for everyone. Cryptography plays an important role in protecting privacy, security, and confidentiality against adversaries. Public-key cryptography (PKC) is one of the cryptographic techniques that provide security over a large network, such as the Internet of Things (IoT). Classical PKC schemes, such as Elliptic Curve Cryptography (ECC) and Rivest-Shamir-Adleman (RSA), are based on the hardness of certain number-theoretic problems. With Shor's algorithm, these problems can be solved very efficiently on a quantum computer, so these schemes will become insecure as quantum computers become more widely available. According to NIST, lattice-based cryptography (LBC) is one of the accepted approaches to quantum-resistant public-key cryptography. Variants of LBC include Learning With Errors (LWE), Ring Learning With Errors (Ring-LWE), and Binary Ring Learning With Errors (Binary Ring-LWE), among others. AES is also a secure cryptographic algorithm that has been widely used in different applications and platforms, and AES-256 is considered secure against quantum attacks. It is very important to design a crypto-system according to the needs of the application. In general, each network has three layers: cloud, edge, and end-node. The cloud and edge layers require a high-speed crypto-system, as it is used in high-traffic applications to encrypt and decrypt data. Unfortunately, most end-node devices are resource-constrained and do not have enough area for security mechanisms. Providing end-to-end security is vital for every network. To address this issue, it is necessary to design and implement lightweight crypto-systems for resource-constrained devices. In this thesis, a high-throughput FPGA implementation of the AES algorithm for high-traffic edge applications is introduced. To achieve this goal, part of the algorithm has been modified to balance the latency, and inner and outer pipelining techniques and loop unrolling have been employed. The proposed high-speed implementation of AES achieves a throughput of 79.7 Gbps, an FPGA efficiency of 13.3 Mbps/slice, and a frequency of 622.4 MHz. Compared to the state-of-the-art work, the proposed design improves data throughput by 8.02% and FPGA efficiency by 22.63%. Moreover, a lightweight architecture of AES for resource-constrained devices is designed and implemented on FPGA and ASIC. Each module of the architecture is designed to occupy less area, and some units are shared among different phases. To reduce power consumption, clock gating is applied. Application-specific integrated circuit (ASIC) implementation results show area improvements over previous similar works ranging from 2.4% to 35%. Based on the results and the NIST report, the proposed design is a suitable crypto-system for tiny devices and can be powered by low-power supplies. Furthermore, two lightweight crypto-systems based on Binary Ring-LWE are presented for IoT end-node devices. For one of them, a novel column-based multiplication is introduced, which uses only one register to store the intermediate results. The multiplication unit of the other Binary Ring-LWE design is optimized so that the multiplication is executed in fewer clock cycles. Moreover, to increase security for end-node devices, a fault-resilient architecture has been designed and applied to the Binary Ring-LWE architecture. Based on the implementation results and the NIST report, the proposed Binary Ring-LWE designs are suitable crypto-systems for resource-constrained devices.
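The reported AES figures can be cross-checked with simple arithmetic. The sketch below assumes, as is typical for a fully unrolled and pipelined AES core but is not stated in the abstract, that one 128-bit block completes per clock cycle. The slice count it derives is only an implied estimate from the two reported numbers, not a figure from the thesis, and the function names are hypothetical.

```python
def throughput_gbps(clock_mhz: float, block_bits: int = 128,
                    cycles_per_block: int = 1) -> float:
    """Throughput in Gbps for a core that finishes one block every
    `cycles_per_block` cycles (assumed 1 for a fully pipelined design)."""
    return block_bits * clock_mhz / cycles_per_block / 1000.0


def implied_slices(tput_gbps: float, efficiency_mbps_per_slice: float) -> float:
    """Slice count implied by throughput and throughput-per-slice efficiency."""
    return tput_gbps * 1000.0 / efficiency_mbps_per_slice


if __name__ == "__main__":
    t = throughput_gbps(622.4)             # about 79.67 Gbps
    print(round(t, 1))                     # rounds to the reported 79.7
    print(round(implied_slices(79.7, 13.3)))  # roughly 6000 slices (estimate)
```

Under this assumption, 128 bits × 622.4 MHz gives about 79.67 Gbps, which agrees with the reported 79.7 Gbps and suggests the design does sustain one block per cycle.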