7 research outputs found

    Design and analysis of an FPGA-based, multi-processor HW-SW system for SCC applications

    Get PDF
    The last 30 years have seen an increase in the complexity of embedded systems from a collection of simple circuits to systems consisting of multiple processors managing a wide variety of devices. This ever increasing complexity frequently requires that high assurance, fail-safe and secure design techniques be applied to protect against possible failures and breaches. To facilitate the implementation of these embedded systems in an efficient way, the FPGA industry recently created new families of devices. New features added to these devices include anti-tamper monitoring, bit stream encryption, and optimized routing architectures for physical and functional logic partition isolation. These devices have high capacities and are capable of implementing processors using their reprogrammable logic structures. This allows for an unprecedented level of hardware and software interaction within a single FPGA chip. High assurance and fail-safe systems can now be implemented within the reconfigurable hardware fabric of an FPGA, enabling these systems to maintain flexibility and achieve high performance while providing a high level of data security. The objective of this thesis was to design and analyze an FPGA-based system containing two isolated, softcore Nios processors that share data through two crypto-engines. FPGA-based single-chip cryptographic (SCC) techniques were employed to ensure proper component isolation when the design is placed on a device supporting the appropriate security primitives. Each crypto-engine is an implementation of the Advanced Encryption Standard (AES), operating in Galois/Counter Mode (GCM) for both encryption and authentication. The features of the microprocessors and architectures of the AES crypto-engines were varied with the goal of determining combinations which best target high performance, minimal hardware usage, or a combination of the two

    Parallel Multiplier Designs for the Galois/Counter Mode of Operation

    Get PDF
    The Galois/Counter Mode of Operation (GCM), recently standardized by NIST, simultaneously authenticates and encrypts data at speeds not previously possible for both software and hardware implementations. In GCM, data integrity is achieved by chaining Galois field multiplication operations while a symmetric key block cipher such as the Advanced Encryption Standard (AES), is used to meet goals of confidentiality. Area optimization in a number of proposed high throughput GCM designs have been approached through implementing efficient composite Sboxes for AES. Not as much work has been done in reducing area requirements of the Galois multiplication operation in the GCM which consists of up to 30% of the overall area using a bruteforce approach. Current pipelined implementations of GCM also have large key change latencies which potentially reduce the average throughput expected under traditional internet traffic conditions. This thesis aims to address these issues by presenting area efficient parallel multiplier designs for the GCM and provide an approach for achieving low latency key changes. The widely known Karatsuba parallel multiplier (KA) and the recently proposed Fan-Hasan multiplier (FH) were designed for the GCM and implemented on ASIC and FPGA architectures. This is the first time these multipliers have been compared with a practical implementation, and the FH multiplier showed note worthy improvements over the KA multiplier in terms of delay with similar area requirements. Using the composite Sbox, ASIC designs of GCM implemented with subquadratic multipliers are shown to have an area savings of up to 18%, without affecting the throughput, against designs using the brute force Mastrovito multiplier. For low delay LUT Sbox designs in GCM, although the subquadratic multipliers are a part of the critical path, implementations with the FH multiplier showed the highest efficiency in terms of area resources and throughput over all other designs. FPGA results similarly showed a significant reduction in the number of slices using subquadratic multipliers, and the highest throughput to date for FPGA implementations of GCM was also achieved. The proposed reduced latency key change design, which supports all key types of AES, showed a 20% improvement in average throughput over other GCM designs that do not use the same techniques. The GCM implementations provided in this thesis provide some of the most area efficient, yet high throughput designs to date

    Multiplication in Finite Fields and Elliptic Curves

    Get PDF
    La cryptographie à clef publique permet de s'échanger des clefs de façon distante, d'effectuer des signatures électroniques, de s'authentifier à distance, etc. Dans cette thèse d'HDR nous allons présenter quelques contributions concernant l'implantation sûre et efficace de protocoles cryptographiques basés sur les courbes elliptiques. L'opération de base effectuée dans ces protocoles est la multiplication scalaire d'un point de la courbe. Chaque multiplication scalaire nécessite plusieurs milliers d'opérations dans un corps fini.Dans la première partie du manuscrit nous nous intéressons à la multiplication dans les corps finis car c'est l'opération la plus coûteuse et la plus utilisée. Nous présentons d'abord des contributions sur les multiplieurs parallèles dans les corps binaires. Un premier résultat concerne l'approche sous-quadratique dans une base normale optimale de type 2. Plus précisément, nous améliorons un multiplieur basé sur un produit de matrice de Toeplitz avec un vecteur en utilisant une recombinaison des blocs qui supprime certains calculs redondants. Nous présentons aussi un multiplieur pous les corps binaires basé sur une extension d'une optimisation de la multiplication polynomiale de Karatsuba.Ensuite nous présentons des résultats concernant la multiplication dans un corps premier. Nous présentons en particulier une approche de type Montgomery pour la multiplication dans une base adaptée à l'arithmétique modulaire. Cette approche cible la multiplication modulo un premier aléatoire. Nous présentons alors une méthode pour la multiplication dans des corps utilisés dans la cryptographie sur les couplages : les extensions de petits degrés d'un corps premier aléatoire. Cette méthode utilise une base adaptée engendrée par une racine de l'unité facilitant la multiplication polynomiale basée sur la FFT. Dans la dernière partie de cette thèse d'HDR nous nous intéressons à des résultats qui concernent la multiplication scalaire sur les courbes elliptiques. Nous présentons une parallélisation de l'échelle binaire de Montgomery dans le cas de E(GF(2^n)). Nous survolons aussi quelques contributions sur des formules de division par 3 dans E(GF(3^n)) et une parallélisation de type (third,triple)-and-add. Dans le dernier chapitre nous développons quelques directions de recherches futures. Nous discutons d'abord de possibles extensions des travaux faits sur les corps binaires. Nous présentons aussi des axes de recherche liés à la randomisation de l'arithmétique qui permet une protection contre les attaques matérielles

    Near Data Processing for Efficient and Trusted Systems

    Full text link
    We live in a world which constantly produces data at a rate which only increases with time. Conventional processor architectures fail to process this abundant data in an efficient manner as they expend significant energy in instruction processing and moving data over deep memory hierarchies. Furthermore, to process large amounts of data in a cost effective manner, there is increased demand for remote computation. While cloud service providers have come up with innovative solutions to cater to this increased demand, the security concerns users feel for their data remains a strong impediment to their wide scale adoption. An exciting technique in our repertoire to deal with these challenges is near-data processing. Near-data processing (NDP) is a data-centric paradigm which moves computation to where data resides. This dissertation exploits NDP to both process the data deluge we face efficiently and design low-overhead secure hardware designs. To this end, we first propose Compute Caches, a novel NDP technique. Simple augmentations to underlying SRAM design enable caches to perform commonly used operations. In-place computation in caches not only avoids excessive data movement over memory hierarchy, but also significantly reduces instruction processing energy as independent sub-units inside caches perform computation in parallel. Compute Caches significantly improve the performance and reduce energy expended for a suite of data intensive applications. Second, this dissertation identifies security advantages of NDP. While memory bus side channel has received much attention, a low-overhead hardware design which defends against it remains elusive. We observe that smart memory, memory with compute capability, can dramatically simplify this problem. To exploit this observation, we propose InvisiMem which uses the logic layer in the smart memory to implement cryptographic primitives, which aid in addressing memory bus side channel efficiently. Our solutions obviate the need for expensive constructs like Oblivious RAM (ORAM) and Merkle trees, and have one to two orders of magnitude lower overheads for performance, space, energy, and memory bandwidth, compared to prior solutions. This dissertation also addresses a related vulnerability of page fault side channel in which the Operating System (OS) induces page faults to learn application's address trace and deduces application secrets from it. To tackle it, we propose Sanctuary which obfuscates page fault channel while allowing the OS to manage memory as a resource. To do so, we design a novel construct, Oblivious Page Management (OPAM) which is derived from ORAM but is customized for page management context. We employ near-memory page moves to reduce OPAM overhead and also propose a novel memory partition to reduce OPAM transactions required. For a suite of cloud applications which process sensitive data we show that page fault channel can be tackled at reasonable overheads.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/144139/1/shaizeen_1.pd

    On Software Implementation of High Performance GHASH Algorithms

    Get PDF
    There have been several modes of operations available for symmetric key block ciphers, among which Galois Counter Mode (GCM) of operation is a standard. GCM mode of operation provides confidentiality with the help of symmetric key block cipher operating in counter mode. The authentication component of GCM comprises of Galois hash (GHASH) computation which is a keyed hash function. The most important component of GHASH computation is carry-less multiplication of 128-bit operands which is followed by a modulo reduction. There have been a number of schemes proposed for efficient software implementation of carry-less multiplication to improve performance of GHASH by increasing the speed of multiplications. This thesis focuses on providing an efficient way of software implementation of high performance GHASH function as being proposed by Meloni et al., and also on the implementation of GHASH using a carry-less multiplication instruction provided by Intel on their Westmere architecture. The thesis work includes implementation of the high performance GHASH and its comparison to the older or standard implementation of GHASH function. It also includes comparison of the two implementations using Intel’s carry-less multiplication instruction. This is the first time that this kind of comparison is being done on software implementations of these algorithms. Our software implementations suggest that the new GHASH algorithm, which was originally proposed for the hardware implementations due to the required parallelization, can't take advantage of the Intel carry-less multiplication instruction PCLMULQDQ. On the other hand, when implementations are done without using the PCLMULQDQ instruction the new algorithm performs better, even if its inherent parallelization is not utilized. This suggest that the new algorithm will perform better on embedded systems that do not support PCLMULQDQ

    Reliable and High-Performance Hardware Architectures for the Advanced Encryption Standard/Galois Counter Mode

    Get PDF
    The high level of security and the fast hardware and software implementations of the Advanced Encryption Standard (AES) have made it the first choice for many critical applications. Since its acceptance as the adopted symmetric-key algorithm, the AES has been utilized in various security-constrained applications, many of which are power and resource constrained and require reliable and efficient hardware implementations. In this thesis, first, we investigate the AES algorithm from the concurrent fault detection point of view. We note that in addition to the efficiency requirements of the AES, it must be reliable against transient and permanent internal faults or malicious faults aiming at revealing the secret key. This reliability analysis and proposing efficient and effective fault detection schemes are essential because fault attacks have become a serious concern in cryptographic applications. Therefore, we propose, design, and implement various novel concurrent fault detection schemes for different AES hardware architectures. These include different structure-dependent and independent approaches for detecting single and multiple stuck-at faults using single and multi-bit signatures. The recently standardized authentication mode of the AES, i.e., Galois/Counter Mode (GCM), is also considered in this thesis. We propose efficient architectures for the AES-GCM algorithm. In this regard, we investigate the AES algorithm and we propose low-complexity and low-power hardware implementations for it, emphasizing on its nonlinear transformation, i.e., SubByes (S-boxes). We present new formulations for this transformation and through exhaustive hardware implementations, we show that the proposed architectures outperform their counterparts in terms of efficiency. Moreover, we present parallel, high-performance new schemes for the hardware implementations of the GCM to improve its throughput and reduce its latency. The performance of the proposed efficient architectures for the AES-GCM and their fault detection approaches are benchmarked using application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) hardware platforms. Our comparison results show that the proposed hardware architectures outperform their existing counterparts in terms of efficiency and fault detection capability
    corecore