50 research outputs found

    High Speed and Low Latency ECC Implementation over GF(2m) on FPGA

    Get PDF
    In this paper, a novel high-speed elliptic curve cryptography (ECC) processor implementation for point multiplication (PM) on field-programmable gate array (FPGA) is proposed. A new segmented pipelined full-precision multiplier is used to reduce the latency, and the Lopez-Dahab Montgomery PM algorithm is modified for careful scheduling to avoid data dependency resulting in a drastic reduction in the number of clock cycles (CCs) required. The proposed ECC architecture has been implemented on Xilinx FPGAs' Virtex4, Virtex5, and Virtex7 families. To the best of our knowledge, our single- and three-multiplier-based designs show the fastest performance to date when compared with reported works individually. Our one-multiplier-based ECC processor also achieves the highest reported speed together with the best reported area-time performance on Virtex4 (5.32 μs at 210 MHz), on Virtex5 (4.91 μs at 228 MHz), and on the more advanced Virtex7 (3.18 μs at 352 MHz). Finally, the proposed three-multiplier-based ECC implementation is the first work reporting the lowest number of CCs and the fastest ECC processor design on FPGA (450 CCs to get 2.83 μs on Virtex7)

    Multiplication in Finite Fields and Elliptic Curves

    Get PDF
    La cryptographie à clef publique permet de s'échanger des clefs de façon distante, d'effectuer des signatures électroniques, de s'authentifier à distance, etc. Dans cette thèse d'HDR nous allons présenter quelques contributions concernant l'implantation sûre et efficace de protocoles cryptographiques basés sur les courbes elliptiques. L'opération de base effectuée dans ces protocoles est la multiplication scalaire d'un point de la courbe. Chaque multiplication scalaire nécessite plusieurs milliers d'opérations dans un corps fini.Dans la première partie du manuscrit nous nous intéressons à la multiplication dans les corps finis car c'est l'opération la plus coûteuse et la plus utilisée. Nous présentons d'abord des contributions sur les multiplieurs parallèles dans les corps binaires. Un premier résultat concerne l'approche sous-quadratique dans une base normale optimale de type 2. Plus précisément, nous améliorons un multiplieur basé sur un produit de matrice de Toeplitz avec un vecteur en utilisant une recombinaison des blocs qui supprime certains calculs redondants. Nous présentons aussi un multiplieur pous les corps binaires basé sur une extension d'une optimisation de la multiplication polynomiale de Karatsuba.Ensuite nous présentons des résultats concernant la multiplication dans un corps premier. Nous présentons en particulier une approche de type Montgomery pour la multiplication dans une base adaptée à l'arithmétique modulaire. Cette approche cible la multiplication modulo un premier aléatoire. Nous présentons alors une méthode pour la multiplication dans des corps utilisés dans la cryptographie sur les couplages : les extensions de petits degrés d'un corps premier aléatoire. Cette méthode utilise une base adaptée engendrée par une racine de l'unité facilitant la multiplication polynomiale basée sur la FFT. Dans la dernière partie de cette thèse d'HDR nous nous intéressons à des résultats qui concernent la multiplication scalaire sur les courbes elliptiques. Nous présentons une parallélisation de l'échelle binaire de Montgomery dans le cas de E(GF(2^n)). Nous survolons aussi quelques contributions sur des formules de division par 3 dans E(GF(3^n)) et une parallélisation de type (third,triple)-and-add. Dans le dernier chapitre nous développons quelques directions de recherches futures. Nous discutons d'abord de possibles extensions des travaux faits sur les corps binaires. Nous présentons aussi des axes de recherche liés à la randomisation de l'arithmétique qui permet une protection contre les attaques matérielles

    Subquadratic Space Complexity Binary Field Multiplier Using Double Polynomial Representation

    Full text link

    Parallel Multiplier Designs for the Galois/Counter Mode of Operation

    Get PDF
    The Galois/Counter Mode of Operation (GCM), recently standardized by NIST, simultaneously authenticates and encrypts data at speeds not previously possible for both software and hardware implementations. In GCM, data integrity is achieved by chaining Galois field multiplication operations while a symmetric key block cipher such as the Advanced Encryption Standard (AES), is used to meet goals of confidentiality. Area optimization in a number of proposed high throughput GCM designs have been approached through implementing efficient composite Sboxes for AES. Not as much work has been done in reducing area requirements of the Galois multiplication operation in the GCM which consists of up to 30% of the overall area using a bruteforce approach. Current pipelined implementations of GCM also have large key change latencies which potentially reduce the average throughput expected under traditional internet traffic conditions. This thesis aims to address these issues by presenting area efficient parallel multiplier designs for the GCM and provide an approach for achieving low latency key changes. The widely known Karatsuba parallel multiplier (KA) and the recently proposed Fan-Hasan multiplier (FH) were designed for the GCM and implemented on ASIC and FPGA architectures. This is the first time these multipliers have been compared with a practical implementation, and the FH multiplier showed note worthy improvements over the KA multiplier in terms of delay with similar area requirements. Using the composite Sbox, ASIC designs of GCM implemented with subquadratic multipliers are shown to have an area savings of up to 18%, without affecting the throughput, against designs using the brute force Mastrovito multiplier. For low delay LUT Sbox designs in GCM, although the subquadratic multipliers are a part of the critical path, implementations with the FH multiplier showed the highest efficiency in terms of area resources and throughput over all other designs. FPGA results similarly showed a significant reduction in the number of slices using subquadratic multipliers, and the highest throughput to date for FPGA implementations of GCM was also achieved. The proposed reduced latency key change design, which supports all key types of AES, showed a 20% improvement in average throughput over other GCM designs that do not use the same techniques. The GCM implementations provided in this thesis provide some of the most area efficient, yet high throughput designs to date

    Modern Computer Arithmetic (version 0.5.1)

    Full text link
    This is a draft of a book about algorithms for performing arithmetic, and their implementation on modern computers. We are concerned with software more than hardware - we do not cover computer architecture or the design of computer hardware. Instead we focus on algorithms for efficiently performing arithmetic operations such as addition, multiplication and division, and their connections to topics such as modular arithmetic, greatest common divisors, the Fast Fourier Transform (FFT), and the computation of elementary and special functions. The algorithms that we present are mainly intended for arbitrary-precision arithmetic. They are not limited by the computer word size, only by the memory and time available for the computation. We consider both integer and real (floating-point) computations. The book is divided into four main chapters, plus an appendix. Our aim is to present the latest developments in a concise manner. At the same time, we provide a self-contained introduction for the reader who is not an expert in the field, and exercises at the end of each chapter. Chapter titles are: 1, Integer Arithmetic; 2, Modular Arithmetic and the FFT; 3, Floating-Point Arithmetic; 4, Elementary and Special Function Evaluation; 5 (Appendix), Implementations and Pointers. The book also contains a bibliography of 236 entries, index, summary of notation, and summary of complexities.Comment: Preliminary version of a book to be published by Cambridge University Press. xvi+247 pages. Cite as "Modern Computer Arithmetic, Version 0.5.1, 5 March 2010". For further details, updates and errata see http://wwwmaths.anu.edu.au/~brent/pub/pub226.html or http://www.loria.fr/~zimmerma/mca/pub226.htm

    Throughput/Area-efficient ECC Processor Using Montgomery Point Multiplication on FPGA

    Get PDF
    High throughput while maintaining low resource is a key issue for elliptic curve cryptography (ECC) hardware implementations in many applications. In this brief, an ECC processor architecture over Galois fields is presented, which achieves the best reported throughput/area performance on field-programmable gate array (FPGA) to date. A novel segmented pipelining digit serial multiplier is developed to speed up ECC point multiplication. To achieve low latency, a new combined algorithm is developed for point addition and point doubling with careful scheduling. A compact and flexible distributed-RAM-based memory unit design is developed to increase speed while keeping area low. Further optimizations were made via timing constraints and logic level modifications at the implementation level. The proposed architecture is implemented on Virtex4 (V4), Virtex5 (V5), and Virtex7 (V7) FPGA technologies and, respectively, achieved throughout/slice figures of 19.65, 65.30, and 64.48 (106/(Seconds × Slices))

    Efficient Design and implementation of Elliptic Curve Cryptography on FPGA

    Get PDF

    Digit-Level Serial-In Parallel-Out Multiplier Using Redundant Representation for a Class of Finite Fields

    Get PDF
    Two digit-level finite field multipliers in F2m using redundant representation are presented. Embedding F2m in cyclotomic field F2(n) causes a certain amount of redundancy and consequently performing field multiplication using redundant representation would require more hardware resources. Based on a specific feature of redundant representation in a class of finite fields, two new multiplication algorithms along with their pertaining architectures are proposed to alleviate this problem. Considering area-delay product as a measure of evaluation, it has been shown that both the proposed architectures considerably outperform existing digit-level multipliers using the same basis. It is also shown that for a subset of the fields, the proposed multipliers are of higher performance in terms of area-delay complexities among several recently proposed optimal normal basis multipliers. The main characteristics of the postplace&route application specific integrated circuit implementation of the proposed multipliers for three practical digit sizes are also reported
    corecore