26 research outputs found

    Introduction to the Journal of Cryptographic Engineering

    Full text link

    Efficient subgroup exponentiation in quadratic and sixth degree extensions

    No full text
    This paper describes several speedups for computation in the order p + 1 subgroup of F_{p^2}^* and the order p^2 - p + 1 subgroup of F_{p^6}^*. These results are in a way complementary to LUC and XTR, where computations in these groups are sped up using trace maps. As a side result, we present an efficient method for XTR with p = 3 mod 4.

    Time efficient dual-field unit for cryptography-related processing

    No full text
    This work deals with computer arithmetic. Specifically, it focuses on modular multiplication, a central operation in several areas such as error control codes and cryptographic computing. The performance of computationally demanding public-key cryptographic algorithms, such as Rivest-Shamir-Adleman (RSA) and Elliptic Curve (EC) cryptosystems, depends heavily on modular multiplication. Modular multiplication used in cryptography may be performed in two different algebraic structures, namely GF(N) and GF(2^n), which normally require distinct hardware solutions for speeding up performance. Montgomery multiplication is the most widely adopted solution, as it enables efficient hardware implementations, provided that a slightly modified definition of modular multiplication is adopted. In the paper, we propose a novel unified architecture for parallel Montgomery multiplication supporting both GF(N) and GF(2^n) finite field operations, which are critical for RSA and ECC public-key cryptosystems. The hardware scheme interleaves multiplication and modulo reduction. Furthermore, it relies on a modified Booth recoding scheme for the multiplicand and a radix-4 scheme for the modulus, enabling reduced time delays even for moderately large operand widths. In addition, we present a pipelined architecture based on the parallel blocks previously introduced, enabling very low clock counts and high throughput levels for the long operands used in cryptographic applications. Experimental results, based on 0.18 µm CMOS technology, demonstrate the effectiveness of the proposed techniques, which outperform the best results previously presented in the technical literature.
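The "slightly modified definition of modular multiplication" mentioned in this abstract can be illustrated in software for the GF(N) case. The following is a minimal sketch of the textbook Montgomery product, not the paper's hardware architecture; the function names `montgomery_setup` and `mont_mul`, and the choice R = 2^k, are assumptions made for illustration only.

```python
def montgomery_setup(n: int, k: int):
    """Precompute constants for an odd modulus n and R = 2**k > n.

    Illustrative helper (hypothetical name); requires Python 3.8+ for
    the modular inverse via pow(n, -1, r).
    """
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime == -1 (mod R)
    return r, n_prime


def mont_mul(a: int, b: int, n: int, k: int, n_prime: int) -> int:
    """Return a * b * R^(-1) mod n for 0 <= a, b < n, with R = 2**k."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k               # exact division by R, a plain shift
    return u - n if u >= n else u      # single conditional subtraction
```

Operands are first mapped to the Montgomery domain as x * R mod n; multiplying a Montgomery-form value by 1 with `mont_mul` maps it back. The reduction uses only masks and shifts by k bits, which is the property that makes the method attractive for hardware.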

    Modular Number Systems: Beyond the Mersenne Family

    No full text
    To appear in: SAC'04: 11th Workshop on Selected Areas in Cryptography

    A black hen lays white eggs. Bipartite multiplier out of Montgomery one for on-line RSA verification

    No full text
    This paper proposes novel algorithms for computing double-size modular multiplications with few modulus-dependent precomputations. Low-end devices such as smartcards are usually equipped with hardware Montgomery multipliers. However, due to progress in mathematical attacks, security institutions such as NIST have steadily demanded longer bit-lengths for public-key cryptography, making the multipliers quickly obsolete. In an attempt to extend the lifespan of such multipliers, double-size techniques compute modular multiplications with twice the bit-length of the multipliers. Techniques are known for extending the bit-length of classical Euclidean multipliers, of Montgomery multipliers and the combination thereof, namely bipartite multipliers. However, unlike classical and bipartite multiplications, Montgomery multiplications involve modulus-dependent precomputations, which amount to a large part of an RSA encryption or signature verification. The proposed double-size technique simulates double-size multiplications based on single-size Montgomery multipliers, and yet precomputations are essentially free: in a 2048-bit RSA encryption or signature verification with public exponent e = 2^16 + 1, the proposal with a 1024-bit Montgomery multiplier is 1.4 times faster than the best previous technique.

    An exploration of mechanisms for dynamic cryptographic instruction set extension

    No full text
    Abstract. Instruction Set Extensions (ISEs) supplement a host processor with special-purpose, typically fixed-function hardware components and instructions to utilize them. For cryptographic use-cases, this can be very effective due to the demand for non-standard or niche operations that are not supported by general-purpose architectures. However, one disadvantage of fixed-function ISEs is inflexibility, contradicting a need for “algorithm agility.” This paper explores a new approach, namely the provision of re-configurable mechanisms to support dynamic (run-time changeable) ISEs. Our results, obtained using an FPGA-based LEON3 prototype, show that this approach provides a flexible general-purpose platform for cryptographic ISEs with all known advantages of previous work, but relies on careful analysis of the associated security issues. Key words: FPGA, embedded processor, instruction set extension.

    A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography Over Binary Finite Fields GF(2^m)

    No full text
    Abstract. Mobile and wireless devices like cell phones and network-enhanced PDAs have become increasingly popular in recent years. The security of data transmitted via these devices is a topic of growing importance and methods of public-key cryptography are able to satisfy this need. Elliptic curve cryptography (ECC) is especially attractive for devices which have restrictions in terms of computing power and energy supply. The efficiency of ECC implementations is highly dependent on the performance of arithmetic operations in the underlying finite field. This work presents a simple architectural enhancement to a general-purpose processor core which facilitates arithmetic operations in binary finite fields GF(2^m). A custom instruction for a multiply step for binary polynomials has been integrated into a SPARC V8 core, which subsequently served to compare the merits of the enhancement for two different ECC implementations. One was tailored to the use of GF(2^191) with a fixed reduction polynomial. The tailored implementation was sped up by 90% and its code size was reduced. The second implementation worked for arbitrary binary fields with a range of reduction polynomials. The flexible implementation was accelerated by a factor of nearly 10.
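To make the binary-field arithmetic in this abstract concrete, here is a minimal software sketch of multiplication in GF(2^m): a carry-less (XOR-based) product of binary polynomials followed by reduction modulo a reduction polynomial. It is not the paper's custom instruction, and the function names `clmul`, `poly_reduce` and `gf2m_mul` are assumptions for illustration. For a result that is easy to check, the usage note uses the small field GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 rather than the GF(2^191) field from the abstract.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two binary polynomials packed into ints.

    Bit i of an operand is the coefficient of x^i; additions are XORs,
    so no carries propagate between bit positions.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the current shifted copy of a
        a <<= 1          # multiply a by x
        b >>= 1
    return result


def poly_reduce(x: int, modulus: int) -> int:
    """Reduce polynomial x modulo the given reduction polynomial."""
    deg = modulus.bit_length() - 1
    while x.bit_length() > deg:
        # Cancel the leading term of x with an aligned copy of the modulus.
        x ^= modulus << (x.bit_length() - 1 - deg)
    return x


def gf2m_mul(a: int, b: int, modulus: int) -> int:
    """Multiply two field elements of GF(2^m) defined by `modulus`."""
    return poly_reduce(clmul(a, b), modulus)
```

With the AES polynomial encoded as 0x11B, `gf2m_mul(0x57, 0x83, 0x11B)` gives 0xC1, matching the worked example in FIPS 197. Because the reduction polynomial is passed as a parameter, the same routine covers the "arbitrary binary fields" case, while a field-specific implementation could hard-code the reduction step.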