18 research outputs found

    Low Complexity Bit-Parallel Square Root Computation over GF(2m2^m) for all Trinomials

    Get PDF
    In this contribution we introduce a low-complexity bit-parallel algorithm for computing square roots over binary extension fields. Our proposed method can be applied for any type of irreducible polynomials. We derive explicit formulae for the space and time complexities associated to the square root operator when working with binary extension fields generated using irreducible trinomials. We show that for those finite fields, it is possible to compute the square root of an arbitrary field element with equal or better hardware efficiency than the one associated to the field squaring operation. Furthermore, a practical application of the square root operator in the domain of field exponentiation computation is presented. It is shown that by using as building blocks squarers, multipliers and square root blocks, a parallel version of the classical square-and-multiply exponentiation algorithm can be obtained. A hardware implementation of that parallel version may provide a speedup of up to 50\% percent when compared with the traditional version

    Quantum resource estimates for computing elliptic curve discrete logarithms

    Get PDF
    We give precise quantum resource estimates for Shor's algorithm to compute discrete logarithms on elliptic curves over prime fields. The estimates are derived from a simulation of a Toffoli gate network for controlled elliptic curve point addition, implemented within the framework of the quantum computing software tool suite LIQUiUi|\rangle. We determine circuit implementations for reversible modular arithmetic, including modular addition, multiplication and inversion, as well as reversible elliptic curve point addition. We conclude that elliptic curve discrete logarithms on an elliptic curve defined over an nn-bit prime field can be computed on a quantum computer with at most 9n+2log2(n)+109n + 2\lceil\log_2(n)\rceil+10 qubits using a quantum circuit of at most 448n3log2(n)+4090n3448 n^3 \log_2(n) + 4090 n^3 Toffoli gates. We are able to classically simulate the Toffoli networks corresponding to the controlled elliptic curve point addition as the core piece of Shor's algorithm for the NIST standard curves P-192, P-224, P-256, P-384 and P-521. Our approach allows gate-level comparisons to recent resource estimates for Shor's factoring algorithm. The results also support estimates given earlier by Proos and Zalka and indicate that, for current parameters at comparable classical security levels, the number of qubits required to tackle elliptic curves is less than for attacking RSA, suggesting that indeed ECC is an easier target than RSA.Comment: 24 pages, 2 tables, 11 figures. v2: typos fixed and reference added. ASIACRYPT 201

    Cryptography for Ultra-Low Power Devices

    Get PDF
    Ubiquitous computing describes the notion that computing devices will be everywhere: clothing, walls and floors of buildings, cars, forests, deserts, etc. Ubiquitous computing is becoming a reality: RFIDs are currently being introduced into the supply chain. Wireless distributed sensor networks (WSN) are already being used to monitor wildlife and to track military targets. Many more applications are being envisioned. For most of these applications some level of security is of utmost importance. Common to WSN and RFIDs are their severely limited power resources, which classify them as ultra-low power devices. Early sensor nodes used simple 8-bit microprocessors to implement basic communication, sensing and computing services. Security was an afterthought. The main power consumer is the RF-transceiver, or radio for short. In the past years specialized hardware for low-data rate and low-power radios has been developed. The new bottleneck are security services which employ computationally intensive cryptographic operations. Customized hardware implementations hold the promise of enabling security for severely power constrained devices. Most research groups are concerned with developing secure wireless communication protocols, others with designing efficient software implementations of cryptographic algorithms. There has not been a comprehensive study on hardware implementations of cryptographic algorithms tailored for ultra-low power applications. The goal of this dissertation is to develop a suite of cryptographic functions for authentication, encryption and integrity that is specifically fashioned to the needs of ultra-low power devices. This dissertation gives an introduction to the specific problems that security engineers face when they try to solve the seemingly contradictory challenge of providing lightweight cryptographic services that can perform on ultra-low power devices and shows an overview of our current work and its future direction

    Efficient Square-based Montgomery Multiplier for All Type C.1 Pentanomials

    Get PDF
    In this paper, we present a low complexity bit-parallel Montgomery multiplier for GF(2m)GF(2^m) generated with a special class of irreducible pentanomials xm+xm1+xk+x+1x^m+x^{m-1}+x^k+x+1. Based on a combination of generalized polynomial basis (GPB) squarer and a newly proposed square-based divide and conquer approach, we can partition field multiplications into a composition of sub-polynomial multiplications and Montgomery/GPB squarings, which have simpler architecture and thus can be implemented efficiently. Consequently, the proposed multiplier roughly saves 1/4 logic gates compared with the fastest multipliers, while the time complexity matches previous multipliers using divide and conquer algorithms

    Efficient Implementation of Elliptic Curve Cryptography on FPGAs

    Get PDF
    This work presents the design strategies of an FPGA-based elliptic curve co-processor. Elliptic curve cryptography is an important topic in cryptography due to its relatively short key length and higher efficiency as compared to other well-known public key crypto-systems like RSA. The most important contributions of this work are: - Analyzing how different representations of finite fields and points on elliptic curves effect the performance of an elliptic curve co-processor and implementing a high performance co-processor. - Proposing a novel dynamic programming approach to find the optimum combination of different recursive polynomial multiplication methods. Here optimum means the method which has the smallest number of bit operations. - Designing a new normal-basis multiplier which is based on polynomial multipliers. The most important part of this multiplier is a circuit of size O(nlogn)O(n \log n) for changing the representation between polynomial and normal basis

    Squared Law Algorithms: Theory and Applications.

    Get PDF
    This dissertation focuses on a new approach for a hardware implementation of the cyclic convolution operation. The cyclic convolution operation is the core of several functions used in applications related to digital signal processing and error control. Since the operation is multiplication intensive and the cost of a multiplication operation is very high, most of the present research effort attempts to reduce the number of multiplications. Our approach, however, aims at obtaining an efficient implementation by relying on the properties of the special case of multiplication, namely, the squaring operation. Due to the properties exhibited by the squaring operation the hardware cost and time delay of a squarer unit is both cheaper and faster than that of a multiplication unit. This is true for both memory and non-memory based implementations. In this dissertation we have developed all the necessary theory required to express the cyclic convolution of two n-point sequences, where n is a power of 2, in terms of the elementary arithmetic operations add, square, and subtract. Our algorithms require fewer squaring operations than multiplication operations required by a traditional implementation of the cyclic convolution operation, do not introduce any round-off errors, place no restriction on word length, and are valid when the number of points to be convolved is a power of two. We then clearly demonstrate that our algorithms are also more hardware efficient for both memory and non-memory based implementations. Further, schemes to multiply two numbers based on the cyclic convolution operation are presented. Finally, efficient ways of computing the squaring operation when arithmetic is performed in modular rings are developed

    Computer Architectures for Cryptosystems Based on Hyperelliptic Curves

    Get PDF
    Security issues play an important role in almost all modern communication and computer networks. As Internet applications continue to grow dramatically, security requirements have to be strengthened. Hyperelliptic curve cryptosystems (HECC) allow for shorter operands at the same level of security than other public-key cryptosystems, such as RSA or Diffie-Hellman. These shorter operands appear promising for many applications. Hyperelliptic curves are a generalization of elliptic curves and they can also be used for building discrete logarithm public-key schemes. A major part of this work is the development of computer architectures for the different algorithms needed for HECC. The architectures are developed for a reconfigurable platform based on Field Programmable Gate Arrays (FPGAs). FPGAs combine the flexibility of software solutions with the security of traditional hardware implementations. In particular, it is possible to easily change all algorithm parameters such as curve coefficients and underlying finite field. In this work we first summarized the theoretical background of hyperelliptic curve cryptosystems. In order to realize the operation addition and doubling on the Jacobian, we developed architectures for the composition and reduction step. These in turn are based on architectures for arithmetic in the underlying field and for arithmetic in the polynomial ring. The architectures are described in VHDL (VHSIC Hardware Description Language) and the code was functionally verified. Some of the arithmetic modules were also synthesized. We provide estimates for the clock cycle count for a group operation in the Jacobian. The system targeted was HECC of genus four over GF(2^41)

    Hardware Implementations of the WG-16 Stream Cipher with Composite Field Arithmetic

    Get PDF
    The WG stream cipher family consists of stream ciphers based on the Welch-Gong (WG) transformations that are used as a nonlinear filter applied to the output of a linear feedback shift register (LFSR). The aim of this thesis is an exploration of the design space of the WG-16 stream cipher. Five different representations of the field elements were analyzed, namely the polynomial basis representation, the normal basis representation and three isomorphic tower field constructions of F216: F(((22)2)2)2, F(24)4 and F(28)2. Each design option begins with an in-depth description of different field constructions and their impact on the top-level WG transformation circuit. Normal basis representation of elements for each level of the tower was chosen for field constructions F(((22)2)2)2 and F(24)4, and a mixed basis, with polynomial basis for the lower and normal basis for the higher level of the tower for F(28)2. Representation of field elements affects the field arithmetic, which in turn affects the entire design. Targeting high throughput, pipelined architectures were developed, and pipelining was based on the particular field construction: each extension over the prime field offers a new pipelining possibility. Pipelining at a lower level of the tower field reduces the clock period. Most flexible pipelining options are possible for F(((22)2)2)2, a highly regular construction, which permits an algebraic optimization of the WG transformation resulting in two multiplications being removed. High speed, achieved by adequate pipelining granularity, and smaller area due to removed multipliers deem the F(((22)2)2)2 to be the most suitable field construction for the implementation of WG-16. The best WG-16 modules achieve a throughput of 222 Mbit/s with 476 slices used on the Xilinx Spartan-6 FPGA device xc6slx9 (using Xilinx Synthesis Tool (XST) for synthesis and ISE for implementation [47]) and a throughput of 529 Mbit/s with area cost of 12215 GEs for ASIC implementation, using the 65 nm CMOS technology (using Synopsys Design Compiler for synthesis [45] and Cadence SoC Encounter to complete the Place-and-Route phase)

    Doctor of Philosophy

    Get PDF
    dissertationAbstraction plays an important role in digital design, analysis, and verification, as it allows for the refinement of functions through different levels of conceptualization. This dissertation introduces a new method to compute a symbolic, canonical, word-level abstraction of the function implemented by a combinational logic circuit. This abstraction provides a representation of the function as a polynomial Z = F(A) over the Galois field F2k , expressed over the k-bit input to the circuit, A. This representation is easily utilized for formal verification (equivalence checking) of combinational circuits. The approach to abstraction is based upon concepts from commutative algebra and algebraic geometry, notably the Grobner basis theory. It is shown that the polynomial F(A) can be derived by computing a Grobner basis of the polynomials corresponding to the circuit, using a specific elimination term order based on the circuits topology. However, computing Grobner bases using elimination term orders is infeasible for large circuits. To overcome these limitations, this work introduces an efficient symbolic computation to derive the word-level polynomial. The presented algorithms exploit i) the structure of the circuit, ii) the properties of Grobner bases, iii) characteristics of Galois fields F2k , and iv) modern algorithms from symbolic computation. A custom abstraction tool is designed to efficiently implement the abstraction procedure. While the concept is applicable to any arbitrary combinational logic circuit, it is particularly powerful in verification and equivalence checking of hierarchical, custom designed and structurally dissimilar Galois field arithmetic circuits. In most applications, the field size and the datapath size k in the circuits is very large, up to 1024 bits. The proposed abstraction procedure can exploit the hierarchy of the given Galois field arithmetic circuits. Our experiments show that, using this approach, our tool can abstract and verify Galois field arithmetic circuits up to 1024 bits in size. Contemporary techniques fail to verify these types of circuits beyond 163 bits and cannot abstract a canonical representation beyond 32 bits
    corecore