12 research outputs found

    Area- Efficient VLSI Implementation of Serial-In Parallel-Out Multiplier Using Polynomial Representation in Finite Field GF(2m)

    Full text link
    Finite field multiplier is mainly used in elliptic curve cryptography, error-correcting codes and signal processing. Finite field multiplier is regarded as the bottleneck arithmetic unit for such applications and it is the most complicated operation over finite field GF(2m) which requires a huge amount of logic resources. In this paper, a new modified serial-in parallel-out multiplication algorithm with interleaved modular reduction is suggested. The proposed method offers efficient area architecture as compared to proposed algorithms in the literature. The reduced finite field multiplier complexity is achieved by means of utilizing logic NAND gate in a particular architecture. The efficiency of the proposed architecture is evaluated based on criteria such as time (latency, critical path) and space (gate-latch number) complexity. A detailed comparative analysis indicates that, the proposed finite field multiplier based on logic NAND gate outperforms previously known resultsComment: 19 pages, 4 figure

    Novel Single and Hybrid Finite Field Multipliers over GF(2m) for Emerging Cryptographic Systems

    Get PDF
    With the rapid development of economic and technical progress, designers and users of various kinds of ICs and emerging embedded systems like body-embedded chips and wearable devices are increasingly facing security issues. All of these demands from customers push the cryptographic systems to be faster, more efficient, more reliable and safer. On the other hand, multiplier over GF(2m) as the most important part of these emerging cryptographic systems, is expected to be high-throughput, low-complexity, and low-latency. Fortunately, very large scale integration (VLSI) digital signal processing techniques offer great facilities to design efficient multipliers over GF(2m). This dissertation focuses on designing novel VLSI implementation of high-throughput low-latency and low-complexity single and hybrid finite field multipliers over GF(2m) for emerging cryptographic systems. Low-latency (latency can be chosen without any restriction) high-speed pentanomial basis multipliers are presented. For the first time, the dissertation also develops three high-throughput digit-serial multipliers based on pentanomials. Then a novel realization of digit-level implementation of multipliers based on redundant basis is introduced. Finally, single and hybrid reordered normal basis bit-level and digit-level high-throughput multipliers are presented. To the authors knowledge, this is the first time ever reported on multipliers with multiple throughput rate choices. All the proposed designs are simple and modular, therefore suitable for VLSI implementation for various emerging cryptographic systems

    LFSR-based bit-serial GF(^2m) multipliers using irreducible trinomials

    Get PDF
    In this article, a new architecture of bit-serial polynomial basis (PB) multipliers over the binary extension field GF(^2m) generated by irreducible trinomials is presented. Bit-serial GF(^2m) PB multiplication offers a performance/area trade-off that is very useful in resource constrained applications. The architecture here proposed is based on LFSR (Linear-Feedback Shift Register) and can perform a multiplication in m clock cycles with a constant propagation delay of T_A + T_X. These values match the best time results found in the literature for bit-serial PB multipliers with a slight reduction of the space complexity. Furthermore, the proposed architecture can perform the multiplication of two operands for t different finite fields GF(^2m) generated by t irreducible trinomials simultaneously in m clock cycles with the inclusion of t(m - 1) flipflops and tm XOR gates

    Bit Serial Systolic Architectures for Multiplicative Inversion and Division over GF(2<sup>m</sup>)

    Get PDF
    Systolic architectures are capable of achieving high throughput by maximizing pipelining and by eliminating global data interconnects. Recursive algorithms with regular data flows are suitable for systolization. The computation of multiplicative inversion using algorithms based on EEA (Extended Euclidean Algorithm) are particularly suitable for systolization. Implementations based on EEA present a high degree of parallelism and pipelinability at bit level which can be easily optimized to achieve local data flow and to eliminate the global interconnects which represent most important bottleneck in todays sub-micron design process. The net result is to have high clock rate and performance based on efficient systolic architectures. This thesis examines high performance but also scalable implementations of multiplicative inversion or field division over Galois fields GF(2m) in the specific case of cryptographic applications where field dimension m may be very large (greater than 400) and either m or defining irreducible polynomial may vary. For this purpose, many inversion schemes with different basis representation are studied and most importantly variants of EEA and binary (Stein's) GCD computation implementations are reviewed. A set of common as well as contrasting characteristics of these variants are discussed. As a result a generalized and optimized variant of EEA is proposed which can compute division, and multiplicative inversion as its subset, with divisor in either polynomial or triangular basis representation. Further results regarding Hankel matrix formation for double-basis inversion is provided. The validity of using the same architecture to compute field division with polynomial or triangular basis representation is proved. Next, a scalable unidirectional bit serial systolic array implementation of this proposed variant of EEA is implemented. Its complexity measures are defined and these are compared against the best known architectures. It is shown that assuming the requirements specified above, this proposed architecture may achieve a higher clock rate performance w. r. t. other designs while being more flexible, reliable and with minimum number of inter-cell interconnects. The main contribution at system level architecture is the substitution of all counter or adder/subtractor elements with a simpler distributed and free of carry propagation delays structure. Further a novel restoring mechanism for result sequences of EEA is proposed using a double delay element implementation. Finally, using this systolic architecture a CMD (Combined Multiplier Divider) datapath is designed which is used as the core of a novel systolic elliptic curve processor. This EC processor uses affine coordinates to compute scalar point multiplication which results in having a very small control unit and negligible with respect to the datapath for all practical values of m. The throughput of this EC based on this bit serial systolic architecture is comparable with designs many times larger than itself reported previously

    Efficient Arithmetic for the Implementation of Elliptic Curve Cryptography

    Get PDF
    The technology of elliptic curve cryptography is now an important branch in public-key based crypto-system. Cryptographic mechanisms based on elliptic curves depend on the arithmetic of points on the curve. The most important arithmetic is multiplying a point on the curve by an integer. This operation is known as elliptic curve scalar (or point) multiplication operation. A cryptographic device is supposed to perform this operation efficiently and securely. The elliptic curve scalar multiplication operation is performed by combining the elliptic curve point routines that are defined in terms of the underlying finite field arithmetic operations. This thesis focuses on hardware architecture designs of elliptic curve operations. In the first part, we aim at finding new architectures to implement the finite field arithmetic multiplication operation more efficiently. In this regard, we propose novel schemes for the serial-out bit-level (SOBL) arithmetic multiplication operation in the polynomial basis over F_2^m. We show that the smallest SOBL scheme presented here can provide about 26-30\% reduction in area-complexity cost and about 22-24\% reduction in power consumptions for F_2^{163} compared to the current state-of-the-art bit-level multiplier schemes. Then, we employ the proposed SOBL schemes to present new hybrid-double multiplication architectures that perform two multiplications with latency comparable to the latency of a single multiplication. Then, in the second part of this thesis, we investigate the different algorithms for the implementation of elliptic curve scalar multiplication operation. We focus our interest in three aspects, namely, the finite field arithmetic cost, the critical path delay, and the protection strength from side-channel attacks (SCAs) based on simple power analysis. In this regard, we propose a novel scheme for the scalar multiplication operation that is based on processing three bits of the scalar in the exact same sequence of five point arithmetic operations. We analyse the security of our scheme and show that its security holds against both SCAs and safe-error fault attacks. In addition, we show how the properties of the proposed elliptic curve scalar multiplication scheme yields an efficient hardware design for the implementation of a single scalar multiplication on a prime extended twisted Edwards curve incorporating 8 parallel multiplication operations. Our comparison results show that the proposed hardware architecture for the twisted Edwards curve model implemented using the proposed scalar multiplication scheme is the fastest secure SCA protected scalar multiplication scheme over prime field reported in the literature

    Design And Implementation Of Efficient Multiplier Using Vedic Sutras

    Get PDF
    The primary target of this venture is to outline a productive excess parallel multiplier utilizing recursive disintegration calculation. By reasonable projection of SFG of proposed calculation and distinguishing appropriate cut-sets for bolster forward cut-set retiming, three novel high-throughput digit-serial RB multipliers are inferred to accomplish altogether less zone time-control complexities than the current ones. It is demonstrated that the proposed high-throughput structures are the best among the relating plans, for FPGA and ASIC execution. This undertaking is actualized by utilizing vedic duplications. Urdhva Tiryakbhyam sutra is utilized for planning this multiplier. Range is fundamental diminishment with this kind of multiplier. Through productive projection of flag stream chart (SFG) of the proposed calculation, a profoundly normal processor-space stream diagram (PSFG) is inferred

    Hardware Implementations for Symmetric Key Cryptosystems

    Get PDF
    The utilization of global communications network for supporting new electronic applications is growing. Many applications provided over the global communications network involve exchange of security-sensitive information between different entities. Often, communicating entities are located at different locations around the globe. This demands deployment of certain mechanisms for providing secure communications channels between these entities. For this purpose, cryptographic algorithms are used by many of today\u27s electronic applications to maintain security. Cryptographic algorithms provide set of primitives for achieving different security goals such as: confidentiality, data integrity, authenticity, and non-repudiation. In general, two main categories of cryptographic algorithms can be used to accomplish any of these security goals, namely, asymmetric key algorithms and symmetric key algorithms. The security of asymmetric key algorithms is based on the hardness of the underlying computational problems, which usually require large overhead of space and time complexities. On the other hand, the security of symmetric key algorithms is based on non-linear transformations and permutations, which provide efficient implementations compared to the asymmetric key ones. Therefore, it is common to use asymmetric key algorithms for key exchange, while symmetric key counterparts are deployed in securing the communications sessions. This thesis focuses on finding efficient hardware implementations for symmetric key cryptosystems targeting mobile communications and resource constrained applications. First, efficient lightweight hardware implementations of two members of the Welch-Gong (WG) family of stream ciphers, the WG(29,11)\left(29,11\right) and WG-1616, are considered for the mobile communications domain. Optimizations in the WG(29,11)\left(29,11\right) stream cipher are considered when the GF(229)GF\left(2^{29}\right) elements are represented in either the Optimal normal basis type-II (ONB-II) or the Polynomial basis (PB). For WG-1616, optimizations are considered only for PB representations of the GF(216)GF\left(2^{16}\right) elements. In this regard, optimizations for both ciphers are accomplished mainly at the arithmetic level through reducing the number of field multipliers, based on novel trace properties. In addition, other optimization techniques such as serialization and pipelining, are also considered. After this, the thesis explores efficient hardware implementations for digit-level multiplication over binary extension fields GF(2m)GF\left(2^{m}\right). Efficient digit-level GF(2m)GF\left(2^{m}\right) multiplications are advantageous for ultra-lightweight implementations, not only in symmetric key algorithms, but also in asymmetric key algorithms. The thesis introduces new architectures for digit-level GF(2m)GF\left(2^{m}\right) multipliers considering the Gaussian normal basis (GNB) and PB representations of the field elements. The new digit-level GF(2m)GF\left(2^{m}\right) single multipliers do not require loading of the two input field elements in advance to computations. This feature results in high throughput fast multiplication in resource constrained applications with limited capacity of input data-paths. The new digit-level GF(2m)GF\left(2^{m}\right) single multipliers are considered for both the GNB and PB. In addition, for the GNB representation, new architectures for digit-level GF(2m)GF\left(2^{m}\right) hybrid-double and hybrid-triple multipliers are introduced. The new digit-level GF(2m)GF\left(2^{m}\right) hybrid-double and hybrid-triple GNB multipliers, respectively, accomplish the multiplication of three and four field elements using the latency required for multiplying two field elements. Furthermore, a new hardware architecture for the eight-ary exponentiation scheme is proposed by utilizing the new digit-level GF(2m)GF\left(2^{m}\right) hybrid-triple GNB multipliers

    UpWB: An Uncoupled Architecture Design for White-box Cryptography Using Vectorized Montgomery Multiplication

    Get PDF
    White-box cryptography (WBC) seeks to protect secret keys even if the attacker has full control over the execution environment. One of the techniques to hide the key is space hardness approach, which conceals the key into a large lookup table generated from a reliable small block cipher. Despite its provable security, space-hard WBC also suffers from heavy performance overhead when executed on general purpose hardware platform, hundreds of magnitude slower than conventional block ciphers. Specifically, recent studies adopt nested substitution permutation network (NSPN) to construct dedicated white-box block cipher [BIT16], whose performance is limited by a massive number of rounds, nested loop dependency and high-dimension dynamic maximal distance separable (MDS) matrices. To address these limitations, we put forward UpWB, an uncoupled and efficient accelerator for NSPN-structure WBC. We propose holistic optimization techniques across timing schedule, algorithms and operators. For the high-level timing schedule, we propose a fine-grained task partition (FTP) mechanism to decouple the parameteroriented nested loop with different trip counts. The FTP mechanism narrows down the idle time for synchronization and avoids the extra usage of FIFO, which efficiently increases the computation throughput. For the optimization of arithmetic operators, we devise a flexible and vectorized modular multiplier (VMM) based on the complexity-reduced Montgomery algorithm, which can process multi-precision variable data, multi-size matrix-vector multiplication and different irreducible polynomials. Then, a configurable matrix-vector multiplication (MVM) architecture with diagonal-major dataflow is presented to handle the dynamic MDS matrix. The multi-scale (Inv)Mixcolumns are also unified in a compact manner by intensively sharing the common sub-operations and customizing the constant multiplier. To verify the proposed methodology, we showcase the unified design implementation for three recent families of WBCs, including SPNbox-8/16/24/32, Yoroi-16/32 and WARX-16. Evaluated on FPGA platform, UpWB outperforms the optimized software counterpart (executed on 3.2 GHz Intel CPU with AES-NI and AVX2 instructions) by 7x to 30x in terms of computation throughput. Synthesized under TSMC 28nm technology, 36x to 164x improvement of computation throughput is achieved when UpWB operates at the maximum frequency of 1.3 GHz and consumes a modest area 0.14 mm2. Besides, the proposed VMM also offers about 30% improvement of area efficiency without pulling flexibility down when compared to state-of-the-art work

    Studies on high-speed hardware implementation of cryptographic algorithms

    Get PDF
    Cryptographic algorithms are ubiquitous in modern communication systems where they have a central role in ensuring information security. This thesis studies efficient implementation of certain widely-used cryptographic algorithms. Cryptographic algorithms are computationally demanding and software-based implementations are often too slow or power consuming which yields a need for hardware implementation. Field Programmable Gate Arrays (FPGAs) are programmable logic devices which have proven to be highly feasible implementation platforms for cryptographic algorithms because they provide both speed and programmability. Hence, the use of FPGAs for cryptography has been intensively studied in the research community and FPGAs are also the primary implementation platforms in this thesis. This thesis presents techniques allowing faster implementations than existing ones. Such techniques are necessary in order to use high-security cryptographic algorithms in applications requiring high data rates, for example, in heavily loaded network servers. The focus is on Advanced Encryption Standard (AES), the most commonly used secret-key cryptographic algorithm, and Elliptic Curve Cryptography (ECC), public-key cryptographic algorithms which have gained popularity in the recent years and are replacing traditional public-key cryptosystems, such as RSA. Because these algorithms are well-defined and widely-used, the results of this thesis can be directly applied in practice. The contributions of this thesis include improvements to both algorithms and techniques for implementing them. Algorithms are modified in order to make them more suitable for hardware implementation, especially, focusing on increasing parallelism. Several FPGA implementations exploiting these modifications are presented in the thesis including some of the fastest implementations available in the literature. The most important contributions of this thesis relate to ECC and, specifically, to a family of elliptic curves providing faster computations called Koblitz curves. The results of this thesis can, in their part, enable increasing use of cryptographic algorithms in various practical applications where high computation speed is an issue

    Multiple bit error correcting architectures over finite fields

    Get PDF
    This thesis proposes techniques to mitigate multiple bit errors in GF arithmetic circuits. As GF arithmetic circuits such as multipliers constitute the complex and important functional unit of a crypto-processor, making them fault tolerant will improve the reliability of circuits that are employed in safety applications and the errors may cause catastrophe if not mitigated. Firstly, a thorough literature review has been carried out. The merits of efficient schemes are carefully analyzed to study the space for improvement in error correction, area and power consumption. Proposed error correction schemes include bit parallel ones using optimized BCH codes that are useful in applications where power and area are not prime concerns. The scheme is also extended to dynamically correcting scheme to reduce decoder delay. Other method that suits low power and area applications such as RFIDs and smart cards using cross parity codes is also proposed. The experimental evaluation shows that the proposed techniques can mitigate single and multiple bit errors with wider error coverage compared to existing methods with lesser area and power consumption. The proposed scheme is used to mask the errors appearing at the output of the circuit irrespective of their cause. This thesis also investigates the error mitigation schemes in emerging technologies (QCA, CNTFET) to compare area, power and delay with existing CMOS equivalent. Though the proposed novel multiple error correcting techniques can not ensure 100% error mitigation, inclusion of these techniques to actual design can improve the reliability of the circuits or increase the difficulty in hacking crypto-devices. Proposed schemes can also be extended to non GF digital circuits
    corecore