1,360 research outputs found

    A versatile Montgomery multiplier architecture with characteristic three support

    Get PDF
    We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%

    A high-speed integrated circuit with applications to RSA Cryptography

    Get PDF
    Merged with duplicate record 10026.1/833 on 01.02.2017 by CS (TIS)The rapid growth in the use of computers and networks in government, commercial and private communications systems has led to an increasing need for these systems to be secure against unauthorised access and eavesdropping. To this end, modern computer security systems employ public-key ciphers, of which probably the most well known is the RSA ciphersystem, to provide both secrecy and authentication facilities. The basic RSA cryptographic operation is a modular exponentiation where the modulus and exponent are integers typically greater than 500 bits long. Therefore, to obtain reasonable encryption rates using the RSA cipher requires that it be implemented in hardware. This thesis presents the design of a high-performance VLSI device, called the WHiSpER chip, that can perform the modular exponentiations required by the RSA cryptosystem for moduli and exponents up to 506 bits long. The design has an expected throughput in excess of 64kbit/s making it attractive for use both as a general RSA processor within the security function provider of a security system, and for direct use on moderate-speed public communication networks such as ISDN. The thesis investigates the low-level techniques used for implementing high-speed arithmetic hardware in general, and reviews the methods used by designers of existing modular multiplication/exponentiation circuits with respect to circuit speed and efficiency. A new modular multiplication algorithm, MMDDAMMM, based on Montgomery arithmetic, together with an efficient multiplier architecture, are proposed that remove the speed bottleneck of previous designs. Finally, the implementation of the new algorithm and architecture within the WHiSpER chip is detailed, along with a discussion of the application of the chip to ciphering and key generation

    Algorithms and VLSI architectures for parametric additive synthesis

    Get PDF
    A parametric additive synthesis approach to sound synthesis is advantageous as it can model sounds in a large scale manner, unlike the classical sinusoidal additive based synthesis paradigms. It is known that a large body of naturally occurring sounds are resonant in character and thus fit the concept well. This thesis is concerned with the computational optimisation of a super class of form ant synthesis which extends the sinusoidal parameters with a spread parameter known as band width. Here a modified formant algorithm is introduced which can be traced back to work done at IRCAM, Paris. When impulse driven, a filter based approach to modelling a formant limits the computational work-load. It is assumed that the filter's coefficients are fixed at initialisation, thus avoiding interpolation which can cause the filter to become chaotic. A filter which is more complex than a second order section is required. Temporal resolution of an impulse generator is achieved by using a two stage polyphase decimator which drives many filterbanks. Each filterbank describes one formant and is composed of sub-elements which allow variation of the formant’s parameters. A resource manager is discussed to overcome the possibility of all sub- banks operating in unison. All filterbanks for one voice are connected in series to the impulse generator and their outputs are summed and scaled accordingly. An explorative study of number systems for DSP algorithms and their architectures is investigated. I invented a new theoretical mechanism for multi-level logic based DSP. Its aims are to reduce the number of transistors and to increase their functionality. A review of synthesis algorithms and VLSI architectures are discussed in a case study between a filter based bit-serial and a CORDIC based sinusoidal generator. They are both of similar size, but the latter is always guaranteed to be stable

    Versatile Montgomery Multiplier Architectures

    Get PDF
    Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (sizes from 160 to 4096 bits) as their core arithmetic operation. To perform this operation reasonably fast, general purpose processors are not always the best choice. This is why specialized hardware, in the form of cryptographic co-processors, become more attractive. Based upon the analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word based algorithm for Montgomery\u27s method is realized using high-radix bit-parallel multipliers working with two different types of finite fields (unified architecture for GF(p) and GF(2n)). Previous approaches have relied mostly on bit serial multiplication in combination with massive pipelining, or Radix-8 multiplication with the limitation to a single type of finite field. Our approach is centered around the notion that the optimal delay in bit-parallel multipliers grows with logarithmic complexity with respect to the operand size n, O(log3/2 n), while the delay of bit serial implementations grows with linear complexity O(n). Our design has been implemented in VHDL, simulated and synthesized in 0.5μ CMOS technology. The synthesized net list has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption

    Multiplierless DCT Algorithm for Image Compression Applications

    Get PDF
    This paper presents a novel error-free (infinite-precision) architecture for the fast implementation of 8x8 2-D Discrete Cosine Transform. The architecture uses a new algebraic integer encoding of a 1-D radix-8 DCT that allows the separable computation of a 2-D 8x8 DCT without any intermediate number representation conversions. This is a considerable improvement on previously introduced algebraic integer encoding techniques to compute both DCT and IDCT which eliminates the requirements to approximate the transformation matrix ele- ments by obtaining their exact representations and hence mapping the transcendental functions without any errors. Apart from the multiplication-free nature, this new mapping scheme fits to this algorithm, eliminating any computational or quantization errors and resulting short-word-length and high-speed-design

    Efficient Side-Channel Aware Elliptic Curve Cryptosystems over Prime Fields

    Get PDF
    Elliptic Curve Cryptosystems (ECCs) are utilized as an alternative to traditional public-key cryptosystems, and are more suitable for resource limited environments due to smaller parameter size. In this dissertation we carry out a thorough investigation of side-channel attack aware ECC implementations over finite fields of prime characteristic including the recently introduced Edwards formulation of elliptic curves, which have built-in resiliency against simple side-channel attacks. We implement Joye\u27s highly regular add-always scalar multiplication algorithm both with the Weierstrass and Edwards formulation of elliptic curves. We also propose a technique to apply non-adjacent form (NAF) scalar multiplication algorithm with side-channel security using the Edwards formulation. Our results show that the Edwards formulation allows increased area-time performance with projective coordinates. However, the Weierstrass formulation with affine coordinates results in the simplest architecture, and therefore has the best area-time performance as long as an efficient modular divider is available

    A computer-aided design for digital filter implementation

    Get PDF
    Imperial Users onl

    Residue Number Systems: a Survey

    Get PDF

    Design and implementation of high-radix arithmetic systems based on the SDNR/RNS data representation

    Get PDF
    This project involved the design and implementation of high-radix arithmetic systems based on the hybrid SDNRIRNS data representation. Some real-time applications require a real-time arithmetic system. An SDNR/RNS arithmetic system provides parallel, real-time processing. The advantages and disadvantages of high-radix SDNR/RNS arithmetic, and the feasibility of implementing SDNR/RNS arithmetic systems in CMOS VLSI technology, were investigated in this project. A common methodological model, which included the stages of analysis, design, implementation, testing, and simulation, was followed. The combination of the SDNR and RNS transforms potential complex logic networks into simpler logic blocks. It was found that when constructing a SDNRIRNS adder, factors such as the radix, digit set, and moduli must be taken into account. There are many avenues still to explore. For example, implementing other arithmetic systems in the same CMOS VLSI technology used in this project and comparing them to equivalent SDNR/RNS systems would provide a set of benchmarks. These benchmarks would be useful in addressing issues relating to relative performance
    • …
    corecore