993 research outputs found

    Generalised Mersenne Numbers Revisited

    Get PDF
    Generalised Mersenne Numbers (GMNs) were defined by Solinas in 1999 and feature in the NIST (FIPS 186-2) and SECG standards for use in elliptic curve cryptography. Their form is such that modular reduction is extremely efficient, thus making them an attractive choice for modular multiplication implementation. However, the issue of residue multiplication efficiency seems to have been overlooked. Asymptotically, using a cyclic rather than a linear convolution, residue multiplication modulo a Mersenne number is twice as fast as integer multiplication; this property does not hold for prime GMNs, unless they are of Mersenne's form. In this work we exploit an alternative generalisation of Mersenne numbers for which an analogue of the above property --- and hence the same efficiency ratio --- holds, even at bitlengths for which schoolbook multiplication is optimal, while also maintaining very efficient reduction. Moreover, our proposed primes are abundant at any bitlength, whereas GMNs are extremely rare. Our multiplication and reduction algorithms can also be easily parallelised, making our arithmetic particularly suitable for hardware implementation. Furthermore, the field representation we propose also naturally protects against side-channel attacks, including timing attacks, simple power analysis and differential power analysis, which is essential in many cryptographic scenarios, in constrast to GMNs.Comment: 32 pages. Accepted to Mathematics of Computatio

    Residue Number Systems: a Survey

    Get PDF

    VLSI architectures for public key cryptology

    Get PDF

    Versatile Montgomery Multiplier Architectures

    Get PDF
    Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (sizes from 160 to 4096 bits) as their core arithmetic operation. To perform this operation reasonably fast, general purpose processors are not always the best choice. This is why specialized hardware, in the form of cryptographic co-processors, become more attractive. Based upon the analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word based algorithm for Montgomery\u27s method is realized using high-radix bit-parallel multipliers working with two different types of finite fields (unified architecture for GF(p) and GF(2n)). Previous approaches have relied mostly on bit serial multiplication in combination with massive pipelining, or Radix-8 multiplication with the limitation to a single type of finite field. Our approach is centered around the notion that the optimal delay in bit-parallel multipliers grows with logarithmic complexity with respect to the operand size n, O(log3/2 n), while the delay of bit serial implementations grows with linear complexity O(n). Our design has been implemented in VHDL, simulated and synthesized in 0.5μ CMOS technology. The synthesized net list has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption

    High Performance Quantum Modular Multipliers

    Full text link
    We present a novel set of reversible modular multipliers applicable to quantum computing, derived from three classical techniques: 1) traditional integer division, 2) Montgomery residue arithmetic, and 3) Barrett reduction. Each multiplier computes an exact result for all binary input values, while maintaining the asymptotic resource complexity of a single (non-modular) integer multiplier. We additionally conduct an empirical resource analysis of our designs in order to determine the total gate count and circuit depth of each fully constructed circuit, with inputs as large as 2048 bits. Our comparative analysis considers both circuit implementations which allow for arbitrary (controlled) rotation gates, as well as those restricted to a typical fault-tolerant gate set

    Scalable Energy-efficient Microarchitectures with Computational Error Tolerance

    Get PDF
    Dennard scaling of conventional semiconductor technology has reached its limit resulting in issues pertaining to leakage current and threshold voltage. Energy-savings found at the transistor level by simply lowering supply voltage are no longer available for these devices (e.g., MOSFETs) and has reached the Landauer-Shannon limit. Recent proposals of minivolt switch technologies aim to extend the technology scaling roadmap by maintaining a high on/off ratio of drain current with a much lower supply voltage. However, high intermittent error probabilities in millivolt switches constraints their Vdd reduction for traditional architectures. Thus, there is an urgent need for scalable and energy-efficient micro-architectures with computational error-tolerance. This thesis systematically leverages the error detection and correction properties of the Redundant Residue Number System (RRNS) by varying the number of non-redundant (n) and redundant (r) components (residues), and selects and discusses trade-offs about configuration points from a two-dimensional (n, r)-RRNS design plane that meet certain capabilities of error detection and/or correction. Being able to efficiently handle resilience in this (n, r)-RRNS plane significantly improves reliability, allowing further Vdd reduction and energy savings. First, the necessary implementation details of RRNS cores are discussed. Second, scalable RRNS micro-architectures that simultaneously support both error-correction and checkpointing with restart capabilities for uncorrectable errors are proposed. Third, novel RRNS-based adaptive checkpointing&restart mechanisms are designed that automatically guarantee reliability while minimizing the energy-delay product (EDP). Finally, the RRNS design space is explored to find the optimal (n, r) configuration points. For similar reliability when compared to a conventional binary core (running at high Vdd) without computational error tolerance, the proposed RRNS scalable micro-architecture reduces EDP by 53% on average for memory-intensive workloads and by 67% on average for non-memory-intensive workloads. This thesis's second topic is to alleviate fault rate and power consumption issues of exascale computing. Faults in High-Performance Computing (HPC) have become an urgent challenge with estimated Mean Time Between Failures (MTBF) of exascale system projected as only several minutes with contemporary methodologies. Unfortunately, existing error-tolerance technologies in the context of HPC systems have serious deficiencies such as insufficient error-tolerance coverage, high power consumption, and difficult integration with existing workloads. Considering Department of Energy (DOE) guidelines that limit exascale power consumption to 20 MW, this thesis highlights the issue of energy usage and proposes a thread-level fault tolerance mechanism compatible with current state-of-the art exascale programming models while simultaneously meeting the requirements of full system error protection. Additionally, an efficient micro-architecture and corresponding mechanisms that can support thread level RRNS are discussed. Experimental results show that this strategy reduces energy consumption by 62.25% and the Energy-Delay-Product by 58.67% on average when compared with state-of-the-art black box resilience techniques.Ph.D

    BINARY TO RNS ENCODER FOR THE MODULI SET {2n−1,2n ,2n+1} WITH EMBEDDED DIMINISHED-1 CHANNEL FOR DSP APPLICATION

    Get PDF
    Architecture of binary to residue number system encoder based onthe moduli set {2n − 1,2n,2n + 1} with embedded modulo 2n + 1 channel in the diminished-1 representation, which can be used instead of the standard modulo 2n+1 channel, is presented. We consider the binary numbers with dynamic range of proposed moduli set which is 23n−2n. Within this dynamic range, 3n-bit binary number is partitioned into three n-bit parts and converted to residue numbers. The proposed architecture based on moduli set {2n−1,2n ,2n+1} with embedded diminished-1 encoded channel have been mapped on Xilinx FPGA chip. The proposed architecture can be utilized in conjunction with any fast binary adder without requiring any extra hardware

    Architectures and implementations for the Polynomial Ring Engine over small residue rings

    Get PDF
    This work considers VLSI implementations for the recently introduced Polynomial Ring Engine (PRE) using small residue rings. To allow for a comprehensive approach to the implementation of the PRE mappings for DSP algorithms, this dissertation introduces novel techniques ranging from system level architectures to transistor level considerations. The Polynomial Ring Engine combines both classical residue mappings and new polynomial mappings. This dissertation develops a systematic approach for generating pipelined systolic/ semi-systolic structures for the PRE mappings. An example architecture is constructed and simulated to illustrate the properties of the new architectures. To simultaneously achieve large computational dynamic range and high throughput rate the basic building blocks of the PRE architecture use transistor size profiling. Transistor sizing software is developed for profiling the Switching Tree dynamic logic used to build the basic modulo blocks. The software handles complex nFET structures using a simple iterative algorithm. Issues such as convergence of the iterative technique and validity of the sizing formulae have been treated with an appropriate mathematical analysis. As an illustration of the use of PRE architectures for modem DSP computational problems, a Wavelet Transform for HDTV image compression is implemented. An interesting use is made of the PRE technique of using polynomial indeterminates as \u27placeholders\u27 for components of the processed data. In this case we use an indeterminate to symbolically handle the irrational number [square root of 3] of the Daubechie mother wavelet for N = 4. Finally, a multi-level fault tolerant PRE architecture is developed by combining the classical redundant residue approach and the circuit parity check approach. The proposed architecture uses syndromes to correct faulty residue channels and an embedded parity check to correct faulty computational channels. The architecture offers superior fault detection and correction with online data interruption

    Tamper-Resistant Arithmetic for Public-Key Cryptography

    Get PDF
    Cryptographic hardware has found many uses in many ubiquitous and pervasive security devices with a small form factor, e.g. SIM cards, smart cards, electronic security tokens, and soon even RFIDs. With applications in banking, telecommunication, healthcare, e-commerce and entertainment, these devices use cryptography to provide security services like authentication, identification and confidentiality to the user. However, the widespread adoption of these devices into the mass market, and the lack of a physical security perimeter have increased the risk of theft, reverse engineering, and cloning. Despite the use of strong cryptographic algorithms, these devices often succumb to powerful side-channel attacks. These attacks provide a motivated third party with access to the inner workings of the device and therefore the opportunity to circumvent the protection of the cryptographic envelope. Apart from passive side-channel analysis, which has been the subject of intense research for over a decade, active tampering attacks like fault analysis have recently gained increased attention from the academic and industrial research community. In this dissertation we address the question of how to protect cryptographic devices against this kind of attacks. More specifically, we focus our attention on public key algorithms like elliptic curve cryptography and their underlying arithmetic structure. In our research we address challenges such as the cost of implementation, the level of protection, and the error model in an adversarial situation. The approaches that we investigated all apply concepts from coding theory, in particular the theory of cyclic codes. This seems intuitive, since both public key cryptography and cyclic codes share finite field arithmetic as a common foundation. The major contributions of our research are (a) a generalization of cyclic codes that allow embedding of finite fields into redundant rings under a ring homomorphism, (b) a new family of non-linear arithmetic residue codes with very high error detection probability, (c) a set of new low-cost arithmetic primitives for optimal extension field arithmetic based on robust codes, and (d) design techniques for tamper resilient finite state machines
    • …
    corecore