583 research outputs found

    Fast Quantum Modular Exponentiation

    Full text link
    We present a detailed analysis of the impact on modular exponentiation of architectural features and possible concurrent gate execution. Various arithmetic algorithms are evaluated for execution time, potential concurrency, and space tradeoffs. We find that, to exponentiate an n-bit number, for storage space 100n (twenty times the minimum 5n), we can execute modular exponentiation two hundred to seven hundred times faster than optimized versions of the basic algorithms, depending on architecture, for n=128. Addition on a neighbor-only architecture is limited to O(n) time when non-neighbor architectures can reach O(log n), demonstrating that physical characteristics of a computing device have an important impact on both real-world running time and asymptotic behavior. Our results will help guide experimental implementations of quantum algorithms and devices.Comment: to appear in PRA 71(5); RevTeX, 12 pages, 12 figures; v2 revision is substantial, with new algorithmic variants, much shorter and clearer text, and revised equation formattin

    An algorithmic and architectural study on Montgomery exponentiation in RNS

    Get PDF
    The modular exponentiation on large numbers is computationally intensive. An effective way for performing this operation consists in using Montgomery exponentiation in the Residue Number System (RNS). This paper presents an algorithmic and architectural study of such exponentiation approach. From the algorithmic point of view, new and state-of-the-art opportunities that come from the reorganization of operations and precomputations are considered. From the architectural perspective, the design opportunities offered by well-known computer arithmetic techniques are studied, with the aim of developing an efficient arithmetic cell architecture. Furthermore, since the use of efficient RNS bases with a low Hamming weight are being considered with ever more interest, four additional cell architectures specifically tailored to these bases are developed and the tradeoff between benefits and drawbacks is carefully explored. An overall comparison among all the considered algorithmic approaches and cell architectures is presented, with the aim of providing the reader with an extensive overview of the Montgomery exponentiation opportunities in RNS

    Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

    Get PDF
    Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area

    Montgomery and RNS for RSA Hardware Implementation

    Get PDF
    There are many architectures for RSA hardware implementation which improve its performance. Two main methods for this purpose are Montgomery and RNS. These are fast methods to convert plaintext to ciphertext in RSA algorithm with hardware implementation. RNS is faster than Montgomery but it uses more area. The goal of this paper is to compare these two methods based on the speed and on the used area. For this purpose the architecture that has a better performance for each method is selected, and some modification is done to enhance their performance. This comparison can be used to select the proper method for hardware implementation in both FPGA and ASIC design

    Residue Number System Based Building Blocks for Applications in Digital Signal Processing

    Get PDF
    Předkládaná disertační práce se zabývá návrhem základních bloků v systému zbytkových tříd pro zvýšení výkonu aplikací určených pro digitální zpracování signálů (DSP). Systém zbytkových tříd (RNS) je neváhová číselná soustava, jež umožňuje provádět paralelizovatelné, vysokorychlostní, bezpečné a proti chybám odolné aritmetické operace, které jsou zpracovávány bez přenosu mezi řády. Tyto vlastnosti jej činí značně perspektivním pro použití v DSP aplikacích náročných na výpočetní výkon a odolných proti chybám. Typický RNS systém se skládá ze tří hlavních částí: převodníku z binárního kódu do RNS, který počítá ekvivalent vstupních binárních hodnot v systému zbytkových tříd, dále jsou to paralelně řazené RNS aritmetické jednotky, které provádějí aritmetické operace s operandy již převedenými do RNS. Poslední část pak tvoří převodník z RNS do binárního kódu, který převádí výsledek zpět do výchozího binárního kódu. Hlavním cílem této disertační práce bylo navrhnout nové struktury základních bloků výše zmiňovaného systému zbytkových tříd, které mohou být využity v aplikacích DSP. Tato disertační práce předkládá zlepšení a návrhy nových struktur komponent RNS, simulaci a také ověření jejich funkčnosti prostřednictvím implementace v obvodech FPGA. Kromě návrhů nové struktury základních komponentů RNS je prezentován také podrobný výzkum různých sad modulů, který je srovnává a determinuje nejefektivnější sadu pro různé dynamické rozsahy. Dalším z klíčových přínosů disertační práce je objevení a ověření podmínky určující výběr optimální sady modulů, která umožňuje zvýšit výkonnost aplikací DSP. Dále byla navržena aplikace pro zpracování obrazu využívající RNS, která má vůči klasické binární implementanci nižší spotřebu a vyšší maximální pracovní frekvenci. V závěru práce byla vyhodnocena hlavní kritéria při rozhodování, zda je vhodnější pro danou aplikaci využít binární číselnou soustavu nebo RNS.This doctoral thesis deals with designing residue number system based building blocks to enhance the performance of digital signal processing applications. The residue number system (RNS) is a non-weighted number system that provides carry-free, parallel, high speed, secure and fault tolerant arithmetic operations. These features make it very attractive to be used in high-performance and fault tolerant digital signal processing (DSP) applications. A typical RNS system consists of three main components; the first one is the binary to residue converter that computes the RNS equivalent of the inputs represented in the binary number system. The second component in this system is parallel residue arithmetic units that perform arithmetic operations on the operands already represented in RNS. The last component is the residue to binary converter, which converts the outputs back into their binary representation. The main aim of this thesis was to propose novel structures of the basic components of this system in order to be later used as fundamental units in DSP applications. This thesis encloses improving and designing novel structures of these components, simulating and verifying their efficiency via FPGA implementation. In addition to suggesting novel structures of basic RNS components, a detailed study on different moduli sets that compares and determines the most efficient one for different dynamic range requirements is also presented. One of the main outcomes of this thesis is concluding and verifying the main condition that should be met when choosing a moduli set, in order to improve the timing performance of a DSP application. An RNS-based image processing application is also proposed. Its efficiency, in terms of timing performance and power consumption, is proved via comparing it with a binary-based one. Finally, the main considerations that should be taken into account when choosing to use the binary number system or RNS are also discussed in details.

    Residue Number Systems: a Survey

    Get PDF

    Design of reverse converters for the multi-moduli residue number systems with moduli of forms 2a, 2b - 1, 2c + 1

    Get PDF
    Residue number system (RNS) is a non-weighted integer number representation system that is capable of supporting parallel, carry-free and high speed arithmetic. This system is error-resilient and facilitates error detection, error correction and fault tolerance in digital systems. It finds applications in Digital Signal Processing (DSP) intensive computations like digital filtering, convolution, correlation, Discrete Fourier Transform, Fast Fourier Transform, etc. The basis for an RNS system is a moduli set consisting of relatively prime integers. Proper selection of this moduli set plays a significant role in RNS design because the speed of internal RNS arithmetic circuits as well as the speed and complexity of the residue to binary converter (R/B or Reverse Converter) have a large dependency on the form and number of the selected moduli. Moduli of forms 2a, 2b- 1, 2c + 1 (a, b and c are natural numbers) have the most use in RNS moduli sets as these moduli can be efficiently implemented using usual binary hardware that lead to simple design. Another important consideration for the reverse converter design is the selection of an appropriate conversion algorithm from Chinese Remainder Theorem (CRT), Mixed Radix Conversion (MRC) and the new Chinese Remainder Theorems (New CRT I and New CRT II). This research is focused on designing reverse converters for the multi-moduli RNS sets especially four and five moduli sets with moduli of forms 2a, 2b- 1, 2c + 1 . The residue to binary converters are designed by applying the above conversion algorithms in different possible ways and facilitating the use of modulo (2k) and modulo (2k – 1) adders that lead to simple design of adder based architectures and VLSI efficient implementations (k is a natural number). The area and delay of the proposed converters is analyzed and an efficient reverse converter is suggested from each of the various four and five moduli set converters for a given dynamic range

    Horner's Rule-Based Multiplication over Fp and Fp^n: A Survey

    Get PDF
    International audienceThis paper aims at surveying multipliers based on Horner's rule for finite field arithmetic. We present a generic architecture based on five processing elements and introduce a classification of several algorithms based on our model. We provide the readers with a detailed description of each scheme which should allow them to write a VHDL description or a VHDL code generator

    The use of reversible logic gates in the design of residue number systems

    Get PDF
    Reversible computing is an emerging technique to achieve ultra-low-power circuits. Reversible arithmetic circuits allow for achieving energy-efficient high-performance computational systems. Residue number systems (RNS) provide parallel and fault-tolerant additions and multiplications without carry propagation between residue digits. The parallelism and fault-tolerance features of RNS can be leveraged to achieve high-performance reversible computing. This paper proposed RNS full reversible circuits, including forward converters, modular adders and multipliers, and reverse converters used for a class of RNS moduli sets with the composite form {2k, 2p-1}. Modulo 2n-1, 2n, and 2n+1 adders and multipliers were designed using reversible gates. Besides, reversible forward and reverse converters for the 3-moduli set {2n-1, 2n+k, 2n+1} have been designed. The proposed RNS-based reversible computing approach has been applied for consecutive multiplications with an improvement of above 15% in quantum cost after the twelfth iteration, and above 27% in quantum depth after the ninth iteration. The findings show that the use of the proposed RNS-based reversible computing in convolution results in a significant improvement in quantum depth in comparison to conventional methods based on weighted binary adders and multipliers
    corecore