    Fast Quantum Modular Exponentiation

    We present a detailed analysis of the impact of architectural features and possible concurrent gate execution on modular exponentiation. Various arithmetic algorithms are evaluated for execution time, potential concurrency, and space tradeoffs. We find that, to exponentiate an n-bit number with n = 128 and storage space of 100n (twenty times the minimum of 5n), modular exponentiation can be executed two hundred to seven hundred times faster than optimized versions of the basic algorithms, depending on the architecture. Addition on a neighbor-only architecture is limited to O(n) time, whereas non-neighbor architectures can reach O(log n). This demonstrates that the physical characteristics of a computing device have an important impact on both real-world running time and asymptotic behavior. Our results will help guide experimental implementations of quantum algorithms and devices. Comment: to appear in PRA 71(5); RevTeX, 12 pages, 12 figures; v2 revision is substantial, with new algorithmic variants, much shorter and clearer text, and revised equation formatting.
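    The O(n) versus O(log n) contrast has a familiar classical analogue. A ripple-carry adder passes each carry only to the neighboring bit. A parallel-prefix (Kogge-Stone) adder instead combines bits that are 2^k positions apart, which finishes in about log2(n) rounds. The sketch below is a plain Python model of these two classical carry schemes, not the quantum circuits analyzed in the paper. The function names are hypothetical, and the counted "steps" and "rounds" are a simplified stand-in for circuit depth.

```python
import random


def ripple_carry_add(a, b, n):
    """Add two n-bit integers with a ripple carry; return (sum, sequential steps).

    The carry into bit i depends on bit i - 1, so the chain takes n steps
    and only ever links neighboring bit positions.
    """
    carry, total, steps = 0, 0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
        steps += 1
    return total | (carry << n), steps


def kogge_stone_add(a, b, n):
    """Add two n-bit integers with a Kogge-Stone parallel-prefix carry network.

    Returns (sum, prefix rounds). The round with distance d combines bit i
    with bit i - d (a non-neighbor link once d > 1); d doubles each round,
    so only ceil(log2(n)) rounds are needed.
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # generate bits
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # propagate bits
    G, P = g[:], p[:]
    rounds, d = 0, 1
    while d < n:
        # Both comprehensions read the previous round's G and P.
        G = [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(n)]
        P = [P[i] & P[i - d] if i >= d else P[i] for i in range(n)]
        rounds += 1
        d *= 2
    carries = [0] + G[:-1]  # carry into bit i = group generate of bits 0..i-1
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total | (G[n - 1] << n), rounds


if __name__ == "__main__":
    n = 128
    rng = random.Random(0)
    a, b = rng.getrandbits(n), rng.getrandbits(n)
    print(ripple_carry_add(a, b, n)[1])  # 128 sequential carry steps
    print(kogge_stone_add(a, b, n)[1])   # 7 prefix rounds
```

    The prefix version buys its logarithmic depth with wiring between distant bit positions. This is the kind of connectivity a neighbor-only layout cannot provide directly.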

    A VLSI Array Architecture for Realization of DFT, DHT, DCT and DST

    A unified array architecture is described for computing the DFT, DHT, DCT and DST using a modified CORDIC (CoOrdinate Rotation DIgital Computer) arithmetic unit as the basic Processing Element (PE). All four transforms can be computed by a simple rearrangement of the input samples. Compared with five existing architectures, this one is faster in terms of both latency and throughput. Moreover, its simple local-neighborhood interprocessor connections make it convenient for VLSI implementation. The architecture can be extended to longer transform lengths by judiciously cascading modules of shorter length, which makes it suitable for Wafer Scale Integration (WSI). The CORDIC unit is designed using Transmission Gate Logic (TGL) in a sea-of-gates semicustom environment. Simulation results suggest that this architecture may be a suitable candidate for low-power/low-voltage applications.
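    CORDIC evaluates a plane rotation using only shifts, additions and a small table of arctangents. This makes a rotation-capable PE a natural fit for transforms whose kernels are rotations. For the DFT, for example, multiplying a sample by the twiddle factor e^{-j2πkn/N} is a rotation by -2πkn/N. The sketch below is the textbook circular rotation-mode CORDIC in floating point, not the paper's modified CORDIC. The function names, the 32-iteration default and the range-reduction step are assumptions for illustration.

```python
import math


def cordic_rotate(x0, y0, theta, iterations=32):
    """Rotate the vector (x0, y0) by theta radians with rotation-mode CORDIC.

    Each iteration is a micro-rotation by +/- atan(2**-i). In fixed-point
    hardware the multiplications by 2**-i are plain bit shifts.
    """
    # Range reduction: the core loop converges only for |theta| <= ~1.74 rad,
    # so fold theta into [-pi/2, pi/2]; a rotation by pi just negates the vector.
    theta = math.remainder(theta, 2 * math.pi)
    if theta > math.pi / 2:
        x0, y0, theta = -x0, -y0, theta - math.pi
    elif theta < -math.pi / 2:
        x0, y0, theta = -x0, -y0, theta + math.pi
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-multiply by 1/K, where K (about 1.6468) is the aggregate CORDIC gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = x0 * gain, y0 * gain, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # turn toward the remaining angle z
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y


def twiddle(k, n_idx, length):
    """Hypothetical helper: e^{-j*2*pi*k*n/N} as (re, im), by rotating (1, 0)."""
    return cordic_rotate(1.0, 0.0, -2 * math.pi * k * n_idx / length)
```

    After 32 iterations the residual angle is at most atan(2^-31), about 5e-10 rad. A fixed-point implementation would instead choose the iteration count to match its word length.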

    Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath

    Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying datapath or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier so that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (n×n→2n)-bit multiplication is then used as a "sub-routine" to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that a 2n-bit modular multiplication can be performed on an n-bit multiplier in 5n clock cycles, assuming that an n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier with this extra functionality requires only minor modifications of the datapath and entails a slight increase in silicon area.
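    CIOS interleaves a word-by-word multiplication with a word-by-word Montgomery reduction. As a result, every inner step needs only a word-size product plus additions. The sketch below models that schedule in Python, with w-bit words emulating the fixed-precision multiplier. It is a software illustration of the standard CIOS algorithm, not the paper's datapath. The function name and the word size w = 16 are assumptions, and pow(x, -1, m) requires Python 3.8 or later.

```python
def cios_montgomery_multiply(a, b, n, w=16):
    """Return a * b * R^-1 mod n with R = 2**(w * s), using CIOS.

    The modulus n must be odd and 0 <= a, b < n. Every inner step needs only
    a (w x w -> 2w)-bit product, standing in for a fixed-precision multiplier.
    """
    if n % 2 == 0:
        raise ValueError("Montgomery multiplication needs an odd modulus")
    s = (n.bit_length() + w - 1) // w  # number of w-bit words
    base = 1 << w
    mask = base - 1
    n_prime = (-pow(n, -1, base)) % base  # n' = -n^-1 mod 2^w
    A = [(a >> (w * j)) & mask for j in range(s)]
    B = [(b >> (w * j)) & mask for j in range(s)]
    N = [(n >> (w * j)) & mask for j in range(s)]
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += A * B[i], word by word.
        carry = 0
        for j in range(s):
            carry, t[j] = divmod(t[j] + A[j] * B[i] + carry, base)
        carry, t[s] = divmod(t[s] + carry, base)
        t[s + 1] = carry
        # Reduction step: add m * N so the lowest word becomes zero,
        # then shift t right by one word.
        m = (t[0] * n_prime) % base
        carry = (t[0] + m * N[0]) >> w
        for j in range(1, s):
            carry, t[j - 1] = divmod(t[j] + m * N[j] + carry, base)
        carry, t[s - 1] = divmod(t[s] + carry, base)
        t[s] = t[s + 1] + carry
    u = sum(t[j] << (w * j) for j in range(s + 1))
    return u - n if u >= n else u  # u < 2n, so one subtraction suffices
```

    To use it for ordinary modular multiplication, operands are first mapped into Montgomery form (x·R mod n). The R^-1 factor then cancels across a chain of multiplications.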

    Efficient modular arithmetic units for low power cryptographic applications

    The demand for high security in energy-constrained devices such as mobile phones and PDAs is growing rapidly. This creates a need for efficient designs of cryptographic algorithms that offer data integrity, authentication, non-repudiation and confidentiality of the encrypted data and communication channels. Public-key cryptography is an ideal choice for data integrity, authentication and non-repudiation, whereas private-key cryptography ensures the confidentiality of the transmitted data. The latter offers extremely high encryption speed but has limitations that make it unsuitable for certain applications. Many public-key cryptographic algorithms in the literature are built from modular arithmetic modules such as modular addition, multiplication, inversion and exponentiation. Recently, numerous cryptographic algorithms based on modular arithmetic have been proposed that are scalable, operate on words, and are efficient in various respects. The modular arithmetic modules play a crucial role in the overall performance of a cryptographic processor. Hence, better results can be obtained by designing efficient modules for modular addition, multiplication, exponentiation and squaring. This thesis is organized into three papers and describes the efficient implementation of modular arithmetic units and the application of these modules in the International Data Encryption Algorithm (IDEA). The second paper describes the implementation of IDEA both with existing techniques and with the proposed efficient modular units. The third paper describes a fault-tolerant design of a modular unit with online self-checking capability. (Abstract, page iv)
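    As one example of such modules, IDEA combines three operations on 16-bit words: XOR, addition modulo 2^16, and multiplication modulo 2^16 + 1. In the multiplication, the all-zero word stands for 2^16. The sketch below shows the two modular operations under those IDEA conventions, including the widely used low-high reduction for the multiplication. It is a software illustration, not the thesis's proposed hardware units, and the function names are hypothetical.

```python
MOD_MUL = 0x10001  # 2^16 + 1, which is prime


def idea_add(a, b):
    """Addition modulo 2^16 on 16-bit words."""
    return (a + b) & 0xFFFF


def idea_mul_reference(a, b):
    """Multiplication modulo 2^16 + 1; the word 0 encodes the value 2^16."""
    a = a or 0x10000
    b = b or 0x10000
    return (a * b % MOD_MUL) & 0xFFFF  # a result of 2^16 is encoded as 0


def idea_mul_low_high(a, b):
    """Same product using one 16x16 multiply, a subtraction and a compare.

    Because 2^16 = -1 (mod 2^16 + 1), a 32-bit product hi * 2^16 + lo
    reduces to lo - hi, with a correction of +1 when lo < hi.
    """
    if a == 0:
        return (1 - b) & 0xFFFF  # a encodes 2^16 = -1, so the product is -b
    if b == 0:
        return (1 - a) & 0xFFFF
    p = a * b
    lo, hi = p & 0xFFFF, p >> 16
    return (lo - hi + (1 if lo < hi else 0)) & 0xFFFF
```

    The low-high form avoids a general division by 2^16 + 1. This is the property that makes the modulo-(2^16 + 1) multiplier practical in both software and hardware.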

    The use of reversible logic gates in the design of residue number systems

    Reversible computing is an emerging technique for achieving ultra-low-power circuits. Reversible arithmetic circuits make it possible to build energy-efficient, high-performance computational systems. Residue number systems (RNS) provide parallel and fault-tolerant addition and multiplication without carry propagation between residue digits. The parallelism and fault tolerance of RNS can be leveraged to achieve high-performance reversible computing. This paper proposes fully reversible RNS circuits, including forward converters, modular adders and multipliers, and reverse converters, for a class of RNS moduli sets of the composite form $\{2^k, 2^p - 1\}$. Modulo $2^n - 1$, $2^n$ and $2^n + 1$ adders and multipliers were designed using reversible gates. In addition, reversible forward and reverse converters were designed for the three-moduli set $\{2^n - 1, 2^{n+k}, 2^n + 1\}$. The proposed RNS-based reversible computing approach was applied to consecutive multiplications. It achieved an improvement of more than 15% in quantum cost after the twelfth iteration and more than 27% in quantum depth after the ninth iteration. The findings show that applying the proposed RNS-based reversible computing to convolution significantly improves quantum depth compared with conventional methods based on weighted binary adders and multipliers.
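    The three channel types in $\{2^n - 1, 2^n, 2^n + 1\}$ differ mainly in how they handle the carry out of the top bit. Modulo $2^n$ discards it, modulo $2^n - 1$ feeds it back into bit 0 (end-around carry), and modulo $2^n + 1$ needs an extra bit and a correction. The sketch below is a behavioral Python model of that arithmetic only. It does not model reversible gates, garbage outputs or quantum cost, and the function names are hypothetical.

```python
def add_mod_pow2(a, b, n):
    """Modulo 2^n adder: keep the low n bits and discard the carry-out."""
    return (a + b) & ((1 << n) - 1)


def add_mod_pow2_minus_1(a, b, n):
    """Modulo 2^n - 1 adder with end-around carry (2^n = 1 mod 2^n - 1)."""
    mask = (1 << n) - 1
    s = a + b
    s = (s & mask) + (s >> n)  # feed the carry-out back into bit 0
    return 0 if s == mask else s  # all-ones is a second encoding of zero


def add_mod_pow2_plus_1(a, b, n):
    """Modulo 2^n + 1 adder: residues need n + 1 bits; subtract the modulus once."""
    m = (1 << n) + 1
    s = a + b
    return s - m if s >= m else s


def mul_mod_pow2_minus_1(a, b, n):
    """Modulo 2^n - 1 multiplier: fold high bits onto low bits until n bits remain."""
    mask = (1 << n) - 1
    p = a * b
    while p > mask:
        p = (p & mask) + (p >> n)
    return 0 if p == mask else p
```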

    Residue Number System Based Building Blocks for Applications in Digital Signal Processing

    This doctoral thesis deals with the design of residue number system (RNS) based building blocks to enhance the performance of digital signal processing (DSP) applications. RNS is a non-weighted number system that provides carry-free, parallel, high-speed, secure and fault-tolerant arithmetic operations. These features make it very attractive for high-performance and fault-tolerant DSP applications. A typical RNS system consists of three main components. The first is a binary-to-residue converter, which computes the RNS equivalent of inputs represented in the binary number system. The second is a set of parallel residue arithmetic units, which perform arithmetic operations on operands already represented in RNS. The third is a residue-to-binary converter, which converts the outputs back into their binary representation. The main aim of this thesis was to propose novel structures for these basic components, to be used as fundamental units in DSP applications. The thesis covers both improving existing structures and designing novel ones, and it verifies their efficiency through simulation and FPGA implementation. In addition, it presents a detailed study of different moduli sets that compares them and determines the most efficient one for different dynamic-range requirements. One of its main outcomes is the derivation and verification of the main condition a moduli set should meet in order to improve the timing performance of a DSP application. An RNS-based image processing application is also proposed. Compared with a binary implementation, it has lower power consumption and a higher maximum operating frequency. Finally, the thesis discusses in detail the main considerations for choosing between the binary number system and RNS for a given application.
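    The three-part structure described above (forward converter, parallel channels, reverse converter) can be sketched in software. This abstract does not fix a moduli set, so the sketch assumes the common set $\{2^n - 1, 2^n, 2^n + 1\}$. It uses a generic Chinese Remainder Theorem reverse converter rather than any converter proposed in the thesis. Results are only correct while they stay below the dynamic range $M = 2^n(2^{2n} - 1)$. Function names are hypothetical, and Python 3.8 or later is needed for math.prod and pow(x, -1, m).

```python
from math import prod


def moduli_set(n):
    """Assumed three-moduli set {2^n - 1, 2^n, 2^n + 1}; pairwise coprime."""
    return ((1 << n) - 1, 1 << n, (1 << n) + 1)


def to_rns(x, moduli):
    """Binary-to-residue (forward) conversion."""
    return tuple(x % m for m in moduli)


def rns_add(xs, ys, moduli):
    # Each channel is independent: no carry passes between residues.
    return tuple((x + y) % m for x, y, m in zip(xs, ys, moduli))


def rns_mul(xs, ys, moduli):
    return tuple((x * y) % m for x, y, m in zip(xs, ys, moduli))


def from_rns(residues, moduli):
    """Residue-to-binary (reverse) conversion via the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)
    return total % big_m


def rns_dot(xs, hs, moduli):
    """Multiply-accumulate (for example one FIR output) computed entirely in RNS."""
    acc = to_rns(0, moduli)
    for x, h in zip(xs, hs):
        term = rns_mul(to_rns(x, moduli), to_rns(h, moduli), moduli)
        acc = rns_add(acc, term, moduli)
    return from_rns(acc, moduli)
```

    With n = 8 the moduli are (255, 256, 257) and the dynamic range is 16,776,960. A product such as 1234 × 5678 = 7,006,652 therefore converts back exactly. In hardware, each channel would be a separate narrow adder or multiplier running in parallel.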