Fast Quantum Modular Exponentiation
We present a detailed analysis of the impact on modular exponentiation of
architectural features and possible concurrent gate execution. Various
arithmetic algorithms are evaluated for execution time, potential concurrency,
and space tradeoffs. We find that, to exponentiate an n-bit number, for storage
space 100n (twenty times the minimum 5n), we can execute modular exponentiation
two hundred to seven hundred times faster than optimized versions of the basic
algorithms, depending on architecture, for n=128. Addition on a neighbor-only
architecture is limited to O(n) time, whereas non-neighbor architectures can reach
O(log n), demonstrating that physical characteristics of a computing device
have an important impact on both real-world running time and asymptotic
behavior. Our results will help guide experimental implementations of quantum
algorithms and devices.
Comment: to appear in PRA 71(5); RevTeX, 12 pages, 12 figures; v2 revision is
substantial, with new algorithmic variants, much shorter and clearer text,
and revised equation formatting
A VLSI Array Architecture for Realization of DFT, DHT, DCT and DST
A unified array architecture is described for computation of the DFT, DHT, DCT and DST using a modified CORDIC (CoOrdinate Rotation DIgital Computer) arithmetic unit as the basic Processing Element (PE). All four transforms can be computed by simple rearrangement of the input samples. Compared to five other existing architectures, this one is faster in terms of both latency and throughput. Moreover, the simple local-neighborhood interprocessor connections make it convenient for VLSI implementation. The architecture can be extended to compute transforms of longer length by judiciously cascading modules of shorter transform length, which makes it suitable for Wafer Scale Integration (WSI). The CORDIC unit is designed using Transmission Gate Logic (TGL) in a sea-of-gates semicustom environment. Simulation results show that this architecture may be a suitable candidate for low-power/low-voltage applications.
Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath
Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying datapath or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This conventional (n×n→2n)-bit multiplication is then used as a "sub-routine" to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area.
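To illustrate how word-level multiplications can serve as the building block for a multi-word Montgomery product, here is a minimal Python sketch of textbook CIOS. It is not the paper's hardware design. The function name `mont_mul_cios` and the parameters `w` (word size in bits) and `s` (number of words) are assumptions made for illustration.

```python
def mont_mul_cios(a, b, n, w, s):
    """Return a * b * R^-1 mod n with R = 2^(w*s), using w-bit words.

    Assumes n is odd, n < 2^(w*s), and 0 <= a, b < n (illustrative sketch).
    """
    mask = (1 << w) - 1
    # n' = -n^-1 mod 2^w, the per-word Montgomery constant
    n_prime = (-pow(n, -1, 1 << w)) & mask
    a_w = [(a >> (w * i)) & mask for i in range(s)]
    b_w = [(b >> (w * i)) & mask for i in range(s)]
    n_w = [(n >> (w * i)) & mask for i in range(s)]

    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += a * b_w[i], built from (w x w -> 2w) products
        carry = 0
        for j in range(s):
            cs = t[j] + a_w[j] * b_w[i] + carry
            t[j], carry = cs & mask, cs >> w
        cs = t[s] + carry
        t[s], t[s + 1] = cs & mask, cs >> w

        # Reduction step: add m * n so the low word becomes zero, then shift
        m = (t[0] * n_prime) & mask
        carry = (t[0] + m * n_w[0]) >> w
        for j in range(1, s):
            cs = t[j] + m * n_w[j] + carry
            t[j - 1], carry = cs & mask, cs >> w
        cs = t[s] + carry
        t[s - 1] = cs & mask
        t[s] = t[s + 1] + (cs >> w)

    result = sum(t[j] << (w * j) for j in range(s + 1))
    # Final conditional subtraction brings the result into [0, n)
    return result - n if result >= n else result
```

With `w = 8` and `s = 2`, for example, a 16-bit Montgomery product is assembled entirely from 8-bit by 8-bit multiplications, which loosely mirrors the idea of reusing a fixed-precision multiplier for longer operands.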
Efficient modular arithmetic units for low power cryptographic applications
The demand for high security in energy-constrained devices such as mobile phones and PDAs is growing rapidly. This creates a need for efficient designs of cryptographic algorithms that offer data integrity, authentication, non-repudiation and confidentiality of the encrypted data and communication channels. Public-key cryptography is an ideal choice for data integrity, authentication and non-repudiation, whereas private-key cryptography ensures the confidentiality of the data transmitted. The latter has an extremely high encryption speed, but it has limitations that make it unsuitable for certain applications. Numerous public-key cryptographic algorithms in the literature are built from modular arithmetic modules such as modular addition, multiplication, inversion and exponentiation. Recently, numerous cryptographic algorithms based on modular arithmetic have been proposed that are scalable, perform word-based operations, and are efficient in various respects. The modular arithmetic modules play a crucial role in the overall performance of the cryptographic processor. Hence, better results can be obtained by designing efficient arithmetic modules such as modular addition, multiplication, exponentiation and squaring. This thesis is organized into three papers. The first describes the efficient implementation of modular arithmetic units and the application of these modules in the International Data Encryption Algorithm (IDEA). The second paper describes the implementation of the IDEA algorithm using existing techniques and using the proposed efficient modular units. The third paper describes the fault-tolerant design of a modular unit with online self-checking capability --Abstract, page iv
The use of reversible logic gates in the design of residue number systems
Reversible computing is an emerging technique for achieving ultra-low-power circuits. Reversible arithmetic circuits make energy-efficient, high-performance computational systems possible. Residue number systems (RNS) provide parallel and fault-tolerant additions and multiplications without carry propagation between residue digits. The parallelism and fault-tolerance features of RNS can be leveraged to achieve high-performance reversible computing. This paper proposes fully reversible RNS circuits, including forward converters, modular adders and multipliers, and reverse converters, for a class of RNS moduli sets with the composite form {2^k, 2^p-1}. Modulo 2^n-1, 2^n, and 2^n+1 adders and multipliers were designed using reversible gates. In addition, reversible forward and reverse converters for the 3-moduli set {2^n-1, 2^(n+k), 2^n+1} have been designed. The proposed RNS-based reversible computing approach has been applied to consecutive multiplications, with an improvement of more than 15% in quantum cost after the twelfth iteration and more than 27% in quantum depth after the ninth iteration. The findings show that using the proposed RNS-based reversible computing in convolution yields a significant improvement in quantum depth compared with conventional methods based on weighted binary adders and multipliers.
Residue Number System Based Building Blocks for Applications in Digital Signal Processing
This doctoral thesis deals with designing residue number system based building blocks to enhance the performance of digital signal processing applications. The residue number system (RNS) is a non-weighted number system that provides carry-free, parallel, high-speed, secure and fault-tolerant arithmetic operations. These features make it very attractive for high-performance and fault-tolerant digital signal processing (DSP) applications. A typical RNS system consists of three main components. The first is the binary-to-residue converter, which computes the RNS equivalent of inputs represented in the binary number system. The second is a set of parallel residue arithmetic units that perform arithmetic operations on operands already represented in RNS. The last is the residue-to-binary converter, which converts the outputs back into their binary representation. The main aim of this thesis was to propose novel structures for the basic components of this system that can later be used as fundamental units in DSP applications. The thesis covers the improvement and design of novel structures for these components, their simulation, and the verification of their efficiency via FPGA implementation. In addition to suggesting novel structures for the basic RNS components, it presents a detailed study of different moduli sets that compares them and determines the most efficient one for different dynamic range requirements. One of the main outcomes of this thesis is the derivation and verification of the main condition that should be met when choosing a moduli set in order to improve the timing performance of a DSP application. An RNS-based image processing application is also proposed, which has lower power consumption and a higher maximum operating frequency than a conventional binary implementation.
Finally, the main considerations that should be taken into account when choosing between the binary number system and RNS are discussed in detail.
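The three RNS components described in this abstract (binary-to-residue conversion, parallel residue arithmetic, residue-to-binary conversion) can be sketched in a few lines of Python. This is an illustration under assumptions, not the thesis's design: the moduli set (15, 16, 17), i.e. {2^n-1, 2^n, 2^n+1} with n = 4, and all function names are hypothetical, and the reverse converter uses the plain Chinese Remainder Theorem rather than any optimized hardware structure.

```python
from math import prod

# Assumed example moduli set {2^n - 1, 2^n, 2^n + 1} with n = 4 (pairwise coprime)
MODULI = (15, 16, 17)
M = prod(MODULI)  # dynamic range: values 0 .. M-1 are uniquely representable


def to_rns(x):
    """Forward (binary-to-residue) conversion: one residue per modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Channel-wise addition; no carry passes between residue digits."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Channel-wise multiplication; each channel is independent."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Reverse (residue-to-binary) conversion via the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        m_hat = M // m                            # product of the other moduli
        total += r * m_hat * pow(m_hat, -1, m)    # pow(..., -1, m): modular inverse
    return total % M
```

For instance, `from_rns(rns_mul(to_rns(37), to_rns(53)))` recovers 37 * 53 as long as the product stays below the dynamic range M = 4080; larger results wrap around modulo M.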