Search CORE

48 research outputs found

Radix-8 Booth Encoded Modulo Multiplier

Author: Anuradha K
Rupesh Kumar Penugonda
Thammishety Narasimharao
Publication venue
Publication date: 24/04/2020
Field of study

Abstract To design an efficient integrated circuit in terms of area, power and speed, has become a challenging task in modern VLSI design field. The encryption and decryption of PKC algorithms are performed by repeated modulo multiplications these multiplications differ from those encountered in signal processing and general computing applications. The Residue Number System (RNS) has emerged as a promising alternative number representation for the design of faster and low power multipliers owing to its merit to distribute a long integer multiplication into several shorter and independent modulo multiplications. The multipliers are the essential elements of the digital signal processing such as filtering, convolution, transformations and Inner products. RNS has also been successfully employed to design fault tolerant digital circuits. The modulo multiplier is usually the noncritical data path among all modulo multipliers in such high-DR RNS multiplier. This timing slack can be exploited to reduce the system area and power consumption without compromising the system performance. With this precept, a family of radix-8 Booth encoded modulo multipliers, with delay adaptable to the RNS multiplier delay, is proposed. In this paper, the radix-8 Booth encoded modulo multipliers whose delay can be tuned to match the RNS delay. In the proposed multiplier, the hard multiple is implemented using small word-length ripple carry adders (RCAs) operating in parallel. The carry-out bits from the adders are not propagated but treated as partial product bits to be accumulated in the CSA tree. The delay of the modulo multiplier can be directly controlled by the word-length of the RCAs to equal the delay of the critical modulo multiplier of the RNS. By combining radix-8 Booth encoded modulo multiplier, CSA and prefix architecture of multiplier, for high speed and low-power is achieved

CiteSeerX

Investigating the VLSI Characterization of Parallel Signed Multipliers for RNS Applications Using FPGAs

Author: Elango S
Gayathri K
Pradeep N
Sampath P
Publication venue: Global Journals Inc. (US)
Publication date: 15/01/2015
Field of study

Signed multiplication is a complex arithmetic operation, which is reflected in its relatively high signal propagation delay, high power dissipation, and large area requirement. High reliability applications such as Cryptography, Residue Number System (RNS) and Digital Signal Processing (DSP)2019;s effective performance is mainly depend on its arithmetic circuit's performance. Trend of using Residue Number System (RNS) instead of Constrain over-whelming Binary representation is promising technique in VLSI Systems and Multiplier is the basic building block of such systems. In this paper we have considered signed Modified Baugh Wooley Multiplier and Modified Booth Encoding (MBE) Multiplier logic for analysis and synthesized on best suited application platform. Analysis has taken account of Delay, Number of Logic Element requirements; Number of Signal Transition for particular sample input and its Power Consumption were analyzed for both Modified Baugh Wooley Multiplier and Modified Booth Encoding Multiplier. Analysis of Multiplier is described in Verilog HDL and Simulated using two different simulators namely Xilinx ISIM and Altera Quartus II. Then for comparative study, both multipliers are synthesized with Xilinx Virtex 7 XCV2000T-2FLG1925 and Altera Cyclone II EP2C35F672C6 and same parameter as discussed above are also evaluated. Booth Recoding provides overall advent of 9.691% in terms of area and approximately 43 % in terms of Delay compared to Modified Baugh Wooley Multiplier implemented using FPGA Technology

Global Journal of Computer Science and Technology (GJCST)

Efficient Computation and FPGA implementation of Fully Homomorphic Encryption with Cloud Computing Significance

Author: Zeng Qiang
Publication venue: 'University of Windsor Leddy Library'
Publication date: 20/12/2018
Field of study

Homomorphic Encryption provides unique security solution for cloud computing. It ensures not only that data in cloud have confidentiality but also that data processing by cloud server does not compromise data privacy. The Fully Homomorphic Encryption (FHE) scheme proposed by Lopez-Alt, Tromer, and Vaikuntanathan (LTV), also known as NTRU(Nth degree truncated polynomial ring) based method, is considered one of the most important FHE methods suitable for practical implementation. In this thesis, an efficient algorithm and architecture for LTV Fully Homomorphic Encryption is proposed. Conventional linear feedback shift register (LFSR) structure is expanded and modified for performing the truncated polynomial ring multiplication in LTV scheme in parallel. Novel and efficient modular multiplier, modular adder and modular subtractor are proposed to support high speed processing of LFSR operations. In addition, a family of special moduli are selected for high speed computation of modular operations. Though the area keeps the complexity of O(Nn^2) with no advantage in circuit level. The proposed architecture effectively reduces the time complexity from O(N log N) to linear time, O(N), compared to the best existing works. An FPGA implementation of the proposed architecture for LTV FHE is achieved and demonstrated. An elaborate comparison of the existing methods and the proposed work is presented, which shows the proposed work gains significant speed up over existing works

Scholarship at UWindsor

Design And Implementation Of Rsa Cryptosystem Using Partially Interleaved Modular Karatsuba-ofman Multiplier

Author: Arış Ahmet
Publication venue: 'Nara Institute of Science and Technology'
Publication date: 06/01/2013
Field of study

Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2012Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2012Kriptografinin, yani şifreleme biliminin önemi gün geçtikçe artmaktadır. Kullanılan teknoloji ne olursa olsun güvenli iletişim her zaman en başta gelen ihtiyaçlardan birisi olacaktır. Günümüzde şifreleme, sistemlere kullanıcı hesabıyla giriş yapılıyorken, internetten herhangi bir hizmet ya da ürün satın alınıyorken, iletişim araçları kullanılıyorken, araçların kapıları uzaktan kilitlenip açılıyorken ve daha birçok yerde kullanılmaktadır. Kullanım alanına ve güvenlik ihtiyacının niteliğine göre değişik şifreleme algoritmaları farklı teknolojilerle karşımıza çıkmaktadır. Bu algoritmaların bir tanesi de Rivest-Shamir-Adleman(RSA) şifreleme sistemidir. RSA bankacılık başta olmak üzere birçok sektörde şifreleme ve sayısal imza işlemleri için sıklıkla kullanılmaktadır. RSA algoritması gücünü çok büyük sayıların asal çarpanlarına ayrılmalarındaki zorluktan almaktadır. Özetle çok büyük sayılarla yapılan modüler üs alma işlemlerinden oluşmaktadır. Modüler üs alma işlemleri de özünde modüler çarpma işlemlerinden ibaret olduğu için hızlı bir RSA gerçeklemesi ancak hızlı modüler çarpma işlemi yapan bir tasarımla mümkün olmaktadır. Güvenlik ve hız gibi sebeplerden ötürü RSA kriptosistemi genellikle modüler üs alma işleminin yazılımda gerçeklenmesi ve modüler çarpma işlemlerinin de donanımda tasarlanan özel bloklarla yapılması yoluyla gerçeklenir. Modüler üs alma işlemi için birçok yöntem mevcuttur. Bu yöntemler modüler üs alma işlemi esnasında yapılan modüler çarpma işlemi sayısını değişik yöntemlerle en aza indirmeye çalışırlar. Modüler çarpma işlemi için de bilim dünyasında epeyi çalışmalar yapılmıştır. Modüler çarpma, önce çarpıp sonra indirgeme yapma ya da çarpma ve indirgeme işlemlerini iç-içe yapma gibi iki yöntemle mümkündür. Alan kısıtlamaları sebebiyle çoğunlukla ikinci metot tercih edilmektedir. İndirgeme işleminin uygulanma yönüne göre modüler çarpma algoritmaları soldan sağa doğru işleyenler ve sağdan sola doğru işleyenler olmak üzere ikiye ayrılır. Soldan sağa doğru işleyen yöntemlerin en bilindik örnekleri Blakley ve Barrett algoritmalarıdır. Sağdan sola doğru indirgeme yapan sınıfa ise tek örnek Montgomery yöntemidir. Hızlı çarpma yapan Karatsuba-Ofman, Schönhage-Strassen gibi yöntemler olsa da bu hızlı çarpıcılar indirgeme algoritmalarıyla birleştirilememektedirler. Bu sorun paralel çalışmaya izin vermeyen indirgeme yöntemlerinden kaynaklanmaktadır. Ancak Kaihara ve Takagi tarafından bilim dünyasına sunulan İki Parçalı Modüler çarpma yöntemi bu soruna bir nebze de olsa çözüm bulmaktadır. Bu indirgeme metodu Montgomery algoritmasının sahip olduğu bir özelliği kullanarak modüler çarpmadaki çarpanı ikiye ayırmakta ve böylelikle Blakley ve Montgomery paralel olarak çalışabilmektedir. Hızlı çarpma algoritmalarından birisi olan Karatsuba-Ofman yöntemiyle İki Parçalı indirgemeyi birleştiren ilk çalışma Gökay Saldamlı tarafından ortaya atılmıştır. İki Parçalı Örgü Karatsuba-Ofman çarpıcısı iki parçalı indirgemeyi Karatsuba-Ofman rekürsif çarpma yönteminin en üst katmanında birleştirmektedir. Bu yeni yöntemin gerçeklenmesinde Montgomery çarpıcısına, Blakley modüler çarpıcısına ve standart çarpma yapan bloklara ihtiyaç vardır. Ancak bu algoritma Montgomery ve Blakley modüllerinde literatürdeki daha önceki gerçeklemelerde bulunmayan değerlerin de hesaplanmasına gereksinim duymaktadır. Bu tezde İki Parçalı Modüler Örgü Karatsuba-Ofman çarpıcısının donanımda gerçeklenmesine ait iki çalışma ve bu tasarımlardan birisi kullanılarak gerçeklenen RSA kriptosistemi anlatılmaktadır. Bu çalışmalardan ilki çarpanı birer bit işleyen gerçeklemedir. FPGA teknolojisinde gerçeklenmiştir. İkinci gerçekleme ise ilk tasarımdaki eksikleri kapatan ve daha hızlı bir modüler çarpma için kodlama yöntemleri, daha fazla sayıda bit işleme, donanımın çalışma frekansını arttırmak için en büyük gecikmeye sahip yolu kontrol sinyallerini kullanarak optimize etmeye çalışan bir ASIC gerçeklemesidir. Bu tezdeki donanım tasarımları İki Parçalı Örgü Karatsuba-Ofman çarpma yönteminin ilk gerçeklemeleridirler. Montgomery ve Blakley algoritmaları bu yeni yöntem için yeniden düzenlenmiştir. Tasarımların ikisi için de Maple’da kütüphaneler oluşturulmuş ve iki tasarım da Maple ortamında donanımla aynı yapıda gerçeklenmiş, gerekli testler yapılmış ve simülatörlerden gelen sonuçlarla yazılım gerçeklemesinden gelen sonuçlar karşılaştırılarak tasarımların doğru çalıştığı kanıtlanmıştır. İkinci modüler çarpma gerçeklemesi RSA kriptosistemi içinde modüler çarpıcı olarak kullanılmış ve piyasadaki RSA kırmıklarıyla karşılaştırılabilir sonuçlara ulaşılmıştır.Importance of cryptography is becoming more and more important day by day. Secure communication will always be a crucial need independent from the technology in use. Applications of cryptography can be seen in online banking, purchases of goods and services by means of credit cards, ID cards, remote lock and start systems for cars, telecommunication, and in many more places. Different cryptosystems are employed according to different needs and different security levels. Rivest-Shamir-Adleman(RSA) Algorithm is one of the most popular cryptosystem that is used in many sectors like banking. It takes its strength from factorization of very large integers. Briefly, it consists of modular exponentiations with large integers which are 1024-bit or 2048-bit numbers. As modular exponentiation operations are fundamentally composed of modular multiplications, designing a fast RSA cryptosystem becomes possible only with a fast modular multiplier implementation. Due to security and speed issues, modular exponentiation in RSA is implemented in software and modular multiplications are carried out in modular multipliers which are implemented in hardware. Many methods exist in the literature in order to perform modular exponentiation. These methods try to reduce the total number of multiplications that are required for a modular exponetiation operation. In addition to modular exponentiation algorithms, there are several techniques to do modular multiplication as well. Modular multiplication may be carried out either by multiplying first and performing reduction later on, or by interleaving multiplication and reduction stages. In order to reach compact hardware designs the latter approach is utilized. According to the direction of reduction operation, modular multiplication algorithms may be classified into two groups, which are algorithms reducing from left-to-right and from right-to-left. The most well known examples of left-to-right approach are Blakley and Barrett algorithms, whereas Montgomery multiplication is the only member of the right-to-left modular multiplication. Although fast multiplication algorithms, such as Karatsuba-Ofman(KO), Schönhage-Strassen exist, these multiplication methods can not be interleaved with reduction algorithms. This is caused from the reduction approaches not allowing parallel processing. However, Bipartite Modular Multiplication(BMM) method introduced by Kaihara and Takagi proposes a partial solution to the parallel reduction problem. This reduction method modifies a feature of Montgomery algorithm. This modification separates the multiplier bits into two halves so that a product can be reduced from left and right simultaneously without a dependency issue. The first method which combines Karatsuba-Ofman multiplication with bipartite reduction was proposed by Gökay Saldamlı. His algoritm, namely Partially Interleaved Modular Karatsuba-Ofman Multiplication, interleaves KO multiplier with the bipartite reduction on the uppermost layer of KO’s recursion. Implementation of this new method requires Montgomery multiplier, Blakley multiplier and standard integer multipliers. However, for this approach, both Montgomery and Blakley multipliers are needed to be designed in a way that, they compute not only the modular multiplication result, but also quotient values, what is somewhat different than existing implementations. In this thesis, two hardware implementations of Partially Interleaved Modular KO Multiplication and an RSA implementation utilizing the designed multiplier are proposed. The first design is Radix-2 implementation on FPGA technology. The second hardware implementation explores the design methodologies in order to reach a faster modular multiplier. These design methodologies include employing high radices, optimization of critical path according to the effect of control signals and analyzing parameter dependencies in Partially Interleaved Modular KO Multiplier and scheduling jobs effectively. Improved design was implemented on ASIC technology using 90 nm TSMC standard cell libraries in Design Compiler. Hardware implementations proposed in this thesis are the first hardware implementations of Partially Interleaved Modular KO Multiplication method. Montgomery and Blakley multiplication algorithms were modified in order to produce desired results. Maple libraries which emulate the operation of hardware building blocks were written and both hardware implementations were firstly implemented in Maple. Tests were run with random input vectors and when correct operation of Maple implementations were verified, hardwares were described in VHDL and implemented using FPGA and ASIC design tools respectively. The second implementation of Partially Interleaved Modular KO multiplier was used as a modular multiplier for RSA cryptosystem and very promising results which are comparable with commercial RSA chips were achieved.Yüksek LisansM.Sc

Ulusal Üniversitelerarası Açık Erişim Sistemi - İstanbul Teknik Üniversitesi

A high-performance inner-product processor for real and complex numbers.

Author: Wang Guoping.
Publication venue
Publication date: 01/01/2003
Field of study

A novel, high-performance fixed-point inner-product processor based on a redundant binary number system is investigated in this dissertation. This scheme decreases the number of partial products to 50%, while achieving better speed and area performance, as well as providing pipeline extension opportunities. When modified Booth coding is used, partial products are reduced by almost 75%, thereby significantly reducing the multiplier addition depth. The design is applicable for digital signal and image processing applications that require real and/or complex numbers inner-product arithmetic, such as digital filters, correlation and convolution. This design is well suited for VLSI implementation and can also be embedded as an inner-product core inside a general purpose or DSP FPGA-based processor. Dynamic control of the computing structure permits different computations, such as a variety of inner-product real and complex number computations, parallel multiplication for real and complex numbers, and real and complex number division. The same structure can also be controlled to accept redundant binary number inputs for multiplication and inner-product computations. An improved 2's-complement to redundant binary converter is also presented

SHAREOK repository

Design and modeling of high speed modulo multipliers for cryptosystems

Author: Mahmoud Muhammad Yahya Imam
Publication venue
Publication date: 01/05/2004
Field of study

KFUPM ePrints

Design and modeling of high speed modulo multipliers for cryptosystems

Author: Mahmoud Muhammad Yahya Imam
Publication venue
Publication date: 01/05/2004
Field of study

Recommended from our members

Formal Analysis of Arithmetic Circuits using Computer Algebra - Verification, Abstraction and Reverse Engineering

Author: YU CUNXI
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/11/2017
Field of study

Despite a considerable progress in verification and abstraction of random and control logic, advances in formal verification of arithmetic designs have been lagging. This can be attributed mostly to the difficulty in an efficient modeling of arithmetic circuits and datapaths without resorting to computationally expensive Boolean methods, such as Binary Decision Diagrams (BDDs) and Boolean Satisfiability (SAT), that require “bit blasting”, i.e., flattening the design to a bit-level netlist. Approaches that rely on computer algebra and Satisfiability Modulo Theories (SMT) methods are either too abstract to handle the bit-level nature of arithmetic designs or require solving computationally expensive decision or satisfiability problems. The work proposed in this thesis aims at overcoming the limitations of analyzing arithmetic circuits, specifically at the post-synthesized phase. It addresses the verification, abstraction and reverse engineering problems of arithmetic circuits at an algebraic level, treating an arithmetic circuit and its specification as a properly constructed algebraic system. The proposed technique solves these problems by function extraction, i.e., by deriving arithmetic function computed by the circuit from its low-level circuit implementation using computer algebraic rewriting technique. The proposed techniques work on large integer arithmetic circuits and finite field arithmetic circuits, up to 512-bit wide containing millions of logic gates

ScholarWorks@UMass Amherst

Low-cost, low-power FPGA implementation of ED25519 and CURVE25519 point multiplication

Author: Doche Christophe
Mehrabi Ali (R20330)
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

Twisted Edwards curves have been at the center of attention since their introduction by Bernstein et al. in 2007. The curve ED25519, used for Edwards-curve Digital Signature Algorithm (EdDSA), provides faster digital signatures than existing schemes without sacrificing security. The CURVE25519 is a Montgomery curve that is closely related to ED25519. It provides a simple, constant time, and fast point multiplication, which is used by the key exchange protocol X25519. Software implementations of EdDSA and X25519 are used in many web-based PC and Mobile applications. In this paper, we introduce a low-power, low-area FPGA implementation of the ED25519 and CURVE25519 scalar multiplication that is particularly relevant for Internet of Things (IoT) applications. The efficiency of the arithmetic modulo the prime number 2 255 − 19, in particular the modular reduction and modular multiplication, are key to the efficiency of both EdDSA and X25519. To reduce the complexity of the hardware implementation, we propose a high-radix interleaved modular multiplication algorithm. One benefit of this architecture is to avoid the use of large-integer multipliers relying on FPGA DSP modules

Western Sydney ResearchDirect

Algorithms and VLSI architectures for parametric additive synthesis

Author: Spanier Jonathan Robert
Publication venue
Publication date: 01/01/1999
Field of study

A parametric additive synthesis approach to sound synthesis is advantageous as it can model sounds in a large scale manner, unlike the classical sinusoidal additive based synthesis paradigms. It is known that a large body of naturally occurring sounds are resonant in character and thus fit the concept well. This thesis is concerned with the computational optimisation of a super class of form ant synthesis which extends the sinusoidal parameters with a spread parameter known as band width. Here a modified formant algorithm is introduced which can be traced back to work done at IRCAM, Paris. When impulse driven, a filter based approach to modelling a formant limits the computational work-load. It is assumed that the filter's coefficients are fixed at initialisation, thus avoiding interpolation which can cause the filter to become chaotic. A filter which is more complex than a second order section is required. Temporal resolution of an impulse generator is achieved by using a two stage polyphase decimator which drives many filterbanks. Each filterbank describes one formant and is composed of sub-elements which allow variation of the formant’s parameters. A resource manager is discussed to overcome the possibility of all sub- banks operating in unison. All filterbanks for one voice are connected in series to the impulse generator and their outputs are summed and scaled accordingly. An explorative study of number systems for DSP algorithms and their architectures is investigated. I invented a new theoretical mechanism for multi-level logic based DSP. Its aims are to reduce the number of transistors and to increase their functionality. A review of synthesis algorithms and VLSI architectures are discussed in a case study between a filter based bit-serial and a CORDIC based sinusoidal generator. They are both of similar size, but the latter is always guaranteed to be stable

Durham e-Theses