Search CORE

25 research outputs found

Bipartite Modular Multiplication

Author: Kaihara Marcelo E.
Takagi Naofumi
高木直史
Publication venue: Springer
Publication date
Field of study

This paper proposes a new fast method for calculating modular multiplication. The calculation is performed using a new represen- tation of residue classes modulo M that enables the splitting of the multiplier into two parts. These two parts are then processed separately, in parallel, potentially doubling the calculation speed. The upper part and the lower part of the multiplier are processed using the interleaved modular multiplication algorithm and the Montgomery algorithm respectively. Conversions back and forth between the original integer set and the new residue system can be performed at speeds up to twice that of the Montgomery method without the need for precomputed constants. This new method is suitable for both hardware implementation; and software implementation in a multiprocessor environment. Although this paper is focusing on the application of the new method in the integer eld, the technique used to speed up the calculation can also easily be adapted for operation in the binary extended eld GF(2m)

Nagoya Repository

Design And Implementation Of Rsa Cryptosystem Using Partially Interleaved Modular Karatsuba-ofman Multiplier

Author: Arış Ahmet
Publication venue: 'Nara Institute of Science and Technology'
Publication date: 06/01/2013
Field of study

Tez (Yüksek Lisans) -- İstanbul Teknik Üniversitesi, Fen Bilimleri Enstitüsü, 2012Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2012Kriptografinin, yani şifreleme biliminin önemi gün geçtikçe artmaktadır. Kullanılan teknoloji ne olursa olsun güvenli iletişim her zaman en başta gelen ihtiyaçlardan birisi olacaktır. Günümüzde şifreleme, sistemlere kullanıcı hesabıyla giriş yapılıyorken, internetten herhangi bir hizmet ya da ürün satın alınıyorken, iletişim araçları kullanılıyorken, araçların kapıları uzaktan kilitlenip açılıyorken ve daha birçok yerde kullanılmaktadır. Kullanım alanına ve güvenlik ihtiyacının niteliğine göre değişik şifreleme algoritmaları farklı teknolojilerle karşımıza çıkmaktadır. Bu algoritmaların bir tanesi de Rivest-Shamir-Adleman(RSA) şifreleme sistemidir. RSA bankacılık başta olmak üzere birçok sektörde şifreleme ve sayısal imza işlemleri için sıklıkla kullanılmaktadır. RSA algoritması gücünü çok büyük sayıların asal çarpanlarına ayrılmalarındaki zorluktan almaktadır. Özetle çok büyük sayılarla yapılan modüler üs alma işlemlerinden oluşmaktadır. Modüler üs alma işlemleri de özünde modüler çarpma işlemlerinden ibaret olduğu için hızlı bir RSA gerçeklemesi ancak hızlı modüler çarpma işlemi yapan bir tasarımla mümkün olmaktadır. Güvenlik ve hız gibi sebeplerden ötürü RSA kriptosistemi genellikle modüler üs alma işleminin yazılımda gerçeklenmesi ve modüler çarpma işlemlerinin de donanımda tasarlanan özel bloklarla yapılması yoluyla gerçeklenir. Modüler üs alma işlemi için birçok yöntem mevcuttur. Bu yöntemler modüler üs alma işlemi esnasında yapılan modüler çarpma işlemi sayısını değişik yöntemlerle en aza indirmeye çalışırlar. Modüler çarpma işlemi için de bilim dünyasında epeyi çalışmalar yapılmıştır. Modüler çarpma, önce çarpıp sonra indirgeme yapma ya da çarpma ve indirgeme işlemlerini iç-içe yapma gibi iki yöntemle mümkündür. Alan kısıtlamaları sebebiyle çoğunlukla ikinci metot tercih edilmektedir. İndirgeme işleminin uygulanma yönüne göre modüler çarpma algoritmaları soldan sağa doğru işleyenler ve sağdan sola doğru işleyenler olmak üzere ikiye ayrılır. Soldan sağa doğru işleyen yöntemlerin en bilindik örnekleri Blakley ve Barrett algoritmalarıdır. Sağdan sola doğru indirgeme yapan sınıfa ise tek örnek Montgomery yöntemidir. Hızlı çarpma yapan Karatsuba-Ofman, Schönhage-Strassen gibi yöntemler olsa da bu hızlı çarpıcılar indirgeme algoritmalarıyla birleştirilememektedirler. Bu sorun paralel çalışmaya izin vermeyen indirgeme yöntemlerinden kaynaklanmaktadır. Ancak Kaihara ve Takagi tarafından bilim dünyasına sunulan İki Parçalı Modüler çarpma yöntemi bu soruna bir nebze de olsa çözüm bulmaktadır. Bu indirgeme metodu Montgomery algoritmasının sahip olduğu bir özelliği kullanarak modüler çarpmadaki çarpanı ikiye ayırmakta ve böylelikle Blakley ve Montgomery paralel olarak çalışabilmektedir. Hızlı çarpma algoritmalarından birisi olan Karatsuba-Ofman yöntemiyle İki Parçalı indirgemeyi birleştiren ilk çalışma Gökay Saldamlı tarafından ortaya atılmıştır. İki Parçalı Örgü Karatsuba-Ofman çarpıcısı iki parçalı indirgemeyi Karatsuba-Ofman rekürsif çarpma yönteminin en üst katmanında birleştirmektedir. Bu yeni yöntemin gerçeklenmesinde Montgomery çarpıcısına, Blakley modüler çarpıcısına ve standart çarpma yapan bloklara ihtiyaç vardır. Ancak bu algoritma Montgomery ve Blakley modüllerinde literatürdeki daha önceki gerçeklemelerde bulunmayan değerlerin de hesaplanmasına gereksinim duymaktadır. Bu tezde İki Parçalı Modüler Örgü Karatsuba-Ofman çarpıcısının donanımda gerçeklenmesine ait iki çalışma ve bu tasarımlardan birisi kullanılarak gerçeklenen RSA kriptosistemi anlatılmaktadır. Bu çalışmalardan ilki çarpanı birer bit işleyen gerçeklemedir. FPGA teknolojisinde gerçeklenmiştir. İkinci gerçekleme ise ilk tasarımdaki eksikleri kapatan ve daha hızlı bir modüler çarpma için kodlama yöntemleri, daha fazla sayıda bit işleme, donanımın çalışma frekansını arttırmak için en büyük gecikmeye sahip yolu kontrol sinyallerini kullanarak optimize etmeye çalışan bir ASIC gerçeklemesidir. Bu tezdeki donanım tasarımları İki Parçalı Örgü Karatsuba-Ofman çarpma yönteminin ilk gerçeklemeleridirler. Montgomery ve Blakley algoritmaları bu yeni yöntem için yeniden düzenlenmiştir. Tasarımların ikisi için de Maple’da kütüphaneler oluşturulmuş ve iki tasarım da Maple ortamında donanımla aynı yapıda gerçeklenmiş, gerekli testler yapılmış ve simülatörlerden gelen sonuçlarla yazılım gerçeklemesinden gelen sonuçlar karşılaştırılarak tasarımların doğru çalıştığı kanıtlanmıştır. İkinci modüler çarpma gerçeklemesi RSA kriptosistemi içinde modüler çarpıcı olarak kullanılmış ve piyasadaki RSA kırmıklarıyla karşılaştırılabilir sonuçlara ulaşılmıştır.Importance of cryptography is becoming more and more important day by day. Secure communication will always be a crucial need independent from the technology in use. Applications of cryptography can be seen in online banking, purchases of goods and services by means of credit cards, ID cards, remote lock and start systems for cars, telecommunication, and in many more places. Different cryptosystems are employed according to different needs and different security levels. Rivest-Shamir-Adleman(RSA) Algorithm is one of the most popular cryptosystem that is used in many sectors like banking. It takes its strength from factorization of very large integers. Briefly, it consists of modular exponentiations with large integers which are 1024-bit or 2048-bit numbers. As modular exponentiation operations are fundamentally composed of modular multiplications, designing a fast RSA cryptosystem becomes possible only with a fast modular multiplier implementation. Due to security and speed issues, modular exponentiation in RSA is implemented in software and modular multiplications are carried out in modular multipliers which are implemented in hardware. Many methods exist in the literature in order to perform modular exponentiation. These methods try to reduce the total number of multiplications that are required for a modular exponetiation operation. In addition to modular exponentiation algorithms, there are several techniques to do modular multiplication as well. Modular multiplication may be carried out either by multiplying first and performing reduction later on, or by interleaving multiplication and reduction stages. In order to reach compact hardware designs the latter approach is utilized. According to the direction of reduction operation, modular multiplication algorithms may be classified into two groups, which are algorithms reducing from left-to-right and from right-to-left. The most well known examples of left-to-right approach are Blakley and Barrett algorithms, whereas Montgomery multiplication is the only member of the right-to-left modular multiplication. Although fast multiplication algorithms, such as Karatsuba-Ofman(KO), Schönhage-Strassen exist, these multiplication methods can not be interleaved with reduction algorithms. This is caused from the reduction approaches not allowing parallel processing. However, Bipartite Modular Multiplication(BMM) method introduced by Kaihara and Takagi proposes a partial solution to the parallel reduction problem. This reduction method modifies a feature of Montgomery algorithm. This modification separates the multiplier bits into two halves so that a product can be reduced from left and right simultaneously without a dependency issue. The first method which combines Karatsuba-Ofman multiplication with bipartite reduction was proposed by Gökay Saldamlı. His algoritm, namely Partially Interleaved Modular Karatsuba-Ofman Multiplication, interleaves KO multiplier with the bipartite reduction on the uppermost layer of KO’s recursion. Implementation of this new method requires Montgomery multiplier, Blakley multiplier and standard integer multipliers. However, for this approach, both Montgomery and Blakley multipliers are needed to be designed in a way that, they compute not only the modular multiplication result, but also quotient values, what is somewhat different than existing implementations. In this thesis, two hardware implementations of Partially Interleaved Modular KO Multiplication and an RSA implementation utilizing the designed multiplier are proposed. The first design is Radix-2 implementation on FPGA technology. The second hardware implementation explores the design methodologies in order to reach a faster modular multiplier. These design methodologies include employing high radices, optimization of critical path according to the effect of control signals and analyzing parameter dependencies in Partially Interleaved Modular KO Multiplier and scheduling jobs effectively. Improved design was implemented on ASIC technology using 90 nm TSMC standard cell libraries in Design Compiler. Hardware implementations proposed in this thesis are the first hardware implementations of Partially Interleaved Modular KO Multiplication method. Montgomery and Blakley multiplication algorithms were modified in order to produce desired results. Maple libraries which emulate the operation of hardware building blocks were written and both hardware implementations were firstly implemented in Maple. Tests were run with random input vectors and when correct operation of Maple implementations were verified, hardwares were described in VHDL and implemented using FPGA and ASIC design tools respectively. The second implementation of Partially Interleaved Modular KO multiplier was used as a modular multiplier for RSA cryptosystem and very promising results which are comparable with commercial RSA chips were achieved.Yüksek LisansM.Sc

Ulusal Üniversitelerarası Açık Erişim Sistemi - İstanbul Teknik Üniversitesi

High-Performance Ternary (4:2) Compressor Based on Capacitive Threshold Logic

Author: Faghih Mirzaee Reza
Reza Akram
Publication venue: Electronics and Telecommunications Committee
Publication date: 01/01/2017
Field of study

This paper presents a ternary (4:2) compressor, which is an important component in multiplication. However, the structure differs from the binary counterpart since the ternary model does not require carry signals. The method of capacitive threshold logic (CTL) is used to achieve the output signals directly. Unlike the previously presented similar structure, the entire capacitor network is divided into two parts. This segregation results in higher reliability and robustness against unwanted process, voltage, and temperature (PVT) variations. Simulations are performed by HSPICE and 32nm CNFET technology. Simulation results demonstrate about 94% higher performance in terms of power-delay product (PDP) for the new design over the previous one

Biblioteka Nauki - repozytorium artykuÅÃ³w

International Journal of Electronics and Telecommunications (Warsaw University of Technology)

Horner's Rule-Based Multiplication over Fp and Fp^n: A Survey

Author: Beuchat Jean-Luc
Miyoshi Takanori
Muller Jean-Michel
Okamoto Eiji
Publication venue: 'Informa UK Limited'
Publication date: 01/07/2008
Field of study

International audienceThis paper aims at surveying multipliers based on Horner's rule for finite field arithmetic. We present a generic architecture based on five processing elements and introduce a classification of several algorithms based on our model. We provide the readers with a detailed description of each scheme which should allow them to write a VHDL description or a VHDL code generator

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Montgomery Arithmetic from a Software Perspective

Author: Joppe W. Bos
Peter L. Montgomery
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 31/10/2017
Field of study

This chapter describes Peter L. Montgomery\u27s modular multiplication method and the various improvements to reduce the latency for software implementations on devices which have access to many computational units

Cryptology ePrint Archive

CASA: A Compact and Scalable Accelerator for Approximate Homomorphic Encryption

Author: Jiafeng Xie
Pengzhou He
Samira Carolina Oliva Madrigal
Tianyou Bao
Çetin Kaya Koç
Publication venue: Ruhr-Universität Bochum
Publication date: 01/03/2024
Field of study

Approximate arithmetic-based homomorphic encryption (HE) scheme CKKS [CKKS17] is arguably the most suitable one for real-world data-privacy applications due to its wider computation range than other HE schemes such as BGV [BGV14], FV and BFV [Bra12, FV12]. However, the most crucial homomorphic operation of CKKS called key-switching induces a great amount of computational burden in actual deployment situations, and creates scalability challenges for hardware acceleration. In this paper, we present a novel Compact And Scalable Accelerator (CASA) for CKKS on the field-programmable gate array (FPGA) platform. The proposed CASA addresses the aforementioned computational and scalability challenges in homomorphic operations, including key-exchange, homomorphic multiplication, homomorphic addition, and rescaling. On the architecture layer, we propose a new design methodology for efficient acceleration of CKKS. We design this novel hardware architecture by carefully studying the homomorphic operation patterns and data dependency amongst the primitive oracles. The homomorphic operations are efficiently mapped into an accelerator with simple control and smooth operation, which brings benefits for scalable implementation and enhanced pipeline and parallel processing (even with the potential for further improvement). On the component layer, we carry out a detailed and extensive study and present novel micro-architectures for primitive function modules, including memory bank, number theoretic transform (NTT) module, modulus switching bank, and dyadic multiplication and accumulation. On the arithmetic layer, we develop a new partially reduction-free modular arithmetic technique to eliminate part of the reduction cost over different prime moduli within the moduli chain of the Residue Number System (RNS). The proposed structure can support arbitrary numbers of security primes of CKKS during key exchange, which offers better security options for adopting the scalable design methodology. As a proof-of-concept, we implement CASA on the FPGA platform and compare it with state-of-the-art designs. The implementation results showcase the superior performance of the proposed CASA in many aspects such as compact area, scalable architecture, and overall better area-time complexities. In particular, we successfully implement CASA on a mainstream resource-constrained Artix-7 FPGA. To the authors’ best knowledge, this is the first compact CKKS accelerator implemented on an Artix-7 device, e.g., CASA achieves a 10.8x speedup compared with the state-of-the-art CPU implementations (with power consumption of only 5.8%). Considering the power-delay product metric, CASA also achieves 138x and 105x improvement compared with the recent GPU implementation

Directory of Open Access Journals

Montgomery Multiplication Using Vector Instructions

Author: Daniel Shumow
Gregory M. Zaverucha
Joppe W. Bos
Peter L. Montgomery
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 21/08/2013
Field of study

In this paper we present a parallel approach to compute interleaved Montgomery multiplication. This approach is particularly suitable to be computed on 2-way single instruction, multiple data platforms as can be found on most modern computer architectures in the form of vector instruction set extensions. We have implemented this approach for tablet devices which run the x86 architecture (Intel Atom Z2760) using SSE2 instructions as well as devices which run on the ARM platform (Qualcomm MSM8960, NVIDIA Tegra 3 and 4) using NEON instructions. When instantiating modular exponentiation with this parallel version of Montgomery multiplication we observed a performance increase of more than a factor of 1.5 compared to the sequential implementation in OpenSSL for the classical arithmetic logic unit on the Atom platform for 2048-bit moduli

Cryptology ePrint Archive

Efficient Modular Multiplication

Author: Dan Page
Joppe W. Bos
Thorsten Kleinjung
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 10/09/2021
Field of study

This paper is concerned with one of the fundamental building blocks used in modern public-key cryptography: modular multiplication. Speed-ups applied to the modular multiplication algorithm or implementation directly translate in a faster modular exponentiation for RSA or a faster realization of the group law when using elliptic curve cryptography

Cryptology ePrint Archive

A Faster Software Implementation of the Supersingular Isogeny Diffie-Hellman Key Exchange Protocol

Author: Armando Faz-Hernández
Eduardo Ochoa-Jiménez
Francisco Rodríguez-Henríquez
Julio López
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 31/12/2017
Field of study

Since its introduction by Jao and De Feo in 2011, the supersingular isogeny Diffie-Hellman (SIDH) key exchange protocol has positioned itself as a promising candidate for post-quantum cryptography. One salient feature of the SIDH protocol is that it requires exceptionally short key sizes. However, the latency associated to SIDH is higher than the ones reported for other post-quantum cryptosystem proposals. Aiming to accelerate the SIDH runtime performance, we present in this work several algorithmic optimizations targeting both elliptic-curve and field arithmetic operations. We introduce in the context of the SIDH protocol a more efficient approach for calculating the elliptic curve operation P + [k]Q. Our strategy achieves a factor 1.4 speedup compared with the popular variable-three-point ladder algorithm regularly used in the SIDH shared secret phase. Moreover, profiting from pre-computation techniques our algorithm yields a factor 1.7 acceleration for the computation of this operation in the SIDH key generation phase. We also present an optimized evaluation of the point tripling formula, and discuss several algorithmic and implementation techniques that lead to faster field arithmetic computations. A software implementation of the above improvements on an Intel Skylake Core i7-6700 processor gives a factor 1.33 speedup against the state-of-the-art software implementation of the SIDH protocol reported by Costello-Longa-Naehrig in CRYPTO 2016

Cryptology ePrint Archive

Extension and implementation of the mod without mod algorithm to efficiently compute the modulus of a number in hardware

Author: Swann Ryan
Publication venue
Publication date: 01/12/2020
Field of study

This thesis discusses a hardware implementation of modulo that does not require a multiplication. This implementation is based on the algorithm proposed in Mark A. Will's "Mod without mod" in which the an algorithm is presented to calculate the modulus of large values using shifting and adding. This allows our implementation to be comparable in clock cycles to other implementations without the need for a multiplier's delay. This algorithm is compared with others, such as Barret reduction, Montgomery reduction, and fast modular reduction. Our implementation of this modulo algorithm is shown to be faster in many cases. This paper proposes both a hardware implementation of this algorithm as well as synthesis results in soi12s0 45nm IBM Multi-threshold CMOS (MTCMOS) technology and ARM-based standard cells

SHAREOK repository