41 research outputs found

Modular SIMD arithmetic in Mathemagix

Author: Lecerf Grégoire
Quintin Guillaume
van der Hoeven Joris
Publication date: 29/06/2014

Modular integer arithmetic occurs in many algorithms for computer algebra, cryptography, and error-correcting codes. Although recent microprocessors typically offer a wide range of highly optimized arithmetic functions, modular integer operations still require dedicated implementations. In this article, we survey existing algorithms for modular integer arithmetic and present detailed vectorized counterparts. We also present several applications, such as fast modular Fourier transforms and multiplication of integer polynomials and matrices. The vectorized algorithms have been implemented in C++ inside the free computer algebra and analysis system Mathemagix. The performance of our implementation is illustrated by various benchmarks.

arXiv.org e-Print Archive

HAL-UNILIM

HAL-Polytechnique

Hardware Aspects of Montgomery Modular Multiplication

Author: Colin D. Walter
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 21/11/2017

This chapter compares Peter Montgomery's modular multiplication method with traditional techniques for suitability on hardware platforms. It also covers systolic array implementations and side channel leakage.

Cryptology ePrint Archive

Organization of parallel execution of modular multiplication to speed up the computational implementation of public-key cryptography

Author: Boyarshin Igor
Markovskyi Oleksandr
Ostrovska Bogdana
Publication venue: 'Kyiv Politechnic Institute'
Publication date: 01/01/2022

The article theoretically substantiates, investigates, and develops a method for parallel execution of the basic operation of public-key cryptography: modular multiplication of numbers with a high bit count. The method is based on a special organization that divides the components of modular multiplication into independent computational processes. To implement this, the authors propose using Montgomery modular reduction. The described solution is illustrated with numerical examples. It has been theoretically and experimentally proven that the proposed approach to parallelizing modular multiplication makes it possible to speed up this operation, which is important for cryptographic tasks, by a factor of 5 to 6.
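
For readers unfamiliar with the Montgomery modular reduction mentioned in this abstract, the following is a minimal sequential sketch in Python. It is not the parallel method of the article. The function names (`montgomery_setup`, `redc`, `mont_mul`) and the choice R = 2**k are assumptions made for illustration, and the modular inverse via `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n < R, where R = 2**k (illustrative helper)."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than R = 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime ≡ -1 (mod R)
    r2 = (r * r) % n                # R^2 mod n, used to enter Montgomery form
    return r, n_prime, r2


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^(-1) mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> k               # t + m*n is divisible by R, so the shift is exact
    return u - n if u >= n else u      # one conditional subtraction brings u below n


def mont_mul(a, b, n, k):
    """Compute a * b mod n for 0 <= a, b < n by way of Montgomery form."""
    _, n_prime, r2 = montgomery_setup(n, k)
    a_bar = redc(a * r2, n, k, n_prime)         # a * R mod n
    b_bar = redc(b * r2, n, k, n_prime)         # b * R mod n
    c_bar = redc(a_bar * b_bar, n, k, n_prime)  # a * b * R mod n
    return redc(c_bar, n, k, n_prime)           # leave Montgomery form: a * b mod n
```

As a small numerical example, `mont_mul(7, 15, 17, 5)` returns 3, which matches 7 * 15 = 105 ≡ 3 (mod 17). The benefit of this formulation is that the only divisions are by R, a power of two, so they reduce to shifts and masks.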

Electronic Archive of Kyiv Polytechnic Institute

Comparison of Scalable Montgomery Modular Multiplication Implementations Embedded in Reconfigurable Hardware

Author: Drutarovský Milos
Fischer Viktor
Simka Martin
Publication venue: 'Corporacion Universitaria Latinoamericana CUL'
Publication date: 01/01/2006

This paper presents a comparison of possible approaches for an efficient implementation of Multiple-word radix-2 Montgomery Modular Multiplication (MM) on modern Field Programmable Gate Arrays (FPGAs). The hardware implementation of the MM coprocessor is fully scalable, which means that it can be reused to generate long-precision results independently of the word length of the originally proposed coprocessor. The first of the analyzed implementations uses a data path based on the traditionally used redundant carry-save adders; the second exploits standard carry-propagate adders with fast carry chain logic, which have not yet been applied in scalable designs. An embedded soft-core processor, Altera NIOS, is employed as a control unit and as a platform for a purely software implementation. All implementations use large embedded memory blocks available in recent FPGAs. Speed and logic requirements are compared for the optimized software and the combined hardware-software designs in Altera FPGAs. The issues of targeting a design specifically for an FPGA are considered, taking into account the underlying architecture imposed by the target FPGA technology. It is shown that the coprocessors based on carry-save adders and carry-propagate adders provide comparable results in constrained FPGA implementations, but the carry-propagate solution requires less embedded memory and provides some additional implementation advantages presented in the paper.
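
To make the term "radix-2 Montgomery multiplication" concrete, here is a minimal bit-serial sketch in Python. It is the plain single-precision textbook form, not the multiple-word scalable architecture compared in the paper, and it models neither carry-save nor carry-propagate adders. The function name `mont_mul_radix2` and its parameters are assumptions for illustration only.

```python
def mont_mul_radix2(x, y, n, k):
    """Return x * y * 2^(-k) mod n for odd n < 2**k and 0 <= x, y < n.

    One bit of x is consumed per iteration, which is what radix-2 refers to.
    """
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    s = 0
    for i in range(k):
        x_i = (x >> i) & 1  # i-th bit of x, least significant first
        s += x_i * y        # conditionally add the multiplicand
        if s & 1:           # if s is odd, adding the odd modulus makes it even
            s += n
        s >>= 1             # exact division by 2; s stays below 2 * n
    return s - n if s >= n else s  # final conditional subtraction
```

For example, `mont_mul_radix2(5, 7, 13, 4)` returns 3, which equals 5 * 7 * 16^(-1) mod 13. Each iteration needs only additions, a parity test, and a one-bit shift, which is why this form maps naturally onto adder-based hardware data paths.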

HAL-UJM