
768 research outputs found

Maximizing the Efficiency using Montgomery Multipliers on FPGA in RSA Cryptography for Wireless Sensor Networks

Author: Leelavathi G, Shaila K, Venugopal K R
Publication venue: Auricle Global Society of Education and Research
Publication date: 30/11/2017

The architecture and modeling of RSA public-key encryption/decryption systems are presented in this work. Two architectures are proposed, mMMM42 (modified Montgomery Modular Multiplier with a 4-to-2 carry-save architecture) and RSACIPHER128, and their suitability for implementation in wireless sensor nodes, and hence for use in Wireless Sensor Networks, is examined. The design can easily be fitted into systems that require different levels of security by changing the key size. The processing time is increased and space utilization is reduced in the FPGA due to its reusability. VHDL code is synthesized and simulated using Xilinx ISE for both architectures. The architectures are compared in terms of area and time. It is verified that this architecture supports a key size of 128 bits. The implementation of the RSA encryption/decryption algorithm on FPGA using 128-bit data and key size with RSACIPHER128 gives good results, with 50% less hardware utilization. This design is also implemented for ASIC using Mentor Graphics.

International Journal on Future Revolution in Computer Science & Communication Engineering
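
As an illustration of the Montgomery multiplication named in the entry above, here is a minimal software sketch in Python. It is not the mMMM42 carry-save hardware architecture from the abstract. The function names (`montgomery_setup`, `mont_mul`, `mont_pow`), the choice R = 2^k, and the default k = 128 are assumptions made for illustration only.

```python
def montgomery_setup(n: int, k: int):
    """Precompute constants for an odd modulus n < 2**k, with R = 2**k."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R); needs Python 3.8+
    r2 = (r * r) % n                # R^2 mod n, used to enter Montgomery form
    return n_prime, r2


def mont_mul(a: int, b: int, n: int, k: int, n_prime: int) -> int:
    """Return a * b * R^(-1) mod n for 0 <= a, b < n (Montgomery reduction)."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k               # exact division by R, result < 2n
    return u - n if u >= n else u


def mont_pow(base: int, exp: int, n: int, k: int = 128) -> int:
    """Left-to-right square-and-multiply using Montgomery multiplication."""
    n_prime, r2 = montgomery_setup(n, k)
    x = mont_mul(base % n, r2, n, k, n_prime)  # base in Montgomery form
    acc = mont_mul(1, r2, n, k, n_prime)       # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)
    return mont_mul(acc, 1, n, k, n_prime)     # leave Montgomery form
```

With a toy modulus such as n = 3233 and exponent 17, `mont_pow(65, 17, 3233)` should agree with Python's built-in `pow(65, 17, 3233)`. The reduction replaces division by n with shifts and masks by a power of two, which is the property that makes the method attractive in hardware.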

Customisable arithmetic hardware designs

Author: Cheung Chak-Chung Ray
Publication date: 01/01/2007

Imperial Users onl

Spiral - Imperial College Digital Repository

A Vector Approach to Cryptography Implementation

Author: Fournier Jacques Jean-Alain
Simon Moore
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/10/2005

The current deployment of Digital Rights Management (DRM) schemes to distribute protected contents and rights is leading the way to massive use of sophisticated embedded cryptographic applications. Embedded microprocessors have been equipped with bulky and power-consuming co-processors designed to suit particular data sizes. However, flexible cryptographic platforms are more desirable than devices dedicated to a particular cryptographic algorithm, as the increasing cost of chip fabrication favors large-volume production. This paper proposes a novel approach to embedded cryptography based on a vector-based general-purpose machine capable of implementing a range of cryptographic algorithms. We show that vector processing ideas can be used to perform cryptography in an efficient manner, which we believe is appropriate for high-performance, flexible and power-efficient embedded systems.

HAL-EMSE

Optimization of Supersingular Isogeny Cryptography for Deeply Embedded Systems

Author: Calhoun Jeffrey Denton
Publication venue: UNM Digital Repository
Publication date: 11/05/2018

Public-key cryptography in use today can be broken by a quantum computer with sufficient resources. Microsoft Research has published an open-source library of quantum-secure supersingular isogeny (SI) algorithms including Diffie-Hellman key agreement and key encapsulation in portable C and optimized x86 and x64 implementations. For our research, we modified this library to target a deeply-embedded processor with instruction set extensions and a finite-field coprocessor originally designed to accelerate traditional elliptic curve cryptography (ECC). We observed a 6.3-7.5x improvement over a portable C implementation using instruction set extensions and a further 6.0-6.1x improvement with the addition of the coprocessor. Modification of the coprocessor to a wider datapath further increased performance 2.6-2.9x. Our results show that current traditional ECC implementations can be easily refactored to use supersingular elliptic curve arithmetic and achieve post-quantum security

Accelerating RSA Public Key Cryptography via Hardware Acceleration

Author: Ramesh Pavithra
Publication venue: ScholarWorks@UMass Amherst
Publication date: 10/04/2020

A large number and variety of sensors and actuators, also known as edge devices of the Internet of Things, belonging to various industries (health care monitoring, home automation, industrial automation) have become prevalent in today's world. These edge devices need to communicate collected data to the central system occasionally, and often in burst mode; the data is then used for monitoring and control purposes. To ensure secure connections, Asymmetric or Public Key Cryptography (PKC) schemes are used in combination with Symmetric Cryptography schemes. RSA (Rivest-Shamir-Adleman) is one of the most prevalent public key cryptosystems, and has computationally intensive operations which might have a high latency when implemented in resource-constrained environments. The objective of this thesis is to design an accelerator capable of increasing the speed of execution of the RSA algorithm in such resource-constrained environments. The bottleneck of the algorithm is determined by analyzing the performance of the algorithm on various platforms: an Intel Linux machine, a Raspberry Pi, and a Nios soft-core processor. In designing the accelerator to speed up the bottleneck function, we realize that the accelerator architecture will need to be changed according to the resources available to the accelerator. We use high-level synthesis tools to explore the design space of the accelerator by taking into consideration system-level aspects such as the number of ports available to transfer inputs to the accelerator, the word size of the processor, etc. We also propose a new accelerator architecture for the bottleneck function and the algorithm it implements, and compare its area and latency requirements with those of other designs obtained from design space exploration. The functionality of the proposed design is verified and prototyped on the Zynq SoC of a Xilinx Zedboard.

ScholarWorks@UMass Amherst

BP-NTT: Fast and Compact in-SRAM Number Theoretic Transform with Bit-Parallel Modular Multiplication

Author: Imani Mohsen
Sadredini Elaheh
Zhang Jingyao
Publication date: 22/04/2023

Number Theoretic Transform (NTT) is an essential mathematical tool for computing polynomial multiplication in promising lattice-based cryptography. However, costly division operations and complex data dependencies make efficient and flexible hardware design challenging, especially on resource-constrained edge devices. Existing approaches either focus on only limited parameter settings or impose substantial hardware overhead. In this paper, we introduce a hardware-algorithm methodology to efficiently accelerate NTT in various settings using in-cache computing. By leveraging an optimized bit-parallel modular multiplication and introducing costless shift operations, our proposed solution provides up to 29x higher throughput-per-area and 2.8-100x better throughput-per-area-per-joule compared to the state-of-the-art. Comment: This work is accepted to the 60th Design Automation Conference (DAC), 202

arXiv.org e-Print Archive
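
To make concrete how an NTT computes polynomial multiplication, as mentioned in the entry above, here is a minimal Python sketch. It uses a naive O(n^2) transform and a cyclic product (modulo x^n - 1) for readability. The in-SRAM bit-parallel design from the abstract is not modelled, and lattice schemes commonly use a negacyclic variant instead. The toy parameters q = 17, n = 4, omega = 4 and all function names are assumptions chosen for illustration.

```python
def ntt(a, omega, q):
    """Naive forward NTT: A[i] = sum_j a[j] * omega^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]


def intt(values, omega, q):
    """Inverse NTT: transform with omega^(-1), then scale by n^(-1) mod q."""
    n = len(values)
    n_inv = pow(n, -1, q)          # requires Python 3.8+
    omega_inv = pow(omega, -1, q)
    return [(n_inv * x) % q for x in ntt(values, omega_inv, q)]


def cyclic_mul(a, b, omega, q):
    """Multiply polynomials modulo (x^n - 1, q) via pointwise products."""
    fa, fb = ntt(a, omega, q), ntt(b, omega, q)
    return intt([(x * y) % q for x, y in zip(fa, fb)], omega, q)


def schoolbook_cyclic(a, b, q):
    """Reference O(n^2) cyclic product, used only to check the NTT result."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % q
    return out


# Toy parameters: 4 is a primitive 4th root of unity modulo 17.
Q, OMEGA = 17, 4
product = cyclic_mul([1, 2, 3, 4], [5, 6, 7, 8], OMEGA, Q)
```

Every coefficient operation here is a modular multiplication or addition, which is why a fast modular multiplier dominates the cost of an NTT accelerator.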

Low-Weight Primes for Lightweight Elliptic Curve Cryptography on 8-bit AVR Processors

Author: AD Woodbury
C Lederer
CD Walter
D Stebila
DF Aranha
DJ Bernstein
DR Hankerson
G Meulenaer de
H Seo
H Wang
J Großschädl
L Uhsadel
M Hutter
N Gura
P Szczechowiak
PL Montgomery
RP Gallant
S Mangard
SC Seo
T Eisenbarth
T Yanık
Y Sakai
ÇK Koç
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2013

Small 8-bit RISC processors and micro-controllers based on the AVR instruction set architecture are widely used in the embedded domain with applications ranging from smartcards over control systems to wireless sensor nodes. Many of these applications require asymmetric encryption or authentication, which has spurred a body of research into implementation aspects of Elliptic Curve Cryptography (ECC) on the AVR platform. In this paper, we study the suitability of a special class of finite fields, the so-called Optimal Prime Fields (OPFs), for a "lightweight" implementation of ECC with a view towards high performance and security. An OPF is a finite field Fp defined by a prime of the form p = u*2^k + v, whereby both u and v are "small" (in relation to 2^k) so that they fit into one or two registers of an AVR processor. OPFs have a low Hamming weight, which allows for a very efficient implementation of the modular reduction since only the non-zero words of p need to be processed. We describe a special variant of Montgomery multiplication for OPFs that does not execute any input-dependent conditional statements (e.g. branch instructions) and is, hence, resistant against certain side-channel attacks. When executed on an Atmel ATmega processor, a multiplication in a 160-bit OPF takes just 3237 cycles, which compares favorably with other implementations of 160-bit modular multiplication on an 8-bit processor. We also describe a performance-optimized and a security-optimized implementation of elliptic curve scalar multiplication over OPFs. The former uses a GLV curve and executes in 4.19M cycles (over a 160-bit OPF), while the latter is based on a Montgomery curve and has an execution time of approximately 5.93M cycles. Both results improve the state-of-the-art in lightweight ECC on 8-bit processors

Crossref

Open Repository and Bibliography - Luxembourg
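
The prime form p = u*2^k + v from the entry above can be illustrated with a short Python sketch of word-serial Montgomery multiplication that touches only the two non-zero parts of p. This is not the paper's AVR assembly routine. The toy prime 65537 = 1*2^16 + 1, the 8-bit word size, and the name `opf_mont_mul` are assumptions for illustration. The mask-based final subtraction only mimics a branch-free pattern, and Python integers give no constant-time guarantee.

```python
W = 8  # word size in bits, mirroring an 8-bit register (illustrative)


def opf_mont_mul(a, b, u, k, v, num_words):
    """Return a * b * R^(-1) mod p, with p = u*2^k + v and R = 2^(W*num_words).

    Assumes v is odd, p < R, and 0 <= a, b < p.
    """
    word_mask = (1 << W) - 1
    p = (u << k) + v
    p_neg_inv = (-pow(p, -1, 1 << W)) % (1 << W)  # -p^(-1) mod 2^W
    t = a * b
    for _ in range(num_words):
        q = ((t & word_mask) * p_neg_inv) & word_mask
        # q * p is formed from the two non-zero parts of p only:
        # q*v at the bottom and q*u shifted up by k bits.
        t = (t + q * v + ((q * u) << k)) >> W
    # Here t < 2p. Select t or t - p with an arithmetic mask, not a branch.
    diff = t - p
    sign_mask = diff >> (W * num_words + 1)       # -1 if diff < 0, else 0
    return diff + (p & sign_mask)


# Toy example: p = 65537 has Hamming weight 2 and spans three 8-bit words.
U, K, V, WORDS = 1, 16, 1, 3
P = (U << K) + V
result = opf_mont_mul(12345, 54321, U, K, V, WORDS)
```

Because the middle words of p are zero, each loop iteration needs two small multiplications (by v and by u) instead of one per word of the modulus, which is the saving the abstract attributes to the low Hamming weight.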

Hardware-Software Codesign of a Vector Co-processor for Public Key Cryptography

Author: Fournier Jacques Jean-Alain
Moore Simon
Publication venue: IEEE Computer Society
Publication date: 01/01/2006

Until now, most cryptography implementations on parallel architectures have focused on adapting the software to SIMD architectures initially meant for media applications. In this paper, we review some of the most significant contributions in this area. We then propose a vector architecture to efficiently implement long-precision modular multiplications. Having such data-level parallel hardware provides a circuit whose decode and schedule units are at least of the same complexity as those of a scalar processor. The excess transistors are mainly found in the data path. Moreover, the vector approach gives a very modular architecture where resources can be easily redefined. We built a functional simulator on which we performed a quantitative analysis to study how the resizing of those resources affects the performance of the modular multiplication operation. Hence we not only propose a vector architecture for our public-key cryptographic operations but also show how we can analyze the impact of design choices on performance. The proposed architecture is also flexible in the sense that the software running on it would offer room for the implementation of countermeasures against side-channel or fault attacks.

Crossref

HAL-EMSE