Search CORE

1,844 research outputs found

A versatile Montgomery multiplier architecture with characteristic three support

Author: Ozturk Erdinc
Savas Erkay
Savaş Erkay
Sunar Berk
Öztürk Erdinç
Publication venue: 'Elsevier BV'
Publication date: 08/05/2008
Field of study

We present a novel unified core design which is extended to realize Montgomery multiplication in the fields GF(2n), GF(3m), and GF(p). Our unified design supports RSA and elliptic curve schemes, as well as the identity-based encryption which requires a pairing computation on an elliptic curve. The architecture is pipelined and is highly scalable. The unified core utilizes the redundant signed digit representation to reduce the critical path delay. While the carry-save representation used in classical unified architectures is only good for addition and multiplication operations, the redundant signed digit representation also facilitates efficient computation of comparison and subtraction operations besides addition and multiplication. Thus, there is no need for a transformation between the redundant and the non-redundant representations of field elements, which would be required in the classical unified architectures to realize the subtraction and comparison operations. We also quantify the benefits of the unified architectures in terms of area and critical path delay. We provide detailed implementation results. The metric shows that the new unified architecture provides an improvement over a hypothetical non-unified architecture of at least 24.88%, while the improvement over a classical unified architecture is at least 32.07%

Sabanci University Research Database

A high-speed integrated circuit with applications to RSA Cryptography

Author: Onions Paul David
Publication venue: 'University of Plymouth'
Publication date: 01/01/1995
Field of study

Merged with duplicate record 10026.1/833 on 01.02.2017 by CS (TIS)The rapid growth in the use of computers and networks in government, commercial and private communications systems has led to an increasing need for these systems to be secure against unauthorised access and eavesdropping. To this end, modern computer security systems employ public-key ciphers, of which probably the most well known is the RSA ciphersystem, to provide both secrecy and authentication facilities. The basic RSA cryptographic operation is a modular exponentiation where the modulus and exponent are integers typically greater than 500 bits long. Therefore, to obtain reasonable encryption rates using the RSA cipher requires that it be implemented in hardware. This thesis presents the design of a high-performance VLSI device, called the WHiSpER chip, that can perform the modular exponentiations required by the RSA cryptosystem for moduli and exponents up to 506 bits long. The design has an expected throughput in excess of 64kbit/s making it attractive for use both as a general RSA processor within the security function provider of a security system, and for direct use on moderate-speed public communication networks such as ISDN. The thesis investigates the low-level techniques used for implementing high-speed arithmetic hardware in general, and reviews the methods used by designers of existing modular multiplication/exponentiation circuits with respect to circuit speed and efficiency. A new modular multiplication algorithm, MMDDAMMM, based on Montgomery arithmetic, together with an efficient multiplier architecture, are proposed that remove the speed bottleneck of previous designs. Finally, the implementation of the new algorithm and architecture within the WHiSpER chip is detailed, along with a discussion of the application of the chip to ciphering and key generation

Plymouth Electronic Archive and Research Library

OpenGrey Repository

Optimization of Supersingular Isogeny Cryptography for Deeply Embedded Systems

Author: Calhoun Jeffrey Denton
Publication venue: UNM Digital Repository
Publication date: 11/05/2018
Field of study

Public-key cryptography in use today can be broken by a quantum computer with sufficient resources. Microsoft Research has published an open-source library of quantum-secure supersingular isogeny (SI) algorithms including Diffie-Hellman key agreement and key encapsulation in portable C and optimized x86 and x64 implementations. For our research, we modified this library to target a deeply-embedded processor with instruction set extensions and a finite-field coprocessor originally designed to accelerate traditional elliptic curve cryptography (ECC). We observed a 6.3-7.5x improvement over a portable C implementation using instruction set extensions and a further 6.0-6.1x improvement with the addition of the coprocessor. Modification of the coprocessor to a wider datapath further increased performance 2.6-2.9x. Our results show that current traditional ECC implementations can be easily refactored to use supersingular elliptic curve arithmetic and achieve post-quantum security

BRAMAC: Compute-in-BRAM Architectures for Multiply-Accumulate on FPGAs

Author: Abdelfattah Mohamed S.
Chen Yuzong
Publication venue
Publication date: 08/04/2023
Field of study

Deep neural network (DNN) inference using reduced integer precision has been shown to achieve significant improvements in memory utilization and compute throughput with little or no accuracy loss compared to full-precision floating-point. Modern FPGA-based DNN inference relies heavily on the on-chip block RAM (BRAM) for model storage and the digital signal processing (DSP) unit for implementing the multiply-accumulate (MAC) operation, a fundamental DNN primitive. In this paper, we enhance the existing BRAM to also compute MAC by proposing BRAMAC (Compute-in-

\underline{\text{BR}}

\underline{\text{A}}

rchitectures for

\underline{\text{M}}

ultiply-

\underline{\text{Ac}}

cumulate). BRAMAC supports 2's complement 2- to 8-bit MAC in a small dummy BRAM array using a hybrid bit-serial & bit-parallel data flow. Unlike previous compute-in-BRAM architectures, BRAMAC allows read/write access to the main BRAM array while computing in the dummy BRAM array, enabling both persistent and tiling-based DNN inference. We explore two BRAMAC variants: BRAMAC-2SA (with 2 synchronous dummy arrays) and BRAMAC-1DA (with 1 double-pumped dummy array). BRAMAC-2SA/BRAMAC-1DA can boost the peak MAC throughput of a large Arria-10 FPGA by 2.6

\times

/2.1

\times

, 2.3

\times

/2.0

\times

, and 1.9

\times

/1.7

\times

for 2-bit, 4-bit, and 8-bit precisions, respectively at the cost of 6.8%/3.4% increase in the FPGA core area. By adding BRAMAC-2SA/BRAMAC-1DA to a state-of-the-art tiling-based DNN accelerator, an average speedup of 2.05

\times

/1.7

\times

and 1.33

\times

/1.52

\times

can be achieved for AlexNet and ResNet-34, respectively across different model precisions.Comment: 11 pages, 13 figures, 3 tables, FCCM conference 202

arXiv.org e-Print Archive

Introduction to Logic Circuits & Logic Design with VHDL

Author: Brock J. LaMeres
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 24/04/2020
Field of study

The overall goal of this book is to fill a void that has appeared in the instruction of digital circuits over the past decade due to the rapid abstraction of system design. Up until the mid-1980s, digital circuits were designed using classical techniques. Classical techniques relied heavily on manual design practices for the synthesis, minimization, and interfacing of digital systems. Corresponding to this design style, academic textbooks were developed that taught classical digital design techniques. Around 1990, large-scale digital systems began being designed using hardware description languages (HDL) and automated synthesis tools. Broad-scale adoption of this modern design approach spread through the industry during this decade. Around 2000, hardware description languages and the modern digital design approach began to be taught in universities, mainly at the senior and graduate level. There were a variety of reasons that the modern digital design approach did not penetrate the lower levels of academia during this time. First, the design and simulation tools were difficult to use and overwhelmed freshman and sophomore students. Second, the ability to implement the designs in a laboratory setting was infeasible. The modern design tools at the time were targeted at custom integrated circuits, which are cost- and time-prohibitive to implement in a university setting. Between 2000 and 2005, rapid advances in programmable logic and design tools allowed the modern digital design approach to be implemented in a university setting, even in lower-level courses. This allowed students to learn the modern design approach based on HDLs and prototype their designs in real hardware, mainly field programmable gate arrays (FPGAs). This spurred an abundance of textbooks to be authored teaching hardware description languages and higher levels of design abstraction. This trend has continued until today. While abstraction is a critical tool for engineering design, the rapid movement toward teaching only the modern digital design techniques has left a void for freshman- and sophomore-level courses in digital circuitry. Legacy textbooks that teach the classical design approach are outdated and do not contain sufficient coverage of HDLs to prepare the students for follow-on classes. Newer textbooks that teach the modern digital design approach move immediately into high-level behavioral modeling with minimal or no coverage of the underlying hardware used to implement the systems. As a result, students are not being provided the resources to understand the fundamental hardware theory that lies beneath the modern abstraction such as interfacing, gate-level implementation, and technology optimization. Students moving too rapidly into high levels of abstraction have little understanding of what is going on when they click the “compile and synthesize” button of their design tool. This leads to graduates who can model a breadth of different systems in an HDL but have no depth into how the system is implemented in hardware. This becomes problematic when an issue arises in a real design and there is no foundational knowledge for the students to fall back on in order to debug the problem

Open Library

Computer Science Principles with Java

Author: Bergmann Seth D
Publication venue: Rowan Digital Works
Publication date: 15/09/2020
Field of study

This textbook is intended to be used for a first course in computer science, such as the College Board’s Advanced Placement course known as AP Computer Science Principles (CSP). This book includes all the topics on the CSP exam, plus some additional topics. It takes a breadth-first approach, with an emphasis on the principles which form the foundation for hardware and software. No prior experience with programming should be required to use this book. This version of the book uses the Java programming language.https://rdw.rowan.edu/oer/1018/thumbnail.jp

Rowan University

A performance study of X25519 on Cortex-M3 and M4

Author: de Groot W.
Publication venue
Publication date: 01/01/2015
Field of study

Repository TU/e

Pure OAI Repository

Recommended from our members

New methods for finite field arithmetic

Author: Yanik Tuğrul
Publication venue: 'Oregon State University'
Publication date
Field of study

We describe novel methods for obtaining fast software implementations of the arithmetic operations in the finite field GF(p) and GF(p[superscript k]). In GF(p) we realize an extensive speedup in modular addition and subtraction routines and some small speedup in the modular multiplication routine with an arbitrary prime modulus p which is of arbitrary length. The most important feature of the method is that it avoids bit-level operations which are slow on microprocessors and performs word-level operations which are significantly faster. The proposed method has applications in public-key cryptographic algorithms defined over the finite field GF(p), most notably the elliptic curve digital signature algorithm. The new method provides up to 13% speedup in the execution of the ECDSA algorithm over the field GF(p) for the length of p in the range 161≤k≤256. In the finite extension field GF(p[superscript k]) we describe two new methods for obtaining fast software implementations of the modular multiplication operation with an arbitrary prime modulus p, which has less bit-length than the word-length of a microprocessor and an arbitrary generator polynomial. The second algorithm is a significant improvement over the first algorithm by using the same concepts introduced in GF(p) arithmetic

ScholarsArchive@OSU