105 research outputs found

    Low-Latency Elliptic Curve Scalar Multiplication

    Get PDF
    This paper presents a low-latency algorithm designed for parallel computer architectures to compute the scalar multiplication of elliptic curve points based on approaches from cryptographic side-channel analysis. A graphics processing unit implementation using a standardized elliptic curve over a 224-bit prime field, complying with the new 112-bit security level, computes the scalar multiplication in 1.9ms on the NVIDIA GTX 500 architecture family. The presented methods and implementation considerations can be applied to any parallel 32-bit architectur

    Rethinking Modular Multi-Exponentiation in Real-World Applications

    Get PDF
    The importance of efficient multi-exponen- tiation algorithms in a large spectrum of cryptographic applications continues to grow. Previous literature on the subject pays attention exclusively on the mini- mization of the number of modular multiplications. However, a small reduction of the multiplicative com- plexity can be easily overshadowed by other figures of merit. In this article, we demonstrate that the most efficient algorithm for computing multi-exponentiation changes if considering execution time instead of number of multi-exponentiations. We focus our work on two al- gorithms that perform best under the number of multi- exponentiation metric and show that some side opera- tions affects their theoretical ranking. We provide this analysis on different hardware, such as Intel Core and ARM CPUs and the two latest generations of Rasp- berry Pis, to show how the machine chosen affects the execution time of multi-exponentiation

    Cryptographic primitives on reconfigurable platforms.

    Get PDF
    Tsoi Kuen Hung.Thesis (M.Phil.)--Chinese University of Hong Kong, 2002.Includes bibliographical references (leaves 84-92).Abstracts in English and Chinese.Chapter 1 --- Introduction --- p.1Chapter 1.1 --- Motivation --- p.1Chapter 1.2 --- Objectives --- p.3Chapter 1.3 --- Contributions --- p.3Chapter 1.4 --- Thesis Organization --- p.4Chapter 2 --- Background and Review --- p.6Chapter 2.1 --- Introduction --- p.6Chapter 2.2 --- Cryptographic Algorithms --- p.6Chapter 2.3 --- Cryptographic Applications --- p.10Chapter 2.4 --- Modern Reconfigurable Platforms --- p.11Chapter 2.5 --- Review of Related Work --- p.14Chapter 2.5.1 --- Montgomery Multiplier --- p.14Chapter 2.5.2 --- IDEA Cipher --- p.16Chapter 2.5.3 --- RC4 Key Search --- p.17Chapter 2.5.4 --- Secure Random Number Generator --- p.18Chapter 2.6 --- Summary --- p.19Chapter 3 --- The IDEA Cipher --- p.20Chapter 3.1 --- Introduction --- p.20Chapter 3.2 --- The IDEA Algorithm --- p.21Chapter 3.2.1 --- Cipher Data Path --- p.21Chapter 3.2.2 --- S-Box: Multiplication Modulo 216 + 1 --- p.23Chapter 3.2.3 --- Key Schedule --- p.24Chapter 3.3 --- FPGA-based IDEA Implementation --- p.24Chapter 3.3.1 --- Multiplication Modulo 216 + 1 --- p.24Chapter 3.3.2 --- Deeply Pipelined IDEA Core --- p.26Chapter 3.3.3 --- Area Saving Modification --- p.28Chapter 3.3.4 --- Key Block in Memory --- p.28Chapter 3.3.5 --- Pipelined Key Block --- p.30Chapter 3.3.6 --- Interface --- p.31Chapter 3.3.7 --- Pipelined Design in CBC Mode --- p.31Chapter 3.4 --- Summary --- p.32Chapter 4 --- Variable Radix Montgomery Multiplier --- p.33Chapter 4.1 --- Introduction --- p.33Chapter 4.2 --- RSA Algorithm --- p.34Chapter 4.3 --- Montgomery Algorithm - Ax B mod N --- p.35Chapter 4.4 --- Systolic Array Structure --- p.36Chapter 4.5 --- Radix-2k Core --- p.37Chapter 4.5.1 --- The Original Kornerup Method (Bit-Serial) --- p.37Chapter 4.5.2 --- The Radix-2k Method --- p.38Chapter 4.5.3 --- Time-Space Relationship of Systolic Cells --- p.38Chapter 4.5.4 --- Design Correctness --- p.40Chapter 4.6 --- Implementation Details --- p.40Chapter 4.7 --- Summary --- p.41Chapter 5 --- Parallel RC4 Engine --- p.42Chapter 5.1 --- Introduction --- p.42Chapter 5.2 --- Algorithms --- p.44Chapter 5.2.1 --- RC4 --- p.44Chapter 5.2.2 --- Key Search --- p.46Chapter 5.3 --- System Architecture --- p.47Chapter 5.3.1 --- RC4 Cell Design --- p.47Chapter 5.3.2 --- Key Search --- p.49Chapter 5.3.3 --- Interface --- p.50Chapter 5.4 --- Implementation --- p.50Chapter 5.4.1 --- RC4 cell --- p.51Chapter 5.4.2 --- Floorplan --- p.53Chapter 5.5 --- Summary --- p.53Chapter 6 --- Blum Blum Shub Random Number Generator --- p.55Chapter 6.1 --- Introduction --- p.55Chapter 6.2 --- RRNG Algorithm . . --- p.56Chapter 6.3 --- PRNG Algorithm --- p.58Chapter 6.4 --- Architectural Overview --- p.59Chapter 6.5 --- Implementation --- p.59Chapter 6.5.1 --- Hardware RRNG --- p.60Chapter 6.5.2 --- BBS PRNG --- p.61Chapter 6.5.3 --- Interface --- p.66Chapter 6.6 --- Summary --- p.66Chapter 7 --- Experimental Results --- p.68Chapter 7.1 --- Design Platform --- p.68Chapter 7.2 --- IDEA Cipher --- p.69Chapter 7.2.1 --- Size of IDEA Cipher --- p.70Chapter 7.2.2 --- Performance of IDEA Cipher --- p.70Chapter 7.3 --- Variable Radix Systolic Array --- p.71Chapter 7.4 --- Parallel RC4 Engine --- p.75Chapter 7.5 --- BBS Random Number Generator --- p.76Chapter 7.5.1 --- Size --- p.76Chapter 7.5.2 --- Speed --- p.76Chapter 7.5.3 --- External Clock --- p.77Chapter 7.5.4 --- Random Performance --- p.78Chapter 7.6 --- Summary --- p.78Chapter 8 --- Conclusion --- p.81Chapter 8.1 --- Future Development --- p.83Bibliography --- p.8

    Desenvolvimento de Hardware de Criptografia RSA em Linguagem Verilog.

    Get PDF
    Este trabalho apresenta um modelo e arquitetura de um hardware desenvolvido em linguagem Verilog para exponenciação modular do algoritmo de criptografia assimétrica RSA. Uma breve discussão sobre os tipos de criptografia e a linguagem Verilog é apresentada no início da dissertação. A criptografia RSA e os algoritmos utilizados são apresentados através de pseudocódigos ao longo deste trabalho e implementações em linguagem C e Java são apresentadas nos anexos. A arquitetura desenvolvida foi baseada nos algoritmos apresentados e arquiteturas difundidas nas literaturas com modificações para melhorar o desempenho. Uma arquitetura de 4 bits é apresentada com todos os blocos e interligações comentadas. Um exemplo de cifragem para esta arquitetura com a discussão do caminho dos dados na arquitetura é realizada. Em seguida uma arquitetura para 1024 bits é proposta e exemplificada através de um processo de cifragem e decifragem. O coprocessador RSA foi implementado com um conjunto de células básicas da tecnologia de 0,18m CMOS IBM7SF. Esta implementação realiza um processo de cifragem ou decifragem de 1024 bits em 8,44 ms e o throughput medido na sua máxima frequência de operação é de 121,269 Kbps

    Security challenges and opportunities in adaptive and reconfigurable hardware

    Get PDF
    We present a novel approach to building hardware support for providing strong security guarantees for computations running in the cloud (shared hardware in massive data centers), while maintaining the high performance and low cost that make cloud computing attractive in the first place. We propose augmenting regular cloud servers with a Trusted Computation Base (TCB) that can securely perform high-performance computations. Our TCB achieves cost savings by spreading functionality across two paired chips. We show that making a Field-Programmable Gate Array (FPGA) a part of the TCB benefits security and performance, and we explore a new method for defending the computation inside the TCB against side-channel attacks.Northrop Grumman CorporationQuanta Computer (Firm

    Montgomery Arithmetic from a Software Perspective

    Get PDF
    This chapter describes Peter L. Montgomery\u27s modular multiplication method and the various improvements to reduce the latency for software implementations on devices which have access to many computational units

    Constructing cluster of simple FPGA boards for cryptologic computations

    Get PDF
    In this thesis, we propose an FPGA cluster infrastructure which can be utilized in implementing cryptanalytic attacks and accelerating cryptographic operations. The cluster can be formed using simple and inexpensive, off-the-shelf FPGA boards featuring an FPGA device, local storage, CPLD, and network connection. Forming the cluster is simple and no effort for the hardware development is needed except for the hardware design for the actual computation. Using a softcore processor on FPGA, we are able to configure FPGA devices dynamically and change their configuration on the fly from a remote computer. The softcore on FPGA can execute relatively complicated programs for mundane tasks unworthy of FPGA resources. Finally, we propose and implement a fast and efficient dynamic configuration switch technique that is shown to be useful especially in cryptanalytic applications. Our infrastructure provides a cost-effective alternative for formerly proposed cryptanalytic engines based on FPGA devices

    Side Channels in the Cloud: Isolation Challenges, Attacks, and Countermeasures

    Get PDF
    Cloud computing is based on the sharing of physical resources among several virtual machines through a virtualization layer providing software isolation. Despite advances in virtualization, data security and isolation guarantees remain important challenges for cloud providers. Some of the most prominent isolation violations come from side-channel attacks that aim at exploiting and using a leaky channel to obtain sensitive data such as encryption keys. Such channels may be created by vulnerable implementations of cryptographic algorithms, exploiting weaknesses of processor architectures or of resource sharing in the virtualization layer. In this paper, we provide a comprehensive survey of side-channel attacks (SCA) and mitigation techniques for virtualized environments, focusing on cache-based attacks. We review isolation challenges, attack classes and techniques. We also provide a layer-based taxonomy of applicable countermeasures , from the hardware to the application level, with an assessment of their effectiveness
    corecore