34 research outputs found

    Efficient Arithmetic on ARM-NEON and Its Application for High-Speed RSA Implementation

    Get PDF
    Advanced modern processors support Single Instruction Multiple Data (SIMD) instructions (e.g. Intel-AVX, ARM-NEON) and a massive body of research on vector-parallel implementations of modular arithmetic, which are crucial components for modern public-key cryptography ranging from RSA, ElGamal, DSA and ECC, have been conducted. In this paper, we introduce a novel Double Operand Scanning (DOS) method to speed-up multi-precision squaring with non-redundant representations on SIMD architecture. The DOS technique partly doubles the operands and computes the squaring operation without Read-After-Write (RAW) dependencies between source and destination variables. Furthermore, we presented Karatsuba Cascade Operand Scanning (KCOS) multiplication and Karatsuba Double Operand Scanning (KDOS) squaring by adopting additive and subtractive Karatsuba\u27s methods, respectively. The proposed multiplication and squaring methods are compatible with separated Montgomery algorithms and these are highly efficient for RSA crypto system. Finally, our proposed multiplication/squaring, separated Montgomery multiplication/squaring and RSA encryption outperform the best-known results by 22/41\%, 25/33\% and 30\% on the Cortex-A15 platform

    NEON PQCryto: Fast and Parallel Ring-LWE Encryption on ARM NEON Architecture

    Get PDF
    Recently, ARM NEON architecture has occupied a significant share of tablet and smartphone markets due to its low cost and high performance. This paper studies efficient techniques of lattice-based cryptography on ARM processor and presents the first implementation of ring-LWE encryption on ARM NEON architecture. In particular, we propose a vectorized version of Iterative Number Theoretic Transform (NTT) for high-speed computation. We present a 32-bit variant of SAMS2 technique, original proposed in CHES’15, for fast reduction. A combination of proposed and previous optimizations results in a very efficient implementation. For 128-bit security level, our ring-LWE implementation requires only 145; 200 clock cycles for encryption and 32; 800 cycles for decryption. These result are more than 17:6 times faster than the fastest ECC implementation on ARM NEON with same security level

    Compact Implementations of LEA Block Cipher for Low-End Microprocessors

    Get PDF
    In WISA\u2713, a novel lightweight block cipher named LEA was released. This algorithm has certain useful features for hardware and software implementations, i.e., simple ARX operations, non-S-box architecture, and 32-bit word size. These features are realized in several platforms for practical usage with high performance and low overheads. In this paper, we further improve 128-, 192- and 256-bit LEA encryption for low-end embedded processors. Firstly we present speed optimization methods. The methods split a 32-bit word operation into four byte-wise operations and avoid several rotation operations by taking advantages of efficient byte-wise rotations. Secondly we reduce the code size to ensure minimum code size.We nd the minimum inner loops and optimize them in an instruction set level. After then we construct the whole algorithm in a partly unrolled fashion with reasonable speed. Finally, we achieved the fastest LEA implementations, which improves performance by 10.9% than previous best known results. For size optimization, our implemen- tation only occupies the 280B to conduct LEA encryption. After scaling, our implementation achieved the smallest ARX implementations so far, compared with other state-of-art ARX block ciphers such as SPECK and SIMON

    Binary Field Multiplication on ARMv8

    Get PDF
    In this paper, we show efficient implementations of binary field multiplication over ARMv8. We exploit an advanced 64-bit polynomial multiplication (\texttt{PMULL}) supported by ARMv8 and conduct multiple levels of asymptotically faster Karatsuba multiplication. Finally, our method conducts binary field multiplication within 57 clock cycles for B-251. Our proposed method on ARMv8 improves the performance by a factor of 5.55.5 times than previous techniques on ARMv7

    Secure Binary Field Multiplication

    Get PDF
    Binary eld multiplication is the most fundamental building block of binary eld Elliptic Curve Cryptography (ECC) and Galois/Counter Mode (GCM). Both bit-wise scanning and Look-Up Table (LUT) based methods are commonly used for binary eld multiplication. In terms of Side Channel Attack (SCA), bit-wise scanning exploits insecure branch operations which leaks information in a form of timing and power consumption. On the other hands, LUT based method is regarded as a relatively secure approach because LUT access can be conducted in a regular and atomic form. This ensures a constant time solution as well. In this paper, we conduct the SCA on the LUT based binary eld multiplication. The attack exploits the horizontal Correlation Power Analysis (CPA) on weights of LUT. We identify the operand with only a power trace of binary eld multiplication. In order to prevent SCA, we also suggest a mask based binary eld multiplication which ensures a regular and constant time solution without LUT and branch statements

    Efficient Ring-LWE Encryption on 8-bit AVR Processors

    Get PDF
    Public-key cryptography based on the ``ring-variant\u27\u27 of the Learning with Errors (ring-LWE) problem is both efficient and believed to remain secure in a post-quantum world. In this paper, we introduce a carefully-optimized implementation of a ring-LWE encryption scheme for 8-bit AVR processors like the ATxmega128. Our research contributions include several optimizations for the Number Theoretic Transform (NTT) used for polynomial multiplication. More concretely, we describe the Move-and-Add (MA) and the Shift-Add-Multiply-Subtract-Subtract (SAMS2) technique to speed up the performance-critical multiplication and modular reduction of coefficients, respectively. We take advantage of incompletely-reduced intermediate results to minimize the total number of reduction operations and use a special coefficient-storage method to decrease the RAM footprint of NTT multiplications. In addition, we propose a byte-wise scanning strategy to improve the performance of a discrete Gaussian sampler based on the Knuth-Yao random walk algorithm. For medium-term security, our ring-LWE implementation needs 590k, 672k, and 276k clock cycles for key-generation, encryption, and decryption, respectively. On the other hand, for long-term security, the execution time of key-generation, encryption, and decryption amount to 2.2M, 2.6M, and 686k cycles, respectively. These results set new speed records for ring-LWE encryption on an 8-bit processor and outperform related RSA and ECC implementations by an order of magnitude

    Metastability-based Feedback Method for Enhancing FPGA-based TRNG

    No full text
    This paper presents a novel and efficient method to enhance the randomness of a Programmable Delay Line (PDL)-based True Random Number Generator (TRNG) by introducing Metastability-based Feedback scheme. As a principal tool for the security of sensor network, random number generator is one of important security primitives. At the CHES2011 conference, a new method of generating random numbers by inducing metastability using precise PDL and PI (proportional-Integral) control has been proposed. As the proposed TRNG could not achieve sufficient randomness on its own, a filtering scheme was used for higher entropy. Unfortunately, the intrinsic characteristics of filters made the throughput of the TRNG decrease by approximately 50\%. To preserve the original throughput and attain a high level of randomness, we present a simple solution that analyzes the probability of outputs and assign long time to metastable state. The proposed scheme shows high randomness in the NIST randomness test suite without any of the throughput loss caused by filtering

    Efficient Implementation of NIST-Compliant Elliptic Curve Cryptography for 8-bit AVR-Based Sensor Nodes

    No full text
    In this paper, we introduce a highly optimized software implementation of standards-compliant elliptic curve cryptography (ECC) for wireless sensor nodes equipped with an 8-bit AVR microcontroller. We exploit the state-of-the-art optimizations and propose novel techniques to further push the performance envelope of a scalar multiplication on the NIST P-192 curve. To illustrate the performance of our ECC software, we develope the prototype implementations of different cryptographic schemes for securing communication in a wireless sensor network, including elliptic curve Diffie-Hellman (ECDH) key exchange, the elliptic curve digital signature algorithm (ECDSA), and the elliptic curve Menezes-Qu-Vanstone (ECMQV) protocol. We obtain record-setting execution times for fixed-base, point variable-base, and double-base scalar multiplication. Compared with the related work, our ECDH key exchange achieves a performance gain of roughly 27% over the best previously published result using the NIST P-192 curve on the same platform, while our ECDSA performs twice as fast as the ECDSA implementation of the well-known TinyECC library. We also evaluate the impact of Karatsuba's multiplication technique on the overall execution time of a scalar multiplication. In addition to offering high performance, our implementation of scalar multiplication has a highly regular execution profile, which helps to protect against certain side-channel attacks. Our results show that NIST-compliant ECC can be implemented efficiently enough to be suitable for resource-constrained sensor nodes

    Reset Tree-Based Optical Fault Detection

    No full text
    In this paper, we present a new reset tree-based scheme to protect cryptographic hardware against optical fault injection attacks. As one of the most powerful invasive attacks on cryptographic hardware, optical fault attacks cause semiconductors to misbehave by injecting high-energy light into a decapped integrated circuit. The contaminated result from the affected chip is then used to reveal secret information, such as a key, from the cryptographic hardware. Since the advent of such attacks, various countermeasures have been proposed. Although most of these countermeasures are strong, there is still the possibility of attack. In this paper, we present a novel optical fault detection scheme that utilizes the buffers on a circuit’s reset signal tree as a fault detection sensor. To evaluate our proposal, we model radiation-induced currents into circuit components and perform a SPICE simulation. The proposed scheme is expected to be used as a supplemental security tool
    corecore