20 research outputs found

    Evaluating KpqC Algorithm Submissions: Balanced and Clean Benchmarking Approach

    Get PDF
    In 2022, a Korean domestic Post Quantum Cryptography contest called KpqC held, and the standard for Post Quantum Cryptography is set to be selected in 2024. In Round 1 of this competition, 16 algorithms have advanced and are competing. Algorithms submitted to KpqC introduce their performance, but direct performance comparison is difficult because all algorithms were measured in different environments. In this paper, we present the benchmark results of all KpqC algorithms in a single environment. To benchmark the algorithms, we removed the external library dependency of each algorithm. By removing dependencies, performance deviations due to external libraries can be eliminated, and source codes that can conveniently operate the KpqC algorithm can be provided to users who have difficulty setting up the environment

    Novel Approach to Cryptography Implementation using ChatGPT

    Get PDF
    ChatGPT, which emerged at the end of 2022, has gained significant attention as a highly advanced conversational artificial intelligence service. Developed by OpenAI, ChatGPT is a natural language processing model. There are instances where individuals might want to attempt programming using ChatGPT. In this paper, we utilized the ChatGPT to implement a cryptographic algorithms. Despite numerous trial and error efforts, it was possible to implement cryptography through ChatGPT. This implies that even without extensive coding skill or programming knowledge, one can implement cryptography through ChatGPT if they understand the cryptographic structure. However, the ability to analyze the source code is essential, as it is necessary to identify incorrect parts within the implemented code

    Optimized Implementation of Encapsulation and Decapsulation of Classic McEliece on ARMv8

    Get PDF
    Recently, the results of the NIST PQC contest were announced. Classic McEliece, one of the 3rd round candidates, was selected as the fourth round candidate. Classic McEliece is the only code-based cipher in the NIST PQC finalists in third round and the algorithm is regarded as secure. However, it has low efficiency. In this paper, we propose an efficient software implementation of Classic McEliece, a code-based cipher, on 64-bit ARMv8 processors. Classic McEliece can be divided into Key Generation, Encapsulation, and Decapsulation. Among them, we propose an optimal implementation for Encapsulation and Decapsulation. Optimized Encapsulation implementation utilizes vector registers to perform 16-byte parallel operations, and optimize using the specificity of the identity matrix. Decapsulation implemented efficient Multiplication and Inversion on F2mF_2^m field. Compared with the previous results, Encapsulation showed the performance improvement of up-to 1.99× than the-state-of-art works

    Look-up the Rainbow: Efficient Table-based Parallel Implementation of Rainbow Signature on 64-bit ARMv8 Processors

    Get PDF
    Rainbow signature is one of the finalist in National Institute of Standards and Technology (NIST) standardization. It is also the only signature candidate that is designed based on multivariate quadratic hard problem. Rainbow signature is known to have very small signature size compared to other post-quantum candidates. In this paper, we propose an efficient implementation technique to improve performance of Rainbow signature schemes. A parallel polynomial-multiplication on a 64-bit ARMv8 processor was proposed, wherein a look-up table was created by pre-calculating the 4×44\times4 multiplication results. This technique was developed based on the observation that the existing implementation of Rainbow\u27s polynomial-multiplication relies on the Karatsuba algorithm. It is not optimal due to the divide and conquer steps involved, whereby operations on F16\mathbb{F}_{16} are divided into many small sub-fields of F4\mathbb{F}_{4} and F2\mathbb{F}_{2}. Further investigations reveal that when the polynomial-multiplication in Rainbow signature is operated on F16\mathbb{F}_{16}, its operand is in 4-bit. Since the maximum combinations of a 4×44 \times 4 multiplication is only 256, we constructed a 256-byte look-up table. According to the 4-bit constant, only 16-byte is loaded from the table at one time. The time-consuming multiplication is replaced by performing the table look-up. In addition, it calculates up-to 16 result values per register using characteristics of vector registers available on 64-bit ARMv8 processor. With the proposed fast polynomial-multiplication technique, we implemented the optimized Rainbow III and V. These two parameter sets are performed on F256\mathbb{F}_{256}, but they use sub-field F16\mathbb{F}_{16} in the multiplication process. Therefore, the sub-field multiplication can be replaced with the proposed table look-up technique, which in turn omitted a significant number of operations. We have carried out the experiments on the Apple M1 processor, which shows up to 167.2×\times and 51.6×\times better performance enhancement at multiplier, and Rainbow signatures, respectively, compared to the previous implementation

    Optimized Quantum Circuit for Quantum Security Strength Analysis of Argon2

    Get PDF
    This paper explores the optimization of quantum circuits for Argon2, a memory-hard function used for password hashing and other applications. With the rise of quantum computers, the security of classical cryptographic systems is at risk. It emphasizes the need to accurately measure the quantum security strength of cryptographic schemes using optimized quantum circuits. The proposed method focuses on two perspectives: qubit reduction (qubit optimization) and depth reduction (depth optimization). The qubit-optimized quantum circuit was designed to find a point where an appropriate inverse is possible and reuses the qubit through the inverse to minimize the number of qubits. The start point and end point of the inverse are set by finding a point where qubits can be reused with minimal computation. The depth-optimized quantum circuit reduces the depth by using the minimum number of qubits as necessary without performing an inverse operation. The trade-off between qubit and depth is confirmed by modifying the internal structure of the circuits and the quantum adders. Qubit optimization achieved up to a 12,229 qubit reduction, while depth optimization resulted in approximately 196,741 (approximately 69.02%) depth reduction. In conclusion, this research demonstrates the importance of implementing and analyzing quantum circuits from various optimization perspectives. The results contribute to the post-quantum strength analysis of Argon2 and provide valuable insights for future research on quantum circuit design, considering the appropriate trade-offs of quantum resources in response to advancements in quantum computing technology

    High-speed Implementation of AIM symmetric primitives within AIMer digital signature

    Get PDF
    Recently, as quantum computing technology develops, the importance of quantum resistant cryptography technology is increasing. AIMer is a quantum-resistant cryptographic algorithm that was selected as the first candidate in the electronic signature section of the KpqC Contest, and uses symmetric primitive AIM. In this paper, we propose a high-speed implementation technique of symmetric primitive AIM and evaluate the performance of the implementation. The proposed techniques are two methods, a Mer operation optimization technique and a linear layer operation simplification technique, and as a result of performance measurement, it achieved a performance improvement of up to 97.9% compared to the existing reference code. This paper is the first study to optimize the implementation of AIM

    Grover on SPEEDY

    Get PDF
    With the advent of quantum computers, revisiting the security of cryptography has been an active research area in recent years. In this paper, we estimate the cost of applying Grover\u27s algorithm to SPEEDY block cipher. SPEEDY is a family of ultra-low-latency block ciphers presented in CHES\u2721. It is ensured that the key search equipped with Grover\u27s algorithm reduces the nn-bit security of the block cipher to n2\frac{n}{2}-bit. The issue is how many quantum resources are required for Grover\u27s algorithm to work. NIST estimates the post-quantum security strength for symmetric key cryptography as the cost of Grover key search algorithm. SPEEDY provides 128-bit security or 192-bit security depending on the number of rounds. Based on our estimated cost, we present that increasing the number of rounds is insufficient to satisfy the security against attacks on quantum computers. To the best of our knowledge, this is the first implementation of SPEEDY as a quantum circuit

    Optimized Implementation of SM4 on AVR Microcontrollers, RISC-V Processors, and ARM Processors

    Get PDF
    The SM4 block cipher is a Chinese domestic crpytographic that was introduced in 2003. Since the algorithm was developed for the use in wireless sensor networks, it is mandated in the Chinese National Standard for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). The SM4 block cipher uses a 128-bit block size and a 32-bit round key. This consists of 32 rounds and one reverse translation \texttt{R}. In this paper, we present the optimized implementation of the SM4 block cipher on 8-bit AVR microcontrollers, which are widely used in wireless sensor devices, the optimized implementation of the SM4 block cipher on 32-bit RISC-V processors, which are open-source based computer architectures, and the optimized implementation of SM4 on 64-bit ARM processors with the parallel computation, which are widely used in smartphone and tablet. In the AVR microcontroller, it is implemented in three versions, including speed-optimization, memory-optimization, and code-optimization. As a result, speed-optimization, memory-optimization, and code-optimization achieved 205.2 cycles per byte, 213.3 cycles per byte and 207.4 cycles per byte, respectively. This is faster than the reference implementation written in C (1670.7 cycles per byte). The implementation on 32-bit RISC-V processors 128.8 cycles per byte. This is faster than the reference C code implementation (345.7 cycles per byte). The implementation on 64-bit ARM processors is 8.62 cycles per byte. This is faster than the reference C code implementation (120.07 cycles per byte)

    SPEEDY on Cortex--M3: Efficient Software Implementation of SPEEDY on ARM Cortex--M3

    Get PDF
    The SPEEDY block cipher suite announced at CHES 2021 shows excellent hardware performance. However, SPEEDY was not designed to be efficient in software implementations. SPEEDY\u27s 6-bit sbox and bit permutation operations generally do not work efficiently in software. We implemented SPEEDY block cipher by applying the implementation technique of bit slicing. As an implementation technique of bit slicing, SPEEDY can be operated in software very efficiently and can be applied in microcontroller. By calculating the round key in advance, the performance on ARM Cortex-M3 for SPEEDY-5-192, SPEEDY-6-192, and SPEEDY-7-192 are 65.7, 75.25, and 85.16 clock cycles per byte (i.e. cpb), respectively. It showed better performance than AES-128 constant-time implementation and GIFT constant-time implementation in the same platform. Through this, we conclude that SPEEDY can show good performance on embedded environments

    Simpira Gets Simpler: Optimized Simpira on Microcontrollers

    Get PDF
    Simpira Permutation is a Permutation design using the AES algorithm. The AES algorithm is the most widely used in the world, and Intel has developed a hardware accelerated AES instruction set (AES-NI) to improve the performance of encryption. By using AES-NI, Simpira can be improved further. However, low-end processors that do not support AES-NI require efficient implementation of Simpira optimization. In this paper, we introduce a optimized implementation of a Simpira Permutation in 8-bit AVR microcontrollers and 32-bit RISC-V processors, that do not support the AES instruction set. We firstly pre-computed round keys and omitted the Addroundkey. Afterward, the MixColumn and InvMixColumn of the final round (i.e. 12-th), which were added unnecessarily due to characteristics of Simpira using AES-NI, were omitted. In the AVR microcontroller, the Addroundkey consists of 16 operations, but it has been optimized by eliminating operations where the value of roundkeys is \texttt{0x00}, omitting Addroundkey to 4 operations. In the RISC-V processor, it is implemented using a same optimization technique of AVR implementation. We have carried out experiments 8-bit ATmega128 microcontroller and 32-bit RISC-V processor, which shows up-to \texttt{5.76×\times and 37.01×\times} better performance enhancement than reference codes for the Simpira Permutation, respectively
    corecore