28 research outputs found

    High Performance On-Chip Interconnects Design for Future Many-Core Architectures

    Get PDF
    Switch-based Network-on-Chip (NoC) is a widely accepted inter-core communication infrastructure for Chip Multiprocessors (CMPs). With the continued advance of CMOS technology, the number of cores on a single chip keeps increasing at a rapid pace. It is highly expected that many-core architectures with more than hundreds of processor cores will be commercialized in the near future. In such a large scale CMP system, NoC overheads are more dominant than computation power in determining overall system performance. Also, for modern computational workloads requiring abundant thread level parallelism (TLP), NoC design for highly-parallel, many-core accelerators such as General Purpose Graphics Processing Units (GPGPUs) is of primary importance in harnessing the potential of massive thread- and data-level parallelism. In these contexts, it is critical that NoC provides both low latency and high bandwidth within limited on-chip power and area budgets. In this dissertation, we explore various design issues inherent in future many-core architectures, CMPs and GPGPUs, to achieve both high performance and power efficiency. First, we deal with issues in using a promising next generation memory technology, Spin-Transfer Torque Magnetic RAM (STT-MRAM), for NoC input buffers in CMPs. Using a high density and low leakage memory offers more buffer capacities with the same die footprint, thus helping increase network throughput in NoC routers. However, its long latency and high power consumption in write operations still need to be addressed. Thus, we propose a hybrid design of input buffers using both SRAM and STT-MRAM to hide the long write latency efficiently. Considering that simple data migration in the hybrid buffer consumes more dynamic power compared to SRAM, we provide a lazy migration scheme that reduces the dynamic power consumption of the hybrid buffer. Second, we propose the first NoC router design that uses only STT-MRAM, providing much larger buffer space with less power consumption, while preserving data integrity. To hide the multicycle writes, we employ a multibank STT-MRAM buffer, a virtual channel with multiple banks where every incoming flit is seamlessly pipelined to each bank alternately. Our STT-MRAM design has aggressively reduced the retention time, resulting in a significant reduction in the latency and power overheads of write operations. To ensure data integrity against inadvertent bit flips from the thermal fluctuation during the given retention time, we propose a cost-efficient dynamic buffer refresh scheme combined with Error Correcting Codes (ECC) to detect and correct data corruption. Third, we present schemes for bandwidth-efficient on-chip interconnects in GPGPUs. GPGPUs place a heavy demand on the on-chip interconnect between the many cores and a few memory controllers (MCs). Thus, traffic is highly asymmetric, impacting on-chip resource utilization and system performance. Here, we analyze the communication demands of typical GPGPU applications, and propose efficient NoC designs to meet those demands

    Grover on GIFT

    Get PDF
    Grover search algorithm can be used to find the nn-bit secret key at the speed of n\sqrt{n}, which is the most effective quantum attack method for block ciphers. In order to apply the Grover search algorithm, the target block cipher should be implemented in quantum circuits. Many recent research works optimized the expensive substitute layer to evaluate the need for quantum resources of AES block ciphers. Research on the implementation of quantum circuits for lightweight block ciphers such as SIMON, SPECK, HIGHT, CHAM, LEA, and Gimli, an active research field, is also gradually taking place. In this paper, we present optimized implementations of GIFT block ciphers for quantum computers. To the best of our knowledge, this is the first implementation of GIFT in quantum circuits. Finally, we estimate quantum resources for applying the Grover algorithm to the our optimized GIFT quantum circuit

    Parallel Quantum Addition for Korean Block Cipher

    Get PDF
    Adversaries using quantum computers can employ new attacks on cryptography that are not possible with classical computers. Grover\u27s search algorithm, a well-known quantum algorithm, can reduce the search complexity of O(2n)O(2^n) to 2n\sqrt{2^n} for symmetric key cryptography using an nn-bit key. To apply the Grover search algorithm, the target encryption process must be implemented as a quantum circuit. In this paper, we present optimized quantum circuits for Korean block ciphers based on ARX architectures. We adopt the optimal quantum adder and design in parallel way with only a few trade-offs between quantum resources. As a result, we provide a performance improvement of 78\% in LEA, 85\% in HIGHT, and 70\% in CHAM in terms of circuit depth, respectively. Finally, we estimate the cost of the Grover key search for Korean block ciphers and evaluate the post-quantum security based on the criteria presented by NIST

    Grover on SPEEDY

    Get PDF
    With the advent of quantum computers, revisiting the security of cryptography has been an active research area in recent years. In this paper, we estimate the cost of applying Grover\u27s algorithm to SPEEDY block cipher. SPEEDY is a family of ultra-low-latency block ciphers presented in CHES\u2721. It is ensured that the key search equipped with Grover\u27s algorithm reduces the nn-bit security of the block cipher to n2\frac{n}{2}-bit. The issue is how many quantum resources are required for Grover\u27s algorithm to work. NIST estimates the post-quantum security strength for symmetric key cryptography as the cost of Grover key search algorithm. SPEEDY provides 128-bit security or 192-bit security depending on the number of rounds. Based on our estimated cost, we present that increasing the number of rounds is insufficient to satisfy the security against attacks on quantum computers. To the best of our knowledge, this is the first implementation of SPEEDY as a quantum circuit

    All the Polynomial Multiplication You Need on RISC-V

    Get PDF
    Polynomial multiplication is a core operation for public key cryptography, such as pre-quantum cryptography (e.g. elliptic curve cryptography) and post-quantum cryptography (e.g. code-based cryptography and multivariate-based cryptography). For this reason, the efficient and secure implementation of polynomial multiplication has been actively conducted for high availability and security level in application services. In this paper, we present all polynomial multiplication methods on modern 32-bit RISC-V processors. We re-designed expensive implementations of polynomial multiplication on legacy microcontrollers (e.g. 8-bit AVR, 16-bit MSP, and 32-bit ARM) for new instruction sets of 32-bit RISC-V processors. Secondly, we suggest the optimal operand length for each polynomial multiplication on 32-bit RISC-V processors. With this implementation technique and Karatsuba algorithm, we achieved scalable features, which ensures the polynomial multiplication in any operand lengths with reasonably fast performance. Third, we propose instruction set extensions for the optimal implementation of polynomial multiplication on 32-bit RISC-V processors. This new feature introduces significant performance enhancements. Lastly, the proposed implementation is a public domain and following researchers can easily re-produce the result

    Simpira Gets Simpler: Optimized Simpira on Microcontrollers

    Get PDF
    Simpira Permutation is a Permutation design using the AES algorithm. The AES algorithm is the most widely used in the world, and Intel has developed a hardware accelerated AES instruction set (AES-NI) to improve the performance of encryption. By using AES-NI, Simpira can be improved further. However, low-end processors that do not support AES-NI require efficient implementation of Simpira optimization. In this paper, we introduce a optimized implementation of a Simpira Permutation in 8-bit AVR microcontrollers and 32-bit RISC-V processors, that do not support the AES instruction set. We firstly pre-computed round keys and omitted the Addroundkey. Afterward, the MixColumn and InvMixColumn of the final round (i.e. 12-th), which were added unnecessarily due to characteristics of Simpira using AES-NI, were omitted. In the AVR microcontroller, the Addroundkey consists of 16 operations, but it has been optimized by eliminating operations where the value of roundkeys is \texttt{0x00}, omitting Addroundkey to 4 operations. In the RISC-V processor, it is implemented using a same optimization technique of AVR implementation. We have carried out experiments 8-bit ATmega128 microcontroller and 32-bit RISC-V processor, which shows up-to \texttt{5.76×\times and 37.01×\times} better performance enhancement than reference codes for the Simpira Permutation, respectively

    SPEEDY on Cortex--M3: Efficient Software Implementation of SPEEDY on ARM Cortex--M3

    Get PDF
    The SPEEDY block cipher suite announced at CHES 2021 shows excellent hardware performance. However, SPEEDY was not designed to be efficient in software implementations. SPEEDY\u27s 6-bit sbox and bit permutation operations generally do not work efficiently in software. We implemented SPEEDY block cipher by applying the implementation technique of bit slicing. As an implementation technique of bit slicing, SPEEDY can be operated in software very efficiently and can be applied in microcontroller. By calculating the round key in advance, the performance on ARM Cortex-M3 for SPEEDY-5-192, SPEEDY-6-192, and SPEEDY-7-192 are 65.7, 75.25, and 85.16 clock cycles per byte (i.e. cpb), respectively. It showed better performance than AES-128 constant-time implementation and GIFT constant-time implementation in the same platform. Through this, we conclude that SPEEDY can show good performance on embedded environments

    K-XMSS and K-SPHINCS+^+:Hash based Signatures with\\Korean Cryptography Algorithms

    Get PDF
    Hash-Based Signature (HBS) uses a hash function to construct a digital signature scheme, where its security is guaranteed by the collision resistance of the hash function used. To provide sufficient security in the post-quantum environment, the length of hash should be satisfied. Modern HBS can be classified into stateful and stateless schemes. Two representative stateful and stateless HBS are XMSS and SPHINCS+^+, respectively. In this paper, we propose two HBS schemes: K-XMSS and K-SPHINCS+^+, which replace internal hash functions of XMSS and SPHINCS+^+ with Korean cryptography algorithms. K-XMSS is a stateful signature, while K-SPHINCS+^+ is its stateless counterpart. We also showcase the reference implementation of K-XMSS and K-SPHINCS+^+ employing LSH and two hash functions based on block ciphers (i.e. CHAM and LEA) as the internal hash function. The reference code is developed as a proof-of-concept, which can be optimized for better performance using advanced implementation techniques (e.g. AVX2 and NEON)

    Highly Sensitive and Real-Time Detection of Zinc Oxide Nanoparticles Using Quartz Crystal Microbalance via DNA Induced Conjugation

    No full text
    With the development of nanotechnology, nanomaterials have been widely used in the development of commercial products. In particular, zinc oxide nanoparticles (ZnONPs) have been of great interest due to their extraordinary properties, such as semiconductive, piezoelectric, and absorbance properties in UVA and UVB (280–400 nm) spectra. However, recent studies have investigated the toxicity of these ZnONPs; therefore, a ZnONP screening tool is required for human health and environmental problems. In this study, we propose a detection method for ZnONPs using quartz crystal microbalance (QCM) and DNA. The detection method was based on the resonance frequency shift of the QCM. In detail, two different complementary DNA strands were used to conjugate ZnONPs, which were subjected to mass amplification. One of these DNA strands was designed to hybridize to a probe DNA immobilized on the QCM electrode. By introducing the ZnONP conjugation, we were able to detect ZnONPs with a detection limit of 100 ng/mL in both distilled water and a real sample of drinking water, which is 3 orders less than the reported critical harmful concentration of ZnONPs. A phosphate terminal group, which selectively interacts with a zinc oxide compound, was also attached at one end of a DNA linker and was attributed to the selective detection of ZnONPs. As a result, better selective detection of ZnONPs was achieved compared to gold and silicon nanoparticles. This work demonstrated the potential of our proposed method as a ZnONP screening tool in real environmental water systems
    corecore