15 research outputs found

    Comparative Study of Keccak SHA-3 Implementations

    This paper conducts an extensive comparative study of state-of-the-art solutions for implementing the SHA-3 hash function. SHA-3, a pivotal component in modern cryptography, has spawned numerous implementations across diverse platforms and technologies. This research aims to provide valuable insights into selecting and optimizing Keccak SHA-3 implementations. Our study encompasses an in-depth analysis of hardware, software, and software–hardware (hybrid) solutions. We assess the strengths, weaknesses, and performance metrics of each approach. Critical factors, including computational efficiency, scalability, and flexibility, are evaluated across different use cases. We investigate how each implementation performs in terms of speed and resource utilization. This research aims to improve the knowledge of cryptographic systems, aiding in the informed design and deployment of efficient cryptographic solutions. By providing a comprehensive overview of SHA-3 implementations, this study offers a clear understanding of the available options and equips professionals and researchers with the necessary insights to make informed decisions in their cryptographic endeavors.
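    As a rough illustration of the software end of that comparison, the sketch below times SHA3-256 using Python's standard `hashlib` module. It is not taken from the paper: the function name `sha3_256_throughput`, the buffer sizes, and the choice of SHA3-256 are assumptions made for this example, and the measured figure depends entirely on the host machine.

```python
import hashlib
import time


def sha3_256_throughput(total_bytes=1 << 20, chunk=4096):
    """Hash `total_bytes` of zero bytes in `chunk`-sized updates.

    Returns (throughput in MB/s, hex digest). Hypothetical helper for
    illustration only; sizes are arbitrary assumptions.
    """
    data = bytes(chunk)                 # a block of zero bytes
    hasher = hashlib.sha3_256()         # software SHA-3 from the standard library
    start = time.perf_counter()
    for _ in range(total_bytes // chunk):
        hasher.update(data)
    digest = hasher.hexdigest()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return total_bytes / elapsed / 1e6, digest


if __name__ == "__main__":
    mb_per_s, digest = sha3_256_throughput()
    print(f"SHA3-256: {mb_per_s:.1f} MB/s, digest {digest[:16]}...")
```

    A hardware or hybrid design would be measured against the same kind of figure, alongside area and power, which this sketch does not capture.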

    High Level Synthesis and Evaluation of the Secure Hash Standard for FPGAs

    Secure hash algorithms (SHAs) are important components of cryptographic applications. SHA performance on central processing units (CPUs) is slow; therefore, acceleration must be done using hardware such as Field Programmable Gate Arrays (FPGAs). Considerable work has been done in academia using FPGAs to accelerate SHAs. These designs were implemented using Hardware Description Language (HDL) based design methodologies, which are tedious and time consuming. High Level Synthesis (HLS) enables designers to synthesize optimized FPGA hardware from algorithm specifications in programming languages such as C/C++. This substantially reduces the design cost and time. In this thesis, the Altera SDK for OpenCL (AOCL) HLS tool was used to synthesize the SHAs on FPGAs and to explore the design space of the algorithms. The results were evaluated against the previous HDL based designs. Synthesized FPGA hardware performance was comparable to the HDL based designs despite the simpler and faster design process.

    RISE: RISC-V SoC for En/decryption Acceleration on the Edge for Homomorphic Encryption

    Today, edge devices commonly connect to the cloud to use its storage and compute capabilities. This leads to security and privacy concerns about user data. Homomorphic Encryption (HE) is a promising solution to address the data privacy problem as it allows arbitrarily complex computations on encrypted data without ever needing to decrypt it. While there has been a lot of work on accelerating HE computations in the cloud, little attention has been paid to the message-to-ciphertext and ciphertext-to-message conversion operations on the edge. In this work, we profile the edge-side conversion operations, and our analysis shows that, during conversion, the error sampling, encryption, and decryption operations are the bottlenecks. To overcome these bottlenecks, we present RISE, an area and energy-efficient RISC-V SoC. RISE leverages an efficient and lightweight pseudo-random number generator core and combines it with fast sampling techniques to accelerate the error sampling operations. To accelerate the encryption and decryption operations, RISE uses scalable, data-level parallelism to implement the number theoretic transform operation, the main bottleneck within the encryption and decryption operations. In addition, RISE saves area by implementing a unified en/decryption datapath, and efficiently exploits techniques like memory reuse and data reordering to utilize a minimal amount of on-chip memory. We evaluate RISE using a complete RTL design containing a RISC-V processor interfaced with our accelerator. Our analysis reveals that, for message-to-ciphertext conversion and ciphertext-to-message conversion, RISE is up to 6191.19× and 2481.44× more energy-efficient, respectively, than using just the RISC-V processor.

    HI-Kyber: A novel high-performance implementation scheme of Kyber based on GPU

    CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography uses hard computational problems on lattices to build secure encryption and decryption systems that resist attacks from quantum computing. Performance is an important bottleneck affecting the adoption of post-quantum cryptography. In this paper, we present a High-performance Implementation of Kyber (named HI-Kyber) on NVIDIA GPUs, which can increase the key-exchange performance of Kyber to the level of millions of operations per second. Firstly, we propose a lattice-based PQC implementation architecture based on kernel fusion, which can avoid redundant global-memory access operations. Secondly, we optimize and implement the core operations of CRYSTALS-Kyber, including Number Theoretic Transform (NTT), inverse NTT (INTT), pointwise multiplication, etc. Especially for the calculation bottleneck, the NTT operation, three novel methods are proposed to explore extreme performance: the sliced layer merging (SLM), the sliced depth-first search (SDFS-NTT) and the entire depth-first search (EDFS-NTT), which achieve speedups of 7.5%, 28.5%, and 41.6%, respectively, compared to the native implementation. Thirdly, we conduct comprehensive performance experiments with different parallel dimensions based on the above optimization. Finally, our key exchange performance reaches 1,664 kops/s. Specifically, on the same platform, our HI-Kyber is 3.52× that of the GPU implementation based on the same instruction set and 1.78× that of the state-of-the-art one based on AI-accelerated tensor cores.

    Lightweight wireless network authentication scheme for constrained oracle sensors

    With the significant increase in the dependence on contextual data from constrained IoT, the blockchain has been proposed as a possible solution to address growing concerns from organizations. To address this, the Lightweight Blockchain Authentication for Constrained Sensors (LBACS) scheme was proposed and evaluated using quantitative and qualitative methods. LBACS was designed with constrained Wireless Sensor Networks (WSN) in mind and independent of a blockchain implementation. It asserts the authentication and provenance of constrained IoT on the blockchain utilizing a multi-signature approach facilitated by symmetric and asymmetric methods and sufficient considerations for key and certificate registry management. The metrics, threat assessment and comparison to existing WSN authentication schemes conducted asserted the pragmatic use of LBACS to provide authentication, blockchain provenance, integrity, auditability, revocation, weak backward and forward secrecy and universal forgeability. The research has several implications for the ubiquitous use of IoT and growing interest in the blockchain.

    High-Performance VLSI Architectures for Lattice-Based Cryptography

    Lattice-based cryptography is a cryptographic primitive built upon the hard problems on point lattices. Cryptosystems relying on lattice-based cryptography have attracted huge attention in the last decade because of their post-quantum security and the remarkable construction of the algorithms. In particular, homomorphic encryption (HE) and post-quantum cryptography (PQC) are the two main applications of lattice-based cryptography. Meanwhile, efficient hardware implementations of these advanced cryptographic schemes are in demand to achieve high performance. This dissertation aims to investigate novel, high-performance very large-scale integration (VLSI) architectures for lattice-based cryptography, including the HE and PQC schemes. This dissertation first presents different architectures for the number-theoretic transform (NTT)-based polynomial multiplication, one of the crucial parts of the fundamental arithmetic for lattice-based HE and PQC schemes. Then a high-speed modular integer multiplier is proposed, particularly for lattice-based cryptography. In addition, a novel modular polynomial multiplier is presented that exploits the fast finite impulse response (FIR) filter architecture to reduce the computational complexity of the schoolbook modular polynomial multiplication for lattice-based PQC schemes. Afterward, an NTT and Chinese remainder theorem (CRT)-based high-speed modular polynomial multiplier is presented for HE schemes whose moduli are large integers.
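    To make the contrast between NTT-based and schoolbook polynomial multiplication concrete, here is a minimal software sketch. It is not the dissertation's architecture. It assumes the Kyber modulus q = 3329 with 17 as a primitive 256th root of unity, and for simplicity it multiplies modulo x^n - 1 (cyclic convolution), whereas lattice schemes typically work modulo x^n + 1. All function names are hypothetical.

```python
Q = 3329        # prime modulus (assumption: the value used by Kyber)
ROOT_256 = 17   # primitive 256th root of unity modulo Q


def ntt(a, omega):
    """Recursive radix-2 NTT; len(a) is a power of two, omega has order len(a)."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % Q
    even = ntt(a[0::2], omega_sq)
    odd = ntt(a[1::2], omega_sq)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % Q
        out[k] = (even[k] + t) % Q            # butterfly, upper half
        out[k + n // 2] = (even[k] - t) % Q   # butterfly, lower half
        w = w * omega % Q
    return out


def cyclic_mul_ntt(a, b):
    """Multiply a and b modulo (x^n - 1, Q) in O(n log n) operations."""
    n = len(a)                                # power of two, at most 256
    omega = pow(ROOT_256, 256 // n, Q)        # primitive n-th root of unity
    fa, fb = ntt(a, omega), ntt(b, omega)
    pointwise = [x * y % Q for x, y in zip(fa, fb)]
    inverse = ntt(pointwise, pow(omega, -1, Q))   # inverse transform, unscaled
    n_inv = pow(n, -1, Q)
    return [x * n_inv % Q for x in inverse]


def cyclic_mul_schoolbook(a, b):
    """Reference O(n^2) multiplication modulo (x^n - 1, Q)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % Q
    return c


if __name__ == "__main__":
    a = [(7 * i + 3) % Q for i in range(256)]
    b = [(i * i + 1) % Q for i in range(256)]
    assert cyclic_mul_ntt(a, b) == cyclic_mul_schoolbook(a, b)
    print("NTT and schoolbook results agree")
```

    The modular inverse via `pow(x, -1, Q)` requires Python 3.8 or later. Hardware designs usually replace the recursion with iterative butterfly stages, but the arithmetic per butterfly is the same.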

    A Survey of Recent Developments in Testability, Safety and Security of RISC-V Processors

    With the continued success of the open RISC-V architecture, practical deployment of RISC-V processors necessitates an in-depth consideration of their testability, safety and security aspects. This survey provides an overview of recent developments in this quickly-evolving field. We start with discussing the application of state-of-the-art functional and system-level test solutions to RISC-V processors. Then, we discuss the use of RISC-V processors for safety-related applications; to this end, we outline the essential techniques necessary to obtain safety both in the functional and in the timing domain and review recent processor designs with safety features. Finally, we survey the different aspects of security with respect to RISC-V implementations and discuss the relationship between cryptographic protocols and primitives on the one hand and the RISC-V processor architecture and hardware implementation on the other. We also comment on the role of a RISC-V processor for system security and its resilience against side-channel attacks

    A quantum-resistant advanced metering infrastructure

    This dissertation focuses on discussing and implementing a Quantum-Resistant Advanced Metering Infrastructure (QR-AMI) that employs quantum-resistant asymmetric and symmetric cryptographic schemes to withstand attacks from both quantum and classical computers. The proposed solution involves the integration of Quantum-Resistant Dedicated Cryptographic Modules (QR-DCMs) within Smart Meters (SMs). These QR-DCMs are designed to embed quantum-resistant cryptographic schemes suitable for AMI applications. In this sense, it investigates quantum-resistant asymmetric cryptographic schemes based on strong cryptographic principles and a lightweight approach for AMIs. In addition, it examines the practical deployment of quantum-resistant schemes in QR-AMIs. Two candidates from the National Institute of Standards and Technology (NIST) post-quantum cryptography (PQC) standardization process, FrodoKEM and CRYSTALS-Kyber, are assessed due to their adherence to strong cryptographic principles and lightweight approach. The feasibility of embedding these schemes within QR-DCMs in an AMI context is evaluated through software implementations on low-cost hardware, such as a microcontroller and a processor, and hardware/software co-design implementations using System-on-a-Chip (SoC) devices with Field-Programmable Gate Array (FPGA) components. Experimental results show that the execution time for FrodoKEM and CRYSTALS-Kyber schemes on SoC FPGA devices is at least one-third faster than software implementations. Furthermore, the achieved execution time and resource usage demonstrate the viability of these schemes for AMI applications. The CRYSTALS-Kyber scheme appears to be a superior choice in all scenarios, except when strong cryptographic primitives are necessitated, at least theoretically. Due to the lack of off-the-shelf SMs supporting quantum-resistant asymmetric cryptographic schemes, a QR-DCM embedding quantum-resistant schemes is implemented and evaluated. Regarding hardware selection for QR-DCMs, microcontrollers are preferable in situations requiring reduced processing power, while SoC FPGA devices are better suited for those demanding high processing power. The resource usage and execution time outcomes demonstrate the feasibility of implementing AMI based on QR-DCMs (i.e., QR-AMI) using microcontrollers or SoC FPGA devices.

    On designing hardware accelerator-based systems: interfaces, taxes and benefits

    Complementary Metal Oxide Semiconductor (CMOS) Technology scaling has slowed down. One promising approach to sustain the historic performance improvement of computing systems is to utilize hardware accelerators. Today, many commercial computing systems integrate one or more accelerators, with each accelerator optimized to efficiently execute specific tasks. Over the years, there has been a substantial amount of research on designing hardware accelerators for machine learning (ML) training and inference tasks. Hardware accelerators are also widely employed to accelerate data privacy and security algorithms. In particular, there is currently a growing interest in the use of hardware accelerators for accelerating homomorphic encryption (HE) based privacy-preserving computing. While the use of hardware accelerators is promising, a realistic end-to-end evaluation of an accelerator when integrated into the full system often reveals that the benefits of an accelerator are not always as expected. Simply assessing the performance of the accelerated portion of an application, such as the inference kernel in ML applications, during performance analysis can be misleading. When designing an accelerator-based system, it is critical to evaluate the system as a whole and account for all the accelerator taxes. In the first part of our research, we highlight the need for a holistic, end-to-end analysis of workloads using ML and HE applications. Our evaluation of an ML application for a database management system (DBMS) shows that the benefits of offloading ML inference to accelerators depend on several factors, including backend hardware, model complexity, data size, and the level of integration between the ML inference pipeline and the DBMS. We also found that the end-to-end performance improvement is bottlenecked by data retrieval and pre-processing, as well as inference. 
Additionally, our evaluation of an HE video encryption application shows that while HE client-side operations, i.e., message-to-ciphertext and ciphertext-to-message conversion operations, are bottlenecked by number theoretic transform (NTT) operations, accelerating NTT in hardware alone is not sufficient to get enough improvement in application throughput (frames per second). We need to address all bottlenecks, such as error sampling, encryption, and decryption, in the message-to-ciphertext and ciphertext-to-message conversion pipeline. In the second part of our research, we address the lack of a scalable evaluation infrastructure for building and evaluating accelerator-based systems. To solve this problem, we propose a robust and scalable software-hardware framework for accelerator evaluation, which uses an open-source RISC-V based System-on-Chip (SoC) design called BlackParrot. This framework can be utilized by accelerator designers and system architects to perform an end-to-end performance analysis of coherent and non-coherent accelerators while carefully accounting for the interaction between the accelerator and the rest of the system. In the third part of our research, we present RISE, which is a full RISC-V SoC designed to efficiently perform message-to-ciphertext and ciphertext-to-message conversion operations. RISE comprises a BlackParrot core and an efficient custom-designed accelerator tailored to accelerate end-to-end message-to-ciphertext and ciphertext-to-message conversion operations. Our RTL-based evaluation demonstrates that RISE improves the throughput of the video encryption application by 10x-27x for different frame resolutions.