    KaLi: A Crystal for Post-Quantum Security using Kyber and Dilithium

    Quantum computers pose a threat to the security of communications over the internet. This imminent risk has led to the standardization of cryptographic schemes for protection in a post-quantum scenario. We present a design methodology for future implementations of such algorithms. This is manifested using the NIST-selected digital signature scheme CRYSTALS-Dilithium and key encapsulation scheme CRYSTALS-Kyber. A unified architecture, KaLi, is proposed that can perform key generation, encapsulation, decapsulation, signature generation, and signature verification for all the security levels of CRYSTALS-Dilithium and CRYSTALS-Kyber. A unified yet flexible polynomial arithmetic unit is designed that can process Kyber operations twice as fast as Dilithium operations. Efficient memory management is proposed to achieve optimal latency. KaLi is explicitly tailored for ASIC platforms using multiple clock domains. On ASIC 28nm/65nm technology, it occupies 0.263/1.107 mm² and achieves a clock frequency of 2 GHz/560 MHz for the fast clock used for the memory unit. On a Xilinx Zynq UltraScale+ ZCU102 FPGA, the proposed architecture uses 23,277 LUTs, 9,758 DFFs, 4 DSPs, and 24 BRAMs at a 270 MHz clock frequency. KaLi performs better than the standalone implementations of either of the two schemes. This is the first work to provide a unified design in hardware for both schemes.

    Accelerating SLH-DSA by Two Orders of Magnitude with a Single Hash Unit

    We report on efficient and secure hardware implementation techniques for the FIPS 205 SLH-DSA Hash-Based Signature Standard. We demonstrate that very significant overall performance gains can be obtained from hardware that optimizes the padding formats and iterative hashing processes specific to SLH-DSA. A prototype implementation, SLotH, contains Keccak/SHAKE, SHA2-256, and SHA2-512 cores and supports all 12 parameter sets of SLH-DSA. SLotH also supports side-channel secure PRF computation and Winternitz chains. SLotH drivers run on a small RISC-V control core, as is common in current Root-of-Trust (RoT) systems. The new features make SLH-DSA on SLotH many times faster compared to similarly-sized general-purpose hash accelerators. Compared to unaccelerated microcontroller implementations, the performance of SLotH's SHAKE variants is up to 300× faster; signature generation with the 128f parameter set takes 4,903,978 cycles, while signature verification with the 128s parameter set takes only 179,603 cycles. The SHA2 parameter sets have approximately half the speed of the SHAKE parameter sets. We observe that the signature verification performance of SLH-DSA's "s" parameter sets is generally better than that of accelerated ECDSA or Dilithium on similarly-sized RoT targets. The area of the full SLotH system is small, from 63 kGE (SHA2, Cat 1 only) to 155 kGE (all parameter sets). Keccak Threshold Implementation adds another 130 kGE. We provide sensitivity analysis of SLH-DSA in relation to side-channel leakage. We show experimentally that an SLH-DSA implementation with CPU hashing will rapidly leak the SK.seed master key. We perform a 100,000-trace TVLA leakage assessment with a protected SLotH unit.
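The "iterative hashing" behind a Winternitz chain can be illustrated with a minimal Python sketch. This is a simplified illustration, not the SLotH design and not the FIPS 205 chain function: the names `chain_step` and `chain`, the 16-byte output length, and the plain step-index prefix are assumptions made here, and the real SLH-DSA chain also mixes in the public seed and a structured address.

```python
import hashlib

N = 16  # assumed hash output length in bytes (illustrative only)

def chain_step(value: bytes, index: int) -> bytes:
    # Hypothetical simplified step: hash the step index together with the value.
    # Real SLH-DSA additionally feeds PK.seed and an address structure here.
    return hashlib.shake_256(index.to_bytes(4, "big") + value).digest(N)

def chain(value: bytes, start: int, steps: int) -> bytes:
    # Apply the step function `steps` times, beginning at position `start`.
    for i in range(start, start + steps):
        value = chain_step(value, i)
    return value

secret = bytes(range(N))
# A verifier can finish a chain that the signer only partly walked.
partial = chain(secret, 0, 5)
assert chain(partial, 5, 10) == chain(secret, 0, 15)
```

Because every signature walks many such chains, each a serial sequence of short hash calls, per-call overhead such as padding and data movement dominates the cost, which is the kind of work the abstract says the hardware optimizes.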

    Interframe Bus Encoding Technique and Architecture for MPEG-4 AVC/H.264 Video Compression

    In this paper, we propose an implementation of a data encoder to reduce the switched capacitance on a system bus. Our technique focuses on transferring raw video data for multiple reference frames between off- and on-chip memories in an MPEG-4 AVC/H.264 encoder. This technique is based on entropy coding to minimize bus transitions. Existing techniques exploit the correlation between neighboring pixels. In our proposed technique, we exploit pixel correlation between two consecutive frames. Our method achieves a 58% power saving compared to an unencoded bus when transferring pixels on a 32-bit off-chip bus with a 15-pF capacitance per wire.
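To see why interframe correlation can reduce bus switching, the following minimal Python sketch counts bit toggles between consecutive bus words. It is only an illustration under assumptions: it substitutes a plain XOR against the co-located word of the previous frame for the paper's entropy-coding encoder, and the function names and sample words are hypothetical.

```python
def transitions(words):
    # Number of wires that toggle between each pair of consecutive bus words.
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

def interframe_xor(prev_frame, cur_frame):
    # Stand-in encoder (assumption): send the XOR with the co-located word of
    # the previous frame, which is mostly zeros when the frames are similar.
    return [p ^ c for p, c in zip(prev_frame, cur_frame)]

# Hypothetical 32-bit words, each packing four 8-bit pixels.
prev_frame = [0x10203040, 0x50607080, 0x90A0B0C0]
cur_frame = [0x10203041, 0x50607080, 0x90A0B0C1]

raw_toggles = transitions(cur_frame)
encoded_toggles = transitions(interframe_xor(prev_frame, cur_frame))
print(raw_toggles, encoded_toggles)
```

Fewer toggles mean less switched capacitance, since each toggling wire charges or discharges its load. The decoder recovers the current frame by XORing the received words with the same previous frame.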

    IoTsafe, Decoupling Security from Applications for a Safer IoT

    The use of robust security solutions is a must for Internet of Things (IoT) devices and their applications: regulators in different countries are creating frameworks for certifying those devices with an acceptable security level. However, even for already certified devices, security protocols have to be updated when a breach is found or a certain version becomes obsolete. Many approaches for securing IoT applications are nowadays based on the integration of a security layer [e.g., using transport layer security (TLS)], but this may result in difficulties when upgrading the security algorithms, as the whole application has to be updated. This fact may shorten the life of IoT devices. As a way to overcome these difficulties, this paper presents IoTsafe, a novel approach relying on secure socket shell (SSH), a feasible alternative to secure communications in IoT applications based on hypertext transfer protocol (HTTP and HTTP/2). In order to illustrate its advantages, a comparison between the traditional approach (HTTP with TLS) and our scheme (HTTP with SSH) is performed over low-power wireless personal area networks (6LoWPAN) through 802.15.4 interfaces. The results show that the proposed approach not only provides a more robust and easy-to-update solution, but also improves the overall performance in terms of goodput and energy consumption. Core server stress tests are also presented, and the server performance is analyzed in terms of RAM consumption and escalation strategies.

    CAE - PROCESS AND NETWORK : A methodology for continuous product validation process based on network of various digital simulation methods

    The CAE ProNet methodology develops a CAE network that considers the interdependencies among digital validations. Using this CAE network and taking industrial requirements into account, an algorithm is applied to execute a CAE process prioritized by product, vehicle development phase, and load case. The major advantages of this research work are improved quality of simulation results, reduced time-to-market, and decreased dependency on hardware prototypes.

    Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

    Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning (DL) is already present in many applications ranging from computer vision for medicine to autonomous driving of modern cars as well as other sectors in security, healthcare, and finance. However, to achieve impressive performance, these algorithms employ very deep networks, requiring significant computational power during both training and inference. A single inference of a DL model may require billions of multiply-and-accumulate operations, making DL extremely compute- and energy-hungry. In a scenario where several sophisticated algorithms need to be executed with limited energy and low latency, the need arises for cost-effective hardware platforms capable of energy-efficient DL execution. This paper first introduces the key properties of two brain-inspired models, Deep Neural Networks (DNNs) and Spiking Neural Networks (SNNs), and then analyzes techniques to produce efficient and high-performance designs. This work summarizes and compares the works for four leading platforms for the execution of these algorithms, namely CPU, GPU, FPGA, and ASIC, describing the main state-of-the-art solutions and giving much prominence to the last two since they offer greater design flexibility and bear the potential of high energy efficiency, especially for the inference process. In addition to hardware solutions, this paper discusses some of the important security issues that these DNN and SNN models may have during their execution, and offers a comprehensive section on benchmarking, explaining how to assess the quality of different networks and hardware systems designed for them.

    Power Consumption Analysis, Measurement, Management, and Issues: A State-of-the-Art Review of Smartphone Battery and Energy Usage

    The advancement and popularity of smartphones have made them essential, all-purpose devices. But the lack of advancement in battery technology has held back their full potential. Therefore, given the scarcity of energy, its optimal use and efficient management are crucial in a smartphone. For that, a fair understanding of a smartphone's energy consumption factors is necessary for both users and device manufacturers, along with other stakeholders in the smartphone ecosystem. It is important to assess how much of the device's energy is consumed by which components and under what circumstances. This paper provides a generalized but detailed analysis of the causes (internal and external) of a smartphone's power consumption and also suggests measures to minimize the consumption for each factor. The main contribution of this paper is four comprehensive literature reviews on: 1) smartphone power consumption assessment and estimation (including power consumption analysis and modelling); 2) power consumption management for smartphones (including energy-saving methods and techniques); 3) the state of the art of research and commercial developments in smartphone batteries (including alternative power sources); and 4) mitigating the hazardous issues of smartphone batteries (with a detailed explanation of the issues). The research works are further subcategorized based on different research and solution approaches. A good number of recent empirical research works are considered for this comprehensive review, and each of them is succinctly analysed and discussed.

    Implementation and Evaluation of an NoC Architecture for FPGAs

    The Networks-on-Chip (NoC) approach for designing Systems-on-Chip (SoC) is currently emerging as an advanced concept for overcoming the scalability and efficiency problems of traditional bus-based systems. A great deal of theoretical research has been done in this area that provides good insight and shows promising results. There is a great need for research in hardware implementation of NoC-based systems to determine the feasibility of implementing various topologies and protocols, and also to accurately determine what design tradeoffs are involved in NoC implementation. This thesis addresses the challenges of implementing an NoC-based system on FPGAs for running real benchmark applications. The NoC used a mesh topology and a circuit-switched communication protocol. An experimental framework was developed that allowed implementation of an NoC-based system from a high-level specification, using the Celoxica Handel-C hardware description language. Two test applications, charge-coupled device (CCD) and JPEG, were developed in Handel-C to be used as our benchmark applications. Both benchmarks are computationally expensive and require large quantities of data transfer that will test the NoC system. Implementation results show that the NoC-based system gives superior area utilization and speed performance compared to the bus-based system running the same benchmarks.

    UpWB: An Uncoupled Architecture Design for White-box Cryptography Using Vectorized Montgomery Multiplication

    White-box cryptography (WBC) seeks to protect secret keys even if the attacker has full control over the execution environment. One of the techniques to hide the key is the space-hardness approach, which conceals the key in a large lookup table generated from a reliable small block cipher. Despite its provable security, space-hard WBC also suffers from heavy performance overhead when executed on general-purpose hardware platforms, hundreds of times slower than conventional block ciphers. Specifically, recent studies adopt a nested substitution permutation network (NSPN) to construct dedicated white-box block ciphers [BIT16], whose performance is limited by a massive number of rounds, nested loop dependency, and high-dimension dynamic maximal distance separable (MDS) matrices. To address these limitations, we put forward UpWB, an uncoupled and efficient accelerator for NSPN-structure WBC. We propose holistic optimization techniques across timing schedule, algorithms, and operators. For the high-level timing schedule, we propose a fine-grained task partition (FTP) mechanism to decouple the parameter-oriented nested loop with different trip counts. The FTP mechanism narrows down the idle time for synchronization and avoids the extra usage of FIFO, which efficiently increases the computation throughput. For the optimization of arithmetic operators, we devise a flexible and vectorized modular multiplier (VMM) based on the complexity-reduced Montgomery algorithm, which can process multi-precision variable data, multi-size matrix-vector multiplication, and different irreducible polynomials. Then, a configurable matrix-vector multiplication (MVM) architecture with diagonal-major dataflow is presented to handle the dynamic MDS matrix. The multi-scale (Inv)MixColumns are also unified in a compact manner by intensively sharing the common sub-operations and customizing the constant multiplier.
    To verify the proposed methodology, we showcase the unified design implementation for three recent families of WBCs, including SPNbox-8/16/24/32, Yoroi-16/32, and WARX-16. Evaluated on an FPGA platform, UpWB outperforms the optimized software counterpart (executed on a 3.2 GHz Intel CPU with AES-NI and AVX2 instructions) by 7x to 30x in terms of computation throughput. Synthesized under TSMC 28nm technology, a 36x to 164x improvement in computation throughput is achieved when UpWB operates at the maximum frequency of 1.3 GHz and consumes a modest area of 0.14 mm². Besides, the proposed VMM also offers about 30% improvement in area efficiency without reducing flexibility when compared to state-of-the-art work.
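For readers unfamiliar with Montgomery multiplication, the following Python sketch shows the textbook integer form of the reduction. It is a stand-in for orientation only: the abstract's VMM is a complexity-reduced, vectorized variant that works with irreducible polynomials, which this sketch does not reproduce. The function names and the example modulus are assumptions chosen here.

```python
def montgomery_setup(n: int, k: int):
    # R = 2**k must exceed the odd modulus n; n_prime = -n^(-1) mod R.
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime

def redc(t: int, n: int, k: int, n_prime: int) -> int:
    # Montgomery reduction: returns t * R^(-1) mod n for 0 <= t < n * R,
    # using only masks and shifts instead of a division by n.
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k
    return u - n if u >= n else u

def mont_mul(a: int, b: int, n: int, k: int) -> int:
    # Multiply a * b mod n by way of the Montgomery domain.
    r, n_prime = montgomery_setup(n, k)
    a_bar = (a * r) % n  # convert operands into Montgomery form
    b_bar = (b * r) % n
    prod_bar = redc(a_bar * b_bar, n, k, n_prime)  # product, still in Montgomery form
    return redc(prod_bar, n, k, n_prime)  # convert back to a normal residue

# Hypothetical example: modulus 97 with R = 2**8.
assert mont_mul(45, 76, 97, 8) == (45 * 76) % 97
```

In practice the conversions into and out of Montgomery form are amortized over many multiplications, which is what makes the division-free reduction attractive in hardware.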