130 research outputs found
An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics
Near-sensor data analytics is a promising direction for IoT endpoints, as it
minimizes energy spent on communication and reduces network load - but it also
poses security concerns, as valuable data is stored or sent over the network at
various stages of the analytics pipeline. Using encryption to protect sensitive
data at the boundary of the on-chip analytics engine is a way to address data
security issues. To cope with the combined workload of analytics and encryption
in a tight power envelope, we propose Fulmine, a System-on-Chip based on a
tightly-coupled multi-core cluster augmented with specialized blocks for
compute-intensive data processing and encryption functions, supporting software
programmability for regular computing tasks. The Fulmine SoC, fabricated in
65nm technology, consumes less than 20mW on average at 0.8V achieving an
efficiency of up to 70pJ/B in encryption, 50pJ/px in convolution, or up to
25MIPS/mW in software. As a strong argument for real-life flexible application
of our platform, we show experimental results for three secure analytics use
cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN
consuming 3.16pJ per equivalent RISC op; local CNN-based face detection with
secured remote recognition in 5.74pJ/op; and seizure detection with encrypted
data collection from EEG within 12.7pJ/op.Comment: 15 pages, 12 figures, accepted for publication to the IEEE
Transactions on Circuits and Systems - I: Regular Paper
Comparative Study of Keccak SHA-3 Implementations
This paper conducts an extensive comparative study of state-of-the-art solutions for im-
plementing the SHA-3 hash function. SHA-3, a pivotal component in modern cryptography, has
spawned numerous implementations across diverse platforms and technologies. This research aims
to provide valuable insights into selecting and optimizing Keccak SHA-3 implementations. Our
study encompasses an in-depth analysis of hardware, software, and software–hardware (hybrid)
solutions. We assess the strengths, weaknesses, and performance metrics of each approach. Critical
factors, including computational efficiency, scalability, and flexibility, are evaluated across differ-
ent use cases. We investigate how each implementation performs in terms of speed and resource
utilization. This research aims to improve the knowledge of cryptographic systems, aiding in the
informed design and deployment of efficient cryptographic solutions. By providing a comprehensive
overview of SHA-3 implementations, this study offers a clear understanding of the available options
and equips professionals and researchers with the necessary insights to make informed decisions in
their cryptographic endeavors
Maximizing the Potential of Custom RISC-V Vector Extensions for Speeding up SHA-3 Hash Functions
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1 600] permutation, which operates on an internal state of 1 600 bits, mostly represented as a 5Ă—5Ă—64-bit matrix. While software implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1 600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of Keccak-f[1 600] in RISC-V based processors through custom vector extensions on 32-bit and 64-bit architectures.
%Such a structure is suitable to work under vector instructions in data-parallel operation mode. This paper uses the RISC-V vector extensions to explore its performance in 64-bit and 32-bit architectures.
We analyze the Keccak-f[1 600] permutation, composed of five different step mappings, and propose ten custom vector instructions to speed up the computation. We realize these extensions in a SIMD processor described in SystemVerilog. We compare the performance of our hardware/software co-design to a software-only implementation on the one hand and to existing architectures based on (vectorized) hardware/software co-design on the other hand. We show that our design outperforms all related work thanks to our carefully selected custom vector instructions
CrISA-X: Unleashing Performance Excellence in Lightweight Symmetric Cryptography for Extendable and Deeply Embedded Processors
The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms’ cryptanalytic strength. Furthermore, careful consideration is given to factors such as algorithm latency, code size, and hardware implementation area. These factors are critical in determining the overall performance of cryptographic solutions at edge devices. Introducing CrISA-X, a Cryptography Instruction Set Architecture extensions designed to improve cryptographic latency on extendable processors. CrISA-X, classified as Generic-Atomic, Block-Specific and Procedure-Specific, leverages RISC processor hardware and a base ISA to effectively execute LWC algorithms. Our study aims to evaluate the execution efficiency of new single-cycle instruction extensions and tightly coupled multicycle instructions on extendable modular RISC processors. CrISA-X provides enhanced speed of various algorithms simultaneously while optimizing ISA adaptability, a feat yet to be accomplished. The extension, diverse for several computation levels, is first specifically tailored for individual algorithms and sets of LWC algorithms, depending on performance, frequency, and area trade-offs. By diligently applying the Min-Max optimization technique, we have configured these extensions to achieve a delicate balance between performance, area code size, etc. Our study presents empirical evidence of the performance enhancement achieved on a real synthesis modular RISC processor. We offer a framework for creating optimized processor hardware and ISA extensions. The CrISA-X framework generally outperforms ISA extensions by delivering significant performance boosts between 3x to 17x while experiencing a relative area cost increase of +12% and +47% in LUTs, in respect to the instruction set category. Notably, as one important example, the utilization of the ASCON algorithm yields a 10x performance boost in contrast to the base ISA instruction implementatio
A Survey of Recent Developments in Testability, Safety and Security of RISC-V Processors
With the continued success of the open RISC-V architecture, practical deployment of RISC-V processors necessitates an in-depth consideration of their testability, safety and security aspects. This survey provides an overview of recent developments in this quickly-evolving field. We start with discussing the application of state-of-the-art functional and system-level test solutions to RISC-V processors. Then, we discuss the use of RISC-V processors for safety-related applications; to this end, we outline the essential techniques necessary to obtain safety both in the functional and in the timing domain and review recent processor designs with safety features. Finally, we survey the different aspects of security with respect to RISC-V implementations and discuss the relationship between cryptographic protocols and primitives on the one hand and the RISC-V processor architecture and hardware implementation on the other. We also comment on the role of a RISC-V processor for system security and its resilience against side-channel attacks
Hybrid scalar/vector implementations of Keccak and SPHINCS+ on AArch64
This paper presents two new techniques for the fast implementation of the Keccak permutation on the A-profile of the Arm architecture: First, the elimination of explicit rotations in the Keccak permutation through Barrel shifting, applicable to scalar AArch64 implementations of Keccak-f1600. Second, the construction of hybrid implementations concurrently leveraging both the scalar and the Neon instruction sets of AArch64. The resulting performance improvements are demonstrated in the example of the hash-based signature scheme SPHINCS+, one of the recently announced winners of the NIST post-quantum cryptography project: We achieve up to 1.89Ă— performance improvements compared to the state of the art. Our implementations target the Arm Cortex-{A55,A510,A78,A710,X1,X2} processors common in client devices such as mobile phones
Automatic Verification of Cryptographic Block Function Implementations with Logical Equivalence Checking
Given a fixed-size block, cryptographic block functions gen-
erate outputs by a sequence of bitwise operations. Block functions are
widely used in the design of hash functions and stream ciphers. Their
correct implementations hence are crucial to computer security. We pro-
pose a method that leverages logic equivalence checking to verify assem-
bly implementations of cryptographic block functions. Logic equivalence
checking is a well-established technique from hardware verification. Using
our proposed method, we verify two dozen assembly implementations of
ChaCha20, SHA-256, and SHA-3 block functions from OpenSSL and
XKCP automatically. We also compare the performance of our technique
with the conventional SMT-based technique in experiments
A Unified Cryptoprocessor for Lattice-based Signature and Key-exchange
We propose design methodologies for building a compact, unified and programmable cryptoprocessor architecture that computes post-quantum key agreement and digital signature. Synergies in the two types of cryptographic primitives are used to make the cryptoprocessor compact. As a case study, the cryptoprocessor architecture has been optimized targeting the signature scheme \u27CRYSTALS-Dilithium\u27 and the key encapsulation mechanism (KEM) \u27Saber\u27, both finalists in the NIST’s post-quantum cryptography standardization project. The programmable cryptoprocessor executes key generations, encapsulations, decapsulations, signature generations, and signature verifications for all the security levels of Dilithium and Saber. On a Xilinx Ultrascale+ FPGA, the proposed cryptoprocessor consumes 18,406 LUTs, 9,323 FFs, 4 DSPs, and 24 BRAMs. It achieves 200 MHz clock frequency and finishes CCA-secure key-generation/encapsulation/decapsulation operations for LightSaber in 29.6/40.4/ 58.3s; for Saber in 54.9/69.7/94.9s; and for FireSaber in 87.6/108.0/139.4s, respectively. It finishes key-generation/sign/verify operations for Dilithium-2 in 70.9/151.6/75.2s; for Dilithium-3 in 114.7/237/127.6s; and for Dilithium-5 in 194.2/342.1/228.9s, respectively, for the best-case scenario. On UMC 65nm library for ASIC the latency is improved by a factor of two due to a 2 increase in clock frequency
A Hybrid Approach to Formal Verification of Higher-Order Masked Arithmetic Programs
Side-channel attacks, which are capable of breaking secrecy via side-channel
information, pose a growing threat to the implementation of cryptographic
algorithms. Masking is an effective countermeasure against side-channel attacks
by removing the statistical dependence between secrecy and power consumption
via randomization. However, designing efficient and effective masked
implementations turns out to be an error-prone task. Current techniques for
verifying whether masked programs are secure are limited in their applicability
and accuracy, especially when they are applied. To bridge this gap, in this
article, we first propose a sound type system, equipped with an efficient type
inference algorithm, for verifying masked arithmetic programs against
higher-order attacks. We then give novel model-counting based and
pattern-matching based methods which are able to precisely determine whether
the potential leaky observable sets detected by the type system are genuine or
simply spurious. We evaluate our approach on various implementations of
arithmetic cryptographicprograms.The experiments confirm that our approach out
performs the state-of-the-art base lines in terms of applicability, accuracy
and efficiency
Post-Quantum Signatures on RISC-V with Hardware Acceleration
CRYSTALS-Dilithium and Falcon are digital signature algorithms based on cryptographic lattices, that are considered secure even if large-scale quantum computers will be able to break conventional public-key cryptography. Both schemes have been selected for standardization in the NIST post-quantum competition. In this work, we present a RISC-V HW/SW odesign that aims to combine the advantages of software- and hardware implementations, i.e. flexibility and performance. It shows the use of lexible hardware accelerators, which have been previously used for Public-Key Encryption (PKE) and Key-Encapsulation Mechanism (KEM), for post-quantum signatures. It is optimized for Dilithium as a generic signature cheme but also accelerates applications that require fast verification of Falcon’s compact signatures. We provide a comparison with previous works showing that for Dilithium and Falcon, cycle counts are significantly reduced, such that our design is faster than previous software implementations or other HW/SW codesigns. In addition to that, we present a compact Globalfoundries 22 nm ASIC design that runs at 800MHz. By using hardware acceleration, energy consumption for Dilithium is reduced by up to 92.2%, and up to 67.5% for Falcon’s signature verification
- …