19 research outputs found

    High Throughput Lattice-based Signatures on GPUs: Comparing Falcon and Mitaka

    Get PDF
    The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting effort to optimize the implementation of Falcon on various hardware architectures for practical applications. Recently, Mitaka was proposed as an alternative to Falcon, allowing parallel execution of most of its operations. These recent advancements motivate us to develop high throughput implementations of Falcon and Mitaka signature schemes on Graphics Processing Units (GPUs), a massively parallel architecture widely available on cloud service platforms. In this paper, we propose the first parallel implementation of Falcon on various GPUs. An iterative version of the sampling process in Falcon, which is also the most time-consuming Falcon operation, was developed. This allows us to implement Falcon signature generation without relying on expensive recursive function calls on GPUs. In addition, we propose a parallel random samples generation approach to accelerate the performance of Mitaka on GPUs. We evaluate our implementation techniques on state-of-the-art GPU architectures (RTX 3080, A100, T4 and V100). Experimental results show that our Falcon-512 implementation achieves 58, 595 signatures/second and 2, 721, 562 verifications/second on an A100 GPU, which is 20.03× and 29.51× faster than the highly optimized AVX2 implementation on CPU. Our Mitaka implementation achieves 161, 985 signatures/second and 1, 421, 046 verifications/second on the same GPU. Due to the adoption of a parallelizable sampling process, Mitaka signature generation enjoys ≈ 2 – 20× higher throughput than Falcon on various GPUs. The high throughput signature generation and verification achieved by this work can be very useful in various emerging applications, including the Internet of Things

    Improved Masking for Tweakable Blockciphers with Applications to Authenticated Encryption

    Get PDF
    A popular approach to tweakable blockcipher design is via masking, where a certain primitive (a blockcipher or a permutation) is preceded and followed by an easy-to-compute tweak-dependent mask. In this work, we revisit the principle of masking. We do so alongside the introduction of the tweakable Even-Mansour construction MEM. Its masking function combines the advantages of word-oriented LFSR- and powering-up-based methods. We show in particular how recent advancements in computing discrete logarithms over finite fields of characteristic 2 can be exploited in a constructive way to realize highly efficient, constant-time masking functions. If the masking satisfies a set of simple conditions, then MEM is a secure tweakable blockcipher up to the birthday bound. The strengths of MEM are exhibited by the design of fully parallelizable authenticated encryption schemes OPP (nonce-respecting) and MRO (misuse-resistant). If instantiated with a reduced-round BLAKE2b permutation, OPP and MRO achieve speeds up to 0.55 and 1.06 cycles per byte on the Intel Haswell microarchitecture, and are able to significantly outperform their closest competitors

    128-bit Vectorization on Cha-Cha20 Algorithm for Device-to-Device Communication

    Get PDF
    In 5G networks, device to device(D2D) communication is the advanced technology, and it has the major advantages when compared with traditional systems. The coverage area of the system increases by device-to-device transmission which renders with low latency. No doubt, for the upcoming technologies D2D communications is an added advantage but in various views this kind of transmission is still under risk. D2D communication is transferring the information from one device to another device without the involvement of base station. So, the communication is possible with less delay time. Attackers are possible into the D2D communications. To provide the security in communication, encryption algorithms are used. The main theme of the Encryption algorithms is it should execute with less delay time to provide faster communication and in particular CHA-CHA20 offers the secure communication for encrypting the data packets to securing the data. Vectorization on Cha-Cha20 Algorithm provides the security with less delay time compared to AES encryption Algorithm and Cha-Cha20 algorithm. Compared to other encryption algorithms in cryptography, Cha-Cha20 is suited for resource constrained devices

    Novel lightweight video encryption method based on ChaCha20 stream cipher and hybrid chaotic map

    Get PDF
    In the recent years, an increasing demand for securing visual resource-constrained devices become a challenging problem due to the characteristics of these devices. Visual resource-constrained devices are suffered from limited storage space and lower power for computation such as wireless sensors, internet protocol (IP) camera and smart cards. Consequently, to support and preserve the video privacy in video surveillance system, lightweight security methods are required instead of the existing traditional encryption methods. In this paper, a new light weight stream cipher method is presented and investigated for video encryption based on hybrid chaotic map and ChaCha20 algorithm. Two chaotic maps are employed for keys generation process in order to achieve permutation and encryption tasks, respectively. The frames sequences are encrypted-decrypted based on symmetric scheme with assist of ChaCha20 algorithm. The proposed lightweight stream cipher method has been tested on several video samples to confirm suitability and validation in term of encryption–decryption procedures. The performance evaluation metrics include visual test, histogram analysis, information entropy, correlation analysis and differential analysis. From the experimental results, the proposed lightweight encryption method exhibited a higher security with lower computation time compared with state-of-the-art encryption methods

    The last mile: High-Assurance and High-Speed cryptographic implementations

    Get PDF
    We develop a new approach for building cryptographic implementations. Our approach goes the last mile and delivers assembly code that is provably functionally correct, protected against side-channels, and as efficient as handwritten assembly. We illustrate our approach using ChaCha20Poly1305, one of the two ciphersuites recommended in TLS 1.3, and deliver formally verified vectorized implementations which outperform the fastest non-verified code.We realize our approach by combining the Jasmin framework, which offers in a single language features of high-level and low-level programming, and the EasyCrypt proof assistant, which offers a versatile verification infrastructure that supports proofs of functional correctness and equivalence checking. Neither of these tools had been used for functional correctness before. Taken together, these infrastructures empower programmers to develop efficient and verified implementations by "game hopping", starting from reference implementations that are proved functionally correct against a specification, and gradually introducing program optimizations that are proved correct by equivalence checking.We also make several contributions of independent interest, including a new and extensible verified compiler for Jasmin, with a richer memory model and support for vectorized instructions, and a new embedding of Jasmin in EasyCrypt.This work is partially supported by project ONR N00014-19-1-2292. Manuel Barbosa was supported by grant SFRH/BSAB/143018/2018 awarded by FCT. This work was partially funded by national funds via FCT in the context of project PTDC/CCI-INF/31698/2017

    Post-quantum key exchange - a new hope

    Get PDF
    In 2015, Bos, Costello, Naehrig, and Stebila (IEEE Security & Privacy 2015) proposed an instantiation of Ding\u27s ring-learning-with-errors (Ring-LWE) based key-exchange protocol (also including the tweaks proposed by Peikert from PQCrypto 2014), together with an implementation integrated into OpenSSL, with the affirmed goal of providing post-quantum security for TLS. In this work we revisit their instantiation and stand-alone implementation. Specifically, we propose new parameters and a better suited error distribution, analyze the scheme\u27s hardness against attacks by quantum computers in a conservative way, introduce a new and more efficient error-reconciliation mechanism, and propose a defense against backdoors and all-for-the-price-of-one attacks. By these measures and for the same lattice dimension, we more than double the security parameter, halve the communication overhead, and speed up computation by more than a factor of 8 in a portable C implementation and by more than a factor of 27 in an optimized implementation targeting current Intel CPUs. These speedups are achieved with comprehensive protection against timing attacks

    Adiantum: length-preserving encryption for entry-level processors

    Get PDF
    We present HBSH, a simple construction for tweakable length-preserving encryption which supports the fastest options for hashing and stream encryption for processors without AES or other crypto instructions, with a provable quadratic advantage bound. Our composition Adiantum uses NH, Poly1305, XChaCha12, and a single AES invocation. On an ARM Cortex-A7 processor, Adiantum decrypts 4096-byte messages at 10.6 cycles per byte, over five times faster than AES-256-XTS, with a constant-time implementation. We also define HPolyC which is simpler and has excellent key agility at 13.6 cycles per byte
    corecore