6 research outputs found

    Highly Vectorized SIKE for AVX-512

    It is generally accepted that a large-scale quantum computer would be capable of breaking any public-key cryptosystem used today, thereby posing a serious threat to the security of the Internet’s public-key infrastructure. The US National Institute of Standards and Technology (NIST) addresses this threat with an open process for the standardization of quantum-safe key establishment and signature schemes, which is now in the final phase of the evaluation of candidates. SIKE (an abbreviation of Supersingular Isogeny Key Encapsulation) is one of the alternate candidates under evaluation and distinguishes itself from other candidates due to relatively short key lengths and relatively high computing costs. In this paper, we analyze how the latest generation of Intel’s Advanced Vector Extensions (AVX), in particular AVX-512IFMA, can be used to minimize the latency (resp. maximize the throughput) of the SIKE key encapsulation mechanism when executed on Ice Lake CPUs based on the Sunny Cove microarchitecture. We present various techniques to parallelize and speed up the base/extension field arithmetic, point arithmetic, and isogeny computations performed by SIKE. All these parallel processing techniques are combined in AVXSIKE, a highly optimized implementation of SIKE using Intel AVX-512IFMA instructions. Our experiments indicate that AVXSIKE instantiated with the SIKEp503 parameter set is approximately 1.5 times faster than the to-date best AVX-512IFMA-based SIKE software from the literature. When executed on an Intel Core i3-1005G1 CPU, AVXSIKE outperforms the x64 assembly implementation of SIKE contained in Microsoft’s SIDHv3.4 library by a factor of about 2.5 for key generation and decapsulation, while the encapsulation is even 3.2 times faster.

    Efficient and Side-Channel Resistant Implementations of Next-Generation Cryptography

    The rapid development of emerging information technologies, such as quantum computing and the Internet of Things (IoT), will have or has already had a huge impact on the world. These technologies can not only improve industrial productivity but could also bring more convenience to people’s daily lives. However, these technologies have “side effects” in the world of cryptography: they pose new difficulties and challenges from theory to practice. Specifically, when quantum computing capability (i.e., logical qubits) reaches a certain level, Shor’s algorithm will be able to break almost all public-key cryptosystems currently in use. On the other hand, a great number of devices deployed in IoT environments have very constrained computing and storage resources, so the currently widely used cryptographic algorithms may not run efficiently on those devices. A new generation of cryptography has thus emerged, including Post-Quantum Cryptography (PQC), which remains secure under both classical and quantum attacks, and LightWeight Cryptography (LWC), which is tailored for resource-constrained devices. Research on next-generation cryptography is of importance and utmost urgency, and the US National Institute of Standards and Technology in particular initiated standardization processes for PQC and LWC in 2016 and 2018, respectively. Since next-generation cryptography is in a premature state and has developed rapidly in recent years, its theoretical security and practical deployment are not very well explored and are in significant need of evaluation. This thesis aims to look into the engineering aspects of next-generation cryptography, i.e., the problems concerning implementation efficiency (e.g., execution time and memory consumption) and security (e.g., countermeasures against timing attacks and power side-channel attacks). In more detail, we first explore efficient software implementation approaches for lattice-based PQC on constrained devices.
Then, we study how to speed up isogeny-based PQC on modern high-performance processors, especially by using their powerful vector units. Moreover, we research how to design sophisticated yet low-area instruction set extensions to further accelerate software implementations of LWC and long-integer-arithmetic-based PQC. Finally, to address the threats from potential power side-channel attacks, we present a concept of using special leakage-aware instructions to eliminate overwriting leakage for masked software implementations (of next-generation cryptography).

    On Parallel Computation of Large Smooth-Degree Isogeny

    The computation of large smooth-degree isogenies is considered to be the most time-consuming task in isogeny-based cryptosystems and, to this end, several proposals have recently been made to speed it up. For implementation in software using a single core, De Feo et al. presented an optimal way to compute such isogenies. The multi-core setting is, however, far more intricate but offers various ways to reduce the computation time and is an active area of research. This thesis presents a study of speeding up large smooth-degree isogeny computation with various forms of parallelism and consists of three contributions. The first contribution of this thesis is two novel theoretical techniques for speeding up the computation with parallelism. The first technique, called precedence-constrained scheduling (PCS), transforms the isogeny computation into a task scheduling problem with precedence constraints and utilizes several task scheduling algorithms to tackle the problem. The second technique formulates the isogeny computation as an integer linear program. Combining both techniques, we are able to reduce the theoretical cost of the isogeny computation by up to 13.02% from the state-of-the-art. The second contribution of this thesis is two software implementations of the isogeny computation based on our PCS technique. We consider two execution environments for the implementations: one relies only on the parallelism provided by multi-core processors, and the other utilizes multi-core processors supporting Intel's Advanced Vector eXtensions (AVX) technology. To the best of our knowledge, we are the first to utilize both parallelization technologies for the isogeny computation. Also, to achieve effective implementations, we modify PCS for each execution environment and equip both implementations with a synchronization handling technique.
The implementation results show up to 14.36% speed-up for the first implementation and up to 34.05% speed-up for the second implementation. The third contribution of this thesis is two applications of learning-based optimization to speed up the parallel isogeny computation. We consider the genetic algorithm and the reinforcement learning algorithm and detail our design rationale when instantiating both algorithms for our problem. In our experiments, the genetic algorithm is able to find a better approach for the isogeny computation. The approach found is nontrivial and is up to 9.95% faster than a human-designed heuristic. On the other hand, reinforcement learning lags PCS by as little as 2.73%. We use the experimental results of the reinforcement learning to argue that PCS may be near-optimal or even optimal for the computation.
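
The idea of treating a computation as a task scheduling problem with precedence constraints can be illustrated with a generic greedy list scheduler. This is only a sketch under stated assumptions: it is not the PCS algorithm from the thesis, and the task names, durations, and the function `list_schedule` are hypothetical.

```python
def list_schedule(durations, preds, workers):
    """Greedy list scheduling of a task DAG on identical workers.

    durations: dict mapping task name -> cost (hypothetical units)
    preds:     dict mapping task name -> iterable of predecessor names
    Returns (makespan, start_times).
    """
    start, finish = {}, {}
    free_at = [0] * workers          # time at which each worker becomes idle
    remaining = set(durations)

    def earliest(task):
        # A task may start only after all of its predecessors have finished.
        return max((finish[p] for p in preds.get(task, ())), default=0)

    while remaining:
        # Tasks whose predecessors have all been scheduled already.
        ready = [t for t in remaining
                 if all(p in finish for p in preds.get(t, ()))]
        # Pick the ready task that can start first (ties broken by name).
        task = min(ready, key=lambda t: (earliest(t), t))
        # Assign it to the worker that becomes idle first.
        w = min(range(workers), key=lambda i: free_at[i])
        s = max(free_at[w], earliest(task))
        start[task] = s
        finish[task] = s + durations[task]
        free_at[w] = finish[task]
        remaining.remove(task)

    return max(finish.values()), start


# Hypothetical diamond-shaped dependency graph: a -> {b, c} -> d.
durations = {"a": 2, "b": 3, "c": 4, "d": 1}
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
serial_makespan, _ = list_schedule(durations, preds, workers=1)
parallel_makespan, parallel_start = list_schedule(durations, preds, workers=2)
```

With two workers the independent tasks `b` and `c` overlap, so the makespan drops to the length of the critical path `a`, `c`, `d`.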

    NTT software optimization using an extended Harvey butterfly

    Software implementations of the number-theoretic transform (NTT) method often leverage Harvey’s butterfly to gain speedups. This is the case in cryptographic libraries such as IBM’s HElib, Microsoft’s SEAL, and Intel’s HEXL, which provide optimized implementations of fully homomorphic encryption schemes or their primitives. We extend the Harvey butterfly to the radix-4 case for primes in the range [2^31, 2^52). This enables us to use the vector multiply sum logical (VMSL) instruction, which is available on recent IBM Z® platforms. On an IBM z14 system, our implementation runs more than 2.5x faster than the scalar implementation of SEAL we converted to native C. In addition, we implemented a mixed-radix implementation that uses AVX512-IFMA on Intel’s Ice Lake processor, which happens to be ~1.1 times faster than the super-optimized implementation of Intel’s HEXL. Finally, we compare the performance of some of our implementations using GCC versus Clang compilers and discuss the results.
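
For orientation, the standard radix-2 Harvey butterfly with lazy reduction can be sketched as follows. This is not the radix-4 extension described in the abstract; the 64-bit word size, the function names, and the example prime are assumptions made for illustration, and Python integers stand in for machine words.

```python
BETA_BITS = 64                 # assumed machine word size
BETA = 1 << BETA_BITS


def precompute_scaled_twiddle(w, p):
    """Return floor(w * BETA / p), computed once per twiddle factor w."""
    return (w << BETA_BITS) // p


def harvey_butterfly(x, y, w, w_pre, p):
    """Lazy butterfly: inputs and outputs lie in [0, 4p), with p < BETA / 4.

    Returns (x + w*y, x - w*y), each correct modulo p but not fully reduced.
    """
    two_p = 2 * p
    if x >= two_p:             # bring x into [0, 2p)
        x -= two_p
    q = (w_pre * y) >> BETA_BITS          # estimate of floor(w * y / p)
    t = (w * y - q * p) & (BETA - 1)      # w*y mod p, lazily in [0, 2p)
    return x + t, x - t + two_p


# Hypothetical parameters for a single butterfly.
p = 998244353
w = 3
w_pre = precompute_scaled_twiddle(w, p)
x_out, y_out = harvey_butterfly(5, 7, w, w_pre, p)
```

The point of the precomputed `w_pre` is that the quotient estimate needs only a high-half multiplication and no division, and the outputs are allowed to stay in [0, 4p) so that most conditional subtractions are skipped.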

    Using the new VPMADD instructions for the new post quantum key encapsulation mechanism SIKE

    This paper demonstrates the use of the new processor instructions VPMADD, intended to appear in the coming generation of Intel processors (codename "Cannon Lake"), in order to accelerate the newly proposed key encapsulation mechanism (KEM) named SIKE. SIKE is one of the submissions to the NIST standardization process on post-quantum cryptography, and is based on pseudo-random walks in supersingular isogeny graphs. While very small keys are the main advantage of SIKE, its extreme computational intensiveness makes it one of the slowest KEM proposals. Performance optimizations are needed. Here we address the "Level 1" parameters, which target 64-bit quantum security and are deemed sufficient for the NIST standardization effort. Thus, we focus on SIKE503, which operates over F_{p^2} with a 503-bit prime p. These short operands pose a significant challenge for using VPMADD effectively. We demonstrate several optimization methods to accelerate F_p, F_{p^2}, and the elliptic curve arithmetic, and predict a potential speedup by a factor of 1.72x.
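
To make the operand-size issue concrete, the following Python model mimics the scalar semantics of the two IFMA instructions, VPMADD52LUQ and VPMADD52HUQ, on one 64-bit lane and uses them for a schoolbook product of 503-bit operands held in ten 52-bit limbs. It is a sketch only: the helper names and the limb layout are assumptions, and it does not reproduce the optimizations of the paper.

```python
MASK52 = (1 << 52) - 1
MASK64 = (1 << 64) - 1


def madd52lo(acc, a, b):
    """Model of VPMADD52LUQ on one lane: acc + low 52 bits of a*b."""
    return (acc + (((a & MASK52) * (b & MASK52)) & MASK52)) & MASK64


def madd52hi(acc, a, b):
    """Model of VPMADD52HUQ on one lane: acc + bits 52..103 of a*b."""
    return (acc + (((a & MASK52) * (b & MASK52)) >> 52)) & MASK64


def to_limbs(x, n):
    """Split x into n little-endian limbs of 52 bits each."""
    return [(x >> (52 * i)) & MASK52 for i in range(n)]


def from_limbs(limbs):
    """Recombine limbs; limbs may exceed 52 bits (unnormalized)."""
    return sum(limb << (52 * i) for i, limb in enumerate(limbs))


def mul_limbs(a, b):
    """Schoolbook product; carries are left unpropagated in 64-bit lanes."""
    n = len(a)
    acc = [0] * (2 * n)
    for i in range(n):
        for j in range(n):
            acc[i + j] = madd52lo(acc[i + j], a[i], b[j])
            acc[i + j + 1] = madd52hi(acc[i + j + 1], a[i], b[j])
    return acc


# Two arbitrary 503-bit operands (hypothetical values, not SIKE field elements).
a = (1 << 503) - 12345
b = (1 << 502) + 987654321
product = from_limbs(mul_limbs(to_limbs(a, 10), to_limbs(b, 10)))
```

Ten limbs cover 520 bits, so a 503-bit operand occupies ten of the lanes an AVX-512 register pair offers, and each accumulator receives at most 20 terms below 2^52, which stays well inside a 64-bit lane without intermediate carry handling.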

    End-to-End Encrypted Group Messaging with Insider Security

    Our society has become heavily dependent on electronic communication, and preserving the integrity of this communication has never been more important. Cryptography is a tool that can help to protect the security and privacy of these communications. Secure messaging protocols like OTR and Signal typically employ end-to-end encryption technology to mitigate some of the most egregious adversarial attacks, such as mass surveillance. However, the secure messaging protocols deployed today suffer from two major omissions: they do not natively support group conversations with three or more participants, and they do not fully defend against participants that behave maliciously. Secure messaging tools typically implement group conversations by establishing pairwise instances of a two-party secure messaging protocol, which limits their scalability and makes them vulnerable to insider attacks by malicious members of the group. Insiders can often perform attacks such as rendering the group permanently unusable, causing the state of the group to diverge for the other participants, or covertly remaining in the group after appearing to leave. It is increasingly important to prevent these insider attacks as group conversations become larger, because there are more potentially malicious participants. This dissertation introduces several new protocols that can be used to build modern communication tools with strong security and privacy properties, including resistance to insider attacks. Firstly, the dissertation addresses a weakness in current two-party secure messaging tools: malicious participants can leak portions of a conversation alongside cryptographic proof of authorship, undermining confidentiality. The dissertation introduces two new authenticated key exchange protocols, DAKEZ and XZDH, with deniability properties that can prevent this type of attack when integrated into a secure messaging protocol. 
DAKEZ provides strong deniability in interactive settings such as instant messaging, while XZDH provides deniability for non-interactive settings such as mobile messaging. These protocols are accompanied by composable security proofs. Secondly, the dissertation introduces Safehouse, a new protocol that can be used to implement secure group messaging tools for a wide range of applications. Safehouse solves the difficult cryptographic problems at the core of secure group messaging protocol design: it securely establishes and manages a shared encryption key for the group and ephemeral signing keys for the participants. These keys can be used to build chat rooms, team communication servers, video conferencing tools, and more. Safehouse enables a server to detect and reject protocol deviations, while still providing end-to-end encryption. This allows an honest server to completely prevent insider attacks launched by malicious participants. A malicious server can still perform a denial-of-service attack that renders the group unavailable or "forks" the group into subgroups that can never communicate again, but other attacks are prevented, even if the server colludes with a malicious participant. In particular, an adversary controlling the server and one or more participants cannot cause honest participants' group states to diverge (even in subtle ways) without also permanently preventing them from communicating, nor can the adversary arrange to covertly remain in the group after all of the malicious participants under its control are removed from the group. Safehouse supports non-interactive communication, dynamic group membership, mass membership changes, an invitation system, and secure property storage, while offering a variety of configurable security properties including forward secrecy, post-compromise security, long-term identity authentication, strong deniability, and anonymity preservation. 
The dissertation includes a complete proof-of-concept implementation of Safehouse and a sample application with a graphical client. Two sub-protocols of independent interest are also introduced: a new cryptographic primitive that can encrypt multiple private keys to several sets of recipients in a publicly verifiable and repeatable manner, and a round-efficient interactive group key exchange protocol that can instantiate multiple shared key pairs with a configurable knowledge relationship.