
    Efficient and Side-Channel Resistant Implementations of Next-Generation Cryptography

    The rapid development of emerging information technologies, such as quantum computing and the Internet of Things (IoT), has had, or will have, a huge impact on the world. These technologies can not only improve industrial productivity but also make people’s daily lives more convenient. However, they also have “side effects” in the world of cryptography: they pose new difficulties and challenges, from theory to practice. Specifically, once quantum computing capability (i.e., the number of logical qubits) reaches a certain level, Shor’s algorithm will be able to break almost all public-key cryptosystems currently in use. At the same time, many devices deployed in IoT environments have very constrained computing and storage resources, so today’s widely used cryptographic algorithms may not run efficiently on them. A new generation of cryptography has therefore emerged, including Post-Quantum Cryptography (PQC), which remains secure against both classical and quantum attacks, and LightWeight Cryptography (LWC), which is tailored to resource-constrained devices. Research on next-generation cryptography is both important and urgent; the US National Institute of Standards and Technology (NIST), in particular, initiated standardization processes for PQC in 2016 and for LWC in 2018. Because next-generation cryptography is still immature and has developed rapidly in recent years, its theoretical security and practical deployment are not yet well explored and urgently need evaluation. This thesis investigates the engineering aspects of next-generation cryptography, i.e., problems concerning implementation efficiency (e.g., execution time and memory consumption) and implementation security (e.g., countermeasures against timing attacks and power side-channel attacks). In more detail, we first explore efficient software implementation approaches for lattice-based PQC on constrained devices. Then, we study how to speed up isogeny-based PQC on modern high-performance processors, in particular by exploiting their powerful vector units. Moreover, we investigate how to design sophisticated yet low-area instruction set extensions that further accelerate software implementations of LWC and of PQC based on long-integer arithmetic. Finally, to address the threat of potential power side-channel attacks, we present the concept of special leakage-aware instructions that eliminate overwriting leakage in masked software implementations of next-generation cryptography.
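The following minimal Python sketch illustrates why overwriting leakage matters for masked software, under a simple Hamming-distance power model (an assumption made for illustration; it is not the thesis's leakage-aware-instruction design). A secret byte is split into two Boolean shares. Each share on its own is uniformly random, but if a register holding one share is overwritten by the other, the Hamming distance of that update equals the Hamming weight of the unmasked secret. The helper names `mask`, `hamming_weight`, and `overwrite_leakage` are hypothetical.

```python
import secrets


def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def mask(secret: int, bits: int = 8) -> tuple[int, int]:
    """Split `secret` into two Boolean shares such that secret == s0 ^ s1."""
    s0 = secrets.randbits(bits)  # fresh uniformly random mask
    s1 = secret ^ s0
    return s0, s1


def overwrite_leakage(old: int, new: int) -> int:
    """Hamming-distance power model for a register update old -> new (assumed model)."""
    return hamming_weight(old ^ new)


if __name__ == "__main__":
    secret = 0b1011_0010
    for _ in range(5):
        s0, s1 = mask(secret)
        # The shares change every run, but overwriting s0 with s1 always leaks
        # HW(s0 ^ s1) == HW(secret), i.e. first-order masking is defeated.
        print(f"s0={s0:08b} s1={s1:08b} leak={overwrite_leakage(s0, s1)}")
```

Under this model, the problem is not any single share but the transition between shares; this is the kind of leakage that a countermeasure against register overwriting has to prevent.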

    Highly Vectorized SIKE for AVX-512

    It is generally accepted that a large-scale quantum computer would be capable of breaking any public-key cryptosystem used today, thereby posing a serious threat to the security of the Internet’s public-key infrastructure. The US National Institute of Standards and Technology (NIST) addresses this threat with an open process for the standardization of quantum-safe key establishment and signature schemes, which is now in the final phase of candidate evaluation. SIKE (Supersingular Isogeny Key Encapsulation) is one of the alternate candidates under evaluation and distinguishes itself from the other candidates by relatively short key lengths but relatively high computational cost. In this paper, we analyze how the latest generation of Intel’s Advanced Vector Extensions (AVX), in particular AVX-512IFMA, can be used to minimize the latency (resp. maximize the throughput) of the SIKE key encapsulation mechanism when executed on Ice Lake CPUs based on the Sunny Cove microarchitecture. We present various techniques to parallelize and speed up the base/extension field arithmetic, point arithmetic, and isogeny computations performed by SIKE. All these parallel processing techniques are combined in AVXSIKE, a highly optimized implementation of SIKE using Intel AVX-512IFMA instructions. Our experiments indicate that AVXSIKE instantiated with the SIKEp503 parameter set is approximately 1.5 times faster than the best AVX-512IFMA-based SIKE software reported in the literature to date. When executed on an Intel Core i3-1005G1 CPU, AVXSIKE outperforms the x64 assembly implementation of SIKE contained in Microsoft’s SIDHv3.4 library by a factor of about 2.5 for key generation and decapsulation, and by a factor of 3.2 for encapsulation.
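As a rough illustration of where parallelism in SIKE's extension-field arithmetic can come from, the Python sketch below models multiplication in GF(p^2) for the SIKEp503 prime p = 2^250 * 3^159 - 1, with GF(p^2) = GF(p)[i]/(i^2 + 1). It is a scalar reference model, not AVXSIKE's implementation: it only shows that the four base-field products are mutually independent (so a vectorized implementation could place them in separate lanes), and that a 503-bit field element splits into ten 52-bit limbs, the digit size AVX-512IFMA multiplies. The function names `to_limbs` and `fp2_mul` are hypothetical.

```python
# SIKEp503 prime p = 2^250 * 3^159 - 1. Since p = 3 (mod 4), -1 is a non-square
# mod p, so GF(p^2) can be built as GF(p)[i] / (i^2 + 1).
P503 = 2**250 * 3**159 - 1


def to_limbs(x: int, radix_bits: int = 52) -> list[int]:
    """Split x into little-endian radix-2^radix_bits limbs."""
    mask = (1 << radix_bits) - 1
    limbs = []
    while x:
        limbs.append(x & mask)
        x >>= radix_bits
    return limbs or [0]


def fp2_mul(a: tuple[int, int], b: tuple[int, int], p: int = P503) -> tuple[int, int]:
    """Multiply a0 + a1*i by b0 + b1*i in GF(p^2), using i^2 = -1."""
    a0, a1 = a
    b0, b1 = b
    # The four base-field products are mutually independent, so a vectorized
    # implementation could compute them side by side in separate lanes.
    t00, t11, t01, t10 = (x * y % p for x, y in ((a0, b0), (a1, b1), (a0, b1), (a1, b0)))
    real = (t00 - t11) % p  # a0*b0 - a1*b1
    imag = (t01 + t10) % p  # a0*b1 + a1*b0
    return real, imag


if __name__ == "__main__":
    print(P503.bit_length(), len(to_limbs(P503)))  # prints 503 10
    print(fp2_mul((0, 1), (0, 1)) == (P503 - 1, 0))  # i * i == -1, prints True
```

Karatsuba-style variants trade one of the four base-field products for extra additions; the schoolbook form is shown here only because its independence structure is easiest to see.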

    Proceedings of the West Africa Built Environment Research (WABER) Conference 2021

    FOREWORD: I would like to welcome each participant to the WABER 2021 Conference. Since its inception in 2009, the WABER Conference series has done a great deal to nurture and support researchers, initially in West Africa and subsequently in other parts of Africa and elsewhere. I would like to thank all delegates for your participation, which enables us to keep this Conference going. The WABER Conference enjoys a positive international reputation and has continued to go from strength to strength over the past 13 years. For this, I would like to thank our team, keynote speakers and participants over the years for every contribution you have made to the success of this Conference. This year's Conference has an excellent programme and line-up of speakers and authors. I would like to thank and commend the authors of all 72 papers in these Conference proceedings. If the process of writing a research paper were compared to a marathon, the authors of the 72 papers in this publication would be adjudged the ones who have endured and finished the race. We opened the call for papers for this Conference in December 2020, and over 100 abstracts were submitted by authors. However, it is one thing to propose to write a paper, and quite another to actually write it. Therefore, I would like to thank and congratulate all authors who succeeded in completing the process of getting published in these conference proceedings. It is befitting that we have an excellent range of interesting topics in the 72 papers to be discussed at this conference. We are honoured to welcome Professor Charles Egbu, Vice Chancellor of Leeds Trinity University, to give a special opening address. Over the three days of this conference, we will have various plenary presentations by experienced international academics, and I would like to welcome and thank each of them: Professor Albert Chan, Richard Lorch, Professor Taibat Lawanson, Professor Dato’ Sri Ar Dr Asiah Abdul Rahim and Professor George Ofori. In addition to these speakers, we have other interesting sessions on the programme, including a special session for doctoral students and supervisors, and several other experienced speakers addressing various topics that should be of interest to many of us. I would like to thank all members of the organising team, particularly Associate Professor Emmanuel Essah, Dr Yakubu Aminu Dodo and Dr Sam Moveh, whose efforts have helped to organise this Conference successfully. I would also like to thank all of our reviewers, particularly Associate Professor Emmanuel Essah and Dr Haruna Moda, for the considerable time and effort spent reviewing and checking all papers to ensure a high standard of quality. The WABER Conference Team always plays an excellent role in the success of our events, and I would like to thank Florence, Sam Boakye, Victor Ayitey and his team, Kwesi Kwofie and Issah Abdul Rahman for their contributions to the success of this Conference. I hope you enjoy our first hybrid conference and engage with our exciting speakers on the diverse topics that will be covered over the three days of this Conference.

    NTT software optimization using an extended Harvey butterfly

    Software implementations of the number-theoretic transform (NTT) often leverage Harvey’s butterfly to gain speedups. This is the case in cryptographic libraries such as IBM’s HElib, Microsoft’s SEAL, and Intel’s HEXL, which provide optimized implementations of fully homomorphic encryption schemes or their primitives. We extend the Harvey butterfly to the radix-4 case for primes in the range [2^31, 2^52). This enables us to use the vector multiply sum logical (VMSL) instruction, which is available on recent IBM Z® platforms. On an IBM z14 system, our implementation runs more than 2.5x faster than the scalar implementation of SEAL, which we converted to native C. In addition, we developed a mixed-radix implementation that uses AVX512-IFMA on Intel’s Ice Lake processor and is about 1.1 times faster than Intel’s highly optimized HEXL implementation. Finally, we compare the performance of some of our implementations when compiled with GCC versus Clang and discuss the results.
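For readers unfamiliar with Harvey's butterfly, the Python sketch below models the classic radix-2 forward butterfly with a Shoup-style precomputed twiddle W' = floor(W * 2^64 / p), assuming 64-bit words. It is the baseline technique the abstract builds on, not the paper's radix-4 extension or its VMSL/IFMA code; the function names are hypothetical. Inputs and outputs are kept lazily reduced in [0, 4p), which is why p must stay below 2^62.

```python
# Radix-2 Harvey forward butterfly with a Shoup-style precomputed twiddle (beta = 2^64).
BETA_BITS = 64
MASK64 = (1 << BETA_BITS) - 1


def shoup_precompute(w: int, p: int) -> int:
    """W' = floor(W * 2^64 / p), computed once per twiddle factor W."""
    return (w << BETA_BITS) // p


def harvey_forward_butterfly(x: int, y: int, w: int, w_shoup: int, p: int) -> tuple[int, int]:
    """Return (X, Y) with X = x + w*y and Y = x - w*y (mod p), both in [0, 4p).

    Inputs may be lazily reduced: 0 <= x, y < 4p. Requires 0 <= w < p < 2^62.
    """
    assert 0 <= w < p < (1 << (BETA_BITS - 2))
    assert 0 <= x < 4 * p and 0 <= y < 4 * p
    if x >= 2 * p:                      # conditional subtraction keeps x in [0, 2p)
        x -= 2 * p
    q = (w_shoup * y) >> BETA_BITS      # high 64 bits of W' * y: estimate of w*y // p
    t = (w * y - q * p) & MASK64        # only low 64-bit words needed; t lies in [0, 2p)
    return x + t, x - t + 2 * p         # no final reduction: outputs stay in [0, 4p)


if __name__ == "__main__":
    p = (1 << 61) - 1                   # Mersenne prime; 4p fits in a 64-bit word
    w = 123456789                       # arbitrary example twiddle factor
    X, Y = harvey_forward_butterfly(3 * p + 7, 2 * p + 11, w, shoup_precompute(w, p), p)
    print(X % p == (7 + w * 11) % p, Y % p == (7 - w * 11) % p)  # True True
```

Because W' depends only on the twiddle factor, it can be stored alongside the twiddle table, so each butterfly replaces a division by p with two multiplications and a subtraction; full reduction into [0, p) is deferred until it is actually needed.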

    Using z14 Fused-Multiply-Add Instructions to Accelerate Elliptic Curve Cryptography

    Because of growing commercial applications such as blockchain, the performance of large-integer arithmetic is a focus of both academic and industrial research. IBM introduced a new integer fused multiply-add instruction in z14, called VMSL, to accelerate such workloads. Unlike floating-point fused multiply-add instructions, integer fused multiply-add instructions come in a variety of designs. VMSL multiplies two pairs of radix-2^56 inputs, sums the two products together with an additional 128-bit input, and stores the resulting 128-bit value in a vector register. In this paper, we describe the unique features of VMSL and the ways in which it is inherently more efficient than alternative specifications, in particular by enabling multiple carry strategies. We then examine the issues we encountered implementing Montgomery modular multiplication for elliptic curve cryptography on z14, including radix choice, mixed radices, instruction selection to trade instruction count for latency, and VMSL-specific optimizations for Montgomery-friendly moduli. The best choices resulted in a 20% increase in throughput.
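The 128-bit accumulator leaves headroom above each 112-bit product of two 56-bit limbs, which is what makes deferred-carry strategies possible. The Python sketch below is a behavioral model under that assumption: `vmsl` follows only the one-sentence description in the abstract (two limb products plus a 128-bit addend, result truncated to 128 bits) and is not IBM's architectural definition; the other function names are hypothetical as well. It multiplies 5-limb (280-bit) radix-2^56 operands, enough for a 256-bit elliptic-curve field, by product scanning: each output column is accumulated with VMSL calls and carries are propagated only once at the end.

```python
# Hypothetical behavioral model of VMSL, based only on the description above.
LIMB_BITS = 56
LIMB_MASK = (1 << LIMB_BITS) - 1
MASK128 = (1 << 128) - 1


def vmsl(a: tuple[int, int], b: tuple[int, int], c: int) -> int:
    """a[0]*b[0] + a[1]*b[1] + c, truncated to a 128-bit result."""
    return (a[0] * b[0] + a[1] * b[1] + c) & MASK128


def to_limbs(x: int, n: int) -> list[int]:
    """Little-endian radix-2^56 limbs of x (x must fit in n limbs)."""
    return [(x >> (LIMB_BITS * k)) & LIMB_MASK for k in range(n)]


def from_limbs(limbs: list[int]) -> int:
    return sum(limb << (LIMB_BITS * k) for k, limb in enumerate(limbs))


def mul_deferred_carry(a: list[int], b: list[int]) -> list[int]:
    """Product-scanning multiplication of two n-limb radix-2^56 operands.

    A product of two 56-bit limbs is below 2^112, so a column of up to n products
    stays below n * 2^112; for small n (e.g. n = 5) that is far below 2^128, and
    no carry has to leave a column until the single carry pass at the end.
    """
    n = len(a)
    columns = []
    for k in range(2 * n - 1):
        pairs = [(a[i], b[k - i]) for i in range(max(0, k - n + 1), min(k, n - 1) + 1)]
        if len(pairs) % 2:
            pairs.append((0, 0))  # pad to an even count: one VMSL handles two products
        acc = 0
        for (a0, b0), (a1, b1) in zip(pairs[0::2], pairs[1::2]):
            acc = vmsl((a0, a1), (b0, b1), acc)
        columns.append(acc)
    # Single carry-propagation pass back to 56-bit limbs.
    out, carry = [], 0
    for col in columns:
        total = col + carry
        out.append(total & LIMB_MASK)
        carry = total >> LIMB_BITS
    out.append(carry)  # top limb; below 2^56 because the product has at most 112n bits
    return out


if __name__ == "__main__":
    a, b = 2**255 - 19, 3**160
    prod = mul_deferred_carry(to_limbs(a, 5), to_limbs(b, 5))
    print(from_limbs(prod) == a * b)  # True
```

Propagating carries after every VMSL instead would shorten the final pass but lengthen the dependency chain inside each column; the model only shows that the deferred variant is arithmetically safe for this operand size.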

    Fast modular squaring with AVX512IFMA

    Modular exponentiation represents a significant workload for public-key cryptosystems. Examples include not only the classical RSA, DSA, and DH algorithms, but also the partially homomorphic Paillier cryptosystem. As a result, efficient software implementations of modular exponentiation are an important target for optimization. This paper studies methods for using Intel’s forthcoming AVX512 Integer Fused Multiply Accumulate (AVX512IFMA) instructions to speed up modular (Montgomery) squaring, which dominates the cost of exponentiation. We further show how a minor tweak in the architectural definition of AVX512IFMA has the potential to speed up modular squaring even more.
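Squaring is cheaper than general multiplication because each cross product a_i * a_j (i != j) appears twice and can be computed once and doubled. The Python sketch below models that idea with radix-2^52 limbs, the digit size AVX512IFMA operates on. `madd52lo` and `madd52hi` are hypothetical scalar stand-ins for one 64-bit lane of VPMADD52LUQ and VPMADD52HUQ, which add the low and the high 52 bits of a 52x52-bit product to a 64-bit accumulator. This is an illustration, not the paper's algorithm; it does not model the architectural tweak the paper proposes, and the final Montgomery reduction uses plain big-integer REDC for brevity.

```python
RADIX_BITS = 52
MASK52 = (1 << RADIX_BITS) - 1
MASK64 = (1 << 64) - 1


def madd52lo(acc: int, a: int, b: int) -> int:
    """Model of one VPMADD52LUQ lane: acc + low 52 bits of a[51:0]*b[51:0], mod 2^64."""
    return (acc + (((a & MASK52) * (b & MASK52)) & MASK52)) & MASK64


def madd52hi(acc: int, a: int, b: int) -> int:
    """Model of one VPMADD52HUQ lane: acc + high 52 bits of the 104-bit product, mod 2^64."""
    return (acc + (((a & MASK52) * (b & MASK52)) >> RADIX_BITS)) & MASK64


def to_limbs(x: int, n: int) -> list[int]:
    return [(x >> (RADIX_BITS * k)) & MASK52 for k in range(n)]


def from_limbs(limbs: list[int]) -> int:
    return sum(limb << (RADIX_BITS * k) for k, limb in enumerate(limbs))


def square_limbs(a: list[int]) -> list[int]:
    """Schoolbook squaring: each cross product a[i]*a[j] (i < j) is computed once,
    the accumulated cross terms are doubled, then the diagonal squares are added."""
    n = len(a)
    acc = [0] * (2 * n)
    for i in range(n):
        for j in range(i + 1, n):
            acc[i + j] = madd52lo(acc[i + j], a[i], a[j])
            acc[i + j + 1] = madd52hi(acc[i + j + 1], a[i], a[j])
    acc = [(v << 1) & MASK64 for v in acc]  # double the cross terms with a 64-bit shift
    for i in range(n):
        acc[2 * i] = madd52lo(acc[2 * i], a[i], a[i])
        acc[2 * i + 1] = madd52hi(acc[2 * i + 1], a[i], a[i])
    # Propagate carries so that every limb is back below 2^52.
    out, carry = [], 0
    for v in acc:
        total = v + carry
        out.append(total & MASK52)
        carry = total >> RADIX_BITS
    assert carry == 0  # holds while no 64-bit accumulator wrapped (true for moderate n)
    return out


def mont_square(x: int, p: int, n: int) -> int:
    """Montgomery squaring: x^2 * R^-1 mod p with R = 2^(52n).
    Assumes p odd, p < R, and 0 <= x < p."""
    R = 1 << (RADIX_BITS * n)
    t = from_limbs(square_limbs(to_limbs(x, n)))
    k = (-pow(p, -1, R)) % R  # -p^-1 mod R
    m = (t * k) % R
    u = (t + m * p) // R      # exact division: t + m*p is a multiple of R
    return u - p if u >= p else u


if __name__ == "__main__":
    n = 20                        # 20 limbs of 52 bits = 1040 bits
    p = (1 << 1021) - 1           # any odd modulus below 2^(52n) works here
    R = 1 << (RADIX_BITS * n)
    x = 3**600 % p
    print(mont_square(x, p, n) == x * x * pow(R, -1, p) % p)  # True
```

Computing only the i < j products roughly halves the number of multiply instructions compared with a general multiplication, at the cost of a doubling step and separate diagonal terms.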