193 research outputs found

    20th SC@RUG 2023 proceedings 2022-2023

    Get PDF

    Efficient Security Algorithm for Provisioning Constrained Internet of Things (IoT) Devices

    Get PDF
    Addressing the security concerns of constrained Internet of Things (IoT) devices, such as client-side encryption and secure provisioning, remains a work in progress. IoT devices, characterized by low power and limited processing capability, do not fit neatly into existing security schemes, since classical security algorithms are built on cryptographic functions too complex for constrained devices. Consequently, the options for constrained IoT devices lie in either developing new security schemes or modifying existing ones to be lightweight. This work presents an improved version of the Advanced Encryption Standard (AES), the Efficient Security Algorithm for Power-constrained IoT devices, which addresses some of these concerns. With cloud computing being the key enabler for the massive provisioning of IoT devices, client-side encryption of the data an IoT device generates, before onward transmission to the cloud platform of choice, is widely advocated. However, coping with trade-offs remains a notable challenge for lightweight algorithms, making cheaper security schemes that do not compromise security highly desirable for the secure provisioning of IoT devices. A cryptanalytic overview of the consequences of the complexity reduction is given with mathematical justification, using a Secure Element (ATECC608A) as a trade-off. The extent of constraint of a typical IoT device is investigated by comparing laptop and SAMG55 implementations of the Efficient algorithm, and the algorithm's implementation is analysed and compared against lightweight algorithms. Experimental results show that resource constraint incurs a 657% increase in encryption completion time on the IoT device relative to the laptop implementation; in terms of encryption completion time, the Efficient algorithm is 0.9 times cheaper than CLEFIA and 35% cheaper than AES, against the 26% reported in current literature, and exhibits a 93% avalanche effect rate, well above the 50% recommended in the literature. The algorithm is used for client-side encryption to provision the device onto AWS IoT Core.
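
    How such an avalanche-effect rate is typically measured can be shown with a standard bit-flip experiment. Below is a minimal sketch using plain AES-128 from the pycryptodome library as a stand-in for the paper's modified algorithm (which is not reproduced here); note that under this standard definition an ideal block cipher scores about 50%.

    ```python
    # Avalanche-effect measurement sketch: flip one plaintext bit,
    # count how many ciphertext bits change. Stand-in cipher: AES-128.
    import secrets
    from Crypto.Cipher import AES  # pip install pycryptodome

    def avalanche_rate(key, pt, bit):
        """Percentage of ciphertext bits that flip when plaintext bit `bit` flips."""
        cipher = AES.new(key, AES.MODE_ECB)
        pt2 = bytearray(pt)
        pt2[bit // 8] ^= 1 << (bit % 8)        # flip a single input bit
        c1, c2 = cipher.encrypt(pt), cipher.encrypt(bytes(pt2))
        diff = int.from_bytes(c1, "big") ^ int.from_bytes(c2, "big")
        return 100 * bin(diff).count("1") / (8 * len(c1))

    key, pt = secrets.token_bytes(16), secrets.token_bytes(16)
    rates = [avalanche_rate(key, pt, b) for b in range(128)]
    print(f"mean avalanche rate: {sum(rates) / len(rates):.1f}%")
    ```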

    Efficient and Side-Channel Resistant Implementations of Next-Generation Cryptography

    Get PDF
    The rapid development of emerging information technologies, such as quantum computing and the Internet of Things (IoT), will have or has already had a huge impact on the world. These technologies can not only improve industrial productivity but also bring more convenience to people's daily lives. However, they have "side effects" in the world of cryptography – they pose new difficulties and challenges, from theory to practice. Specifically, when quantum computing capability (i.e., logical qubits) reaches a certain level, Shor's algorithm will be able to break almost all public-key cryptosystems currently in use. On the other hand, a great number of devices deployed in IoT environments have very constrained computing and storage resources, so the current widely used cryptographic algorithms may not run efficiently on those devices. A new generation of cryptography has thus emerged, including Post-Quantum Cryptography (PQC), which remains secure under both classical and quantum attacks, and LightWeight Cryptography (LWC), which is tailored to resource-constrained devices. Research on next-generation cryptography is of utmost importance and urgency, and the US National Institute of Standards and Technology in particular initiated standardization processes for PQC and LWC in 2016 and 2018, respectively. Since next-generation cryptography is still at an early stage and has developed rapidly in recent years, its theoretical security and practical deployment are not yet well explored and are in significant need of evaluation. This thesis looks into the engineering aspects of next-generation cryptography, i.e., the problems concerning implementation efficiency (e.g., execution time and memory consumption) and security (e.g., countermeasures against timing attacks and power side-channel attacks). In more detail, we first explore efficient software implementation approaches for lattice-based PQC on constrained devices. Then, we study how to speed up isogeny-based PQC on modern high-performance processors, especially by using their powerful vector units. Moreover, we research how to design sophisticated yet low-area instruction set extensions to further accelerate software implementations of LWC and long-integer-arithmetic-based PQC. Finally, to address the threat of power side-channel attacks, we present a concept of using special leakage-aware instructions to eliminate overwriting leakage in masked software implementations of next-generation cryptography.
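
    The overwriting leakage mentioned at the end is easy to state concretely. The following is a minimal, illustrative sketch of first-order Boolean masking in Python; real masked implementations live at the assembly/ISA level, which is precisely why the thesis proposes leakage-aware instructions rather than source-level fixes.

    ```python
    # First-order Boolean masking: a secret is split into two shares,
    # each of which is uniformly random on its own.
    import secrets

    def mask(x, bits=32):
        r = secrets.randbits(bits)
        return r, x ^ r                     # share0 XOR share1 == x

    s0, s1 = mask(0xCAFEBABE)
    assert s0 ^ s1 == 0xCAFEBABE            # recombine only when required

    # Hazard: if a register holding s0 is overwritten with s1, hardware
    # leaking the Hamming distance of that transition exposes
    # s0 ^ s1 -- the secret itself. A leakage-aware instruction can
    # clear or randomize the register before the write.
    ```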

    Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives

    Full text link
    The amount of data processed in the cloud, the development of Internet-of-Things (IoT) applications, and growing data privacy concerns force the transition from cloud-based to edge-based processing. Limited energy and computational resources at the edge push the transition from traditional von Neumann architectures to In-memory Computing (IMC), especially for machine learning and neural network applications. Network compression techniques are applied to implement a neural network on limited hardware resources, and quantization is one of the most efficient of them, reducing the memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNN) and links software-based quantization approaches to IMC hardware implementation. Moreover, open challenges, QNN design requirements, recommendations, and perspectives are provided, along with an IMC-based QNN hardware roadmap.
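
    As a concrete illustration of the quantization step the review builds on, the sketch below applies textbook symmetric uniform post-training quantization to a float32 weight matrix, shrinking its memory footprint by 4x; this is a generic scheme for illustration, not a method specific to the paper.

    ```python
    # Symmetric uniform quantization of float32 weights to int8.
    import numpy as np

    def quantize(w, n_bits=8):
        qmax = 2 ** (n_bits - 1) - 1                  # 127 for int8
        scale = np.abs(w).max() / qmax                # one scale per tensor
        q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
        return q, scale

    w = np.random.randn(256, 256).astype(np.float32)
    q, scale = quantize(w)
    print(w.nbytes, q.nbytes)                         # 262144 vs 65536 bytes
    err = np.abs(w - q.astype(np.float32) * scale).max()
    print(f"max dequantization error: {err:.4f}")
    ```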

    Architectures for Code-based Post-Quantum Cryptography

    Get PDF
    The abstract is in the attachment.

    A 334µW 0.158mm² ASIC for Post-Quantum Key-Encapsulation Mechanism Saber with Low-latency Striding Toom-Cook Multiplication, Extended Version

    Get PDF
    The hard mathematical problems that ensure the security of our current public-key cryptography (RSA, ECC) are broken if and when a sufficiently large quantum computer appears, rendering those schemes ineffective for use in the quantum era. Lattice-based cryptography is a novel approach to public-key cryptography whose mathematical foundations (so far) resist attacks from quantum computers. By choosing a module learning with errors (MLWE) algorithm as the next standard, the National Institute of Standards and Technology (NIST) follows this approach. Polynomial multiplication is the central bottleneck in the computation of lattice-based cryptography. Because public-key cryptography is mostly used to establish shared secret keys, the focus is on compact area and a low power and energy budget, and to a lesser extent on throughput or latency. While most other work focuses on optimizing number theoretic transform (NTT) based multiplications, in this paper we highly optimize a Toom-Cook based multiplier. We demonstrate that a memory-efficient striding Toom-Cook with lazy interpolation results in a highly compact, low-power implementation that moreover enables a very regular memory access scheme. To demonstrate its efficiency, we integrate this multiplier into a Saber post-quantum accelerator, one of the four NIST finalists. Algorithmic innovations to reduce active memory, timely clock gating, and a shift-add multiplier help achieve 38% less power than the state-of-the-art PQC core, 4× less memory, a 36.8% reduction in multiplier energy, and a 118× reduction in active power with respect to the state-of-the-art Saber accelerator (not silicon verified). This accelerator consumes 0.158mm² of active area, the lowest reported to date despite process disadvantages relative to the state-of-the-art designs.
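
    For readers unfamiliar with the multiplier being optimized, the sketch below shows the basic evaluate / pointwise-multiply / interpolate structure of Toom-Cook-3 over plain integer coefficients. It is a textbook baseline, not the paper's striding, lazily interpolated hardware variant, and Saber's arithmetic modulo a power of two requires extra care with the exact divisions used here.

    ```python
    # Toom-Cook-3: split each operand into 3 parts, evaluate at the
    # points {0, 1, -1, 2, inf}, do 5 pointwise products (instead of 9
    # for schoolbook), then interpolate the degree-4 result.

    def schoolbook(a, b):
        c = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                c[i + j] += ai * bj
        return c

    def toom3(a, b):
        """Multiply equal-length coefficient lists (length divisible by 3)."""
        m = len(a) // 3

        def ev(p):
            p0, p1, p2 = p[:m], p[m:2*m], p[2*m:]
            return [p0,                                             # at  0
                    [x + y + z for x, y, z in zip(p0, p1, p2)],     # at  1
                    [x - y + z for x, y, z in zip(p0, p1, p2)],     # at -1
                    [x + 2*y + 4*z for x, y, z in zip(p0, p1, p2)], # at  2
                    p2]                                             # at inf

        w0, w1, wm1, w2, winf = (schoolbook(x, y) for x, y in zip(ev(a), ev(b)))

        # Interpolate C(y) = c0 + c1*y + c2*y^2 + c3*y^3 + c4*y^4
        # (all divisions below are exact over the integers).
        c0, c4 = w0, winf
        c2 = [(p + q) // 2 - r - s for p, q, r, s in zip(w1, wm1, w0, winf)]
        s13 = [(p - q) // 2 for p, q in zip(w1, wm1)]               # c1 + c3
        s14 = [(p - q - 4*r - 16*s) // 2                            # c1 + 4*c3
               for p, q, r, s in zip(w2, w0, c2, winf)]
        c3 = [(p - q) // 3 for p, q in zip(s14, s13)]
        c1 = [p - q for p, q in zip(s13, c3)]

        res = [0] * (2 * len(a) - 1)                 # recombine with y = x^m
        for k, ck in enumerate([c0, c1, c2, c3, c4]):
            for i, v in enumerate(ck):
                res[k*m + i] += v
        return res

    a, b = [1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]
    assert toom3(a, b) == schoolbook(a, b)
    ```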

    Design and Code Optimization for Systems with Next-generation Racetrack Memories

    Get PDF
    With the rise of computationally expensive application domains such as machine learning, genomics, and fluid simulation, the quest for performance- and energy-efficient computing has gained unprecedented momentum. The significant increase in computing and memory devices in modern systems has resulted in an unsustainable surge in energy consumption, a substantial portion of which is attributed to the memory system. The scaling of conventional memory technologies, and their suitability for next-generation systems, is also questionable. This has led to the emergence and rise of nonvolatile memory (NVM) technologies. Today, several NVM technologies in different development stages are competing for rapid access to the market. Racetrack memory (RTM) is one such nonvolatile memory technology that promises SRAM-comparable latency, reduced energy consumption, and unprecedented density compared to other technologies. However, RTM is sequential in nature, i.e., data in an RTM cell needs to be shifted to an access port before it can be accessed, and these shift operations incur performance and energy penalties. An ideal RTM, requiring at most one shift per access, can easily outperform SRAM; in the worst-case shifting scenario, however, RTM can be an order of magnitude slower than SRAM. This thesis presents an overview of RTM device physics, its evolution, strengths and challenges, and its application in the memory subsystem. We develop tools that allow the programmability and modeling of RTM-based systems. For shift minimization, we propose a set of techniques including optimal, near-optimal, and evolutionary algorithms for efficient scalar and instruction placement in RTMs. For array accesses, we explore schedule and layout transformations that eliminate the longer overhead shifts in RTMs. We present an automatic compilation framework that analyzes static control-flow programs and transforms the loop traversal order and memory layout to maximize accesses to consecutive RTM locations and minimize shifts. We develop a simulation framework called RTSim that models various RTM parameters and enables accurate architecture-level simulation. Finally, to demonstrate the potential of RTM in non-von-Neumann in-memory computing paradigms, we exploit its device attributes to implement logic and arithmetic operations. As a concrete use case, we implement an entire hyperdimensional computing framework in RTM to accelerate the language recognition problem. Our evaluation shows considerable performance and energy improvements compared to conventional von Neumann models and state-of-the-art accelerators.
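
    The shift-minimization problem at the heart of this thesis can be captured in a toy cost model (my own illustration, not the thesis' tooling): with a single access port per track, every access costs shifts proportional to the distance between the port and the item's current position, so data placement directly determines runtime.

    ```python
    # Toy RTM shift-cost model: one access port; serving an access means
    # shifting the track until the item's offset aligns with the port.

    def shift_cost(trace, position):
        port, total = 0, 0
        for item in trace:
            total += abs(position[item] - port)   # shifts for this access
            port = position[item]                 # port now faces the item
        return total

    trace = ["a", "b", "a", "c", "a", "b", "a", "c"]
    naive  = {"a": 0, "b": 1, "c": 2}             # declaration order
    placed = {"b": 0, "a": 1, "c": 2}             # hot item between neighbours
    print(shift_cost(trace, naive), shift_cost(trace, placed))   # 10 vs 8
    ```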