Custom Memory Design for Logic-in-Memory: Drawbacks and Improvements over Conventional Memories
The speed of modern digital systems is severely limited by memory latency (the “Memory Wall” problem). Data exchange between logic and memory is also responsible for a large part of the system energy consumption. Logic-in-Memory (LiM) is an attractive solution to this problem: by performing part of the computations directly inside the memory, the system speed can be improved while reducing its energy consumption. The LiM solutions that offer the greatest boost in performance are based on modifying the memory cell. However, what is the cost of such modifications, and how do they impact the memory array's performance? In this work, these questions are addressed by analysing a LiM memory array implementing an algorithm for maximum/minimum value computation. The memory array is designed at the physical level using the FreePDK 45 nm CMOS process, with three memory cell variants, and its performance is compared to SRAM and CAM memories. Results highlight that read and write performance is worsened, but in-memory operations prove very efficient: a 55.26% reduction in the energy-delay product is measured for the AND operation with respect to the SRAM read. Therefore, the LiM approach represents a very promising solution for low-density, high-performance memories.
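As an illustration of how such a figure is derived, the energy-delay product (EDP) comparison can be sketched in a few lines; the energy and delay values below are hypothetical placeholders, not the measurements reported above:

```python
# Energy-delay product (EDP) comparison sketch.
# The energy and delay figures below are hypothetical placeholders
# chosen only to reproduce the headline percentage.

def edp(energy_pj: float, delay_ns: float) -> float:
    """Energy-delay product in pJ*ns."""
    return energy_pj * delay_ns

sram_read = edp(energy_pj=10.0, delay_ns=2.0)   # assumed SRAM read operation
lim_and   = edp(energy_pj=8.948, delay_ns=1.0)  # assumed in-memory AND operation

# Relative reduction of the in-memory operation vs. the SRAM read.
reduction = (sram_read - lim_and) / sram_read * 100
print(f"EDP reduction: {reduction:.2f}%")
```

A lower EDP means the operation is both faster and cheaper in energy; comparing the in-memory AND against the baseline SRAM read this way is what makes the two operations commensurable.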
FPGA-Based PUF Designs: A Comprehensive Review and Comparative Analysis
Field-programmable gate arrays (FPGAs) have firmly established themselves as dynamic platforms for the implementation of physical unclonable functions (PUFs). Their intrinsic reconfigurability and profound implications for hardware security make them an invaluable asset in this realm. This study dives deep into FPGA-based PUF designs and offers a comprehensive overview coupled with a discerning comparative analysis. PUFs are the bedrock of device authentication, key generation, and the fortification of secure cryptographic protocols, and FPGA technology expands the horizons of PUF integration across diverse hardware systems. We set out to understand the fundamental ideas behind PUFs and their crucial importance to current security paradigms. Different FPGA-based PUF solutions, including static, dynamic, and hybrid systems, are closely examined; each design paradigm is analysed to reveal its special qualities, functional nuances, and weaknesses. We assess a variety of performance metrics, including distinctiveness, reliability, and resilience against hostile threats, and compare the various FPGA-based PUF systems against one another to expose their unique advantages and disadvantages. This study provides system designers and security professionals with the crucial information they need to choose the best PUF design for their particular applications, offering a comprehensive view of the functionality, security capabilities, and prospective applications of FPGA-based PUF systems. The depth of knowledge gained from this research advances the field of hardware security, enabling security practitioners, researchers, and designers to make wise decisions when selecting and implementing FPGA-based PUF solutions.
Time- and Amplitude-Controlled Power Noise Generator against SPA Attacks for FPGA-Based IoT Devices
Power noise generation for masking power traces is a powerful countermeasure against
Simple Power Analysis (SPA), and it has also been used against Differential Power Analysis (DPA) or
Correlation Power Analysis (CPA) in the case of cryptographic circuits. This technique makes use of
power consumption generators as basic modules, which are usually based on ring oscillators when
implemented on FPGAs. These modules can be used to generate power noise and to also extract
digital signatures through the power side channel for Intellectual Property (IP) protection purposes.
In this paper, a new power consumption generator, named Xored High Consuming Module (XHCM),
is proposed. Compared to other proposals in the literature, XHCM improves the amount of
current consumption per LUT when implemented on FPGAs. Experimental results show that these
modules can achieve current increments in the range from 2.4 mA (with only 16 LUTs on Artix-7
devices with a power consumption density of 0.75 mW/LUT when using a single HCM) to 11.1 mA
(with 67 LUTs when using 8 XHCMs, with a power consumption density of 0.83 mW/LUT). Moreover,
a version controlled by Pulse-Width Modulation (PWM) has been developed, named PWM-XHCM,
which is, as XHCM, suitable for power watermarking. In order to build countermeasures against
SPA attacks, a multi-level XHCM (ML-XHCM) is also presented, which is capable of generating
different power consumption levels with minimal area overhead (27 six-input LUTs for generating
16 different amplitude levels on Artix-7 devices). Finally, a randomized version, named RML-XHCM,
has also been developed using two True Random Number Generators (TRNGs) to generate current
consumption peaks with random amplitudes at random times. RML-XHCM requires less than
150 LUTs on Artix-7 devices. Taking into account these characteristics, two main contributions
have been carried out in this article: first, XHCM and PWM-XHCM provide an efficient power
consumption generator for extracting digital signatures through the power side channel, and on the
other hand, ML-XHCM and RML-XHCM are powerful tools for the protection of processing units
against SPA attacks in IoT devices implemented on FPGAs. (Funding: Junta de Andalucía; European Commission, B-TIC-588-UGR2.)
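The multi-level idea behind ML-XHCM can be illustrated with a small sketch: enabling k of N identical power-consuming modules yields N+1 discrete consumption amplitudes. The module count and unit current below are hypothetical figures, not taken from the measurements above:

```python
# Multi-level amplitude control in the spirit of ML-XHCM: selecting how
# many unit power-consuming modules are enabled sets the amplitude of
# the injected consumption. Both constants are hypothetical.

N_MODULES = 15          # hypothetical number of identical unit modules
UNIT_CURRENT_MA = 0.7   # hypothetical extra current per enabled module

def consumption_level(enabled: int) -> float:
    """Total extra current (mA) when `enabled` modules are switched on."""
    if not 0 <= enabled <= N_MODULES:
        raise ValueError("enabled must be between 0 and N_MODULES")
    return enabled * UNIT_CURRENT_MA

# N modules give N + 1 selectable levels, including the all-off level.
levels = [consumption_level(k) for k in range(N_MODULES + 1)]
print(f"{len(levels)} distinct amplitude levels")
```

With 15 unit modules this yields the 16 amplitude levels mentioned above; a randomized variant in the spirit of RML-XHCM would simply draw `enabled` (and the switching instants) from a random source instead of a fixed schedule.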
SoK : Remote Power Analysis
In recent years, numerous attacks have appeared that aim to steal secret information from a victim through the power side channel, yet without direct physical access. These attacks are called Remote Power Attacks or Remote Power Analysis, and they utilize resources that are natively present inside the victim environment. However, there is no unified definition of the conditions under which a power attack qualifies as remote. This paper proposes a unified definition and concrete threat models to clearly differentiate remote power attacks from non-remote ones. Additionally, we collect the main remote power attacks performed so far in the literature, along with the principal countermeasures proposed to avoid them. The search for such countermeasures revealed a clear gap in preventing remote power attacks at the technical level. The academic community must therefore face an important challenge to avoid this emerging threat, given the clear room for improvement in the defense and security of devices that work with private information.
HW-Flow-Fusion: Inter-Layer Scheduling for Convolutional Neural Network Accelerators with Dataflow Architectures
Energy- and throughput-efficient acceleration of convolutional neural networks (CNNs) on devices with a strict power budget is achieved by leveraging different scheduling techniques to minimize data movement and maximize data reuse. Several dataflow mapping frameworks have been developed to explore the optimal scheduling of CNN layers on reconfigurable accelerators. However, previous works usually optimize each layer individually, without leveraging the data reuse between the layers of a CNN. In this work, we present an analytical model that achieves efficient data reuse by searching for efficient scheduling of communication and computation across layers. We call this inter-layer scheduling framework HW-Flow-Fusion, as we explore the fused map-space of multiple layers sharing the resources of the same accelerator, investigating the constraints and trade-offs of mapping the execution of multiple workloads with data dependencies. We propose a memory-efficient data reuse model, tiling, and resource partitioning strategies to fuse multiple layers without recomputation. Compared to standard single-layer scheduling, inter-layer scheduling reduces the communication volume by 51% and 53% for selected VGG16-E and ResNet18 layers on a spatial array accelerator, and reduces latency by 39% and 34% respectively, while also increasing the computation-to-communication ratio, which improves memory bandwidth efficiency.
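The source of the communication saving can be sketched with a back-of-the-envelope model: when two convolutional layers are fused, the intermediate feature map stays on-chip instead of making a round trip to external memory. The tensor shapes and 16-bit element size below are illustrative assumptions, not the VGG16-E or ResNet18 configurations evaluated above:

```python
# Back-of-the-envelope model of the off-chip traffic saved by fusing
# two convolutional layers. Shapes and element size are illustrative.

def tensor_bytes(c: int, h: int, w: int, bytes_per_elem: int = 2) -> int:
    """Size of a C x H x W feature map, assuming 16-bit elements."""
    return c * h * w * bytes_per_elem

inp = tensor_bytes(64, 56, 56)    # layer 1 input feature map
mid = tensor_bytes(128, 56, 56)   # intermediate feature map
out = tensor_bytes(128, 56, 56)   # layer 2 output feature map

# Single-layer scheduling: the intermediate map is written to external
# memory after layer 1 and read back for layer 2.
single = inp + 2 * mid + out

# Fused scheduling: the intermediate map never leaves the accelerator.
fused = inp + out

saving = (single - fused) / single * 100
print(f"communication volume reduced by {saving:.0f}%")
```

This ignores weights and any re-tiling overhead, but it captures why fusion pays off: the intermediate feature map, often the largest tensor involved, is removed from the off-chip traffic entirely.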
Secure Physical Design
An integrated circuit is subject to a number of attacks, including information leakage, side-channel attacks, fault injection, malicious change, reverse engineering, and piracy. The majority of these attacks take advantage of the physical placement and routing of cells and interconnects. Several measures have already been proposed to deal with security issues in high-level functional design and logic synthesis. However, to ensure an end-to-end trustworthy IC design flow, it is necessary to have security sign-off during the physical design flow. This paper presents a secure physical design roadmap to enable an end-to-end trustworthy IC design flow. The paper also discusses the utilization of AI/ML to establish security at the layout level. Major research challenges in obtaining a secure physical design are also discussed.
SoC Root Canal!
Finding the root cause of power-based side-channel leakage becomes harder when multiple layers of design abstraction are involved. While side-channel leakage originates in processor hardware, the dangerous consequences may only become apparent in the cryptographic software that runs on the processor. This contribution presents RootCanal, a methodology to explain the origin of side-channel leakage in a software program in terms of the underlying micro-architecture and system architecture. We simulate the hardware power consumption at the gate level and perform a non-specific test to identify the logic gates that contribute the most side-channel leakage. Then, we back-annotate those findings to the related activities in the software. The resulting analysis can automatically point out non-trivial causes of side-channel leakage. To illustrate RootCanal's capabilities, we discuss a collection of case studies.
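A non-specific leakage test of this kind is commonly realized as a fixed-vs-random Welch t-test computed per sample point; the sketch below applies it to synthetic traces (the gate-level power simulation itself is out of scope here, and the trace model is an assumption, not RootCanal's implementation):

```python
# Fixed-vs-random (non-specific) leakage test sketch: a Welch t-test
# per sample point flags where two groups of power traces differ
# significantly. Traces here are synthetic Gaussian noise with leakage
# injected at one known sample point.

import random
import statistics

random.seed(0)

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return (ma - mb) / ((va / len(a) + vb / len(b)) ** 0.5)

N_TRACES, N_SAMPLES, LEAKY_SAMPLE = 200, 50, 17

def trace(fixed: bool):
    t = [random.gauss(0.0, 1.0) for _ in range(N_SAMPLES)]
    if fixed:                       # inject data-dependent leakage
        t[LEAKY_SAMPLE] += 1.5
    return t

fixed_set  = [trace(True)  for _ in range(N_TRACES)]
random_set = [trace(False) for _ in range(N_TRACES)]

t_values = [welch_t([tr[i] for tr in fixed_set],
                    [tr[i] for tr in random_set])
            for i in range(N_SAMPLES)]

# |t| > 4.5 is the conventional threshold for declaring leakage.
leaky = [i for i, t in enumerate(t_values) if abs(t) > 4.5]
print("leaky sample points:", leaky)
```

In a gate-level flow, the same statistic can be computed per gate rather than per trace sample, which is what allows the leaking gates to be ranked and then back-annotated to software activity.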
Efficient machine learning software stack from algorithms to compilation
Machine learning enables the extraction of knowledge from data and decision-making without explicit programming, achieving great success and revolutionizing many fields. These successes can be attributed to continuous advancements in machine learning software and hardware, which have expanded the boundaries and facilitated breakthroughs in diverse applications. The machine learning software stack is a comprehensive collection of components used to solve problems with machine learning algorithms. It encompasses problem definitions, data processing, model and method designs, software frameworks, libraries, code optimization, and system management, supporting the entire life cycle of a machine learning project. The software stack allows the community to stand on the shoulders of previous great work and push the limits of machine learning, fostering innovation and enabling broader adoption of machine learning techniques in academia and industry. The stack is usually divided into algorithm and compilation layers with distinct design principles. Algorithm design prioritizes task-related performance, while compilation focuses on execution time and resource consumption on hardware devices. Maintaining arithmetic equivalence is optional in algorithm design, but compulsory in compilation to ensure consistent results. Compilation is also closer to the hardware than algorithm design: compilation engineers optimize for hardware specifications, while algorithm developers usually do not prioritize hardware-friendliness. Opportunities to enhance hardware efficiency exist in algorithm and compilation designs, as well as in their interplay. Despite extensive innovations and improvements, efficiency in the machine learning software stack remains a continuing challenge. Algorithm design proposes efficient model architectures and learning algorithms, while compilation design optimizes computation graphs and simplifies operations.
However, there is still a gap between the demand for efficiency and the current solutions, driven by rapidly growing workloads, limited resources in specific machine learning applications, and the need for cross-layer design. Addressing these challenges requires interdisciplinary research and collaboration. Improving efficiency in the machine learning software stack will optimize performance and enhance the accessibility and applicability of machine learning technologies. In this dissertation, we focus on addressing these efficiency challenges from the perspectives of machine learning algorithms and compilation. We introduce three novel improvements that enhance the efficiency of mainstream machine learning algorithms. Firstly, effective gradient matching for dataset condensation generates a small insightful dataset, accelerating training and other related tasks. Additionally, NormSoftmax proposes to append a normalization layer to achieve fast and stable training in Transformers and classification models. Lastly, mixed precision hardware-aware neural architecture search combines mixed-precision quantization, neural architecture search, and hardware energy efficiency, resulting in significantly more efficient neural networks than using a single method. However, algorithmic efficiency alone is insufficient to fully exploit the potential in the machine learning software stack. We delve into and optimize the compilation processes with three techniques. Firstly, we simplify the layer normalization in the influential Transformers, obtaining two equivalent and efficient Transformer variants with alternative normalization types. Our proposed variants enable efficient training and inference of popular models like GPT and ViT. Secondly, we formulate and solve the scheduling problem for reversible neural architectures, finding the optimal training schedule that fully leverages the computation and memory resources on hardware accelerators. 
Lastly, optimizer fusion allows users to accelerate the training process in the eager execution mode of machine learning frameworks. It leverages better locality on hardware and parallelism in the computation graphs. Throughout the dissertation, we emphasize the integration of efficient algorithms and compilation into a cohesive machine learning software stack. We also consider hardware properties to provide hardware-friendly software designs. We demonstrate the effectiveness of the proposed methods in algorithm and compilation through extensive experiments. Our approaches effectively reduce the time and energy required for both training and inference. Ultimately, our methods have the potential to empower machine learning practitioners and researchers to build more efficient, powerful, robust, scalable, and accessible machine learning solutions.
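As a flavour of the normalization idea behind NormSoftmax, the sketch below applies a layer-norm-style normalization to the logits before the softmax; the exact formulation in the dissertation may differ, so treat this as an illustrative assumption rather than the proposed method:

```python
# Sketch: normalizing logits before the softmax to stabilize training.
# The layer-norm-style normalization here is an illustrative assumption,
# not necessarily the exact NormSoftmax formulation.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for stability
    return e / e.sum(axis=-1, keepdims=True)

def norm_softmax(z, eps=1e-6, scale=1.0):
    """Softmax applied to normalized logits."""
    mu = z.mean(axis=-1, keepdims=True)
    sigma = z.std(axis=-1, keepdims=True)
    return softmax(scale * (z - mu) / (sigma + eps))

logits = np.array([[2.0, 1.0, 0.1],
                   [200.0, 100.0, 10.0]])   # second row has a large scale
p = norm_softmax(logits)
# Each row still sums to 1, and the normalization keeps the large-scale
# row from saturating to a one-hot distribution.
```

The second row illustrates the point: a plain softmax over logits of magnitude 100+ saturates (its gradient vanishes), whereas normalizing first keeps the output distribution, and hence the gradients, well conditioned.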
Practical Lightweight Security: Physical Unclonable Functions and the Internet of Things
In this work, we examine whether Physical Unclonable Functions (PUFs) can act as lightweight security mechanisms for practical applications in the context of the Internet of Things (IoT). In order to do so, we first discuss what PUFs are, and note that memory-based PUFs seem to fit the best to the framework of the IoT. Then, we consider a number of relevant memory-based PUF designs and their properties, and evaluate their ability to provide security in nominal and adverse conditions. Finally, we present and assess a number of practical PUF-based security protocols for IoT devices and networks, in order to confirm that memory-based PUFs can indeed constitute adequate security mechanisms for the IoT, in a practical and lightweight fashion.
More specifically, we first consider what may constitute a PUF, and we redefine PUFs as inanimate physical objects whose characteristics can be exploited in order to obtain a behaviour similar to a highly distinguishable (i.e., “(quite) unique”) mathematical function. We note that PUFs share many characteristics with biometrics, with the main difference being that PUFs are based on the characteristics of inanimate objects, while biometrics are based on the characteristics of humans and other living creatures. We also note that it cannot really be proven that PUFs are unique per instance, but they should be considered to be so, insofar as (human) biometrics are also considered to be unique per instance.
We, then, proceed to discuss the role of PUFs as security mechanisms for the IoT, and we determine that memory-based PUFs are particularly suited for this function. We observe that the IoT nowadays consists of heterogeneous devices connected over diverse networks, which include both high-end and resource-constrained devices. Therefore, it is essential that a security solution for the IoT is not only effective, but also highly scalable, flexible, lightweight, and cost-efficient, in order to be considered as practical. To this end, we note that PUFs have been proposed as security mechanisms for the IoT in the related work, but the practicality of the relevant security mechanisms has not been sufficiently studied.
We, therefore, examine a number of memory-based PUFs that are implemented using Commercial Off-The-Shelf (COTS) components, and assess their potential to serve as acceptable security mechanisms in the context of the IoT, not only in terms of effectiveness and cost, but also under both nominal and adverse conditions, such as ambient temperature and supply voltage variations, as well as in the presence of (ionising) radiation. In this way, we can determine whether memory-based PUFs are truly suitable to be used in the various application areas of the IoT, which may even involve particularly adverse environments, e.g., in IoT applications involving space modules and operations.
Finally, we also explore the potential of memory-based PUFs to serve as adequate security mechanisms for the IoT in practice, by presenting and analysing a number of cryptographic protocols based on these PUFs. In particular, we study how memory-based PUFs can be used for key generation, as well as for device identification and authentication; their role as security mechanisms for current and next-generation IoT devices and networks; and their potential for applications in the space segment of the IoT and in other adverse environments. Additionally, this work discusses how memory-based PUFs can be utilised to implement lightweight reconfigurable PUFs that allow for advanced security applications. In this way, we are able to confirm that memory-based PUFs can indeed provide flexible, scalable, and efficient security solutions for the IoT, in a practical, lightweight, and inexpensive manner.
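A minimal sketch of the identification role described above, assuming a toy model of SRAM power-up behaviour (the cell count, noise rate, and majority-vote scheme are all illustrative choices, not the protocols analysed in this work):

```python
# Toy model of memory-based PUF identification: each SRAM cell has a
# preferred power-up value that is read with some noise, so a majority
# vote over repeated readings followed by hashing yields a stable,
# device-specific identifier. All parameters are illustrative.

import hashlib
import random

N_CELLS = 256
NOISE_P = 0.03   # assumed per-power-up bit-flip probability
READINGS = 15    # odd number of readings for a clean majority vote

def make_device(seed):
    """Model a device by the preferred power-up value of each cell."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(N_CELLS)]

def power_up(device, rng):
    """One noisy power-up reading of the whole array."""
    return [bit ^ (rng.random() < NOISE_P) for bit in device]

def device_id(device, rng):
    """Majority-vote the noisy readings, then hash to a fixed-size ID."""
    votes = [sum(cell) for cell in zip(*(power_up(device, rng)
                                         for _ in range(READINGS)))]
    stable = bytes(v * 2 > READINGS for v in votes)
    return hashlib.sha256(stable).hexdigest()

dev_a, dev_b = make_device(1), make_device(2)
# Same device -> same ID across independent noisy sessions;
# different devices -> different IDs.
assert device_id(dev_a, random.Random(3)) == device_id(dev_a, random.Random(4))
assert device_id(dev_a, random.Random(3)) != device_id(dev_b, random.Random(3))
```

Real deployments replace the majority vote with proper error correction (e.g. fuzzy extractors) so that keys, not just identifiers, can be derived reliably under the adverse conditions discussed above.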