173 research outputs found

    Reliable and High-Performance Hardware Architectures for the Advanced Encryption Standard/Galois Counter Mode

    Get PDF
    The high level of security and the fast hardware and software implementations of the Advanced Encryption Standard (AES) have made it the first choice for many critical applications. Since its acceptance as the adopted symmetric-key algorithm, the AES has been utilized in various security-constrained applications, many of which are power and resource constrained and require reliable and efficient hardware implementations. In this thesis, first, we investigate the AES algorithm from the concurrent fault detection point of view. We note that in addition to the efficiency requirements of the AES, it must be reliable against transient and permanent internal faults or malicious faults aiming at revealing the secret key. This reliability analysis and proposing efficient and effective fault detection schemes are essential because fault attacks have become a serious concern in cryptographic applications. Therefore, we propose, design, and implement various novel concurrent fault detection schemes for different AES hardware architectures. These include different structure-dependent and independent approaches for detecting single and multiple stuck-at faults using single and multi-bit signatures. The recently standardized authentication mode of the AES, i.e., Galois/Counter Mode (GCM), is also considered in this thesis. We propose efficient architectures for the AES-GCM algorithm. In this regard, we investigate the AES algorithm and we propose low-complexity and low-power hardware implementations for it, emphasizing on its nonlinear transformation, i.e., SubByes (S-boxes). We present new formulations for this transformation and through exhaustive hardware implementations, we show that the proposed architectures outperform their counterparts in terms of efficiency. Moreover, we present parallel, high-performance new schemes for the hardware implementations of the GCM to improve its throughput and reduce its latency. The performance of the proposed efficient architectures for the AES-GCM and their fault detection approaches are benchmarked using application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) hardware platforms. Our comparison results show that the proposed hardware architectures outperform their existing counterparts in terms of efficiency and fault detection capability

    Hardware Mechanisms for Efficient Memory System Security

    Full text link
    The security of a computer system hinges on the trustworthiness of the operating system and the hardware, as applications rely on them to protect code and data. As a result, multiple protections for safeguarding the hardware and OS from attacks are being continuously proposed and deployed. These defenses, however, are far from ideal as they only provide partial protection, require complex hardware and software stacks, or incur high overheads. This dissertation presents hardware mechanisms for efficiently providing strong protections against an array of attacks on the memory hardware and the operating system’s code and data. In the first part of this dissertation, we analyze and optimize protections targeted at defending memory hardware from physical attacks. We begin by showing that, contrary to popular belief, current DDR3 and DDR4 memory systems that employ memory scrambling are still susceptible to cold boot attacks (where the DRAM is frozen to give it sufficient retention time and is then re-read by an attacker after reboot to extract sensitive data). We then describe how memory scramblers in modern memory controllers can be transparently replaced by strong stream ciphers without impacting performance. We also demonstrate how the large storage overheads associated with authenticated memory encryption schemes (which enable tamper-proof storage in off-chip memories) can be reduced by leveraging compact integer encodings and error-correcting code (ECC) DRAMs – without forgoing the error detection and correction capabilities of ECC DRAMs. The second part of this dissertation presents Neverland: a low-overhead, hardware-assisted, memory protection scheme that safeguards the operating system from rootkits and kernel-mode malware. Once the system is done booting, Neverland’s hardware takes away the operating system’s ability to overwrite certain configuration registers, as well as portions of its own physical address space that contain kernel code and security-critical data. Furthermore, it prohibits the CPU from fetching privileged code from any memory region lying outside the physical addresses assigned to the OS kernel and drivers. This combination of protections makes it extremely hard for an attacker to tamper with the kernel or introduce new privileged code into the system – even in the presence of software vulnerabilities. Neverland enables operating systems to reduce their attack surface without having to rely on complex integrity monitoring software or hardware. The hardware mechanisms we present in this dissertation provide building blocks for constructing a secure computing base while incurring lower overheads than existing protections.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147604/1/salessaf_1.pd

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    Get PDF
    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability

    A scalable, portable, FPGA-based implementation of the Unscented Kalman Filter

    Get PDF
    Sustained technological progress has come to a point where robotic/autonomous systems may well soon become ubiquitous. In order for these systems to actually be useful, an increase in autonomous capability is necessary for aerospace, as well as other, applications. Greater aerospace autonomous capability means there is a need for high performance state estimation. However, the desire to reduce costs through simplified development processes and compact form factors can limit performance. A hardware-based approach, such as using a Field Programmable Gate Array (FPGA), is common when high performance is required, but hardware approaches tend to have a more complicated development process when compared to traditional software approaches; greater development complexity, in turn, results in higher costs. Leveraging the advantages of both hardware-based and software-based approaches, a hardware/software (HW/SW) codesign of the Unscented Kalman Filter (UKF), based on an FPGA, is presented. The UKF is split into an application-specific part, implemented in software to retain portability, and a non-application-specific part, implemented in hardware as a parameterisable IP core to increase performance. The codesign is split into three versions (Serial, Parallel and Pipeline) to provide flexibility when choosing the balance between resources and performance, allowing system designers to simplify the development process. Simulation results demonstrating two possible implementations of the design, a nanosatellite application and a Simultaneous Localisation and Mapping (SLAM) application, are presented. These results validate the performance of the HW/SW UKF and demonstrate its portability, particularly in small aerospace systems. Implementation (synthesis, timing, power) details for a variety of situations are presented and analysed to demonstrate how the HW/SW codesign can be scaled for any application

    FPGA structures for high speed and low overhead dynamic circuit specialization

    Get PDF
    A Field Programmable Gate Array (FPGA) is a programmable digital electronic chip. The FPGA does not come with a predefined function from the manufacturer; instead, the developer has to define its function through implementing a digital circuit on the FPGA resources. The functionality of the FPGA can be reprogrammed as desired and hence the name “field programmable”. FPGAs are useful in small volume digital electronic products as the design of a digital custom chip is expensive. Changing the FPGA (also called configuring it) is done by changing the configuration data (in the form of bitstreams) that defines the FPGA functionality. These bitstreams are stored in a memory of the FPGA called configuration memory. The SRAM cells of LookUp Tables (LUTs), Block Random Access Memories (BRAMs) and DSP blocks together form the configuration memory of an FPGA. The configuration data can be modified according to the user’s needs to implement the user-defined hardware. The simplest way to program the configuration memory is to download the bitstreams using a JTAG interface. However, modern techniques such as Partial Reconfiguration (PR) enable us to configure a part in the configuration memory with partial bitstreams during run-time. The reconfiguration is achieved by swapping in partial bitstreams into the configuration memory via a configuration interface called Internal Configuration Access Port (ICAP). The ICAP is a hardware primitive (macro) present in the FPGA used to access the configuration memory internally by an embedded processor. The reconfiguration technique adds flexibility to use specialized ci rcuits that are more compact and more efficient t han t heir b ulky c ounterparts. An example of such an implementation is the use of specialized multipliers instead of big generic multipliers in an FIR implementation with constant coefficients. To specialize these circuits and reconfigure during the run-time, researchers at the HES group proposed the novel technique called parameterized reconfiguration that can be used to efficiently and automatically implement Dynamic Circuit Specialization (DCS) that is built on top of the Partial Reconfiguration method. It uses the run-time reconfiguration technique that is tailored to implement a parameterized design. An application is said to be parameterized if some of its input values change much less frequently than the rest. These inputs are called parameters. Instead of implementing these parameters as regular inputs, in DCS these inputs are implemented as constants, and the application is optimized for the constants. For every change in parameter values, the design is re-optimized (specialized) during run-time and implemented by reconfiguring the optimized design for a new set of parameters. In DCS, the bitstreams of the parameterized design are expressed as Boolean functions of the parameters. For every infrequent change in parameters, a specialized FPGA configuration is generated by evaluating the corresponding Boolean functions, and the FPGA is reconfigured with the specialized configuration. A detailed study of overheads of DCS and providing suitable solutions with appropriate custom FPGA structures is the primary goal of the dissertation. I also suggest different improvements to the FPGA configuration memory architecture. After offering the custom FPGA structures, I investigated the role of DCS on FPGA overlays and the use of custom FPGA structures that help to reduce the overheads of DCS on FPGA overlays. By doing so, I hope I can convince the developer to use DCS (which now comes with minimal costs) in real-world applications. I start the investigations of overheads of DCS by implementing an adaptive FIR filter (using the DCS technique) on three different Xilinx FPGA platforms: Virtex-II Pro, Virtex-5, and Zynq-SoC. The study of how DCS behaves and what is its overhead in the evolution of the three FPGA platforms is the non-trivial basis to discover the costs of DCS. After that, I propose custom FPGA structures (reconfiguration controllers and reconfiguration drivers) to reduce the main overhead (reconfiguration time) of DCS. These structures not only reduce the reconfiguration time but also help curbing the power hungry part of the DCS system. After these chapters, I study the role of DCS on FPGA overlays. I investigate the effect of the proposed FPGA structures on Virtual-Coarse-Grained Reconfigurable Arrays (VCGRAs). I classify the VCGRA implementations into three types: the conventional VCGRA, partially parameterized VCGRA and fully parameterized VCGRA depending upon the level of parameterization. I have designed two variants of VCGRA grids for HPC image processing applications, namely, the MAC grid and Pixie. Finally, I try to tackle the reconfiguration time overhead at the hardware level of the FPGA by customizing the FPGA configuration memory architecture. In this part of my research, I propose to use a parallel memory structure to improve the reconfiguration time of DCS drastically. However, this improvement comes with a significant overhead of hardware resources which will need to be solved in future research on commercial FPGA configuration memory architectures

    Stream ciphers for secure display

    Get PDF
    In any situation where private, proprietary or highly confidential material is being dealt with, the need to consider aspects of data security has grown ever more important. It is usual to secure such data from its source, over networks and on to the intended recipient. However, data security considerations typically stop at the recipient's processor, leaving connections to a display transmitting raw data which is increasingly in a digital format and of value to an adversary. With a progression to wireless display technologies the prominence of this vulnerability is set to rise, making the implementation of 'secure display' increasingly desirable. Secure display takes aspects of data security right to the display panel itself, potentially minimising the cost, component count and thickness of the final product. Recent developments in display technologies should help make this integration possible. However, the processing of large quantities of time-sensitive data presents a significant challenge in such resource constrained environments. Efficient high- throughput decryption is a crucial aspect of the implementation of secure display and one for which the widely used and well understood block cipher may not be best suited. Stream ciphers present a promising alternative and a number of strong candidate algorithms potentially offer the hardware speed and efficiency required. In the past, similar stream ciphers have suffered from algorithmic vulnerabilities. Although these new-generation designs have done much to respond to this concern, the relatively short 80-bit key lengths of some proposed hardware candidates, when combined with ever-advancing computational power, leads to the thesis identifying exhaustive search of key space as a potential attack vector. To determine the value of protection afforded by such short key lengths a unique hardware key search engine for stream ciphers is developed that makes use of an appropriate data element to improve search efficiency. The simulations from this system indicate that the proposed key lengths may be insufficient for applications where data is of long-term or high value. It is suggested that for the concept of secure display to be accepted, a longer key length should be used

    Recent Advances in Embedded Computing, Intelligence and Applications

    Get PDF
    The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately contribute to the fostering of the offloading processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence. Among them are hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems

    Energy Efficient Hardware Design for Securing the Internet-of-Things

    Full text link
    The Internet of Things (IoT) is a rapidly growing field that holds potential to transform our everyday lives by placing tiny devices and sensors everywhere. The ubiquity and scale of IoT devices require them to be extremely energy efficient. Given the physical exposure to malicious agents, security is a critical challenge within the constrained resources. This dissertation presents energy-efficient hardware designs for IoT security. First, this dissertation presents a lightweight Advanced Encryption Standard (AES) accelerator design. By analyzing the algorithm, a novel method to manipulate two internal steps to eliminate storage registers and replace flip-flops with latches to save area is discovered. The proposed AES accelerator achieves state-of-art area and energy efficiency. Second, the inflexibility and high Non-Recurring Engineering (NRE) costs of Application-Specific-Integrated-Circuits (ASICs) motivate a more flexible solution. This dissertation presents a reconfigurable cryptographic processor, called Recryptor, which achieves performance and energy improvements for a wide range of security algorithms across public key/secret key cryptography and hash functions. The proposed design employs circuit techniques in-memory and near-memory computing and is more resilient to power analysis attack. In addition, a simulator for in-memory computation is proposed. It is of high cost to design and evaluate new-architecture like in-memory computing in Register-transfer level (RTL). A C-based simulator is designed to enable fast design space exploration and large workload simulations. Elliptic curve arithmetic and Galois counter mode are evaluated in this work. Lastly, an error resilient register circuit, called iRazor, is designed to tolerate unpredictable variations in manufacturing process operating temperature and voltage of VLSI systems. When integrated into an ARM processor, this adaptive approach outperforms competing industrial techniques such as frequency binning and canary circuits in performance and energy.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147546/1/zhyiqun_1.pd

    Hardware security design from circuits to systems

    Get PDF
    The security of hardware implementations is of considerable importance, as even the most secure and carefully analyzed algorithms and protocols can be vulnerable in their hardware realization. For instance, numerous successful attacks have been presented against the Advanced Encryption Standard, which is approved for top secret information by the National Security Agency. There are numerous challenges for hardware security, ranging from critical power and resource constraints in sensor networks to scalability and automation for large Internet of Things (IoT) applications. The physically unclonable function (PUF) is a promising building block for hardware security, as it exposes a device-unique challenge-response behavior which depends on process variations in fabrication. It can be used in a variety of applications including random number generation, authentication, fingerprinting, and encryption. The primary concerns for PUF are reliability in presence of environmental variations, area and power overhead, and process-dependent randomness of the challenge-response behavior. Carbon nanotube field-effect transistors (CNFETs) have been shown to have excellent electrical and unique physical characteristics. They are a promising candidate to replace silicon transistors in future very large scale integration (VLSI) designs. We present the Carbon Nanotube PUF (CNPUF), which is the first PUF design that takes advantage of unique CNFET characteristics. CNPUF achieves higher reliability against environmental variations and increases the resistance against modeling attacks. Furthermore, CNPUF has a considerable power and energy reduction in comparison to previous ultra-low power PUF designs of 89.6% and 98%, respectively. Moreover, CNPUF allows a power-security tradeoff in an extended design, which can greatly increase the resilience against modeling attacks. Despite increasing focus on defenses against physical attacks, consistent security oriented design of embedded systems remains a challenge, as most formalizations and security models are concerned with isolated physical components or a high-level concept. Therefore, we build on existing work on hardware security and provide four contributions to system-oriented physical defense: (i) A system-level security model to overcome the chasm between secure components and requirements of high-level protocols; this enables synergy between component-oriented security formalizations and theoretically proven protocols. (ii) An analysis of current practices in PUF protocols using the proposed system-level security model; we identify significant issues and expose assumptions that require costly security techniques. (iii) A System-of-PUF (SoP) that utilizes the large PUF design-space to achieve security requirements with minimal resource utilization; SoP requires 64% less gate-equivalent units than recently published schemes. (iv) A multilevel authentication protocol based on SoP which is validated using our system-level security model and which overcomes current vulnerabilities. Furthermore, this protocol offers breach recognition and recovery. Unpredictability and reliability are core requirements of PUFs: unpredictability implies that an adversary cannot sufficiently predict future responses from previous observations. Reliability is important as it increases the reproducibility of PUF responses and hence allows validation of expected responses. However, advanced machine-learning algorithms have been shown to be a significant threat to the practical validity of PUFs, as they can accurately model PUF behavior. The most effective technique was shown to be the XOR-based combination of multiple PUFs, but as this approach drastically reduces reliability, it does not scale well against software-based machine-learning attacks. We analyze threats to PUF security and propose PolyPUF, a scalable and secure architecture to introduce polymorphic PUF behavior. This architecture significantly increases model-building resistivity while maintaining reliability. An extensive experimental evaluation and comparison demonstrate that the PolyPUF architecture can secure various PUF configurations and is the only evaluated approach to withstand highly complex neural network machine-learning attacks. Furthermore, we show that PolyPUF consumes less energy and has less implementation overhead in comparison to lightweight reference architectures. Emerging technologies such as the Internet of Things (IoT) heavily rely on hardware security for data and privacy protection. The outsourcing of integrated circuit (IC) fabrication introduces diverse threat vectors with different characteristics, such that the security of each device has unique focal points. Hardware Trojan horses (HTH) are a significant threat for IoT devices as they process security critical information with limited resources. HTH for information leakage are particularly difficult to detect as they have minimal footprint. Moreover, constantly increasing integration complexity requires automatic synthesis to maintain the pace of innovation. We introduce the first high-level synthesis (HLS) flow that produces a threat-targeted and security enhanced hardware design to prevent HTH injection by a malicious foundry. Through analysis of entropy loss and criticality decay, the presented algorithms implement highly resource-efficient targeted information dispersion. An obfuscation flow is introduced to camouflage the effects of dispersion and reduce the effectiveness of reverse engineering. A new metric for the combined security of the device is proposed, and dispersion and obfuscation are co-optimized to target user-supplied threat parameters under resource constraints. The flow is evaluated on existing HLS benchmarks and a new IoT-specific benchmark, and shows significant resource savings as well as adaptability. The IoT and cloud computing rely on strong confidence in security of confidential or highly privacy sensitive data. As (differential) power attacks can take advantage of side-channel leakage to expose device-internal secrets, side-channel leakage is a major concern with ongoing research focus. However, countermeasures typically require expert-level security knowledge for efficient application, which limits adaptation in the highly competitive and time-constrained IoT field. We address this need by presenting the first HLS flow with primary focus on side-channel leakage reduction. Minimal security annotation to the high-level C-code is sufficient to perform automatic analysis of security critical operations with corresponding insertion of countermeasures. Additionally, imbalanced branches are detected and corrected. For practicality, the flow can meet both resource and information leakage constraints. The presented flow is extensively evaluated on established HLS benchmarks and a general IoT benchmark. Under identical resource constraints, leakage is reduced between 32% and 72% compared to the baseline. Under leakage target, the constraints are achieved with 31% to 81% less resource overhead
    • …
    corecore