123 research outputs found

    Robustness of TAP-based Scan Networks

    Get PDF
    It is common to embed instruments when developing integrated circuits (ICs). These instruments are accessed at post-silicon validation, debugging, wafer sort, package test, burn-in, printed circuit board bring-up, printed circuit board assembly manufacturing test, power-on self-test, and operator-driven in-field test. At any of these scenarios, it is of interest to access some but not all of the instruments. IEEE 1149.1-2013 and IEEE 1687 propose Test Access Port based (TAP-based) mechanisms to design flexible scan networks such that any combination of instruments can be accessed from outside of the IC. Previous works optimize TAP-based scan networks for one scenario with a known number of accesses. However, at design time, it is difficult to foresee all needed scenarios and the exact number of accesses to instruments. Moreover, the number of accesses might change due to late design changes, addition/exclusion of tests, and changes of constraints. In this paper, we analyze and compare seven IEEE 1687 compatible network design approaches in terms of instrument access time, hardware overhead, and robustness. Given the similarities between IEEE 1149.1-2013 and IEEE 1687, the conclusions are also applicable to IEEE 1149.1-2013 networks

    Soft Error Resistant Design of the AES Cipher Using SRAM-based FPGA

    Get PDF
    This thesis presents a new architecture for the reliable implementation of the symmetric-key algorithm Advanced Encryption Standard (AES) in Field Programmable Gate Arrays (FPGAs). Since FPGAs are prone to soft errors caused by radiation, and AES is highly sensitive to errors, reliable architectures are of significant concern. Energetic particles hitting a device can flip bits in FPGA SRAM cells controlling all aspects of the implementation. Unlike previous research, heterogeneous error detection techniques based on properties of the circuit and functionality are used to provide adequate reliability at the lowest possible cost. The use of dual ported block memory for SubBytes, duplication for the control circuitry, and a new enhanced parity technique for MixColumns is proposed. Previous parity techniques cover single errors in datapath registers, however, soft errors can occur in the control circuitry as well as in SRAM cells forming the combinational logic and routing. In this research, propagation of single errors is investigated in the routed netlist. Weaknesses of the previous parity techniques are identified. Architectural redesign at the register-transfer level is introduced to resolve undetected single errors in both the routing and the combinational logic. Reliability of the AES implementation is not only a critical issue in large scale FPGA-based systems but also at both higher altitudes and in space applications where there are a larger number of energetic particles. Thus, this research is important for providing efficient soft error resistant design in many current and future secure applications

    Abusing Hardware Race Conditions for High Throughput Energy Efficient Computation

    Get PDF
    We propose a novel computing approach, called “Race Logic”, which utilizes a new data representation to accelerate a broad class of optimization problems, such as those solved by dynamic programming algorithms. The core idea of Race Logic is to deliberately engineer race conditions in a circuit to perform useful computation. In Race Logic, information, instead of being represented as logic levels (as is done in conventional logic), is represented as a timing delay. Computations can then be performed by observing the relative propagation times of signals injected into a configurable circuit (i.e. the outcome of races through the circuit).In this dissertation I will introduce Race Based computation and talk about multiple VLSI implementations. We first begin by considering a synchronous approach, which uses simple clocked delay elements. Though this synchronous implementation outperforms highly optimized conventional implementations of the well-studied, DNA sequence alignment problem, its third order energy scaling with problem size and limited dynamic range of timing delays are its major pitfalls. Next, in the search for energy efficiency, we study asynchronous designs in order to understand the performance trade-offs and applicability of this new architecture. Finally, I will present the results of a prototype asynchronous Race Logic chip and demonstrate that Race-Based computations can align up to 10 million 50 symbol long DNA sequences per second, about 2-3 orders of magnitude faster than the state of the art general purpose computing systems

    The Assq Chip and Its Progeny

    Get PDF
    The Assq Chip lives on the memory bus of the Scheme-81 chip of Sussman et al and serves as a utility for the computation of a number of functions concerned with the maintenance of linear tables and lists. Motivated by a desire to apply the design methodology implicit in Scheme-81, it was designed in about two months, has a very simple architecture and layout, and is primarily machine-generated. The chip and the design process are described and evaluated in the context of a proposal to construct a Scheme-to-silicon compiler that automates the design methodology used in the Assq Chip.MIT Artificial Intelligence Laborator

    SCAN CHAIN BASED HARDWARE SECURITY

    Get PDF
    Hardware has become a popular target for attackers to hack into any computing and communication system. Starting from the legendary power analysis attacks discovered 20 years ago to the recent Intel Spectre and Meltdown attacks, security vulnerabilities in hardware design have been exploited for malicious purposes. With the emerging Internet of Things (IoT) applications, where the IoT devices are extremely resource constrained, many proven secure but computational expensive cryptography protocols cannot be applied on such devices. Thus there is an urgent need to understand the hardware vulnerabilities and develop cost effective mitigation methods. One established field in the semiconductor and integrated circuit (IC) industry, known as IC test, has the goal of ensuring that fabricated ICs are free of manufacturing defects and perform the required functionalities. Testing is essential to isolate faulty chips from good ones. The concept of design for test (DFT) has been integrated in the commercial IC design and fabrication process for several decades. Scan chain, which provides test engineer access to all the flip flops in the chip through the scan in (SI) and scan out (SO) ports, is the backbone of industrial testing methods and can be found in almost all the modern designs. In addition to IC testing, scan chain has found applications in intellectual property (IP) protection and IC identification. However, attackers can also leverage the controllability and observability of scan chain as a side channel to break systems such as cryptographic chips. This dissertation addresses these two important security problems by proposing (1) a practical scan chain based security primitive for IP protection and (2) a partial scan chain framework that can mitigate all the existing scan based attacks. First, we observe the fact that each D-flip-flop has two output ports, Q and Q’, designed to simplify the logic and has been used to reduce the power consumption for IC test. The availability of both Q and Q’ ports provide the opportunity for IP protection. More specifically, we can generate a digital fingerprint by selecting different connection styles between adjacent scan cells during the design of scan chain. This method has two major advantages: fingerprints are created as a post-silicon procedure and therefore there will be little fabrication overhead; altering the connection style requires the modification of test vectors for each fingerprinted IP and thus enables a non-intrusive fingerprint verification method. This addresses the overhead and detectability problems, two of the most challenging problems of designing practical IP fingerprinting techniques in the past two decades. Combined with the recently developed reconfigurable scan networks (RSNs) that are popular for embedded and IoT devices, we design an IC identification (ID) scheme utilizing the different connection styles. We perform experiments on standard benchmarks to demonstrate that our approach has low design overhead. We also conduct security analysis to show that such fingerprints and IC IDs are robust against various attacks. In the second part of this dissertation, we consider the scan chain side channel attack, which has been reported as one of the most severe side channel attacks to modern secure systems. We argue that the current countermeasures are restricted to the requirement of providing direct SI and SO for testing and thus suffers the vulnerability of leaving this side channel open to the attackers as well. Therefore, we propose a novel public-private partial scan chain based approach with the basic idea of removing the flip flops that store sensitive information from the scan chain. This will eliminate the scan chain side channel, but it also limits IC test. The key contribution in our proposed public-private partial scan chain design is that it can keep the full test coverage while providing security to the scan chain. This is achieved by chaining the removed flip flops into one or more private partial scan chains and adding protections to the SI and SO ports of such chains. Unlike the traditional partial scan design which not only fails to provide full fault coverage, but also incur huge overhead in test time and test vector generation time, we propose a set of techniques to ensure that the desired test vectors can be entered into the system efficiently. These techniques include test vector reordering, test vector reusing, and test vector generation based on a novel finite state machine (FSM) structure we have invented. On the other hand, to enable the test engineers the ability to observe the test output to diagnose the chip while not leaking information to the attackers, we propose two lightweight mechanisms, one based on linear feedback shift register (LFSR) and the other one based on configurable physical unclonable function (PUF). Finally, we discuss a protocol on how in-field test can be realized using our public-private partial scan chain. We conduct experiments with industrial scan design tools to demonstrate that the required hardware in our approach has negligible area overhead and gives full test coverage with reduced test time and does not need to re-generate test vectors. In sum, this dissertation focuses on the role of scan chain, a conventional design for test facility, in hardware security. We show that scan chain features can be leveraged to create practical IP protection techniques including IP watermarking and fingerprinting as well as IC identification and authentication. We also propose a novel public-private partial scan design principle to close the scan chain side channel to the attackers. Through this dissertation work, we demonstrate that it is possible to develop highly practical scan chain based techniques that can benefit both the community of IC test and hardware security

    Area and Energy Optimizations in ASIC Implementations of AES and PRESENT Block Ciphers

    Get PDF
    When small, modern-day devices surface with neoteric features and promise benefits like streamlined business processes, cashierless stores, and autonomous driving, they are all too often accompanied by security risks due to a weak or absent security component. In particular, the lack of data privacy protection is a common concern that can be remedied by implementing encryption. This ensures that data remains undisclosed to unauthorized parties. While having a cryptographic module is often a goal, it is sometimes forfeited because a device's resources do not allow for the conventional cryptographic solutions. Thus, smaller, lower-energy security modules are in demand. Implementing a cipher in hardware as an application-specific integrated circuit (ASIC) will usually achieve better efficiency than alternatives like FPGAs or software, and can help towards goals such as extended battery life and smaller area footprint. The Advanced Encryption Standard (AES) is a block cipher established by the National Institute of Standards and Technology (NIST) in 2001. It has since become the most widely adopted block cipher and is applied in a variety of applications ranging from smartphones to passive RFID tags to high performance microprocessors. PRESENT, published in 2007, is a smaller lightweight block cipher designed for low-power applications. In this study, low-area and low-energy optimizations in ASICs are addressed for AES and PRESENT. In the low-area work, three existing AES encryption cores are implemented, analyzed, and benchmarked using a common fabrication technology (STM 65 nm). The analysis includes an examination of various implementations of internal AES operations and their suitability for different architectural choices. Using our taxonomy of design choices, we designed Quark-AES, a novel 8-bit AES architecture. At 1960 GE, it features a 13% improvement in area and 9% improvement in throughput/area² over the prior smallest design. To illustrate the extent of the variations due to the use of different ASIC libraries, Quark-AES and the three analyzed designs are also synthesized using three additional technologies. Even for the same transistor size, different ASIC libraries produce significantly different area results. To accommodate a variety of applications that seek different levels of tradeoffs in area and throughput, we extend all four designs to 16-bit and 32-bit datawidths. In the low-energy work, round unrolling and glitch filtering are applied together to achieve energy savings. Round unrolling, which applies multiple block cipher rounds in a combinational path, reduces the energy due to registers but increases the glitching energy. Glitch filtering complements round unrolling by reducing the amount of glitches and their associated energy consumption. For unrolled designs of PRESENT and AES, two glitch filtering schemes are assessed. One method uses AND-gates in between combinational rounds while the other used latches. Both methods work by allowing the propagation of signals only after they have stabilized. The experiments assess how energy consumption changes with respect to the degree of unrolling, the glitch filtering scheme, the degree of pipelining, the spacing between glitch filters, and the location of glitch filters when only a limited number of them can be applied due to area constraints. While in PRESENT, the optimal configuration depends on all the variables, in a larger cipher such as AES, the latch-based method consistently offers the most energy savings

    Low-Capture-Power Test Generation for Scan-Based At-Speed Testing

    Get PDF
    Scan-based at-speed testing is a key technology to guarantee timing-related test quality in the deep submicron era. However, its applicability is being severely challenged since significant yield loss may occur from circuit malfunction due to excessive IR drop caused by high power dissipation when a test response is captured. This paper addresses this critical problem with a novel low-capture-power X-filling method of assigning 0\u27s and 1\u27s to unspecified (X) bits in a test cube obtained during ATPG. This method reduces the circuit switching activity in capture mode and can be easily incorporated into any test generation flow to achieve capture power reduction without any area, timing, or fault coverage impact. Test vectors generated with this practical method greatly improve the applicability of scan-based at-speed testing by reducing the risk of test yield lossIEEE International Conference on Test, 2005, 8 November 2005, Austin, TX, US

    Development of Low Power Image Compression Techniques

    Get PDF
    Digital camera is the main medium for digital photography. The basic operation performed by a simple digital camera is, to convert the light energy to electrical energy, then the energy is converted to digital format and a compression algorithm is used to reduce memory requirement for storing the image. This compression algorithm is frequently called for capturing and storing the images. This leads us to develop an efficient compression algorithm which will give the same result as that of the existing algorithms with low power consumption. As a result the new algorithm implemented camera can be used for capturing more images then the previous one. 1) Discrete Cosine Transform (DCT) based JPEG is an accepted standard for lossy compression of still image. Quantisation is mainly responsible for the amount loss in the image quality in the process of lossy compression. A new Energy Quantisation (EQ) method proposed for speeding up the coding and decoding procedure while preserving image qu..

    Dynamic reconfiguration frameworks for high-performance reliable real-time reconfigurable computing

    Get PDF
    The sheer hardware-based computational performance and programming flexibility offered by reconfigurable hardware like Field-Programmable Gate Arrays (FPGAs) make them attractive for computing in applications that require high performance, availability, reliability, real-time processing, and high efficiency. Fueled by fabrication process scaling, modern reconfigurable devices come with ever greater quantities of on-chip resources, allowing a more complex variety of applications to be developed. Thus, the trend is that technology giants like Microsoft, Amazon, and Baidu now embrace reconfigurable computing devices likes FPGAs to meet their critical computing needs. In addition, the capability to autonomously reprogramme these devices in the field is being exploited for reliability in application domains like aerospace, defence, military, and nuclear power stations. In such applications, real-time computing is important and is often a necessity for reliability. As such, applications and algorithms resident on these devices must be implemented with sufficient considerations for real-time processing and reliability. Often, to manage a reconfigurable hardware device as a computing platform for a multiplicity of homogenous and heterogeneous tasks, reconfigurable operating systems (ROSes) have been proposed to give a software look to hardware-based computation. The key requirements of a ROS include partitioning, task scheduling and allocation, task configuration or loading, and inter-task communication and synchronization. Existing ROSes have met these requirements to varied extents. However, they are limited in reliability, especially regarding the flexibility of placing the hardware circuits of tasks on device’s chip area, the problem arising more from the partitioning approaches used. Indeed, this problem is deeply rooted in the static nature of the on-chip inter-communication among tasks, hampering the flexibility of runtime task relocation for reliability. This thesis proposes the enabling frameworks for reliable, available, real-time, efficient, secure, and high-performance reconfigurable computing by providing techniques and mechanisms for reliable runtime reconfiguration, and dynamic inter-circuit communication and synchronization for circuits on reconfigurable hardware. This work provides task configuration infrastructures for reliable reconfigurable computing. Key features, especially reliability-enabling functionalities, which have been given little or no attention in state-of-the-art are implemented. These features include internal register read and write for device diagnosis; configuration operation abort mechanism, and tightly integrated selective-area scanning, which aims to optimize access to the device’s reconfiguration port for both task loading and error mitigation. In addition, this thesis proposes a novel reliability-aware inter-task communication framework that exploits the availability of dedicated clocking infrastructures in a typical FPGA to provide inter-task communication and synchronization. The clock buffers and networks of an FPGA use dedicated routing resources, which are distinct from the general routing resources. As such, deploying these dedicated resources for communication sidesteps the restriction of static routes and allows a better relocation of circuits for reliability purposes. For evaluation, a case study that uses a NASA/JPL spectrometer data processing application is employed to demonstrate the improved reliability brought about by the implemented configuration controller and the reliability-aware dynamic communication infrastructure. It is observed that up to 74% time saving can be achieved for selective-area error mitigation when compared to state-of-the-art vendor implementations. Moreover, an improvement in overall system reliability is observed when the proposed dynamic communication scheme is deployed in the data processing application. Finally, one area of reconfigurable computing that has received insufficient attention is security. Meanwhile, considering the nature of applications which now turn to reconfigurable computing for accelerating compute-intensive processes, a high premium is now placed on security, not only of the device but also of the applications, from loading to runtime execution. To address security concerns, a novel secure and efficient task configuration technique for task relocation is also investigated, providing configuration time savings of up to 32% or 83%, depending on the device; and resource usage savings in excess of 90% compared to state-of-the-art

    Design for pre-bond testability in 3D integrated circuits

    Get PDF
    In this dissertation we propose several DFT techniques specific to 3D stacked IC systems. The goal has explicitly been to create techniques that integrate easily with existing IC test systems. Specifically, this means utilizing scan- and wrapper-based techniques, two foundations of the digital IC test industry. First, we describe a general test architecture for 3D ICs. In this architecture, each tier of a 3D design is wrapped in test control logic that both manages tier test pre-bond and integrates the tier into the large test architecture post-bond. We describe a new kind of boundary scan to provide the necessary test control and observation of the partial circuits, and we propose a new design methodology for test hardcore that ensures both pre-bond functionality and post-bond optimality. We present the application of these techniques to the 3D-MAPS test vehicle, which has proven their effectiveness. Second, we extend these DFT techniques to circuit-partitioned designs. We find that boundary scan design is generally sufficient, but that some 3D designs require special DFT treatment. Most importantly, we demonstrate that the functional partitioning inherent in 3D design can potentially decrease the total test cost of verifying a circuit. Third, we present a new CAD algorithm for designing 3D test wrappers. This algorithm co-designs the pre-bond and post-bond wrappers to simultaneously minimize test time and routing cost. On average, our algorithm utilizes over 90% of the wires in both the pre-bond and post-bond wrappers. Finally, we look at the 3D vias themselves to develop a low-cost, high-volume pre-bond test methodology appropriate for production-level test. We describe the shorting probes methodology, wherein large test probes are used to contact multiple small 3D vias. This technique is an all-digital test method that integrates seamlessly into existing test flows. Our experimental results demonstrate two key facts: neither the large capacitance of the probe tips nor the process variation in the 3D vias and the probe tips significantly hinders the testability of the circuits. Taken together, this body of work defines a complete test methodology for testing 3D ICs pre-bond, eliminating one of the key hurdles to the commercialization of 3D technology.PhDCommittee Chair: Lee, Hsien-Hsin; Committee Member: Bakir, Muhannad; Committee Member: Lim, Sung Kyu; Committee Member: Vuduc, Richard; Committee Member: Yalamanchili, Sudhaka
    corecore