30 research outputs found

    Doctor of Philosophy

    Get PDF
    dissertationAsynchronous design has a very promising potential even though it has largely received a cold reception from industry. Part of this reluctance has been due to the necessity of custom design languages and computer aided design (CAD) flows to design, optimize, and validate asynchronous modules and systems. Next generation asynchronous flows should support modern programming languages (e.g., Verilog) and application specific integrated circuits (ASIC) CAD tools. They also have to support multifrequency designs with mixed synchronous (clocked) and asynchronous (unclocked) designs. This work presents a novel relative timing (RT) based methodology for generating multifrequency designs using synchronous CAD tools and flows. Synchronous CAD tools must be constrained for them to work with asynchronous circuits. Identification of these constraints and characterization flow to automatically derive the constraints is presented. The effect of the constraints on the designs and the way they are handled by the synchronous CAD tools are analyzed and reported in this work. The automation of the generation of asynchronous design templates and also the constraint generation is an important problem. Algorithms for automation of reset addition to asynchronous circuits and power and/or performance optimizations applied to the circuits using logical effort are explored thus filling an important hole in the automation flow. Constraints representing cyclic asynchronous circuits as directed acyclic graphs (DAGs) to the CAD tools is necessary for applying synchronous CAD optimizations like sizing, path delay optimizations and also using static timing analysis (STA) on these circuits. A thorough investigation for the requirements of cycle cutting while preserving timing paths is presented with an algorithm to automate the process of generating them. A large set of designs for 4 phase handshake protocol circuit implementations with early and late data validity are characterized for area, power and performance. Benchmark circuits with automated scripts to generate various configurations for better understanding of the designs are proposed and analyzed. Extension to the methodology like addition of scan insertion using automatic test pattern generation (ATPG) tools to add testability of datapath in bundled data asynchronous circuit implementations and timing closure approaches are also described. Energy, area, and performance of purely asynchronous circuits and circuits with mixed synchronous and asynchronous blocks are explored. Results indicate the benefits that can be derived by generating circuits with asynchronous components using this methodology

    Fault-Independent Test-Generation for Software-Based Self-Testing

    Get PDF
    Software-based self-test (SBST) is being widely used in both manufacturing and in-the-field testing of processor-based devices and Systems-on-Chips. Unfortunately, the stuck-at fault model is increasingly inadequate to match the new and different types of defects in the most recent semiconductor technologies, while the explicit and separate targeting of every fault model in SBST is cumbersome due to the high complexity of the test-generation process, the lack of automation tools, and the high CPU-intensity of the fault-simulation process. Moreover, defects in advanced semiconductor technologies are not always covered by the most commonly used fault-models, and the probability of defect-escapes increases even more. To overcome these shortcomings we propose the first fault-independent method for generating software-based self-test procedures. The proposed method is almost fully automated, it offers high coverage of non- modeled faults by means of a novel SBST-oriented probabilistic metric, and it is very fast as it omits the time-consuming test- generation/fault-simulation processes. Extensive experiments on the OpenRISC OR1200 processor show the advantages of the proposed method

    Test set generation and optimisation using evolutionary algorithms and cubical calculus.

    Get PDF
    As the complexity of modern day integrated circuits rises, many of the challenges associated with digital testing rise exponentially. VLSI technology continues to advance at a rapid pace, in accordance with Moore's Law, posing evermore complex, NP-complete problems for the test community. The testing of ICs currently accounts for approximately a third of the overall design costs and according to the Semiconductor Industry Association, the per-transistor test cost will soon exceed the per-transistor production cost. Given the need to test ICs of ever-increasing complexity and to contain the cost of test, the problems of test pattern generation, testability analysis and test set minimisation continue to provide formidable challenges for the research community. This thesis presents original work in these three areas. Firstly, a new method is presented for generating test patterns for multiple output combinational circuits based on the Boolean difference method and cubical calculus. The Boolean difference method has been largely overlooked in automatic test pattern generation algorithms due to its cumbersome, algebraic nature. It is shown that cubical calculus provides an elegant and economical technique for solving Boolean difference equations. Formal mathematical techniques are presented involving the Boolean difference and cubical calculus providing, a test pattern generation method that dispenses with the need for costly circuit simulations. The methods provide the basis for test generation algorithms which are suitable for computer implementation. Secondly, some of the core test pattern generation computations outlined above also provide the basis of a new method for computing testability measures such as controllability and observability. This method is effectively a very economical spin-off of the test pattern generation process using Boolean differences and cubical calculus.The third and largest part of this thesis introduces a new test set minimization algorithm, GA-MITS, based on an evolutionary optimization algorithm. This novel approach applies a genetic algorithm to find minimal or near minimal test sets while maintaining a given fault coverage. The algorithm is designed as a postprocessor to minimise test sets that have been previously generated by an ATPG system and is thus considered a static approach to the test set minimisation problem. It is shown empirically that GA-MITS is remarkably successful in minimizing test sets generated for the ISCAS-85 benchmark circuits and hence potentially capable of reducing the production costs of realistic digital circuits

    Rethinking Watermark: Providing Proof of IP Ownership in Modern SoCs

    Get PDF
    Intellectual property (IP) cores are essential to creating modern system-on-chips (SoCs). Protecting the IPs deployed in modern SoCs has become more difficult as the IP houses have been established across the globe over the past three decades. The threat posed by IP piracy and overuse has been a topic of research for the past decade or so and has led to creation of a field called watermarking. IP watermarking aims of detecting unauthorized IP usage by embedding excess, nonfunctional circuitry into the SoC. Unfortunately, prior work has been built upon assumptions that cannot be met within the modern SoC design and verification processes. In this paper, we first provide an extensive overview of the current state-of-the-art IP watermarking. Then, we challenge these dated assumptions and propose a new path for future effective IP watermarking approaches suitable for today\u27s complex SoCs in which IPs are deeply embedded

    Methodology and Ecosystem for the Design of a Complex Network ASIC

    Full text link
    Performance of HPC systems has risen steadily. While the 10 Petaflop/s barrier has been breached in the year 2011 the next large step into the exascale era is expected sometime between the years 2018 and 2020. The EXTOLL project will be an integral part in this venture. Originally designed as a research project on FPGA basis it will make the transition to an ASIC to improve its already excelling performance even further. This transition poses many challenges that will be presented in this thesis. Nowadays, it is not enough to look only at single components in a system. EXTOLL is part of complex ecosystem which must be optimized overall since everything is tightly interwoven and disregarding some aspects can cause the whole system either to work with limited performance or even to fail. This thesis examines four different aspects in the design hierarchy and proposes efficient solutions or improvements for each of them. At first it takes a look at the design implementation and the differences between FPGA and ASIC design. It introduces a methodology to equip all on-chip memory with ECC logic automatically without the user’s input and in a transparent way so that the underlying code that uses the memory does not have to be changed. In the next step the floorplanning process is analyzed and an iterative solution is worked out based on physical and logical constraints of the EXTOLL design. Besides, a work flow for collaborative design is presented that allows multiple users to work on the design concurrently. The third part concentrates on the high-speed signal path from the chip to the connector and how it is affected by technological limitations. All constraints are analyzed and a package layout for the EXTOLL chip is proposed that is seen as the optimal solution. The last part develops a cost model for wafer and package level test and raises technological concerns that will affect the testing methodology. In order to run testing internally it proposes the development of a stand-alone test platform that is able to test packaged EXTOLL chips in every aspect

    Approximate computing: An integrated cross-layer framework

    Get PDF
    A new design approach, called approximate computing (AxC), leverages the flexibility provided by intrinsic application resilience to realize hardware or software implementations that are more efficient in energy or performance. Approximate computing techniques forsake exact (numerical or Boolean) equivalence in the execution of some of the application’s computations, while ensuring that the output quality is acceptable. While early efforts in approximate computing have demonstrated great potential, they consist of ad hoc techniques applied to a very narrow set of applications, leaving in question the applicability of approximate computing in a broader context. The primary objective of this thesis is to develop an integrated cross-layer approach to approximate computing, and to thereby establish its applicability to a broader range of applications. The proposed framework comprises of three key components: (i) At the circuit level, systematic approaches to design approximate circuits, or circuits that realize a slightly modified function with improved efficiency, (ii) At the architecture level, utilize approximate circuits to build programmable approximate processors, and (iii) At the software level, methods to apply approximate computing to machine learning classifiers, which represent an important class of applications that are being utilized across the computing spectrum. Towards this end, the thesis extends the state-of-the-art in approximate computing in the following important directions. Synthesis of Approximate Circuits: First, the thesis proposes a rigorous framework for the automatic synthesis of approximate circuits , which are the hardware building blocks of approximate computing platforms. Designing approximate circuits involves making judicious changes to the function implemented by the circuit such that its hardware complexity is lowered without violating the specified quality constraint. Inspired by classical approaches to Boolean optimization in logic synthesis, the thesis proposes two synthesis tools called SALSA and SASIMI that are general, i.e., applicable to any given circuit and quality specification. The framework is further extended to automatically design quality configurable circuits , which are approximate circuits with the capability to reconfigure their quality at runtime. Over a wide range of arithmetic circuits, complex modules and complete datapaths, the circuits synthesized using the proposed framework demonstrate significant benefits in area and energy. Programmable AxC Processors: Next, the thesis extends approximate computing to the realm of programmable processors by introducing the concept of quality programmable processors (QPPs). A key principle of QPPs is that the notion of quality is explicitly codified in their HW/SW interface i.e., the instruction set. Instructions in the ISA are extended with quality fields, enabling software to specify the accuracy level that must be met during their execution. The micro-architecture is designed with hardware mechanisms to understand these quality specifications and translate them into energy savings. As a first embodiment of QPPs, the thesis presents a quality programmable 1D/2D vector processor QP-Vec, which contains a 3-tiered hierarchy of processing elements. Based on an implementation of QP-Vec with 289 processing elements, energy benefits up to 2.5X are demonstrated across a wide range of applications. Software and Algorithms for AxC: Finally, the thesis addresses the problem of applying approximate computing to an important class of applications viz. machine learning classifiers such as deep learning networks. To this end, the thesis proposes two approaches—AxNN and scalable effort classifiers. Both approaches leverage domain- specific insights to transform a given application to an energy-efficient approximate version that meets a specified application output quality. In the context of deep learning networks, AxNN adapts backpropagation to identify neurons that contribute less significantly to the network’s accuracy, approximating these neurons (e.g., by using lower precision), and incrementally re-training the network to mitigate the impact of approximations on output quality. On the other hand, scalable effort classifiers leverage the heterogeneity in the inherent classification difficulty of inputs to dynamically modulate the effort expended by machine learning classifiers. This is achieved by building a chain of classifiers of progressively growing complexity (and accuracy) such that the number of stages used for classification scale with input difficulty. Scalable effort classifiers yield substantial energy benefits as a majority of the inputs require very low effort in real-world datasets. In summary, the concepts and techniques presented in this thesis broaden the applicability of approximate computing, thus taking a significant step towards bringing approximate computing to the mainstream. (Abstract shortened by ProQuest.

    Divided We Stand, United We Fall: Security Analysis of Some SCA+SIFA Countermeasures Against SCA-Enhanced Fault Template Attacks

    Get PDF
    Protection against Side-Channel (SCA) and Fault Attacks (FA) requires two classes of countermeasures to be simultaneously embedded in a cryptographic implementation. It has already been shown that a straightforward combination of SCA and FA countermeasures are vulnerable against FAs, such as Statistical Ineffective Fault Analysis (SIFA) and Fault Template Attacks (FTA). Consequently, new classes of countermeasures have been proposed which prevent against SIFA, and also includes masking for SCA protection. While they are secure against SIFA and SCA individually, one important question is whether the security claim still holds at the presence of a combined SCA and FA adversary. Security against combined attacks is, however, desired, as countermeasures for both threats are included in such implementations. In this paper, we show that some of the recently proposed combined SIFA and SCA countermeasures fall prey against combined attacks. To this end, we enhance the FTA attacks by considering side-channel information during fault injection. The success of the proposed attacks stems from some non-trivial fault propagation properties of S-Boxes, which remains unexplored in the original FTA proposal. The proposed attacks are validated on an open-source software implementation of Keccak with SIFA-protected χ5 S-Box with laser fault injection and power measurement, and a hardware implementation of a SIFA-protected χ3 S-Box through gate-level power trace simulation. Finally, we discuss some mitigation strategies to strengthen existing countermeasures

    Digital Centric Multi-Gigabit SerDes Design and Verification

    Get PDF
    Advances in semiconductor manufacturing still lead to ever decreasing feature sizes and constantly allow higher degrees of integration in application specific integrated circuits (ASICs). Therefore the bandwidth requirements on the external interfaces of such systems on chips (SoC) are steadily growing. Yet, as the number of pins on these ASICs is not increasing in the same pace - known as pin limitation - the bandwidth per pin has to be increased. SerDes (Serializer/Deserializer) technology, which allows to transfer data serially at very high data rates of 25Gbps and more is a key technology to overcome pin limitation and exploit the computing power that can be achieved in todays SoCs. As such SerDes blocks together with the digital logic interfacing them form complex mixed signal systems, verification of performance and functional correctness is very challenging. In this thesis a novel mixed-signal design methodology is proposed, which tightly couples model and implementation in order to ensure consistency throughout the design cycles and hereby accelerate the overall implementation flow. A tool flow that has been developed is presented, which integrates well into state of the art electronic design automation (EDA) environments and enables the usage of this methodology in practice. Further, the design space of todays high-speed serial links is analyzed and an architecture is proposed, which pushes complexity into the digital domain in order to achieve robustness, portability between manufacturing processes and scaling with advanced node technologies. The all digital phase locked loop (PLL) and clock data recovery (CDR), which have been developed are described in detail. The developed design flow was used for the implementation of the SerDes architecture in a 28nm silicon process and proved to be indispensable for future projects
    corecore