49 research outputs found

    Exceeding Conservative Limits: A Consolidated Analysis on Modern Hardware Margins

    Get PDF
    Modern large-scale computing systems (data centers, supercomputers, cloud and edge setups and high-end cyber-physical systems) employ heterogeneous architectures that consist of multicore CPUs, general-purpose many-core GPUs, and programmable FPGAs. The effective utilization of these architectures poses several challenges, among which a primary one is power consumption. Voltage reduction is one of the most efficient methods to reduce power consumption of a chip. With the galloping adoption of hardware accelerators (i.e., GPUs and FPGAs) in large datacenters and other large-scale computing infrastructures, a comprehensive evaluation of the safe voltage reduction levels for each different chip can be employed for efficient reduction of the total power. We present a survey of recent studies in voltage margins reduction at the system level for modern CPUs, GPUs and FPGAs. The pessimistic voltage guardbands inserted by the silicon vendors can be exploited in all devices for significant power savings. On average, voltage reduction can reach 12% in multicore CPUs, 20% in manycore GPUs and 39% in FPGAs.Comment: Accepted for publication in IEEE Transactions on Device and Materials Reliabilit

    Cross-Layer Resiliency Modeling and Optimization: A Device to Circuit Approach

    Get PDF
    The never ending demand for higher performance and lower power consumption pushes the VLSI industry to further scale the technology down. However, further downscaling of technology at nano-scale leads to major challenges. Reduced reliability is one of them, arising from multiple sources e.g. runtime variations, process variation, and transient errors. The objective of this thesis is to tackle unreliability with a cross layer approach from device up to circuit level

    Semiconductor Memory Applications in Radiation Environment, Hardware Security and Machine Learning System

    Get PDF
    abstract: Semiconductor memory is a key component of the computing systems. Beyond the conventional memory and data storage applications, in this dissertation, both mainstream and eNVM memory technologies are explored for radiation environment, hardware security system and machine learning applications. In the radiation environment, e.g. aerospace, the memory devices face different energetic particles. The strike of these energetic particles can generate electron-hole pairs (directly or indirectly) as they pass through the semiconductor device, resulting in photo-induced current, and may change the memory state. First, the trend of radiation effects of the mainstream memory technologies with technology node scaling is reviewed. Then, single event effects of the oxide based resistive switching random memory (RRAM), one of eNVM technologies, is investigated from the circuit-level to the system level. Physical Unclonable Function (PUF) has been widely investigated as a promising hardware security primitive, which employs the inherent randomness in a physical system (e.g. the intrinsic semiconductor manufacturing variability). In the dissertation, two RRAM-based PUF implementations are proposed for cryptographic key generation (weak PUF) and device authentication (strong PUF), respectively. The performance of the RRAM PUFs are evaluated with experiment and simulation. The impact of non-ideal circuit effects on the performance of the PUFs is also investigated and optimization strategies are proposed to solve the non-ideal effects. Besides, the security resistance against modeling and machine learning attacks is analyzed as well. Deep neural networks (DNNs) have shown remarkable improvements in various intelligent applications such as image classification, speech classification and object localization and detection. Increasing efforts have been devoted to develop hardware accelerators. In this dissertation, two types of compute-in-memory (CIM) based hardware accelerator designs with SRAM and eNVM technologies are proposed for two binary neural networks, i.e. hybrid BNN (HBNN) and XNOR-BNN, respectively, which are explored for the hardware resource-limited platforms, e.g. edge devices.. These designs feature with high the throughput, scalability, low latency and high energy efficiency. Finally, we have successfully taped-out and validated the proposed designs with SRAM technology in TSMC 65 nm. Overall, this dissertation paves the paths for memory technologies’ new applications towards the secure and energy-efficient artificial intelligence system.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    Techniques for Aging, Soft Errors and Temperature to Increase the Reliability of Embedded On-Chip Systems

    Get PDF
    This thesis investigates the challenge of providing an abstracted, yet sufficiently accurate reliability estimation for embedded on-chip systems. In addition, it also proposes new techniques to increase the reliability of register files within processors against aging effects and soft errors. It also introduces a novel thermal measurement setup that perspicuously captures the infrared images of modern multi-core processors

    Body of Knowledge for Graphics Processing Units (GPUs)

    Get PDF
    Graphics Processing Units (GPU) have emerged as a proven technology that enables high performance computing and parallel processing in a small form factor. GPUs enhance the traditional computer paradigm by permitting acceleration of complex mathematics and providing the capability to perform weighted calculations, such as those in artificial intelligence systems. Despite the performance enhancements provided by this type of microprocessor, there exist tradeoffs in regards to reliability and radiation susceptibility, which may impact mission success. This report provides an insight into GPU architecture and its potential applications in space and other similar markets. It also discusses reliability, qualification, and radiation considerations for testing GPUs

    Statistical modelling of nano CMOS transistors with surface potential compact model PSP

    Get PDF
    The development of a statistical compact model strategy for nano-scale CMOS transistors is presented in this thesis. Statistical variability which arises from the discreteness of charge and granularity of matter plays an important role in scaling of nano CMOS transistors especially in sub 50nm technology nodes. In order to achieve reasonable performance and yield in contemporary CMOS designs, the statistical variability that affects the circuit/system performance and yield must be accurately represented by the industry standard compact models. As a starting point, predictive 3D simulation of an ensemble of 1000 microscopically different 35nm gate length transistors is carried out to characterize the impact of statistical variability on the device characteristics. PSP, an advanced surface potential compact model that is selected as the next generation industry standard compact model, is targeted in this study. There are two challenges in development of a statistical compact model strategy. The first challenge is related to the selection of a small subset of statistical compact model parameters from the large number of compact model parameters. We propose a strategy to select 7 parameters from PSP to capture the impact of statistical variability on current-voltage characteristics. These 7 parameters are used in statistical parameter extraction with an average RMS error of less than 2.5% crossing the whole operation region of the simulated transistors. Moreover, the accuracy of statistical compact model extraction strategy in reproducing the MOSFET electrical figures of merit is studied in detail. The results of the statistical compact model extraction are used for statistical circuit simulation of a CMOS inverter under different input-output conditions and different number of statistical parameters. The second challenge in the development of statistical compact model strategy is associated with statistical generation of parameters preserving the distribution and correlation of the directly extracted parameters. By using advanced statistical methods such as principal component analysis and nonlinear power method, the accuracy of parameter generation is evaluated and compared to directly extracted parameter sets. Finally, an extension of the PSP statistical compact model strategy to different channel width/length devices is presented. The statistical trends of parameters and figures of merit versus channel width/length are characterized

    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    Get PDF
    In this dissertation new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In these framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF) along with considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design

    A statistical study of time dependent reliability degradation of nanoscale MOSFET devices

    Get PDF
    Charge trapping at the channel interface is a fundamental issue that adversely affects the reliability of metal-oxide semiconductor field effect transistor (MOSFET) devices. This effect represents a new source of statistical variability as these devices enter the nano-scale era. Recently, charge trapping has been identified as the dominant phenomenon leading to both random telegraph noise (RTN) and bias temperature instabilities (BTI). Thus, understanding the interplay between reliability and statistical variability in scaled transistors is essential to the implementation of a ‘reliability-aware’ complementary metal oxide semiconductor (CMOS) circuit design. In order to investigate statistical reliability issues, a methodology based on a simulation flow has been developed in this thesis that allows a comprehensive and multi-scale study of charge-trapping phenomena and their impact on transistor and circuit performance. The proposed methodology is accomplished by using the Gold Standard Simulations (GSS) technology computer-aided design (TCAD)-based design tool chain co-optimization (DTCO) tool chain. The 70 nm bulk IMEC MOSFET and the 22 nm Intel fin-shape field effect transistor (FinFET) have been selected as targeted devices. The simulation flow starts by calibrating the device TCAD simulation decks against experimental measurements. This initial phase allows the identification of the physical structure and the doping distributions in the vertical and lateral directions based on the modulation in the inversion layer’s depth as well as the modulation of short channel effects. The calibration is further refined by taking into account statistical variability to match the statistical distributions of the transistors’ figures of merit obtained by measurements. The TCAD simulation investigation of RTN and BTI phenomena is then carried out in the presence of several sources of statistical variability. The study extends further to circuit simulation level by extracting compact models from the statistical TCAD simulation results. These compact models are collected in libraries, which are then utilised to investigate the impact of the BTI phenomenon, and its interaction with statistical variability, in a six transistor-static random access memory (6T-SRAM) cell. At the circuit level figures of merit, such as the static noise margin (SNM), and their statistical distributions are evaluated. The focus of this thesis is to highlight the importance of accounting for the interaction between statistical variability and statistical reliability in the simulation of advanced CMOS devices and circuits, in order to maintain predictivity and obtain a quantitative agreement with a measured data. The main findings of this thesis can be summarised by the following points: Based on the analysis of the results, the dispersions of VT and ΔVT indicate that a change in device technology must be considered, from the planar MOSFET platform to a new device architecture such as FinFET or SOI. This result is due to the interplay between a single trap charge and statistical variability, which has a significant impact on device operation and intrinsic parameters as transistor dimensions shrink further. The ageing process of transistors can be captured by using the trapped charge density at the interface and observing the VT shift. Moreover, using statistical analysis one can highlight the extreme transistors and their probable effect on the circuit or system operation. The influence of the passgate (PG) transistor in a 6T-SRAM cell gives a different trend of the mean static noise margin

    Voltage stacking for near/sub-threshold operation

    Get PDF
    corecore