26 research outputs found

    Reliability and Data Analysis of Wearout Mechanisms for Circuits

    Get PDF
    The objective of this research is to develop methodologies for the failure analysis of circuits, as well as investigate the factors for accelerating testing for front-end-of-line time-dependent dielectric breakdown (FEOL TDDB). The separation of wearout mechanisms for circuits will be investigated, and the identification of failure modes for the failure samples will be analyzed. SRAMs and ring oscillators will be used to study the failure modes. The systematic and random errors for online monitoring of SRAMS will also be examined. Furthermore, the testing plans for acceleration testing will also be explored for ring oscillators. Error reduction through sampling will also be used to find the best testing conditions for accelerated testing. This work provides a way for engineers to better understand aging monitoring of circuits, and to design better testing to collect failure data.Ph.D

    SPRING: A Sparsity-Aware Reduced-Precision Monolithic 3D CNN Accelerator Architecture for Training and Inference

    Full text link
    CNNs outperform traditional machine learning algorithms across a wide range of applications. However, their computational complexity makes it necessary to design efficient hardware accelerators. Most CNN accelerators focus on exploring dataflow styles that exploit computational parallelism. However, potential performance speedup from sparsity has not been adequately addressed. The computation and memory footprint of CNNs can be significantly reduced if sparsity is exploited in network evaluations. To take advantage of sparsity, some accelerator designs explore sparsity encoding and evaluation on CNN accelerators. However, sparsity encoding is just performed on activation or weight and only in inference. It has been shown that activation and weight also have high sparsity levels during training. Hence, sparsity-aware computation should also be considered in training. To further improve performance and energy efficiency, some accelerators evaluate CNNs with limited precision. However, this is limited to the inference since reduced precision sacrifices network accuracy if used in training. In addition, CNN evaluation is usually memory-intensive, especially in training. In this paper, we propose SPRING, a SParsity-aware Reduced-precision Monolithic 3D CNN accelerator for trainING and inference. SPRING supports both CNN training and inference. It uses a binary mask scheme to encode sparsities in activation and weight. It uses the stochastic rounding algorithm to train CNNs with reduced precision without accuracy loss. To alleviate the memory bottleneck in CNN evaluation, especially in training, SPRING uses an efficient monolithic 3D NVM interface to increase memory bandwidth. Compared to GTX 1080 Ti, SPRING achieves 15.6X, 4.2X and 66.0X improvements in performance, power reduction, and energy efficiency, respectively, for CNN training, and 15.5X, 4.5X and 69.1X improvements for inference

    Energy efficient core designs for upcoming process technologies

    Get PDF
    Energy efficiency has been a first order constraint in the design of micro processors for the last decade. As Moore's law sunsets, new technologies are being actively explored to extend the march in increasing the computational power and efficiency. It is essential for computer architects to understand the opportunities and challenges in utilizing the upcoming process technology trends in order to design the most efficient processors. In this work, we consider three process technology trends and propose core designs that are best suited for each of the technologies. The process technologies are expected to be viable over a span of timelines. We first consider the most popular method currently available to improve the energy efficiency, i.e. by lowering the operating voltage. We make key observations regarding the limiting factors in scaling down the operating voltage for general purpose high performance processors. Later, we propose our novel core design, ScalCore, one that can work in high performance mode at nominal Vdd, and in a very energy-efficient mode at low Vdd. The resulting core design can operate at much lower voltages providing higher parallel performance while consuming lower energy. While lowering Vdd improves the energy efficiency, CMOS devices are fundamentally limited in their low voltage operation. Therefore, we next consider an upcoming device technology -- Tunneling Field-Effect Transistors (TFETs), that is expected to supplement CMOS device technology in the near future. TFETs can attain much higher energy efficiency than CMOS at low voltages. However, their performance saturates at high voltages and, therefore, cannot entirely replace CMOS when high performance is needed. Ideally, we desire a core that is as energy-efficient as TFET and provides as much performance as CMOS. To reach this goal, we characterize the TFET device behavior for core design and judiciously integrate TFET units, CMOS units in a single core. The resulting core, called HetCore, can provide very high energy efficiency while limiting the slowdown when compared to a CMOS core. Finally, we analyze Monolithic 3D (M3D) integration technology that is widely considered to be the only way to integrate more transistors on a chip. We present the first analysis of the architectural implications of using M3D for core design and show how to partition the core across different layers. We also address one of the key challenges in realizing the technology, namely, the top layer performance degradation. We propose a critical path based partitioning for logic stages and asymmetric bit/port partitioning for storage stages. The result is a core that performs nearly as well as a core without any top layer slowdown. When compared to a 2D baseline design, an M3D core not only provides much higher performance, it also reduces the energy consumption at the same time. In summary, this thesis addresses one of the fundamental challenges in computer architecture -- overcoming the fact that CMOS is not scaling anymore. As we increase the computing power on a single chip, our ability to power the entire chip keeps decreasing. This thesis proposes three solutions aimed at solving this problem over different timelines. Across all our solutions, we improve energy efficiency without compromising the performance of the core. As a result, we are able to operate twice as many cores with in the same power budget as regular cores, significantly alleviating the problem of dark silicon

    A hierarchical optimization engine for nanoelectronic systems using emerging device and interconnect technologies

    Get PDF
    A fast and efficient hierarchical optimization engine was developed to benchmark and optimize various emerging device and interconnect technologies and system-level innovations at the early design stage. As the semiconductor industry approaches sub-20nm technology nodes, both devices and interconnects are facing severe physical challenges. Many novel device and interconnect concepts and system integration techniques are proposed in the past decade to reinforce or even replace the conventional Si CMOS technology and Cu interconnects. To efficiently benchmark and optimize these emerging technologies, a validated system-level design methodology is developed based on the compact models from all hierarchies, starting from the bottom material-level, to the device- and interconnect-level, and to the top system-level models. Multiple design parameters across all hierarchies are co-optimized simultaneously to maximize the overall chip throughput instead of just the intrinsic delay or energy dissipation of the device or interconnect itself. This optimization is performed under various constraints such as the power dissipation, maximum temperature, die size area, power delivery noise, and yield. For the device benchmarking, novel graphen PN junction devices and InAs nanowire FETs are investigated for both high-performance and low-power applications. For the interconnect benchmarking, a novel local interconnect structure and hybrid Al-Cu interconnect architecture are proposed, and emerging multi-layer graphene interconnects are also investigated, and compared with the conventional Cu interconnects. For the system-level analyses, the benefits of the systems implemented with 3D integration and heterogeneous integration are analyzed. In addition, the impact of the power delivery noise and process variation for both devices and interconnects are quantified on the overall chip throughput.Ph.D

    Design considerations for a new generation of SiPMs with unprecedented timing resolution

    Full text link
    The potential of photon detectors to achieve precise timing information is of increasing importance in many domains, PET and CT scanners in medical imaging and particle physics detectors, amongst others. The goal to increase by an order of magnitude the sensitivity of PET scanners and to deliver, via time-of-flight (TOF), true space points for each event, as well as the constraints set by future particle accelerators require a further leap in time resolution of scintillator-based ionizing radiation detectors, reaching eventually a few picoseconds resolution for sub MeV energy deposits. In spite of the impressive progress made in the last decade by several manufacturers, the Single Photon Time Resolution (SPTR) of SiPMs is still in the range of 70-120ps FWHM, whereas a value of 10ps or even less would be desirable. Such a step requires a break with traditional methods and the development of novel technologies. The possibility of combining the extraordinary potential of nanophotonics with new approaches offered by modern microelectronics and 3D electronic integration opens novel perspectives for the development of a new generation of metamaterial-based SiPMs with unprecedented photodetection efficiency and timing resolution.Comment: 16 pages, 6 figures, submitted to JINS

    Characterization of self-heating effects and assessment of its impact on reliability in finfet technology

    Get PDF
    The systematically growing power (heat) dissipation in CMOS transistors with each successive technology node is reaching levels which could impact its reliable operation. The emergence of technologies such as bulk/SOI FinFETs has dramatically confined the heat in the device channel due to its vertical geometry and it is expected to further exacerbate with gate-all-around transistors. This work studies heat generation in the channel of semiconductor devices and measures its dissipation by means of wafer level characterization and predictive thermal simulation. The experimental work is based on several existing device thermometry techniques to which additional layout improvements are made in state of the art bulk FinFET and SOI FinFET 14nm technology nodes. The sensors produce excellent matching results which are confirmed through TCAD thermal simulation, differences between sensor types are quantified and error bars on measurements are established. The lateral heat transport measurements determine that heat from the source is mostly dissipated at a distance of 1µm and 1.5µm in bulk FinFET and SOI FinFET, respectively. Heat additivity is successfully confirmed to prove and highlight the fact that the whole system needs to be considered when performing thermal analysis. Furthermore, an investigation is devoted to study self-heating with different layout densities by varying the number of fins and fingers per active region (RX). Fin thermal resistance is measured at different ambient temperatures to show its variation of up to 70% between -40°C to 175°C. Therefore, the Si fin has a more dominant effect in heat transport and its varying thermal conductivity should be taken into account. The effect of ambient temperature on self-heating measurement is confirmed by supplying heat through thermal chuck and adjacent heater devices themselves. Motivation for this work is the continuous evolution of the transistor geometry and use of exotic materials, which in the recent technology nodes made heat removal more challenging. This poses reliability and performance concerns. Therefore, this work studies the impact of self-heating on reliability testing at DC conditions as well as realistic CMOS logic operating (AC) conditions. Front-end-of-line (FEOL) reliability mechanisms, such as hot carrier injection (HCI) and non-uniform time dependent dielectric breakdown (TDDB), are studied to show that self-heating effects can impact measurement results and recommendations are given on how to mitigate them. By performing an HCI stress at moderate bias conditions, this dissertation shows that the laborious techniques of heat subtraction are no longer necessary. Self-heating is also studied at more realistic device switching conditions by utilizing ring oscillators with several densities and stage counts to show that self-heating is considerably lower compared to constant voltage stress conditions and degradation is not distinguishable

    Static random-access memory designs based on different FinFET at lower technology node (7nm)

    Get PDF
    Title from PDF of title page viewed January 15, 2020Thesis advisor: Masud H ChowdhuryVitaIncludes bibliographical references (page 50-57)Thesis (M.S.)--School of Computing and Engineering. University of Missouri--Kansas City, 2019The Static Random-Access Memory (SRAM) has a significant performance impact on current nanoelectronics systems. To improve SRAM efficiency, it is important to utilize emerging technologies to overcome short-channel effects (SCE) of conventional CMOS. FinFET devices are promising emerging devices that can be utilized to improve the performance of SRAM designs at lower technology nodes. In this thesis, I present detail analysis of SRAM cells using different types of FinFET devices at 7nm technology. From the analysis, it can be concluded that the performance of both 6T and 8T SRAM designs are improved. 6T SRAM achieves a 44.97% improvement in the read energy compared to 8T SRAM. However, 6T SRAM write energy degraded by 3.16% compared to 8T SRAM. Read stability and write ability of SRAM cells are determined using Static Noise Margin and N- curve methods. Moreover, Monte Carlo simulations are performed on the SRAM cells to evaluate process variations. Simulations were done in HSPICE using 7nm Asymmetrical Underlap FinFET technology. The quasiplanar FinFET structure gained considerable attention because of the ease of the fabrication process [1] – [4]. Scaling of technology have degraded the performance of CMOS designs because of the short channel effects (SCEs) [5], [6]. Therefore, there has been upsurge in demand for FinFET devices for emerging market segments including artificial intelligence and cloud computing (AI) [8], [9], Internet of Things (IoT) [10] – [13] and biomedical [17] –[18] which have their own exclusive style of design. In recent years, many Underlapped FinFET devices were proposed to have better control of the SCEs in the sub-nanometer technologies [3], [4], [19] – [33]. Underlap on either side of the gate increases effective channel length as seen by the charge carriers. Consequently, the source-to-drain tunneling probability is improved. Moreover, edge direct tunneling leakage components can be reduced by controlling the electric field at the gate-drain junction . There is a limitation on the extent of underlap on drain or source sides because the ION is lower for larger underlap. Additionally, FinFET based designs have major width quantization issue. The width of a FinFET device increases only in quanta of silicon fin height (HFIN) [4]. The width quantization issue becomes critical for ratioed designs like SRAMs, where proper sizing of the transistors is essential for fault-free operation. FinFETs based on Design/Technology Co-Optimization (DTCO_F) approach can overcome these issues [38]. DTCO_F follows special design rules, which provides the specifications for the standard SRAM cells with special spacing rules and low leakages. The performances of 6T SRAM designs implemented by different FinFET devices are compared for different pull-up, pull down and pass gate transistor (PU: PD:PG) ratios to identify the best FinFET device for high speed and low power SRAM applications. Underlapped FinFETs (UF) and Design/Technology Co-Optimized FinFETs (DTCO_F) are used for the design and analysis. It is observed that with the PU: PD:PG ratios of 1:1:1 and 1:5:2 for the UF-SRAMs the read energy has degraded by 3.31% and 48.72% compared to the DTCO_F-SRAMs, respectively. However, the read energy with 2:5:2 ratio has improved by 32.71% in the UF-SRAM compared to the DTCO_F-SRAMs. The write energy with 1:1:1 configuration has improved by 642.27% in the UF-SRAM compared to the DTCO_F-SRAM. On the other hand, the write energy with 1:5:2 and 2:5:2 configurations have degraded by 86.26% and 96% in the UF-SRAMs compared to the DTCO_F-SRAMs. The stability and reliability of different SRAMs are also evaluated for 500mV supply. From the analysis, it can be concluded that Asymmetrical Underlapped FinFET is better for high-speed applications and DTCO FinFET for low power applications.Introduction -- Next generation high performance device: FinFET -- FinFET based SRAM bitcell designs -- Benchmarking of UF-SRAMs and DTCO-F-SRAMS -- Collaborative project -- Internship experience at INTEL and Marvell Semiconductor -- Conclusion and future wor

    Integrated Circuits Parasitic Capacitance Extraction Using Machine Learning and its Application to Layout Optimization

    Get PDF
    The impact of parasitic elements on the overall circuit performance keeps increasing from one technology generation to the next. In advanced process nodes, the parasitic effects dominate the overall circuit performance. As a result, the accuracy requirements of parasitic extraction processes significantly increased, especially for parasitic capacitance extraction. Existing parasitic capacitance extraction tools face many challenges to cope with such new accuracy requirements that are set by semiconductor foundries (\u3c 5% error). Although field-solver methods can meet such requirements, they are very slow and have a limited capacity. The other alternative is the rule-based parasitic capacitance extraction methods, which are faster and have a high capacity; however, they cannot consistently provide good accuracy as they use a pre-characterized library of capacitance formulas that cover a limited number of layout patterns. On the other hand, the new parasitic extraction accuracy requirements also added more challenges on existing parasitic-aware routing optimization methods, where simplified parasitic models are used to optimize layouts. This dissertation provides new solutions for interconnect parasitic capacitance extraction and parasitic-aware routing optimization methodologies in order to cope with the new accuracy requirements of advanced process nodes as follows. First, machine learning compact models are developed in rule-based extractors to predict parasitic capacitances of cross-section layout patterns efficiently. The developed models mitigate the problems of the pre-characterized library approach, where each compact model is designed to extract parasitic capacitances of cross-sections of arbitrary distributed metal polygons that belong to a specific set of metal layers (i.e., layer combination) efficiently. Therefore, the number of covered layout patterns significantly increased. Second, machine learning compact models are developed to predict parasitic capacitances of middle-end-of-line (MEOL) layers around FINFETs and MOSFETs. Each compact model extracts parasitic capacitances of 3D MEOL patterns of a specific device type regardless of its metal polygons distribution. Therefore, the developed MEOL models can replace field-solvers in extracting MEOL patterns. Third, a novel accuracy-based hybrid parasitic capacitance extraction method is developed. The proposed hybrid flow divides a layout into windows and extracts the parasitic capacitances of each window using one of three parasitic capacitance extraction methods that include: 1) rule-based; 2) novel deep-neural-networks-based; and 3) field-solver methods. This hybrid methodology uses neural-networks classifiers to determine an appropriate extraction method for each window. Moreover, as an intermediate parasitic capacitance extraction method between rule-based and field-solver methods, a novel deep-neural-networks-based extraction method is developed. This intermediate level of accuracy and speed is needed since using only rule-based and field-solver methods (for hybrid extraction) results in using field-solver most of the time for any required high accuracy extraction. Eventually, a parasitic-aware layout routing optimization and analysis methodology is implemented based on an incremental parasitic extraction and a fast optimization methodology. Unlike existing flows that do not provide a mechanism to analyze the impact of modifying layout geometries on a circuit performance, the proposed methodology provides novel sensitivity circuit models to analyze the integrity of signals in layout routes. Such circuit models are based on an accurate matrix circuit representation, a cost function, and an accurate parasitic sensitivity extraction. The circuit models identify critical parasitic elements along with the corresponding layout geometries in a certain route, where they measure the sensitivity of a route’s performance to corresponding layout geometries very fast. Moreover, the proposed methodology uses a nonlinear programming technique to optimize problematic routes with pre-determined degrees of freedom using the proposed circuit models. Furthermore, it uses a novel incremental parasitic extraction method to extract parasitic elements of modified geometries efficiently, where the incremental extraction is used as a part of the routing optimization process to improve the optimization runtime and increase the optimization accuracy
    corecore