38 research outputs found

    Design methodology and productivity improvement in high speed VLSI circuits

    Get PDF
    2017 Spring.Includes bibliographical references.To view the abstract, please see the full text of the document

    Circuits and Systems Advances in Near Threshold Computing

    Get PDF
    Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. However, global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shift to the edge of the network, near-threshold computing (NTC) is emerging as one of the promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing the energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widescale commercial adoption. This is because circuits and systems operating with NTC suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities, to name a few. To realize its potential, we need designs, techniques, and solutions to overcome these challenges associated with NTC circuits and systems. The readers of this book will be able to familiarize themselves with recent advances in electronics systems, focusing on near-threshold computing

    FDSOI Design using Automated Standard-Cell-Grained Body Biasing

    Get PDF
    With the introduction of FDSOI processes at competitive technology nodes, body biasing on an unprecedented scale was made possible. Body biasing influences one of the central transistor characteristics, the threshold voltage. By being able to heighten or lower threshold voltage by more than 100mV, the very physics of transistor switching can be manipulated at run time. Furthermore, as body biasing does not lead to different signal levels, it can be applied much more fine-grained than, e.g., DVFS. With the state of the art mainly focused on combinations of body biasing with DVFS, it has thus ignored granularities unfeasible for DVFS. This thesis fills this gap by proposing body bias domain partitioning techniques and for body bias domain partitionings thereby generated, algorithms that search for body bias assignments. Several different granularities ranging from entire cores to small groups of standard cells were examined using two principal approaches: Designer aided pre-partitioning based determination of body bias domains and a first-time, fully automatized, netlist based approach called domain candidate exploration. Both approaches operate along the lines of activation and timing of standard cell groups. These approaches were evaluated using the example of a Dynamically Reconfigurable Processor (DRP), a highly efficient category of reconfigurable architectures which consists of an array of processing elements and thus offers many opportunities for generalization towards many-core architectures. Finally, the proposed methods were validated by manufacturing a test-chip. Extensive simulation runs as well as the test-chip evaluation showed the validity of the proposed methods and indicated substantial improvements in energy efficiency compared to the state of the art. These improvements were accomplished by the fine-grained partitioning of the DRP design. This method allowed reducing dynamic power through supply voltage levels yielding higher clock frequencies using forward body biasing, while simultaneously reducing static power consumption in unused parts.Die Einführung von FDSOI Prozessen in gegenwärtigen Prozessgrößen ermöglichte die Nutzung von Substratvorspannung in nie zuvor dagewesenem Umfang. Substratvorspannung beeinflusst unter anderem eine zentrale Eigenschaft von Transistoren, die Schwellspannung. Mittels Substratvorspannung kann diese um mehr als 100mV erhöht oder gesenkt werden, was es ermöglicht, die schiere Physik des Schaltvorgangs zu manipulieren. Da weiterhin hiervon der Signalpegel der digitalen Signale unberührt bleibt, kann diese Technik auch in feineren Granularitäten angewendet werden, als z.B. Dynamische Spannungs- und Frequenz Anpassung (Engl. Dynamic Voltage and Frequency Scaling, Abk. DVFS). Da jedoch der Stand der Technik Substratvorspannung hauptsächlich in Kombinationen mit DVFS anwendet, werden feinere Granularitäten, welche für DVFS nicht mehr wirtschaftlich realisierbar sind, nicht berücksichtigt. Die vorliegende Arbeit schließt diese Lücke, indem sie Partitionierungsalgorithmen zur Unterteilung eines Entwurfs in Substratvorspannungsdomänen vorschlägt und für diese hierdurch unterteilten Domänen entsprechende Substratvorspannungen berechnet. Hierzu wurden verschiedene Granularitäten berücksichtigt, von ganzen Prozessorkernen bis hin zu kleinen Gruppen von Standardzellen. Diese Entwürfe wurden dann mit zwei verschiedenen Herangehensweisen unterteilt: Chipdesigner unterstützte, vorpartitionierungsbasierte Bestimmung von Substratvorspannungsdomänen, sowie ein erstmals vollautomatisierter, Netzlisten basierter Ansatz, in dieser Arbeit Domänen Kandidaten Exploration genannt. Beide Ansätze funktionieren nach dem Prinzip der Aktivierung, d.h. zu welchem Zeitpunkt welcher Teil des Entwurfs aktiv ist, sowie der Signallaufzeit durch die entsprechenden Entwurfsteile. Diese Ansätze wurden anhand des Beispiels Dynamisch Rekonfigurierbarer Prozessoren (DRP) evaluiert. DRPs stellen eine Klasse hocheffizienter rekonfigurierbarer Architekturen dar, welche hauptsächlich aus einem Feld von Rechenelementen besteht und dadurch auch zahlreiche Möglichkeiten zur Verallgemeinerung hinsichtlich Many-Core Architekturen zulässt. Schließlich wurden die vorgeschlagenen Methoden in einem Testchip validiert. Alle ermittelten Ergebnisse zeigen im Vergleich zum Stand der Technik drastische Verbesserungen der Energieeffizienz, welche durch die feingranulare Unterteilung in Substratvorspannungsdomänen erzielt wurde. Hierdurch konnten durch die Anwendung von Substratvorspannung höhere Taktfrequenzen bei gleicher Versorgungsspannung erzielt werden, während zeitgleich in zeitlich unkritischen oder ungenutzten Entwurfsteilen die statische Leistungsaufnahme minimiert wurde

    Designing energy-efficient sub-threshold logic circuits using equalization and non-volatile memory circuits using memristors

    Full text link
    The very large scale integration (VLSI) community has utilized aggressive complementary metal-oxide semiconductor (CMOS) technology scaling to meet the ever-increasing performance requirements of computing systems. However, as we enter the nanoscale regime, the prevalent process variation effects degrade the CMOS device reliability. Hence, it is increasingly essential to explore emerging technologies which are compatible with the conventional CMOS process for designing highly-dense memory/logic circuits. Memristor technology is being explored as a potential candidate in designing non-volatile memory arrays and logic circuits with high density, low latency and small energy consumption. In this thesis, we present the detailed functionality of multi-bit 1-Transistor 1-memRistor (1T1R) cell-based memory arrays. We present the performance and energy models for an individual 1T1R memory cell and the memory array as a whole. We have considered TiO2- and HfOx-based memristors, and for these technologies there is a sub-10% difference between energy and performance computed using our models and HSPICE simulations. Using a performance-driven design approach, the energy-optimized TiO2-based RRAM array consumes the least write energy (4.06 pJ/bit) and read energy (188 fJ/bit) when storing 3 bits/cell for 100 nsec write and 1 nsec read access times. Similarly, HfOx-based RRAM array consumes the least write energy (365 fJ/bit) and read energy (173 fJ/bit) when storing 3 bits/cell for 1 nsec write and 200 nsec read access times. On the logic side, we investigate the use of equalization techniques to improve the energy efficiency of digital sequential logic circuits in sub-threshold regime. We first propose the use of a variable threshold feedback equalizer circuit with combinational logic blocks to mitigate the timing errors in digital logic designed in sub-threshold regime. This mitigation of timing errors can be leveraged to reduce the dominant leakage energy by scaling supply voltage or decreasing the propagation delay. At the fixed supply voltage, we can decrease the propagation delay of the critical path in a combinational logic block using equalizer circuits and, correspondingly decrease the leakage energy consumption. For a 8-bit carry lookahead adder designed in UMC 130 nm process, the operating frequency can be increased by 22.87% (on average), while reducing the leakage energy by 22.6% (on average) in the sub-threshold regime. Overall, the feedback equalization technique provides up to 35.4% lower energy-delay product compared to the conventional non-equalized logic. We also propose a tunable adaptive feedback equalizer circuit that can be used with sequential digital logic to mitigate the process variation effects and reduce the dominant leakage energy component in sub-threshold digital logic circuits. For a 64-bit adder designed in 130 nm our proposed approach can reduce the normalized delay variation of the critical path delay from 16.1% to 11.4% while reducing the energy-delay product by 25.83% at minimum energy supply voltage. In addition, we present detailed energy-performance models of the adaptive feedback equalizer circuit. This work serves as a foundation for the design of robust, energy-efficient digital logic circuits in sub-threshold regime

    Max Operation in Statistical Static Timing Analysis on the Non-Gaussian Variation Sources for VLSI Circuits

    Full text link
    As CMOS technology continues to scale down, process variation introduces significant uncertainty in power and performance to VLSI circuits and significantly affects their reliability. If this uncertainty is not properly handled, it may become the bottleneck of CMOS technology improvement. As a result, deterministic analysis is no longer conservative and may result in either overestimation or underestimation of the circuit delay. As we know that Static-Timing Analysis (STA) is a deterministic way of computing the delay imposed by the circuits design and layout. It is based on a predetermined set of possible events of process variations, also called corners of the circuit. Although it is an excellent tool, current trends in process scaling have imposed significant difficulties to STA. Therefore, there is a need for another tool, which can resolve the aforementioned problems, and Statistical Static Timing Analysis (SSTA) has become the frontier research topic in recent years in combating such variation effects. There are two types of SSTA methods, path-based SSTA and block-based SSTA. The goal of SSTA is to parameterize timing characteristics of the timing graph as a function of the underlying sources of process parameters that are modeled as random variables. By performing SSTA, designers can obtain the timing distribution (yield) and its sensitivity to various process parameters. Such information is of tremendous value for both timing sign-off and design optimization for robustness and high profit margins. The block-based SSTA is the most efficient SSTA method in recent years. In block-based SSTA, there are two major atomic operations max and add. The add operation is simple; however, the max operation is much more complex. There are two main challenges in SSTA. The Topological Correlation that emerges from reconvergent paths, these are the ones that originate from a common node and then converge again at another node (reconvergent node). Such correlation complicates the maximum operation. The second challenge is the Spatial Correlation. It arises due to device proximity on the die and gives rise to the problems of modeling delay and arrival time. This dissertation presents statistical Nonlinear and Nonnormals canonical form of timing delay model considering process variation. This dissertation is focusing on four aspects: (1) Statistical timing modeling and analysis; (2) High level circuit synthesis with system level statistical static timing analysis; (3) Architectural implementations of the atomic operations (max and add); and (4) Design methodology. To perform statistical timing modeling and analysis, we first present an efficient and accurate statistical static timing analysis (SSTA) flow for non-linear cell delay model with non-Gaussian variation sources. To achieve system level SSTA we apply statistical timing analysis to high-level synthesis flow, and develop yield driven synthesis framework so that the impact of process variations is taken into account during high-level synthesis. To accomplish architectural implementation, we present the vector thread architecture for max operator to minimize delay and variation. Finally, we present comparison analysis with ISCAS benchmark circuits suites. In the last part of this dissertation, a SSTA design methodology is presented

    Radiation Hardened by Design Methodologies for Soft-Error Mitigated Digital Architectures

    Get PDF
    abstract: Digital architectures for data encryption, processing, clock synthesis, data transfer, etc. are susceptible to radiation induced soft errors due to charge collection in complementary metal oxide semiconductor (CMOS) integrated circuits (ICs). Radiation hardening by design (RHBD) techniques such as double modular redundancy (DMR) and triple modular redundancy (TMR) are used for error detection and correction respectively in such architectures. Multiple node charge collection (MNCC) causes domain crossing errors (DCE) which can render the redundancy ineffectual. This dissertation describes techniques to ensure DCE mitigation with statistical confidence for various designs. Both sequential and combinatorial logic are separated using these custom and computer aided design (CAD) methodologies. Radiation vulnerability and design overhead are studied on VLSI sub-systems including an advanced encryption standard (AES) which is DCE mitigated using module level coarse separation on a 90-nm process with 99.999% DCE mitigation. A radiation hardened microprocessor (HERMES2) is implemented in both 90-nm and 55-nm technologies with an interleaved separation methodology with 99.99% DCE mitigation while achieving 4.9% increased cell density, 28.5 % reduced routing and 5.6% reduced power dissipation over the module fences implementation. A DMR register-file (RF) is implemented in 55 nm process and used in the HERMES2 microprocessor. The RF array custom design and the decoders APR designed are explored with a focus on design cycle time. Quality of results (QOR) is studied from power, performance, area and reliability (PPAR) perspective to ascertain the improvement over other design techniques. A radiation hardened all-digital multiplying pulsed digital delay line (DDL) is designed for double data rate (DDR2/3) applications for data eye centering during high speed off-chip data transfer. The effect of noise, radiation particle strikes and statistical variation on the designed DDL are studied in detail. The design achieves the best in class 22.4 ps peak-to-peak jitter, 100-850 MHz range at 14 pJ/cycle energy consumption. Vulnerability of the non-hardened design is characterized and portions of the redundant DDL are separated in custom and auto-place and route (APR). Thus, a range of designs for mission critical applications are implemented using methodologies proposed in this work and their potential PPAR benefits explored in detail.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    Circuit design for embedded memory in low-power integrated circuits

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 141-152).This thesis explores the challenges for integrating embedded static random access memory (SRAM) and non-volatile memory-based on ferroelectric capacitor technology-into lowpower integrated circuits. First considered is the impact of process variation in deep-submicron technologies on SRAM, which must exhibit higher density and performance at increased levels of integration with every new semiconductor generation. Techniques to speed up the statistical analysis of physical memory designs by a factor of 100 to 10,000 relative to the conventional Monte Carlo Method are developed. The proposed methods build upon the Importance Sampling simulation algorithm and efficiently explore the sample space of transistor parameter fluctuation. Process variation in SRAM at low-voltage is further investigated experimentally with a 512kb 8T SRAM test chip in 45nm SOI CMOS technology. For active operation, an AC coupled sense amplifier and regenerative global bitline scheme are designed to operate at the limit of on current and off current separation on a single-ended SRAM bitline. The SRAM operates from 1.2 V down to 0.57 V with access times from 400ps to 3.4ns. For standby power, a data retention voltage sensor predicts the mismatch-limited minimum supply voltage without corrupting the contents of the memory. The leakage power of SRAM forces the chip designer to seek non-volatile memory in applications such as portable electronics that retain significant quantities of data over long durations. In this scenario, the energy cost of accessing data must be minimized. This thesis presents a ferroelectric random access memory (FRAM) prototype that addresses the challenges of sensing diminishingly small charge under conditions favorable to low access energy with a time-to-digital sensing scheme. The 1 Mb IT1C FRAM fabricated in 130 nm CMOS operates from 1.5 V to 1.0 V with corresponding access energy from 19.2 pJ to 9.8 pJ per bit. Finally, the computational state of sequential elements interspersed in CMOS logic, also restricts the ability to power gate. To enable simple and fast turn-on, ferroelectric capacitors are integrated into the design of a standard cell register, whose non-volatile operation is made compatible with the digital design flow. A test-case circuit containing ferroelectric registers exhibits non-volatile operation and consumes less than 1.3 pJ per bit of state information and less than 10 clock cycles to save or restore with no minimum standby power requirement in-between active periods.by Masood Qazi.Ph.D
    corecore