15 research outputs found

    Circuit Techniques for Adaptive and Reliable High Performance Computing.

    Increasing power density with process scaling has caused stagnation in the clock speed of modern microprocessors. Accordingly, designers have adopted message passing and shared memory based multicore architectures in order to keep up with the rapidly rising demand for computing throughput. At the same time, applications are not entirely parallel, and improving single-thread performance remains critical. Additionally, reliability is worsening with process scaling, and margining for failures due to process and environmental variations in modern technologies consumes an increasingly large portion of the power/performance envelope. In the wake of multicore computing, reliability of signal synchronization between the cores is also becoming increasingly critical. This forces designers to search for alternative, efficient methods to improve compute performance while addressing reliability. Accordingly, this dissertation presents innovative circuit and architectural techniques for variation-tolerance, performance and reliability targeted at datapath logic, signal synchronization and memories. Firstly, a domino logic based design style for datapath logic is presented that uses Adaptive Robustness Tuning (ART) in addition to timing speculation to provide up to 71% performance gains over conventional domino logic in a 32bx32b multiplier in 65nm CMOS. Margins are reduced until functionality errors are detected, and these errors are then used to guide the tuning. Secondly, for signal synchronization across clock domains, a new class of dynamic logic based synchronizers with single-cycle synchronization latency is presented, where pulses, rather than stable intermediate voltages, cause metastability. Such pulses are amplified using skewed inverters to improve mean time between failures by ~1e6x over jamb latches and double flip-flops at 2GHz in 65nm CMOS.
Thirdly, a reconfigurable sensing scheme for 6T SRAMs is presented that employs auto-zero calibration and pre-amplification to improve sensing reliability (by up to 1.2 standard deviations of NMOS threshold voltage in 28nm CMOS); this increased reliability is in turn traded for ~42% sensing speedup. Finally, a main memory architecture design methodology to address reliability and power in the context of Exascale computing systems is presented. Based on 3D-stacked DRAMs, the methodology co-optimizes DRAM access energy, refresh power and the increased cost of error resilience, to meet stringent power and reliability constraints. PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/107238/1/bharan_1.pd

    A design methodology for robust, energy-efficient, application-aware memory systems

    Memory design is a crucial component of VLSI system design from area, power and performance perspectives. To meet increasingly challenging system specifications, architecture, circuit and device level innovations are required for existing memory technologies. Emerging memory solutions are widely explored to cater to strict budgets. This thesis presents design methodologies for custom memory design with the objective of power-performance benefits across specific applications. Taking STTRAM (spin transfer torque random access memory) as an example of an emerging memory candidate, the design space is explored to find the energy-optimal design solution. A thorough thermal reliability study is performed to estimate detection reliability challenges, and circuit solutions are proposed to ensure reliable operation. Adoption of the application-specific optimal energy solution is shown to yield considerable energy benefits in a read-heavy application called MBC (memory based computing). Circuit level customizations are studied for volatile SRAM (static random access memory), which provide improved energy-delay product (EDP) for the same MBC application. Memory design has to be aware of upcoming challenges arising not only from the nature of the application but also from the packaging front. Taking 3D die-folding as an example, the SRAM performance shift under die-folding is illustrated. Overall, the thesis demonstrates how knowledge of the system and packaging can help in achieving power-efficient and high-performance memory design. Ph.D.

    The 1992 4th NASA SERC Symposium on VLSI Design

    Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design.

    Straintronics: A Leap towards Ultimate Energy Efficiency of Magnetic Memory and Logic

    After decades of exponential growth of the semiconductor industry, as predicted by Moore’s Law, complementary metal-oxide semiconductor (CMOS) circuits are approaching the end of the road as feature sizes reach sub-10nm regimes, leaving electrical engineers with a profusion of design challenges in terms of energy limitations and power density. This has left the road wide open for alternative technologies to help CMOS overcome the present challenges. Magnetic random access memories (MRAM) are one of the candidates to help with these obstacles. Proposed in the early 90’s, MRAM has been under research and development for decades. The pursuit of energy-efficient MRAM is driven by the fact that magnetic logic potentially has orders of magnitude lower switching energy than charge-based CMOS logic since, in a nanomagnet, magnetic domains self-align with each other. Regrettably, the conventional methods for switching the state of an MRAM cell, field induced magnetization switching (FIMS) and spin transfer torque (STT), use electric current (flow of charges) to switch the state of the magnet, nullifying the energy advantage stated above. In order to maximize the energy efficiency, the amount of charge required to switch the state of the magnetic tunnel junction (MTJ) should be minimized. To this end, straintronics has recently been proposed as an energy-efficient alternative to FIMS and STT for switching the state of a nanomagnet. The method states that by combining piezoelectricity and inverse magnetostriction, the magnetization state of the device can flip within a few nanoseconds while reducing the switching energy by orders of magnitude compared to STT and FIMS. This research focuses on analysis, design, modeling, and applications of straintronics-based MTJ. The first goal is to perform an in-depth analysis of the static and dynamic behavior of the device.
Next, we aim to increase the accuracy of the model by including the effect of temperature and thermal noise on the device’s behavior. The goal of this analysis is to create a comprehensive model of the device that predicts both static and dynamic responses of the magnetization to applied stress. The model will be used to interface the device with CMOS controllers and switches in large systems. Next, in an attempt to speed up the simulation of such devices in multi-megabyte memory systems, a liberal model has been developed by analytically approximating a solution to the magnetization dynamics, which would otherwise have to be solved numerically. The liberal model demonstrates more than two orders of magnitude speed improvement compared to the conventional numerical models. Another goal of the research is to highlight the applications of straintronics devices by combining them with peripheral CMOS circuitry. The design of a proof-of-concept 2 kilo-bit nonvolatile straintronics-based memory was introduced in our recent work. To highlight the potential applications of the straintronics device beyond data storage, the use of the principle in ultra-fast yet low-power true random number generation and in neuron/synapse design for artificial neural networks has been investigated. Lastly, in an attempt to investigate the practicality of the straintronics principle, the effect of process variations and interface imperfections on the switching behavior of the magnetization is investigated. The results reveal the destructive effect of fabrication imperfections on the switching pattern of the device, leaving careful pulse-shaping, alternative topologies, or combination with STT as the last resorts for successful strain-based magnetization switching. PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/137010/1/barangi_1.pd

    A Solder-Defined Computer Architecture for Backdoor and Malware Resistance

    This research is about securing control of those devices we most depend on for integrity and confidentiality. An emerging concern is that complex integrated circuits may be subject to exploitable defects or backdoors, and measures for inspection and audit of these chips are neither supported nor scalable. One approach for providing a “supply chain firewall” may be to forgo such components, and instead to build central processing units (CPUs) and other complex logic from simple, generic parts. This work investigates the capability and speed ceiling when open-source hardware methodologies are fused with maker-scale assembly tools and visible-scale final inspection. The author has designed, and demonstrated in simulation, a 36-bit CPU and protected memory subsystem that use only synchronous static random access memory (SRAM) and trivial glue logic integrated circuits as components. The design presently lacks preemptive multitasking, ability to load firmware into the SRAMs used as logic elements, and input/output. Strategies are presented for adding these missing subsystems, again using only SRAM and trivial glue logic. A load-store architecture is employed with four clock cycles per instruction. Simulations indicate that a clock speed of at least 64 MHz is probable, corresponding to 16 million instructions per second (16 MIPS), despite the architecture containing no microprocessors, field programmable gate arrays, programmable logic devices, application specific integrated circuits, or other purchased complex logic. The lower speed, larger size, higher power consumption, and higher cost of an “SRAM minicomputer,” compared to traditional microcontrollers, may be offset by the fully open architecture—hardware and firmware—along with more rigorous user control, reliability, transparency, and auditability of the system. 
SRAM logic is also particularly well suited for building arithmetic logic units, and can implement complex operations such as population count, a hash function for associative arrays, or a pseudorandom number generator with good statistical properties in as few as eight clock cycles per 36-bit word processed. 36-bit unsigned multiplication can be implemented in software in 47 instructions or fewer (188 clock cycles). A general theory is developed for fast SRAM parallel multipliers should they be needed.
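
The throughput figures quoted in this abstract follow from simple arithmetic, which the short sketch below reproduces. It uses only the numbers stated above (64 MHz clock, four clock cycles per instruction, 47 instructions for a software multiply); the function and constant names are illustrative and do not come from the thesis, and the multiply time in microseconds is a derived value that the abstract does not state.

```python
# Sanity check of the figures quoted in the abstract.
# All names here are hypothetical; only the input numbers come from the text.

CLOCK_HZ = 64_000_000          # "at least 64 MHz" per the abstract
CYCLES_PER_INSTRUCTION = 4     # "four clock cycles per instruction"
MULTIPLY_INSTRUCTIONS = 47     # "47 instructions or fewer"


def instructions_per_second(clock_hz: int, cycles_per_instruction: int) -> float:
    """Instruction rate when every instruction takes the same number of cycles."""
    return clock_hz / cycles_per_instruction


def cycles_for(instructions: int, cycles_per_instruction: int) -> int:
    """Total clock cycles needed to run a fixed-length instruction sequence."""
    return instructions * cycles_per_instruction


mips = instructions_per_second(CLOCK_HZ, CYCLES_PER_INSTRUCTION) / 1e6
multiply_cycles = cycles_for(MULTIPLY_INSTRUCTIONS, CYCLES_PER_INSTRUCTION)
# Derived (not stated in the abstract): worst-case multiply time at 64 MHz.
multiply_time_us = multiply_cycles / CLOCK_HZ * 1e6

print(f"{mips:.0f} MIPS")
print(f"{multiply_cycles} clock cycles per software multiply")
print(f"{multiply_time_us:.4f} microseconds per software multiply")
```

Both quoted values (16 MIPS and 188 clock cycles) are consistent with a fixed cost of four cycles per instruction.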

    Miniature high dynamic range time-resolved CMOS SPAD image sensors

    Since their integration in complementary metal-oxide-semiconductor (CMOS) technology in 2003, single photon avalanche diodes (SPADs) have inspired a new era of low cost high integration quantum-level image sensors. Their unique feature of discerning single photon detections, their ability to retain temporal information on every collected photon and their amenability to high speed image sensor architectures make them prime candidates for low light and time-resolved applications. From the biomedical field of fluorescence lifetime imaging microscopy (FLIM) to extreme physical phenomena such as quantum entanglement, all the way to time of flight (ToF) consumer applications such as gesture recognition and more recently automotive light detection and ranging (LIDAR), huge steps in detector and sensor architectures have been made to address the design challenges of pixel sensitivity and functionality trade-off, scalability and handling of large data rates. The goal of this research is to explore the hypothesis that, given state of the art CMOS nodes and fabrication technologies, it is possible to design miniature SPAD image sensors for time-resolved applications with a small pixel pitch while maintaining both sensitivity and built-in functionality. Three key approaches are pursued to that end: leveraging the innate area reduction of logic gates and finer design rules of advanced CMOS nodes to balance the pixel’s fill factor and processing capability; smarter pixel designs with configurable functionality; and novel system architectures that lift the processing burden off the pixel array and mediate data flow. Two pathfinder SPAD image sensors were designed and fabricated: a 96 × 40 planar front side illuminated (FSI) sensor with 66% fill factor at 8.25μm pixel pitch in an industrialised 40nm process and a 128 × 120 3D-stacked backside illuminated (BSI) sensor with 45% fill factor at 7.83μm pixel pitch.
Both designs rely on a digital, configurable, 12-bit ripple counter pixel allowing for time-gated shot noise limited photon counting. The FSI sensor was operated as a quanta image sensor (QIS) achieving an extended dynamic range in excess of 100dB, utilising triple exposure windows and in-pixel data compression which reduces data rates by a factor of 3.75×. The stacked sensor is the first demonstration of a wafer scale SPAD imaging array with a 1-to-1 hybrid bond connection. Characterisation results of the detector and sensor performance are presented. Two other time-resolved 3D-stacked BSI SPAD image sensor architectures are proposed. The first is a fully integrated 5-wire interface system on chip (SoC), with built-in power management and off-focal plane data processing and storage for high dynamic range as well as autonomous video rate operation. Preliminary images and bring-up results of the fabricated 2mm² sensor are shown. The second is a highly configurable design capable of simultaneous multi-bit oversampled imaging and programmable region of interest (ROI) time correlated single photon counting (TCSPC) with on-chip histogram generation. The 6.48μm pitch array has been submitted for fabrication. In-depth design details of both architectures are discussed.

    Pixel design and characterization of high-performance tandem OLED microdisplays

    Organic Light-Emitting Diode (OLED) microdisplays (miniature electronic displays comprising a sandwich of organic light emitting diode over a substrate containing CMOS circuits designed to function as an active matrix backplane) were first reported in the 1990s and, since then, have advanced to the mainstream. The smaller dimensions and higher performance of CMOS circuit elements compared to those of equivalent thin film transistors implemented in technologies for large OLED display panels offer a distinct advantage for ultra-miniature display screens. Conventional OLED has suffered from lifetime degradation at high brightness and high current density. Recently, tandem-structure OLED devices have been developed using charge generation layers to implement two or more OLED units in a single stack. They can achieve higher brightness at a given current density. The combination of emissive nature, fast response, medium to high luminance, low power consumption and appropriate lifetime makes OLED a favoured candidate for near-to-eye systems. However, it is challenging to evaluate the pixel level optical response of OLED microdisplays, as the pixel pitch is extremely small and the light output per pixel is relatively low. Advanced CMOS Single Photon Avalanche Diode (SPAD) technology is progressing rapidly and is being deployed in a wide range of applications. It is also suggested as a replacement for the photomultiplier tube (PMT) in photonic experiments that require high sensitivity. CMOS SPAD is a potential tool for better and cheaper display optical characterization. In order to incorporate the novel tandem structure OLED within the computer aided design (CAD) flow of microdisplays, we have developed an equivalent circuit model that accurately describes the tandem OLED electrical characteristics. In addition, new analogue pulse width modulation (PWM) pixel circuit designs have been implemented and fabricated in small arrays for test and characterization purposes.
We report on the design and characterization of these novel pixel drive circuits for OLED microdisplays. Our drive circuits are designed to allow a state-of-the-art sub-pixel pitch of around 5 μm and are implemented in 130 nm CMOS. A performance comparison with a previously published analogue PWM pixel is reported. Moreover, we have employed CMOS SPAD sensors to perform detailed optical measurements on the OLED microdisplay pixels at very high sampling rate (50 kHz, 10 μs exposure), very low light level (2×10⁻⁴ cd/m²) and over a very wide dynamic range (83 dB) of luminance. This offers a clear demonstration of the potential of CMOS SPAD technology to reveal hitherto obscure details of the optical characteristics of individual OLED pixels and groups of pixels, and thereby its potential in display metrology in general. In summary, there are three key contributions to knowledge reported in this thesis. The first is a new equivalent circuit model specifically for tandem structure OLED. The model is verified to accurately describe the electrical response of the tandem OLED with different materials. The second is the novel analogue PWM pixel, which achieves a 5 μm sub-pixel pitch with 2.4% pixel-to-pixel variation. The third is the new application of SPAD sensors to OLED microdisplay pixels and the successful characterization experiment performed with them, which revealed the OLED pixel overshoot behaviour using a QIS SPAD sensor.

    Solid State Circuits Technologies

    The evolution of solid-state circuit technology has a long history within a relatively short period of time. This technology has led to the modern information society that connects us and tools, a large market, and many types of products and applications. Solid-state circuit technology continuously evolves via breakthroughs and improvements every year. This book is devoted to reviewing and presenting novel approaches to some of the main issues involved in this exciting and vigorous technology. The book is composed of 22 chapters, written by authors from 30 different institutions located in 12 different countries throughout the Americas, Asia and Europe, reflecting the wide international contribution to the book. The broad range of subjects presented in the book offers a general overview of the main issues in modern solid-state circuit technology. Furthermore, the book offers an in-depth analysis of specific subjects for specialists. We believe the book is of great scientific and educational value for many readers. I am profoundly indebted to all of those involved in the work for their support. First and foremost, I would like to acknowledge and thank the authors, who worked hard and generously agreed to share their results and knowledge. Second, I would like to express my gratitude to the Intech team, who invited me to edit the book and gave me their full support and a fruitful experience while working together to compile this book.

    GSI Scientific Report 2007 [GSI Report 2008-1]
