39 research outputs found

    Ultra-Low Power Circuit Design for Cubic-Millimeter Wireless Sensor Platform.

    Full text link
    Modern daily life is surrounded by smaller and smaller computing devices. As Bell’s Law predicts, the research community is now looking at tiny computing platforms and mm3-scale sensor systems are drawing an increasing amount of attention since they can create a whole new computing environment. Designing mm3-scale sensor nodes raises various circuit and system level challenges and we have addressed and proposed novel solutions for many of these challenges to create the first complete 1.0mm3 sensor system including a commercial microprocessor. We demonstrate a 1.0mm3 form factor sensor whose modular die-stacked structure allows maximum volume utilization. Low power I2C communication enables inter-layer serial communication without losing compatibility to standard I2C communication protocol. A dual microprocessor enables concurrent computation for the sensor node control and measurement data processing. A multi-modal power management unit allowed energy harvesting from various harvesting sources. An optical communication scheme is provided for initial programming, synchronization and re-programming after recovery from battery discharge. Standby power reduction techniques are investigated and a super cut-off power gating scheme with an ultra-low power charge pump reduces the standby power of logic circuits by 2-19× and memory by 30%. Different approaches for designing low-power memory for mm3-scale sensor nodes are also presented in this work. A dual threshold voltage gain cell eDRAM design achieves the lowest eDRAM retention power and a 7T SRAM design based on hetero-junction tunneling transistors reduces the standby power of SRAM by 9-19× with only 15% area overhead. We have paid special attention to the timer for the mm3-scale sensor systems and propose a multi-stage gate-leakage-based timer to limit the standard deviation of the error in hourly measurement to 196ms and a temperature compensation scheme reduces temperature dependency to 31ppm/°C. These techniques for designing ultra-low power circuits for a mm3-scale sensor enable implementation of a 1.0mm3 sensor node, which can be used as a skeleton for future micro-sensor systems in variety of applications. These microsystems imply the continuation of the Bell’s Law, which also predicts the massive deployment of mm3-scale computing systems and emergence of even smaller and more powerful computing systems in the near future.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/91438/1/sori_1.pd

    Subthreshold SRAM Design for Energy Efficient Applications in Nanometric CMOS Technologies

    Get PDF
    Embedded SRAM circuits are vital components in a modern system on chip (SOC) that can occupy up to 90% of the total area. Therefore, SRAM circuits heavily affect SOC performance, reliability, and yield. In addition, most of the SRAM bitcells are in standby mode and significantly contribute to the total leakage current and leakage power consumption. The aggressive demand in portable devices and billions of connected sensor networks requires long battery life. Therefore, careful design of SRAM circuits with minimal power consumption is in high demand. Reducing the power consumption is mainly achieved by reducing the power supply voltage in the idle mode. However, simply reducing the supply voltage imposes practical limitations on SRAM circuits such as reduced static noise margin, poor write margin, reduced number of cells per bitline, and reduced bitline sensing margin that might cause read/write failures. In addition, the SRAM bitcell has contradictory requirements for read stability and writability. Improving the read stability can cause difficulties in a write operation or vice versa. In this thesis, various techniques for designing subthreshold energy-efficient SRAM circuits are proposed. The proposed techniques include improvement in read margin and write margin, speed improvement, energy consumption reduction, new bitcell architecture and utilizing programmable wordline boosting. A programmable wordline boosting technique is exploited on a conventional 6T SRAM bitcell to improve the operational speed. In addition, wordline boosting can reduce the supply voltage while maintaining the operational frequency. The reduction of the supply voltage allows the memory macro to operate with reduced power consumption. To verify the design, a 16-kb SRAM was fabricated using the TSMC 65 nm CMOS technology. Measurement results show that the maximum operational frequency increases up to 33.3% when wordline boosting is applied. Besides, the supply voltage can be reduced while maintaining the same frequency. This allows reducing the energy consumption to be reduced by 22.2%. The minimum energy consumption achieved is 0.536 fJ/b at 400 mV. Moreover, to improve the read margin, a 6T bitcell SRAM with a PMOS access transistor is proposed. Utilizing a PMOS access transistor results in lower zero level degradation, and hence higher read stability. In addition, the access transistor connected to the internal node holding V DD acts as a stabilizer and counterbalances the effect of zero level degradation. In order to improve the writability, wordline boosting is exploited. Wordline boosting also helps to compensate for the lower speed of the PMOS access transistor compared to a NMOS transistor. To verify our design, a 2kb SRAM is fabricated in the TSMC 65 nm CMOS technology. Measurement results show that the maximum operating frequency of the test chip is at 3.34 MHz at 290 mV. The minimum energy consumption is measured as 1.1 fJ/b at 400 mV

    ULTRALOW-POWER, LOW-VOLTAGE DIGITAL CIRCUITS FOR BIOMEDICAL SENSOR NODES

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Interconnect and Memory Design for Intelligent Mobile System

    Full text link
    Technology scaling has driven the transistor to a smaller area, higher performance and lower power consuming which leads us into the mobile and edge computing era. However, the benefits of technology scaling are diminishing today, as the wire delay and energy scales far behind that of the logics, which makes communication more expensive than computation. Moreover, emerging data centric algorithms like deep learning have a growing demand on SRAM capacity and bandwidth. High access energy and huge leakage of the large on-chip SRAM have become the main limiter of realizing an energy efficient low power smart sensor platform. This thesis presents several architecture and circuit solutions to enable intelligent mobile systems, including voltage scalable interconnect scheme, Compute-In-Memory (CIM), low power memory system from edge deep learning processor and an ultra-low leakage stacked voltage domain SRAM for low power smart image signal processor (ISP). Four prototypes are implemented for demonstration and verification. The first two seek the solutions to the slow and high energy global on-chip interconnect: the first prototype proposes a reconfigurable self-timed regenerator based global interconnect scheme to achieve higher performance and energy-efficiency in wide voltage range, while the second one presents a non Von Neumann architecture, a hybrid in-/near-memory Compute SRAM (CRAM), to address the locality issue. The next two works focus on low-power low-leakage SRAM design for Intelligent sensors. The third prototype is a low power memory design for a deep learning processor with 270KB custom SRAM and Non-Uniform Memory Access architecture. The fourth prototype is an ultra-low leakage SRAM for motion-triggered low power smart imager sensor system with voltage domain stacking and a novel array swapping mechanism. The work presented in this dissertation exploits various optimizations in both architecture level (exploiting temporal and spatial locality) and circuit customization to overcome the main challenges in making extremely energy-efficient battery-powered intelligent mobile devices. The impact of the work is significant in the era of Internet-of-Things (IoT) and the age of AI when the mobile computing systems get ubiquitous, intelligent and longer battery life, powered by these proposed solutions.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/155232/1/jiwang_1.pd

    Replica Technique for Adaptive Refresh Timing of Gain-Cell-Embedded DRAM

    Get PDF
    Gain cells have recently been shown to be a viable alternative to static random access memory in low-power applications due to their low leakage currents and high density. The primary component of power consumption in these arrays is the dynamic power consumed during periodic refresh operations. Refresh timing is traditionally set according to a worst-case evaluation of retention time under extreme process variations, and worst-case access statistics, leading to frequent power-hungry refresh cycles. In this brief we present a replica technique for automatically tracking the retention time of a gain-cell-embedded dynamic-random-access-memory macrocell according to process variations and operating statistics, thereby reducing the data retention power of the array. A 2-kb array was designed and fabricated in a mature 0.18-mu m CMOS process, appropriate for integration in ultralow power applications, such as biomedical sensors. Measurements show efficient retention time tracking across a range of supply voltages and access statistics, lowering the refresh frequency by more than 5x, as compared with traditional worst-case design

    Study and development of low power consumption SRAMs on 28 nm FD-SOI CMOS process

    Get PDF
    Since analog circuit designs in CMOS nanometer (< 90 nm) nodes can be substantially affected by manufacturing process variations, circuit performance becomes more challenging to achieve efficient solutions by using analytical models. Extensive simulations are thus commonly required to provide a high yield. On the other hand, due to the fact that the classical bulk MOS structure is reaching scaling limits (< 32 nm), alternative approaches are being developed as successors, such as fully depleted silicon-oninsulator (FD-SOI), Multigate MOSFET, FinFETs, among others, and new design techniques emerge by taking advantage of the improved features of these devices. This thesis focused on the development of analytical expressions for the major performance parameters of the SRAM cache implemented in 28 nm FD-SOI CMOS, mainly to explore the transistor dimensions at low computational cost, thereby producing efficient designs in terms of energy consumption, speed and yield. By taking advantage of both low computational cost and close agreement results of the developed models, in this thesis we were able to propose a non-traditional sizing procedure for the simple 6T-SRAM cell, that unlike the traditional thin-cell design, transistor lengths are used as a design variable in order to reduce the static leakage. The single-P-well (SPW) structure in combination with reverse-body-biasing (RBB) technique were used to achieve a better balance between P-type and N-type transistors. As a result, we developed a 128 kB SRAM cache, whose post-layout simulations show that the circuit consumes an average energy per operation of 0.604 pJ/word-access (64 I/O bits) at supply voltage of 0.45 V and operation frequency of 40 MHz. The total chip area of the 128 kB SRAM cache is 0.060 mm2 .O projeto de circuitos analogicos em processos nanométricos CMOS ( < 90 nm) per substancialmente afetado pelas variacões do processo de fabricacão, sendo cada vez mais desafiador para os projetistas alcançar soluções eficientes no desempenho dos circuitos mediante o uso de modelos analíticos. Simulacões extensas com alto custo com- putacional sao normalmente requeridas para providenciar um correto funcionamento do circuito. Por outro lado, devido ao fato que a estrutura bulk-CMOS esta alcançando seus limites de escala (< 32 nm), outros transistores foram desenvolvidos como sucessores, tais como o fully depleted silicon-on-insulator (FD-SOI), Multigate MOSFET, entre outros, surgindo novas tecnicas de projeto que utilizam as características aprimoradas destes dispositivos. Dessa forma, esta tese de doutorado se foca no desenvolvimento de modelos analíticos dos parametros mais importantes do cache SRAM implementado em processo CMOS FD-SOI de 28 nm, principalmente para explorar as dimensõoes dos transistores com baixo custo computacional, e assim produzir solucões eficientes em termos de consumo de energia, velocidade e rendimento. Aproveitando o baixo custo computacional e a alta concordância dos modelos analíticos, nesta tese fomos capazes de propor um dimensionamento nao tradicional para a célula de memória 6T-SRAM, em que diferentemente é do classico dimensionamento "thin-cell”, os comprimentos dos transistores são utilizados como variável de projeto com o fim de reduzir o consumo estático de corrente. A estrutura single-P-well (SPW), combinada com a técnica reverse-body-biasing (RBB) foram utilizadas para alcançar um melhor balanço entre as correntes específicas dos transistores do tipo P e N

    Architecture and Circuit Design Optimization for Compute-In-Memory

    Get PDF
    The objective of the proposed research is to optimize computing-in-memory (CIM) design for accelerating Deep Neural Network (DNN) algorithms. As compute peripheries such as analog-to-digital converter (ADC) introduce significant overhead in CIM inference design, the research first focuses on the circuit optimization for inference acceleration and proposes a resistive random access memory (RRAM) based ADC-free in-memory compute scheme. We comprehensively explore the trade-offs involving different types of ADCs and investigate a new ADC design especially suited for the CIM, which performs the analog shift-add for multiple weight significance bits, improving the throughput and energy efficiency under similar area constraints. Furthermore, we prototype an ADC-free CIM inference chip design with a fully-analog data processing manner between sub-arrays, which can significantly improve the hardware performance over the conventional CIM designs and achieve near-software classification accuracy on ImageNet and CIFAR-10/-100 dataset. Secondly, the research focuses on hardware support for CIM on-chip training. To maximize hardware reuse of CIM weight stationary dataflow, we propose the CIM training architectures with the transpose weight mapping strategy. The cell design and periphery circuitry are modified to efficiently support bi-directional compute. A novel solution of signed number multiplication is also proposed to handle the negative input in backpropagation. Finally, we propose an SRAM-based CIM training architecture and comprehensively explore the system-level hardware performance for DNN on-chip training based on silicon measurement results.Ph.D

    Ultra-low-power SRAM design in high variability advanced CMOS

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.Cataloged from PDF version of thesis.Includes bibliographical references (p. 163-181).Embedded SRAMs are a critical component in modern digital systems, and their role is preferentially increasing. As a result, SRAMs strongly impact the overall power, performance, and area, and, in order to manage these severely constrained trade-offs, they must be specially designed for target applications. Highly energy-constrained systems (e.g. implantable biomedical devices, multimedia handsets, etc.) are an important class of applications driving ultra-low-power SRAMs. This thesis analyzes the energy of an SRAM sub-array. Since supply- and threshold-voltage have a strong effect, targets for these are established in order to optimize energy. Despite the heavy emphasis on leakage-energy, analysis of a high-density 256x256 sub-array in 45nm LP CMOS points to two necessary optimizations: (1) aggressive supply-voltage reduction (in addition to Vt elevation), and (2) performance enhancement. Important SRAM metrics, including read/write/hold-margin and read-current, are also investigated to identify trade-offs of these optimizations. Based on the need to lower supply-voltage, a 0.35V 256kb SRAM is demonstrated in 65nm LP CMOS. It uses an 8T bit-cell with peripheral circuit-assists to improve write-margin and bit-line leakage. Additionally, redundancy, to manage the increasing impact of variability in the periphery, is proposed to improve the area-offset trade-off of sense-amplifiers, demonstrating promise for highly advanced technology nodes. Based on the need to improve performance, which is limited by density constraints, a 64kb SRAM, using an offset-compensating sense-amplifier, is demonstrated in 45nm LP CMOS with high-density 0.25[mu]m2 bit-cells.(cont.) The sense-amplifier is regenerative, but non -strobed, overcoming timing uncertainties limiting performance, and it is single-ended, for compatibility with 8T cells. Compared to a conventional strobed sense-amplifier, it achieves 34% improvement in worst-case access-time and 4x improvement in the standard deviation of the access-time.by Naveen Verma.Ph.D
    corecore