6 research outputs found

    Circuit design for embedded memory in low-power integrated circuits

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 141-152). This thesis explores the challenges of integrating embedded static random access memory (SRAM) and non-volatile memory based on ferroelectric capacitor technology into low-power integrated circuits. First considered is the impact of process variation in deep-submicron technologies on SRAM, which must exhibit higher density and performance at increased levels of integration with every new semiconductor generation. Techniques are developed to speed up the statistical analysis of physical memory designs by a factor of 100 to 10,000 relative to the conventional Monte Carlo method. The proposed methods build upon the Importance Sampling simulation algorithm and efficiently explore the sample space of transistor parameter fluctuation. Process variation in SRAM at low voltage is further investigated experimentally with a 512 kb 8T SRAM test chip in 45 nm SOI CMOS technology. For active operation, an AC-coupled sense amplifier and a regenerative global bitline scheme are designed to operate at the limit of on-current and off-current separation on a single-ended SRAM bitline. The SRAM operates from 1.2 V down to 0.57 V with access times from 400 ps to 3.4 ns. For standby power, a data retention voltage sensor predicts the mismatch-limited minimum supply voltage without corrupting the contents of the memory. The leakage power of SRAM forces the chip designer to seek non-volatile memory in applications, such as portable electronics, that retain significant quantities of data over long durations. In this scenario, the energy cost of accessing data must be minimized. 
This thesis presents a ferroelectric random access memory (FRAM) prototype that uses a time-to-digital sensing scheme to address the challenge of sensing diminishingly small charge under conditions favorable to low access energy. The 1 Mb 1T1C FRAM, fabricated in 130 nm CMOS, operates from 1.5 V to 1.0 V with corresponding access energy from 19.2 pJ to 9.8 pJ per bit. Finally, the computational state of sequential elements interspersed in CMOS logic also restricts the ability to power gate. To enable simple and fast turn-on, ferroelectric capacitors are integrated into the design of a standard cell register, whose non-volatile operation is made compatible with the digital design flow. A test-case circuit containing ferroelectric registers exhibits non-volatile operation, consumes less than 1.3 pJ per bit of state information, and takes fewer than 10 clock cycles to save or restore, with no minimum standby power requirement between active periods. By Masood Qazi. Ph.D.
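The abstract above contrasts Importance Sampling with conventional Monte Carlo for estimating rare failure events. The following is only a generic, minimal sketch of that idea, not the thesis's method: it estimates the tail probability of a standard normal variable by sampling from a proposal shifted toward the rare region and reweighting each sample. The function names, the one-dimensional Gaussian model, and the threshold of 4 sigma are all assumptions made for illustration.

```python
import math
import random


def tail_prob_importance_sampling(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by mean-shift importance sampling.

    Samples are drawn from the proposal N(threshold, 1), so roughly half of
    them land in the rare region. Each hit is weighted by the density ratio
    phi(x) / phi(x - threshold) = exp(-threshold * x + threshold**2 / 2).
    """
    rng = random.Random(seed)  # local generator keeps the run reproducible
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


def tail_prob_exact(threshold):
    """Closed-form reference value: the standard normal upper tail."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    t = 4.0  # hypothetical "failure" threshold, in units of sigma
    print("importance sampling:", tail_prob_importance_sampling(t, 20000))
    print("exact:              ", tail_prob_exact(t))
```

With plain Monte Carlo, an event this rare would be hit by only a handful of samples out of every hundred thousand, which is why shifting the sampling distribution toward the failure region reduces the number of simulations needed for a given accuracy.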

    Challenges and Directions for Low-Voltage SRAM


    Design and analysis of SRAMs for energy harvesting systems

    PhD Thesis. At present, the battery is employed as a power source for a wide variety of microelectronic systems, ranging from biomedical implants and sensor networks to portable devices. However, the battery has several limitations and poses many challenges for the majority of these systems. For instance, the design of implantable devices must account for two aspects of the battery: the toxic materials it contains and its lifetime, since replacing the battery means a surgical operation. Another challenge appears in wireless sensor networks, where hundreds or thousands of nodes are scattered around the monitored environment and the battery of each node must be maintained and replaced regularly, yet the batteries in these nodes do not all run out at the same time. Since the introduction of portable systems, the area of low-power design has witnessed extensive research, driven by industrial needs, with the aim of extending battery life. Coincidentally, continuing innovations in the field of micro-generators have brought their outputs into the range required by several portable applications. This overlap creates a clear opportunity to develop new generations of electronic systems that can be powered, or at least augmented, by energy harvesters. Such self-powered systems benefit applications where maintaining and replacing batteries is impossible, inconvenient, costly, or hazardous, in addition to decreasing the adverse effects the battery has on the environment. The main goal of this research study is to investigate energy-harvesting-aware design techniques for computational logic in order to enable the capability of working under non-deterministic energy sources. As a case study, the research concentrates on a vital part of all computational loads, SRAM, which occupies more than 90% of the chip area according to the ITRS reports. 
Essentially, this research conducted experiments to identify the SRAM design metric that is most vulnerable to unpredictable energy sources, which was confirmed to be timing. Accordingly, the study proposed a truly self-timed SRAM that is realized with complete handshaking protocols in the 6T bit-cell, regulated by fully Speed Independent (SI) timing circuitry. The study proved the functionality of the proposed design in real silicon. Finally, the project enhanced other performance metrics of the self-timed SRAM, concentrating on the bit-line length and the minimum operational voltage, by employing several additional design techniques. Umm Al-Qura University, the Ministry of Higher Education in the Kingdom of Saudi Arabia, and the Saudi Cultural Burea

    Circuit Techniques for Adaptive and Reliable High Performance Computing.

    Increasing power density with process scaling has caused stagnation in the clock speed of modern microprocessors. Accordingly, designers have adopted message-passing and shared-memory multicore architectures in order to keep up with the rapidly rising demand for computing throughput. At the same time, applications are not entirely parallel, and improving single-thread performance remains critical. Additionally, reliability is worsening with process scaling, and margining for failures due to process and environmental variations in modern technologies consumes an increasingly large portion of the power/performance envelope. In the wake of multicore computing, reliability of signal synchronization between the cores is also becoming increasingly critical. This forces designers to search for alternative, efficient methods to improve compute performance while addressing reliability. Accordingly, this dissertation presents innovative circuit and architectural techniques for variation tolerance, performance and reliability targeted at datapath logic, signal synchronization and memories. Firstly, a domino-logic-based design style for datapath logic is presented that uses Adaptive Robustness Tuning (ART) in addition to timing speculation to provide up to 71% performance gains over conventional domino logic in a 32b x 32b multiplier in 65 nm CMOS. Margins are reduced until functionality errors are detected, which are then used to guide the tuning. Secondly, for signal synchronization across clock domains, a new class of dynamic-logic-based synchronizers with single-cycle synchronization latency is presented, where pulses, rather than stable intermediate voltages, cause metastability. Such pulses are amplified using skewed inverters to improve mean time between failures by ~1e6x over jamb latches and double flip-flops at 2 GHz in 65 nm CMOS. 
Thirdly, a reconfigurable sensing scheme for 6T SRAMs is presented that employs auto-zero calibration and pre-amplification to improve sensing reliability (by up to 1.2 standard deviations of NMOS threshold voltage in 28 nm CMOS); this increased reliability is in turn traded for ~42% sensing speedup. Finally, a main memory architecture design methodology to address reliability and power in the context of Exascale computing systems is presented. Based on 3D-stacked DRAMs, the methodology co-optimizes DRAM access energy, refresh power and the increased cost of error resilience, to meet stringent power and reliability constraints. PhD. Electrical Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/107238/1/bharan_1.pd

    Análise do impacto de pel decimation na codificação de vídeos de alta resolução (Analysis of the impact of pel decimation on high-resolution video coding)

    Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2014. Abstract: As the number of pixels per frame tends to increase with the upcoming adoption of ultra-high resolutions, pixel subsampling, also known as pel decimation, appears as a viable means to improve the energy efficiency of video coding. This work investigates the impacts on energy and quality when pel decimation is applied to the Sum of Absolute Differences (SAD) calculation, which is the most used similarity metric in the motion estimation step of video coding. Firstly, a quality assessment of 15 pel decimation patterns is presented. The 10,680 experimental points used provide statistical evidence that the proposed 4:3 ratio leads to an encoding speedup of more than two times in comparison to full sampling, losing only 5% in DSSIM and 1% in PSNR. Compared with lower sampling ratios, it presents a better trade-off between speedup and quality loss. To obtain estimates for silicon area and energy per block, five SAD architectures were designed and synthesized for an industrial standard cell library. Among those, one can be configured to operate with 1:1, 4:3, 2:1 or 4:1 sampling ratios, whereas the rest are tailored to operate exclusively with each one of these ratios. The configurable architecture consumes 3.54 pJ/block operating in full sampling (60% lower than the non-configurable version). The energy can be further reduced to 1.34 pJ/block by using the 4:1 ratio, with losses of 2.8% in PSNR and 14.1% in DSSIM. Finally, it is shown that the speedup of a given subsampling pattern is due to the reduction of both the number of sampled pixels and the total number of SAD calculations. 
Therefore, by modeling the video coding energy components, it is shown that the energy efficiency of the whole video compression process can be increased beyond the sampling ratio. By using a configurable SAD architecture operating in the 4:1 ratio, the energy savings are up to 95.11%.
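To make the idea of pel decimation in the SAD calculation concrete, here is a minimal sketch, assuming simple illustrative sampling patterns. The checkerboard (2:1) and every-other-row-and-column (4:1) masks, the function name `sad`, and the 4x4 sample blocks are hypothetical; they are not the 15 patterns or the 4:3 ratio evaluated in the dissertation.

```python
def sad(block_a, block_b, keep=lambda row, col: True):
    """Sum of absolute differences over the pixels selected by keep(row, col).

    With the default predicate every pixel is compared (full 1:1 sampling).
    A more selective predicate skips pixels, which is pel decimation.
    """
    total = 0
    for r, (row_a, row_b) in enumerate(zip(block_a, block_b)):
        for c, (pixel_a, pixel_b) in enumerate(zip(row_a, row_b)):
            if keep(r, c):
                total += abs(pixel_a - pixel_b)
    return total


# Hypothetical decimation patterns, chosen only for illustration.
def checker_2to1(row, col):
    """Keep half of the pixels in a checkerboard layout (2:1)."""
    return (row + col) % 2 == 0


def grid_4to1(row, col):
    """Keep one pixel out of every 2x2 group (4:1)."""
    return row % 2 == 0 and col % 2 == 0


# Made-up 4x4 luma blocks: the current block and a candidate reference block.
cur = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [90, 100, 110, 120],
       [130, 140, 150, 160]]
ref = [[12, 20, 27, 40],
       [50, 65, 70, 79],
       [90, 100, 111, 120],
       [126, 140, 150, 160]]

if __name__ == "__main__":
    print("1:1 SAD:", sad(cur, ref))
    print("2:1 SAD:", sad(cur, ref, checker_2to1))
    print("4:1 SAD:", sad(cur, ref, grid_4to1))
```

Each decimated variant performs fewer subtractions and additions per block, which is the source of the energy reduction the abstract reports, at the cost of a similarity score computed from only part of the block.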