75 research outputs found

    Multi-port Memory Design for Advanced Computer Architectures

    In this thesis, we describe and evaluate novel memory designs for multi-port on-chip and off-chip use in advanced computer architectures. We focus on combining multi-porting and evaluating the performance over a range of design parameters. Multi-porting is essential for caches and shared-data systems, especially multi-core Systems-on-Chip (SoCs). It can significantly increase the memory access throughput. We evaluate FinFET voltage-mode multi-port SRAM cells using different metrics including leakage current, static noise margin and read/write performance. Simulation results show that single-ended multi-port FinFET SRAMs with isolated read ports offer improved read stability and flexibility over classical double-ended structures at the expense of write performance. By increasing the size of the access transistors, we show that the single-ended multi-port structures can achieve equivalent write performance to the classical double-ended multi-port structure for 9% area overhead. Moreover, compared with CMOS SRAM, FinFET SRAM has better stability and standby power. We also describe new methods for the design of FinFET current-mode multi-port SRAM cells. Current-mode SRAMs avoid the full swing of the bitline, reducing dynamic power and access time. However, that comes at the cost of a voltage drop, which compromises stability. The design proposed in this thesis uses Independent Gate (IG) mode FinFETs, whose threshold voltage can be tuned by controlling the back-gate voltage, to merge two transistors into one through high-Vt and low-Vt transistors. This design not only reduces the voltage drop, but also reduces the area in multi-port current-mode SRAM design. For off-chip memory, we propose a novel two-port 1-read, 1-write (1R1W) phase-change memory (PCM) cell, which significantly reduces the probability of blocking at the bank level. Unlike in the traditional PCM cell, the access transistors are at the top and connected to the bitline. We use Verilog-A to model the behavior of Ge2Sb2Te5 (GST, the storage component). We evaluate the performance of the two-port cell by transistor sizing and voltage pumping. Simulation results show that a pMOS transistor is more practical than an nMOS transistor as the access device when both area and power are considered. The estimated area overhead is 1.7×, compared to a single-port PCM cell. In brief, the contribution of this thesis is that we propose and evaluate three different kinds of multi-port memories that are favorable for advanced computer architectures.

    X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories

    Silicon-based Static Random Access Memories (SRAM) and digital Boolean logic have been the workhorse of the state-of-the-art computing platforms. Despite tremendous strides in scaling the ubiquitous metal-oxide-semiconductor transistor, the underlying von-Neumann computing architecture has remained unchanged. The limited throughput and energy-efficiency of the state-of-the-art computing systems, to a large extent, results from the well-known von-Neumann bottleneck. The energy and throughput inefficiency of the von-Neumann machines have been accentuated in recent times due to the present emphasis on data-intensive applications like artificial intelligence, machine learning, etc. A possible approach towards mitigating the overhead associated with the von-Neumann bottleneck is to enable in-memory Boolean computations. In this manuscript, we present an augmented version of the conventional SRAM bit-cells, called the X-SRAM, with the ability to perform in-memory, vector Boolean computations, in addition to the usual memory storage operations. We propose at least six different schemes for enabling in-memory vector computations including NAND, NOR, IMP (implication), XOR logic gates with respect to different bit-cell topologies: the 8T cell and the 8⁺T Differential cell. In addition, we also present a novel 'read-compute-store' scheme, wherein the computed Boolean function can be directly stored in the memory without the need of latching the data and carrying out a subsequent write operation. The feasibility of the proposed schemes has been verified using predictive transistor models and Monte-Carlo variation analysis. Comment: This article has been accepted in a future issue of IEEE Transactions on Circuits and Systems-I: Regular Papers.
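    The logical effect of the vector operations named in this abstract can be illustrated with a small functional model. The following is only a sketch under stated assumptions: it models the Boolean outcome on two stored rows, not the 8T or 8⁺T bit-cell circuits, and the row width, the dictionary used as memory, and all function names are hypothetical.

```python
# Functional sketch only: models the logic result of in-memory vector
# operations on two stored rows, not the bit-cell circuits themselves.
# WORD_BITS and every function name here are illustrative assumptions.

WORD_BITS = 8                      # assumed row width
MASK = (1 << WORD_BITS) - 1        # keeps results within one row


def vector_nand(a, b):
    """Bitwise NAND of two rows."""
    return ~(a & b) & MASK


def vector_nor(a, b):
    """Bitwise NOR of two rows."""
    return ~(a | b) & MASK


def vector_imp(a, b):
    """Bitwise implication: A IMP B equals (NOT A) OR B."""
    return (~a | b) & MASK


def vector_xor(a, b):
    """Bitwise XOR of two rows."""
    return (a ^ b) & MASK


def read_compute_store(memory, src_a, src_b, dst, op):
    """Compute op on two rows and write the result to a third row,
    mimicking the 'read-compute-store' idea at the functional level."""
    memory[dst] = op(memory[src_a], memory[src_b])
    return memory[dst]


if __name__ == "__main__":
    mem = {0: 0b11001010, 1: 0b10100110, 2: 0}
    result = read_compute_store(mem, 0, 1, 2, vector_xor)
    print(format(result, "08b"))
```

    Masking every result to the row width keeps Python's unbounded integers from producing negative values after a bitwise NOT.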

    Power Efficient Data-Aware SRAM Cell for SRAM-Based FPGA Architecture

    The design of low-power SRAM cells has become a necessity in today's FPGAs, because SRAM is a critical component in FPGA design and consumes a large fraction of the total power. The present chapter provides an overview of the various factors responsible for power consumption in FPGAs and discusses design techniques for low-power SRAM-based FPGAs at the system, device, and architecture levels. Finally, the chapter proposes a data-aware dynamic SRAM cell to control the power consumption in the cell. The stack effect has been adopted in the design to reduce the leakage current. Peripheral circuits such as the address decoder, the write/read enable circuits, and the sense amplifier have been modified to implement a power-efficient SRAM-based FPGA.

    Robust low-power digital circuit design in nano-CMOS technologies

    Device scaling has resulted in large-scale integrated, high-performance, low-power, and low-cost systems. However, the move towards sub-100 nm technology nodes has increased variability in device characteristics due to large process variations. Variability has severe implications for digital circuit design: it causes timing uncertainties in combinational circuits, degrades the yield and reliability of memory elements, and increases power density due to slow scaling of the supply voltage. Conventional design methods add large, pessimistic safety margins to mitigate increased variability; however, they incur large power and performance losses, as the combination of worst cases occurs very rarely. In-situ monitoring of timing failures provides an opportunity to dynamically tune safety margins in proportion to on-chip variability, which can significantly reduce power and performance losses. We demonstrated by simulation two delay sensor designs that detect timing failures in advance and can be coupled with different compensation techniques such as voltage scaling, body biasing, or frequency scaling to avoid actual timing failures. Our simulation results using 45 nm and 32 nm technology BSIM4 models indicate a significant reduction in total power consumption under temperature and statistical variations. Future work involves using dual sensing to avoid useless voltage scaling that incurs a speed loss. SRAM cache is the first victim of increased process variations and requires handcrafted design to meet area, power, and performance requirements. We have proposed novel six-transistor (6T), seven-transistor (7T), and eight-transistor (8T) SRAM cells that enable variability-tolerant and low-power SRAM cache designs. Increased sense-amplifier offset voltage, due to device mismatch arising from high variability, increases the delay and power consumption of SRAM designs. We have proposed two novel design techniques to reduce offset-voltage-dependent delays, providing a high-speed, low-power SRAM design. Increasing leakage currents in nano-CMOS technologies pose a major challenge to low-power, reliable design. We have investigated a novel segmented supply voltage architecture to reduce the leakage power of SRAM caches, since they occupy the bulk of the total chip area and power. Future work involves developing leakage reduction methods for combinational logic designs, including SRAM peripherals.

    A METHODOLOGY OF SPICE SIMULATION TO EXTRACT SRAM SETUP AND HOLD TIMING PARAMETERS BASED ON DFF DELAY DEGRADATION

    SRAM is a significant component in high-speed computer design. It serves mainly as high-speed storage, such as register files in microprocessors, or as an interface, such as multiple-level caches, between high-speed processing elements and low-speed peripherals. One method of designing the SRAM is to use a commercial memory compiler. Such a compiler can generate SRAM designs of different density and speed with single, dual, or multiple ports to meet the design purpose. There is a discrepancy in the SRAM timing parameters between the SPICE simulation of the extracted layout netlist and the equation-based Liberty file (.lib) produced by a commercial memory compiler. This compiler takes spec values as its input and uses them as the starting points to generate the timing tables/matrices in the .lib. Originally, large spec values are given to guarantee design correctness. However, such spec values are usually too pessimistic when compared with the results from extracted-layout SPICE simulation, which serves as the “golden” rule. Besides, there is no margin information built into the .lib generated by this compiler. A new methodology is proposed to obtain accurate spec values as input to this compiler so that it generates more realistic matrices in the .lib, which will benefit the integration of the SRAM IP and timing analysis.

    Statistical analysis and design of subthreshold operation memories

    This thesis presents novel methods based on a combination of well-known statistical techniques for faster estimation of memory yield and their application in the design of energy-efficient subthreshold memories. The emergence of size-constrained Internet-of-Things (IoT) devices and the proliferation of the wearable market have brought forward the challenge of achieving the maximum energy efficiency per operation in these battery-operated devices. Achieving this sought-after minimum energy operation is possible under sub-threshold operation of the circuit. However, reliable memory operation is currently unattainable at these ultra-low operating voltages because of the memory circuit's vanishing noise margins, which shrink further in the presence of random process variations. The statistical methods presented in this thesis make the yield optimization of sub-threshold memories computationally feasible by reducing the SPICE simulation overhead. We present novel modifications to statistical sampling techniques that reduce the SPICE simulation overhead in estimating memory failure probability. These sampling schemes provide a 40x reduction in the SPICE simulations needed to find the most probable failure point and a 10x reduction in those needed to estimate the failure probability, compared to existing proposals. We then provide a novel method to create surrogate models of the memory margins with better extrapolation capability than traditional regression methods. These models, based on Gaussian process regression, encode the sensitivity of the memory margins with respect to each individual threshold variation source in a one-dimensional kernel. We find that our proposed additive-kernel-based models have 32% smaller out-of-sample error (that is, better extrapolation capability outside the training set) than a six-dimensional universal kernel such as the Radial Basis Function (RBF). 
The thesis also explores topological modifications to the SRAM bitcell to achieve faster read operation at sub-threshold operating voltages. We present a ten-transistor SRAM bitcell that achieves 2x faster read operation than existing ten-transistor sub-threshold SRAM bitcells, while ensuring similar noise margins. The SRAM bitcell provides a 70% reduction in dynamic energy at the cost of a 42% increase in the leakage energy per read operation. Finally, we investigate the energy efficiency of eDRAM gain-cells as an alternative to SRAM bitcells in size-constrained IoT devices. We find that reducing their write path leakage current is the only way to reduce the read energy at the Minimum Energy operation Point (MEP). Further, we study the effect of transistor up-sizing in the presence of threshold voltage variations on the mean MEP read energy by performing statistical analysis based on the ANOVA test of the full-factorial experimental design. Postprint (published version)
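    The idea of locating a most probable failure point and then sampling around it can be illustrated with textbook mean-shift importance sampling. The sketch below is a generic version of that technique, not the modified schemes proposed in the thesis. It assumes a hypothetical linear margin model in place of SPICE simulation, and the nominal margin `m0`, the sensitivities `sens`, and all function names are illustrative assumptions.

```python
import math
import random

# Generic mean-shift importance sampling, NOT the thesis's modified scheme.
# The linear margin below is a hypothetical stand-in for a SPICE simulation.


def margin(x, m0, sens):
    """Assumed margin model: positive means the cell works.
    x holds normalized threshold-voltage shifts, each distributed N(0, 1)."""
    return m0 - sum(s * xi for s, xi in zip(sens, x))


def most_probable_failure_point(m0, sens):
    """Point on the boundary margin(x) = 0 that is closest to the origin.
    For a linear margin this has a closed form."""
    norm_sq = sum(s * s for s in sens)
    return [m0 * s / norm_sq for s in sens]


def failure_probability_is(m0, sens, n_samples=20000, seed=0):
    """Estimate P(margin < 0) by sampling from N(shift, I) and reweighting."""
    rng = random.Random(seed)
    shift = most_probable_failure_point(m0, sens)
    shift_sq = sum(c * c for c in shift)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(c, 1.0) for c in shift]
        if margin(x, m0, sens) < 0.0:
            # Likelihood ratio of N(0, I) to N(shift, I) at x.
            dot = sum(c * xi for c, xi in zip(shift, x))
            total += math.exp(-dot + 0.5 * shift_sq)
    return total / n_samples


def failure_probability_exact(m0, sens):
    """Closed-form reference, valid only for the linear margin model."""
    beta = m0 / math.sqrt(sum(s * s for s in sens))
    return 0.5 * math.erfc(beta / math.sqrt(2.0))


if __name__ == "__main__":
    sens = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]          # six variation sources
    m0 = 4.0 * math.sqrt(sum(s * s for s in sens))  # failure at 4 sigma
    print(failure_probability_is(m0, sens), failure_probability_exact(m0, sens))
```

    Plain Monte Carlo would need on the order of millions of samples to observe even a few failures at this probability, whereas the shifted distribution places roughly half of its samples in the failure region.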

    Ultra-low-power SRAM design in high variability advanced CMOS

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 163-181). Embedded SRAMs are a critical component in modern digital systems, and their role is preferentially increasing. As a result, SRAMs strongly impact the overall power, performance, and area, and, in order to manage these severely constrained trade-offs, they must be specially designed for target applications. Highly energy-constrained systems (e.g. implantable biomedical devices, multimedia handsets, etc.) are an important class of applications driving ultra-low-power SRAMs. This thesis analyzes the energy of an SRAM sub-array. Since supply- and threshold-voltage have a strong effect, targets for these are established in order to optimize energy. Despite the heavy emphasis on leakage-energy, analysis of a high-density 256x256 sub-array in 45nm LP CMOS points to two necessary optimizations: (1) aggressive supply-voltage reduction (in addition to Vt elevation), and (2) performance enhancement. Important SRAM metrics, including read/write/hold-margin and read-current, are also investigated to identify trade-offs of these optimizations. Based on the need to lower supply-voltage, a 0.35V 256kb SRAM is demonstrated in 65nm LP CMOS. It uses an 8T bit-cell with peripheral circuit-assists to improve write-margin and bit-line leakage. Additionally, redundancy, to manage the increasing impact of variability in the periphery, is proposed to improve the area-offset trade-off of sense-amplifiers, demonstrating promise for highly advanced technology nodes. Based on the need to improve performance, which is limited by density constraints, a 64kb SRAM, using an offset-compensating sense-amplifier, is demonstrated in 45nm LP CMOS with high-density 0.25 µm² bit-cells. The sense-amplifier is regenerative, but non-strobed, overcoming timing uncertainties limiting performance, and it is single-ended, for compatibility with 8T cells. Compared to a conventional strobed sense-amplifier, it achieves 34% improvement in worst-case access-time and 4x improvement in the standard deviation of the access-time. By Naveen Verma. Ph.D.

    Cache designs for reliable hybrid high and ultra-low voltage operation

    Increasing demand for highly miniaturized, battery-powered, ultra-low-cost systems (e.g., below 1 USD) in emerging applications such as body, urban life, and environment monitoring has introduced many challenges in chip design. Such applications occasionally require high performance, but very little energy consumption most of the time in order to extend battery lifetime. In addition, they require real-time guarantees. The most suitable technological solution for those devices consists of using hybrid processors able to operate at: (i) high voltage to provide high performance and (ii) near-/sub-threshold (NST) voltage to provide ultra-low energy consumption. However, the most efficient SRAM memories for each voltage level differ, and it is mandatory to trade off different SRAM designs, especially in cache memories, which occupy most of the processor's area. In this thesis, we analyze the performance/power tradeoffs involved in the design of SRAM L1 caches for reliable hybrid high and NST Vcc operation from a microarchitectural perspective. We develop new, simple, single-Vcc domain hybrid cache architectures and data management mechanisms that satisfy all the stringent needs of our target market. The proposed solutions are shown to have high energy efficiency with negligible impact on average performance while maintaining the strong performance guarantees required for our target market.

    Design and Implementation of Low Power SRAM Using Highly Effective Lever Shifters

    The explosive growth of battery-operated devices has made low-power design a priority in recent years. In high-performance Systems-on-Chip, leakage power consumption has become comparable to the dynamic component, and its relevance increases as technology scales. These trends are even more evident for SRAM memory devices, since they are a dominant source of standby power consumption in low-power application processors. The on-die SRAM power consumption is particularly important for increasingly pervasive mobile and handheld applications where battery life is a key design and technology attribute. In SRAM memory design, SRAM cells also comprise the most significant portion of the total chip. Moreover, the increasing number of transistors in SRAM memories and the MOS transistors' increasing leakage current in scaled technologies have turned the SRAM unit into a power-hungry block from both dynamic and static viewpoints. Although scaling the supply voltage enables low power consumption, the SRAM cells' data stability becomes a major concern. Thus, the reduction of SRAM leakage power has become a critical research concern. To address the leakage power consumption in high-performance cache memories, researchers have proposed a stream of novel integrated-circuit and architectural-level techniques, including leakage-current management techniques, cell array leakage reduction techniques, bitline leakage reduction techniques, and leakage current compensation techniques. The main goal of this work was to improve the cell array leakage reduction techniques in order to minimize the leakage power for SRAM memory design in low-power applications. This study also applies body biasing to reduce leakage current. To adjust the NMOS transistors' threshold voltage, and consequently their leakage current, a negative DC voltage can be applied to their body terminal, which acts as a second gate. To generate this negative DC voltage, this study proposes a negative voltage reference that includes a trimming circuit and a negative level shifter. These enhancements are applied to a 10kb SRAM memory operating at 0.3V in a 65nm CMOS process.
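    The link between a negative body voltage, the threshold voltage, and leakage can be sketched with the standard first-order body-effect and subthreshold-current expressions. This is a minimal illustration under stated assumptions, not the model used in the study: the zero-bias threshold `vth0`, body-effect coefficient `gamma`, Fermi potential `phi_f`, and slope factor `n` are illustrative values, not parameters of the 65nm process.

```python
import math

# First-order textbook model; all device parameters are assumed values
# chosen for illustration, not data from the study.

BOLTZMANN = 1.380649e-23           # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def threshold_voltage(v_sb, vth0=0.40, gamma=0.25, phi_f=0.40):
    """Body effect: Vth = Vth0 + gamma * (sqrt(2*phi_f + Vsb) - sqrt(2*phi_f))."""
    if 2.0 * phi_f + v_sb < 0.0:
        raise ValueError("source-to-body voltage is outside the model's range")
    return vth0 + gamma * (math.sqrt(2.0 * phi_f + v_sb) - math.sqrt(2.0 * phi_f))


def leakage_ratio(v_body, n=1.5, temp_k=300.0):
    """Subthreshold leakage relative to zero body bias for an NMOS device
    with a grounded source, using I_off proportional to exp(-Vth / (n * kT/q))."""
    thermal_voltage = BOLTZMANN * temp_k / ELEMENTARY_CHARGE
    v_sb = -v_body  # a negative body voltage gives a positive Vsb
    delta_vth = threshold_voltage(v_sb) - threshold_voltage(0.0)
    return math.exp(-delta_vth / (n * thermal_voltage))


if __name__ == "__main__":
    for v_body in (0.0, -0.1, -0.2, -0.3):
        print(v_body, round(threshold_voltage(-v_body), 4), leakage_ratio(v_body))
```

    Because leakage depends exponentially on the threshold voltage, even a modest threshold increase from reverse body bias reduces the modeled leakage noticeably.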