40 research outputs found

    Reliability in the face of variability in nanometer embedded memories

    In this thesis, we have investigated the impact of parametric variations on the behaviour of one performance-critical processor structure: embedded memories. As variations manifest as a spread in power and performance, as a first step, we propose a novel modeling methodology that helps evaluate the impact of circuit-level optimizations on architecture-level design choices. Choices made at the design stage ensure that conflicting requirements from higher levels are decoupled. We then complement such design-time optimizations with a runtime mechanism that takes advantage of adaptive body-biasing to lower power whilst improving performance in the presence of variability. Our proposal uses novel, fully digital variation-tracking hardware based on embedded DRAM (eDRAM) cells to monitor run-time changes in cache latency and leakage. A special fine-grain body-bias generator uses the measurements to generate the optimal body bias needed to meet the required yield targets. A novel variation-tolerant and soft-error-hardened eDRAM cell is also proposed as an alternative candidate for replacing existing SRAM-based designs in latency-critical memory structures. In the ultra-low-power domain, where reliable operation is limited by the minimum voltage of operation (Vddmin), we analyse the impact of failures on cache functional margin and functional yield. Towards this end, we have developed a fully automated tool (INFORMER) capable of estimating memory-wide metrics such as power, performance and yield accurately and rapidly. Using the developed tool, we then evaluate the effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Having a holistic perspective of memory-wide metrics helps us arrive at design choices optimized simultaneously for the multiple metrics needed to maintain lifetime requirements.
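    The abstract does not detail INFORMER's internals, but the kind of memory-wide yield estimate such a tool reports can be illustrated with a small sketch. The failure model, the margin rule and all numbers below are illustrative assumptions, not taken from the thesis.

    import math

    def cell_fail_prob(vdd, sigma_vth=0.02, margin_frac=0.12):
        # Illustrative assumption: a bitcell fails when its Gaussian threshold-voltage
        # shift exceeds a read margin that shrinks proportionally with Vdd.
        margin = margin_frac * vdd
        # P(|dVth| > margin) for dVth ~ N(0, sigma_vth^2)
        return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(margin / (sigma_vth * math.sqrt(2.0)))))

    def array_yield(vdd, n_cells=1 << 20):
        # Functional yield of an n_cells array: probability that no cell fails,
        # assuming independent, identically distributed cells.
        return (1.0 - cell_fail_prob(vdd)) ** n_cells

    for vdd in (1.0, 0.9, 0.8, 0.7):
        print(f"Vdd = {vdd:.1f} V: p_fail = {cell_fail_prob(vdd):.2e}, "
              f"1 Mb yield = {array_yield(vdd):.4f}")

    Even with these toy numbers, lowering Vdd towards Vddmin makes the per-cell failure probability, and hence the array-level yield loss, grow rapidly, which is the regime the hybrid failure-prevention and correction techniques target.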

    Reliability and security in low power circuits and systems

    With the massive deployment of mobile devices in sensitive areas such as healthcare and defense, hardware reliability and security have become hot research topics in recent years. These topics, although different in definition, are usually correlated. This dissertation offers an in-depth treatment of enhancing the reliability and security of low power circuits and systems. The first part of the dissertation deals with the reliability of sub-threshold designs, which use a supply voltage lower than the threshold voltage (Vth) of transistors to reduce power. The exponential relationship between delay and Vth significantly jeopardizes their reliability due to process-variation-induced timing violations. In order to address this problem, this dissertation proposes a novel selective body biasing scheme. In the first work, the selective body biasing problem is formulated as a linearly constrained statistical optimization model, and the adaptive filtering concept is borrowed from the signal processing community to develop an efficient solution. However, since the adaptive filtering algorithm lacks theoretical justification and a guaranteed convergence rate, in the second work a new approach based on semi-infinite programming with incremental hypercubic sampling is proposed, which demonstrates better solution quality with shorter runtime. The second part deals with the security of low power crypto-processors, equipped with Random Dynamic Voltage Scaling (RDVS), in the presence of Correlation Power Analysis (CPA) attacks. This dissertation first demonstrates that the resistance of RDVS to CPA can be undermined by lowering the power supply voltage. Then, an alarm circuit is proposed to resist this attack. However, the alarm circuit can lead to denial of service due to noise-triggered false alarms. A non-zero-sum game model is then formulated and the Nash equilibria are analyzed --Abstract, page iii
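    The dissertation's attack setup is not described in this abstract; as background, the core computation of a Correlation Power Analysis attack is a Pearson correlation between measured power traces and a key-dependent power model. The sketch below is a generic, simplified illustration (identity S-box, Hamming-weight model, single targeted byte), not the dissertation's actual attack code.

    import numpy as np

    # A real attack would use the AES S-box here; an identity table keeps the
    # sketch self-contained while preserving the structure of the computation.
    SBOX = np.arange(256, dtype=np.uint8)

    def hamming_weight(x):
        return bin(int(x)).count("1")

    def cpa_best_guess(traces, plaintexts):
        # traces:     (n_traces, n_samples) array of measured power samples
        # plaintexts: (n_traces,) array with the targeted plaintext byte of each run
        best = (None, -1.0)
        for guess in range(256):
            # Hypothetical leakage for this key guess: Hamming weight of the S-box output
            model = np.array([hamming_weight(SBOX[p ^ guess]) for p in plaintexts])
            for t in range(traces.shape[1]):
                corr = abs(np.corrcoef(model, traces[:, t])[0, 1])
                if corr > best[1]:
                    best = (guess, corr)
        return best  # (most likely key byte, peak correlation)

    Countermeasures such as RDVS aim to destroy this correlation by randomizing the supply voltage, which is why an attack that can force the supply voltage low may weaken the protection.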

    FDSOI Design using Automated Standard-Cell-Grained Body Biasing

    With the introduction of FDSOI processes at competitive technology nodes, body biasing on an unprecedented scale was made possible. Body biasing influences one of the central transistor characteristics, the threshold voltage. By being able to raise or lower the threshold voltage by more than 100 mV, the very physics of transistor switching can be manipulated at run time. Furthermore, as body biasing does not lead to different signal levels, it can be applied at a much finer granularity than, e.g., DVFS. The state of the art has mainly focused on combinations of body biasing with DVFS and has thus ignored granularities that are infeasible for DVFS. This thesis fills this gap by proposing body bias domain partitioning techniques and, for the partitionings thereby generated, algorithms that search for body bias assignments. Several granularities, ranging from entire cores to small groups of standard cells, were examined using two principal approaches: designer-aided, pre-partitioning-based determination of body bias domains, and a novel, fully automated, netlist-based approach called domain candidate exploration. Both approaches operate along the lines of activation and timing of standard cell groups. These approaches were evaluated using the example of a Dynamically Reconfigurable Processor (DRP), a highly efficient category of reconfigurable architectures which consists of an array of processing elements and thus offers many opportunities for generalization towards many-core architectures. Finally, the proposed methods were validated by manufacturing a test chip. Extensive simulation runs as well as the test-chip evaluation showed the validity of the proposed methods and indicated substantial improvements in energy efficiency compared to the state of the art. These improvements were accomplished by the fine-grained partitioning of the DRP design, which allowed achieving higher clock frequencies at a given supply voltage using forward body biasing, while simultaneously reducing static power consumption in unused parts of the design.
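    The partitioning and assignment algorithms themselves are not given in this abstract; the sketch below only illustrates the general idea of grouping standard cells into body bias domains along activity and timing slack and then assigning a forward or reverse bias per domain. The grouping rule, thresholds and bias values are hypothetical, not the thesis's actual algorithms.

    from dataclasses import dataclass

    @dataclass
    class Cell:
        name: str
        slack_ns: float   # worst timing slack of paths through the cell
        activity: float   # fraction of cycles in which the cell toggles

    def partition_into_domains(cells, slack_cut=0.5, activity_cut=0.2):
        # Toy rule: timing-critical and busy cells form a 'fast' domain,
        # rarely active cells a 'low_leakage' domain, the rest stay neutral.
        domains = {"fast": [], "neutral": [], "low_leakage": []}
        for c in cells:
            if c.slack_ns < slack_cut and c.activity >= activity_cut:
                domains["fast"].append(c)
            elif c.activity < activity_cut:
                domains["low_leakage"].append(c)
            else:
                domains["neutral"].append(c)
        return domains

    def assign_body_bias(domains):
        # Toy assignment: forward body bias (FBB) speeds up the critical domain,
        # reverse body bias (RBB) cuts static power in the mostly idle one.
        bias_mv = {"fast": +300, "neutral": 0, "low_leakage": -300}
        return {name: bias_mv[name] for name in domains}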

    Study of variability in FDSOI technology: from the transistor to SRAM memory cells

    The scaling of bulk MOSFET transistors is facing various difficulties in the nanometer era. The variability of the electrical characteristics becomes a major challenge, and it increases as the device dimensions are scaled down. Fully-Depleted Silicon On Insulator (FDSOI) technology, developed as an alternative to bulk transistors, exhibits better electrostatic control, which enables higher performance. Moreover, thanks to its undoped channel, FDSOI technology strongly reduces Random Dopant Fluctuation and therefore offers excellent variability immunity. This leads to a yield enhancement and a reduction of the minimum supply voltage of SRAM circuits. In this thesis, the variability of this technology has been analyzed in depth, both for the threshold voltage (VT) and for the ON-state current (ISAT). The correlation between the electrical characteristics of MOSFET devices (i.e., the threshold voltage and its standard deviation σVT) and of SRAM cells (i.e., the Static Noise Margin SNM and σSNM) has been investigated through an extensive experimental study and modeling. A further purpose of this thesis is to analyze the variability source specific to FDSOI: silicon thickness fluctuations. An analytical model has been developed in order to quantify the impact of local TSi variations on the VT variability for the 28 and 20 nm technology nodes, as well as on a 200 Mb SRAM array. This model also makes it possible to derive specifications for the silicon thickness mean (µTsi) and standard deviation (σTsi) for the next technology nodes.
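    The analytical model itself is not reproduced in the abstract. As background, local VT mismatch is commonly described by a Pelgrom-type area scaling law, independent variability sources add in quadrature, and the array-level constraint follows from requiring all cells of a large array to stay functional. A minimal sketch of these standard relations (not the thesis's actual model, which specifically quantifies the TSi contribution):

    \sigma_{V_T} = \frac{A_{V_T}}{\sqrt{W\,L}}, \qquad
    \sigma_{V_T,\mathrm{total}}^{2} = \sigma_{V_T,\mathrm{RDF}}^{2} + \sigma_{V_T,T_{Si}}^{2} + \dots, \qquad
    Y_{\mathrm{array}} = \left(1 - p_{\mathrm{fail}}\right)^{N}

    Here p_fail is the failure probability of a single cell as a function of σVT, and N is the number of cells in the array (200 Mb in the thesis's case study), which explains why even tiny per-cell failure probabilities dominate the array yield.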

    Choose-Your-Own Adventure: A Lightweight, High-Performance Approach To Defect And Variation Mitigation In Reconfigurable Logic

    For field-programmable gate arrays (FPGAs), fine-grained pre-computed alternative configurations, combined with simple test-based selection, produce limited per-chip specialization to counter yield loss, increased delay, and increased energy costs that come from fabrication defects and variation. This lightweight approach achieves much of the benefit of knowledge-based full specialization while reducing to practical, palatable levels the computational, testing, and load-time costs that obstruct the application of the knowledge-based approach. In practice this may more than double the power-limited computational capabilities of dies fabricated with 22nm technologies. Contributions of this work:
    • Choose-Your-Own-Adventure (CYA), a novel, lightweight, scalable methodology to achieve defect and variation mitigation
    • Implementation of CYA, including preparatory components (generation of diverse alternative paths) and FPGA load-time components
    • Detailed performance characterization of CYA
      – Comparison to conventional loading and dynamic frequency and voltage scaling (DFVS)
      – Limit studies to characterize the quality of the CYA implementation and identify potential areas for further optimization
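    The load-time selection is simple by design; the sketch below illustrates one way such a loader could test each precomputed alternative configuration on the actual die and keep the best-performing one. The function names, the self-test and the selection metric are hypothetical, not the implementation described above.

    def choose_configuration(alternatives, program_fpga, run_self_test):
        # alternatives:   precomputed alternative bitstreams for the same design
        # program_fpga:   callable that loads one bitstream onto this die
        # run_self_test:  callable returning the maximum passing clock frequency
        #                 in MHz, or None if the configuration hits a defect
        best_cfg, best_fmax = None, 0.0
        for cfg in alternatives:
            program_fpga(cfg)
            fmax = run_self_test()
            if fmax is not None and fmax > best_fmax:
                best_cfg, best_fmax = cfg, fmax
        if best_cfg is None:
            raise RuntimeError("no alternative configuration passed the self-test")
        return best_cfg, best_fmax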

    Dependable Embedded Systems

    This Open Access book introduces readers to many new techniques for enhancing and optimizing reliability in embedded systems, which have emerged particularly within the last five years. This book introduces the most prominent reliability concerns from today’s point of view and roughly recapitulates the progress in the community so far. Unlike other books that focus on a single abstraction level such as the circuit level or the system level alone, the focus of this book is to deal with the different reliability challenges across different levels, starting from the physical level all the way to the system level (cross-layer approaches). The book aims at demonstrating how new hardware/software co-design solutions can be proposed to effectively mitigate reliability degradation such as transistor aging, processor variation, temperature effects, soft errors, etc. The book provides readers with the latest insights into novel, cross-layer methods and models with respect to the dependability of embedded systems; describes cross-layer approaches that can leverage reliability through techniques that are pro-actively designed with respect to techniques at other layers; and explains run-time adaptation and concepts/means of self-organization in order to achieve error resiliency in complex, future many-core systems.

    Low power architectures for streaming applications


    Gestión de jerarquías de memoria híbridas a nivel de sistema

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, and of KU Leuven, Arenberg Doctoral School, Faculty of Engineering Science, defended on 11/05/2017. In electronics and computer science, the term ‘memory’ generally refers to devices used to store the information we use in various appliances, ranging from PCs to hand-held devices, smart appliances, etc. Primary/main memory is used for storage systems that function at high speed (i.e., RAM). Primary memory is usually implemented as addressable semiconductor memory, i.e., integrated circuits consisting of silicon-based transistors, used for example as primary memory but also for other purposes in computers and other digital electronic devices. Secondary/auxiliary memory, in comparison, provides program and data storage that is slower to access but offers larger capacity. Examples include external hard drives, portable flash drives, CDs, and DVDs. These devices and media must be plugged in or inserted into a computer in order to be accessed by the system. Since secondary storage technology is not always connected to the computer, it is commonly used for backing up data; the term storage is often used to describe secondary memory. Secondary memory stores a large amount of data at a lower cost per byte than primary memory, making it about two orders of magnitude less expensive than primary storage. There are two main types of semiconductor memory: volatile and non-volatile. Examples of non-volatile memory are Flash memory (sometimes used as secondary, sometimes as primary computer memory) and ROM/PROM/EPROM/EEPROM memory (used for firmware such as boot programs). Examples of volatile memory are primary memory (typically dynamic RAM, DRAM) and fast CPU cache memory (typically static RAM, SRAM, which is fast but energy-consuming and offers lower memory capacity per area unit than DRAM). Non-volatile memory technologies in Si-based electronics date back to the 1990s. Flash memory is widely used in consumer electronic products such as cellphones and music players, and NAND Flash-based solid-state disks (SSDs) are increasingly displacing hard disk drives as the primary storage device in laptops, desktops, and even data centers. The integration limit of Flash memories is approaching, and many new types of memory to replace conventional Flash memories have been proposed. The rapid increase of leakage currents in silicon CMOS transistors with scaling poses a big challenge for the integration of SRAM memories, which are also increasingly susceptible to read/write failures in low-power schemes. As a result, over the past decade there has been an extensive pooling of time, resources and effort towards developing emerging memory technologies like Resistive RAM (ReRAM/RRAM), STT-MRAM, Domain Wall Memory and Phase Change Memory (PRAM). Emerging non-volatile memory technologies promise to store more data at less cost than the expensive-to-build silicon chips used by popular consumer gadgets, including digital cameras, cell phones and portable music players. These new memory technologies combine the speed of static random-access memory (SRAM), the density of dynamic random-access memory (DRAM), and the non-volatility of Flash memory, and so become very attractive candidates for future memory hierarchies.
    The research and information on these Non-Volatile Memory (NVM) technologies has matured over the last decade, and NVMs are now being explored thoroughly as viable replacements for conventional SRAM-based memories, even for the higher levels of the memory hierarchy. Many other new classes of emerging memory technologies, such as transparent and plastic, three-dimensional (3-D), and quantum dot memory technologies, have also gained tremendous popularity in recent years. Ferroelectric field-effect transistors (FeFETs) are considered one of the most promising alternatives for replacing both Flash (thanks to their higher density) and DRAM (thanks to their higher speed), but they are still at a very early stage of development; other, somewhat more mature technologies exist in the field of resistive RAM, most notably ReRAM (or RRAM), STT-RAM, Domain Wall Memory and Phase Change Memory (PRAM)...
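    As a back-of-the-envelope illustration of the trade-offs such a hybrid hierarchy has to manage, the average memory access time (AMAT) of a single cache level backed by a slower memory can be computed as below; the latency and hit-rate numbers are illustrative assumptions, not measurements from the thesis.

    def amat(hit_rate, hit_time_ns, miss_penalty_ns):
        # Average memory access time: AMAT = hit_time + miss_rate * miss_penalty
        return hit_time_ns + (1.0 - hit_rate) * miss_penalty_ns

    # SRAM cache in front of DRAM main memory (illustrative numbers)
    print(amat(hit_rate=0.95, hit_time_ns=1.0, miss_penalty_ns=60.0))   # 4.0 ns
    # A denser but slower NVM cache that holds more data, hence a higher hit rate
    print(amat(hit_rate=0.98, hit_time_ns=3.0, miss_penalty_ns=60.0))   # 4.2 ns

    The comparison shows why the choice of technology at each level is a system-level decision: a slower but denser level can still be competitive once its higher hit rate is taken into account.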

    Collective analog bioelectronic computation

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 677-710). In this thesis, I present two examples of fast-and-highly-parallel analog computation inspired by architectures in biology. The first example, an RF cochlea, maps the partial differential equations that describe fluid-membrane-hair-cell wave propagation in the biological cochlea to an equivalent inductor-capacitor-transistor integrated circuit. It allows ultra-broadband spectrum analysis of RF signals to be performed in a rapid low-power fashion, thus enabling applications for universal or software radio. The second example exploits detailed similarities between the equations that describe chemical-reaction dynamics and the equations that describe subthreshold current flow in transistors to create fast-and-highly-parallel integrated-circuit models of protein-protein and gene-protein networks inside a cell. Due to a natural mapping between the Poisson statistics of molecular flows in a chemical reaction and the Poisson statistics of electronic current flow in a transistor, stochastic effects are automatically incorporated into the circuit architecture, allowing highly computationally intensive stochastic simulations of large-scale biochemical reaction networks to be performed rapidly. I show that the exponentially tapered transmission-line architecture of the mammalian cochlea performs constant-fractional-bandwidth spectrum analysis with O(N) expenditure of both analysis time and hardware, where N is the number of analyzed frequency bins. This is the best known performance of any spectrum-analysis architecture, including the constant-resolution Fast Fourier Transform (FFT), which scales as O(N log N), or a constant-fractional-bandwidth filterbank, which scales as O(N²). The RF cochlea uses this bio-inspired architecture to perform real-time, on-chip spectrum analysis at radio frequencies. I demonstrate two cochlea chips, implemented in standard 0.13 µm CMOS technology, that decompose the RF spectrum from 600 MHz to 8 GHz into 50 log-spaced channels, consume < 300 mW of power, and possess 70 dB of dynamic range. The real-time spectrum analysis capabilities of my chips make them uniquely suitable for ultra-broadband universal or software radio receivers of the future. I show that the protein-protein and gene-protein chips that I have built are particularly suitable for simulation, parameter discovery and sensitivity analysis of interaction networks in cell biology, such as signaling, metabolic, and gene regulation pathways. Importantly, the chips carry out massively parallel computations, resulting in simulation times that are independent of model complexity, i.e., O(1). They also automatically model stochastic effects, which are of importance in many biological systems, but are numerically stiff and simulate slowly on digital computers. Currently, non-fundamental data-acquisition limitations mean that my proof-of-concept chips simulate small-scale biochemical reaction networks at least 100 times faster than modern desktop machines. It should be possible to get 10³ to 10⁶ simulation speedups of genome-scale and organ-scale intracellular and extracellular biochemical reaction networks with improved versions of my chips.
    Such chips could be important both as analysis tools in systems biology and design tools in synthetic biology. by Soumyajit Mandal. Ph.D.
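    The analogy the thesis exploits rests on the formal similarity between subthreshold MOS currents and mass-action kinetics; a minimal sketch of that similarity, using generic textbook relations rather than the thesis's specific circuit equations:

    I_{DS} \approx I_0 \, e^{V_{GS}/(n\,U_T)} \qquad \text{(weak-inversion drain current)}

    \frac{d[C]}{dt} = k\,[A]\,[B] \qquad \text{(mass-action rate for } A + B \rightarrow C\text{)}

    [X] \propto I_X \;\Rightarrow\; \ln I_A + \ln I_B \;\propto\; V_{GS,A} + V_{GS,B}

    Representing concentrations by currents turns the product [A][B] into a sum of gate voltages, so a handful of subthreshold transistors can integrate the rate law directly in analog hardware, and the shot noise of those small currents plays the role of the Poisson fluctuations of molecule counts mentioned in the abstract.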