11 research outputs found

    Low-Power Design of Digital VLSI Circuits around the Point of First Failure

    As an increase of intelligent and self-powered devices is forecasted for our future everyday life, the implementation of energy-autonomous devices that can wirelessly communicate data from sensors is crucial. Even though techniques such as voltage scaling proved to effectively reduce the energy consumption of digital circuits, additional energy savings are still required for a longer battery life. One of the main limitations of essentially any low-energy technique is the potential degradation of the quality of service (QoS). Thus, a thorough understanding of how circuits behave when operated around the point of first failure (PoFF) is key for the effective application of conventional energy-efficient methods as well as for the development of future low-energy techniques. In this thesis, a variety of circuits, techniques, and tools is described to reduce the energy consumption in digital systems when operated either in the safe and conservative exact region, close to the PoFF, or even inside the inexact region. A straightforward approach to reduce the power consumed by clock distribution while safely operating in the exact region is dual-edge-triggered (DET) clocking. However, the DET approach is rarely taken, primarily due to the perceived complexity of its integration. In this thesis, a fully automated design flow is introduced for applying DET clocking to a conventional single-edge-triggered (SET) design. In addition, the first static true-single-phase-clock DET flip-flop (DET-FF) that completely avoids clock-overlap hazards of DET registers is proposed. Even though the correct timing of synchronous circuits is ensured in worst-case conditions, the critical path might not always be excited. Thus, dynamic clock adjustment (DCA) has been proposed to trim any available dynamic timing margin by changing the operating clock frequency at runtime. 
This thesis describes a dynamically-adjustable clock generator (DCG) capable of modifying the period of the produced clock signal on a cycle-by-cycle basis that enables the DCA technique. In addition, a timing-monitoring sequential (TMS) that detects input transitions on either one of the clock phases to enable the selection of the best timing-monitoring strategy at runtime is proposed. Energy-quality scaling techniques aim at trading lower energy consumption for a small degradation in the QoS whenever approximations can be tolerated. In this thesis, a low-power methodology for the perturbation of baseline coefficients in reconfigurable finite impulse response (FIR) filters is proposed. The baseline coefficients are optimized to reduce the switching activity of the multipliers in the FIR filter, enabling the possibility of scaling the power consumption of the filter at runtime. The area as well as the leakage power of many systems-on-chip is often dominated by embedded memories. Gain-cell embedded DRAM (GC-eDRAM) is a compact, low-power and CMOS-compatible alternative to the conventional static random-access memory (SRAM) when a higher memory density is desired. However, due to GC-eDRAMs relying on many interdependent variables, the adaptation of existing memories and the design of future GC-eDRAMs prove to be highly complex tasks. Thus, the first modeling tool that estimates timing, memory availability, bandwidth, and area of GC-eDRAMs for a fast exploration of their design space is proposed in this thesis.

    Design and Implementation of Low Power SRAM Using Highly Effective Lever Shifters

    The explosive growth of battery-operated devices has made low-power design a priority in recent years. In high-performance Systems-on-Chip, leakage power consumption has become comparable to the dynamic component, and its relevance increases as technology scales. These trends are even more evident for SRAM memory devices since they are a dominant source of standby power consumption in low-power application processors. The on-die SRAM power consumption is particularly important for increasingly pervasive mobile and handheld applications where battery life is a key design and technology attribute. In the SRAM-memory design, SRAM cells also comprise the most significant portion of the total chip. Moreover, the increasing number of transistors in the SRAM memories and the MOSs' increasing leakage current in the scaled technologies have turned the SRAM unit into a power-hungry block for both dynamic and static viewpoints. Although the scaling of the supply voltage enables low-power consumption, the SRAM cells' data stability becomes a major concern. Thus, the reduction of SRAM leakage power has become a critical research concern. To address the leakage power consumption in high-performance cache memories, a stream of novel integrated circuit and architectural level techniques are proposed by researchers including leakage-current management techniques, cell array leakage reduction techniques, bitline leakage reduction techniques, and leakage current compensation techniques. The main goal of this work was to improve the cell array leakage reduction techniques in order to minimize the leakage power for SRAM memory design in low-power applications. This study performs the body biasing application to reduce leakage current as well. To adjust the NMOSs' threshold voltage and consequently leakage current, a negative DC voltage could be applied to their body terminal as a second gate. 
To generate this negative DC voltage, this study proposes a negative voltage reference that includes a trimming circuit and a negative level shifter. These enhancements are applied to a 10kb SRAM memory operating at 0.3V in a 65nm CMOS process.

    Single event upset hardened embedded domain specific reconfigurable architecture


    Towards trustworthy computing on untrustworthy hardware

    Historically, hardware was thought to be inherently secure and trusted due to its obscurity and the isolated nature of its design and manufacturing. In the last two decades, however, hardware trust and security have emerged as pressing issues. Modern day hardware is surrounded by threats manifested mainly in undesired modifications by untrusted parties in its supply chain, unauthorized and pirated selling, injected faults, and system and microarchitectural level attacks. These threats, if realized, are expected to push hardware to abnormal and unexpected behaviour causing real-life damage and significantly undermining our trust in the electronic and computing systems we use in our daily lives and in safety critical applications. A large number of detective and preventive countermeasures have been proposed in literature. It is a fact, however, that our knowledge of potential consequences to real-life threats to hardware trust is lacking given the limited number of real-life reports and the plethora of ways in which hardware trust could be undermined. With this in mind, run-time monitoring of hardware combined with active mitigation of attacks, referred to as trustworthy computing on untrustworthy hardware, is proposed as the last line of defence. This last line of defence allows us to face the issue of live hardware mistrust rather than turning a blind eye to it or being helpless once it occurs. This thesis proposes three different frameworks towards trustworthy computing on untrustworthy hardware. The presented frameworks are adaptable to different applications, independent of the design of the monitored elements, based on autonomous security elements, and are computationally lightweight. The first framework is concerned with explicit violations and breaches of trust at run-time, with an untrustworthy on-chip communication interconnect presented as a potential offender. 
The framework is based on the guiding principles of component guarding, data tagging, and event verification. The second framework targets hardware elements with inherently variable and unpredictable operational latency and proposes a machine-learning based characterization of these latencies to infer undesired latency extensions or denial of service attacks. The framework is implemented on a DDR3 DRAM after showing its vulnerability to obscured latency extension attacks. The third framework studies the possibility of the deployment of untrustworthy hardware elements in the analog front end, and the consequent integrity issues that might arise at the analog-digital boundary of systems-on-chip. The framework uses machine learning methods and the unique temporal and arithmetic features of signals at this boundary to monitor their integrity and assess their trust level.

    CIRCUITS AND ARCHITECTURE FOR BIO-INSPIRED AI ACCELERATORS

    Technological advances in microelectronics envisioned through Moore’s law have led to powerful processors that can handle complex and computationally intensive tasks. Nonetheless, these advancements through technology scaling have come at an unfavorable cost of significantly larger power consumption, which has posed challenges for data processing centers and computers at scale. Moreover, with the emergence of mobile computing platforms constrained by power and bandwidth for distributed computing, the necessity for more energy-efficient scalable local processing has become more significant. Unconventional Compute-in-Memory architectures such as the analog winner-takes-all associative-memory and the Charge-Injection Device processor have been proposed as alternatives. Unconventional charge-based computation has been employed for neural network accelerators in the past, where impressive energy efficiency per operation has been attained in 1-bit vector-vector multiplications, and in recent work, multi-bit vector-vector multiplications. In the latter, computation was carried out by counting quanta of charge at the thermal noise limit, using packets of about 1000 electrons. These systems are neither analog nor digital in the traditional sense but employ mixed-signal circuits to count the packets of charge and hence we call them Quasi-Digital. By amortizing the energy costs of the mixed-signal encoding/decoding over compute-vectors with many elements, high energy efficiencies can be achieved. In this dissertation, I present a design framework for AI accelerators using scalable compute-in-memory architectures. On the device level, two primitive elements are designed and characterized as target computational technologies: (i) a multilevel non-volatile cell and (ii) a pseudo Dynamic Random-Access Memory (pseudo-DRAM) bit-cell. 
At the level of circuit description, compute-in-memory crossbars and mixed-signal circuits were designed, allowing seamless connectivity to digital controllers. At the level of data representation, both binary and stochastic-unary coding are used to compute Vector-Vector Multiplications (VMMs) at the array level. Finally, on the architectural level, two AI accelerators for data-center processing and edge computing are discussed. Both designs are scalable multi-core Systems-on-Chip (SoCs), where vector-processor arrays are tiled on a 2-layer Network-on-Chip (NoC), enabling neighbor communication and flexible compute vs. memory trade-off. General purpose Arm/RISCV co-processors provide adequate bootstrapping and system-housekeeping, and a high-speed interface fabric facilitates Input/Output to main memory.

    High-Performance Energy-Efficient and Reliable Design of Spin-Transfer Torque Magnetic Memory

    In this dissertation, new computing paradigms, architectures and design philosophy are proposed and evaluated for adopting the STT-MRAM technology as highly reliable, energy efficient and fast memory. For this purpose, a novel cross-layer framework from the cell-level all the way up to the system- and application-level has been developed. In this framework, the reliability issues are modeled accurately with appropriate fault models at different abstraction levels in order to analyze the overall failure rates of the entire memory and its Mean Time To Failure (MTTF), while also considering the temperature and process variation effects. Design-time, compile-time and run-time solutions have been provided to address the challenges associated with STT-MRAM. The effectiveness of the proposed solutions is demonstrated in extensive experiments that show significant improvements in comparison to state-of-the-art solutions, i.e. lower-power, higher-performance and more reliable STT-MRAM design.

    Network-on-Chip

    Addresses the Challenges Associated with System-on-Chip Integration. Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. 
This text comprises 12 chapters and covers: the evolution of NoC from SoC, with its research and developmental challenges; NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces; the router design strategies followed in NoCs; the evaluation mechanism of NoC architectures; the application mapping strategies followed in NoCs; low-power design techniques specifically followed in NoCs; the signal integrity and reliability issues of NoC; the details of NoC testing strategies reported so far; the problem of synthesizing application-specific NoCs; reconfigurable NoC design issues; and the direction of future research and development in the field of NoC. Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, researchers, and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems.

    Low energy digital circuit design using sub-threshold operation

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2006. Includes bibliographical references (p. 189-202). Scaling of process technologies to deep sub-micron dimensions has made power management a significant concern for circuit designers. For emerging low power applications such as distributed micro-sensor networks or medical applications, low energy operation is the primary concern instead of speed, with the eventual goal of harvesting energy from the environment. Sub-threshold operation offers a promising solution for ultra-low-energy applications because it often achieves the minimum energy per operation. While initial explorations into sub-threshold circuits demonstrate its promise, sub-threshold circuit design remains in its infancy. This thesis makes several contributions that make sub-threshold design more accessible to circuit designers. First, a model for energy consumption in sub-threshold provides an analytical solution for the optimum VDD to minimize energy. Fitting this model to a generic circuit allows easy estimation of the impact of processing and environmental parameters on the minimum energy point. Second, analysis of device sizing for sub-threshold circuits shows the trade-offs between sizing for minimum energy and for minimum voltage operation. A programmable FIR filter test chip fabricated in 0.18µm bulk CMOS provides measurements to confirm the model and the sizing analysis. Third, a low-overhead method for integrating sub-threshold operation with high performance applications extends dynamic voltage scaling across orders of magnitude of frequency and provides energy scalability down to the minimum energy point. A 90nm bulk CMOS test chip confirms the range of operation for ultra-dynamic voltage scaling. Finally, sub-threshold operation is extended to memories. 
Analysis of traditional SRAM bitcells and architectures leads to development of a new bitcell for robust sub-threshold SRAM operation. The sub-threshold SRAM is analyzed experimentally in a 65nm bulk CMOS test chip. By Benton H. Calhoun. Ph.D.

    Architektur- und Leistungsanalyse eines Mehgenerationen-SDRAM-Controllers für gemischte Kritikalitätssysteme

    Due to their high density and low cost, DDR SDRAMs are the prevailing choice for implementing the main memory of a computer system. Nevertheless, these benefits come at the cost of a complex two-stage access protocol, which ultimately means that the time required to serve a memory request depends on the history of previous requests. In other words, DDR SDRAMs are a stateful resource, which further complicates mixed-criticality implementations, because different criticality levels have conflicting requirements. The main goal of this dissertation is to design a controller that leverages the state of DDR SDRAMs in a mixed-criticality environment. More specifically, the controller should provide good average performance for best-effort requestors without compromising timing guarantees for critical requestors. In that regard, this dissertation first identifies two challenges of growing relevance for the design of memory controllers for the mixed-criticality domain. The first challenge is the data bus turnaround time. The second challenge is the rank-to-rank switching time, which only affects multi-rank modules. After pinpointing these two challenges, this dissertation proposes an SDRAM controller to tackle them. The proposed controller bundles read and write operations in their corresponding ranks, thus minimizing the number of data bus turnarounds and rank switching events. As a consequence, the average performance of the controller is improved. However, the bundling is carefully designed so that real-time guarantees for critical requestors can be extracted. Moreover, both the operation of the controller and the corresponding analysis of the temporal properties are described in terms of a generation-independent notation. This is a desirable feature because different SDRAM generations have different architectural features and, possibly, different timing constraints. Finally, an extensive comparison with the related work is performed. Furthermore, trends in worst-case latency over DDR SDRAM from different speed bins and generations are presented and thoroughly discussed.