101 research outputs found

    Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency

    Get PDF
    abstract: Static CMOS logic has remained the dominant design style of digital systems for more than four decades due to its robustness and near zero standby current. Static CMOS logic circuits consist of a network of combinational logic cells and clocked sequential elements, such as latches and flip-flops that are used for sequencing computations over time. The majority of the digital design techniques to reduce power, area, and leakage over the past four decades have focused almost entirely on optimizing the combinational logic. This work explores alternate architectures for the flip-flops for improving the overall circuit performance, power and area. It consists of three main sections. First, is the design of a multi-input configurable flip-flop structure with embedded logic. A conventional D-type flip-flop may be viewed as realizing an identity function, in which the output is simply the value of the input sampled at the clock edge. In contrast, the proposed multi-input flip-flop, named PNAND, can be configured to realize one of a family of Boolean functions called threshold functions. In essence, the PNAND is a circuit implementation of the well-known binary perceptron. Unlike other reconfigurable circuits, a PNAND can be configured by simply changing the assignment of signals to its inputs. Using a standard cell library of such gates, a technology mapping algorithm can be applied to transform a given netlist into one with an optimal mixture of conventional logic gates and threshold gates. This approach was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier in 65nm LP technology. Simulation and chip measurements show more than 30% improvement in dynamic power and more than 20% reduction in core area. The functional yield of the PNAND reduces with geometry and voltage scaling. The second part of this research investigates the use of two mechanisms to improve the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM devices for low voltage operation. The third part of this research focused on the design of flip-flops with non-volatile storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated with both conventional D-flipflop and the PNAND circuits to implement non-volatile logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of system locally when a power interruption occurs. However, manufacturing variations in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading to an overly pessimistic design and consequently, higher energy consumption. A detailed analysis of the design trade-offs in the driver circuitry for performing backup and restore, and a novel method to design the energy optimal driver for a given yield is presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented, in which the backup time is determined on a per-chip basis, resulting in minimizing the energy wastage and satisfying the yield constraint. To achieve a yield of 98%, the conventional approach would have to expend nearly 5X more energy than the minimum required, whereas the proposed tunable approach expends only 26% more energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are designed with the same backup and restore circuitry in 65nm technology. The embedded logic in NV-TLFF compensates performance overhead of NVL. This leads to the possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and- accumulate (MAC) unit is designed to demonstrate the performance benefits of the proposed architecture. Based on the results of HSPICE simulations, the MAC circuit with the proposed NV-TLFF cells is shown to consume at least 20% less power and area as compared to the circuit designed with conventional DFFs, without sacrificing any performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    BOOLEAN AND BRAIN-INSPIRED COMPUTING USING SPIN-TRANSFER TORQUE DEVICES

    Get PDF
    Several completely new approaches (such as spintronic, carbon nanotube, graphene, TFETs, etc.) to information processing and data storage technologies are emerging to address the time frame beyond current Complementary Metal-Oxide-Semiconductor (CMOS) roadmap. The high speed magnetization switching of a nano-magnet due to current induced spin-transfer torque (STT) have been demonstrated in recent experiments. Such STT devices can be explored in compact, low power memory and logic design. In order to truly leverage STT devices based computing, researchers require a re-think of circuit, architecture, and computing model, since the STT devices are unlikely to be drop-in replacements for CMOS. The potential of STT devices based computing will be best realized by considering new computing models that are inherently suited to the characteristics of STT devices, and new applications that are enabled by their unique capabilities, thereby attaining performance that CMOS cannot achieve. The goal of this research is to conduct synergistic exploration in architecture, circuit and device levels for Boolean and brain-inspired computing using nanoscale STT devices. Specifically, we first show that the non-volatile STT devices can be used in designing configurable Boolean logic blocks. We propose a spin-memristor threshold logic (SMTL) gate design, where memristive cross-bar array is used to perform current mode summation of binary inputs and the low power current mode spintronic threshold device carries out the energy efficient threshold operation. Next, for brain-inspired computing, we have exploited different spin-transfer torque device structures that can implement the hard-limiting and soft-limiting artificial neuron transfer functions respectively. We apply such STT based neuron (or ‘spin-neuron’) in various neural network architectures, such as hierarchical temporal memory and feed-forward neural network, for performing “human-like” cognitive computing, which show more than two orders of lower energy consumption compared to state of the art CMOS implementation. Finally, we show the dynamics of injection locked Spin Hall Effect Spin-Torque Oscillator (SHE-STO) cluster can be exploited as a robust multi-dimensional distance metric for associative computing, image/ video analysis, etc. Our simulation results show that the proposed system architecture with injection locked SHE-STOs and the associated CMOS interface circuits can be suitable for robust and energy efficient associative computing and pattern matching

    Joint Transceiver Design Algorithms for Multiuser MISO Relay Systems with Energy Harvesting

    Full text link
    In this paper, we investigate a multiuser relay system with simultaneous wireless information and power transfer. Assuming that both base station (BS) and relay station (RS) are equipped with multiple antennas, this work studies the joint transceiver design problem for the BS beamforming vectors, the RS amplify-and-forward transformation matrix and the power splitting (PS) ratios at the single-antenna receivers. Firstly, an iterative algorithm based on alternating optimization (AO) and with guaranteed convergence is proposed to successively optimize the transceiver coefficients. Secondly, a novel design scheme based on switched relaying (SR) is proposed that can significantly reduce the computational complexity and overhead of the AO based designs while maintaining a similar performance. In the proposed SR scheme, the RS is equipped with a codebook of permutation matrices. For each permutation matrix, a latent transceiver is designed which consists of BS beamforming vectors, optimally scaled RS permutation matrix and receiver PS ratios. For the given CSI, the optimal transceiver with the lowest total power consumption is selected for transmission. We propose a concave-convex procedure based and subgradient-type iterative algorithms for the non-robust and robust latent transceiver designs. Simulation results are presented to validate the effectiveness of all the proposed algorithms

    Energy-Efficient Digital Circuit Design using Threshold Logic Gates

    Get PDF
    abstract: Improving energy efficiency has always been the prime objective of the custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, there have been no new automated design techniques, that can provide considerable improvements in circuit power, leakage and area. Although emerging nano-devices are expected to replace the existing MOSFET devices, they are far from being as mature as semiconductor devices and their full potential and promises are many years away from being practical. The research described in this dissertation consists of four main parts. First is a new circuit architecture of a differential threshold logic flipflop called PNAND. The PNAND gate is an edge-triggered multi-input sequential cell whose next state function is a threshold function of its inputs. Second a new approach, called hybridization, that replaces flipflops and parts of their logic cones with PNAND cells is described. The resulting \hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly less power consumption, smaller area, less standby power and less power variation. Third, a new architecture of a field programmable array, called field programmable threshold logic array (FPTLA), in which the standard lookup table (LUT) is replaced by a PNAND is described. The FPTLA is shown to have as much as 50% lower energy-delay product compared to conventional FPGA using well known FPGA modeling tool called VPR. Fourth, a novel clock skewing technique that makes use of the completion detection feature of the differential mode flipflops is described. This clock skewing method improves the area and power of the ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold time violation on given short paths. Several circuit design methodologies such as retiming and asynchronous circuit design can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits.Dissertation/ThesisDoctoral Dissertation Computer Science 201

    Dense implementations of binary cellular nonlinear networks : from CMOS to nanotechnology

    Get PDF
    This thesis deals with the design and hardware realization of the cellular neural/nonlinear network (CNN)-type processors operating on data in the form of black and white (B/W) images. The ultimate goal is to achieve a very compact yet versatile cell structure that would allow for building a network with a very large spatial resolution. It is very important to be able to implement an array with a great number of cells on a single die. Not only it improves the computational power of the processor, but it might be the enabling factor for new applications as well. Larger resolution can be achieved in two ways. First, the cell functionality and operating principles can be tailored to improve the layout compactness. The other option is to use more advanced fabrication technology – either a newer, further downscaled CMOS process or one of the emerging nanotechnologies. It can be beneficial to realize an array processor as two separate parts – one dedicated for gray-scale and the other for B/W image processing, as their designs can be optimized. For instance, an implementation of a CNN dedicated for B/W image processing can be significantly simplified. When working with binary images only, all coefficients in the template matrix can also be reduced to binary values. In this thesis, such a binary programming scheme is presented as a means to reduce the cell size as well as to provide the circuits composed of emerging nanodevices with an efficient programmability. Digital programming can be very fast and robust, and leads to very compact coefficient circuits. A test structure of a binary-programmable CNN has been designed and implemented with standard 0.18 ”m CMOS technology. A single cell occupies only 155 ”m2, which corresponds to a cell density of 6451 cells per square millimeter. A variety of templates have been tested and the measured chip performance is discussed. Since the minimum feature size of modern CMOS devices has already entered the nanometer scale, and the limitations of further scaling are projected to be reached within the next decade or so, more and more interest and research activity is attracted by nanotechnology. Investigation of the quantum physics phenomena and development of new devices and circuit concepts, which would allow to overcome the CMOS limitations, is becoming an increasingly important science. A single-electron tunneling (SET) transistor is one of the most attractive nanodevices. While relying on the Coulomb interactions, these devices can be connected directly with a wire or through a coupling capacitance. To develop suitable structures for implementing the binary programming scheme with capacitive couplings, the CNN cell based on the floating gate MOSFET (FG-MOSFET) has been designed. This approach can be considered as a step towards a programmable cell implementation with nanodevices. Capacitively coupled CNN has been simulated and the presented results confirm the proper operation. Therefore, the same circuit strategies have also been applied to the CNN cell designed for SET technology. The cell has been simulated to work well with the binary programming scheme applied. This versatile structure can be implemented either as a pure SET design or as a SET-FET hybrid. In addition to the designs mentioned above, a number of promising nanodevices and emerging circuit architectures are introduced.reviewe

    Proceedings of Monterey Workshop 2001 Engineering Automation for Sofware Intensive System Integration

    Get PDF
    The 2001 Monterey Workshop on Engineering Automation for Software Intensive System Integration was sponsored by the Office of Naval Research, Air Force Office of Scientific Research, Army Research Office and the Defense Advance Research Projects Agency. It is our pleasure to thank the workshop advisory and sponsors for their vision of a principled engineering solution for software and for their many-year tireless effort in supporting a series of workshops to bring everyone together.This workshop is the 8 in a series of International workshops. The workshop was held in Monterey Beach Hotel, Monterey, California during June 18-22, 2001. The general theme of the workshop has been to present and discuss research works that aims at increasing the practical impact of formal methods for software and systems engineering. The particular focus of this workshop was "Engineering Automation for Software Intensive System Integration". Previous workshops have been focused on issues including, "Real-time & Concurrent Systems", "Software Merging and Slicing", "Software Evolution", "Software Architecture", "Requirements Targeting Software" and "Modeling Software System Structures in a fastly moving scenario".Office of Naval ResearchAir Force Office of Scientific Research Army Research OfficeDefense Advanced Research Projects AgencyApproved for public release, distribution unlimite

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Long Duration Exposure Facility (LDEF). Mission 1 Experiments

    Get PDF
    Spaceborne experiments using the space shuttle payload known as the Long Duration Exposure Facility are described. Experiments in the fields of materials, coatings, thermal systems, power and propulsion, electronic, and optics are discussed

    Simulation of Clinical PET Studies for the Assessment of Quantification Methods

    Get PDF
    On this PhD thesis we developed a methodology for evaluating the robustness of SUV measurements based on MC simulations and the generation of novel databases of simulated studies based on digital anthropomorphic phantoms. This methodology has been applied to different problems related to quantification that were not previously addressed. Two methods for estimating the extravasated dose were proposed andvalidated in different scenarios using MC simulations. We studied the impact of noise and low counting in the accuracy and repeatability of three commonly used SUV metrics (SUVmax, SUVmean and SUV50). The same model was used to study the effect of physiological muscular uptake variations on the quantification of FDG-PET studies. Finally, our MC models were applied to simulate 18F-fluorocholine (FCH) studies. The aim was to study the effect of spill-in counts from neighbouring regions on the quantification of small regions close to high activity extended sources
    • 

    corecore