412 research outputs found

    Designing energy-efficient sub-threshold logic circuits using equalization and non-volatile memory circuits using memristors

    Full text link
    The very large scale integration (VLSI) community has utilized aggressive complementary metal-oxide semiconductor (CMOS) technology scaling to meet the ever-increasing performance requirements of computing systems. However, as we enter the nanoscale regime, the prevalent process variation effects degrade the CMOS device reliability. Hence, it is increasingly essential to explore emerging technologies which are compatible with the conventional CMOS process for designing highly-dense memory/logic circuits. Memristor technology is being explored as a potential candidate in designing non-volatile memory arrays and logic circuits with high density, low latency and small energy consumption. In this thesis, we present the detailed functionality of multi-bit 1-Transistor 1-memRistor (1T1R) cell-based memory arrays. We present the performance and energy models for an individual 1T1R memory cell and the memory array as a whole. We have considered TiO2- and HfOx-based memristors, and for these technologies there is a sub-10% difference between energy and performance computed using our models and HSPICE simulations. Using a performance-driven design approach, the energy-optimized TiO2-based RRAM array consumes the least write energy (4.06 pJ/bit) and read energy (188 fJ/bit) when storing 3 bits/cell for 100 nsec write and 1 nsec read access times. Similarly, HfOx-based RRAM array consumes the least write energy (365 fJ/bit) and read energy (173 fJ/bit) when storing 3 bits/cell for 1 nsec write and 200 nsec read access times. On the logic side, we investigate the use of equalization techniques to improve the energy efficiency of digital sequential logic circuits in sub-threshold regime. We first propose the use of a variable threshold feedback equalizer circuit with combinational logic blocks to mitigate the timing errors in digital logic designed in sub-threshold regime. This mitigation of timing errors can be leveraged to reduce the dominant leakage energy by scaling supply voltage or decreasing the propagation delay. At the fixed supply voltage, we can decrease the propagation delay of the critical path in a combinational logic block using equalizer circuits and, correspondingly decrease the leakage energy consumption. For a 8-bit carry lookahead adder designed in UMC 130 nm process, the operating frequency can be increased by 22.87% (on average), while reducing the leakage energy by 22.6% (on average) in the sub-threshold regime. Overall, the feedback equalization technique provides up to 35.4% lower energy-delay product compared to the conventional non-equalized logic. We also propose a tunable adaptive feedback equalizer circuit that can be used with sequential digital logic to mitigate the process variation effects and reduce the dominant leakage energy component in sub-threshold digital logic circuits. For a 64-bit adder designed in 130 nm our proposed approach can reduce the normalized delay variation of the critical path delay from 16.1% to 11.4% while reducing the energy-delay product by 25.83% at minimum energy supply voltage. In addition, we present detailed energy-performance models of the adaptive feedback equalizer circuit. This work serves as a foundation for the design of robust, energy-efficient digital logic circuits in sub-threshold regime

    Approximate logic circuits: Theory and applications

    Get PDF
    CMOS technology scaling, the process of shrinking transistor dimensions based on Moore's law, has been the thrust behind increasingly powerful integrated circuits for over half a century. As dimensions are scaled to few tens of nanometers, process and environmental variations can significantly alter transistor characteristics, thus degrading reliability and reducing performance gains in CMOS designs with technology scaling. Although design solutions proposed in recent years to improve reliability of CMOS designs are power-efficient, the performance penalty associated with these solutions further reduces performance gains with technology scaling, and hence these solutions are not well-suited for high-performance designs. This thesis proposes approximate logic circuits as a new logic synthesis paradigm for reliable, high-performance computing systems. Given a specification, an approximate logic circuit is functionally equivalent to the given specification for a "significant" portion of the input space, but has a smaller delay and power as compared to a circuit implementation of the original specification. This contributions of this thesis include (i) a general theory of approximation and efficient algorithms for automated synthesis of approximations for unrestricted random logic circuits, (ii) logic design solutions based on approximate circuits to improve reliability of designs with negligible performance penalty, and (iii) efficient decomposition algorithms based on approxiiii mate circuits to improve performance of designs during logic synthesis. This thesis concludes with other potential applications of approximate circuits and identifies. open problems in logic decomposition and approximate circuit synthesis

    SWIFT: A Low-Power Network-On-Chip Implementing the Token Flow Control Router Architecture With Swing-Reduced Interconnects

    Get PDF
    A 64-bit, 8 × 8 mesh network-on-chip (NoC) is presented that uses both new architectural and circuit design techniques to improve on-chip network energy-efficiency, latency, and throughput. First, we propose token flow control, which enables bypassing of flit buffering in routers, thereby reducing buffer size and their power consumption. We also incorporate reduced-swing signaling in on-chip links and crossbars to minimize datapath interconnect energy. The 64-node NoC is experimentally validated with a 2 × 2 test chip in 90 nm, 1.2 V CMOS that incorporates traffic generators to emulate the traffic of the full network. Compared with a fully synthesized baseline 8 × 8 NoC architecture designed to meet the same peak throughput, the fabricated prototype reduces network latency by 20% under uniform random traffic, when both networks are run at their maximum operating frequencies. When operated at the same frequencies, the SWIFT NoC reduces network power by 38% and 25% at saturation and low loads, respectively

    An in-depth look at prior art in fast round-robin arbiter circuits

    Get PDF
    Arbiters are found where shared resources exist such as busses, switching fabrics, processing elements. Round-robin is a fair arbitration method, where requestors get near-equal shares of a common resource or service. Round-robin arbitration (RRA) finds use in network switches/routers and processor boards/systems as well as many other applications that have concurrency. Today's electronic systems require arbiters with hundreds of ports (e.g., switching fabrics with virtual I/O queues) and clock speeds near the limits of even the latest microelectronics fabrication processes/libraries. Achieving high clock speeds in the presence of large number of ports is only possible with highly parallel arbiter architectures. This paper presents an in-depth literature survey of previous work on this problem. It looks at RRA work in the literature in a bigger context, then defines the typical RRA problem (RRA_typical), and specifically investigates work on fast architectures that solve the RRA_typical problem. There are five such works that are really competitive. This report takes a very in-depth look at these works. It explains each architecture and how/why it works from a unique perspective that cannot be found in the original publication of that architecture. It also proposes improvements to these architectures. We wrote generators for the improved versions of these architectures. We will share a summary of synthesis results in this report – although a detailed account of how these results were obtained and their analysis is the subject of another (upcoming) publicatio

    Reliable Software for Unreliable Hardware - A Cross-Layer Approach

    Get PDF
    A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e. hardware, compiler, system software, and application program) to exploit the available reliability enhancing potential at each system layer and to exchange this information across multiple system layers

    Towards trainable synthesis for optimized circuit deployment on FPGA

    Get PDF
    Field Programmable Gate Arrays (FPGAs) utilize multiple programmable elements and non-programmable blocks. After synthesizing an input Hardware Design Language (HDL) design into a circuit, optimizations are used to discover a satisfactory deployment on a target FPGA. HDLs' compound operations, such as addition, can be implemented in various ways and thus, multiple but functionally equivalent circuits can be synthesized. To leverage this, we propose a methodology that first enables configurable synthesis of compound operations. Second, it trains the system using a set of HDL files and architectures to optimize target performance objectives, such as critical path length and power. We prototyped our technique in the open source Verilog-To-Routing (VTR) tool. We subsequently produced two configuration files targeting different deployment objectives; experimental results with the VTR Verilog benchmarks revealed significant improvements
    corecore