    Near- and Sub-Threshold Design for Ultra-Low-Power Embedded Systems

    Ultra-low-power (ULP) software-programmable architectures are gradually replacing dedicated VLSI circuits in many applications, including health care and other critical areas. However, the price of greater flexibility is less frugal energy use. This cost can be partially recovered by aggressive supply voltage scaling, often deep into the sub-threshold regime, which in turn raises concerns about performance, standby leakage, and reliability. In this talk, we will discuss some of the issues in, and possible solutions for, ULP computing and embedded-systems design at scaled voltages. We will discuss architectural choices and circuit-level aspects and illustrate them with examples including robust sub-VT memories, ULP multi-core systems, and sub-VT application-specific processors.
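    To illustrate why aggressive voltage scaling trades energy against speed and leakage, the Python sketch below sweeps the supply voltage under a textbook first-order model: dynamic energy falls as C·VDD², while leakage energy per operation rises because sub-threshold delay grows roughly exponentially as VDD drops. All constants are hypothetical, the delay expression is applied over the whole sweep for simplicity, and leakage current is assumed independent of VDD. The result is a qualitative illustration, not a model of the designs discussed in the talk.

```python
import math

# Hypothetical first-order parameters (illustrative assumptions only).
C_EFF = 1e-12        # switched capacitance per operation [F]
I_LEAK = 1e-6        # leakage current, assumed independent of VDD [A]
T0 = 1e-9            # delay at VDD = V_TH [s]
N_VT = 1.5 * 0.026   # sub-threshold slope factor n times thermal voltage [V]
V_TH = 0.4           # threshold voltage [V]


def delay_s(vdd: float) -> float:
    # Delay grows roughly exponentially as VDD falls below V_TH
    # (the same expression is used across the whole sweep for simplicity).
    return T0 * math.exp((V_TH - vdd) / N_VT)


def energy_per_op(vdd: float) -> float:
    e_dyn = C_EFF * vdd ** 2               # switching energy ~ C * V^2
    e_leak = I_LEAK * vdd * delay_s(vdd)   # leakage power integrated over one cycle
    return e_dyn + e_leak


def minimum_energy_vdd(v_lo: float = 0.1, v_hi: float = 0.6, steps: int = 501) -> float:
    # Brute-force sweep of the supply voltage; returns the lowest-energy point.
    grid = [v_lo + (v_hi - v_lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=energy_per_op)


v_opt = minimum_energy_vdd()
print(f"minimum-energy VDD ~ {v_opt:.3f} V, E = {energy_per_op(v_opt):.3e} J/op")
```

    Under these assumed parameters the minimum-energy point lies below V_TH, which is the usual motivation for sub-threshold operation; its location shifts with every parameter.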

    Two-Port Low-Power Gain-Cell Storage Array: Voltage Scaling and Retention Time

    The impact of supply voltage scaling on the retention time of a 2-transistor (2T) gain-cell (GC) storage array is investigated in order to enable low-power/low-voltage data storage. For given access statistics and a given write bit-line (WBL) control scheme, the retention time can be increased by scaling down the supply voltage. Moreover, for a given supply voltage, the retention time can be further increased by driving the WBL to a voltage level between the supply rails during idle and read states. Both concepts are demonstrated by Spectre simulations of a GC storage array implemented in 180-nm CMOS technology. The proposed 2-kb storage macro operates at only 40% of the nominal supply voltage and leverages the GCs to enable two-port operation with negligible area increase compared to a single-port implementation.

    Standard-Cell Based Memories (SCMs): from Sub-VT to Error-Resilient Systems

    Embedded memories consume an increasingly dominant part of the overall area and power of a large variety of systems-on-chip [ITRS’09]: 1) biomedical implants and wireless sensor networks require robust memories operating in the sub-VT domain; 2) many handheld devices and microprocessors are operated near the threshold voltage; and 3) fault-tolerant systems and error-resilient computing have attracted interest due to increasing process variations. Standard-cell based memories (SCMs) entail minimum design effort and are immediately functional in any system, from reliable sub-VT to error-resilient high-performance designs. In particular, sub-VT SCMs ensure robustness and improve access bandwidth and energy efficiency compared to sub-VT SRAM macros. Adding only one custom cell (a low-leakage latch) to a commercial standard-cell library further improves the energy efficiency of sub-VT SCMs. In fault-tolerant systems requiring only short data retention times, a small number of errors in the memory content does not severely impede system functionality, and dynamic latches yield SCMs smaller than commercial 6T SRAM macros for storage capacities up to at least 2 kb. Various silicon-proven SCM architectures are presented, and best-practice SCM implementations for both sub-VT and above-VT applications are derived. To reduce leakage power in sub-VT SCMs, a latch with few, highly resistive VDD-to-ground paths is designed using transistor stacking and stretching. To save silicon area, at the cost of reduced robustness, various dynamic latches are integrated into the SCM compilation flow.
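    A minimal behavioral sketch of a latch-based SCM in Python, assuming a simple organization: one row of latches per word, a write-address decoder that enables only the addressed row, and a read multiplexer that selects one row. The class and method names are hypothetical, and timing, clock gating, and the retention behavior of dynamic latches are not modeled.

```python
class StandardCellMemory:
    """Behavioral model of a latch-based standard-cell memory (hypothetical)."""

    def __init__(self, words: int, width: int) -> None:
        self.width = width
        self.mask = (1 << width) - 1
        self.rows = [0] * words  # one latch row (word) per address

    def write(self, addr: int, data: int) -> None:
        # Write-address decoder: a one-hot enable vector, so only the
        # addressed row's latches become transparent and capture the data.
        enables = [row == addr for row in range(len(self.rows))]
        for row, enable in enumerate(enables):
            if enable:
                self.rows[row] = data & self.mask

    def read(self, addr: int) -> int:
        # Read path: a multiplexer built from standard cells selects one row.
        return self.rows[addr]


mem = StandardCellMemory(words=32, width=8)
mem.write(5, 0x1A5)      # only the low 8 bits fit into the word
print(hex(mem.read(5)))  # prints 0xa5
```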

    Replica Bit-Line Technique for Embedded Multilevel Gain-Cell DRAM

    Multilevel gain-cell DRAMs are attractive for improving the area efficiency of modern fault-tolerant systems-on-chip implemented in deep-submicron CMOS technologies. This paper addresses the problem of long access times in such multilevel gain-cell DRAMs, which are further aggravated by process parameter variations. A replica bit-line (BL) technique, previously proposed for SRAM, is adapted to speed up the multilevel read operation with negligible area increase. Moreover, the same replica column is used to improve the write access time. An 8-kb DRAM macro implemented in 90-nm CMOS technology shows that the replica column successfully tracks die-to-die process, voltage, and temperature variations to generate control signals with optimum delay. Finally, Monte Carlo simulations show that a small timing margin of 100 ps is sufficient to also cope with within-die process variations.
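    The idea of a replica-generated strobe plus a fixed margin can be sketched as a small Monte Carlo experiment in Python. It assumes, purely for illustration, that die-to-die PVT variation scales the replica and the real columns by the same Gaussian factor, while within-die mismatch adds an independent Gaussian delay only to the real column. The nominal delay and sigmas are hypothetical, and the function names are invented for this sketch.

```python
import random


def strobe_time_ps(replica_delay_ps: float, margin_ps: float) -> float:
    # The replica column's delay plus a fixed timing margin triggers sensing.
    return replica_delay_ps + margin_ps


def timing_failure_rate(nominal_ps: float, die_sigma: float, within_sigma_ps: float,
                        margin_ps: float, trials: int = 20_000, seed: int = 7) -> float:
    """Fraction of trials in which the strobe fires before the real column
    has settled (hypothetical Gaussian variation model)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        # Die-to-die PVT variation scales replica and array delay alike ...
        die_factor = rng.gauss(1.0, die_sigma)
        replica_ps = nominal_ps * die_factor
        # ... while within-die mismatch affects only the real column.
        actual_ps = replica_ps + rng.gauss(0.0, within_sigma_ps)
        if strobe_time_ps(replica_ps, margin_ps) < actual_ps:
            failures += 1
    return failures / trials


for margin in (0.0, 50.0, 100.0):
    print(margin, timing_failure_rate(1000.0, 0.1, 30.0, margin))
```

    Because the die-to-die component cancels between replica and array in this model, only the within-die spread has to be covered by the margin, which is the role the 100-ps margin plays in the abstract.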

    Design and failure analysis of logic-compatible multilevel gain-cell-based DRAM for fault-tolerant VLSI systems

    This paper considers the problem of increasing the storage density in fault-tolerant VLSI systems that require only limited data retention times. To this end, the concept of storing multiple bits per memory cell is applied to area-efficient and fully logic-compatible gain-cell-based dynamic memories. A memory macro in 90-nm CMOS technology, including multilevel write and read circuits, is proposed and analyzed by means of Monte Carlo simulations with respect to its read failure probability due to within-die process variations.
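    A toy Python model of two-bit-per-cell storage and its Monte Carlo read-failure analysis is shown below. The evenly spaced levels, midpoint reference voltages, and a single Gaussian offset standing in for within-die variation are simplifying assumptions; the real macro's level placement, sense circuits, and variation sources are not described in the abstract.

```python
import random

BITS_PER_CELL = 2
LEVELS = 2 ** BITS_PER_CELL  # 4 storage-node levels encode 2 bits per cell


def write_level(symbol: int, vdd: float) -> float:
    # Assumed: evenly spaced storage-node voltages between 0 and VDD.
    return symbol * vdd / (LEVELS - 1)


def read_symbol(v_sn: float, vdd: float) -> int:
    # LEVELS - 1 reference voltages placed midway between adjacent levels;
    # the symbol equals the number of references the voltage exceeds.
    step = vdd / (LEVELS - 1)
    return sum(1 for k in range(1, LEVELS) if v_sn > (k - 0.5) * step)


def read_failure_probability(vdd: float, sigma_v: float,
                             trials: int = 20_000, seed: int = 1) -> float:
    # Monte Carlo: within-die variation modeled as a Gaussian offset on the
    # read-out voltage (a simplifying assumption).
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        symbol = rng.randrange(LEVELS)
        v_read = write_level(symbol, vdd) + rng.gauss(0.0, sigma_v)
        if read_symbol(v_read, vdd) != symbol:
            errors += 1
    return errors / trials


for sigma in (0.02, 0.04, 0.08):
    print(sigma, read_failure_probability(vdd=0.8, sigma_v=sigma))
```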

    A 15.8 pJ/bit/iter quasi-cyclic LDPC decoder for IEEE 802.11n in 90 nm CMOS

    We present a low-power quasi-cyclic (QC) low-density parity-check (LDPC) decoder that meets the throughput requirements of the highest-rate (600 Mbps) modes of the IEEE 802.11n WLAN standard. The design is based on the layered offset-min-sum algorithm and is runtime-programmable to process different code matrices (including all rates and block lengths specified by IEEE 802.11n). The register-transfer-level implementation has been optimized for energy efficiency. The corresponding 90-nm CMOS ASIC has a core area of 1.77 mm² and achieves a maximum throughput of 680 Mbps at a 346-MHz clock frequency and 10 decoding iterations. The measured energy efficiency is 15.8 pJ/bit/iteration at a nominal operating voltage of 1.0 V.
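    The core of the layered offset-min-sum algorithm named above can be sketched in Python as follows. This is a generic floating-point textbook formulation, not the decoder's RTL: the offset value of 0.5, the function names, and the message layout are assumptions, and hardware decoders use fixed-point messages and process quasi-cyclic sub-matrices in parallel.

```python
def offset_min_sum_cnu(q: list[float], offset: float = 0.5) -> list[float]:
    """Offset min-sum check-node update (needs at least two incoming messages).

    Each outgoing message carries the product of the signs of all *other*
    inputs and max(min of the other magnitudes - offset, 0). Only the two
    smallest magnitudes (min1, min2) are needed, as in typical hardware.
    """
    mags = [abs(v) for v in q]
    i_min = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[i_min]
    min2 = min(m for j, m in enumerate(mags) if j != i_min)
    total_sign = 1
    for v in q:
        if v < 0:
            total_sign = -total_sign
    out = []
    for i, v in enumerate(q):
        own_sign = -1 if v < 0 else 1  # removing own sign: multiply by +/-1
        mag = min2 if i == i_min else min1
        out.append(total_sign * own_sign * max(mag - offset, 0.0))
    return out


def layered_row_update(posterior: list[float], cols: list[int],
                       old_r: list[float], offset: float = 0.5) -> list[float]:
    """Process one layer (check row) and update posterior LLRs in place."""
    q = [posterior[c] - r for c, r in zip(cols, old_r)]  # variable-to-check messages
    new_r = offset_min_sum_cnu(q, offset)
    for c, qv, r in zip(cols, q, new_r):
        posterior[c] = qv + r  # posterior is refreshed before the next layer
    return new_r


llr = [2.0, -1.0, 3.0]
r = layered_row_update(llr, cols=[0, 1, 2], old_r=[0.0, 0.0, 0.0])
print(r, llr)  # [-0.5, 1.5, -0.5] [1.5, 0.5, 2.5]
```

    Updating the posterior immediately after each row is what distinguishes the layered schedule from flooding, and it is the usual reason layered decoders converge in fewer iterations.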

    An Overlap-Contention Free True-Single-Phase Clock Dual-Edge-Triggered Flip-Flop

    Dual-edge-triggered (DET) synchronous operation is a very attractive option for low-power, high-performance designs. Compared to conventional single-edge synchronous systems, DET operation provides the same throughput at half the clock frequency. This can lead to significant power savings on the clock network, which is often one of the major contributors to total system power. However, implementing DET operation requires special registers that sample data on both clock edges. These registers are more complex than their single-edge counterparts and often suffer from a certain amount of overlap between the main clock and the internally generated inverted clock. This overlap can cause contention inside the cell and lead to logic failures, especially at scaled supply voltages and under the process variations that characterize nanometer technologies. This paper presents a novel, static DET flip-flop (DET-FF) with a true-single-phase clock that completely avoids clock-overlap hazards by eliminating the need for an inverted clock. The proposed DET-FF was implemented in a standard 40-nm CMOS technology, showing full functionality at low-voltage operating points where conventional DET-FFs fail. At a near-threshold supply voltage of 500 mV, the proposed cell also provides a 35% lower CK-to-Q delay and the lowest power-delay product of all considered DET-FF implementations. © 2015 IEEE
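    The throughput argument can be checked with a small Python model: given sampled clock and data waveforms, a single-edge register captures on rising edges only, whereas a DET register captures on both edges, so it needs only half the clock frequency for the same data rate. The clock-power helper uses the standard P = C·VDD²·f relation and assumes equal clock capacitance in both cases; DET-FFs typically load the clock more, so real savings are smaller. The function names are hypothetical, and the model says nothing about the true-single-phase circuit itself.

```python
def capture(clk: list[int], d: list[int], dual_edge: bool) -> list[int]:
    """Values captured by an edge-triggered register from sampled waveforms.

    A single-edge FF captures d on 0->1 clock transitions only; a DET-FF
    captures on both 0->1 and 1->0 transitions.
    """
    captured = []
    for prev_c, c, value in zip(clk, clk[1:], d[1:]):
        rising = prev_c == 0 and c == 1
        falling = prev_c == 1 and c == 0
        if rising or (dual_edge and falling):
            captured.append(value)
    return captured


def clock_net_power(c_clk: float, vdd: float, f_clk: float) -> float:
    # Dynamic power of the clock network, P = C * VDD^2 * f (activity factor 1).
    return c_clk * vdd ** 2 * f_clk


clk = [0, 1, 1, 0, 0, 1, 1, 0]
data = [10, 11, 12, 13, 14, 15, 16, 17]
print(capture(clk, data, dual_edge=False))  # [11, 15]
print(capture(clk, data, dual_edge=True))   # [11, 13, 15, 17]

# Same data rate: single-edge at 100 MHz vs. DET at 50 MHz (equal C_clk assumed).
print(clock_net_power(1e-12, 0.5, 50e6) / clock_net_power(1e-12, 0.5, 100e6))  # 0.5
```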

    Fractionally Spaced Complex Sub-Nyquist Sampling for Multi-Gigabit 60 GHz Wireless Communication

    A novel analog front-end architecture based on complex sub-Nyquist sampling for the intermediate frequency (IF) stage of a mmWave receiver is proposed. With this front-end, the use of a wideband hybrid coupler and two half-rate analog-to-digital converters (ADCs) allows flexible placement of the IF. It is shown that digital compensation of the impairments introduced by the non-ideal 90° hybrid coupler is required to use high modulation orders. Furthermore, a digital signal processing (DSP) architecture is presented that performs equalization of a fractionally spaced, sub-sampled IF signal in the frequency domain (FD) and integrates the impairment compensation with low overhead. Based on this DSP architecture, a working 60-GHz single-carrier link is demonstrated. Measurement results show the feasibility of 256-QAM transmission with a bandwidth of up to 1.8 GHz and a resulting raw data rate of 12.8 Gb/s using our front-end architecture with the digital FD compensation.
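    Amplitude and phase errors of a non-ideal 90° hybrid are often described with a widely linear model, y = μx + νx*. Assuming that model and a frequency-flat impairment (a simplification; the paper integrates the compensation into FD equalization), the Python sketch below applies and then inverts the impairment per sample. The parameterization, gain and phase values, and function names are illustrative assumptions.

```python
import cmath


def coupler_impairment(gain: float, phase_err_rad: float) -> tuple[complex, complex]:
    # One common parameterization of amplitude/phase imbalance:
    # y = Re(x) + j * gain * Im(x * exp(-j*phase_err)) = mu*x + nu*conj(x)
    mu = (1 + gain * cmath.exp(-1j * phase_err_rad)) / 2
    nu = (1 - gain * cmath.exp(1j * phase_err_rad)) / 2
    return mu, nu


def impair(x: complex, mu: complex, nu: complex) -> complex:
    return mu * x + nu * x.conjugate()


def compensate(y: complex, mu: complex, nu: complex) -> complex:
    # Invert the widely linear model: conj(mu)*y - nu*conj(y) = (|mu|^2 - |nu|^2) * x
    det = abs(mu) ** 2 - abs(nu) ** 2
    return (mu.conjugate() * y - nu * y.conjugate()) / det


mu, nu = coupler_impairment(gain=1.05, phase_err_rad=cmath.pi / 60)  # 5 % and 3 degrees
symbols = [complex(i, q) for i in (-3, -1, 1, 3) for q in (-3, -1, 1, 3)]  # 16-QAM grid
worst = max(abs(compensate(impair(s, mu, nu), mu, nu) - s) for s in symbols)
print(f"max residual error after compensation: {worst:.2e}")
```

    In practice μ and ν are unknown and must be estimated, for example from training symbols; the sketch assumes they are given.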