8 research outputs found
Performance-Driven Energy-Efficient VLSI.
Today, there are two prevalent platforms in VLSI systems: high-performance and ultra-low power. High-speed designs, usually operating at GHz level, provide the required computation abilities to systems but also consume a large amount of power; microprocessors and signal processing units are examples of this type of designs. For ultra-low power designs, voltage scaling methods are usually used to reduce power consumption and extend battery life. However, circuit delay in ultra-low power designs increases exponentially, as voltage is scaled below Vth, and subthreshold leakage energy also increases in a near-exponential fashion.
Many methods have been proposed to address key design challenges on these two platforms, energy consumption in high-performance designs, and performance/reliability in ultra-low power designs. In this thesis, charge-recovery design is explored as a solution targeting both platforms to achieve increased energy efficiency over conventional CMOS designs without compromising performance or reliability.
To improve performance while still achieving high energy efficiency for ultra-low power designs, we propose Subthreshold Boost Logic (SBL), a new circuit family that relies on charge-recovery design techniques to achieve order-of-magnitude improvements in operating frequencies, and achieve high energy efficiency compared to conventional subthreshold designs. To demonstrate the performance and energy efficiency of SBL, we present a 14-tap 8-bit finite-impulse response (FIR) filter test-chip fabricated in a 0.13µm process. With a single 0.27V supply, the test-chip achieves its most energy efficient operating point at 20MHz, consuming 15.57pJ per cycle with a recovery rate of 89% and a FoM equal to 17.37 nW/Tap/MHz/InBit/CoeffBit.
To reduce energy consumption at multi-GHz level frequencies, we explore the application of resonant-clocking to the design of a 5-bit non-interleaved resonant-clock ash ADC with a sampling rate of 7GS/s. The ADC has been designed in a 65nm bulk CMOS process. An integrated 0.77nH inductor is used to resonate the entire clock distribution network to achieve energy efficient operation. Operating at 5.5GHz, the ADC consumes 28mW, yielding 396fJ per conversion step. The clock network accounts for 10.7% of total power and consumes 54% less energy over CV^2. By comparison, in a typical ash ADC design, 30% of total power is clock-related.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/89779/1/wsma_1.pd
Energy-Efficient Digital Signal Processing Hardware Design.
As CMOS technology has developed considerably in the last few decades, many SoCs have been implemented across different application areas due to reduced area and power consumption. Digital signal processing (DSP) algorithms are frequently employed in these systems to achieve more accurate operation or faster computation. However, CMOS technology scaling started to slow down recently and relatively large systems consume too much power to rely only on the scaling effect while system power budget such as battery capacity improves slowly. In addition, there exist increasing needs for miniaturized computing systems including sensor nodes that can accomplish similar operations with significantly smaller power budget.
Voltage scaling is one of the most promising power saving techniques due to quadratic switching power reduction effect, making it necessary feature for even high-end processors. However, in order to achieve maximum possible energy efficiency, systems should operate in near or sub-threshold regimes where leakage takes significant portion of power.
In this dissertation, a few key energy-aware design approaches are described. Considering prominent leakage and larger PVT variability in low operating voltages, multi-level energy saving techniques to be described are applied to key building blocks in DSP applications: architecture study, algorithm-architecture co-optimization, and robust yet low-power memory design. Finally, described approaches are applied to design examples including a visual navigation accelerator, ultra-low power biomedical SoC and face detection/recognition processor, resulting in 2~100 times power savings than state-of-the-art.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/110496/1/djeon_1.pd
ULTRALOW-POWER, LOW-VOLTAGE DIGITAL CIRCUITS FOR BIOMEDICAL SENSOR NODES
Ph.DDOCTOR OF PHILOSOPH
Near-Threshold Computing: Past, Present, and Future.
Transistor threshold voltages have stagnated in recent years, deviating from constant-voltage scaling theory and directly limiting supply voltage scaling. To overcome the resulting energy and power dissipation barriers, energy efficiency can be improved through aggressive voltage scaling, and there has been increased interest in operating at near-threshold computing (NTC) supply voltages. In this region sizable energy gains are achieved with moderate performance loss, some of which can be regained through parallelism.
This thesis first provides a methodical definition of how near to threshold is "near threshold" and continues with an in-depth examination of NTC across past, present, and future CMOS technologies. By systematically defining near-threshold, the trends and tradeoffs are analyzed, lending insight in how best to design and optimize near-threshold systems.
NTC works best for technologies that feature good circuit delay scalability, therefore technologies without strong short-channel effects. Early planar technologies (prior to 90nm or so) featured good circuit scalability (8x energy gains), but lacked area in which to add cores for parallelization. Recent planar nodes (32nm – 20nm) feature more area for cores but suffer from poor delay scalability, and so are not well-suited for NTC (4x energy gains).
The switch to FinFET CMOS technology allows for a return to strong voltage scalability (8x gain), reversing trends seen in planar technologies, while dark silicon has created an opportunity to add cores for parallelization. Improved FinFET voltage scalability even allows for latency reduction of a single task, as long as the task is sufficiently parallelizable (< 10% serial code).
Finally, we will look at a technique for fast voltage boosting, called Shortstop, in which a core's operating voltage is raised in 10s of cycles. Shortstop can be used to quickly respond to single-threaded performance demands of a near-threshold system by leveraging the innate parasitic inductance of a dedicated dirty supply rail, further improving energy efficiency. The technique is demonstrated in a wirebond implementation and is able to boost a core up to 1.8x faster than a header-based approach, while reducing supply droop by 2-7x. An improved flip-chip architecture is also proposed.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113600/1/npfet_1.pd
Timing-Error Tolerance Techniques for Low-Power DSP: Filters and Transforms
Low-power Digital Signal Processing (DSP) circuits are critical to commercial System-on-Chip design for battery powered devices. Dynamic Voltage Scaling (DVS) of digital circuits can reclaim worst-case supply voltage margins for delay variation, reducing power consumption. However, removing static margins without compromising robustness is tremendously challenging, especially in an era of escalating reliability concerns due to continued process scaling. The Razor DVS scheme addresses these concerns, by ensuring robustness using explicit timing-error detection and correction circuits. Nonetheless, the design of low-complexity and low-power error correction is often challenging. In this thesis, the Razor framework is applied to fixed-precision DSP filters and transforms. The inherent error tolerance of many DSP algorithms is exploited to achieve very low-overhead error correction. Novel error correction schemes for DSP datapaths are proposed, with very low-overhead circuit realisations. Two new approximate error correction approaches are proposed. The first is based on an adapted sum-of-products form that prevents errors in intermediate results reaching the output, while the second approach forces errors to occur only in less significant bits of each result by shaping the critical path distribution. A third approach is described that achieves exact error correction using time borrowing techniques on critical paths. Unlike previously published approaches, all three proposed are suitable for high clock frequency implementations, as demonstrated with fully placed and routed FIR, FFT and DCT implementations in 90nm and 32nm CMOS. Design issues and theoretical modelling are presented for each approach, along with SPICE simulation results demonstrating power savings of 21 – 29%. Finally, the design of a baseband transmitter in 32nm CMOS for the Spectrally Efficient FDM (SEFDM) system is presented. SEFDM systems offer bandwidth savings compared to Orthogonal FDM (OFDM), at the cost of increased complexity and power consumption, which is quantified with the first VLSI architecture
Recommended from our members
Variation-Tolerant and Voltage-Scalable Integrated Circuits Design
Ultra-low-voltage (ULV) operation where the supply voltage of the digital computing hardware is scaled down to the level near or below transistor threshold voltage (e.g. 300-500mV) is a key technique to achieve high computing energy efficiency. It has enabled many new exciting applications in the field of Internet of Things (IoT) devices and energy-constrained applications such as medical implants, environment sensors, and micro-robots. Ultra-low-voltage (ULV) operation is also commonly used with the emerging architectures that are often non Von-Neumann style to empower energy-efficient cognitive computing.
One the biggest challenge in realizing ULV design is the large circuit delay variability. To guarantee functionality in the worst-case process, voltage, and temperature (PVT) condition, the traditional safety margin approach requires operating at a slower clock frequency or higher supply voltage which significantly limits the achievable energy efficiency of the hardware. To fully claim the energy efficiency of ULV, the large circuit delay variation needs to be adaptively handled. However, the existing adaptive techniques that are optimized for nominal supply voltage operation and traditional Von-Neumann architectures become inefficient for ULV designs and emerging architectures.
This thesis presents adaptive techniques based on timing error detection and correction (EDAC) that are more suitable for the energy-constrained ULV designs and the emerging architectures. The proposed techniques are demonstrated in three test chips: (1) R-Processor: A 0.4V resilient processor with a voltage-scalable and low-overhead in-situ EDAC technique. It achieves 38% energy efficiency improvement or 2.3X throughput improvement as compared to the traditional safety margin approach. (2) A 450mV timing-margin-free waveform sorter for brain computer interface (BCI) microsystem. It achieves 49.3% higher energy efficiency and 35.6% higher throughput than the traditional safety margin approach. (3) Ultra-low-power and robust power-management system which consists of a microprocessor employing ULV EDAC, 63-ratio integrated switched-capacitor DC-DC converter, and a fully-digital error based regulation controller.
In this thesis, we also explore circuits for emerging techniques. The first is temperature sensors for dynamic-thermal-management (DTM). The modern high-performance microprocessors suffer from ever-increasing power densities which has led to reliability concerns and increased cooling costs from excessive heat. In order to monitor and manage the thermal behavior, DTM techniques embed multiple temperature sensors and use its information. The size, accuracy, and voltage-scalability of the sensor are critical for the performance of DTM. Therefore, we propose a temperature sensor that directly senses transistor threshold voltage and the test chip demonstrates 9X smaller area, 3X higher accuracy, and 200mV lower voltage scalability (down to 400mV) than the previous state-of-art.
Another area of exploration is interconnect design for ultra-dynamic-voltage-scaling (UDVS) systems. UDVS has been proposed for applications that require both high performance and high energy efficiency. UDVS can provide peak performance with nominal supply voltage when work load is high. When work load is moderate or low, UDVS systems can switch to ULV operation for higher energy efficiency. One of the critical challenges for developing UDVS systems is the inflexibility in various circuit fabrics that are often optimized for a single supply voltage. One critical example is conventional repeater based long interconnects which suffers from non-optimal performance and energy efficiency in UDVS systems. Therefore, in this thesis, we propose a reconfigurable interconnect design based on regenerators and demonstrate near optimal performance and energy efficiency across the supply voltage of 0.3V and 1V
Recommended from our members
Multitasking on Wireless Sensor Networks
A Wireless Sensor Network (WSN) is a loose interconnection among distributed embedded devices called motes. Motes have constrained sensing, computing, and communicating resources and operate for a long period of time on a small energy supply. Although envisioned as a platform for facilitating and inspiring a new spectrum of applications, after a decade of research the WSN is limited to collecting data and sporadically updating system parameters. Programming other applications, including those that have real-time constraints, or designing WSNs operating with multiple applications require enhanced system architectures, new abstractions, and design methodologies. This dissertation introduces a system design methodology for multitasking on WSNs. It allows programmers to create an abstraction of a single, integrated system running with multiple tasks. Every task has a dedicated protocol stack. Thus, different tasks can have different computation logics and operate with different communication protocols. This facilitates the execution of heterogeneous applications on the same WSN and allows programmers to implement a variety of system services. The services that have been implemented provide energy-monitoring, tasks scheduling, and communication between the tasks. The experimental section evaluates implementations of the WSN software designed with the presented methodology. A new set of tools for testbed deployments is introduced and used to test examples of WSNs running with applications interacting with the physical world. Using remote testbeds with over 100 motes, the results show the feasibility of the proposed methodology in constructing a robust and scalable WSN system abstraction, which can improve the run-time performance of applications, such as data collection and point-to-point streaming