Search CORE

5 research outputs found

Scalable Energy-Recovery Architectures.

Author: Ou Tai-Chuan
Publication venue
Publication date: 01/01/2015
Field of study

Energy efficiency is a critical challenge for today's integrated circuits, especially for high-end digital signal processing and communications that require both high throughput and low energy dissipation for extended battery life. Charge-recovery logic recovers and reuses charge using inductive elements and has the potential to achieve order-of-magnitude improvement in energy efficiency while maintaining high performance. However, the lack of large-scale high-speed silicon demonstrations and inductor area overheads are two major concerns. This dissertation focuses on scalable charge-recovery designs. We present a semi-automated design flow to enable the design of large-scale charge-recovery chips. We also present a new architecture that uses in-package inductors, eliminating the area overheads caused by the use of integrated inductors in high-performance charge-recovery chips. To demonstrate our semi-automated flow, which uses custom-designed standard-cell-like dynamic cells, we have designed a 576-bit charge-recovery low-density parity-check (LDPC) decoder chip. Functioning correctly at clock speeds above 1 GHz, this prototype is the first-ever demonstration of a GHz-speed charge-recovery chip of significant complexity. In terms of energy consumption, this chip improves over recent state-of-the-art LDPCs by at least 1.3 times with comparable or better area efficiency. To demonstrate our architecture for eliminating inductor overheads, we have designed a charge-recovery LDPC decoder chip with in-package inductors. This test-chip has been fabricated in a 65nm CMOS flip-chip process. A custom 6-layer FC-BGA package substrate has been designed with 16 inductors embedded in the fifth layer of the package substrate, yielding higher Q and significantly improving area efficiency and energy efficiency compared to their on-chip counterparts. From measurements, this chip achieves at least 2.3 times lower energy consumption with better area efficiency over state-of-the-art published designs.PhDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/116653/1/terryou_1.pd

Deep Blue Documents at the University of Michigan

Energy-Efficient Neural Network Architectures

Author: Wu Hsi-Shou
Publication venue
Publication date: 01/01/2018
Field of study

Emerging systems for artificial intelligence (AI) are expected to rely on deep neural networks (DNNs) to achieve high accuracy for a broad variety of applications, including computer vision, robotics, and speech recognition. Due to the rapid growth of network size and depth, however, DNNs typically result in high computational costs and introduce considerable power and performance overheads. Dedicated chip architectures that implement DNNs with high energy efficiency are essential for adding intelligence to interactive edge devices, enabling them to complete increasingly sophisticated tasks by extending battery lie. They are also vital for improving performance in cloud servers that support demanding AI computations. This dissertation focuses on architectures and circuit technologies for designing energy-efficient neural network accelerators. First, a deep-learning processor is presented for achieving ultra-low power operation. Using a heterogeneous architecture that includes a low-power always-on front-end and a selectively-enabled high-performance back-end, the processor dynamically adjusts computational resources at runtime to support conditional execution in neural networks and meet performance targets with increased energy efficiency. Featuring a reconfigurable datapath and a memory architecture optimized for energy efficiency, the processor supports multilevel dynamic activation of neural network segments, performing object detection tasks with 5.3x lower energy consumption in comparison with a static execution baseline. Fabricated in 40nm CMOS, the processor test-chip dissipates 0.23mW at 5.3 fps. It demonstrates energy scalability up to 28.6 TOPS/W and can be configured to run a variety of workloads, including severely power-constrained ones such as always-on monitoring in mobile applications. To further improve the energy efficiency of the proposed heterogeneous architecture, a new charge-recovery logic family, called zero-short-circuit current (ZSCC) logic, is proposed to decrease the power consumption of the always-on front-end. By relying on dedicated circuit topologies and a four-phase clocking scheme, ZSCC operates with significantly reduced short-circuit currents, realizing order-of-magnitude power savings at relatively low clock frequencies (in the order of a few MHz). The efficiency and applicability of ZSCC is demonstrated through an ANSI S1.11 1/3 octave filter bank chip for binaural hearing aids with two microphones per ear. Fabricated in a 65nm CMOS process, this charge-recovery chip consumes 13.8µW with a 1.75MHz clock frequency, achieving 9.7x power reduction per input in comparison with a 40nm monophonic single-input chip that represents the published state of the art. The ability of ZSCC to further increase the energy efficiency of the heterogeneous neural network architecture is demonstrated through the design and evaluation of a ZSCC-based front-end. Simulation results show 17x power reduction compared with a conventional static CMOS implementation of the same architecture.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147614/1/hsiwu_1.pd

Deep Blue Documents at the University of Michigan

Secure and Energy-Efficient Processors

Author: Lu Shengshuo
Publication venue
Publication date
Field of study

Security has become an essential part of digital information storage and processing. Both high-end and low-end applications, such as data centers and Internet of Things (IoT), rely on robust security to ensure proper operation. Encryption of information is the primary means for enabling security. Among all encryption standards, Advanced Encryption Standard (AES) is a widely adopted cryptographic algorithm, due to its simplicity and high security. Although encryption standards in general are extremely difficult to break mathematically, they are vulnerable to so-called side channel attacks, which exploit electrical signatures of operating chips, such as power trace or magnetic field radiation, to crack the encryption. Differential Power Analysis (DPA) attack is a representative and powerful side-channel attack method, which has demonstrated high effectiveness in cracking secure chips. This dissertation explores circuits and architectures that offer protection against DPA attacks in high-performance security applications and in low-end IoT applications. The effectiveness of the proposed technologies is evaluated. First, a 128-bit Advanced Encryption Standard (AES) core for high-performance security applications is designed, fabricated and evaluated in a 65nm CMOS technology. A novel charge-recovery logic family, called Bridge Boost Logic (BBL), is introduced in this design to achieve switching-independent energy dissipation and provide intrinsic high resistance against DPA attacks. Based on measurements, the AES core achieves a throughput of 16.90Gbps and power consumption of 98mW, exhibiting 720x higher DPA resistance and 30% lower power than a conventional CMOS counterpart implemented on the same die and operated at the same clock frequency. Second, an AES core designed for low-cost and energy-efficient IoT security applications is designed and fabricated in a 65nm CMOS technology. A novel Dual-Rail Flush Logic (DRFL) with switching-independent power profile is used to yield intrinsic resistance against DPA attacks with minimum area and energy consumption. Measurement results show that this 0.048mm2 core achieves energy consumption as low as 1.25pJ/bit, while providing at least 2604x higher DPA resistance over its conventional CMOS counterpart on the same die, marking the smallest, most energy-efficient and most secure full-datapath AES core published to date.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/138791/1/luss_1.pd

Deep Blue Documents at the University of Michigan

Performance-Driven Energy-Efficient VLSI.

Author: Ma Wei-Hsiang
Publication venue
Publication date
Field of study

Today, there are two prevalent platforms in VLSI systems: high-performance and ultra-low power. High-speed designs, usually operating at GHz level, provide the required computation abilities to systems but also consume a large amount of power; microprocessors and signal processing units are examples of this type of designs. For ultra-low power designs, voltage scaling methods are usually used to reduce power consumption and extend battery life. However, circuit delay in ultra-low power designs increases exponentially, as voltage is scaled below Vth, and subthreshold leakage energy also increases in a near-exponential fashion. Many methods have been proposed to address key design challenges on these two platforms, energy consumption in high-performance designs, and performance/reliability in ultra-low power designs. In this thesis, charge-recovery design is explored as a solution targeting both platforms to achieve increased energy efficiency over conventional CMOS designs without compromising performance or reliability. To improve performance while still achieving high energy efficiency for ultra-low power designs, we propose Subthreshold Boost Logic (SBL), a new circuit family that relies on charge-recovery design techniques to achieve order-of-magnitude improvements in operating frequencies, and achieve high energy efficiency compared to conventional subthreshold designs. To demonstrate the performance and energy efficiency of SBL, we present a 14-tap 8-bit finite-impulse response (FIR) filter test-chip fabricated in a 0.13µm process. With a single 0.27V supply, the test-chip achieves its most energy efficient operating point at 20MHz, consuming 15.57pJ per cycle with a recovery rate of 89% and a FoM equal to 17.37 nW/Tap/MHz/InBit/CoeffBit. To reduce energy consumption at multi-GHz level frequencies, we explore the application of resonant-clocking to the design of a 5-bit non-interleaved resonant-clock ash ADC with a sampling rate of 7GS/s. The ADC has been designed in a 65nm bulk CMOS process. An integrated 0.77nH inductor is used to resonate the entire clock distribution network to achieve energy efficient operation. Operating at 5.5GHz, the ADC consumes 28mW, yielding 396fJ per conversion step. The clock network accounts for 10.7% of total power and consumes 54% less energy over CV^2. By comparison, in a typical ash ADC design, 30% of total power is clock-related.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/89779/1/wsma_1.pd

Deep Blue Documents at the University of Michigan