10 research outputs found
Low-Power Design of Digital VLSI Circuits around the Point of First Failure
As an increase of intelligent and self-powered devices is forecasted for our future everyday life, the implementation of energy-autonomous devices that can wirelessly communicate data from sensors is crucial. Even though techniques such as voltage scaling proved to effectively reduce the energy consumption of digital circuits, additional energy savings are still required for a longer battery life. One of the main limitations of essentially any low-energy technique is the potential degradation of the quality of service (QoS). Thus, a thorough understanding of how circuits behave when operated around the point of first failure (PoFF) is key for the effective application of conventional energy-efficient methods as well as for the development of future low-energy techniques. In this thesis, a variety of circuits, techniques, and tools is described to reduce the energy consumption in digital systems when operated either in the safe and conservative exact region, close to the PoFF, or even inside the inexact region.
A straightforward approach to reduce the power consumed by clock distribution while safely operating in the exact region is dual-edge-triggered (DET) clocking. However, the DET approach is rarely taken, primarily due to the perceived complexity of its integration. In this thesis, a fully automated design flow is introduced for applying DET clocking to a conventional single-edge-triggered (SET) design. In addition, the first static true-single-phase-clock DET flip-flop (DET-FF) that completely avoids clock-overlap hazards of DET registers is proposed.
Even though the correct timing of synchronous circuits is ensured in worst-case conditions, the critical path might not always be excited. Thus, dynamic clock adjustment (DCA) has been proposed to trim any available dynamic timing margin by changing the operating clock frequency at runtime. This thesis describes a dynamically-adjustable clock generator (DCG) capable of modifying the period of the produced clock signal on a cycle-by-cycle basis that enables the DCA technique. In addition, a timing-monitoring sequential (TMS) that detects input transitions on either one of the clock phases to enable the selection of the best timing-monitoring strategy at runtime is proposed.
Energy-quality scaling techniques aimat trading lower energy consumption for a small degradation on the QoS whenever approximations can be tolerated. In this thesis, a low-power methodology for the perturbation of baseline coefficients in reconfigurable finite impulse response (FIR) filters is proposed. The baseline coefficients are optimized to reduce the switching activity of the multipliers in the FIR filter, enabling the possibility of scaling the power consumption of the filter at runtime.
The area as well as the leakage power of many system-on-chips is often dominated by embedded memories. Gain-cell embedded DRAM (GC-eDRAM) is a compact, low-power and CMOS-compatible alternative to the conventional static random-access memory (SRAM) when a higher memory density is desired. However, due to GC-eDRAMs relying on many interdependent variables, the adaptation of existing memories and the design of future GCeDRAMs prove to be highly complex tasks. Thus, the first modeling tool that estimates timing, memory availability, bandwidth, and area of GC-eDRAMs for a fast exploration of their design space is proposed in this thesis
Microarchitectural Low-Power Design Techniques for Embedded Microprocessors
With the omnipresence of embedded processing in all forms of electronics today, there is a strong trend towards wireless, battery-powered, portable embedded systems which have to operate under stringent energy constraints. Consequently, low power consumption and high energy efficiency have emerged as the two key criteria for embedded microprocessor design. In this thesis we present a range of microarchitectural low-power design techniques which enable the increase of performance for embedded microprocessors and/or the reduction of energy consumption, e.g., through voltage scaling. In the context of cryptographic applications, we explore the effectiveness of instruction set extensions (ISEs) for a range of different cryptographic hash functions (SHA-3 candidates) on a 16-bit microcontroller architecture (PIC24). Specifically, we demonstrate the effectiveness of light-weight ISEs based on lookup table integration and microcoded instructions using finite state machines for operand and address generation. On-node processing in autonomous wireless sensor node devices requires deeply embedded cores with extremely low power consumption. To address this need, we present TamaRISC, a custom-designed ISA with a corresponding ultra-low-power microarchitecture implementation. The TamaRISC architecture is employed in conjunction with an ISE and standard cell memories to design a sub-threshold capable processor system targeted at compressed sensing applications. We furthermore employ TamaRISC in a hybrid SIMD/MIMD multi-core architecture targeted at moderate to high processing requirements (> 1 MOPS). A range of different microarchitectural techniques for efficient memory organization are presented. Specifically, we introduce a configurable data memory mapping technique for private and shared access, as well as instruction broadcast together with synchronized code execution based on checkpointing. We then study an inherent suboptimality due to the worst-case design principle in synchronous circuits, and introduce the concept of dynamic timing margins. We show that dynamic timing margins exist in microprocessor circuits, and that these margins are to a large extent state-dependent and that they are correlated to the sequences of instruction types which are executed within the processor pipeline. To perform this analysis we propose a circuit/processor characterization flow and tool called dynamic timing analysis. Moreover, this flow is employed in order to devise a high-level instruction set simulation environment for impact-evaluation of timing errors on application performance. The presented approach improves the state of the art significantly in terms of simulation accuracy through the use of statistical fault injection. The dynamic timing margins in microprocessors are then systematically exploited for throughput improvements or energy reductions via our proposed instruction-based dynamic clock adjustment (DCA) technique. To this end, we introduce a 6-stage 32-bit microprocessor with cycle-by-cycle DCA. Besides a comprehensive design flow and simulation environment for evaluation of the DCA approach, we additionally present a silicon prototype of a DCA-enabled OpenRISC microarchitecture fabricated in 28 nm FD-SOI CMOS. The test chip includes a suitable clock generation unit which allows for cycle-by-cycle DCA over a wide range with fine granularity at frequencies exceeding 1 GHz. Measurement results of speedups and power reductions are provided
Recommended from our members
High Performance Local Oscillator Design for Next Generation Wireless Communication
Local Oscillator (LO) is an essential building block in modern wireless radios. In modern wireless radios, LO often serves as a reference of the carrier signal to modulate or demod- ulate the outgoing or incoming data. The LO signal should be a clean and stable source, such that the frequency or timing information of the carrier reference can be well-defined. However, as radio architecture evolves, the importance of LO path design has become much more important than before. Of late, many radio architecture innovations have exploited sophisticated LO generation schemes to meet the ever-increasing demands of wireless radio performances.
The focus of this thesis is to address challenges in the LO path design for next-generation high performance wireless radios. These challenges include (1) Congested spectrum at low radio frequency (RF) below 5GHz (2) Continuing miniaturization of integrated wireless radio, and (3) Fiber-fast (>10Gb/s) mm-wave wireless communication.
The thesis begins with a brief introduction of the aforementioned challenges followed by a discussion of the opportunities projected to overcome these challenges.
To address the challenge of congested spectrum at frequency below 5GHz, novel ra- dio architectures such as cognitive radio, software-defined radio, and full-duplex radio have drawn significant research interest. Cognitive radio is a radio architecture that opportunisti- cally utilize the unused spectrum in an environment to maximize spectrum usage efficiency. Energy-efficient spectrum sensing is the key to implementing cognitive radio. To enable energy-efficient spectrum sensing, a fast-hopping frequency synthesizer is an essential build- ing block to swiftly sweep the carrier frequency of the radio across the available spectrum. Chapter 2 of this thesis further highlights the challenges and trade-offs of the current LO gen-
eration scheme for possible use in sweeping LO-based spectrum analysis. It follows by intro- duction of the proposed fast-hopping LO architecture, its implementation and measurement results of the validated prototype. Chapter 3 proposes an embedded phase-shifting LO-path design for wideband RF self-interference cancellation for full-duplex radio. It demonstrates a synergistic design between the LO path and signal to perform self-interference cancellation.
To address the challenge of continuing miniaturization of integrated wireless radio, ring oscillator-based frequency synthesizer is an attractive candidate due to its compactness. Chapter 4 discussed the difficulty associated with implementing a Phase-Locked Loop (PLL) with ultra-small form-factor. It further proposes the concept sub-sampling PLL with time- based loop filter to address these challenges. A 65nm CMOS prototype and its measurement result are presented for validation of the concept.
In shifting from RF to mm-wave frequencies, the performance of wireless communication links is boosted by significant bandwidth and data-rate expansion. However, the demand for data-rate improvement is out-pacing the innovation of radio architectures. A >10Gb/s mm-wave wireless communication at 60GHz is required by emerging applications such as virtual-reality (VR) headsets, inter-rack data transmission at data center, and Ultra-High- Definition (UHD) TV home entertainment systems. Channel-bonding is considered to be a promising technique for achieving >10Gb/s wireless communication at 60GHz. Chapter 5 discusses the fundamental radio implementation challenges associated with channel-bonding for 60GHz wireless communication and the pros and cons of prior arts that attempted to address these challenges. It is followed by a discussion of the proposed 60GHz channel- bonding receiver, which utilizes only a single PLL and enables both contiguous and non- contiguous channel-bonding schemes.
Finally, Chapter 6 presents the conclusion of this thesis
Digital Intensive Mixed Signal Circuits with In-situ Performance Monitors
University of Minnesota Ph.D. dissertation.November 2016. Major: Electrical/Computer Engineering. Advisor: Chris Kim. 1 computer file (PDF); x, 137 pages.Digital intensive circuit design techniques of different mixed-signal systems such as data converters, clock generators, voltage regulators etc. are gaining attention for the implementation of modern microprocessors and system-on-chips (SoCs) in order to fully utilize the benefits of CMOS technology scaling. Moreover different performance improvement schemes, for example, noise reduction, spur cancellation, linearity improvement etc. can be easily performed in digital domain. In addition to that, increasing speed and complexity of modern SoCs necessitate the requirement of in-situ measurement schemes, primarily for high volume testing. In-situ measurements not only obviate the need for expensive measurement equipments and probing techniques, but also reduce the test time significantly when a large number of chips are required to be tested. Several digital intensive circuit design techniques are proposed in this dissertation along with different in-situ performance monitors for a variety of mixed signal systems. First, a novel beat frequency quantization technique is proposed in a two-step VCO quantizer based ADC implementation for direct digital conversion of low amplitude bio- potential signals. By direct conversion, it alleviates the requirement of the area and power consuming analog-frontend (AFE) used in a conventional ADC designs. This prototype design is realized in a 65nm CMOS technology. Measured SNDR is 44.5dB from a 10mVpp, 300Hz signal and power consumption is only 38μW. Next, three different clock generation circuits, a phase-locked loop (PLL), a multiplying delay-locked loop (MDLL) and a frequency-locked loop (FLL) are presented. First a 0.4-to-1.6GHz sub-sampling fractional-N all digital PLL architecture is discussed that utilizes a D-flip-flop as a digital sub-sampler. Measurement results from a 65nm CMOS test-chip shows 5dB lower phase noise at 100KHz offset frequency, compared to a conventional architecture. The Digital PLL (DPLL) architecture is further extended for a digital MDLL implementation in order to suppress the VCO phase noise beyond the DPLL bandwidth. A zero-offset aperture phase detector (APD) and a digital- to-time converter (DTC) are employed for static phase-offset (SPO) cancellation. A unique in-situ detection circuitry achieves a high resolution SPO measurement in time domain. A 65nm test-chip shows 0.2-to-1.45GHz output frequency range while reducing the phase-noise by 9dB compared to a DPLL. Next, a frequency-to-current converter (FTC) based fractional FLL is proposed for a low accuracy clock generation in an extremely low area for IoT application. High density deep-trench capacitors are used for area reduction. The test-chip is fabricated in a 32nm SOI technology that takes only 0.0054mm2 active area. A high-resolution in-situ period jitter measurement block is also incorporated in this design. Finally, a time based digital low dropout (DLDO) regulator architecture is proposed for fine grain power delivery over a wide load current dynamic range and input/output voltage in order to facilitate dynamic voltage and frequency scaling (DVFS). High- resolution beat frequency detector dynamically adjusts the loop sampling frequency for ripple and settling time reduction due to load transients. A fixed steady-state voltage offset provides inherent active voltage positioning (AVP) for ripple reduction. Circuit simulations in a 65nm technology show more than 90% current efficiency for 100X load current variation, while it can operate for an input voltage range of 0.6V – 1.2V
Circuits and Systems Advances in Near Threshold Computing
Modern society is witnessing a sea change in ubiquitous computing, in which people have embraced computing systems as an indispensable part of day-to-day existence. Computation, storage, and communication abilities of smartphones, for example, have undergone monumental changes over the past decade. However, global emphasis on creating and sustaining green environments is leading to a rapid and ongoing proliferation of edge computing systems and applications. As a broad spectrum of healthcare, home, and transport applications shift to the edge of the network, near-threshold computing (NTC) is emerging as one of the promising low-power computing platforms. An NTC device sets its supply voltage close to its threshold voltage, dramatically reducing the energy consumption. Despite showing substantial promise in terms of energy efficiency, NTC is yet to see widescale commercial adoption. This is because circuits and systems operating with NTC suffer from several problems, including increased sensitivity to process variation, reliability problems, performance degradation, and security vulnerabilities, to name a few. To realize its potential, we need designs, techniques, and solutions to overcome these challenges associated with NTC circuits and systems. The readers of this book will be able to familiarize themselves with recent advances in electronics systems, focusing on near-threshold computing
Cross-Layer Optimization for Power-Efficient and Robust Digital Circuits and Systems
With the increasing digital services demand, performance and power-efficiency
become vital requirements for digital circuits and systems. However, the
enabling CMOS technology scaling has been facing significant challenges of
device uncertainties, such as process, voltage, and temperature variations. To
ensure system reliability, worst-case corner assumptions are usually made in
each design level. However, the over-pessimistic worst-case margin leads to
unnecessary power waste and performance loss as high as 2.2x. Since
optimizations are traditionally confined to each specific level, those safe
margins can hardly be properly exploited.
To tackle the challenge, it is therefore advised in this Ph.D. thesis to
perform a cross-layer optimization for digital signal processing circuits and
systems, to achieve a global balance of power consumption and output quality.
To conclude, the traditional over-pessimistic worst-case approach leads to
huge power waste. In contrast, the adaptive voltage scaling approach saves
power (25% for the CORDIC application) by providing a just-needed supply
voltage. The power saving is maximized (46% for CORDIC) when a more aggressive
voltage over-scaling scheme is applied. These sparsely occurred circuit errors
produced by aggressive voltage over-scaling are mitigated by higher level error
resilient designs. For functions like FFT and CORDIC, smart error mitigation
schemes were proposed to enhance reliability (soft-errors and timing-errors,
respectively). Applications like Massive MIMO systems are robust against lower
level errors, thanks to the intrinsically redundant antennas. This property
makes it applicable to embrace digital hardware that trades quality for power
savings.Comment: 190 page
A 28nm FD-SOI Standard Cell 0.6-1.2V Open-Loop Frequency Multiplier for Low Power SoC Clocking
IEEE International Symposium on Circuits and Systems (ISCAS), Montreal, CANADA, MAY 22-25, 2016International audienceThis paper presents a new design for SoC clocking based on open-loop frequency multiplication. The architecture, fully implemented in 28nm FD-SOI standard cells, achieves frequency tracking within one input reference period making it a promising candidate for Dynamic Voltage and Frequency Scaling (DVFS) schemes. A calibration scheme is implemented for wide voltage range (0.6-1.2V) operation and to offset P&R and variability induced mismatch. Measurements at 0.6V (0.8/0.4V FBB) show a Fmax of 93MHz with a power of 2.91mW/MHz and a jitter of 2.7 % UI
A 28nm FD-SOI Standard Cell 0.6-1.2V Open-Loop Frequency Multiplier for Low Power SoC Clocking
IEEE International Symposium on Circuits and Systems (ISCAS), Montreal, CANADA, MAY 22-25, 2016International audienceThis paper presents a new design for SoC clocking based on open-loop frequency multiplication. The architecture, fully implemented in 28nm FD-SOI standard cells, achieves frequency tracking within one input reference period making it a promising candidate for Dynamic Voltage and Frequency Scaling (DVFS) schemes. A calibration scheme is implemented for wide voltage range (0.6-1.2V) operation and to offset P&R and variability induced mismatch. Measurements at 0.6V (0.8/0.4V FBB) show a Fmax of 93MHz with a power of 2.91mW/MHz and a jitter of 2.7 % UI
Cutting Edge Nanotechnology
The main purpose of this book is to describe important issues in various types of devices ranging from conventional transistors (opening chapters of the book) to molecular electronic devices whose fabrication and operation is discussed in the last few chapters of the book. As such, this book can serve as a guide for identifications of important areas of research in micro, nano and molecular electronics. We deeply acknowledge valuable contributions that each of the authors made in writing these excellent chapters