A Process-Variation-Insensitive Thermometer for Mobile DRAM Using Auto-Temperature-Compensated Self-Refresh
Thesis (Ph.D.), Seoul National University Graduate School: Department of Electrical and Computer Engineering, August 2013. Suhwan Kim.
Smaller transistors mean that capacitors are charged less uniformly, which increases the self-refresh current in the DRAMs used in mobile devices. Adaptive self-refresh using an on-chip thermometer can solve this problem. In this thesis, a PVT-tolerant on-chip CMOS thermometer designed specifically to control the refresh period of a low-power mobile DRAM is proposed. Two types of on-chip CMOS thermometer, incorporating a novel temperature sensor, are proposed and implemented in two different DRAM process technologies integrated into mobile LPDDR2 and LPDDR3 products. The on-chip thermometer incorporated in the mobile LPDDR2 chip is fabricated in a 44nm DRAM process with a 1.1V supply. The sensor has a temperature sensitivity of −3.2mV/°C over a range of 0°C to 110°C. Its resolution is 1.94°C, limited only by the 6.2mV step of the associated resistor ladder rather than by the sensor itself. The high linearity of the sensor permits one-point calibration, after which the errors in 61 sample circuits ranged between −1.42°C and +2.66°C. The sensor has an active area of 0.001725mm² and consumes less than 0.36μW on average with a 1.1V supply.
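The resolution figure is the ladder step divided by the sensor slope: 6.2 mV ÷ 3.2 mV/°C ≈ 1.94 °C. The Python sketch below reproduces that arithmetic and shows what a one-point calibration does under a simplifying assumption: an ideal linear sensor whose slope equals the nominal −3.2 mV/°C, so that only the offset needs trimming. The function names and the 0.700 V at 25 °C calibration point are hypothetical, not taken from the thesis.

```python
# Minimal model: ideal linear sensor voltage with the nominal slope.
SENSITIVITY_V_PER_C = -3.2e-3   # -3.2 mV/°C, from the abstract
LADDER_STEP_V = 6.2e-3          # 6.2 mV resistor-ladder step, from the abstract


def resolution_c(step_v: float, sensitivity_v_per_c: float) -> float:
    """Temperature change that moves the sensor output by one ladder step."""
    return step_v / abs(sensitivity_v_per_c)


def one_point_calibrate(v_cal: float, t_cal_c: float):
    """Return a voltage-to-temperature converter anchored at one (T, V) point.

    Only the offset is trimmed; the slope is the nominal design value, which
    is acceptable only if the sensor is highly linear across parts.
    """
    def to_temperature(v: float) -> float:
        return t_cal_c + (v - v_cal) / SENSITIVITY_V_PER_C
    return to_temperature


print(f"{resolution_c(LADDER_STEP_V, SENSITIVITY_V_PER_C):.2f} °C")  # 1.94 °C

# Hypothetical calibration point: 0.700 V read at 25 °C.
to_temp = one_point_calibrate(0.700, 25.0)
# Output falls 32 mV, i.e. 10 °C hotter given the negative slope.
print(f"{to_temp(0.668):.1f} °C")  # 35.0 °C
```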
To improve overall performance, including ultra-low-voltage operation, temperature sensitivity, low power consumption, high linearity regardless of process skew, and high productivity through one-point calibration, a folded-type on-chip thermometer incorporated in a mobile LPDDR3 chip, fabricated in a 29nm DRAM process with 1.1V and 0.8V supplies, is proposed. The folded sensor offers further improvements: a temperature sensitivity of −3.2mV/°C at 1.1V and −3.13mV/°C at 0.8V over a wider range of −40°C to 110°C. Its resolution is 1.85°C at 1.1V and 1.98°C at 0.8V, again limited only by the 6.2mV ladder step. The higher linearity of the folded sensor permits one-point calibration, after which the errors in 494 sample circuits ranged between −1.94°C and +1.61°C. The folded sensor has an active area of 0.001606mm² and consumes less than 0.19μW at 1.1V and 0.14μW at 0.8V on average, compared with less than 0.36μW for the unfolded sensor.
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 THESIS ORGANIZATION
CHAPTER 2 ARCHITECTURE OF THERMOMETER
2.1 INTRODUCTION TO ON-CHIP THERMOMETER IN MOBILE DRAM
2.2 PROPOSED ON-CHIP CMOS THERMOMETER ARCHITECTURE
2.3 TEMPERATURE READOUT PROCEDURE OF PROPOSED ON-CHIP CMOS THERMOMETER
2.4 PROPOSED FOLDED TYPE ON-CHIP CMOS THERMOMETER ARCHITECTURE
2.5 TEMPERATURE READOUT PROCEDURE OF PROPOSED FOLDED TYPE ON-CHIP CMOS THERMOMETER
2.6 ONE-POINT CALIBRATION METHOD
2.7 TEMPERATURE LINEARITY OF TEMPERATURE SENSOR
CHAPTER 3 OPERATIONAL PRINCIPLES OF CMOS TEMPERATURE SENSOR IN MOBILE DRAM
3.1 PRIOR WORKS OF ON-CHIP THERMOMETER
3.2 PROPOSED CMOS TEMPERATURE SENSOR IN MOBILE DRAM
3.3 OPERATION PRINCIPLES OF PROPOSED TEMPERATURE SENSOR
3.4 PROPOSED FOLDED TYPE TEMPERATURE SENSOR
CHAPTER 4 PERIPHERAL CIRCUITS OF THERMOMETER
4.1 REGULATOR FOR VLTCSR SUPPLY
4.1.1 DC ANALYSIS
4.1.2 AC ANALYSIS
4.2 RESISTOR DECK
4.3 COMPARATOR
CHAPTER 5 EXPERIMENTAL RESULTS
5.1 ON-CHIP CMOS THERMOMETER IN 44NM CMOS PROCESS FOR MOBILE LPDDR2
5.2 FOLDED TYPE ON-CHIP CMOS THERMOMETER IN 29NM CMOS PROCESS FOR MOBILE LPDDR3
CHAPTER 6 CONCLUSIONS
BIBLIOGRAPHY
ABSTRACT IN KOREAN
Design of a Low-Power Memory Controller Incorporating Two Proposed Methods
Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering: Department of Electrical and Computer Engineering, August 2017. Suhwan Kim.
A 4266Mb/s/pin LPDDR4 memory controller with an asynchronous feedback continuous-time linear equalizer (AF-CTLE) and an adaptive 3-step eye detection algorithm is presented. The AF-CTLE removes the glitch on DQS without training by applying an offset larger than the noise, and improves the read margin by operating as a decision feedback equalizer in the DQ path. Compared with 2-dimensional full scanning, the adaptive 3-step eye detection algorithm reduces power consumption and black-out time in the initialization sequence and during retraining. In addition, it maintains accuracy by searching the eye boundaries sequentially and, when the eye detection result changes, initializing the resolution using a binary search. To achieve high bandwidth, a transmitter and receiver suited to training are proposed. The transmitter consists of a phase interpolator, a digitally-controlled delay line, a 16:1 serializer, a pre-driver and low-voltage swing terminated logic. The receiver consists of a reference voltage generator, a continuous-time linear equalizer, a phase interpolator, a digitally-controlled delay line, a 1:4 deserializer, and a 4:16 deserializer. The clocking architecture is also designed for low power consumption during idle periods, which are typically long in mobile applications. A prototype chip was implemented in a 65nm CMOS process in a ball grid array package and tested with commodity LPDDR4. The write margin was 0.36UI and 148mV, and the read margin was enhanced from 0.30UI and 76mV without the AF-CTLE to 0.47UI and 80mV with it. The power efficiency during burst write and read was 5.68pJ/bit and 1.83pJ/bit, respectively.
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 THESIS ORGANIZATION
CHAPTER 2 LPDDR4
2.1 COMPARISON BETWEEN LPDDR3 AND LPDDR4
2.2 SOURCE SYNCHRONOUS CLOCKING SCHEME
2.3 SIGNALING STANDARDS
2.4 MULTIPLE TRAININGS
2.5 RE-TRAINING AND RE-INITIALIZATION
CHAPTER 3 ADAPTIVE EYE DETECTION
3.1 EYE DETECTION
3.2 1X2Y3X EYE DETECTION
3.3 ADAPTIVE GAIN CONTROL
3.4 ADAPTIVE 1X2Y3X EYE DETECTION
CHAPTER 4 LPDDR4 MEMORY CONTROLLER
4.1 DESIGN PROCEDURE
4.2 ARCHITECTURE
4.2.1 TRANSMITTER
4.2.2 RECEIVER
4.2.3 CLOCKING ARCHITECTURE
4.3 CIRCUIT IMPLEMENTATION
4.3.1 ADPLL WITH MULTI-MODULUS DIVIDER
4.3.2 ADDLL WITH TRIANGULAR-MODULATED PI
4.3.3 CTLE WITH AUTO-DQS CLEANING
4.3.4 DES WITH CLOCK DOMAIN CROSSING
4.3.5 LVSTL WITH ZQ CALIBRATION
4.3.6 COARSE-FINE DCDL
4.4 LINK TRAINING
4.4.1 SIMULATION RESULTS
CHAPTER 5 MEASUREMENT RESULTS
5.1 MEASUREMENT SETUP
5.2 MEASUREMENT RESULTS OF SUB-BLOCK
5.2.1 ADPLL WITH MULTI-MODULUS DIVIDER
5.2.2 ADDLL WITH TRIANGULAR-MODULATED PI
5.2.3 COARSE-FINE DCDL
5.3 LPDDR4 INTERFACE MEASUREMENT RESULTS
CHAPTER 6 CONCLUSION
BIBLIOGRAPHY
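The 2017 abstract says the adaptive algorithm searches eye boundaries sequentially and uses a binary search when the eye detection result changes. The Python sketch below illustrates only that binary-search idea, not the thesis's training logic: it assumes that along one axis (for example delay taps at a fixed Vref) pass/fail results are monotonic on each side of a tap already known to pass, so each edge is found in O(log N) accesses instead of a full scan. The `passes` callback, the 64-tap range and the eye spanning taps 20 to 44 are hypothetical.

```python
from typing import Callable, List, Tuple


def first_true(lo: int, hi: int, pred: Callable[[int], bool]) -> int:
    """Smallest i in [lo, hi] with pred(i) True, assuming pred is monotonic
    (False...False, True...True) over that range; returns hi + 1 if none."""
    while lo <= hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid - 1
        else:
            lo = mid + 1
    return lo


def eye_boundaries(passes: Callable[[int], bool], start: int,
                   n_taps: int) -> Tuple[int, int]:
    """Binary-search the left and right edges of the passing window that
    contains `start` (a tap already known to pass)."""
    if not passes(start):
        raise ValueError("start must be inside the eye")
    left = first_true(0, start, passes)
    # The right edge is one tap before the first failing tap above `start`.
    right = first_true(start, n_taps - 1, lambda t: not passes(t)) - 1
    return left, right


# Hypothetical 64-tap delay line whose eye spans taps 20..44.
probes: List[int] = []


def passes(tap: int) -> bool:
    probes.append(tap)  # record every training access
    return 20 <= tap <= 44


left, right = eye_boundaries(passes, start=32, n_taps=64)
print(left, right, (left + right) // 2)  # 20 44 32
print(f"{len(probes)} probes vs 64 for a full linear scan")
```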
Ultra-low Power Circuits for Internet of Things (IOT)
Miniaturized sensor nodes offer an unprecedented opportunity for the semiconductor industry and have driven rapid growth of a new application space: the Internet of Things (IoT). IoT is a global infrastructure that interconnects physical and virtual things and has the potential to dramatically improve people's daily lives. One key aspect that makes IoT special is that the Internet is expanding into places that were previously unreachable as device form factors continue to shrink. Extremely small sensors can be placed on plants, animals, humans, and geologic features, and connected to the Internet. However, several challenges exist that could slow the development of IoT.
In this thesis, several circuit techniques and system-level optimizations that meet the challenging power and energy requirements of the IoT design space are described. First, a fully integrated temperature sensor for battery-operated, ultra-low-power microsystems is presented. The sensor uses temperature-independent and temperature-dependent current sources together with oscillators and counters to generate a digital temperature code.
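One common way to build such a sensor, sketched here in Python as an idealized behavioral model, is a ratiometric counter: an oscillator biased by the temperature-independent current sets a fixed counting window, and an oscillator biased by the temperature-dependent current is counted inside it, so the count tracks temperature while frequency errors common to both oscillators cancel in the ratio. The linear 1 %/°C frequency slope and all constants are assumptions for illustration; the abstract does not give the thesis's circuit parameters.

```python
import math

# Hypothetical, idealized oscillator model (not the thesis's circuit values).
F_REF_HZ = 100e3        # oscillator biased by the temperature-independent current
F0_HZ = 100e3           # sensing-oscillator frequency at T0
T0_C = 25.0
ALPHA_PER_C = 0.01      # assumed +1 %/°C from the temperature-dependent current
REF_CYCLES = 1000       # conversion window length, in reference cycles


def sensing_freq_hz(temp_c: float) -> float:
    """Frequency of the oscillator driven by the temperature-dependent current."""
    return F0_HZ * (1.0 + ALPHA_PER_C * (temp_c - T0_C))


def temperature_code(temp_c: float) -> int:
    """Sensing-oscillator edges counted during REF_CYCLES reference periods."""
    return math.floor(REF_CYCLES * sensing_freq_hz(temp_c) / F_REF_HZ)


def code_to_temp_c(code: int) -> float:
    """Invert the ideal linear model; one count is 0.1 °C with these constants."""
    ratio = code * F_REF_HZ / (REF_CYCLES * F0_HZ)
    return T0_C + (ratio - 1.0) / ALPHA_PER_C


for t in (0.0, 25.0, 50.0):
    print(t, temperature_code(t), round(code_to_temp_c(temperature_code(t)), 2))
```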
Second, an ultra-low-power oscillator designed for wake-up timers in compact wireless sensors is presented. The proposed topology separates the continuous comparator from the oscillation path and activates it only for a short period when it is required. As a result, it achieves both low-power tracking and generation of a precise wake-up signal.
Third, an 8-bit sub-ranging SAR ADC for biomedical applications that takes advantage of signal characteristics is discussed. The ADC uses a moving window and stores the voltage of the previous MSBs on a series capacitor, saving energy compared with a conventional approach while maintaining its accuracy.
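The energy argument can be illustrated with a behavioral Python model that counts comparator decisions as a rough proxy for conversion energy. It assumes a hypothetical split in which the previous sample's upper 4 bits define the window: two extra decisions check whether the new input still lies inside that window, and if it does only the 4 LSBs are resolved; otherwise the converter falls back to a full 8-bit search. This is a simplified stand-in for the thesis's series-capacitor implementation, not a model of it.

```python
import math
from typing import List, Sequence, Tuple

N_BITS = 8
LSB_BITS = 4  # hypothetical split: 4 MSBs kept in the window, 4 LSBs resolved


def sar_search(vin: int, base: int, n_bits: int, counter: List[int]) -> int:
    """Successive approximation over the n_bits below `base`.

    Each trial is one comparator decision and increments counter[0].
    """
    code = base
    for bit in reversed(range(n_bits)):
        trial = code + (1 << bit)
        counter[0] += 1
        if vin >= trial:
            code = trial
    return code


def convert(samples: Sequence[int], subranging: bool) -> Tuple[List[int], int]:
    """Convert integer-valued inputs (0..255); return codes and decision count."""
    counter = [0]
    codes: List[int] = []
    prev_msbs = None
    for vin in samples:
        if subranging and prev_msbs is not None:
            base = prev_msbs << LSB_BITS
            counter[0] += 2  # two window-edge checks: base <= vin < base + 16
            if base <= vin < base + (1 << LSB_BITS):
                code = sar_search(vin, base, LSB_BITS, counter)
            else:
                code = sar_search(vin, 0, N_BITS, counter)  # full-range fallback
        else:
            code = sar_search(vin, 0, N_BITS, counter)
        prev_msbs = code >> LSB_BITS
        codes.append(code)
    return codes, counter[0]


# Slowly varying, biomedical-like test input (hypothetical).
samples = [round(128 + 100 * math.sin(2 * math.pi * n / 200)) for n in range(200)]
full_codes, full_cmp = convert(samples, subranging=False)
sub_codes, sub_cmp = convert(samples, subranging=True)
print(full_cmp, sub_cmp)  # the full SAR needs 8 x 200 = 1600 decisions
```

Both modes return identical codes; the saving comes only from skipping the MSB decisions while the signal stays inside the previous window.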
Finally, an ultra-low-power acoustic sensing and object recognition microsystem that uses frequency-domain feature extraction and classification is presented. By introducing an 8-bit SAR ADC with an ultra-low 50fF input capacitance, the power consumption of the front-end amplifier has been reduced to the single-digit nW level. In addition, serialized discrete Fourier transform (DFT) feature extraction is proposed for the digital back-end, replacing a conventional FFT that consumes more power and area.
PhD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/137157/1/seojeong_1.pd
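One way to read "serialized DFT" in the IoT thesis above is computing only the frequency bins that the classifier needs, one bin at a time through a single multiply-accumulate path, instead of a full FFT butterfly network. The Python sketch below shows that reading as a direct per-bin DFT; the bin selection and test tone are hypothetical, and the thesis's hardware serialization may differ.

```python
import cmath
import math
from typing import List, Sequence


def dft_bin(samples: Sequence[float], k: int) -> complex:
    """Bin k of the N-point DFT, accumulated one sample at a time."""
    n_total = len(samples)
    acc = 0j
    for n, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * k * n / n_total)
    return acc


def features(samples: Sequence[float], bins: Sequence[int]) -> List[float]:
    """Magnitudes of only the requested bins, computed serially, bin after bin."""
    return [abs(dft_bin(samples, k)) for k in bins]


N = 64
tone = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]  # tone in bin 5
print([round(m, 3) for m in features(tone, [4, 5, 6])])  # [0.0, 32.0, 0.0]
```

Each bin costs N multiply-accumulates, so this trades latency for a minimal datapath, which pays off when only a handful of bins are needed.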
Current-mode processing based Temperature-to-Digital Converters for MEMS applications
This thesis presents novel Temperature-to-Digital Converters (TDCs) designed and fabricated in CMOS technology. These integrated smart temperature-sensing circuits are widely employed in the Micro-Electro-Mechanical Systems (MEMS) field to mitigate the impact of ambient temperature on the performance of MEMS devices. In this context, increasingly stringent market demands have pushed the cost-effectiveness requirements for these compensation solutions ever higher, which translates directly into a requirement for ever more compact designs (< 0.1 mm²). In addition, since the great majority of systems whose thermal drift must be compensated are battery-powered, ultra-low energy per conversion (< 10 nJ) is another requirement of primary importance. This thesis provides a detailed description of two test chips (mas fuerte and es posible) designed with these goals in mind, which are the result of three years of research activity; for both devices, the conception, design, layout, and testing phases are described in detail and supported by simulation and measurement results.
A LPDDR4 MEMORY CONTROLLER DESIGN WITH EYE CENTER DETECTION ALGORITHM
Thesis (Ph.D.), Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2016. Suhwan Kim.
Demand is increasing for higher bandwidth with reduced power consumption in mobile memory. In this thesis, an LPDDR4 memory controller architecture that operates with LPDDR4 memory is proposed and designed, together with an efficient training algorithm suited to this architecture for memory training and verification.
The LPDDR4 memory specification covers operating speeds from 533Mbps to 4266Mbps, and the LPDDR4 memory controller is designed to support this range. The phase-locked loop in the LPDDR4 memory controller is designed to operate between 1333MHz and 2133MHz. To cover the range of the LPDDR4 memory, a selectable frequency divider provides the operating clock, so the output frequency of the phase-locked loop with the divider ranges from 266MHz to 2133MHz. The delay-locked loop in the LPDDR4 memory controller is designed to operate between 266MHz and 2133MHz with 180° phase locking. The delay-locked loop is used in every training operation, namely command training and data read and write training. Each training stage is completed using an eye center detection algorithm. The circuits for the proposed eye center detection algorithm, such as the delay line, phase interpolator, and reference generator, are designed and validated. The proposed 1x2y3x eye center detection algorithm is 23 times faster than the conventional two-dimensional eye center detection algorithm and is simple to implement.
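We read the name 1x2y3x as three successive one-dimensional scans (timing x, then reference voltage y, then timing x again); this reading is an assumption, not a description taken from the thesis. The Python sketch below uses a hypothetical diamond-shaped eye on a 64 × 40 grid and counts pass/fail probes: three line scans cost 2·NX + NY probes instead of NX·NY for a full two-dimensional scan, so the speedup depends entirely on the grid size and is not meant to reproduce the reported 23x figure.

```python
from typing import Callable, Tuple


def passing_window(probe: Callable[[int], bool], n: int) -> Tuple[int, int]:
    """Linear scan of settings 0..n-1; return the first and last passing ones."""
    hits = [i for i in range(n) if probe(i)]
    if not hits:
        raise ValueError("no passing point on this scan line")
    return hits[0], hits[-1]


def eye_center_1x2y3x(passes: Callable[[int, int], bool], nx: int, ny: int,
                      y_start: int) -> Tuple[int, int]:
    """Step 1: scan x at y_start. Step 2: scan y at that x-center.
    Step 3: rescan x at the new y-center. Return (x_center, y_center)."""
    lo, hi = passing_window(lambda x: passes(x, y_start), nx)
    x1 = (lo + hi) // 2
    lo, hi = passing_window(lambda y: passes(x1, y), ny)
    y2 = (lo + hi) // 2
    lo, hi = passing_window(lambda x: passes(x, y2), nx)
    return (lo + hi) // 2, y2


NX, NY = 64, 40  # hypothetical: 64 timing taps x 40 Vref steps
calls = 0


def passes(x: int, y: int) -> bool:
    """Hypothetical diamond-shaped eye centred at (32, 20)."""
    global calls
    calls += 1
    return abs(x - 32) * 12 + abs(y - 20) * 20 <= 240


center = eye_center_1x2y3x(passes, NX, NY, y_start=15)
print(center, calls, NX * NY)  # (32, 20) 168 2560
```

Starting off-center (y = 15) still converges here because the third scan re-centers x at the corrected Vref.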
Implemented in a 65nm CMOS process, the proposed LPDDR4 memory controller occupies 12mm². The LPDDR4 memory controller is verified with commodity LPDDR4 memory. The complete training sequence (power-on, initialization, boot-up, command training, write leveling, read training, and write training) is verified in this environment. The low voltage swing terminated logic driver and several other functions, including write leveling and data transmission, are verified at 4266Mbps, and operation of the entire LPDDR4 memory controller is verified from 566Mbps to 1600Mbps. The proposed eye center detection algorithm is verified from 566Mbps to 2843Mbps.
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 INTRODUCTION
1.3 THESIS ORGANIZATION
CHAPTER 2 LPDDR4 MEMORY CONTROLLER DESIGN
2.1 DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY
2.1.1 ARCHITECTURAL DIFFERENCE BETWEEN LPDDR3 AND LPDDR4 MEMORY
2.1.2 SOURCE SYNCHRONOUS MATCHED SCHEME AND UNMATCHED SCHEME
2.1.3 LOW VOLTAGE SWING TERMINATED LOGIC DRIVER AND TERMINATION SCHEME
2.2 LPDDR4 MEMORY CONTROLLER SPECIFICATION
2.3 DESIGN PROCEDURE
CHAPTER 3 LPDDR4 MEMORY CONTROLLER ARCHITECTURE BASED ON MEMORY TRAINING
3.1 LPDDR4 MEMORY TRAINING SEQUENCE
3.2 LPDDR4 MEMORY TRAINING EYE DETECTION ALGORITHM
3.2.1 EYE CENTER DETECTION
3.2.2 1X2Y3X EYE CENTER DETECTION ALGORITHM
3.3 LPDDR4 MEMORY CONTROLLER DESIGN BASED ON MEMORY TRAINING
3.3.1 ARCHITECTURE FOR MEMORY BOOT UP AND POWER UP
3.3.2 CLOCK PATH ARCHITECTURE AND CLOCK TREE
3.3.3 COMMAND TRAINING AND COMMAND PATH ARCHITECTURE
3.3.4 WRITE LEVELING AND DATA STROBE TRANSMISSION PATH ARCHITECTURE
3.3.5 READ TRAINING AND READ PATH ARCHITECTURE
3.3.6 WRITE TRAINING AND WRITE PATH ARCHITECTURE
3.3.7 NORMAL READ/WRITE OPERATION AND MARGIN TEST
CHAPTER 4 LPDDR4 MEMORY CONTROLLER ARCHITECTURE MODELING AND CIRCUIT DESIGN
4.1 OVERALL LPDDR4 MEMORY CONTROLLER ARCHITECTURE MODELING
4.2 SIMULATION RESULT OF LPDDR4 MEMORY CONTROLLER MODELING
4.3 LPDDR4 MEMORY CONTROLLER CIRCUIT DESIGN
4.3.1 PHASE-LOCKED LOOP
4.3.2 DELAY-LOCKED LOOP
4.3.3 TRANSMITTER OF LPDDR4 MEMORY CONTROLLER: WRITE PATH
4.3.4 DE-SERIALIZER WITH CLOCK DOMAIN CROSSING
CHAPTER 5 MEASUREMENT RESULT OF LPDDR4 MEMORY CONTROLLER
5.1 LPDDR4 MEMORY CONTROLLER MEASUREMENT SETUP
5.1.1 LPDDR4 MEMORY CONTROLLER FLOOR PLAN AND LAYOUT
5.1.2 PACKAGE AND TEST BOARD
5.2 LPDDR4 MEMORY CONTROLLER SUB-BLOCK MEASUREMENT
5.2.1 PHASE-LOCKED LOOP
5.2.2 DELAY-LOCKED LOOP
5.2.3 200PS AND 800PS DELAY LINE
5.2.4 VOLTAGE REFERENCE GENERATOR
5.2.5 PHASE INTERPOLATOR
5.3 LPDDR4 MEMORY SYSTEM OPERATION MEASUREMENT
CHAPTER 6 CONCLUSION
APPENDIX OPERATION FLOW CHART OF THE PROPOSED LPDDR4 MEMORY CONTROLLER
BIBLIOGRAPHY
KOREAN ABSTRACT
Approximation Opportunities in Edge Computing Hardware : A Systematic Literature Review
With the increasing popularity of the Internet of Things and massive Machine Type Communication technologies, the number of connected devices is rising. While these devices bring valuable benefits to our lives, bandwidth and latency constraints make it challenging to process the large amounts of data they generate in the Cloud. A promising solution to these challenges is the combination of Edge and approximate computing techniques, which allows data to be processed nearer to the user. This paper surveys the potential benefits of the intersection of these paradigms. We provide a state-of-the-art review of circuit-level and architecture-level hardware techniques and popular applications, and we outline essential future research directions.
Published version. Peer reviewed.
Toward realizing power scalable and energy proportional high-speed wireline links
Growing computational demand and the proliferation of cloud computing have placed high-speed serial links at center stage. Because energy efficiency improvements have saturated over the last five years, increasing data throughput comes at the cost of power consumption. Conventionally, serial link power is reduced by optimizing individual building blocks such as output drivers, receivers, or clock generation and distribution. However, this approach yields very limited efficiency improvements. This dissertation takes an alternative approach: instead of optimizing the power of individual building blocks, it reduces the power of the entire serial link by exploiting how applications actually use the link.
It has been demonstrated that serial links in servers are underutilized: on average they are used only 15% of the time, i.e., they are idle approximately 85% of the time. Conventional links consume power during idle periods to maintain synchronization between the transmitter and the receiver. However, by powering off the link when idle and powering it back on when needed, the power consumption of the serial link can be scaled in proportion to its utilization. This approach of rapid power state transitioning is known as the rapid-on/off approach. For rapid-on/off to be effective, the power-on time, off-state power, and power state transition energy should ideally all be close to zero. In practice, however, these ideal conditions are very difficult to achieve. The work presented in this dissertation addresses these challenges.
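The proportionality argument can be captured in a first-order Python model, assuming that average power is a utilization-weighted mix of on-state and off-state power plus an energy cost per on/off transition. It includes two of the three non-idealities named above, off-state power and transition energy; power-on time, which costs latency rather than power, is not modeled. Apart from the 15% utilization quoted in the text, all numbers are hypothetical.

```python
def avg_power_w(util: float, p_on_w: float, p_off_w: float,
                transitions_per_s: float = 0.0, e_transition_j: float = 0.0) -> float:
    """First-order average power of a rapid-on/off link at utilization util (0..1)."""
    if not 0.0 <= util <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return (util * p_on_w + (1.0 - util) * p_off_w
            + transitions_per_s * e_transition_j)


# Hypothetical link: 50 mW when on, 1 mW when off; 15 % utilization as in the text.
P_ON, P_OFF, UTIL = 50e-3, 1e-3, 0.15
always_on = P_ON  # a conventional link burns active power even when idle
rapid = avg_power_w(UTIL, P_ON, P_OFF)
print(f"always-on {always_on * 1e3:.1f} mW, rapid-on/off {rapid * 1e3:.2f} mW")
# always-on 50.0 mW, rapid-on/off 8.35 mW

# Hypothetical transition cost: one million on/off cycles per second at 1 nJ each.
with_transitions = avg_power_w(UTIL, P_ON, P_OFF,
                               transitions_per_s=1e6, e_transition_j=1e-9)
print(f"with transition energy {with_transitions * 1e3:.2f} mW")
```

The transition term shows why transition energy must approach zero: at fine-grained on/off rates it can rival the off-state power it is meant to save.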
When this research work started (2011-12), only a couple of research papers were available on rapid-on/off links, and no systematic study or design of rapid power state transitioning in serial links was available in the literature. Since rapid-on/off with nanosecond granularity is not part of any wireline communication standard, even popular test equipment does not support testing such a feature, and no formal measurement methodology was available. All these circumstances made the beginning difficult. However, these challenges provided a unique opportunity to explore new architectural techniques and identify trade-offs. The key contributions of this dissertation are as follows.
The first and foremost contribution is understanding the underlying limitations behind the saturating energy efficiency improvements in serial links, and why there is a compelling need to find alternative ways to reduce serial link power.
The second contribution is to identify potential power-saving techniques and evaluate the challenges they pose and the opportunities they present.
The third contribution is the design of a 5Gb/s transmitter with a rapid-on/off feature. The transmitter achieves rapid-on/off capability in the voltage-mode output driver by using a fast digital regulator, and in the clock multiplier by accurate frequency pre-setting and periodic reference insertion. To ease timing requirements, an improved edge replacement logic circuit for the clock multiplier is proposed. Mathematical modeling of the power-on time as a function of various circuit parameters is also discussed. The proposed transmitter demonstrates energy-proportional operation over wide variations in link utilization and is therefore suitable for energy-efficient links. Fabricated in 90nm CMOS technology, the voltage-mode driver and the clock multiplier achieve power-on times of only 2ns and 10ns, respectively. This dissertation highlights a key trade-off in the clock multiplier architecture: fast power-on-lock capability is achieved at the cost of jitter performance.
The fourth contribution is the design of a 7GHz rapid-on/off LC-PLL-based clock multiplier. The phase-locked loop (PLL) based multiplier was developed to overcome the limitations of the MDLL-based approach. The proposed temperature-compensated LC-PLL achieves power-on-lock in 1ns.
The fifth and biggest contribution of this dissertation is the design of a 7Gb/s embedded-clock transceiver that achieves rapid-on/off capability in the LC-PLL, the current-mode transmitter, and the receiver. It was the first reported design of a complete transceiver with an embedded clock architecture and rapid-on/off capability. A background phase calibration technique in the PLL and CDR phase calibration logic in the receiver enable instantaneous lock on power-on. The proposed transceiver demonstrates power scalability over a wide range of link utilization and therefore helps improve overall system efficiency. Fabricated in 65nm CMOS technology, the 7Gb/s transceiver achieves power-on-lock in less than 20ns. When the effective data rate (link utilization) changes by 100x (7Gb/s to 70Mb/s), the transceiver's power scales by 44x (63.7mW to 1.43mW) while its energy efficiency degrades by only 2.2x (9.1pJ/bit to 20.5pJ/bit).
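The transceiver figures can be cross-checked with simple arithmetic, since power divided by data rate gives energy per bit (1 mW per Gb/s equals 1 pJ/bit). The short Python check below uses only the numbers quoted in the text; the 1.43 mW point works out to about 20.4 pJ/bit, consistent with the reported 20.5 pJ/bit given that the power figure is rounded.

```python
def pj_per_bit(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit: 1 mW / 1 Gb/s = 1e-3 J/s / 1e9 b/s = 1 pJ/bit."""
    return power_mw / rate_gbps


full = pj_per_bit(63.7, 7.0)    # full rate, 7 Gb/s
low = pj_per_bit(1.43, 0.07)    # 1 % utilization, 70 Mb/s
print(f"{full:.1f} pJ/bit, {low:.1f} pJ/bit, "
      f"power x{63.7 / 1.43:.1f}, efficiency x{low / full:.2f}")
# 9.1 pJ/bit, 20.4 pJ/bit, power x44.5, efficiency x2.24
```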
The sixth and final contribution is the design of a temperature sensor that compensates for frequency drift caused by temperature variations during long power-off periods in the fast power-on-lock LC-PLL. The proposed self-referenced VCO-based temperature sensor is built entirely from digital logic gates and achieves low supply sensitivity, making it suitable for integration in processor and DRAM environments. The sensor directly converts temperature information to frequency and finally to digital bits. In a novel sensing technique, temperature information is acquired by creating a threshold voltage difference between the transistors used in the oscillators. Reduced supply sensitivity is achieved by employing junction capacitance, and the overhead of voltage regulators and an external ideal reference frequency is avoided. The effect of VCO phase noise on the sensor resolution is evaluated mathematically. Fabricated in a 65nm CMOS process, the prototype operates with supplies ranging from 0.85V to 1.1V and achieves a supply sensitivity of 0.034°C/mV and an inaccuracy of ±0.9°C and ±2.3°C from 0°C to 100°C after 2-point calibration, with and without static nonlinearity correction, respectively. It achieves a resolution of 0.3°C, a resolution FoM of 0.3 nJ·°C², and a measurement (conversion) time of 6.5μs.
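The resolution FoM quoted above is commonly defined as energy per conversion multiplied by the square of the resolution. Assuming that definition, the Python snippet below back-calculates the energy per conversion and average power implied by the reported FoM, resolution and conversion time; these derived values are estimates from rounded figures, not numbers reported in the dissertation.

```python
def resolution_fom(energy_per_conv_j: float, resolution_k: float) -> float:
    """Resolution FoM: energy per conversion times resolution squared (J*K^2)."""
    return energy_per_conv_j * resolution_k ** 2


FOM_J_K2 = 0.3e-9   # reported resolution FoM, 0.3 nJ*°C^2
RES_K = 0.3         # reported resolution; a temperature difference, so °C == K
T_CONV_S = 6.5e-6   # reported conversion time

energy_j = FOM_J_K2 / RES_K ** 2      # implied energy per conversion
power_w = energy_j / T_CONV_S         # implied average power while converting
print(f"{energy_j * 1e9:.2f} nJ/conversion, {power_w * 1e3:.2f} mW")
# 3.33 nJ/conversion, 0.51 mW
```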
Variation-Tolerant and Voltage-Scalable Integrated Circuits Design
Ultra-low-voltage (ULV) operation, in which the supply voltage of digital computing hardware is scaled down to near or below the transistor threshold voltage (e.g., 300-500mV), is a key technique for achieving high computing energy efficiency. It has enabled many exciting new applications in the field of Internet of Things (IoT) devices and in energy-constrained applications such as medical implants, environmental sensors, and micro-robots. ULV operation is also commonly used with emerging architectures, often of non-von Neumann style, to enable energy-efficient cognitive computing.
One of the biggest challenges in realizing ULV designs is the large variability of circuit delay. To guarantee functionality under worst-case process, voltage, and temperature (PVT) conditions, the traditional safety-margin approach requires operating at a slower clock frequency or a higher supply voltage, which significantly limits the achievable energy efficiency of the hardware. To fully realize the energy efficiency of ULV, the large circuit delay variation must be handled adaptively. However, existing adaptive techniques, which are optimized for nominal-supply-voltage operation and traditional von Neumann architectures, become inefficient for ULV designs and emerging architectures.
This thesis presents adaptive techniques based on timing error detection and correction (EDAC) that are better suited to energy-constrained ULV designs and emerging architectures. The proposed techniques are demonstrated in three test chips: (1) R-Processor, a 0.4V resilient processor with a voltage-scalable, low-overhead in-situ EDAC technique, which achieves a 38% energy efficiency improvement or a 2.3X throughput improvement compared with the traditional safety-margin approach; (2) a 450mV timing-margin-free waveform sorter for a brain-computer interface (BCI) microsystem, which achieves 49.3% higher energy efficiency and 35.6% higher throughput than the traditional safety-margin approach; and (3) an ultra-low-power, robust power-management system consisting of a microprocessor employing ULV EDAC, a 63-ratio integrated switched-capacitor DC-DC converter, and a fully digital error-based regulation controller.
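The abstract does not describe how the error-based regulation controller decides; the Python class below is a hypothetical bang-bang policy, shown only to illustrate the general idea of closing the supply loop on EDAC error counts instead of a worst-case margin. The 63 levels mirror the 63-ratio converter, but the mapping of levels to ratios, the settle count, the error threshold and the simulated plant are all assumptions.

```python
class ErrorRegulator:
    """Hypothetical bang-bang controller for an error-based voltage loop.

    Each control interval reports how many timing errors the EDAC flagged.
    Errors above `max_errors` raise the converter level (higher supply);
    `settle` consecutive clean intervals lower it by one level.
    """

    def __init__(self, level: int, n_levels: int = 63,
                 max_errors: int = 0, settle: int = 4):
        self.level = level
        self.n_levels = n_levels
        self.max_errors = max_errors
        self.settle = settle
        self._clean = 0

    def step(self, errors: int) -> int:
        if errors > self.max_errors:
            self.level = min(self.level + 1, self.n_levels - 1)
            self._clean = 0
        else:
            self._clean += 1
            if self._clean >= self.settle:
                self.level = max(self.level - 1, 0)
                self._clean = 0
        return self.level


# Hypothetical plant: timing errors appear whenever the level drops below 20.
CRITICAL_LEVEL = 20
reg = ErrorRegulator(level=40)
trace = []
for _ in range(200):
    errors = 3 if reg.level < CRITICAL_LEVEL else 0
    trace.append(reg.step(errors))
print(min(trace[-20:]), max(trace[-20:]))  # 19 20: dithers at the error edge
```

In steady state this sketch dithers by one level around the point where errors begin, relying on error detection and correction to absorb the occasional failing interval.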
In this thesis, we also explore circuits for emerging techniques. The first is temperature sensors for dynamic thermal management (DTM). Modern high-performance microprocessors suffer from ever-increasing power densities, which have led to reliability concerns and increased cooling costs due to excessive heat. To monitor and manage thermal behavior, DTM techniques embed multiple temperature sensors and use their readings. The size, accuracy, and voltage scalability of the sensor are critical to DTM performance. We therefore propose a temperature sensor that directly senses the transistor threshold voltage; the test chip demonstrates 9X smaller area, 3X higher accuracy, and operation at a 200mV lower supply (down to 400mV) compared with the previous state of the art.
Another area of exploration is interconnect design for ultra-dynamic-voltage-scaling (UDVS) systems. UDVS has been proposed for applications that require both high performance and high energy efficiency: a UDVS system provides peak performance at the nominal supply voltage when the workload is high, and switches to ULV operation for higher energy efficiency when the workload is moderate or low. One of the critical challenges in developing UDVS systems is the inflexibility of circuit fabrics that are often optimized for a single supply voltage. A key example is conventional repeater-based long interconnects, which suffer from non-optimal performance and energy efficiency in UDVS systems. Therefore, in this thesis, we propose a reconfigurable interconnect design based on regenerators and demonstrate near-optimal performance and energy efficiency across supply voltages from 0.3V to 1V.
Portable Computer Technology (PCT) Research and Development Program Phase 2
This project report focuses on: (1) the design and development of two Advanced Portable Workstation 2 (APW 2) units, which incorporate advanced technology features such as a low-power Pentium processor, a high-resolution color display, National Television System Committee (NTSC) video handling capabilities, a Personal Computer Memory Card International Association (PCMCIA) interface, and Small Computer System Interface (SCSI) and Ethernet interfaces; (2) the use of these units to integrate and demonstrate advanced wireless network and portable video capabilities; and (3) qualification of the APW 2 systems for use in specific experiments aboard the Mir Space Station. A major objective of the PCT Phase 2 program was to help guide future choices of computing platforms and techniques for meeting National Aeronautics and Space Administration (NASA) mission objectives, with a focus on developing optimal configurations of computing hardware, software applications, and network technologies for use on NASA missions.
CMOS SPAD-based image sensor for single photon counting and time of flight imaging
The ability to capture the arrival of a single photon is the fundamental limit of detection of quantised electromagnetic radiation. An image sensor capable of capturing a picture with this ultimate optical and temporal precision is the pinnacle of photo-sensing. High spatial resolution, single-photon-sensitive, time-resolved image sensors in complementary metal oxide semiconductor (CMOS) technology offer numerous benefits across a wide range of applications. These CMOS devices will be suitable to replace high-sensitivity charge-coupled device (CCD) technology (electron-multiplied or electron-bombarded) at significantly lower cost and with comparable performance in low-light or high-speed scenarios. For example, with temporal resolution on the order of nanoseconds to picoseconds, detailed three-dimensional (3D) pictures can be formed by measuring the time of flight (TOF) of a light pulse. High-frame-rate imaging of single photons can yield new capabilities in super-resolution microscopy, and the imaging of quantum effects such as the entanglement of photons may also be realised.
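Time-of-flight depth follows from the round-trip travel time of the light pulse, d = c·t/2, which is why picosecond-scale timing is needed for centimetre-scale depth. The Python sketch below shows that relation; the 4 ns and 1 cm inputs are illustrative values, not figures from the thesis.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def depth_m(round_trip_s: float) -> float:
    """Target distance from the round-trip time of a light pulse."""
    return C_M_PER_S * round_trip_s / 2.0


def timing_precision_s(depth_precision_m: float) -> float:
    """Round-trip timing precision needed to resolve a given depth step."""
    return 2.0 * depth_precision_m / C_M_PER_S


print(f"{depth_m(4e-9):.3f} m")                     # 0.600 m for a 4 ns round trip
print(f"{timing_precision_s(0.01) * 1e12:.1f} ps")  # 66.7 ps per 1 cm of depth
```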
The goal of this research project is the development of such an image sensor by exploiting single photon avalanche diodes (SPADs) in an advanced imaging-specific 130nm front side illuminated (FSI) CMOS technology. SPADs combine three key advantages over other imaging technologies: single photon sensitivity, picosecond temporal resolution, and the ability to be integrated in standard CMOS technology. Analogue techniques are employed to create an efficient and compact imager that is scalable to mega-pixel arrays. A SPAD-based image sensor is described with 320 by 240 pixels at a pitch of 8μm and an optical efficiency, or fill-factor, of 26.8%. Each pixel comprises a SPAD with a hybrid analogue counting and memory circuit that makes novel use of a low-power charge transfer amplifier. Global shutter single photon counting images are captured. These exhibit photon-shot-noise-limited statistics with unprecedentedly low input-referred noise, equivalent to 0.06 electrons.
The CMOS image sensor (CIS) trends of shrinking pixels, increasing array sizes, decreasing read noise, fast readout, and oversampled image formation are projected towards binary single photon imagers, or quanta image sensors (QIS). In a binary digital image capture mode, the image sensor offers a preview of the properties and performance of future QISs, with a readout of 20,000 binary frames per second at a bit error rate of 1.7 × 10⁻³. The bit density (cumulative binary intensity) of this image sensor plotted against exposure follows the shape of the famous Hurter and Driffield densitometry curves of photographic film. Oversampled time-gated binary image capture is demonstrated, capturing 3D TOF images with 3.8cm precision over a 60cm range.
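The film-like response has a simple origin under idealized assumptions (Poisson photon arrivals, a one-photon threshold, no dark counts, no quantum-efficiency loss): a binary pixel reads 1 in a frame with probability D(H) = 1 − exp(−H), where H is the mean number of photons per pixel per frame. Plotted against log H, this rises in an S-shape with a toe and a shoulder, much like a Hurter and Driffield curve. The Python sketch below compares that expression with a small Monte Carlo simulation; the exposure values and frame count are hypothetical, and the model ignores the sensor's measured bit error rate.

```python
import math
import random


def expected_bit_density(h: float) -> float:
    """P(pixel fires) per frame: at least one photon for Poisson mean h."""
    return 1.0 - math.exp(-h)


def simulated_bit_density(h: float, n_frames: int, rng: random.Random) -> float:
    """Fraction of binary frames in which the pixel fires.

    With Poisson arrivals at rate h per frame, the pixel fires if the first
    arrival time (exponential with rate h) falls inside the unit-length frame.
    """
    fired = sum(1 for _ in range(n_frames) if rng.expovariate(h) < 1.0)
    return fired / n_frames


rng = random.Random(1)
for h in (0.1, 1.0, 3.0):
    print(h, round(expected_bit_density(h), 3),
          round(simulated_bit_density(h, 20_000, rng), 3))
```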