29 research outputs found
Interconnect and Memory Design for Intelligent Mobile System
Technology scaling has driven the transistor to a smaller area, higher performance and lower power consuming which leads us into the mobile and edge computing era. However, the benefits of technology scaling are diminishing today, as the wire delay and energy scales far behind that of the logics, which makes communication more expensive than computation. Moreover, emerging data centric algorithms like deep learning have a growing demand on SRAM capacity and bandwidth. High access energy and huge leakage of the large on-chip SRAM have become the main limiter of realizing an energy efficient low power smart sensor platform. This thesis presents several architecture and circuit solutions to enable intelligent mobile systems, including voltage scalable interconnect scheme, Compute-In-Memory (CIM), low power memory system from edge deep learning processor and an ultra-low leakage stacked voltage domain SRAM for low power smart image signal processor (ISP). Four prototypes are implemented for demonstration and verification. The first two seek the solutions to the slow and high energy global on-chip interconnect: the first prototype proposes a reconfigurable self-timed regenerator based global interconnect scheme to achieve higher performance and energy-efficiency in wide voltage range, while the second one presents a non Von Neumann architecture, a hybrid in-/near-memory Compute SRAM (CRAM), to address the locality issue. The next two works focus on low-power low-leakage SRAM design for Intelligent sensors. The third prototype is a low power memory design for a deep learning processor with 270KB custom SRAM and Non-Uniform Memory Access architecture. The fourth prototype is an ultra-low leakage SRAM for motion-triggered low power smart imager sensor system with voltage domain stacking and a novel array swapping mechanism. The work presented in this dissertation exploits various optimizations in both architecture level (exploiting temporal and spatial locality) and circuit customization to overcome the main challenges in making extremely energy-efficient battery-powered intelligent mobile devices. The impact of the work is significant in the era of Internet-of-Things (IoT) and the age of AI when the mobile computing systems get ubiquitous, intelligent and longer battery life, powered by these proposed solutions.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/155232/1/jiwang_1.pd
Recommended from our members
Variation-Tolerant and Voltage-Scalable Integrated Circuits Design
Ultra-low-voltage (ULV) operation where the supply voltage of the digital computing hardware is scaled down to the level near or below transistor threshold voltage (e.g. 300-500mV) is a key technique to achieve high computing energy efficiency. It has enabled many new exciting applications in the field of Internet of Things (IoT) devices and energy-constrained applications such as medical implants, environment sensors, and micro-robots. Ultra-low-voltage (ULV) operation is also commonly used with the emerging architectures that are often non Von-Neumann style to empower energy-efficient cognitive computing.
One the biggest challenge in realizing ULV design is the large circuit delay variability. To guarantee functionality in the worst-case process, voltage, and temperature (PVT) condition, the traditional safety margin approach requires operating at a slower clock frequency or higher supply voltage which significantly limits the achievable energy efficiency of the hardware. To fully claim the energy efficiency of ULV, the large circuit delay variation needs to be adaptively handled. However, the existing adaptive techniques that are optimized for nominal supply voltage operation and traditional Von-Neumann architectures become inefficient for ULV designs and emerging architectures.
This thesis presents adaptive techniques based on timing error detection and correction (EDAC) that are more suitable for the energy-constrained ULV designs and the emerging architectures. The proposed techniques are demonstrated in three test chips: (1) R-Processor: A 0.4V resilient processor with a voltage-scalable and low-overhead in-situ EDAC technique. It achieves 38% energy efficiency improvement or 2.3X throughput improvement as compared to the traditional safety margin approach. (2) A 450mV timing-margin-free waveform sorter for brain computer interface (BCI) microsystem. It achieves 49.3% higher energy efficiency and 35.6% higher throughput than the traditional safety margin approach. (3) Ultra-low-power and robust power-management system which consists of a microprocessor employing ULV EDAC, 63-ratio integrated switched-capacitor DC-DC converter, and a fully-digital error based regulation controller.
In this thesis, we also explore circuits for emerging techniques. The first is temperature sensors for dynamic-thermal-management (DTM). The modern high-performance microprocessors suffer from ever-increasing power densities which has led to reliability concerns and increased cooling costs from excessive heat. In order to monitor and manage the thermal behavior, DTM techniques embed multiple temperature sensors and use its information. The size, accuracy, and voltage-scalability of the sensor are critical for the performance of DTM. Therefore, we propose a temperature sensor that directly senses transistor threshold voltage and the test chip demonstrates 9X smaller area, 3X higher accuracy, and 200mV lower voltage scalability (down to 400mV) than the previous state-of-art.
Another area of exploration is interconnect design for ultra-dynamic-voltage-scaling (UDVS) systems. UDVS has been proposed for applications that require both high performance and high energy efficiency. UDVS can provide peak performance with nominal supply voltage when work load is high. When work load is moderate or low, UDVS systems can switch to ULV operation for higher energy efficiency. One of the critical challenges for developing UDVS systems is the inflexibility in various circuit fabrics that are often optimized for a single supply voltage. One critical example is conventional repeater based long interconnects which suffers from non-optimal performance and energy efficiency in UDVS systems. Therefore, in this thesis, we propose a reconfigurable interconnect design based on regenerators and demonstrate near optimal performance and energy efficiency across the supply voltage of 0.3V and 1V
High-Speed and Low-Energy On-Chip Communication Circuits.
Continuous technology scaling sharply reduces transistor delays, while fixed-length global wire delays have increased due to less wiring pitch with higher resistance and coupling capacitance. Due to this ever growing gap, long on-chip interconnects pose well-known latency, bandwidth, and energy challenges to high-performance VLSI systems. Repeaters effectively mitigate wire RC effects but do little to improve their energy costs. Moreover, the increased complexity and high level of integration requires higher wire densities, worsening crosstalk noise and power consumption of conventionally repeated interconnects.
Such increasing concerns in global on-chip wires motivate circuits to improve wire performance and energy while reducing the number of repeaters. This work presents circuit techniques and investigation for high-performance and energy-efficient on-chip communication in the aspects of encoding, data compression, self-timed current injection, signal pre-emphasis, low-swing signaling, and technology mapping. The improved bus designs also consider the constraints of robust operation and performance/energy gains across process corners and design space. Measurement results from 5mm links on 65nm and 90nm prototype chips validate 2.5-3X improvement in energy-delay product.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/75800/1/jseo_1.pd
Designing energy-efficient sub-threshold logic circuits using equalization and non-volatile memory circuits using memristors
The very large scale integration (VLSI) community has utilized aggressive complementary metal-oxide semiconductor (CMOS) technology scaling to meet the ever-increasing performance requirements of computing systems. However, as we enter the nanoscale regime, the prevalent process variation effects degrade the CMOS device reliability. Hence, it is increasingly essential to explore emerging technologies which are compatible with the conventional CMOS process for designing highly-dense memory/logic circuits. Memristor technology is being explored as a potential candidate in designing non-volatile memory arrays and logic circuits with high density, low latency and small energy consumption. In this thesis, we present the detailed functionality of multi-bit 1-Transistor 1-memRistor (1T1R) cell-based memory arrays. We present the performance and energy models for an individual 1T1R memory cell and the memory array as a whole. We have considered TiO2- and HfOx-based memristors, and for these technologies there is a sub-10% difference between energy and performance computed using our models and HSPICE simulations. Using a performance-driven design approach, the energy-optimized TiO2-based RRAM array consumes the least write energy (4.06 pJ/bit) and read energy (188 fJ/bit) when storing 3 bits/cell for 100 nsec write and 1 nsec read access times. Similarly, HfOx-based RRAM array consumes the least write energy (365 fJ/bit) and read energy (173 fJ/bit) when storing 3 bits/cell for 1 nsec write and 200 nsec read access times.
On the logic side, we investigate the use of equalization techniques to improve the energy efficiency of digital sequential logic circuits in sub-threshold regime. We first propose the use of a variable threshold feedback equalizer circuit with combinational logic blocks to mitigate the timing errors in digital logic designed in sub-threshold regime. This mitigation of timing errors can be leveraged to reduce the dominant leakage energy by scaling supply voltage or decreasing the propagation delay. At the fixed supply voltage, we can decrease the propagation delay of the critical path in a combinational logic block using equalizer circuits and, correspondingly decrease the leakage energy consumption. For a 8-bit carry lookahead adder designed in UMC 130 nm process, the operating frequency can be increased by 22.87% (on average), while reducing the leakage energy by 22.6% (on average) in the sub-threshold regime. Overall, the feedback equalization technique provides up to 35.4% lower energy-delay product compared to the conventional non-equalized logic. We also propose a tunable adaptive feedback equalizer circuit that can be used with sequential digital logic to mitigate the process variation effects and reduce the dominant leakage energy component in sub-threshold digital logic circuits. For a 64-bit adder designed in 130 nm our proposed approach can reduce the normalized delay variation of the critical path delay from 16.1% to 11.4% while reducing the energy-delay product by 25.83% at minimum energy supply voltage. In addition, we present detailed energy-performance models of the adaptive feedback equalizer circuit. This work serves as a foundation for the design of robust, energy-efficient digital logic circuits in sub-threshold regime
Demonstrating effective all-optical processing in ultrafast data networks using semiconductor optical amplifiers
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references.The demand for bandwidth in worldwide data networks continues to increase due to growing Internet use and high-bandwidth applications such as video. All-optical signal processing is one promising technique for providing the necessary capacity and offers payload transparency, power consumption which scales efficiently with increasing bit rates, reduced processing latency, and ultrafast performance. In this thesis, we focus on using semiconductor optical amplifier-based logic gates to address both routing and regeneration needs in ultrafast data networks. To address routing needs, we demonstrate a scalable, multi-packet all-optical header processing unit operating at a line rate of 40 Gb/s. For this experiment, we used the ultrafast nonlinear interferometer (UNI) gate, a discrete optical logic gate which has been demonstrated at speeds of 100 Gb/s for bit-wise switching. However, for all-optical switching to become a reality, integration is necessary to significantly reduce the cost of manufacturing, installation, and operation. One promising integrated all-optical logic gate is the semiconductor optical amplifier Mach-Zehnder interferometer (SOA-MZI). This gate has previously been demonstrated capable of up to 80 Gb/s bit-wise switching operation. To enable simple installation and operation of this gate, we developed a performance optimization method which can quickly and accurately pinpoint the optimal operating point of the switch. This eliminates the need for a time-intensive search over a large parameter space and significantly simplifies the operation of the switch. With this method, we demonstrate the ability of a single SOA-MZI logic gate to regenerate ultrafast pulses over 100 passes and 10,000 km in a regenerative loop. Ultimately, all-optical logic gates must be integrated on a single low-cost platform and demonstrated in cascaded, multi-gate operation for increased functionality.(cont.) This requires low-loss monolithic integration. Our approach to this involves an asymmetric twin waveguide (ATG) design. This design also has the potential for high-yields as a result of a high tolerance for fabrication errors. We present our characterization results of ATG waveguides and proposals for future improvements.by Jade P. Wang.Ph.D
A 2.5 Gb/s SONET clock and data recovery macro cell
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.Includes bibliographical references (leaves 109-110).by Andrew Holden Dickson.M.S
Recommended from our members
Collective Pulse Dynamics: A New Timing Circuit Strategy
This work presents a novel CMOS behavior of self stabilization of ring oscillators using collective dynamics. It shows that phase error correction can occur in ring oscillators over multiple cycles without an external reference via the collective dynamics of pulses. In time domain this shows up as timing stability improvement in oscillators. Different timing stability metrics were analyzed to determine the correct methodology to analyze this stability improvement. Behavioral models were made to capture the effects of local dynamics and its collective effects. These models were shown to have a good correlation with the HSPICE circuit simulations and measured values. Multiple oscillator topologies and architectures were fabricated to test the model, behavior and subsequent analysis. Different pulse amplifier based ring oscillators show trends similar to that predicted by simulations and empirical relationships developed using behavioral simulations. Further transmission line stabilized traveling wave version of the pulse oscillators show a higher stability improvement. This work opens up a new design space in the timing circuits design where all the other conventional tricks are still applicable. It also opens up an application space due to the timing stability improvement in the order of 1000 cycles where conventional ADC’s and TDC’s work. Finally this work eases up the constraints of loop filter and source phase noise when oscillators are operated in a phase locked loop