55 research outputs found
Ultra low-power fault-tolerant SRAM design in 90nm CMOS technology
With the increment of mobile, biomedical and space applications, digital systems with
low-power consumption are required. As a main part in digital systems, low-power memories are
especially desired. Reducing the power supply voltages to sub-threshold region is one of the
effective approaches for ultra low-power applications. However, the reduced Static Noise
Margin (SNM) of Static Random Access Memory (SRAM) imposes great challenges to the subthreshold SRAM design. The conventional 6-transistor SRAM cell does not function properly at sub-threshold supply voltage range because it has no enough noise margin for reliable operation. In order to achieve ultra low-power at sub-threshold operation, previous research work has demonstrated that the read and write decoupled scheme is a good solution to the reduced SNM problem. A Dual Interlocked Storage Cell (DICE) based SRAM cell was proposed to eliminate the drawback of conventional DICE cell during read operation. This cell can mitigate the singleevent effects, improve the stability and also maintain the low-power characteristic of subthreshold SRAM, In order to make the proposed SRAM cell work under different power supply voltages from 0.3 V to 0.6 V, an improved replica sense scheme was applied to produce a reference control signal, with which the optimal read time could be achieved. In this thesis, a 2K ~8 bits SRAM test chip was designed, simulated and fabricated in 90nm CMOS technology provided by ST Microelectronics. Simulation results suggest that the operating frequency at VDD = 0.3 V is up to 4.7 MHz with power dissipation 6.0 ÆĂW, while it is 45.5 MHz at VDD = 0.6 V dissipating 140 ÆĂW. However, the area occupied by a single cell is larger than that by conventional SRAM due to additional transistors used. The main contribution of this thesis project is that we proposed a new design that could simultaneously solve the ultra low-power and radiation-tolerance problem in large capacity memory design
Recommended from our members
Variation-Tolerant and Voltage-Scalable Integrated Circuits Design
Ultra-low-voltage (ULV) operation where the supply voltage of the digital computing hardware is scaled down to the level near or below transistor threshold voltage (e.g. 300-500mV) is a key technique to achieve high computing energy efficiency. It has enabled many new exciting applications in the field of Internet of Things (IoT) devices and energy-constrained applications such as medical implants, environment sensors, and micro-robots. Ultra-low-voltage (ULV) operation is also commonly used with the emerging architectures that are often non Von-Neumann style to empower energy-efficient cognitive computing.
One the biggest challenge in realizing ULV design is the large circuit delay variability. To guarantee functionality in the worst-case process, voltage, and temperature (PVT) condition, the traditional safety margin approach requires operating at a slower clock frequency or higher supply voltage which significantly limits the achievable energy efficiency of the hardware. To fully claim the energy efficiency of ULV, the large circuit delay variation needs to be adaptively handled. However, the existing adaptive techniques that are optimized for nominal supply voltage operation and traditional Von-Neumann architectures become inefficient for ULV designs and emerging architectures.
This thesis presents adaptive techniques based on timing error detection and correction (EDAC) that are more suitable for the energy-constrained ULV designs and the emerging architectures. The proposed techniques are demonstrated in three test chips: (1) R-Processor: A 0.4V resilient processor with a voltage-scalable and low-overhead in-situ EDAC technique. It achieves 38% energy efficiency improvement or 2.3X throughput improvement as compared to the traditional safety margin approach. (2) A 450mV timing-margin-free waveform sorter for brain computer interface (BCI) microsystem. It achieves 49.3% higher energy efficiency and 35.6% higher throughput than the traditional safety margin approach. (3) Ultra-low-power and robust power-management system which consists of a microprocessor employing ULV EDAC, 63-ratio integrated switched-capacitor DC-DC converter, and a fully-digital error based regulation controller.
In this thesis, we also explore circuits for emerging techniques. The first is temperature sensors for dynamic-thermal-management (DTM). The modern high-performance microprocessors suffer from ever-increasing power densities which has led to reliability concerns and increased cooling costs from excessive heat. In order to monitor and manage the thermal behavior, DTM techniques embed multiple temperature sensors and use its information. The size, accuracy, and voltage-scalability of the sensor are critical for the performance of DTM. Therefore, we propose a temperature sensor that directly senses transistor threshold voltage and the test chip demonstrates 9X smaller area, 3X higher accuracy, and 200mV lower voltage scalability (down to 400mV) than the previous state-of-art.
Another area of exploration is interconnect design for ultra-dynamic-voltage-scaling (UDVS) systems. UDVS has been proposed for applications that require both high performance and high energy efficiency. UDVS can provide peak performance with nominal supply voltage when work load is high. When work load is moderate or low, UDVS systems can switch to ULV operation for higher energy efficiency. One of the critical challenges for developing UDVS systems is the inflexibility in various circuit fabrics that are often optimized for a single supply voltage. One critical example is conventional repeater based long interconnects which suffers from non-optimal performance and energy efficiency in UDVS systems. Therefore, in this thesis, we propose a reconfigurable interconnect design based on regenerators and demonstrate near optimal performance and energy efficiency across the supply voltage of 0.3V and 1V
Thermal-Aware Networked Many-Core Systems
Advancements in IC processing technology has led to the innovation and growth happening in the consumer electronics sector and the evolution of the IT infrastructure supporting this exponential growth. One of the most difficult obstacles to this growth is the removal of large amount of heatgenerated by the processing and communicating nodes on the system. The scaling down of technology and the increase in power density is posing a direct and consequential effect on the rise in temperature. This has resulted in the increase in cooling budgets, and affects both the life-time reliability and performance of the system. Hence, reducing on-chip temperatures has become a major design concern for modern microprocessors.
This dissertation addresses the thermal challenges at different levels for both 2D planer and 3D stacked systems. It proposes a self-timed thermal monitoring strategy based on the liberal use of on-chip thermal sensors. This makes use of noise variation tolerant and leakage current based thermal sensing for monitoring purposes. In order to study thermal management issues from early design stages, accurate thermal modeling and analysis at design time is essential. In this regard, spatial temperature profile of the global Cu nanowire for on-chip interconnects has been analyzed. It presents a 3D thermal model of a multicore system in order to investigate the effects of hotspots and the placement of silicon die layers, on the thermal performance of a modern ip-chip package. For a 3D stacked system, the primary design goal is to maximise the performance within the given power and thermal envelopes. Hence, a thermally efficient routing strategy for 3D NoC-Bus hybrid architectures has been proposed to mitigate on-chip temperatures by herding most of the switching activity to the die which is closer to heat sink. Finally, an exploration of various thermal-aware placement approaches for both the 2D and 3D stacked systems has been presented. Various thermal models have been developed and thermal control metrics have been extracted. An efficient thermal-aware application mapping algorithm for a 2D NoC has been presented. It has been shown that the proposed mapping algorithm reduces the effective area reeling under high temperatures when compared to the state of the art.Siirretty Doriast
Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices
This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results
Manufacturability Aware Design.
The aim of this work is to provide solutions that optimize the tradeoffs among design, manufacturability, and cost of ownership posed by technology scaling and sub-wavelength lithography. These solutions may take the form of robust circuit designs, cost-effective resolution technologies, accurate modeling considering process variations, and design rules assessment.
We first establish a framework for assessing the impact of process variation on circuit performance, product value and return on investment on alternative processes. Key features include comprehensive modeling and different handling on die-to-die and within-die variation, accurate models of correlations of variation, realistic and quantified projection to future process nodes, and performance sensitivity analysis to improved control of individual device parameter and variation sources.
Then we describe a novel minimum cost of correction methodology which determines the level of correction of each layout feature such that the prescribed parametric yield is attained with minimum RET (Resolution Enhancement Technology) cost. This timing driven OPC (Optical Proximity Correction) insertion flow uses a mathematical programming based slack budgeting algorithm to determine OPC level for all polysilicon gate geometries. Designs adopting this methodology show up to 20% MEBES (Manufacturing Electron Beam Exposure System) data volume reduction and 39% OPC runtime improvement.
When the systematic correction residual errors become unavoidable, we analyze their impact on a state-of-art microprocessor's speedpath skew. A platform is created for diagnosing and improving OPC quality on gates with specific functionality such as critical gates or matching transistors. Significant changes in full-chip timing analysis indicate the necessity of a post-OPC performance verification design flow.
Finally, we quantify the performance, manufacturability and mask cost impact of globally applying several common restrictive design rules. Novel approaches such as locally adapting FDRs (flexible design rules) based on image parameters range, and DRC Plus (preferred design rule enforcement with 2D pattern matching) are also described.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/57676/2/jiey_1.pd
Application aware performance, power consumption, and reliability tradeoff
There has been an unprecedented increase in the drive for microprocessor performance. This drive is motivated by the increase in software complexity, opportunity to solve previously unattempted problems especially in scientific domain, and a need to crunch the ever growing `Big Data\u27 to enable a multitude of technological advances to benefit mankind. A consequence of these phenomena is the ever increasing transistor count in deployed computing systems.
Although technology scaling leads to lower power consumption per transistor, the overall system level power consumption is on the rise. This leads to a variety of power supply related issues. As the chip die area is not increasing significantly, and the supply voltage reduction is not keeping on par with the reduction in device dimensions, an increase in power density is observed. This manifests as an increased temperature profile on the chip floorplan. A rise in temperature necessitates aggressive and costly cooling mechanisms adding to the design complexity and manufacturing efforts. It also triggers various failure mechanisms leading to reduction in the expected chip lifetime/reliability.
Given the conflicting trends in Performance, Power consumption, and chip Reliability (PPR), it is imperative to balance them in a fine-grained fashion to meet system level goals and expectations. Sole dependence on the advancements in manufacturing technology is no longer sufficient. Alternate venues for PPR management are being increasingly paid attention to.
On the other hand, the PPR demands are usually time dependent. For example, the constraint on power consumption in a green data center is dictated by the energy reserve. The demand on performance in a cloud based platform depends on the agreed Quality of Service (QOS) requirements. The reliability of a microprocessor is dependent on the deployment domain.
The goal of our research is to address the issue of growing microprocessor power consumption subject to performance and/or reliability goals. Through our developed schemes, we tailor the execution context to match application requirements. This leads to judicious use of power while adhering to aforementioned constraints. It is to be noted that the actual demands on performance, power consumption, and reliability are highly variant, and depend upon executing applications and operating conditions. As such, we develop schemes to cater to these variant demands.
To meet these demands efficiently, the solutions developed are tailored to current hardware-software interaction characteristics. Two techniques that are very relevant in this area, namely dynamic voltage and frequency scaling (DVFS) and microarchitectural adaptation, are leveraged to produce expected PPR characteristics when executing a wide variety of tasks.
In this dissertation, we demonstrate how the expected chip lifetime can be augmented in a real-time setting using DVFS while paying heed to performance constraints modeled as QoS requirements. Individual tasks in a task queue are assigned specific voltage and frequency pairs to utilize for their execution. This assignment is empowered by knowledge of application-wise hardware-software interactions to reach solutions that are tailored to the current execution scenario. Our observations indicate that a 2 to 18 fold improvement in chip lifetime can be expected by the utilization of the schemes we develop in this regard. Capitalizing on the power of microarchitectural adaptation, we further improve chip lifetime expectations 2-8 times, based on the failure mechanism investigated. This increase in expected chip lifetime directly translates to reduction of both operational and replacement costs.
We also provide mechanisms to co-manage performance and power consumption constraints. Comprehensive microarchitectural adaptation space is very complex and its usage thus leads to significant runtime overhead. To tackle this, we devote a fair bit of attention to its pruning so as to narrow down on and utilize only the most effective adaptations. A two stage adaptation process is provided to a) improve optimality of the solutions delivered, and b) to keep the runtime overhead in check. We observe that our schemes provide 20\% higher normalized energy efficiency compared to the state of the art techniques proposed, while using just a very small fraction of the configuration space. We also find that our schemes effectively cater to a wide variety of demands on performance and power consumption, providing the necessary hardware characteristics within 10\% bound.
Since only the most useful configuration space is retained for adaptation, occurrence of a fault that prohibits the usage of a certain adaptive control can lead to the inability to satisfy a subset of hardware demands. A detailed analysis has been carried out to understand how the remaining active configurations can preserve the expected hardware behavior. To a good extent, we observe that the system behavior under a failure closely tracks (with less than 5\% tracking error) the obtainable behavior without the presence of the fault.
We believe that application tailored schemes for PPR management become increasingly relevant as the microprocessor design advancements saturate in the future. They will be extremely relevant to extract every possible ounce of performance while confirming to constraints on power consumption and reliability. Given the effectiveness of our schemes, we are confident that such schemes are applicable in different markets like embedded computing, desktop computing, cloud platforms and high performance computing. Insights drawn from our research will guide chip designers in the provision of effective adaptive controls to combat increasing demands on PPR characteristics
Design-for-delay-testability techniques for high-speed digital circuits
The importance of delay faults is enhanced by the ever increasing clock rates and decreasing geometry sizes of nowadays' circuits. This thesis focuses on the development of Design-for-Delay-Testability (DfDT) techniques for high-speed circuits and embedded cores. The rising costs of IC testing and in particular the costs of Automatic Test Equipment are major concerns for the semiconductor industry. To reverse the trend of rising testing costs, DfDT is\ud
getting more and more important
Predicting power scalability in a reconfigurable platform
This thesis focuses on the evolution of digital hardware systems. A reconfigurable platform is proposed and analysed based on thin-body, fully-depleted silicon-on-insulator Schottky-barrier transistors with metal gates and silicide source/drain (TBFDSBSOI). These offer the potential for simplified processing that will allow them to reach ultimate nanoscale gate dimensions. Technology CAD was used to show that the threshold voltage in TBFDSBSOI devices will be controllable by gate potentials that scale down with the channel dimensions while remaining within appropriate gate reliability limits. SPICE simulations determined that the magnitude of the threshold shift predicted by TCAD software would be sufficient to control the logic configuration of a simple, regular array of these TBFDSBSOI transistors as well as to constrain its overall subthreshold power growth. Using these devices, a reconfigurable platform is proposed based on a regular 6-input, 6-output NOR LUT block in which the logic and configuration functions of the array are mapped onto separate gates of the double-gate device. A new analytic model of the relationship between power (P), area (A) and performance (T) has been developed based on a simple VLSI complexity metric of the form ATÏ = constant. As Ï defines the performance âreturnâ gained as a result of an increase in area, it also represents a bound on the architectural options available in power-scalable digital systems. This analytic model was used to determine that simple computing functions mapped to the reconfigurable platform will exhibit continuous power-area-performance scaling behavior. A number of simple arithmetic circuits were mapped to the array and their delay and subthreshold leakage analysed over a representative range of supply and threshold voltages, thus determining a worse-case range for the device/circuit-level parameters of the model. Finally, an architectural simulation was built in VHDL-AMS. The frequency scaling described by Ï, combined with the device/circuit-level parameters predicts the overall power and performance scaling of parallel architectures mapped to the array
- âŠ