Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This decline in size and boost in performance consequently demands a lower supply voltage and effective management of power dissipation in chips with millions of transistors. This has triggered a substantial amount of research into power reduction techniques for almost every aspect of the chip, and particularly for the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency, mainly at the processor core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, cache and pipelining, can be optimized to reduce the power consumption of the processor, and this paper discusses various ways in which they can be optimized. Emerging power-efficient processor architectures are also overviewed, and research activities are discussed which should help the reader identify how these factors contribute to a processor's power consumption. Some of these concepts are already established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
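As a rough illustration of why supply voltage and clock frequency are such effective knobs, the sketch below evaluates the textbook CMOS switching-power estimate P = a·C·V²·f. The relation is standard, but the operating points, the activity factor and the function name are assumptions chosen for illustration; none of them come from the paper.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Textbook CMOS switching-power estimate: P = a * C * V^2 * f (watts)."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating points, chosen only for illustration.
nominal = dynamic_power(activity=0.2, capacitance_f=1e-9, vdd_v=1.2, freq_hz=2e9)
scaled = dynamic_power(activity=0.2, capacitance_f=1e-9, vdd_v=0.9, freq_hz=1.5e9)

# Scaling voltage and frequency together by 0.75 scales power by 0.75 cubed.
ratio = scaled / nominal
print(f"nominal={nominal:.3f} W, scaled={scaled:.3f} W, ratio={ratio:.3f}")
```

Because voltage enters quadratically, lowering it together with frequency reduces switching power roughly with the cube of the scaling factor in this simplified model, which ignores leakage and short-circuit power.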
On thermal sensor calibration and software techniques for many-core thermal management
The high power density of a many-core processor results in increased temperature which negatively impacts system reliability and performance. Dynamic thermal management applies thermal-aware techniques at run time to avoid overheating using temperature information collected from on-chip thermal sensors. Temperature sensing and thermal control schemes are two critical technologies for successfully maintaining thermal safety. In this dissertation, on-line thermal sensor calibration schemes are developed to provide accurate temperature information.
Software-based dynamic thermal management techniques are proposed using calibrated thermal sensors. Due to process variation and silicon aging, on-chip thermal sensors require periodic calibration before use in DTM. However, the calibration cost for thermal sensors can be prohibitively high as the number of on-chip sensors increases. Linear models which are suitable for on-line calculation are employed to estimate temperatures at multiple sensor locations using performance counters. The estimated temperature and the actual sensor thermal profile show a very high similarity with correlation coefficient ~0.9 for SPLASH2 and SPEC2000 benchmarks.
A calibration approach is proposed to combine potentially inaccurate temperature values obtained from two sources: thermal sensor readings and temperature estimations. A data fusion strategy based on Bayesian inference, which combines information from these two sources, is demonstrated. The result shows the strategy can effectively recalibrate sensor readings in response to inaccuracies caused by process variation and environmental noise. The average absolute error of the corrected sensor temperature readings is
A dynamic task allocation strategy is proposed to address localized overheating in many-core systems. Our approach employs reinforcement learning, a dynamic machine learning algorithm that performs task allocation based on current temperatures and a prediction regarding which assignment will minimize the peak temperature. Our results show that the proposed technique is fast (scheduling performed in <1 ms) and can efficiently reduce peak temperature by up to 8 °C in a 49-core processor (6% on average) versus a leading competing task allocation approach for a series of SPLASH-2 benchmarks. Reinforcement learning has also been applied to 3D integrated circuits to allocate tasks with thermal awareness.
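The data fusion step described in this abstract, which combines a sensor reading with a counter-based temperature estimate, can be pictured with a minimal sketch. It assumes both sources are independent Gaussian measurements of the same temperature with known variances and a flat prior. The dissertation's actual Bayesian model is not given in the abstract, so the function name and all numbers here are hypothetical.

```python
def fuse_gaussian(sensor_c, sensor_var, estimate_c, estimate_var):
    """Posterior mean and variance for two independent Gaussian measurements.

    Each source is weighted by its precision (1 / variance), so the less
    noisy source pulls the fused value closer to itself.
    """
    w_sensor = 1.0 / sensor_var
    w_estimate = 1.0 / estimate_var
    fused_var = 1.0 / (w_sensor + w_estimate)
    fused_mean = fused_var * (w_sensor * sensor_c + w_estimate * estimate_c)
    return fused_mean, fused_var


# Hypothetical values: a noisy sensor (variance 4) and a tighter model estimate (variance 1).
mean, var = fuse_gaussian(sensor_c=72.0, sensor_var=4.0, estimate_c=68.0, estimate_var=1.0)
print(f"fused temperature = {mean:.1f} C, variance = {var:.2f}")
```

The fused variance is always smaller than either input variance, which is the basic reason combining two imperfect sources can recalibrate a drifting sensor.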
Phase Noise in CMOS Phase-Locked Loop Circuits
Phase-locked loops (PLLs) have been widely used in mixed-signal integrated circuits. With the continuously increasing market demand for high-speed, low-noise devices, PLLs are playing a more important role in communications. In this dissertation, phase noise and jitter performance are investigated in different types of PLL designs. Hot carrier and negative bias temperature instability effects are analyzed from simulations and experiments. The phase noise of a CMOS phase-locked loop used as a frequency synthesizer circuit is modeled as the superposition of the noise from its building blocks: voltage-controlled oscillator, frequency divider, phase-frequency detector, loop filter and auxiliary input reference clock. A linear time-invariant model with additive noise sources in the frequency domain is presented to analyze the phase noise. The modeled phase noise results are compared with the corresponding experimentally measured results on phase-locked loop chips fabricated in a 0.5 µm n-well CMOS process. With the scaling of CMOS technology and the increase in electric field, MOS transistors have become very sensitive to the hot carrier effect (HCE) and negative bias temperature instability (NBTI). These two reliability issues pose challenges to designers of chips in deep-submicron CMOS technologies. A new strategy, a switchable CMOS phase-locked loop frequency synthesizer, is proposed to increase the tuning range. The switchable PLL, which integrates two phase-locked loops with different tuning frequencies, is designed and fabricated in a 0.5 µm CMOS process to analyze the effects of HCE and NBTI. A 3 V, 1.2 GHz programmable phase-locked loop frequency synthesizer is designed in 0.5 µm CMOS technology. The frequency synthesizer is implemented using an LC voltage-controlled oscillator (VCO) and a low-power dual-modulus prescaler. The LC VCO working range is from 900 MHz to 1.4 GHz.
Current mode logic (CML) is used in designing the high-speed D flip-flops in the dual-modulus prescaler circuits for low power consumption. The power consumption of the PLL chip is under 30 mW. A fully differential LC VCO is used to provide a high oscillation frequency. A new design of LC VCO using a carbon nanotube (CNT) wire inductor has been proposed. The PLL design using the CNT-LC VCO shows significant improvement in phase noise due to the high-Q LC circuit.
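The superposition idea in this abstract can be sketched numerically. The example assumes the block contributions are uncorrelated and already referred to the PLL output at a single offset frequency, so their powers simply add in linear units. The levels in dBc/Hz are hypothetical placeholders, not measurements from the dissertation, and the loop transfer functions that shape each source are deliberately left out.

```python
import math


def total_phase_noise_dbc(contributions_dbc_hz):
    """Sum uncorrelated noise contributions (dBc/Hz) at one offset frequency."""
    linear_sum = sum(10 ** (level / 10.0) for level in contributions_dbc_hz)
    return 10.0 * math.log10(linear_sum)


# Hypothetical output-referred contributions at a single offset frequency.
blocks = {
    "vco": -100.0,
    "divider": -110.0,
    "pfd": -110.0,
    "loop_filter": -120.0,
    "reference": -120.0,
}
total = total_phase_noise_dbc(blocks.values())
print(f"total phase noise = {total:.2f} dBc/Hz")
```

In this toy case the dominant block sets the total to within about 1 dB, which is why such models are often used to find the block worth optimizing first.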
Electrical recommendations and formulas for metal fill in radio-frequency integrated circuits
With increasing transistor operating frequencies, interconnects and passive devices are becoming performance limiters in integrated circuit (IC) designs. To combat this, the interconnect layers above the active silicon are trending toward low-κ dielectrics and Cu metallization. The use of these new materials has popularized chemical mechanical polishing (CMP) to planarize the several interconnect layers. Unfortunately, the mechanical trade-offs of CMP require uniform metal pattern density, so additional dummy metal shapes (fills) are placed in regions of low density. These metal fills act as parasitics that increase the capacitances in interconnects and passive devices, hindering their performance.
This work analyzes and optimizes the parasitic capacitive impact of rectangular metal fills on key passive components. Our systematic analysis of fills below a metal-insulator-metal (MIM) capacitor reveals an optimal design: large, square fills with lengths roughly 40% of the MIM capacitor's plate length. We fabricated such a MIM capacitor in a 250 nm process, showing a reduction in the substrate capacitance by half (compared to default tiling). The impact of fills on interconnects, such as transmission lines, is also investigated. A detailed study of schemes that use grounded fills as shielding between interconnects informs an optimal grounding strategy. The strategy provides maximal isolation while minimizing capacitive loading. In fact, compared to no fills, the addition of our metal fill shield increases loading by 58% while providing 58 dB more isolation between example interconnects fabricated in a 130 nm process.
The capacitive impact of adding metal fills is found to be more significant as process dimensions shrink. In a 65 nm process the inter-level dielectric constant is 3.5, but the addition of 50% density fills causes the effective dielectric constant to be 5.5. A semi-empirical, closed-form formula is developed to calculate this effective dielectric constant. The formula is accurate to within 1% for a wide range of metal fill densities, sizes, aspect ratios, and process dimensions. This is a significant improvement over state-of-the-art formulas, which are found to be accurate to within ~10%. Our high accuracy is maintained when applied to multiple layers with and without staggering. Moreover, we successfully apply the formula to calculating ground/substrate capacitances of MIM capacitors, microstrip transmission lines, and a spiral inductor. This may speed up the calculation a hundredfold or even a thousandfold. Results are compared to fabricated MIM capacitors and microstrips. Calculations and measurements match to within 5% for the capacitors and within 2% for the microstrips.
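To put the quoted dielectric constants in perspective, the sketch below uses the ideal parallel-plate relation C = ε0·εr·A/d, in which capacitance scales linearly with the effective dielectric constant. The values 3.5 and 5.5 come from the abstract. The plate geometry is a hypothetical placeholder, and the thesis's semi-empirical formula is not reproduced here because the abstract does not state it.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def parallel_plate_capacitance(eps_r, area_m2, gap_m):
    """Ideal parallel-plate capacitance, ignoring fringing fields."""
    return EPS0 * eps_r * area_m2 / gap_m


# Dielectric constants quoted in the abstract; geometry is hypothetical
# (10 um x 10 um plate over a 1 um dielectric).
area = 100e-12
gap = 1e-6
c_no_fill = parallel_plate_capacitance(3.5, area, gap)
c_with_fill = parallel_plate_capacitance(5.5, area, gap)

increase_pct = 100.0 * (c_with_fill / c_no_fill - 1.0)
print(f"capacitance increase from fills = {increase_pct:.1f}%")
```

Under this idealization the geometry cancels out of the ratio, so the shift from 3.5 to 5.5 alone corresponds to roughly 57% more capacitance.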
Within-Die Delay Variation Measurement And Analysis For Emerging Technologies Using An Embedded Test Structure
Both random and systematic within-die process variations (PV) are growing more severe with shrinking geometries and increasing die size. Escalation in the variations in delay and power with reductions in feature size places higher demands on the accuracy of variation models. Accurate models can be used to improve yield, and with it the profitability and product quality of the fabricated integrated circuits (ICs). Sources of within-die variations include optical source limitations and layout-based systematic effects (pitch, line-width variability, and microscopic etch loading). Unfortunately, accurate models of within-die PVs are becoming more difficult to derive because of their increasing sensitivity to design context. Embedded test structures (ETS) continue to play an important role in the development of models of PVs and as a mechanism to improve correlations between hardware and models. Variations in path delays are increasing with scaling, and are increasingly affected by 'neighborhood' interactions. In order to fully characterize within-die variations, delays must be measured in the context of actual core-logic macros. Doing so requires the use of an embedded test structure, as opposed to traditional scribe-line test structures such as ring oscillators (RO). Accurate measurements of within-die variations can be used, e.g., to better tune models to actual hardware (model-to-hardware correlations). In this research project, I propose an embedded test structure called REBEL (Regional dELay BEhavior) that is designed to measure path delays in a minimally invasive fashion and whose architecture measures path delays more accurately. Design for manufacturability (DFM) analysis is done on 90 nm ASIC chips and 28 nm Zynq 7000 series FPGA boards.
I present ASIC results on within-die path delay variations in a floating-point unit (FPU) fabricated in IBM's 90 nm technology, with 5 pipeline stages, used as a test vehicle in chip experiments carried out at nine different temperature/voltage (TV) corners. Experimental data has also been analyzed for path delay variations in short versus long paths. FPGA results on within-die and die-to-die variations for an Advanced Encryption Standard (AES) design using a single pipeline stage are also presented. Other analyses performed on the calibrated path delays are flip-flop propagation delays for both rising and falling edges (tpHL and tpLH), uncertainty analysis, path distribution analysis, short versus long path variations and mid-length path within-die variation. I also analyze the impact on delay when the chips are subjected to industrial-level temperature and voltage variations. The experimental results establish that the proposed REBEL provides capabilities similar to an off-chip logic analyzer, i.e., it is able to capture the temporal behavior of the signal over time, including any static and dynamic hazards that may occur on the tested path. The ASIC results further show that path delays are correlated to the launch-capture (LC) interval used to time them. Therefore, calibration as proposed in this work must be carried out in order to obtain an accurate analysis of within-die variations. Results on ASIC chips show that short paths can vary up to 35% on average, while long paths vary up to 20% at nominal temperature and voltage. A similar trend occurs for within-die variations of mid-length paths, where the magnitudes reduce to 20% and 5%, respectively. The magnitude of delay variations in both these analyses increases as temperature and voltage are changed to increase performance.
The high level of within-die delay variation is undesirable from a design perspective, but it represents a rich source of entropy for applications that make use of 'secrets', such as authentication, hardware metering and encryption. Physical unclonable functions (PUFs) are a class of primitives that leverage within-die variations as a means of generating random bit strings for these types of applications, including hardware security and trust. The study of die-to-die and within-die variation on Zynq FPGAs shows that on average there is 5% within-die variation and that the range of die-to-die variation can go up to 3 ns. The die-to-die variations can be explored in much further detail to study their spatial dependence. Additionally, I carried out research in the area of data mining for big data, focusing on decision tree classification (DTC) to speed up the classification step in a hardware implementation. For this purpose, I devised a pipelined architecture for the implementation of axis-parallel binary decision tree classification that meets the requirements of execution time and minimal resource usage in terms of area. The motivation for this work is that analyzing larger data sets has created abundant opportunities for algorithmic and architectural developments and data-mining innovations, creating a great demand for faster execution of these algorithms with better resource utilization. Decision trees (DT) have since been implemented in software programs. Although the software implementation of DTC is highly accurate, the execution times and the resource utilization still require improvement to meet the computational demands of the ever-growing industry. On the other hand, hardware implementation of DT has not been thoroughly investigated or reported in detail.
I therefore propose a pipelined hardware acceleration architecture that acquires data in parallel, with parallel engines working independently on different partitions of the data. Each engine also processes the data in a pipelined fashion to utilize the resources more efficiently and reduce the time for processing all the data records/tuples. Experimental results show that the proposed hardware acceleration of classification algorithms increases throughput by reducing the number of clock cycles required to process the data and generate the results, and that it requires minimal resources and is therefore area efficient. This architecture also enables algorithms to scale with increasingly large and complex data sets. We developed the DTC algorithm in detail and successfully explored techniques for adapting it to a hardware implementation. This system is 3.5 times faster than the existing hardware implementation of classification.
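The classification scheme described in this abstract can be pictured with a small software sketch: an axis-parallel binary decision tree tests one feature against one threshold per node, and several independent "engines" each classify their own partition of the records. This is only an illustration of the idea. The tree, the record values and the names `Node`, `classify` and `classify_partitioned` are hypothetical, and the engines run sequentially here rather than as pipelined hardware.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    feature: int = -1              # index of the feature tested at this node
    threshold: float = 0.0         # go left when record[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None    # set only on leaf nodes


def classify(node: Node, record: Sequence[float]) -> str:
    """Walk from the root to a leaf using one axis-parallel test per level."""
    while node.label is None:
        node = node.left if record[node.feature] <= node.threshold else node.right
    return node.label


def classify_partitioned(tree: Node, records: List[Sequence[float]], engines: int) -> List[str]:
    """Split records round-robin across engines, classify, restore input order."""
    out: List[Optional[str]] = [None] * len(records)
    for engine in range(engines):
        partition = records[engine::engines]
        out[engine::engines] = [classify(tree, r) for r in partition]
    return out


# Hypothetical two-level tree over two features.
tree = Node(feature=0, threshold=5.0,
            left=Node(label="A"),
            right=Node(feature=1, threshold=2.0,
                       left=Node(label="B"), right=Node(label="C")))
records = [(1.0, 9.0), (7.0, 1.0), (7.0, 3.0), (5.0, 0.0)]
print(classify_partitioned(tree, records, engines=2))
```

Because each record is classified independently of the others, the partitions need no communication, which is what makes this kind of workload a natural fit for parallel engines.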
Modeling, design, and characterization of through vias in silicon and glass interposers
Advancements in very large scale integration (VLSI) technology have led to unprecedented transistor and interconnect scaling. Further miniaturization by traditional IC scaling in future planar CMOS technology faces significant challenges. Stacking of ICs (3D IC) using three dimensional (3D) integration technology helps in significantly reducing wiring lengths, interconnect latency and power dissipation while reducing the size of the chip and enhancing performance. Interposer technology with ultra-fine pitch interconnections needs to be developed to support the huge I/O connection requirement for packaging 3D ICs. Through vias in stacked silicon ICs and interposers are the key components of a 3D system.
The objective of this dissertation is to model through vias in 3D silicon and glass interposers and to address power and high-speed signal integrity issues in 3D interposers, considering silicon biasing effects.
An equivalent circuit model of the through via in silicon interposer (Si TPV) has been proposed considering the bias voltage dependent Metal-Oxide-Semiconductor (MOS) capacitance effect. Important design guidelines and optimizations are proposed for Si TPVs used in the signal delivery network, power delivery network (PDN), and as variable capacitors.
Through vias in glass interposers (Glass TPVs) are modeled, designed and simulated by using electromagnetic field solvers. Signal and power integrity analyses are performed for silicon and glass interposers. PDN design is proposed by utilizing the MOS capacitance of the Si TPVs for decoupling.
PhD. Committee Chair: Tummala, Rao; Committee Co-Chair: Swaminathan, Madhavan; Committee Member: Lim, Sung Kyu; Committee Member: Mukhopadhyay, Saibal; Committee Member: Sitaraman, Suresh; Committee Member: Sundaram, Venk
Above-IC RF MEMS devices for communication applications
Wireless communications are showing an explosive growth in emerging consumer and military applications of radiofrequency (RF), microwave, and millimeter-wave circuits and systems. Applications include wireless personal connectivity (Bluetooth), wireless local area networks (WLAN), mobile communication systems (GSM, GPRS, UMTS, CDMA), satellite communications and automotive electronics. Future cell phones and ground communication systems as well as communication satellites will require more and more sophisticated technologies. The increasing demand for size and weight reduction, cost savings, low power consumption, increased frequency and higher functionality and reconfigurability as part of multiband and multistandard operation is necessitating the use of highly integrated RF front-end circuits. Chip scaling has made a major contribution to this goal, but today a situation has been reached where the presence of numerous off-chip passive RF components imposes a critical bottleneck to further integration and miniaturization of wireless transceivers. Microelectromechanical systems (MEMS) technology is a rapidly emerging enabling technology that is intended to replace the discrete passives by their integrated counterparts. In this thesis, an original metal surface micromachining process, which is compatible with CMOS post-processing, for above-IC integration of RF MEMS tunable capacitors and suspended inductors is presented. A detailed study on SF6 inductively coupled plasma (ICP) releasing has been performed in order to ascertain the optimal process parameters. This study has emphasized the fact that temperature plays an important role in this process by limiting silicon dioxide etching. Moreover, the optimized recipe has been found to be independent of the sacrificial layer used (amorphous or polycrystalline silicon) and its thickness. Using this recipe, 15.6 µm/min Si underetch rate with high Si: SiO2 selectivity (> 20000: 1) has been obtained. 
Single-air-gap and double-air-gap parallel-plate MEMS tunable capacitors have been designed, fabricated and characterized in the pF range, from 1 MHz to 13.5 GHz. It has been shown that an optimized design of the suspended membrane and direct symmetrical current feed at both ports can significantly improve the quality factor and increase the self-resonant frequency, pushing it to 12 GHz and beyond. The maximum capacitance tuning range obtained for a single-air-gap capacitor is 29% for a bias voltage of 20 V. The maximum capacitance tuning range obtained for a double-air-gap capacitor is 207% for a bias voltage of 70 V. The post-processing of X-FAB BiCMOS wafers has been successfully demonstrated to fabricate monolithically integrated VCOs with an above-IC MEMS LC tank. Comparing a suspended inductor and the X-FAB inductor with the same design, it has been shown that increasing the thickness of the spiral from 2.3 to 4 µm and suspending the spiral 3 µm above the passivation layers lead to an improvement by a factor of 2 in the peak quality factor and a shift of the self-resonant frequency beyond 15 GHz. No significant variation in bipolar and MOS transistor characteristics due to the post-processing has been observed, and we conclude that the variation due to post-processing is in the same range as the wafer-to-wafer variation. Based on our metal surface micromachining process, coplanar waveguide (CPW) MEMS shunt capacitive switches and variable true-time delay lines (V-TTDLs) have been designed, fabricated and characterized in the 1 - 20 GHz range. A novel MEMS device architecture, the SG-MOSFET, which combines a solid-state MOS transistor and a suspended metal gate, has been proposed as a DC current switch. The corresponding fabrication process, using polysilicon as a sacrificial layer, has been developed to release the metal gate suspended over the gate oxide by SF6 plasma.
Very abrupt current switches have been demonstrated with a subthreshold slope better than 10 mV/decade (better than the theoretical solid-state bulk or SOI MOSFET limit of 60 mV/decade) and ultra-low gate leakage (less than 0.001 pA/µm²) due to the air gap.
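The tuning ranges quoted in this abstract (29% and 207%) can be read against the textbook limit for an ideal electrostatically actuated parallel-plate capacitor, whose gap can close by only one third before pull-in. The sketch assumes the common definition, tuning range = (Cmax − Cmin)/Cmin. The abstract does not state which definition the thesis uses, and the function names and normalized values are hypothetical.

```python
def tuning_range_pct(c_min, c_max):
    """Capacitance tuning range in percent, relative to the minimum capacitance."""
    return 100.0 * (c_max - c_min) / c_min


def ideal_pull_in_tuning_pct():
    """Ideal single-gap limit: stable travel ends at one third of the rest gap."""
    gap = 1.0                            # normalized rest gap
    c_rest = 1.0 / gap                   # capacitance is proportional to 1 / gap
    c_pull_in = 1.0 / (gap - gap / 3.0)  # capacitance at the pull-in point
    return tuning_range_pct(c_rest, c_pull_in)


print(f"ideal single-gap limit = {ideal_pull_in_tuning_pct():.0f}%")
# A capacitor swinging from 1.00 pF to 3.07 pF (hypothetical values) gives 207%.
print(f"example range = {tuning_range_pct(1.00, 3.07):.0f}%")
```

Under these assumptions an ideal single-gap device tops out at 50%, which gives some context for why a double-air-gap structure would be pursued when a much larger range is wanted.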