Search CORE

1,136 research outputs found

Near-optimal thermal monitoring framework for many-core systems on chip

Author: Atienza Alonso David
Chebira Amina
Ranieri Juri
Vetterli Martin
Vincenzi Alessandro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/06/2014
Field of study

Chip designers place on-chip thermal sensors to measure local temperatures, thus preventing thermal runaway situations in many-core processing architectures. However, the quality of the thermal reconstruction is directly dependent on the number of placed sensors, which should be minimized, while guaranteeing full detection of all the worst case temperature gradient. In this paper, we present an entire framework for the thermal management of complex many-core architectures, such that we can precisely recover the thermal distribution from a minimal number of sensors. The proposed sensor placement algo- rithm is guaranteed to reduce the impact of noisy measurements on the reconstructed thermal distribution. We achieve significant improvements compared to the state of the art, in terms of both computational complexity and reconstruction precision. For example, if we consider a 64 cores SoC with 64 noisy sensors (σ^2 = 4), we achieve an average reconstruction error of 1.5C, that is less than the half of what previous state-of-the-art methods achieve. We also study the practical limits of the proposed method and show that we do not need realistic workloads to learn the model and efficiently place the sensors. In fact, we show that the reconstruction error is not significantly increased if we randomly generate the power-traces of the components or if we have just a part of the correct workload

Infoscience - École polytechnique fédérale de Lausanne

Thermal profiling of homogeneous multi-core processors using sensor mini-networks

Author: Dellaquila Katherine
Publication venue: RIT Scholar Works
Publication date: 01/08/2010
Field of study

With large-scale integration and high power density in current generation microprocessors, thermal management is becoming a critical component of system design. Specifically, accurate thermal monitoring using on-die sensors is vital for system reliability and recovery. Achieving an accurate thermal profile of a system with an optimal number of sensors is integral for thermal management. This work focuses on a sensor placement mechanism and an on-chip sensor mini-network to combine temperatures from multiple sensors to determine the full thermal profile of a chip. The sensor placement mechanism proposed in this work uses non-uniform subsampling of thermal maps with k-means clustering. Using this sensing technique with cubic interpolation, an 8-core architecture thermal map was successfully recovered with an average error improvement of 90% over sensor placement via basic k-means clustering. All the simulations were run using HotSpot 5.0 modeling Alpha 21364 processor as a baseline core. The sensor mini-network using both differential encoding and distributed source coding was analyzed on a 1024-core architecture. Distributed source coding compression required fewer transmissions than differential encoding and reduced the number of transmitted bits by 36% over a sensor mini-network with no compression

RIT Scholar Works

Recommended from our members

On thermal sensor calibration and software techniques for many-core thermal management

Author: Lu Shiting
Publication venue: ScholarWorks@UMass Amherst
Publication date: 09/11/2015
Field of study

The high power density of a many-core processor results in increased temperature which negatively impacts system reliability and performance. Dynamic thermal management applies thermal-aware techniques at run time to avoid overheating using temperature information collected from on-chip thermal sensors. Temperature sensing and thermal control schemes are two critical technologies for successfully maintaining thermal safety. In this dissertation, on-line thermal sensor calibration schemes are developed to provide accurate temperature information. Software-based dynamic thermal management techniques are proposed using calibrated thermal sensors. Due to process variation and silicon aging, on-chip thermal sensors require periodic calibration before use in DTM. However, the calibration cost for thermal sensors can be prohibitively high as the number of on-chip sensors increases. Linear models which are suitable for on-line calculation are employed to estimate temperatures at multiple sensor locations using performance counters. The estimated temperature and the actual sensor thermal profile show a very high similarity with correlation coefficient ~0.9 for SPLASH2 and SPEC2000 benchmarks. A calibration approach is proposed to combine potentially inaccurate temperature values obtained from two sources: thermal sensor readings and temperature estimations. A data fusion strategy based on Bayesian inference, which combines information from these two sources, is demonstrated. The result shows the strategy can effectively recalibrate sensor readings in response to inaccuracies caused by process variation and environmental noise. The average absolute error of the corrected sensor temperature readings is A dynamic task allocation strategy is proposed to address localized overheating in many-core systems. Our approach employs reinforcement learning, a dynamic machine learning algorithm that performs task allocation based on current temperatures and a prediction regarding which assignment will minimize the peak temperature. Our results show that the proposed technique is fast (scheduling performed in \u3c1 \u3ems) and can efficiently reduce peak temperature by up to 8 degree C in a 49-core processor (6% on average) versus a leading competing task allocation approach for a series of SPLASH-2 benchmarks. Reinforcement learning has also been applied to 3D integrated circuits to allocate tasks with thermal awareness

ScholarWorks@UMass Amherst

Energy-Efficient and Reliable Computing in Dark Silicon Era

Author: Haghbayan Mohammad-Hashem
Publication venue: fi=Turku Centre for Computer Science|en=Turku Centre for Computer Science|
Publication date: 09/01/2018
Field of study

Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency is decreasing in each technology generation. Moore’s law and Dennard scaling had been backed and coupled appropriately for five decades to bring commensurate exponential performance via single core and later muti-core design. However, recalculating Dennard scaling for recent small technology sizes shows that current ongoing multi-core growth is demanding exponential thermal design power to achieve linear performance increase. This process hits a power wall where raises the amount of dark or dim silicon on future multi/many-core chips more and more. Furthermore, from another perspective, by increasing the number of transistors on the area of a single chip and susceptibility to internal defects alongside aging phenomena, which also is exacerbated by high chip thermal density, monitoring and managing the chip reliability before and after its activation is becoming a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in dark silicon era, where later these two tracks will combine together. In the first track, the main goal is to increase the level of returns in terms of main important features in chip design, such as performance and throughput, while maximum power limit is honored. In fact, we show that by managing the power while having dark silicon, all the traditional benefits that could be achieved by proceeding in Moore’s law can be also achieved in the dark silicon era, however, with a lower amount. Via the track of reliability awareness in dark silicon era, we show that dark silicon can be considered as an opportunity to be exploited for different instances of benefits, namely life-time increase and online testing. We discuss how dark silicon can be exploited to guarantee the system lifetime to be above a certain target value and, furthermore, how dark silicon can be exploited to apply low cost non-intrusive online testing on the cores. After the demonstration of power and reliability awareness while having dark silicon, two approaches will be discussed as the case study where the power and reliability awareness are combined together. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management. While the second approach provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and target reliability

UTUPub

Recommended from our members

PROCESSOR TEMPERATURE AND RELIABILITY ESTIMATION USING ACTIVITY COUNTERS

Author: Chhablani Mayank
Publication venue: ScholarWorks@UMass Amherst
Publication date: 23/03/2016
Field of study

With the advent of technology scaling lifetime reliability is an emerging threat in high-performance and deadline-critical systems. High on-chip thermal gradients accelerates localised thermal elevations (hotspots) which increases the aging rate of the semiconductor devices. As a result, reliable operation of the processors has become a challenging task. Therefore, cost effective schemes for estimating temperature and reliability are crucial. In this work we present a reliability estimation scheme that is based on a light-weight temperature estimation technique that monitors hardware events. Unlike previously pro- posed hardware counter-based approaches, our approach involves a linear-temporal-feedback estimator, taking into account the effects of thermal inertia. The proposed approach shows an average absolute error of We then present a counter-based technique to estimate the thermal accelerated aging factor (TAAF), which is an indicator of lifetime reliability. Results demonstrate that the estimation error is within [−3, +5]

ScholarWorks@UMass Amherst

Thermal-Aware Networked Many-Core Systems

Author: Vaddina Kameswar Rao
Publication venue: Turku Centre for Computer Science
Publication date: 23/05/2014
Field of study

Advancements in IC processing technology has led to the innovation and growth happening in the consumer electronics sector and the evolution of the IT infrastructure supporting this exponential growth. One of the most difficult obstacles to this growth is the removal of large amount of heatgenerated by the processing and communicating nodes on the system. The scaling down of technology and the increase in power density is posing a direct and consequential effect on the rise in temperature. This has resulted in the increase in cooling budgets, and affects both the life-time reliability and performance of the system. Hence, reducing on-chip temperatures has become a major design concern for modern microprocessors. This dissertation addresses the thermal challenges at different levels for both 2D planer and 3D stacked systems. It proposes a self-timed thermal monitoring strategy based on the liberal use of on-chip thermal sensors. This makes use of noise variation tolerant and leakage current based thermal sensing for monitoring purposes. In order to study thermal management issues from early design stages, accurate thermal modeling and analysis at design time is essential. In this regard, spatial temperature profile of the global Cu nanowire for on-chip interconnects has been analyzed. It presents a 3D thermal model of a multicore system in order to investigate the effects of hotspots and the placement of silicon die layers, on the thermal performance of a modern ip-chip package. For a 3D stacked system, the primary design goal is to maximise the performance within the given power and thermal envelopes. Hence, a thermally efficient routing strategy for 3D NoC-Bus hybrid architectures has been proposed to mitigate on-chip temperatures by herding most of the switching activity to the die which is closer to heat sink. Finally, an exploration of various thermal-aware placement approaches for both the 2D and 3D stacked systems has been presented. Various thermal models have been developed and thermal control metrics have been extracted. An efficient thermal-aware application mapping algorithm for a 2D NoC has been presented. It has been shown that the proposed mapping algorithm reduces the effective area reeling under high temperatures when compared to the state of the art.Siirretty Doriast

UTUPub

Overview of Intelligent Signal Processing Systems

Author: Chris Gwo Giun Lee
Kun-Chih (Jimmy) Chen
Wen-Hsiao Peng
Publication venue: Now Publishers
Publication date: 01/01/2023
Field of study

Directory of Open Access Journals

Energy and reliability challenges in next generation devices: integrated software solutions

Author
Publication venue: Università degli Studi di Cagliari
Publication date: 24/02/2011
Field of study

Archivio istituzionale della ricerca - Università di Cagliari

Adaptive Knobs for Resource Efficient Computing

Author: Kanduri Anil
Publication venue: fi=Turku Centre for Computer Science|en=Turku Centre for Computer Science|
Publication date: 13/12/2018
Field of study

Performance demands of emerging domains such as artificial intelligence, machine learning and vision, Internet-of-things etc., continue to grow. Meeting such requirements on modern multi/many core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves an open problem on extracting the required performance within the power and energy limits, while also ensuring thermal safety. Existing architectural solutions including asymmetric and heterogeneous cores and custom acceleration improve performance-per-watt in specific design time and static scenarios. However, satisfying applications’ performance requirements under dynamic and unknown workload scenarios subject to varying system dynamics of power, temperature and energy requires intelligent run-time management. Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity and iv) power, thermal and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. Resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics and variable effect of power actuation on performance for pro-active and appropriate allocation decisions. Specific contributions include i) run-time mapping approach to improve power budgets for higher throughput, ii) thermal aware performance boosting for efficient utilization of power budget and higher performance, iii) approximation as a run-time knob exploiting accuracy performance trade-offs for maximizing performance under power caps at minimal loss of accuracy and iv) co-ordinated approximation for heterogeneous systems through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption. The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, software and dynamic approximations to meet the performance requirements, simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy

UTUPub