Search CORE

167 research outputs found

Thermal-Aware Networked Many-Core Systems

Author: Vaddina Kameswar Rao
Publication venue: Turku Centre for Computer Science
Publication date: 23/05/2014
Field of study

Advancements in IC processing technology has led to the innovation and growth happening in the consumer electronics sector and the evolution of the IT infrastructure supporting this exponential growth. One of the most difficult obstacles to this growth is the removal of large amount of heatgenerated by the processing and communicating nodes on the system. The scaling down of technology and the increase in power density is posing a direct and consequential effect on the rise in temperature. This has resulted in the increase in cooling budgets, and affects both the life-time reliability and performance of the system. Hence, reducing on-chip temperatures has become a major design concern for modern microprocessors. This dissertation addresses the thermal challenges at different levels for both 2D planer and 3D stacked systems. It proposes a self-timed thermal monitoring strategy based on the liberal use of on-chip thermal sensors. This makes use of noise variation tolerant and leakage current based thermal sensing for monitoring purposes. In order to study thermal management issues from early design stages, accurate thermal modeling and analysis at design time is essential. In this regard, spatial temperature profile of the global Cu nanowire for on-chip interconnects has been analyzed. It presents a 3D thermal model of a multicore system in order to investigate the effects of hotspots and the placement of silicon die layers, on the thermal performance of a modern ip-chip package. For a 3D stacked system, the primary design goal is to maximise the performance within the given power and thermal envelopes. Hence, a thermally efficient routing strategy for 3D NoC-Bus hybrid architectures has been proposed to mitigate on-chip temperatures by herding most of the switching activity to the die which is closer to heat sink. Finally, an exploration of various thermal-aware placement approaches for both the 2D and 3D stacked systems has been presented. Various thermal models have been developed and thermal control metrics have been extracted. An efficient thermal-aware application mapping algorithm for a 2D NoC has been presented. It has been shown that the proposed mapping algorithm reduces the effective area reeling under high temperatures when compared to the state of the art.Siirretty Doriast

UTUPub

Dynamic thermal management in chip multiprocessor systems

Author: Liu Chih-Chun
Publication venue
Publication date: 15/05/2009
Field of study

Recently, processor power density has been increasing at an alarming rate result- ing in high on-chip temperature. Higher temperature increases current leakage and causes poor reliability. In our research, we ¯rst propose a Predictive Dynamic Ther- mal Management (PDTM) based on Application-based Thermal Model (ABTM) and Core-based Thermal Model (CBTM) in the multicore systems. Based on predicted temperature from ABTM and CBTM, the proposed PDTM can maintain the system temperature below a desired level by moving the running application from the possi- ble overheated core to the future coolest core (migration) and reducing the processor resources (priority scheduling) within multicore systems. Furthermore, we present the Thermal Correlative Thermal Management (TCDTM), which incorporates three main components: Statistical Workload Estimation (SWE), Future Temperature Estima- tion Model (FTEM) and Temperature-Aware Thread Controller (TATC), to model the thermal correlation e®ect and distinguish the thermal contributions from appli- cations with di®erent workload behaviors at run time in the CMP systems. The pro- posed PDTM and TCDTM enable the exploration of the tradeo® between throughput and fairness in temperature-constrained multicore systems

Texas A&M Repository

Vector support for multicore processors with major emphasis on configurable multiprocessors

Author: Yang Hongyan
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2008
Field of study

It recently became increasingly difficult to build higher speed uniprocessor chips because of performance degradation and high power consumption. The quadratically increasing circuit complexity forbade the exploration of more instruction-level parallelism (JLP). To continue raising the performance, processor designers then focused on thread-level parallelism (TLP) to realize a new architecture design paradigm. Multicore processor design is the result of this trend. It has proven quite capable in performance increase and provides new opportunities in power management and system scalability. But current multicore processors do not provide powerful vector architecture support which could yield significant speedups for array operations while maintaining arealpower efficiency. This dissertation proposes and presents the realization of an FPGA-based prototype of a multicore architecture with a shared vector unit (MCwSV). FPGA stands for Filed-Programmable Gate Array. The idea is that rather than improving only scalar or TLP performance, some hardware budget could be used to realize a vector unit to greatly speedup applications abundant in data-level parallelism (DLP). To be realistic, limited by the parallelism in the application itself and by the compiler\u27s vectorizing abilities, most of the general-purpose programs can only be partially vectorized. Thus, for efficient resource usage, one vector unit should be shared by several scalar processors. This approach could also keep the overall budget within acceptable limits. We suggest that this type of vector-unit sharing be established in future multicore chips. The design, implementation and evaluation of an MCwSV system with two scalar processors and a shared vector unit are presented for FPGA prototyping. The MicroBlaze processor, which is a commercial IP (Intellectual Property) core from Xilinx, is used as the scalar processor; in the experiments the vector unit is connected to a pair of MicroBlaze processors through standard bus interfaces. The overall system is organized in a decoupled and multi-banked structure. This organization provides substantial system scalability and better vector performance. For a given area budget, benchmarks from several areas show that the MCwSV system can provide significant performance increase as compared to a multicore system without a vector unit. However, a MCwSV system with two MicroBlazes and a shared vector unit is not always an optimized system configuration for various applications with different percentages of vectorization. On the other hand, the MCwSV framework was designed for easy scalability to potentially incorporate various numbers of scalar/vector units and various function units. Also, the flexibility inherent to FPGAs can aid the task of matching target applications. These benefits can be taken into account to create optimized MCwSV systems for various applications. So the work eventually focused on building an architecture design framework incorporating performance and resource management for application-specific MCwSV (AS-MCwSV) systems. For embedded system design, resource usage, power consumption and execution latency are three metrics to be used in design tradeoffs. The product of these metrics is used here to choose the MCwSV system with the smallest value

Digital Commons @ New Jersey Institute of Technology (NJIT)

A Multi-core Testbed on Desktop Computer for Research on Power/Thermal Aware Resource Management

Author: Dierivot Ashley
Publication venue: FIU Digital Commons
Publication date: 06/06/2014
Field of study

Our goal is to develop a flexible, customizable, and practical multi-core testbed based on an Intel desktop computer that can be utilized to assist the theoretical research on power/thermal aware resource management in design of computer systems. By integrating different modules, i.e. thread mapping/scheduling, processor/core frequency and voltage variation, temperature/power measurement, and run-time performance collection, into a systematic and unified framework, our testbed can bridge the gap between the theoretical study and practical implementation. The effectiveness for our system was validated using appropriately selected benchmarks. The importance of this research is that it complements the current theoretical research by validating the theoretical results in practical scenarios, which are closer to that in the real world. In addition, by studying the discrepancies of results of theoretical study and their applications in real world, the research also aids in identifying new research problems and directions

DigitalCommons@Florida International University

Portable, scalable, per-core power estimation for intelligent resource management

Author: Bhadauria M
Cesati M
Gioiosa R
Goel B
McKee SA
Singh K
Publication venue: IEEE
Publication date: 01/01/2010
Field of study

Performance, power, and temperature are now all first-order design constraints. Balancing power efficiency, thermal constraints, and performance requires some means to convey data about real-time power consumption and temperature to intelligent resource managers. Resource managers can use this information to meet performance goals, maintain power budgets, and obey thermal constraints. Unfortunately, obtaining the required machine introspection is challenging. Most current chips provide no support for per-core power monitoring, and when support exists, it is not exposed to software. We present a methodology for deriving per-core power models using sampled performance counter values and temperature sensor readings. We develop application-independent models for four different (four- to eight-core) platforms, validate their accuracy, and show how they can be used to guide scheduling decisions in power-aware resource managers. Model overhead is negligible, and estimations exhibit 1.1%-5.2% per-suite median error on the NAS, SPEC OMP, and SPEC 2006 benchmarks (and 1.2%-4.4% overall)

Crossref

Chalmers Research

ART

INVESTIGATING POWER MANAGEMENT SCHEMES IN OUT-OF-ORDER MICROPROCESSORS

Author: Cakmakci Yaman
Publication venue
Publication date: 01/08/2018
Field of study

The University of Manchester - Institutional Repository

Dynamic thermal management in chip multiprocessor systems

Author: Liu Chih-Chun
Publication venue
Publication date: 15/05/2009
Field of study

Texas A&M Repository

Design and Analysis of Dynamic Thermal Management in Chip Multiprocessors (CMPs)

Author: Yeo In Choon
Publication venue
Publication date
Field of study

Chip Multiprocessors (CMPs) have been prevailing in the modern microprocessor market. As the significant heat is converted by the ever-increasing power density and current leakage, the raised operating temperature in a chip has already threatened the system?s reliability and led the thermal control to be one of the most important issues needed to be addressed immediately in chip designs. Due to the cost and complexity of designing thermal packaging, many Dynamic Thermal Management (DTM) schemes have been widely adopted in modern processors. In this study, we focus on developing a simple and accurate thermal model, which provides a scheduling decision for running tasks. And we show how to design an efficient DTM scheme with negligible performance overhead. First, we propose an efficient DTM scheme for multimedia applications that tackles the thermal control problem in a unified manner. A DTM scheme for multimedia applications makes soft realtime scheduling decisions based on statistical characteristics of multimedia applications. Specifically, we model application execution characteristics as the probability distribution of the number of cycles required to decode frames. Our DTM scheme for multimedia applications has been implemented on Linux in two mobile processors providing variable clock frequencies in an Intel Pentium-M processor and an Intel Atom processor. In order to evaluate the performance of the proposed DTM scheme, we exploit two major codecs, MPEG-4 and H.264/AVC based on various frame resolutions. Our results show that our DTM scheme for multimedia applications lowers the overall temperature by 4 degrees C and the peak temperature by 6 degrees C (up to 10 degrees C), while maintaining frame drop ratio under 5% compared to existing DTM schemes for multimedia applications. Second, we propose a lightweight online workload estimation using the cumulative distribution function and architectural information via Performance Monitoring Counters (PMC) to observe the processes dynamic workload behaviors. We also present an accurate thermal model for CMP architectures to analyze the thermal correlation effects by profiling the thermal impacts from neighboring cores under the specific workload. Hence, according to the estimated workload characteristics and thermal correlation effects, we can estimate the future temperature of each core more accurately. We implement a DTM scheme considering workload characteristics and thermal correlation effects on real machines, an Intel Quad-Core Q6600 system and Dell PowerEdge 2950 (dual Intel Xeon E5310 Quad-Core) system, running applications ranging from multimedia applications to several benchmarks. Experiments results show that our DTM scheme reduces the peak temperature by 8% with 0.54% performance overhead compared to Linux Standard Scheduler, while existing DTM schemes reduce peak temperature by 4% with up to 50% performance overhead

Texas A&M Repository

Recommended from our members

Physically Equivalent Intelligent Systems for Reasoning Under Uncertainty at Nanoscale

Author: Khasanvis Santosh
Publication venue: ScholarWorks@UMass Amherst
Publication date: 09/11/2015
Field of study

Machines today lack the inherent ability to reason and make decisions, or operate in the presence of uncertainty. Machine-learning methods such as Bayesian Networks (BNs) are widely acknowledged for their ability to uncover relationships and generate causal models for complex interactions. However, their massive computational requirement, when implemented on conventional computers, hinders their usefulness in many critical problem areas e.g., genetic basis of diseases, macro finance, text classification, environment monitoring, etc. We propose a new non-von Neumann technology framework purposefully architected across all layers for solving these problems efficiently through physical equivalence, enabled by emerging nanotechnology. The architecture builds on a probabilistic information representation and multi-domain mixed-signal circuit style, and is tightly coupled to a nanoscale physical layer that spans magnetic and electrical domains. Based on bottom-up device-circuit-architecture simulations, we show up to four orders of magnitude performance improvement (using computational resolution of 0.1) vs. best-of-breed multi-core machines with 100 processors, for BNs with about a million variables. Smaller problem sizes of ~100 variables can be realized at 20 mW power consumption and very low area around a few tenths of a mm2. Our vision is to enable solving complex Bayesian problems in real time, as well as enable intelligence capabilities at a small scale everywhere, ushering in a new era of machine intelligence

ScholarWorks@UMass Amherst