1,395 research outputs found

    Temperature Regulation in Multicore Processors Using Adjustable-Gain Integral Controllers

    Full text link
    This paper considers the problem of temperature regulation in multicore processors by dynamic voltage-frequency scaling. We propose a feedback law that is based on an integral controller with adjustable gain, designed for fast tracking convergence in the face of model uncertainties, time-varying plants, and tight computing-timing constraints. Moreover, unlike prior works we consider a nonlinear, time-varying plant model that trades off precision for simple and efficient on-line computations. Cycle-level, full system simulator implementation and evaluation illustrates fast and accurate tracking of given temperature reference values, and compares favorably with fixed-gain controllers.Comment: 8 pages, 6 figures, IEEE Conference on Control Applications 2015, Accepted Versio

    Measurement, Modeling, and Characterization for Power-Aware Computing

    Get PDF
    Societyโ€™s increasing dependence on information technology has resulted in the deployment of vast compute resources. The energy costs of operating these resources coupled with environmental concerns have made power-aware computingone of the primary challenges for the IT sector. Making energy-efficient computing a rule rather than an exception requires that researchers and system designers use the right set of techniques and tools. These involve measuring,modeling, and characterizing the energy consumption of computers at varying degrees of granularity.In this thesis, we present techniques to measure power consumption of computer systems at various levels. We compare them for accuracy and sensitivityand discuss their effectiveness. We test Intelโ€™s hardware power model for estimation accuracy and show that it is fairly accurate for estimating energy consumption when sampled at the temporal granularity of more than tens ofmilliseconds.We present a methodology to estimate per-core processor power consumption using performance counter and temperature-based power modeling and validate it across multiple platforms. We show our model exhibits negligible computationoverhead, and the median estimation errors ranges from 0.3% to 10.1% for applications from SPEC2006, SPEC-OMP and NAS benchmarks. We test the usefulness of the model in a meta-scheduler to enforce power constraint on a system.Finally, we perform a detailed performance and energy characterization of Intelโ€™s Restricted Transactional Memory (RTM). We use TinySTM software transactional memory (STM) system to benchmark RTMโ€™s performance against competing STM alternatives. We use microbenchmarks and STAMP benchmarksuite to compare RTM versus STM performance and energy behavior. We quantify the RTM hardware limitations that affect its success rate. We show that RTM performs better than TinySTM when working-set fits inside the cache and that RTM is better at handling high contention workloads

    Power and Thermal Management of System-on-Chip

    Get PDF

    Architectural level delay and leakage power modelling of manufacturing process variation

    Get PDF
    PhD ThesisThe effect of manufacturing process variations has become a major issue regarding the estimation of circuit delay and power dissipation, and will gain more importance in the future as device scaling continues in order to satisfy market place demands for circuits with greater performance and functionality per unit area. Statistical modelling and analysis approaches have been widely used to reflect the effects of a variety of variational process parameters on system performance factor which will be described as probability density functions (PDFs). At present most of the investigations into statistical models has been limited to small circuits such as a logic gate. However, the massive size of present day electronic systems precludes the use of design techniques which consider a system to comprise these basic gates, as this level of design is very inefficient and error prone. This thesis proposes a methodology to bring the effects of process variation from transistor level up to architectural level in terms of circuit delay and leakage power dissipation. Using a first order canonical model and statistical analysis approach, a statistical cell library has been built which comprises not only the basic gate cell models, but also more complex functional blocks such as registers, FIFOs, counters, ALUs etc. Furthermore, other sensitive factors to the overall system performance, such as input signal slope, output load capacitance, different signal switching cases and transition types are also taken into account for each cell in the library, which makes it adaptive to an incremental circuit design. The proposed methodology enables an efficient analysis of process variation effects on system performance with significantly reduced computation time compared to the Monte Carlo simulation approach. As a demonstration vehicle for this technique, the delay and leakage power distributions of a 2-stage asynchronous micropipeline circuit has been simulated using this cell library. The experimental results show that the proposed method can predict the delay and leakage power distribution with less than 5% error and at least 50,000 times faster computation time compare to 5000-sample SPICE based Monte Carlo simulation. The methodology presented here for modelling process variability plays a significant role in Design for Manufacturability (DFM) by quantifying the direct impact of process variations on system performance. The advantages of being able to undertake this analysis at a high level of abstraction and thus early in the design cycle are two fold. First, if the predicted effects of process variation render the circuit performance to be outwith specification, design modifications can be readily incorporated to rectify the situation. Second, knowing what the acceptable limits of process variation are to maintain design performance within its specification, informed choices can be made regarding the implementation technology and manufacturer selected to fabricate the design

    Quality estimation and optimization of adaptive stereo matching algorithms for smart vehicles

    Get PDF
    Stereo matching is a promising approach for smart vehicles to find the depth of nearby objects. Transforming a traditional stereo matching algorithm to its adaptive version has potential advantages to achieve the maximum quality (depth accuracy) in a best-effort manner. However, it is very challenging to support this adaptive feature, since (1) the internal mechanism of adaptive stereo matching (ASM) has to be accurately modeled, and (2) scheduling ASM tasks on multiprocessors to generate the maximum quality is difficult under strict real-time constraints of smart vehicles. In this article, we propose a framework for constructing an ASM application and optimizing its output quality on smart vehicles. First, we empirically convert stereo matching into ASM by exploiting its inherent characteristics of disparityโ€“cycle correspondence and introduce an exponential quality model that accurately represents the qualityโ€“cycle relationship. Second, with the explicit quality model, we propose an efficient quadratic programming-based dynamic voltage/frequency scaling (DVFS) algorithm to decide the optimal operating strategy, which maximizes the output quality under timing, energy, and temperature constraints. Third, we propose two novel methods to efficiently estimate the parameters of the quality model, namely location similarity-based feature point thresholding and street scenario-confined CNN prediction. Results show that our DVFS algorithm achieves at least 1.61 times quality improvement compared to the state-of-the-art techniques, and average parameter estimation for the quality model achieves 96.35% accuracy on the straight road

    Improving processor efficiency through thermal modeling and runtime management of hybrid cooling strategies

    Full text link
    One of the main challenges in building future high performance systems is the ability to maintain safe on-chip temperatures in presence of high power densities. Handling such high power densities necessitates novel cooling solutions that are significantly more efficient than their existing counterparts. A number of advanced cooling methods have been proposed to address the temperature problem in processors. However, tradeoffs exist between performance, cost, and efficiency of those cooling methods, and these tradeoffs depend on the target system properties. Hence, a single cooling solution satisfying optimum conditions for any arbitrary system does not exist. This thesis claims that in order to reach exascale computing, a dramatic improvement in energy efficiency is needed, and achieving this improvement requires a temperature-centric co-design of the cooling and computing subsystems. Such co-design requires detailed system-level thermal modeling, design-time optimization, and runtime management techniques that are aware of the underlying processor architecture and application requirements. To this end, this thesis first proposes compact thermal modeling methods to characterize the complex thermal behavior of cutting-edge cooling solutions, mainly Phase Change Material (PCM)-based cooling, liquid cooling, and thermoelectric cooling (TEC), as well as hybrid designs involving a combination of these. The proposed models are modular and they enable fast and accurate exploration of a large design space. Comparisons against multi-physics simulations and measurements on testbeds validate the accuracy of our models (resulting in less than 1C error on average) and demonstrate significant reductions in simulation time (up to four orders of magnitude shorter simulation times). This thesis then introduces temperature-aware optimization techniques to maximize energy efficiency of a given system as a whole (including computing and cooling energy). The proposed optimization techniques approach the temperature problem from various angles, tackling major sources of inefficiency. One important angle is to understand the application power and performance characteristics and to design management techniques to match them. For workloads that require short bursts of intense parallel computation, we propose using PCM-based cooling in cooperation with a novel Adaptive Sprinting technique. By tracking the PCM state and incorporating this information during runtime decisions, Adaptive Sprinting utilizes the PCM heat storage capability more efficiently, achieving 29\% performance improvement compared to existing sprinting policies. In addition to the application characteristics, high heterogeneity in on-chip heat distribution is an important factor affecting efficiency. Hot spots occur on different locations of the chip with varying intensities; thus, designing a uniform cooling solution to handle worst-case hot spots significantly reduces the cooling efficiency. The hybrid cooling techniques proposed as part of this thesis address this issue by combining the strengths of different cooling methods and localizing the cooling effort over hot spots. Specifically, the thesis introduces LoCool, a cooling system optimizer that minimizes cooling power under temperature constraints for hybrid-cooled systems using TECs and liquid cooling. Finally, the scope of this work is not limited to existing advanced cooling solutions, but it also extends to emerging technologies and their potential benefits and tradeoffs. One such technology is integrated flow cell array, where fuel cells are pumped through microchannels, providing both cooling and on-chip power generation. This thesis explores a broad range of design parameters including maximum chip temperature, leakage power, and generated power for flow cell arrays in order to maximize the benefits of integrating this technology with computing systems. Through thermal modeling and runtime management techniques, and by exploring the design space of emerging cooling solutions, this thesis provides significant improvements in processor energy efficiency.2018-07-09T00:00:00

    ๋กœ์ง ๋ฐ ํ”ผ์ง€์ปฌ ํ•ฉ์„ฑ์—์„œ์˜ ํƒ€์ด๋ฐ ๋ถ„์„๊ณผ ์ตœ์ ํ™”

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ •๋ณด๊ณตํ•™๋ถ€, 2020. 8. ๊น€ํƒœํ™˜.Timing analysis is one of the necessary steps in the development of a semiconductor circuit. In addition, it is increasingly important in the advanced process technologies due to various factors, including the increase of processโ€“voltageโ€“temperature variation. This dissertation addresses three problems related to timing analysis and optimization in logic and physical synthesis. Firstly, most static timing analysis today are based on conventional fixed flip-flop timing models, in which every flip-flop is assumed to have a fixed clock-to-Q delay. However, setup and hold skews affect the clock-to-Q delay in reality. In this dissertation, I propose a mathematical formulation to solve the problem and apply it to the clock skew scheduling problems as well as to the analysis of a given circuit, with a scalable speedup technique. Secondly, near-threshold computing is one of the promising concepts for energy-efficient operation of VLSI systems, but wide performance variation and nonlinearity to process variations block the proliferation. To cope with this, I propose a holistic hardware performance monitoring methodology for accurate timing prediction in a near-threshold voltage regime and advanced process technology. Lastly, an asynchronous circuit is one of the alternatives to the conventional synchronous style, and asynchronous pipeline circuit especially attractive because of its small design effort. This dissertation addresses the synthesis problem of lightening two-phase bundled-data asynchronous pipeline controllers, in which delay buffers are essential for guaranteeing the correct handshaking operation but incurs considerable area increase.ํƒ€์ด๋ฐ ๋ถ„์„์€ ๋ฐ˜๋„์ฒด ํšŒ๋กœ ๊ฐœ๋ฐœ ํ•„์ˆ˜ ๊ณผ์ • ์ค‘ ํ•˜๋‚˜๋กœ, ์ตœ์‹  ๊ณต์ •์ผ์ˆ˜๋ก ๊ณต์ •-์ „์••-์˜จ๋„ ๋ณ€์ด ์ฆ๊ฐ€๋ฅผ ํฌํ•จํ•œ ๋‹ค์–‘ํ•œ ์š”์ธ์œผ๋กœ ํ•˜์—ฌ๊ธˆ ๊ทธ ์ค‘์š”์„ฑ์ด ์ปค์ง€๊ณ  ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋กœ์ง ๋ฐ ํ”ผ์ง€์ปฌ ํ•ฉ์„ฑ๊ณผ ๊ด€๋ จํ•˜์—ฌ ์„ธ ๊ฐ€์ง€ ํƒ€์ด๋ฐ ๋ถ„์„ ๋ฐ ์ตœ์ ํ™” ๋ฌธ์ œ์— ๋Œ€ํ•ด ๋‹ค๋ฃฌ๋‹ค. ์ฒซ์งธ๋กœ, ์˜ค๋Š˜๋‚  ๋Œ€๋ถ€๋ถ„์˜ ์ •์  ํƒ€์ด๋ฐ ๋ถ„์„์€ ๋ชจ๋“  ํ”Œ๋ฆฝ-ํ”Œ๋กญ์˜ ํด๋Ÿญ-์ถœ๋ ฅ ๋”œ๋ ˆ์ด๊ฐ€ ๊ณ ์ •๋œ ๊ฐ’์ด๋ผ๋Š” ๊ฐ€์ •์„ ๋ฐ”ํƒ•์œผ๋กœ ์ด๋ฃจ์–ด์กŒ๋‹ค. ํ•˜์ง€๋งŒ ์‹ค์ œ ํด๋Ÿญ-์ถœ๋ ฅ ๋”œ๋ ˆ์ด๋Š” ํ•ด๋‹น ํ”Œ๋ฆฝ-ํ”Œ๋กญ์˜ ์…‹์—… ๋ฐ ํ™€๋“œ ์Šคํ์— ์˜ํ–ฅ์„ ๋ฐ›๋Š”๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ํŠน์„ฑ์„ ์ˆ˜ํ•™์ ์œผ๋กœ ์ •๋ฆฌํ•˜์˜€์œผ๋ฉฐ, ์ด๋ฅผ ํ™•์žฅ ๊ฐ€๋Šฅํ•œ ์†๋„ ํ–ฅ์ƒ ๊ธฐ๋ฒ•๊ณผ ๋”๋ถˆ์–ด ์ฃผ์–ด์ง„ ํšŒ๋กœ์˜ ํƒ€์ด๋ฐ ๋ถ„์„ ๋ฐ ํด๋Ÿญ ์Šคํ ์Šค์ผ€์ฅด๋ง ๋ฌธ์ œ์— ์ ์šฉํ•˜์˜€๋‹ค. ๋‘˜์งธ๋กœ, ์œ ์‚ฌ ๋ฌธํ„ฑ ์—ฐ์‚ฐ์€ ์ดˆ๊ณ ์ง‘์  ํšŒ๋กœ ๋™์ž‘์˜ ์—๋„ˆ์ง€ ํšจ์œจ์„ ๋Œ์–ด ์˜ฌ๋ฆด ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์—์„œ ๊ฐ๊ด‘๋ฐ›์ง€๋งŒ, ํฐ ํญ์˜ ์„ฑ๋Šฅ ๋ณ€์ด ๋ฐ ๋น„์„ ํ˜•์„ฑ ๋•Œ๋ฌธ์— ๋„๋ฆฌ ํ™œ์šฉ๋˜๊ณ  ์žˆ์ง€ ์•Š๋‹ค. ์ด๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์œ ์‚ฌ ๋ฌธํ„ฑ ์ „์•• ์˜์—ญ ๋ฐ ์ตœ์‹  ๊ณต์ • ๋…ธ๋“œ์—์„œ ๋ณด๋‹ค ์ •ํ™•ํ•œ ํƒ€์ด๋ฐ ์˜ˆ์ธก์„ ์œ„ํ•œ ํ•˜๋“œ์›จ์–ด ์„ฑ๋Šฅ ๋ชจ๋‹ˆํ„ฐ๋ง ๋ฐฉ๋ฒ•๋ก  ์ „๋ฐ˜์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ, ๋น„๋™๊ธฐ ํšŒ๋กœ๋Š” ๊ธฐ์กด ๋™๊ธฐ ํšŒ๋กœ์˜ ๋Œ€์•ˆ ์ค‘ ํ•˜๋‚˜๋กœ, ๊ทธ ์ค‘์—์„œ๋„ ๋น„๋™๊ธฐ ํŒŒ์ดํ”„๋ผ์ธ ํšŒ๋กœ๋Š” ๋น„๊ต์  ์ ์€ ์„ค๊ณ„ ๋…ธ๋ ฅ๋งŒ์œผ๋กœ๋„ ๊ตฌํ˜„ ๊ฐ€๋Šฅํ•˜๋‹ค๋Š” ์žฅ์ ์ด ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” 2์œ„์ƒ ๋ฌถ์Œ ๋ฐ์ดํ„ฐ ํ”„๋กœํ† ์ฝœ ๊ธฐ๋ฐ˜ ๋น„๋™๊ธฐ ํŒŒ์ดํ”„๋ผ์ธ ์ปจํŠธ๋กค๋Ÿฌ ์ƒ์—์„œ, ์ •ํ™•ํ•œ ํ•ธ๋“œ์…ฐ์ดํ‚น ํ†ต์‹ ์„ ์œ„ํ•ด ์‚ฝ์ž…๋œ ๋”œ๋ ˆ์ด ๋ฒ„ํผ์— ์˜ํ•œ ๋ฉด์  ์ฆ๊ฐ€๋ฅผ ์™„ํ™”ํ•  ์ˆ˜ ์žˆ๋Š” ํ•ฉ์„ฑ ๊ธฐ๋ฒ•์„ ์ œ์‹œํ•˜์˜€๋‹ค.1 INTRODUCTION 1 1.1 Flexible Flip-Flop Timing Model 1 1.2 Hardware Performance Monitoring Methodology 4 1.3 Asynchronous Pipeline Controller 10 1.4 Contributions of this Dissertation 15 2 ANALYSIS AND OPTIMIZATION CONSIDERING FLEXIBLE FLIP-FLOP TIMING MODEL 17 2.1 Preliminaries 17 2.1.1 Terminologies 17 2.1.2 Timing Analysis 20 2.1.3 Clock-to-Q Delay Surface Modeling 21 2.2 Clock-to-Q Delay Interval Analysis 22 2.2.1 Derivation 23 2.2.2 Additional Constraints 26 2.2.3 Analysis: Finding Minimum Clock Period 28 2.2.4 Optimization: Clock Skew Scheduling 30 2.2.5 Scalable Speedup Technique 33 2.3 Experimental Results 37 2.3.1 Application to Minimum Clock Period Finding 37 2.3.2 Application to Clock Skew Scheduling 39 2.3.3 Efficacy of Scalable Speedup Technique 43 2.4 Summary 44 3 HARDWARE PERFORMANCE MONITORING METHODOLOGY AT NTC AND ADVANCED TECHNOLOGY NODE 45 3.1 Overall Flow of Proposed HPM Methodology 45 3.2 Prerequisites to HPM Methodology 47 3.2.1 BEOL Process Variation Modeling 47 3.2.2 Surrogate Model Preparation 49 3.3 HPM Methodology: Design Phase 52 3.3.1 HPM2PV Model Construction 52 3.3.2 Optimization of Monitoring Circuits Configuration 54 3.3.3 PV2CPT Model Construction 58 3.4 HPM Methodology: Post-Silicon Phase 60 3.4.1 Transfer Learning in Silicon Characterization Step 60 3.4.2 Procedures in Volume Production Phase 61 3.5 Experimental Results 62 3.5.1 Experimental Setup 62 3.5.2 Exploration of Monitoring Circuits Configuration 64 3.5.3 Effectiveness of Monitoring Circuits Optimization 66 3.5.4 Considering BEOL PVs and Uncertainty Learning 68 3.5.5 Comparison among Different Prediction Flows 69 3.5.6 Effectiveness of Prediction Model Calibration 71 3.6 Summary 73 4 LIGHTENING ASYNCHRONOUS PIPELINE CONTROLLER 75 4.1 Preliminaries and State-of-the-Art Work 75 4.1.1 Bundled-data vs. Dual-rail Asynchronous Circuits 75 4.1.2 Two-phase vs. Four-phase Bundled-data Protocol 76 4.1.3 Conventional State-of-the-Art Pipeline Controller Template 77 4.2 Delay Path Sharing for Lightening Pipeline Controller Template 78 4.2.1 Synthesizing Sharable Delay Paths 78 4.2.2 Validating Logical Correctness for Sharable Delay Paths 80 4.2.3 Reformulating Timing Constraints of Controller Template 81 4.2.4 Minimally Allocating Delay Buffers 87 4.3 In-depth Pipeline Controller Template Synthesis with Delay Path Reusing 88 4.3.1 Synthesizing Delay Path Units 88 4.3.2 Validating Logical Correctness of Delay Path Units 89 4.3.3 Updating Timing Constraints for Delay Path Units 91 4.3.4 In-depth Synthesis Flow Utilizing Delay Path Units 95 4.4 Experimental Results 99 4.4.1 Environment Setup 99 4.4.2 Piecewise Linear Modeling of Delay Path Unit Area 99 4.4.3 Comparison of Power, Performance, and Area 102 4.5 Summary 107 5 CONCLUSION 109 5.1 Chapter 2 109 5.2 Chapter 3 110 5.3 Chapter 4 110 Abstract (In Korean) 127Docto
    • โ€ฆ
    corecore