134 research outputs found

    Lifetime reliability of multi-core systems: modeling and applications.

    Get PDF
    Huang, Lin.Thesis (M.Phil.)--Chinese University of Hong Kong, 2011.Includes bibliographical references (leaves 218-232).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.ivChapter 1 --- Introduction --- p.1Chapter 1.1 --- Preface --- p.1Chapter 1.2 --- Background --- p.5Chapter 1.3 --- Contributions --- p.6Chapter 1.3.1 --- Lifetime Reliability Modeling --- p.6Chapter 1.3.2 --- Simulation Framework --- p.7Chapter 1.3.3 --- Applications --- p.9Chapter 1.4 --- Thesis Outline --- p.10Chapter I --- Modeling --- p.12Chapter 2 --- Lifetime Reliability Modeling --- p.13Chapter 2.1 --- Notation --- p.13Chapter 2.2 --- Assumption --- p.16Chapter 2.3 --- Introduction --- p.16Chapter 2.4 --- Related Work --- p.19Chapter 2.5 --- System Model --- p.21Chapter 2.5.1 --- Reliability of A Surviving Component --- p.22Chapter 2.5.2 --- Reliability of a Hybrid k-out-of-n:G System --- p.26Chapter 2.6 --- Special Cases --- p.31Chapter 2.6.1 --- Case I: Gracefully Degrading System --- p.31Chapter 2.6.2 --- Case II: Standby Redundant System --- p.33Chapter 2.6.3 --- Case III: l-out-of-3:G System with --- p.34Chapter 2.7 --- Numerical Results --- p.37Chapter 2.7.1 --- Experimental Setup --- p.37Chapter 2.7.2 --- Experimental Results and Discussion --- p.40Chapter 2.8 --- Conclusion --- p.43Chapter 2.9 --- Appendix --- p.44Chapter II --- Simulation Framework --- p.47Chapter 3 --- AgeSim: A Simulation Framework --- p.48Chapter 3.1 --- Introduction --- p.48Chapter 3.2 --- Preliminaries and Motivation --- p.51Chapter 3.2.1 --- Prior Work on Lifetime Reliability Analysis of Processor- Based Systems --- p.51Chapter 3.2.2 --- Motivation of This Work --- p.53Chapter 3.3 --- The Proposed Framework --- p.54Chapter 3.4 --- Aging Rate Calculation --- p.57Chapter 3.4.1 --- Lifetime Reliability Calculation --- p.58Chapter 3.4.2 --- Aging Rate Extraction --- p.60Chapter 3.4.3 --- Discussion on Representative Workload --- p.63Chapter 3.4.4 --- Numerical Validation --- p.65Chapter 3.4.5 --- Miscellaneous --- p.66Chapter 3.5 --- Lifetime Reliability Model for MPSoCs with Redundancy --- p.68Chapter 3.6 --- Case Studies --- p.70Chapter 3.6.1 --- Dynamic Voltage and Frequency Scaling --- p.71Chapter 3.6.2 --- Burst Task Arrival --- p.75Chapter 3.6.3 --- Task Allocation on Multi-Core Processors --- p.77Chapter 3.6.4 --- Timeout Policy on Multi-Core Processors with Gracefully Degrading Redundancy --- p.78Chapter 3.7 --- Conclusion --- p.79Chapter 4 --- Evaluating Redundancy Schemes --- p.83Chapter 4.1 --- Introduction --- p.83Chapter 4.2 --- Preliminaries and Motivation --- p.85Chapter 4.2.1 --- Failure Mechanisms --- p.85Chapter 4.2.2 --- Related Work and Motivation --- p.86Chapter 4.3 --- Proposed Analytical Model for the Lifetime Reliability of Proces- sor Cores --- p.88Chapter 4.3.1 --- "Impact of Temperature, Voltage, and Frequency" --- p.88Chapter 4.3.2 --- Impact of Workloads --- p.92Chapter 4.4 --- Lifetime Reliability Analysis for Multi-core Processors with Vari- ous Redundancy Schemes --- p.95Chapter 4.4.1 --- Gracefully Degrading System (GDS) --- p.95Chapter 4.4.2 --- Processor Rotation System (PRS) --- p.97Chapter 4.4.3 --- Standby Redundant System (SRS) --- p.98Chapter 4.4.4 --- Extension to Heterogeneous System --- p.99Chapter 4.5 --- Experimental Methodology --- p.101Chapter 4.5.1 --- Workload Description --- p.102Chapter 4.5.2 --- Temperature Distribution Extraction --- p.102Chapter 4.5.3 --- Reliability Factors --- p.103Chapter 4.6 --- Results and Discussions --- p.103Chapter 4.6.1 --- Wear-out Rate Computation --- p.103Chapter 4.6.2 --- Comparison on Lifetime Reliability --- p.105Chapter 4.6.3 --- Comparison on Performance --- p.110Chapter 4.6.4 --- Comparison on Expected Computation Amount --- p.112Chapter 4.7 --- Conclusion --- p.118Chapter III --- Applications --- p.119Chapter 5 --- Task Allocation and Scheduling for MPSoCs --- p.120Chapter 5.1 --- Introduction --- p.120Chapter 5.2 --- Prior Work and Motivation --- p.122Chapter 5.2.1 --- IC Lifetime Reliability --- p.122Chapter 5.2.2 --- Task Allocation and Scheduling for MPSoC Designs --- p.124Chapter 5.3 --- Proposed Task Allocation and Scheduling Strategy --- p.126Chapter 5.3.1 --- Problem Definition --- p.126Chapter 5.3.2 --- Solution Representation --- p.128Chapter 5.3.3 --- Cost Function --- p.129Chapter 5.3.4 --- Simulated Annealing Process --- p.130Chapter 5.4 --- Lifetime Reliability Computation for MPSoC Embedded Systems --- p.133Chapter 5.5 --- Efficient MPSoC Lifetime Approximation --- p.138Chapter 5.5.1 --- Speedup Technique I - Multiple Periods --- p.139Chapter 5.5.2 --- Speedup Technique II - Steady Temperature --- p.139Chapter 5.5.3 --- Speedup Technique III - Temperature Pre- calculation --- p.140Chapter 5.5.4 --- Speedup Technique IV - Time Slot Quantity Control --- p.144Chapter 5.6 --- Experimental Results --- p.144Chapter 5.6.1 --- Experimental Setup --- p.144Chapter 5.6.2 --- Results and Discussion --- p.146Chapter 5.7 --- Conclusion and Future Work --- p.152Chapter 6 --- Energy-Efficient Task Allocation and Scheduling --- p.154Chapter 6.1 --- Introduction --- p.154Chapter 6.2 --- Preliminaries and Problem Formulation --- p.157Chapter 6.2.1 --- Related Work --- p.157Chapter 6.2.2 --- Problem Formulation --- p.159Chapter 6.3 --- Analytical Models --- p.160Chapter 6.3.1 --- Performance and Energy Models for DVS-Enabled Pro- cessors --- p.160Chapter 6.3.2 --- Lifetime Reliability Model --- p.163Chapter 6.4 --- Proposed Algorithm for Single-Mode Embedded Systems --- p.165Chapter 6.4.1 --- Task Allocation and Scheduling --- p.165Chapter 6.4.2 --- Voltage Assignment for DVS-Enabled Processors --- p.168Chapter 6.5 --- Proposed Algorithm for Multi-Mode Embedded Systems --- p.169Chapter 6.5.1 --- Feasible Solution Set --- p.169Chapter 6.5.2 --- Searching Procedure for a Single Mode --- p.171Chapter 6.5.3 --- Feasible Solution Set Identification --- p.171Chapter 6.5.4 --- Multi-Mode Combination --- p.177Chapter 6.6 --- Experimental Results --- p.178Chapter 6.6.1 --- Experimental Setup --- p.178Chapter 6.6.2 --- Case Study --- p.180Chapter 6.6.3 --- Sensitivity Analysis --- p.181Chapter 6.6.4 --- Extensive Results --- p.183Chapter 6.7 --- Conclusion --- p.185Chapter 7 --- Customer-Aware Task Allocation and Scheduling --- p.186Chapter 7.1 --- Introduction --- p.186Chapter 7.2 --- Prior Work and Problem Formulation --- p.188Chapter 7.2.1 --- Related Work and Motivation --- p.188Chapter 7.2.2 --- Problem Formulation --- p.191Chapter 7.3 --- Proposed Design-Stage Task Allocation and Scheduling --- p.192Chapter 7.3.1 --- Solution Representation and Moves --- p.193Chapter 7.3.2 --- Cost Function --- p.196Chapter 7.3.3 --- Impact of DVFS --- p.198Chapter 7.4 --- Proposed Algorithm for Online Adjustment --- p.200Chapter 7.4.1 --- Reliability Requirement for Online Adjustment --- p.201Chapter 7.4.2 --- Analytical Model --- p.203Chapter 7.4.3 --- Overall Flow --- p.204Chapter 7.5 --- Experimental Results --- p.205Chapter 7.5.1 --- Experimental Setup --- p.205Chapter 7.5.2 --- Results and Discussion --- p.207Chapter 7.6 --- Conclusion --- p.211Chapter 7.7 --- Appendix --- p.211Chapter 8 --- Conclusion and Future Work --- p.214Chapter 8.1 --- Conclusion --- p.214Chapter 8.2 --- Future Work --- p.215Bibliography --- p.23

    Dynamic Energy and Thermal Management of Multi-Core Mobile Platforms: A Survey

    Get PDF
    Multi-core mobile platforms are on rise as they enable efficient parallel processing to meet ever-increasing performance requirements. However, since these platforms need to cater for increasingly dynamic workloads, efficient dynamic resource management is desired mainly to enhance the energy and thermal efficiency for better user experience with increased operational time and lifetime of mobile devices. This article provides a survey of dynamic energy and thermal management approaches for multi-core mobile platforms. These approaches do either proactive or reactive management. The upcoming trends and open challenges are also discussed

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    Get PDF
    The increasing demand of processing a higher number of applications and related data on computing platforms has resulted in reliance on multi-/many-core chips as they facilitate parallel processing. However, there is a desire for these platforms to be energy-efficient and reliable, and they need to perform secure computations for the interest of the whole community. This book provides perspectives on the aforementioned aspects from leading researchers in terms of state-of-the-art contributions and upcoming trends

    Energy-Efficient, Reliable and QoS-Aware Task Mapping on Cyber-Physical Systems

    Get PDF
    Cyber-Physical Systems (CPS) usually consist of a set of embedded systems (CPS nodes) connected through wireless communication, providing multiple functionalities that support different types of applications. During CPS deployment, application tasks are mapped on the CPS nodes with the objective of enhancing real-time performance, energy efficiency, and execution reliability. To satisfy these requirements, effective task mapping approaches should be designed based on different types of tasks, platforms, applications, and system requirements. In this paper, we provide a comprehensive survey regarding the task mapping methods in CPS

    Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks

    Get PDF
    Technology scaling accompanied with higher operating frequencies and the ability to integrate more functionality in the same chip has been the driving force behind delivering higher performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as Multiprocessor System-on-Chip). However these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. Firstly, growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors, and as the probability of defects escaping post-manufacturing testing is increased. In this thesis, we take on these challenges within the context of streaming applications running in network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote memory access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping, (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at system-level, in particular by exploiting redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design time adoption) as well as heuristic-based solutions to be used at run-time. We assess the impact of our approach on the lifetime reliability. We propose two recovery schemes based on a checkpoint-and-rollback and a rollforward technique. For the latter, we propose two variants of a monitor-controller- adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that it can be done without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library or a hardware core to be added to the basic architecture

    A Survey and Comparative Study of Hard and Soft Real-time Dynamic Resource Allocation Strategies for Multi/Many-core Systems

    Get PDF
    Multi-/many-core systems are envisioned to satisfy the ever-increasing performance requirements of complex applications in various domains such as embedded and high-performance computing. Such systems need to cater to increasingly dynamic workloads, requiring efficient dynamic resource allocation strategies to satisfy hard or soft real-time constraints. This article provides an extensive survey of hard and soft real-time dynamic resource allocation strategies proposed since the mid-1990s and highlights the emerging trends for multi-/many-core systems. The survey covers a taxonomy of the resource allocation strategies and considers their various optimization objectives, which have been used to provide comprehensive comparison. The strategies employ various principles, such as market and biological concepts, to perform the optimizations. The trend followed by the resource allocation strategies, open research challenges, and likely emerging research directions have also been provided

    Energy-Efficient and Reliable Computing in Dark Silicon Era

    Get PDF
    Dark silicon denotes the phenomenon that, due to thermal and power constraints, the fraction of transistors that can operate at full frequency is decreasing in each technology generation. Moore’s law and Dennard scaling had been backed and coupled appropriately for five decades to bring commensurate exponential performance via single core and later muti-core design. However, recalculating Dennard scaling for recent small technology sizes shows that current ongoing multi-core growth is demanding exponential thermal design power to achieve linear performance increase. This process hits a power wall where raises the amount of dark or dim silicon on future multi/many-core chips more and more. Furthermore, from another perspective, by increasing the number of transistors on the area of a single chip and susceptibility to internal defects alongside aging phenomena, which also is exacerbated by high chip thermal density, monitoring and managing the chip reliability before and after its activation is becoming a necessity. The proposed approaches and experimental investigations in this thesis focus on two main tracks: 1) power awareness and 2) reliability awareness in dark silicon era, where later these two tracks will combine together. In the first track, the main goal is to increase the level of returns in terms of main important features in chip design, such as performance and throughput, while maximum power limit is honored. In fact, we show that by managing the power while having dark silicon, all the traditional benefits that could be achieved by proceeding in Moore’s law can be also achieved in the dark silicon era, however, with a lower amount. Via the track of reliability awareness in dark silicon era, we show that dark silicon can be considered as an opportunity to be exploited for different instances of benefits, namely life-time increase and online testing. We discuss how dark silicon can be exploited to guarantee the system lifetime to be above a certain target value and, furthermore, how dark silicon can be exploited to apply low cost non-intrusive online testing on the cores. After the demonstration of power and reliability awareness while having dark silicon, two approaches will be discussed as the case study where the power and reliability awareness are combined together. The first approach demonstrates how chip reliability can be used as a supplementary metric for power-reliability management. While the second approach provides a trade-off between workload performance and system reliability by simultaneously honoring the given power budget and target reliability

    Towards Self-adaptive MPSoC Systems with Adaptivity Throttling

    Get PDF

    Multiprocessor System-on-Chips based Wireless Sensor Network Energy Optimization

    Get PDF
    Wireless Sensor Network (WSN) is an integrated part of the Internet-of-Things (IoT) used to monitor the physical or environmental conditions without human intervention. In WSN one of the major challenges is energy consumption reduction both at the sensor nodes and network levels. High energy consumption not only causes an increased carbon footprint but also limits the lifetime (LT) of the network. Network-on-Chip (NoC) based Multiprocessor System-on-Chips (MPSoCs) are becoming the de-facto computing platform for computationally extensive real-time applications in IoT due to their high performance and exceptional quality-of-service. In this thesis a task scheduling problem is investigated using MPSoCs architecture for tasks with precedence and deadline constraints in order to minimize the processing energy consumption while guaranteeing the timing constraints. Moreover, energy-aware nodes clustering is also performed to reduce the transmission energy consumption of the sensor nodes. Three distinct problems for energy optimization are investigated given as follows: First, a contention-aware energy-efficient static scheduling using NoC based heterogeneous MPSoC is performed for real-time tasks with an individual deadline and precedence constraints. An offline meta-heuristic based contention-aware energy-efficient task scheduling is developed that performs task ordering, mapping, and voltage assignment in an integrated manner. Compared to state-of-the-art scheduling our proposed algorithm significantly improves the energy-efficiency. Second, an energy-aware scheduling is investigated for a set of tasks with precedence constraints deploying Voltage Frequency Island (VFI) based heterogeneous NoC-MPSoCs. A novel population based algorithm called ARSH-FATI is developed that can dynamically switch between explorative and exploitative search modes at run-time. ARSH-FATI performance is superior to the existing task schedulers developed for homogeneous VFI-NoC-MPSoCs. Third, the transmission energy consumption of the sensor nodes in WSN is reduced by developing ARSH-FATI based Cluster Head Selection (ARSH-FATI-CHS) algorithm integrated with a heuristic called Novel Ranked Based Clustering (NRC). In cluster formation parameters such as residual energy, distance parameters, and workload on CHs are considered to improve LT of the network. The results prove that ARSH-FATI-CHS outperforms other state-of-the-art clustering algorithms in terms of LT.University of Derby, Derby, U
    • …
    corecore