321 research outputs found

    A Survey and Comparative Study of Hard and Soft Real-time Dynamic Resource Allocation Strategies for Multi/Many-core Systems

    Get PDF
    Multi-/many-core systems are envisioned to satisfy the ever-increasing performance requirements of complex applications in various domains such as embedded and high-performance computing. Such systems need to cater to increasingly dynamic workloads, requiring efficient dynamic resource allocation strategies to satisfy hard or soft real-time constraints. This article provides an extensive survey of hard and soft real-time dynamic resource allocation strategies proposed since the mid-1990s and highlights the emerging trends for multi-/many-core systems. The survey covers a taxonomy of the resource allocation strategies and considers their various optimization objectives, which are used to provide a comprehensive comparison. The strategies employ various principles, such as market and biological concepts, to perform the optimizations. The trends followed by the resource allocation strategies, open research challenges, and likely emerging research directions are also discussed.

    A Survey of Prediction and Classification Techniques in Multicore Processor Systems

    Get PDF
    In multicore processor systems, being able to accurately predict the future provides new optimization opportunities, which otherwise could not be exploited. For example, an oracle able to predict a certain application's behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while reducing energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction has evolved from simple forecasting to sophisticated machine-learning-based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that can span all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques on prediction and classification in the general context of computing systems, with emphasis on multicore processors. The paper is far from comprehensive, but it will help the reader interested in employing prediction in the optimization of multicore processor systems.
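
    As a rough, hypothetical illustration of the prediction-driven proactive management this survey covers (the predictor, frequency table, and thresholds below are invented, not taken from the paper), the following Python sketch uses a simple moving-average load predictor to choose a DVFS operating point ahead of time rather than reactively:

```python
# Illustrative sketch only: a history-based load predictor driving a DVFS mode
# choice, in the spirit of the prediction-based proactive management the survey
# discusses. Frequency levels and thresholds are made-up values.
from collections import deque

FREQ_LEVELS_MHZ = [400, 800, 1200, 1600]   # hypothetical DVFS operating points

class LoadPredictor:
    """Predict the next interval's CPU utilization as a moving average of
    recently observed utilizations (a simple history-based predictor)."""
    def __init__(self, history_len=4):
        self.history = deque(maxlen=history_len)

    def observe(self, utilization):
        self.history.append(utilization)

    def predict(self):
        return sum(self.history) / len(self.history) if self.history else 1.0

def pick_frequency(predicted_util, levels=FREQ_LEVELS_MHZ):
    """Choose the lowest frequency whose relative capacity covers the
    predicted utilization (utilization measured at the highest frequency)."""
    for f in levels:
        if predicted_util <= f / levels[-1]:
            return f
    return levels[-1]

# Example: utilization samples observed over the past intervals
predictor = LoadPredictor()
for u in [0.35, 0.42, 0.38, 0.40]:
    predictor.observe(u)
print(pick_frequency(predictor.predict()))   # -> 800 (MHz) for a ~0.39 predicted load
```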

    Analysis, design and management of multimedia multi-processor systems

    Get PDF
    Ph.D., NUS-TU/e Joint Ph.D. Programme

    A Survey of Fault-Tolerance Techniques for Embedded Systems from the Perspective of Power, Energy, and Thermal Issues

    Get PDF
    The relentless technology scaling has provided a significant increase in processor performance, but on the other hand, it has led to adverse impacts on system reliability. In particular, technology scaling increases the processor susceptibility to radiation-induced transient faults. Moreover, technology scaling with the discontinuation of Dennard scaling increases the power densities, and thereby the temperatures, on the chip. High temperature, in turn, accelerates transistor aging mechanisms, which may ultimately lead to permanent faults on the chip. To assure a reliable system operation despite these potential reliability concerns, fault-tolerance techniques have emerged. Specifically, fault-tolerance techniques employ some kind of redundancy to satisfy specific reliability requirements. However, the integration of fault-tolerance techniques into real-time embedded systems complicates preserving timing constraints. As a remedy, many task mapping/scheduling policies have been proposed to consider the integration of fault-tolerance techniques and enforce both timing and reliability guarantees for real-time embedded systems. More advanced techniques aim additionally at minimizing power and energy while at the same time satisfying timing and reliability constraints. Recently, some scheduling techniques have started to tackle a new challenge: the temperature increase induced by employing fault-tolerance techniques. These emerging techniques aim at satisfying temperature constraints besides timing and reliability constraints. This paper provides an in-depth survey of the emerging research efforts that exploit fault-tolerance techniques while considering timing, power/energy, and temperature from the real-time embedded systems' design perspective. In particular, the task mapping/scheduling policies for fault-tolerant real-time embedded systems are reviewed and classified according to their considered goals and constraints. Moreover, the employed fault-tolerance techniques, application models, and hardware models are considered as additional dimensions of the presented classification. Lastly, this survey gives deep insights into the main achievements and shortcomings of the existing approaches and highlights the most promising ones.
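
    As a hedged, textbook-level illustration of the trade-off these mapping/scheduling policies manage (a model commonly used in this literature, not quoted from the survey itself): under an exponential transient-fault model, time redundancy buys reliability at the cost of schedulable time, as sketched below.

```latex
% Illustrative first-order transient-fault model (not taken from the survey):
% a task with worst-case execution time w on a core with fault rate \lambda
% completes correctly in a single execution with probability
R_{\text{single}} = e^{-\lambda w}
% Time redundancy with up to k re-executions raises the task reliability to
R_{\text{task}} = 1 - \left(1 - e^{-\lambda w}\right)^{k+1}
% at the cost of up to k additional executions, which the mapping/scheduling
% policy must fit under the deadline and the power/thermal budget.
```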

    A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

    Full text link
    Recent technological advances have greatly improved the performance and features of embedded systems. With the number of mobile devices alone now nearly equal to the population of the Earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing the power consumption of embedded systems. We discuss the need for power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help researchers and application developers in gaining insights into the working of power management techniques and in designing even more efficient high-performance embedded systems of tomorrow.
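
    A first-order CMOS power model helps explain why DVFS-style techniques recur throughout such surveys; the relations below are textbook material, not results from this paper.

```latex
% Classic first-order CMOS dynamic power model (illustrative): with switching
% activity \alpha, switched capacitance C, supply voltage V and clock frequency f,
P_{\text{dyn}} \approx \alpha\, C\, V^{2} f, \qquad
E_{\text{dyn}} = P_{\text{dyn}} \cdot t \approx \alpha\, C\, V^{2} \cdot (\text{cycles})
% so lowering V (and f with it, as dynamic voltage and frequency scaling does)
% reduces energy per operation roughly quadratically, at the price of longer
% execution time that a power manager must trade off against performance goals.
```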

    Real-Time Task Migration for Dynamic Resource Management in Many-Core Systems

    Get PDF

    Temporal analysis and scheduling of hard real-time radios running on a multi-processor

    Get PDF
    On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor while each meeting its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for independent timing analysis or relies on runtime timing analysis. This thesis proposes a design flow and software architecture that meets these challenges, while enabling features such as independent transceiver compilation and dynamic loading, and taking into account other challenges such as ease of programming, efficiency, and ease of validation. We take data flow as the basic model of computation, as it fits the application domain, and several static variants (such as Single-Rate, Multi-Rate and Cyclo-Static) have been shown to possess strong analytical properties. Traditional temporal analysis of data flow can provide minimum throughput guarantees for a self-timed implementation of data flow. Since transceivers may need to guarantee strictly periodic execution and meet latency requirements, we extend the analysis techniques to show that we can enforce strict periodicity for an actor in the graph; we also provide maximum latency analysis techniques for periodic, sporadic and bursty sources. We propose a scheduling strategy and an automatic scheduling flow that enable the simultaneous execution of multiple transceivers with hard real-time requirements, described as Single-Rate Data Flow (SRDF) graphs. Each transceiver has its own execution rate and starts and stops independently from other transceivers, at times unknown at compile time, on a multiprocessor. We show how to combine scheduling and mapping decisions with the input application data flow graph to generate a worst-case temporal analysis graph. We propose algorithms to find a mapping per transceiver in the form of clusters of statically-ordered actors, and a budget for either a Time Division Multiplex (TDM) or Non-Preemptive Non-Blocking Round Robin (NPNBRR) scheduler per cluster per transceiver. The budget is computed such that if the platform can provide it, then the desired minimum throughput and maximum latency of the transceiver are guaranteed, while minimizing the required processing resources. We illustrate the use of these techniques to map a combination of WLAN and TDS-CDMA receivers onto a prototype Software-Defined Radio platform. The functionality of transceivers for standards with very dynamic behavior, such as WLAN, cannot be conveniently modeled as an SRDF graph, since SRDF is not capable of expressing variations of actor firing rules depending on the values of input data. Because of this, we propose a restricted, customized data flow model of computation, Mode-Controlled Data Flow (MCDF), that can capture the data-value-dependent behavior of a transceiver while allowing rigorous temporal analysis and tight resource budgeting. We develop a number of analysis techniques to characterize the temporal behavior of MCDF graphs in terms of maximum latencies and throughput. We also provide an extension to MCDF of our scheduling strategy for SRDF. The capabilities of MCDF are then illustrated with a WLAN 802.11a receiver model. Having computed budgets for each transceiver, we propose a way to use these budgets for run-time resource mapping and admissibility analysis.
During run-time, at transceiver start time, the budget for each cluster of statically-ordered actors is allocated by a resource manager to platform resources. The resource manager enforces strict admission control, to prevent transceivers from interfering with each other's worst-case temporal behavior. We propose algorithms adapted from Vector Bin-Packing to enable the mapping at start time of transceivers to the multi-processor architecture, considering also the case where the processors are connected by a network-on-chip with resource reservation guarantees, in which case we also find routing and resource allocation on the network-on-chip. In our experiments, our resource allocation algorithms can keep 95% of the system resources occupied while suffering an allocation failure rate of less than 5%. An implementation of the framework was carried out on a prototype board. We present performance and memory utilization figures for this implementation, as they provide insights into the costs of adopting our approach. It turns out that the scheduling and synchronization overhead for an unoptimized implementation of the framework, with no hardware support for synchronization, is 16.3% of the cycle budget for a WLAN receiver on an EVP processor at 320 MHz. However, this overhead is less than 1% for mobile standards such as TDS-CDMA or LTE, which have lower rates and thus larger cycle budgets. Considering that clock speeds will increase and that the synchronization primitives can be optimized to exploit the addressing modes available in the EVP, these results are very promising.
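
    As a rough illustration of the kind of data flow temporal analysis this thesis builds on (the small graph, execution times, and token counts below are invented examples, not taken from the thesis), the sketch computes the maximum cycle ratio of a single-rate data flow graph by brute-force cycle enumeration; the inverse of that ratio bounds the throughput of a self-timed schedule.

```python
# Illustrative sketch, not the thesis's own tooling: for a single-rate dataflow
# (SRDF) graph executed self-timed, the guaranteed throughput is 1/MCR, where the
# maximum cycle ratio MCR is, over every cycle, the sum of actor execution times
# divided by the number of initial tokens on the cycle's edges.

def simple_cycles(succ):
    """Enumerate simple cycles of a small directed graph {node: [successor, ...]}."""
    def dfs(start, node, path, seen):
        for nxt in succ.get(node, []):
            if nxt == start:
                yield list(path)
            elif nxt not in seen:
                seen.add(nxt)
                path.append(nxt)
                yield from dfs(start, nxt, path, seen)
                path.pop()
                seen.remove(nxt)
    for start in succ:
        yield from dfs(start, start, [start], {start})

def max_cycle_ratio(exec_time, edges):
    """edges: list of (src, dst, initial_tokens). Returns the maximum over all
    cycles of (sum of exec times) / (sum of tokens); its inverse bounds throughput."""
    succ, tokens = {}, {}
    for u, v, d in edges:
        succ.setdefault(u, []).append(v)
        tokens[(u, v)] = d
    mcr = 0.0
    for cyc in simple_cycles(succ):
        t = sum(exec_time[a] for a in cyc)
        d = sum(tokens[(cyc[i], cyc[(i + 1) % len(cyc)])] for i in range(len(cyc)))
        if d > 0:                      # a token-free cycle would mean deadlock
            mcr = max(mcr, t / d)
    return mcr

# Two-actor example: src -> filter -> src, with one initial token on the back edge
exec_time = {"src": 2, "filter": 5}
edges = [("src", "filter", 0), ("filter", "src", 1)]
mcr = max_cycle_ratio(exec_time, edges)
print(mcr, 1.0 / mcr)   # MCR = 7 time units per token -> throughput bound 1/7
```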

    ๋งค๋‹ˆ์ฝ”์–ด ๊ฐ€์†๊ธฐ์˜ ๊ฒฐํ•จ์„ ๊ณ ๋ คํ•œ ํƒœ์Šคํฌ ๋งคํ•‘ ๋ฐ ์ž์› ๊ด€๋ฆฌ ๊ธฐ๋ฒ•

    Get PDF
    Doctoral dissertation (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2014. Advisor: Soonhoi Ha.
    Owing to incessant technology improvement, the number of processors integrated into a single chip increases steadily, integrating more and more applications. Also, the demand for higher computing capability makes a many-core accelerator an important computing resource in a system-on-chip. Efficient handling of the accelerator at run time, however, is very challenging because the system status is subject to change dynamically due to various factors. At the system level, the set of applications running concurrently may change according to user requests. At the application level, the application behavior may change dynamically depending on input data or the operation mode. At the architecture level, hardware resource availability may vary since hardware components may experience transient or permanent failures. In this thesis, three techniques are proposed to resolve these difficulties in handling a many-core accelerator. The first technique is the re-scheduling of the entire application to minimize throughput degradation under a latency constraint when a permanent processor failure occurs. Sub-optimal re-scheduling results for each scenario of processor failures are obtained at compile time using a genetic algorithm. If a failure is detected at run time, the live processors obtain the saved schedule, perform task migration, and execute the remaining tasks of the current iteration. In this technique, preemptive, non-preemptive, and hybrid migration policies are proposed to obtain better performance. The viability of the proposed technique is shown through experiments with real-life DSP applications as well as randomly generated graphs under timing constraints and random fault scenarios. The second technique is a hybrid resource management scheme, an expanded version of the first technique that also handles multiple applications specified as SDF graphs and their associated dynamism, such as application/task arrivals and completions as well as permanent processor failures. In the proposed technique, throughput-maximized mappings of each SDF graph are determined at design time for varying numbers of allocated processors. Then, at run time, the pre-computed mapping information is exploited to adjust the mapping of active applications to processors whenever the system status changes, without user intervention. The proposed resource management is evaluated through intensive experiments with an in-house simulator built on top of Noxim, a network-on-chip simulator. Experimental results show enhanced adaptability to dynamic system status changes compared to other state-of-the-art approaches. Finally, a software platform for a homogeneous many-core architecture that implements the second technique is proposed to evaluate the system performance more accurately before SoC fabrication.
Existing approaches usually use a high-level simulation model to estimate the performance, without knowing how much the actual performance will deviate from the estimation. To overcome this limitation, the software platform is proposed, and implementation details on a virtual prototyping system and on an emulation system realized with an Intel Xeon Phi coprocessor are presented. The actual implementation enables us to investigate in detail the overheads involved in the hybrid resource management technique, which was not possible in high-level simulation. Experimental results confirm that the proposed software platform adapts to dynamic workload variation effectively by dynamic mapping of tasks and tolerates unexpected core failures by check-pointing.
Table of contents: Chapter 1 Introduction; Chapter 2 Preliminaries (Application Model, Architecture Model, Fault Model, Thesis Overview); Chapter 3 Fault-aware Task Mapping; Chapter 4 Fault-aware Resource Management; Chapter 5 Software Platform for Resource Management; Chapter 6 Conclusion; Bibliography; Abstract in Korean.
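
    To make the hybrid design-time/run-time idea described in this abstract concrete, here is a minimal, hypothetical sketch (not the thesis's actual algorithm; all names and numbers are invented): design-time analysis yields, per application, the best achievable throughput for each processor count, and a run-time manager redistributes processors among the active applications using only those pre-computed tables.

```python
# Hypothetical sketch of hybrid resource management: design-time tables map a
# processor count to the best throughput found for that application; at run
# time, processors freed or lost (e.g. by a permanent core failure) are
# redistributed greedily using only the pre-computed tables.

def allocate(active_apps, tables, available_procs):
    """Greedy run-time allocation: repeatedly give one more processor to the
    application whose throughput gains the most from it."""
    alloc = {app: 1 for app in active_apps}          # every active app needs a core
    remaining = available_procs - len(active_apps)
    if remaining < 0:
        raise RuntimeError("not enough processors for all active applications")
    while remaining > 0:
        best_app, best_gain = None, 0.0
        for app in active_apps:
            table = tables[app]                      # {num_procs: throughput}
            cur, nxt = alloc[app], alloc[app] + 1
            if nxt in table:
                gain = table[nxt] - table[cur]
                if gain > best_gain:
                    best_app, best_gain = app, gain
        if best_app is None:                         # no app benefits from more cores
            break
        alloc[best_app] += 1
        remaining -= 1
    return alloc

# Design-time tables (invented numbers): throughput in iterations per millisecond
tables = {"wlan": {1: 1.0, 2: 1.8, 3: 2.3}, "video": {1: 0.6, 2: 1.1}}
print(allocate(["wlan", "video"], tables, 4))        # -> {'wlan': 3, 'video': 1}
```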
    • …