65 research outputs found

    SDF/L Graph Scheduling Technique on Heterogeneous Multi-Core Processors

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2021.8. Ha Soonhoi.
Although dataflow models are known to excel at exploiting the task-level parallelism of an application, they make it difficult to exploit data-level parallelism. Data-level parallelism can be represented well with loop structures, but these structures are not explicitly specified in most existing dataflow models. The SDF/L model was introduced to overcome this shortcoming by specifying loop structures explicitly in a hierarchical fashion. To the best of our knowledge, however, scheduling of SDF/L graphs onto heterogeneous processors has not been considered in any previous work. In this dissertation, we introduce a technique for scheduling an application represented by the SDF/L model onto heterogeneous processors. The proposed method explores the mapping of tasks using an evolutionary meta-heuristic and schedules hierarchically in a bottom-up fashion, creating parallel loop schedules at lower levels first and then re-using them when constructing the schedule at a higher level. To verify the efficiency of the proposed scheduling methodology, we apply it to benchmark examples and randomly generated SDF/L graphs.
Contents:
Chapter 1 Introduction
Chapter 2 Related Work: 2.1 SDF Scheduling with Data-level Parallelism; 2.2 Hierarchical Scheduling
Chapter 3 Problem and Challenges: 3.1 Notations and Problem Description; 3.2 Challenges
Chapter 4 Proposed methodology: 4.1 Mapping Exploration; 4.2 Priority Assignment and List Scheduling Heuristic; 4.3 Hierarchical Scheduling; 4.4 Complexity
Chapter 5 Experiments: 5.1 Benchmarks; 5.2 Randomly Generated Graphs
Chapter 6 Conclusions
Bibliography
Abstract (Korean)
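
The abstract above names a list-scheduling heuristic applied once a task-to-processor mapping has been chosen. The following is a minimal sketch of generic list scheduling on heterogeneous processors, not the thesis's own algorithm. It assumes an acyclic task graph, a fixed mapping, per-processor execution times, and zero communication cost; the function name `list_schedule` and all parameter names are hypothetical.

```python
from typing import Dict, List, Tuple


def list_schedule(
    exec_time: Dict[str, Dict[str, int]],  # task -> {processor: execution time}
    deps: Dict[str, List[str]],            # task -> list of predecessor tasks
    mapping: Dict[str, str],               # task -> processor (assumed given)
    priority: Dict[str, int],              # larger value = scheduled earlier
) -> Tuple[Dict[str, Tuple[int, int]], int]:
    """Return ({task: (start, end)}, makespan) for a fixed mapping."""
    finish: Dict[str, int] = {}
    schedule: Dict[str, Tuple[int, int]] = {}
    proc_free: Dict[str, int] = {p: 0 for p in mapping.values()}
    remaining = set(exec_time)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in remaining if all(d in finish for d in deps.get(t, []))]
        if not ready:
            raise ValueError("dependency cycle detected")
        # Pick the highest-priority ready task; the name breaks ties.
        task = max(ready, key=lambda t: (priority[t], t))
        proc = mapping[task]
        # Start when both the processor and all predecessors are done.
        start = max([proc_free[proc]] + [finish[d] for d in deps.get(task, [])])
        end = start + exec_time[task][proc]
        schedule[task] = (start, end)
        finish[task] = end
        proc_free[proc] = end
        remaining.remove(task)
    return schedule, max(finish.values(), default=0)
```

In a mapping exploration loop, a routine like this would serve as the cost function: each candidate mapping is scheduled and the resulting makespan is used as its fitness.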

    Hybrid scheduling of dataflow applications on multi-core embedded systems

    Embedded systems are increasingly present in industry as well as in daily life. A large share of these systems include applications that perform intensive data processing: they use many digital filters, in which the operations on data are repetitive and have limited control. Dataflow graphs, thanks to their inherent functional determinism, are widely used to model the embedded systems known as "data-driven". Static periodic scheduling of dataflow graphs has been studied extensively, especially for two particular models: SDF and CSDF. This thesis focuses on periodic scheduling of CSDF graphs. The problem is to identify infinite periodic sequences of actor firings that yield complete executions with bounded buffers. The objective is to address this problem from different angles: throughput maximization, latency minimization, and buffer capacity minimization. Most existing work proposes solutions for throughput optimization and neglects latency optimization, and in some cases even proposes schedules that degrade latency in order to preserve periodicity properties. This thesis proposes a hybrid schedule, named Self-Timed Periodic (STP), which retains the properties of a periodic schedule while considerably improving its latency.
One of the most important aspects of parallel computing is its close relation to the underlying hardware and programming models. In this PhD thesis, we take dataflow as the basic model of computation, as it fits the streaming application domain. Cyclo-Static Dataflow (CSDF) is particularly interesting because this variant is one of the most expressive dataflow models while still being analyzable at design time. Describing the system at higher levels of abstraction is not sufficient; for example, dataflow has no direct means to optimize communication channels, which are generally based on shared buffers. Therefore, we need to link the dataflow MoCs used for performance analysis of the programs, the real-time task models used for timing analysis, and the low-level model used to derive communication times. This thesis proposes a design flow that meets these challenges, while enabling features such as temporal isolation and taking into account other challenges such as predictability and ease of validation. To this end, we propose a new scheduling policy called Self-Timed Periodic (STP), an execution model that combines Self-Timed Scheduling (STS) with periodic scheduling. In STP scheduling, actors are no longer strictly periodic but are self-timed and assigned to periodic levels: the period of each actor under periodic scheduling is replaced by its worst-case execution time. STP thus retains some of the performance and flexibility of a self-timed schedule, in which execution times of actors need only be estimates, and at the same time makes use of the fact that with a periodic schedule we can derive a tight estimation of the required performance metrics.

    Dynamic Behavior Specification and Design Space Exploration Techniques for Real-Time Embedded Systems

    Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2016. 8. Ha Soonhoi.
As the number of processors on a chip increases and more functions are integrated, the system status changes dynamically due to various factors such as workload variation, QoS requirements, and unexpected component failure. At the same time, the computational complexity of user applications is steadily increasing; video and graphics applications are two major driving forces in smart mobile devices, which define the main application domain of interest in this dissertation. A systematic design methodology is therefore needed to implement such complex systems, which combine dynamically changing behavior with computation-intensive workloads that can be parallelized. A model-based approach is one of the representative approaches to parallel embedded software development. In particular, the HOPES framework has been proposed as a design environment for parallel embedded software that supports all design steps: system specification, performance estimation, design space exploration, and automatic code generation. Unlike other design environments, it introduces a novel concept of a programming platform, called CIC (Common Intermediate Code), that can be understood as a generic execution model of a heterogeneous multiprocessor architecture. The CIC task model is based on a process network model, but it can be refined to the SDF (Synchronous Data Flow) model, which has very desirable features for static analyzability as well as parallel processing. However, the SDF model has limited expressive capability, especially for system-level specification and the dynamically changing behavior of an application.
To overcome this weakness, this dissertation proposes an extended CIC task model based on dataflow and FSM models that specifies the dynamic behavior of the system, distinguishing inter- and intra-application dynamism. At the top level, each application is specified as a dataflow task, and the dynamic behavior is modeled as a control task that supervises the execution of applications. Inside a dataflow task, the dynamic behavior is specified in a way similar to FSM-based SADF: an SDF task may have multiple behaviors, and a tabular specification of an FSM, called MTM (Mode Transition Machine), describes the mode transition rules for the SDF graph. We call this the MTM-SDF model, which is classified as a multi-mode dataflow model in the dissertation. It assumes that an application has a finite number of behaviors (or modes) and that each behavior (mode) is represented by an SDF graph. This enables compile-time scheduling of each graph to maximize throughput for varying numbers of allocated processors, and storing the scheduling information. A multiprocessor scheduling technique is also proposed for a multi-mode dataflow graph. While several scheduling techniques exist for multi-mode dataflow models, none allows task migration between modes. Observing that the resource requirement can be further reduced if task migration is allowed, we propose a multiprocessor scheduling technique for a multi-mode dataflow graph that considers task migration between modes. Based on a genetic algorithm, the proposed technique schedules all SDF graphs in all modes simultaneously to minimize the resource requirement. To satisfy the throughput constraint, the technique calculates the actual throughput requirement of each mode and the output buffer size needed to tolerate throughput jitter.
For the specified task graph and scheduling results, the CIC translator generates parallelized code for the target architecture. The CIC translator is therefore extended to support the extended features of the CIC task model. At the application level, it is extended to support multiprocessor code generation for an MTM-SDF graph that follows the given static scheduling results. Multiprocessor code generation is also supported for four different scheduling policies: fully-static, self-timed, static-assignment, and fully-dynamic. At the system level, the CIC translator is extended to generate code that implements the system request APIs and the data structures for the static scheduling results and configurable task parameters. Preliminary experiments with a multi-mode multimedia terminal example verify the viability of the proposed methodology.
Contents:
Chapter 1 Introduction: 1.1 Motivation; 1.2 Contribution; 1.3 Dissertation organization
Chapter 2 Background: 2.1 Related work [2.1.1 Compiler-based approach; 2.1.2 Language-based approach; 2.1.3 Model-based approach]; 2.2 HOPES framework; 2.3 Common Intermediate Code (CIC) Model
Chapter 3 Dynamic Behavior Specification: 3.1 Problem definition [3.1.1 System-level dynamic behavior; 3.1.2 Application-level dynamic behavior]; 3.2 Related work; 3.3 Motivational example; 3.4 Control task specification for system-level dynamism [3.4.1 Internal specification; 3.4.2 Action scripts]; 3.5 MTM-SDF specification for application-level dynamism [3.5.1 MTM specification; 3.5.2 Task graph specification; 3.5.3 Execution semantic of an MTM-SDF graph]
Chapter 4 Multiprocessor Scheduling of a Multi-mode Dataflow Graph: 4.1 Related work; 4.2 Motivational example [4.2.1 Throughput requirement calculation considering mode transition delay; 4.2.2 Task migration between mode transition]; 4.3 Problem definition; 4.4 Throughput requirement analysis [4.4.1 Mode transition delay; 4.4.2 Arrival curves of the output buffer; 4.4.3 Buffer size determination; 4.4.4 Throughput requirement analysis]; 4.5 Proposed MMDF scheduling framework [4.5.1 Optimization problem; 4.5.2 GA configuration; 4.5.3 Fitness function; 4.5.4 Local optimization technique]; 4.6 Experimental results [4.6.1 MMDF scheduling technique; 4.6.2 Scalability of the Proposed Framework]
Chapter 5 Multiprocessor Code Generation for the Extended CIC Model: 5.1 CIC translator; 5.2 Code generation for application-level dynamism [5.2.1 Function call-style code generation (fully-static, self-timed); 5.2.2 Thread-style code generation (static-assignment, fully-dynamic)]; 5.3 Code generation for system-level dynamism; 5.4 Experimental results
Chapter 6 Conclusion and Future Work
Bibliography
Abstract (Korean)
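
The abstract above describes a tabular FSM (the MTM) that selects which SDF graph, and hence which precomputed schedule, is active in each mode. The sketch below illustrates only that general idea with a plain lookup table. It is not the HOPES or CIC data format: the class name, the event strings, and the per-mode firing lists are all hypothetical, and real MTM transitions may be conditioned on variables rather than named events.

```python
from typing import Dict, List, Tuple


class ModeTransitionTable:
    """Table-driven FSM: (current mode, event) -> next mode."""

    def __init__(self, initial: str, table: Dict[Tuple[str, str], str],
                 schedules: Dict[str, List[str]]):
        self.mode = initial
        self.table = table
        # One static schedule (here just an actor firing order) stored per mode.
        self.schedules = schedules

    def on_event(self, event: str) -> str:
        # Events with no table entry leave the current mode unchanged.
        self.mode = self.table.get((self.mode, event), self.mode)
        return self.mode

    def current_schedule(self) -> List[str]:
        return self.schedules[self.mode]


# Hypothetical two-mode media player: each mode has its own SDF schedule.
mtm = ModeTransitionTable(
    initial="video",
    table={("video", "low_battery"): "audio_only",
           ("audio_only", "charging"): "video"},
    schedules={"video": ["read", "vdec", "adec", "render"],
               "audio_only": ["read", "adec"]},
)
```

Because the schedules are computed ahead of time and only looked up at run time, a mode switch in this sketch costs a single table access.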

    Predictable mapping of streaming applications on multiprocessors

    The design of new consumer electronics keeps growing more complex because more and more functionality is integrated into these devices. A predictable design flow is needed to manage this complexity. The result of this design flow should be a system in which each application can perform its own tasks within a strict time limit, independently of other applications that use the same system. This requires that the timing behavior of the hardware, the software, and their interaction can be predicted. A heterogeneous multi-processor system-on-chip (MP-SoC) is often proposed for modern electronic systems. For many applications an MP-SoC offers a good trade-off between computational power and energy consumption. Networks-on-chip (NoCs) are proposed as the interconnect in these systems. A NoC is scalable and offers guarantees on the amount of time needed to communicate data between different processors and memories. Combining the NoC with a predictable strategy for sharing the processors and memories yields a hardware platform with predictable timing behavior. To obtain a predictable system, the timing behavior of an application executed on the platform must also be predictable and analyzable. The Synchronous Dataflow (SDF) model is very well suited to modeling applications that operate on data streams. The model can capture many design decisions, and the timing behavior of the system can be analyzed during the design flow. This thesis aims to map applications modeled as SDF graphs onto a NoC-based MP-SoC in such a way that guarantees on the timing behavior of individual applications can be given.
The throughput of an application is often one of the most important requirements when designing systems for streaming applications. This throughput is strongly influenced by the storage space available for results (data). Storage space in an SDF graph is modeled by the edges of the graph. The problem is that a fixed storage size must be assigned to the edges of an SDF graph. This size must be chosen such that the required throughput of the system is met while the required storage space is minimized. The first main contribution of this thesis is a technique to find the minimal storage space for every possible throughput of an application. Despite the theoretical complexity of this problem, the technique performs well in practice. Because the technique finds all minimal combinations of storage space and throughput, it can cope with situations in which not all design decisions have been made yet. The design decision to execute two tasks of an application on one processor, for example, could affect the throughput. As a result, early in the design flow there is uncertainty between the computed throughput and the throughput that can actually be realized once all design decisions have been made. During the design flow, the tasks that make up an application must be assigned to the various processors and memories in the system. If multiple tasks share a processor, the order in which these tasks execute must also be determined. An important contribution of this thesis is a technique that performs this assignment and determines the order in which tasks are executed.
Existing techniques can only handle tasks that have a one-to-one relation with each other, that is, tasks that are executed an equal number of times. More complex relations can also be expressed in an SDF graph. These relations can be rewritten into one-to-one relations, but that can lead to exponential growth in the number of tasks in the graph. This can make it impossible to assign all tasks to processors and to determine their execution order within a limited time. The technique presented in this thesis can handle the complex relations between tasks in an SDF graph without translating them to one-to-one relations. This is possible thanks to a new, efficient technique for determining the throughput of SDF graphs. After the tasks of an application have been assigned to the processors in the hardware platform, the communication between these tasks must be scheduled on the NoC. This schedule must determine, for each message sent between tasks, which route is used and when the communication starts. This thesis introduces three strategies for sending messages with a strict time limit. All three strategies make maximal use of the freedom offered by modern NoCs. Experiments show that these strategies therefore use the available hardware more efficiently than existing strategies. In addition to these strategies, a technique is presented to derive, from the design decisions made while assigning tasks to processors, all time limits within which the messages must be communicated over the NoC. This technique links the aforementioned technique for assigning tasks to processors to the three strategies for sending messages over the NoC.
Finally, the techniques introduced in this thesis are combined into a complete design flow. The starting point is an SDF graph that models an application and a NoC-based MP-SoC platform with predictable timing behavior. The goal of the design flow is to map the application onto the platform in such a way that the throughput of the application can be guaranteed. In addition, the design flow tries to minimize the amount of hardware used. An experiment is presented in which three different multimedia applications (H.263 encoder/decoder and an MP3 decoder) are mapped onto a NoC-based MP-SoC. This experiment shows that the techniques proposed in this thesis can be used to design systems with predictable timing behavior. The proposed design flow is thereby the first flow that can map an SDF-modeled application onto a NoC-based MP-SoC while giving guarantees on the throughput of the application.
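
The abstract above notes that SDF tasks need not fire an equal number of times, and that rewriting the graph into one-to-one relations can greatly increase the number of tasks. A small sketch of the standard balance-equation calculation shows where that growth comes from; it is an illustration and not the thesis's throughput analysis. The function name `repetition_vector` and the edge tuple layout are assumptions. The expanded one-to-one graph has one node per firing, so its size is the sum of the repetition vector entries.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import Dict, List, Tuple

# (producer, consumer, tokens produced per firing, tokens consumed per firing)
Edge = Tuple[str, str, int, int]


def repetition_vector(actors: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Smallest positive integer firing counts that balance every edge."""
    rate: Dict[str, Fraction] = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates along the edges
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    for src, dst, prod, cons in edges:
        # Balance equation: tokens produced equal tokens consumed per iteration.
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent rates: no bounded-buffer schedule")
    # Scale the fractions to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}


# A produces 2 tokens per firing, B consumes 3; B produces 1, C consumes 2.
reps = repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])
expanded_size = sum(reps.values())  # node count of the one-to-one graph
```

For this three-actor graph the firing counts are A=3, B=2, C=1, so the one-to-one expansion already has six nodes; larger or less evenly matched rates grow the expansion much faster.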

    Predictable multi-processor system on chip design for multimedia applications

    The design of multimedia systems has become increasingly complex due to consumer requirements. Consumers demand from these systems the functionality offered by a large desktop computer. Many of these systems are mobile, so the power consumption and size of these devices should be small. These systems are increasingly multi-processor based (MPSoCs) for reasons of power and performance. Applications execute on these systems in different combinations, also known as use-cases. Applications may have different performance requirements in each use-case. Currently, verification of all these use-cases takes the bulk of the design effort. There is a need for analysis-based techniques so that the platforms have predictable behaviour and in turn provide guarantees on performance without expending precious man-hours on verification. In this dissertation, techniques and architectures have been developed to design and manage these multi-processor based systems efficiently. The dissertation presents predictable architectural components for MPSoCs, a predictable MPSoC design strategy, an automatic platform synthesis tool, a run-time system and an MPSoC simulation technique. The introduction of predictability helps in the rapid design of MPSoC platforms. Chapter 1 of the thesis studies the trends in modern multimedia applications and processor architectures. The chapter further highlights the problems in the design of MPSoC platforms and emphasizes the need for predictable design techniques. Predictable design techniques require predictable application and architectural components. The chapter further elaborates on Synchronous Data Flow Graphs, which are used to model the applications throughout this thesis. The chapter presents the architecture template used in this thesis and lists the contributions of the thesis. One of the contributions of this thesis is the design of a predictable component called a communication assist.
Chapter 2 of the thesis describes the architecture of this communication assist. The communication assist presented in this thesis not only decouples the communication from computation but also provides timing guarantees. Based on this communication assist, an MPSoC platform generation technique has been presented that can design MPSoC platforms capable of satisfying the throughput constraints of multiple applications in all use-cases. The technique is presented in Chapter 3. The design strategy uses three simple steps for platform design. In the first step it finds the required number of processors. The second step minimizes the communication interconnect between the processors and the third step minimizes the communication memory requirement of the platform. Further in Chapter 4, a tool has been developed to generate CA-based platforms for FPGAs. The output of this tool can be used to synthesize platforms on real hardware with the help of FPGA synthesis tools. The applications executing on these platforms often exhibit dynamism e.g. variation in task execution times and change in application throughput requirements. Further, new applications may often be added by consumers at run-time. Resource managers have been presented in literature to handle such dynamic situations. However, the scalability of these resource managers becomes an issue with the increase in number of processors and applications. Chapter 5 presents distributed run-time resource management techniques. Two versions of distributed resource managers have been presented which are scalable with the number of applications and processors. MPSoC platforms for real-time applications are designed assuming worst-case task execution times. It is known that the difference between average-case and worst-case behaviour can be quite large. Therefore, knowing the average case performance is also important for the system designer, and software simulation is often employed to estimate this. 
However, simulation in software is slow and does not scale with the number of applications and processing elements. In Chapter 6, a fast and scalable simulation methodology is introduced that can simulate the execution of multiple applications on an MPSoC platform. It is based on parallel execution of SDF (Synchronous Data Flow) models of applications. The simulation methodology uses Parallel Discrete Event Simulation (PDES) primitives and is termed "Smart Conservative PDES". The methodology generates a parallel simulator which is synthesizable on FPGAs. The framework can also be used to model dynamic arbitration policies which are difficult to analyse using models. The generated platform is also useful in carrying out Design Space Exploration, as shown in the thesis. Finally, Chapter 7 summarizes the main findings and (practical) implications of the studies described in previous chapters of this dissertation. Using the contributions mentioned in the thesis, a designer can design and implement predictable multiprocessor based systems capable of satisfying throughput constraints of multiple applications in a given set of use-cases, and employ resource management strategies to deal with dynamism in the applications. The chapter also describes the main limitations of this dissertation and makes suggestions for future research.

    SCALABLE TECHNIQUES FOR SCHEDULING AND MAPPING DSP APPLICATIONS ONTO EMBEDDED MULTIPROCESSOR PLATFORMS

    A variety of multiprocessor architectures has proliferated even for off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using special platform features to achieve high performance. However, due to the fast evolution of the underlying architectures, solution redevelopment is error prone and re-usability of existing solutions and libraries is limited. In this thesis, we facilitate an efficient migration of DSP systems to multiprocessor platforms while systematically leveraging previous investment in optimized library kernels using dataflow design frameworks. We make these library elements, which are typically tailored to specialized architectures, more amenable to extensive analysis and optimization using an efficient and systematic process. In this thesis we provide techniques to allow such migration through four basic contributions:
1. We propose and develop a framework to explore efficient utilization of Single Instruction Multiple Data (SIMD) cores and accelerators available in heterogeneous multiprocessor platforms consisting of General Purpose Processors (GPPs) and Graphics Processing Units (GPUs). We also propose new scheduling techniques by applying extensive block processing in conjunction with appropriate task mapping and task ordering methods that match efficiently with the underlying architecture. The approach gives the developer the ability to prototype a GPU-accelerated application and explore its design space efficiently and effectively.
2. We introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real time operating systems, and to re-use pre-optimized libraries of DSP components within such implementations.
3. We develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to profit from the available cores in the target platform. Then a mapping algorithm that uses graph analysis is developed to distribute data and task parallel instances over different cores while trying to balance the load of all processing units to make use of pipeline parallelism. Finally, we use a novel technique for performance evaluation by implementing the scheduler and a customizable solution on the programmable platform. This allows accurate fitness functions to be measured and used to drive runtime adaptation of schedules.
4. In addition to providing scheduling techniques for the mentioned applications and platforms, we also show how to integrate the resulting solution in the underlying environment. This is achieved by leveraging existing libraries and applying the GPP-GPU scheduling framework to augment a popular existing Software Defined Radio (SDR) development environment -- GNU Radio -- with a dataflow foundation and a stand-alone GPU-accelerated library. We also show how to realize the PEG model on real time operating system libraries, such as the Texas Instruments DSP/BIOS. A code generator that accepts a manual system designer solution as well as automatically configured solutions is provided to complete the design flow starting from application model to running system.