51 research outputs found

    Predictable multi-processor system on chip design for multimedia applications

    Get PDF
    The design of multimedia systems has become increasingly complex due to consumer requirements. Consumers demand the functionalities offered by a huge desktop from these systems. Many of these systems are mobile. Therefore, power consumption and size of these devices should be small. These systems are increasingly becoming multi-processor based (MPSoCs) for the reasons of power and performance. Applications execute on these systems in different combinations also known as use-cases. Applications may have different performance requirements in each use-case. Currently, verification of all these use-cases takes bulk of the design effort. There is a need for analysis based techniques so that the platforms have a predictable behaviour and in turn provide guarantees on performance without expending precious man hours on verification. In this dissertation, techniques and architectures have been developed to design and manage these multi-processor based systems efficiently. The dissertation presents predictable architectural components for MPSoCs, a Predictable MPSoC design strategy, automatic platform synthesis tool, a run-time system and an MPSoC simulation technique. The introduction of predictability helps in rapid design of MPSoC platforms. Chapter 1 of the thesis studies the trends in modern multimedia applications and processor architectures. The chapter further highlights the problems in the design of MPSoC platforms and emphasizes the need of predictable design techniques. Predictable design techniques require predictable application and architectural components. The chapter further elaborates on Synchronous Data Flow Graphs which are used to model the applications throughout this thesis. The chapter presents the architecture template used in this thesis and enlists the contributions of the thesis. One of the contributions of this thesis is the design of a predictable component called communication assist. Chapter 2 of the thesis describes the architecture of this communication assist. The communication assist presented in this thesis not only decouples the communication from computation but also provides timing guarantees. Based on this communication assist, an MPSoC platform generation technique has been presented that can design MPSoC platforms capable of satisfying the throughput constraints of multiple applications in all use-cases. The technique is presented in Chapter 3. The design strategy uses three simple steps for platform design. In the first step it finds the required number of processors. The second step minimizes the communication interconnect between the processors and the third step minimizes the communication memory requirement of the platform. Further in Chapter 4, a tool has been developed to generate CA-based platforms for FPGAs. The output of this tool can be used to synthesize platforms on real hardware with the help of FPGA synthesis tools. The applications executing on these platforms often exhibit dynamism e.g. variation in task execution times and change in application throughput requirements. Further, new applications may often be added by consumers at run-time. Resource managers have been presented in literature to handle such dynamic situations. However, the scalability of these resource managers becomes an issue with the increase in number of processors and applications. Chapter 5 presents distributed run-time resource management techniques. Two versions of distributed resource managers have been presented which are scalable with the number of applications and processors. MPSoC platforms for real-time applications are designed assuming worst-case task execution times. It is known that the difference between average-case and worst-case behaviour can be quite large. Therefore, knowing the average case performance is also important for the system designer, and software simulation is often employed to estimate this. However, simulation in software is slow and does not scale with the number of applications and processing elements. In Chapter 6, a fast and scalable simulation methodology is introduced that can simulate the execution of multiple applications on an MPSoC platform. It is based on parallel execution of SDF (Synchronous Data Flow) models of applications. The simulation methodology uses Parallel Discrete Event Simulation (PDES) primitives and it is termed as "Smart Conservative PDES". The methodology generates a parallel simulator which is synthesizable on FPGAs. The framework can also be used to model dynamic arbitration policies which are difficult to analyse using models. The generated platform is also useful in carrying out Design Space Exploration as shown in the thesis. Finally, Chapter 7 summarizes the main findings and (practical) implications of the studies described in previous chapters of this dissertation. Using the contributions mentioned in the thesis, a designer can design and implement predictable multiprocessor based systems capable of satisfying throughput constraints of multiple applications in given set of use-cases, and employ resource management strategies to deal with dynamism in the applications. The chapter also describes the main limitations of this dissertation and makes suggestions for future research

    ์‹ค์‹œ๊ฐ„ ์ž„๋ฒ ๋””๋“œ ์‹œ์Šคํ…œ์„ ์œ„ํ•œ ๋™์  ํ–‰์œ„ ๋ช…์„ธ ๋ฐ ์„ค๊ณ„ ๊ณต๊ฐ„ ํƒ์ƒ‰ ๊ธฐ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2016. 8. ํ•˜์ˆœํšŒ.ํ•˜๋‚˜์˜ ์นฉ์— ์ง‘์ ๋˜๋Š” ํ”„๋กœ์„ธ์„œ์˜ ๊ฐœ์ˆ˜๊ฐ€ ๋งŽ์•„์ง€๊ณ , ๋งŽ์€ ๊ธฐ๋Šฅ๋“ค์ด ํ†ตํ•ฉ๋จ์— ๋”ฐ๋ผ, ์—ฐ์‚ฐ์–‘์˜ ๋ณ€ํ™”, ์„œ๋น„์Šค์˜ ํ’ˆ์งˆ, ์˜ˆ์ƒ์น˜ ๋ชปํ•œ ์‹œ์Šคํ…œ ์š”์†Œ์˜ ๊ณ ์žฅ ๋“ฑ๊ณผ ๊ฐ™์€ ๋‹ค์–‘ํ•œ ์š”์†Œ๋“ค์— ์˜ํ•ด ์‹œ์Šคํ…œ์˜ ์ƒํƒœ๊ฐ€ ๋™์ ์œผ๋กœ ๋ณ€ํ™”ํ•˜๊ฒŒ ๋œ๋‹ค. ๋ฐ˜๋ฉด์—, ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ฃผ๋œ ๊ด€์‹ฌ์‚ฌ๋ฅผ ๊ฐ€์ง€๋Š” ์Šค๋งˆํŠธ ํฐ ์žฅ์น˜์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ๋น„๋””์˜ค, ๊ทธ๋ž˜ํ”ฝ ์‘์šฉ๋“ค์˜ ๊ฒฝ์šฐ, ๊ณ„์‚ฐ ๋ณต์žก๋„๊ฐ€ ์ง€์†์ ์œผ๋กœ ์ฆ๊ฐ€ํ•˜๊ณ  ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ด๋ ‡๊ฒŒ ๋™์ ์œผ๋กœ ๋ณ€ํ•˜๋Š” ํ–‰์œ„๋ฅผ ๊ฐ€์ง€๋ฉด์„œ๋„ ๋ณ‘๋ ฌ์„ฑ์„ ๋‚ด์ œํ•œ ๊ณ„์‚ฐ ์ง‘์•ฝ์ ์ธ ์—ฐ์‚ฐ์„ ํฌํ•จํ•˜๋Š” ๋ณต์žกํ•œ ์‹œ์Šคํ…œ์„ ๊ตฌํ˜„ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์ฒด๊ณ„์ ์ธ ์„ค๊ณ„ ๋ฐฉ๋ฒ•๋ก ์ด ๊ณ ๋„๋กœ ์š”๊ตฌ๋œ๋‹ค. ๋ชจ๋ธ ๊ธฐ๋ฐ˜ ๋ฐฉ๋ฒ•๋ก ์€ ๋ณ‘๋ ฌ ์ž„๋ฒ ๋””๋“œ ์†Œํ”„ํŠธ์›จ์–ด ๊ฐœ๋ฐœ์„ ์œ„ํ•œ ๋Œ€ํ‘œ์ ์ธ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ด๋‹ค. ํŠนํžˆ, ์‹œ์Šคํ…œ ๋ช…์„ธ, ์ •์  ์„ฑ๋Šฅ ๋ถ„์„, ์„ค๊ณ„ ๊ณต๊ฐ„ ํƒ์ƒ‰, ๊ทธ๋ฆฌ๊ณ  ์ž๋™ ์ฝ”๋“œ ์ƒ์„ฑ๊นŒ์ง€์˜ ๋ชจ๋“  ์„ค๊ณ„ ๋‹จ๊ณ„๋ฅผ ์ง€์›ํ•˜๋Š” ๋ณ‘๋ ฌ ์ž„๋ฒ ๋””๋“œ ์†Œํ”„ํŠธ์›จ์–ด ์„ค๊ณ„ ํ™˜๊ฒฝ์œผ๋กœ์„œ, HOPES ํ”„๋ ˆ์ž„์›Œํฌ๊ฐ€ ์ œ์‹œ๋˜์—ˆ๋‹ค. ๋‹ค๋ฅธ ์„ค๊ณ„ ํ™˜๊ฒฝ๋“ค๊ณผ๋Š” ๋‹ค๋ฅด๊ฒŒ, ์ด๊ธฐ์ข… ๋ฉ€ํ‹ฐํ”„๋กœ์„ธ์„œ ์•„ํ‚คํ…์ฒ˜์—์„œ์˜ ์ผ๋ฐ˜์ ์ธ ์ˆ˜ํ–‰ ๋ชจ๋ธ๋กœ์„œ, ๊ณตํ†ต ์ค‘๊ฐ„ ์ฝ”๋“œ (CIC) ๋ผ๊ณ  ๋ถ€๋ฅด๋Š” ํ”„๋กœ๊ทธ๋ž˜๋ฐ ํ”Œ๋žซํผ์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ๊ฐœ๋…์„ ์†Œ๊ฐœํ•˜์˜€๋‹ค. CIC ํƒœ์Šคํฌ ๋ชจ๋ธ์€ ํ”„๋กœ์„ธ์Šค ๋„คํŠธ์›Œํฌ ๋ชจ๋ธ์— ๊ธฐ๋ฐ˜ํ•˜๊ณ  ์žˆ์ง€๋งŒ, SDF ๋ชจ๋ธ๋กœ ๊ตฌ์ฒดํ™”๋  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์ •์  ๋ถ„์„์ด ์šฉ์ดํ•˜๋‹ค๋Š” ์žฅ์ ์„ ๊ฐ€์ง„๋‹ค. ํ•˜์ง€๋งŒ, SDF ๋ชจ๋ธ์€ ์‘์šฉ์˜ ๋™์ ์ธ ํ–‰์œ„๋ฅผ ๋ช…์„ธํ•  ์ˆ˜ ์—†๋‹ค๋Š” ํ‘œํ˜„์ƒ์˜ ์ œ์•ฝ์„ ๊ฐ€์ง„๋‹ค. ์ด๋Ÿฌํ•œ ์ œ์•ฝ์„ ๊ทน๋ณตํ•˜๊ณ , ์‹œ์Šคํ…œ์˜ ๋™์  ํ–‰์œ„๋ฅผ ์‘์šฉ ์™ธ๋ถ€์™€ ๋‚ด๋ถ€๋กœ ๊ตฌ๋ถ„ํ•˜์—ฌ ๋ช…์„ธํ•˜๊ธฐ ์œ„ํ•ด, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ฐ์ดํ„ฐ ํ”Œ๋กœ์šฐ์™€ ์œ ํ•œ์ƒํƒœ๊ธฐ (FSM) ๋ชจ๋ธ์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ ํ™•์žฅ๋œ CIC ํƒœ์Šคํฌ ๋ชจ๋ธ์„ ์ œ์•ˆํ•œ๋‹ค. ์ƒ์œ„ ์ˆ˜์ค€์—์„œ๋Š”, ๊ฐ ์‘์šฉ์€ ๋ฐ์ดํ„ฐ ํ”Œ๋กœ์šฐ ํƒœ์Šคํฌ๋กœ ๋ช…์„ธ ๋˜๋ฉฐ, ๋™์  ํ–‰์œ„๋Š” ์‘์šฉ๋“ค์˜ ์ˆ˜ํ–‰์„ ๊ฐ๋…ํ•˜๋Š” ์ œ์–ด ํƒœ์Šคํฌ๋กœ ๋ชจ๋ธ ๋œ๋‹ค. ๋ฐ์ดํ„ฐ ํ”Œ๋กœ์šฐ ํƒœ์Šคํฌ ๋‚ด๋ถ€๋Š”, ์œ ํ•œ์ƒํƒœ๊ธฐ ๊ธฐ๋ฐ˜์˜ SADF ๋ชจ๋ธ๊ณผ ์œ ์‚ฌํ•œ ํ˜•ํƒœ๋กœ ๋™์  ํ–‰์œ„๊ฐ€ ๋ช…์„ธ ๋œ๋‹คSDF ํƒœ์Šคํฌ๋Š” ๋ณต์ˆ˜๊ฐœ์˜ ํ–‰์œ„๋ฅผ ๊ฐ€์งˆ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋ชจ๋“œ ์ „ํ™˜๊ธฐ (MTM)์ด๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” ์œ ํ•œ ์ƒํƒœ๊ธฐ์˜ ํ…Œ์ด๋ธ” ํ˜•ํƒœ์˜ ๋ช…์„ธ๋ฅผ ํ†ตํ•ด SDF ๊ทธ๋ž˜ํ”„์˜ ๋ชจ๋“œ ์ „ํ™˜ ๊ทœ์น™์„ ๋ช…์„ธ ํ•œ๋‹ค. ์ด๋ฅผ MTM-SDF ๊ทธ๋ž˜ํ”„๋ผ๊ณ  ๋ถ€๋ฅด๋ฉฐ, ๋ณต์ˆ˜ ๋ชจ๋“œ ๋ฐ์ดํ„ฐ ํ”Œ๋กœ์šฐ ๋ชจ๋ธ ์ค‘ ํ•˜๋‚˜๋ผ ๊ตฌ๋ถ„๋œ๋‹ค. ์‘์šฉ์€ ์œ ํ•œํ•œ ํ–‰์œ„ (๋˜๋Š” ๋ชจ๋“œ)๋ฅผ ๊ฐ€์ง€๋ฉฐ, ๊ฐ ํ–‰์œ„ (๋ชจ๋“œ)๋Š” SDF ๊ทธ๋ž˜ํ”„๋กœ ํ‘œํ˜„๋˜๋Š” ๊ฒƒ์„ ๊ฐ€์ •ํ•œ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋‹ค์–‘ํ•œ ํ”„๋กœ์„ธ์„œ ๊ฐœ์ˆ˜์— ๋Œ€ํ•ด ๋‹จ์œ„์‹œ๊ฐ„๋‹น ์ฒ˜๋ฆฌ๋Ÿ‰์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ์ปดํŒŒ์ผ-์‹œ๊ฐ„ ์Šค์ผ€์ค„๋ง์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ์Šค์ผ€์ค„ ๊ฒฐ๊ณผ๋ฅผ ์ €์žฅํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•œ๋‹ค. ๋˜ํ•œ, ๋ณต์ˆ˜ ๋ชจ๋“œ ๋ฐ์ดํ„ฐ ํ”Œ๋กœ์šฐ ๊ทธ๋ž˜ํ”„๋ฅผ ์œ„ํ•œ ๋ฉ€ํ‹ฐํ”„๋กœ์„ธ์„œ ์Šค์ผ€์ค„๋ง ๊ธฐ๋ฒ•์„ ์ œ์‹œํ•œ๋‹ค. ๋ณต์ˆ˜ ๋ชจ๋“œ ๋ฐ์ดํ„ฐ ํ”Œ๋กœ์šฐ ๊ทธ๋ž˜ํ”„๋ฅผ ์œ„ํ•œ ๋ช‡๋ช‡ ์Šค์ผ€์ค„๋ง ๊ธฐ๋ฒ•๋“ค์ด ์กด์žฌํ•˜์ง€๋งŒ, ๋ชจ๋“œ ์‚ฌ์ด์— ํƒœ์Šคํฌ ์ด์ฃผ๋ฅผ ํ—ˆ์šฉํ•œ ๊ธฐ๋ฒ•๋“ค์€ ์กด์žฌํ•˜์ง€ ์•Š๋Š”๋‹ค. ํ•˜์ง€๋งŒ ํƒœ์Šคํฌ ์ด์ฃผ๋ฅผ ํ—ˆ์šฉํ•˜๊ฒŒ ๋˜๋ฉด ์ž์› ์š”๊ตฌ๋Ÿ‰์„ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค๋Š” ๋ฐœ๊ฒฌ์„ ํ†ตํ•ด, ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋ชจ๋“œ ์‚ฌ์ด์˜ ํƒœ์Šคํฌ ์ด์ฃผ๋ฅผ ํ—ˆ์šฉํ•˜๋Š” ๋ณต์ˆ˜ ๋ชจ๋“œ ๋ฐ์ดํ„ฐ ํ”Œ๋กœ์šฐ ๊ทธ๋ž˜ํ”„๋ฅผ ์œ„ํ•œ ๋ฉ€ํ‹ฐํ”„๋กœ์„ธ์„œ ์Šค์ผ€์ค„๋ง ๊ธฐ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์œ ์ „ ์•Œ๊ณ ๋ฆฌ์ฆ˜์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ, ์ œ์•ˆํ•˜๋Š” ๊ธฐ๋ฒ•์€ ์ž์› ์š”๊ตฌ๋Ÿ‰์„ ์ตœ์†Œํ™”ํ•˜๊ธฐ ์œ„ํ•ด ๊ฐ ๋ชจ๋“œ์— ํ•ด๋‹นํ•˜๋Š” ๋ชจ๋“  SDF ๊ทธ๋ž˜ํ”„๋ฅผ ๋™์‹œ์— ์Šค์ผ€์ค„ ํ•œ๋‹ค. ์ฃผ์–ด์ง„ ๋‹จ์œ„ ์‹œ๊ฐ„๋‹น ์ฒ˜๋ฆฌ๋Ÿ‰ ์ œ์•ฝ์„ ๋งŒ์กฑ์‹œํ‚ค๊ธฐ ์œ„ํ•ด, ์ œ์•ˆํ•˜๋Š” ๊ธฐ๋ฒ•์€ ๊ฐ ๋ชจ๋“œ ๋ณ„๋กœ ์‹ค์ œ ์ฒ˜๋ฆฌ๋Ÿ‰ ์š”๊ตฌ๋Ÿ‰์„ ๊ณ„์‚ฐํ•˜๋ฉฐ, ์ฒ˜๋ฆฌ๋Ÿ‰์˜ ๋ถˆ๊ทœ์น™์„ฑ์„ ์™„ํ™”ํ•˜๊ธฐ ์œ„ํ•œ ์ถœ๋ ฅ ๋ฒ„ํผ์˜ ํฌ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•œ๋‹ค. ๋ช…์„ธ๋œ ํƒœ์Šคํฌ ๊ทธ๋ž˜ํ”„์™€ ์Šค์ผ€์ค„ ๊ฒฐ๊ณผ๋กœ๋ถ€ํ„ฐ, HOPES ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ๋Œ€์ƒ ์•„ํ‚คํ…์ฒ˜๋ฅผ ์œ„ํ•œ ์ž๋™ ์ฝ”๋“œ ์ƒ์„ฑ์„ ์ง€์›ํ•œ๋‹ค. ์ด๋ฅผ ์œ„ํ•ด ์ž๋™ ์ฝ”๋“œ ์ƒ์„ฑ๊ธฐ๋Š” CIC ํƒœ์Šคํฌ ๋ชจ๋ธ์˜ ํ™•์žฅ๋œ ํŠน์ง•๋“ค์„ ์ง€์›ํ•˜๋„๋ก ํ™•์žฅ๋˜์—ˆ๋‹ค. ์‘์šฉ ์ˆ˜์ค€์—์„œ๋Š” MTM-SDF ๊ทธ๋ž˜ํ”„๋ฅผ ์ฃผ์–ด์ง„ ์ •์  ์Šค์ผ€์ค„๋ง ๊ฒฐ๊ณผ๋ฅผ ๋”ฐ๋ฅด๋Š” ๋ฉ€ํ‹ฐํ”„๋กœ์„ธ์„œ ์ฝ”๋“œ๋ฅผ ์ƒ์„ฑํ•˜๋„๋ก ํ™•์žฅ๋˜์—ˆ๋‹ค. ๋˜ํ•œ, ๋„ค ๊ฐ€์ง€ ์„œ๋กœ ๋‹ค๋ฅธ ์Šค์ผ€์ค„๋ง ์ •์ฑ… (fully-static, self-timed, static-assignment, fully-dynamic)์— ๋Œ€ํ•œ ๋ฉ€ํ‹ฐํ”„๋กœ์„ธ์„œ ์ฝ”๋“œ ์ƒ์„ฑ์„ ์ง€์›ํ•œ๋‹ค. ์‹œ์Šคํ…œ ์ˆ˜์ค€์—์„œ๋Š” ์ง€์›ํ•˜๋Š” ์‹œ์Šคํ…œ ์š”์ฒญ API์— ๋Œ€ํ•œ ์‹ค์ œ ๊ตฌํ˜„ ์ฝ”๋“œ๋ฅผ ์ƒ์„ฑํ•˜๋ฉฐ, ์ •์  ์Šค์ผ€์ค„ ๊ฒฐ๊ณผ์™€ ํƒœ์Šคํฌ๋“ค์˜ ์ œ์–ด ๊ฐ€๋Šฅํ•œ ์†์„ฑ๋“ค์— ๋Œ€ํ•œ ์ž๋ฃŒ ๊ตฌ์กฐ ์ฝ”๋“œ๋ฅผ ์ƒ์„ฑํ•œ๋‹ค. ๋ณต์ˆ˜ ๋ชจ๋“œ ๋ฉ€ํ‹ฐ๋ฏธ๋””์–ด ํ„ฐ๋ฏธ๋„ ์˜ˆ์ œ๋ฅผ ํ†ตํ•œ ๊ธฐ์ดˆ์ ์ธ ์‹คํ—˜๋“ค์„ ํ†ตํ•ด, ์ œ์•ˆํ•˜๋Š” ๋ฐฉ๋ฒ•๋ก ์˜ ํƒ€๋‹น์„ฑ์„ ๋ณด์ธ๋‹ค.As the number of processors in a chip increases, and more functions are integrated, the system status will change dynamically due to various factors such as the workload variation, QoS requirement, and unexpected component failure. On the other hand, computation-complexity of user applications is also steadily increasingvideo and graphics applications are two major driving forces in smart mobile devices, which define the main application domain of interest in this dissertation. So, a systematic design methodology is highly required to implement such complex systems which contain dynamically changed behavior as well as computation-intensive workload that can be parallelized. A model-based approach is one of representative approaches for parallel embedded software development. Especially, HOPES framework is proposed which is a design environment for parallel embedded software supporting the overall design steps: system specification, performance estimation, design space exploration, and automatic code generation. Distinguished from other design environments, it introduces a novel concept of programming platform, called CIC (Common Intermediate Code) that can be understood as a generic execution model of heterogeneous multiprocessor architecture. The CIC task model is based on a process network model, but it can be refined to the SDF (Synchronous Data Flow) model, since it has a very desirable features for static analyzability as well as parallel processing. However, the SDF model has a typical weakness of expression capability, especially for the system-level specification and dynamically changed behavior of an application. To overcome this weakness, in this dissertation, we propose an extended CIC task model based on dataflow and FSM models to specify the dynamic behavior of the system distinguishing inter- and intra-application dynamism. At the top-level, each application is specified by a dataflow task and the dynamic behavior is modeled as a control task that supervises the execution of applications. Inside a dataflow task, it specifies the dynamic behavior using a similar way as FSM-based SADFan SDF task may have multiple behaviors and a tabular specification of an FSM, called MTM (Mode Transition Machine), describes the mode transition rules for the SDF graph. We call it to MTM-SDF model which is classified as multi-mode dataflow models in the dissertation. It assumes that an application has a finite number of behaviors (or modes) and each behavior (mode) is represented by an SDF graph. It enables us to perform compile-time scheduling of each graph to maximize the throughput varying the number of allocated processors, and store the scheduling information. Also, a multiprocessor scheduling technique is proposed for a multi-mode dataflow graph. While there exist several scheduling techniques for multi-mode dataflow models, no one allows task migration between modes. By observing that the resource requirement can be additionally reduced if task migration is allowed, we propose a multiprocessor scheduling technique of a multi-mode dataflow graph considering task migration between modes. Based on a genetic algorithm, the proposed technique schedules all SDF graphs in all modes simultaneously to minimize the resource requirement. To satisfy the throughput constraint, the proposed technique calculates the actual throughput requirement of each mode and the output buffer size for tolerating throughput jitter. For the specified task graph and scheduling results, the CIC translator generates parallelized code for the target architecture. Therefore the CIC translator is extended to support extended features of the CIC task model. In application-level, it is extended to support multiprocessor code generation for an MTM-SDF graph considering the given static scheduling results. Also, multiprocessor code generation of four different scheduling policies are supported for an MTM-SDF graph: fully-static, self-timed, static-assignment, and fully-dynamic. In system-level, the CIC translator is extended to support code generation for implementation of system request APIs and data structures for the static scheduling results and configurable task parameters. Through preliminary experiments with a multi-mode multimedia terminal example, the viability of the proposed methodology is verified.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Contribution 7 1.3 Dissertation organization 9 Chapter 2 Background 10 2.1 Related work 10 2.1.1 Compiler-based approach 10 2.1.2 Language-based approach 11 2.1.3 Model-based approach 15 2.2 HOPES framework 19 2.3 Common Intermediate Code (CIC) Model 21 Chapter 3 Dynamic Behavior Specification 26 3.1 Problem definition 26 3.1.1 System-level dynamic behavior 26 3.1.2 Application-level dynamic behavior 27 3.2 Related work 28 3.3 Motivational example 31 3.4 Control task specification for system-level dynamism 33 3.4.1 Internal specification 33 3.4.2 Action scripts 38 3.5 MTM-SDF specification for application-level dynamism 44 3.5.1 MTM specification 44 3.5.2 Task graph specification 45 3.5.3 Execution semantic of an MTM-SDF graph 46 Chapter 4 Multiprocessor Scheduling of an Multi-mode Dataflow Graph 50 4.1 Related work 51 4.2 Motivational example 56 4.2.1 Throughput requirement calculation considering mode transition delay 56 4.2.2 Task migration between mode transition 58 4.3 Problem definition 61 4.4 Throughput requirement analysis 65 4.4.1 Mode transition delay 66 4.4.2 Arrival curves of the output buffer 70 4.4.3 Buffer size determination 71 4.4.4 Throughput requirement analysis 73 4.5 Proposed MMDF scheduling framework 75 4.5.1 Optimization problem 75 4.5.2 GA configuration 76 4.5.3 Fitness function 78 4.5.4 Local optimization technique 79 4.6 Experimental results 81 4.6.1 MMDF scheduling technique 83 4.6.2 Scalability of the Proposed Framework 88 Chapter 5 Multiprocessor Code Generation for the Extended CIC Model 89 5.1 CIC translator 89 5.2 Code generation for application-level dynamism 91 5.2.1 Function call-style code generation (fully-static, self-timed) 94 5.2.2 Thread-style code generation (static-assignment, fully-dynamic) 98 5.3 Code generation for system-level dynamism 101 5.4 Experimental results 105 Chapter 6 Conclusion and Future Work 107 Bibliography 109 ์ดˆ๋ก 125Docto

    Analysis, design and management of multimedia multi- processor systems

    Get PDF
    Ph.DNUS-TU/E JOINT PH.D. PROGRAMM

    MULTI-SCALE SCHEDULING TECHNIQUES FOR SIGNAL PROCESSING SYSTEMS

    Get PDF
    A variety of hardware platforms for signal processing has emerged, from distributed systems such as Wireless Sensor Networks (WSNs) to parallel systems such as Multicore Programmable Digital Signal Processors (PDSPs), Multicore General Purpose Processors (GPPs), and Graphics Processing Units (GPUs) to heterogeneous combinations of parallel and distributed devices. When a signal processing application is implemented on one of those platforms, the performance critically depends on the scheduling techniques, which in general allocate computation and communication resources for competing processing tasks in the application to optimize performance metrics such as power consumption, throughput, latency, and accuracy. Signal processing systems implemented on such platforms typically involve multiple levels of processing and communication hierarchy, such as network-level, chip-level, and processor-level in a structural context, and application-level, subsystem-level, component-level, and operation- or instruction-level in a behavioral context. In this thesis, we target scheduling issues that carefully address and integrate scheduling considerations at different levels of these structural and behavioral hierarchies. The core contributions of the thesis include the following. Considering both the network-level and chip-level, we have proposed an adaptive scheduling algorithm for wireless sensor networks (WSNs) designed for event detection. Our algorithm exploits discrepancies among the detection accuracy of individual sensors, which are derived from a collaborative training process, to allow each sensor to operate in a more energy efficient manner while the network satisfies given constraints on overall detection accuracy. Considering the chip-level and processor-level, we incorporated both temperature and process variations to develop new scheduling methods for throughput maximization on multicore processors. In particular, we studied how to process a large number of threads with high speed and without violating a given maximum temperature constraint. We targeted our methods to multicore processors in which the cores may operate at different frequencies and different levels of leakage. We develop speed selection and thread assignment schedulers based on the notion of a core's steady state temperature. Considering the application-level, component-level and operation-level, we developed a new dataflow based design flow within the targeted dataflow interchange format (TDIF) design tool. Our new multiprocessor system-on-chip (MPSoC)-oriented design flow, called TDIF-PPG, is geared towards analysis and mapping of embedded DSP applications on MPSoCs. An important feature of TDIF-PPG is its capability to integrate graph level parallelism and actor level parallelism into the application mapping process. Here, graph level parallelism is exposed by the dataflow graph application representation in TDIF, and actor level parallelism is modeled by a novel model for multiprocessor dataflow graph implementation that we call the Parallel Processing Group (PPG) model. Building on the contribution above, we formulated a new type of parallel task scheduling problem called Parallel Actor Scheduling (PAS) for chip-level MPSoC mapping of DSP systems that are represented as synchronous dataflow (SDF) graphs. In contrast to traditional SDF-based scheduling techniques, which focus on exploiting graph level (inter-actor) parallelism, the PAS problem targets the integrated exploitation of both intra- and inter-actor parallelism for platforms in which individual actors can be parallelized across multiple processing units. We address a special case of the PAS problem in which all of the actors in the DSP application or subsystem being optimized can be parallelized. For this special case, we develop and experimentally evaluate a two-phase scheduling framework with three work flows --- particle swarm optimization with a mixed integer programming formulation, particle swarm optimization with a simulated annealing engine, and particle swarm optimization with a fast heuristic based on list scheduling. Then, we extend our scheduling framework to support general PAS problem which considers the actors cannot be parallelized

    Energy Aware Runtime Systems for Elastic Stream Processing Platforms

    Get PDF
    Following an invariant growth in the required computational performance of processors, the multicore revolution started around 20 years ago. This revolution was mainly an answer to power dissipation constraints restricting the increase of clock frequency in single-core processors. The multicore revolution not only brought in the challenge of parallel programming, i.e. being able to develop software exploiting the entire capabilities of manycore architectures, but also the challenge of programming heterogeneous platforms. The question of โ€œon which processing element to map a specific computational unit?โ€, is well known in the embedded community. With the introduction of general-purpose graphics processing units (GPGPUs), digital signal processors (DSPs) along with many-core processors on different system-on-chip platforms, heterogeneous parallel platforms are nowadays widespread over several domains, from consumer devices to media processing platforms for telecom operators. Finding mapping together with a suitable hardware architecture is a process called design-space exploration. This process is very challenging in heterogeneous many-core architectures, which promise to offer benefits in terms of energy efficiency. The main problem is the exponential explosion of space exploration. With the recent trend of increasing levels of heterogeneity in the chip, selecting the parameters to take into account when mapping software to hardware is still an open research topic in the embedded area. For example, the current Linux scheduler has poor performance when mapping tasks to computing elements available in hardware. The only metric considered is CPU workload, which as was shown in recent work does not match true performance demands from the applications. Doing so may produce an incorrect allocation of resources, resulting in a waste of energy. The origin of this research work comes from the observation that these approaches do not provide full support for the dynamic behavior of stream processing applications, especially if these behaviors are established only at runtime. This research will contribute to the general goal of developing energy-efficient solutions to design streaming applications on heterogeneous and parallel hardware platforms. Streaming applications are nowadays widely spread in the software domain. Their distinctive characiteristic is the retrieving of multiple streams of data and the need to process them in real time. The proposed work will develop new approaches to address the challenging problem of efficient runtime coordination of dynamic applications, focusing on energy and performance management.Efter en ofรถrรคnderlig tillvรคxt i prestandakrav hos processorer, bรถrjade den flerkรคrniga processor-revolutionen fรถr ungefรคr 20 รฅr sedan. Denna revolution skedde till stรถrsta del som en lรถsning till begrรคnsningar i energieffekten allt eftersom klockfrekvensen kontinuerligt hรถjdes i en-kรคrniga processorer. Den flerkรคrniga processor-revolutionen medfรถrde inte enbart utmaningen gรคllande parallellprogrammering, m.a.o. fรถrmรฅgan att utveckla mjukvara som anvรคnder sig av alla delelement i de flerkรคrniga processorerna, men ocksรฅ utmaningen med programmering av heterogena plattformar. Frรฅgestรคllningen โ€pรฅ vilken processorelement skall en viss berรคkning utfรถras?โ€ รคr vรคl kรคnt inom ramen fรถr inbyggda datorsystem. Efter introduktionen av grafikprocessorer fรถr allmรคnna berรคkningar (GPGPU), signalprocesserings-processorer (DSP) samt flerkรคrniga processorer pรฅ olika system-on-chip plattformar, รคr heterogena parallella plattformar idag omfattande inom mรฅnga domรคner, frรฅn konsumtionsartiklar till mediaprocesseringsplattformar fรถr telekommunikationsoperatรถrer. Processen att placera berรคkningarna pรฅ en passande hรฅrdvaruplattform kallas fรถr utforskning av en designrymd (design-space exploration). Denna process รคr mycket utmanande fรถr heterogena flerkรคrniga arkitekturer, och kan medfรถra fรถrdelar nรคr det gรคller energieffektivitet. Det stรถrsta problemet รคr att de olika valmรถjligheterna i designrymden kan vรคxa exponentiellt. Enligt den nuvarande trenden som fรถrespรฅr รถkad heterogeniska aspekter i processorerna รคr utmaningen att hitta den mest passande placeringen av berรคkningarna pรฅ hรฅrdvaran รคnnu en forskningsfrรฅga inom ramen fรถr inbyggda datorsystem. Till exempel, den nuvarande schemalรคggaren i Linux operativsystemet รคr inkapabel att hitta en effektiv placering av berรคkningarna pรฅ den underliggande hรฅrdvaran. Det enda mรคtsรคttet som anvรคnds รคr processorns belastning vilket, som visats i tidigare forskning, inte motsvarar den verkliga prestandan i applikationen. Anvรคndning av detta mรคtsรคtt vid resursallokering resulterar i slรถseri med energi. Denna forskning hรคrstammar frรฅn observationerna att dessa tillvรคgagรฅngssรคtt inte stรถder det dynamiska beteendet hos strรถm-processeringsapplikationer (stream processing applications), speciellt om beteendena bara etableras vid kรถrtid. Denna forskning kontribuerar till det allmรคnna mรฅlet att utveckla energieffektiva lรถsningar fรถr strรถm-applikationer (streaming applications) pรฅ heterogena flerkรคrniga hรฅrdvaruplattformar. Strรถm-applikationer รคr numera mycket vanliga i mjukvarudomรคn. Deras distinkta karaktรคr รคr inlรคsning av flertalet datastrรถmmar, och behov av att processera dem i realtid. Arbetet i denna forskning understรถder utvecklingen av nya sรคtt fรถr att lรถsa det utmanade problemet att effektivt koordinera dynamiska applikationer i realtid och fokus pรฅ energi- och prestandahantering

    ๋งค๋‹ˆ์ฝ”์–ด ๊ฐ€์†๊ธฐ์˜ ๊ฒฐํ•จ์„ ๊ณ ๋ คํ•œ ํƒœ์Šคํฌ ๋งคํ•‘ ๋ฐ ์ž์› ๊ด€๋ฆฌ ๊ธฐ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ)-- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2014. 8. ํ•˜์ˆœํšŒ.๊ธฐ์ˆ ์ด ๋ฐœ์ „ํ•จ์— ๋”ฐ๋ผ ํ•˜๋‚˜์˜ ์นฉ ์•ˆ์— ์ง‘์ ๋˜๋Š” ํ”„๋กœ์„ธ์„œ์˜ ๊ฐฏ์ˆ˜๊ฐ€ ์ ์  ์ฆ๊ฐ€ํ•˜๊ฒŒ ๋˜์—ˆ๋‹ค. ๋˜ํ•œ, ์‘์šฉ๋“ค์˜ ๋ณด๋‹ค ๋†’์€ ์—ฐ์‚ฐ ๋Šฅ๋ ฅ์— ๋Œ€ํ•œ ์š”๊ตฌ๋กœ ์ธํ•ด ๋งค๋‹ˆ์ฝ”์–ด ๊ฐ€์†๊ธฐ๋Š” ์‹œ์Šคํ…œ-์˜จ-์นฉ์—์„œ ์ค‘์š”ํ•œ ์—ฐ์‚ฐ ์žฅ์น˜๊ฐ€ ๋˜์—ˆ๋‹ค. ์‹œ์Šคํ…œ์˜ ์ƒํƒœ๊ฐ€ ์—ฌ๋Ÿฌ๊ฐ€์ง€ ์š”์ธ์— ์˜ํ•ด ๋™์ ์œผ๋กœ ๋ณ€ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ์‹œ์Šคํ…œ ์ˆ˜ํ–‰์ค‘์— ๊ทธ๋Ÿฌํ•œ ๊ฐ€์†๊ธฐ๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ๋‹ค๋ฃจ๋Š” ๊ฒƒ์€ ๋งค์šฐ ์–ด๋ ค์šด ๋ฌธ์ œ์ด๋‹ค. ์‹œ์Šคํ…œ ์ˆ˜์ค€์—์„œ๋Š” ์‘์šฉ๋“ค์ด ์‚ฌ์šฉ์ž์˜ ์š”๊ตฌ์— ๋”ฐ๋ผ ์‹œ์ž‘ ๋˜๋Š” ์ข…๋ฃŒ๊ฐ€ ๋˜๊ณ , ์‘์šฉ ๋ ˆ๋ฒจ์—์„œ๋Š” ์‘์šฉ ์ž์ฒด์˜ ๋™์ž‘์ด ์ž…๋ ฅ ๋ฐ์ดํƒ€๋‚˜ ์ˆ˜ํ–‰๋ชจ๋“œ์— ๋”ฐ๋ผ ๋™์ ์œผ๋กœ ๋ณ€ํ•˜๊ฒŒ ๋œ๋‹ค. ์•„ํ‚คํ…์ฒ˜ ์ˆ˜์ค€์—์„œ๋Š” ํ”„๋กœ์„ธ์„œ์˜ ์˜๊ตฌ ๊ณ ์žฅ์œผ๋กœ ์ธํ•ด ํ•˜๋“œ์›จ์–ด ์ปดํฌ๋„ŒํŠธ์˜ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ์ƒํ™ฉ์ด ๋ณ€ํ•˜๊ฒŒ ๋œ๋‹ค. ๋ณธ ํ•™์œ„๋…ผ๋ฌธ์—์„œ๋Š” ๊ฐ€์†๊ธฐ๋ฅผ ๋‹ค๋ฃจ๋Š”๋ฐ ์žˆ์–ด์„œ์˜ ์œ„์™€ ๊ฐ™์€ ์–ด๋ ค์›€๋“ค์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ์„ธ๊ฐ€์ง€ ๊ธฐ๋ฒ•์„ ์ œ์‹œํ•˜์˜€๋‹ค. ์ฒซ๋ฒˆ์งธ ๊ธฐ๋ฒ•์€ ํ”„๋กœ์„ธ์„œ์˜ ์˜๊ตฌ ๊ณ ์žฅ์ด ๋ฐœ์ƒํ•˜์˜€์„ ๋•Œ, ์ „์ฒด ์‘์šฉ๋“ค์„ ์‹œ๊ฐ„ ์ œ์•ฝ ํ•˜์— ์ฒ˜๋ฆฌ๋Ÿ‰์˜ ์ €ํ•˜๋ฅผ ์ตœ์†Œํ™”ํ•˜๋ฉฐ ์žฌ์Šค์ผ€์ฅด์„ ํ•˜๋Š” ๊ฒƒ์ด๋‹ค. ์ตœ์ ์˜ ์žฌ์Šค์ผ€์ฅด ๊ฒฐ๊ณผ๋“ค์€ ์ง„ํ™” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ด์šฉํ•˜์—ฌ ์ปดํŒŒ์ผ ์‹œ์—, ๊ฐ๊ฐ์˜ ํ”„๋กœ์„ธ์„œ ๊ณ ์žฅ ์ƒํ™ฉ์— ๋”ฐ๋ผ ์ค€๋น„๊ฐ€ ๋œ๋‹ค. ์ˆ˜ํ–‰ ์‹œ๊ฐ„์— ํ”„๋กœ์„ธ์„œ ๊ณ ์žฅ์ด ๊ฐ์ง€๋˜๋ฉด, ์ •์ƒ์ ์œผ๋กœ ๋™์ž‘ํ•˜๋Š” ํ”„๋กœ์„ธ์„œ๋“ค์ด ์ €์žฅ๋œ ์Šค์ผ€์ฅด์„ ๊ฐ€์ง€๊ณ  ํƒœ์Šคํฌ ์ด์ฃผ๋ฅผ ์ˆ˜ํ–‰ํ•œ ํ›„ ํƒœ์Šคํฌ๋“ค์˜ ๋‚˜๋จธ์ง€ ์ˆ˜ํ–‰์„ ์ง€์†ํ•œ๋‹ค. ์ด ๊ธฐ๋ฒ•์—์„œ๋Š” ๋˜ํ•œ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ์–ป๊ธฐ ์œ„ํ•ด, ์„ ์ , ๋น„์„ ์  ๋ฐ ์œตํ•ฉ ์ด์ฃผ ์ •์ฑ…์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์˜ ๊ฐ€๋Šฅ์„ฑ์€ ์‹ค์ œ ๋””์ง€ํ„ธ ์‹ ํ˜ธ์ฒ˜๋ฆฌ ์‘์šฉ๋“ค๊ณผ ์ž„์˜๋กœ ์ƒ์„ฑ๋œ ์‘์šฉ๋“ค์— ๋Œ€ํ•ด ์‹œ๊ฐ„์ œ์•ฝ๊ณผ ๋‹ค์–‘ํ•œ ํ”„๋กœ์„ธ์„œ ๊ณ ์žฅ ์ƒํ™ฉ์— ๋Œ€ํ•ด ๊ฒ€์ฆ๋˜์—ˆ๋‹ค. ๋‘ ๋ฒˆ์งธ๋กœ ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์€ ๋ณตํ•ฉ ์ž์› ๊ด€๋ฆฌ ๊ธฐ๋ฒ•์œผ๋กœ, ์ฒซ๋ฒˆ์งธ ๊ธฐ๋ฒ•์—์„œ ๋‹ค๋ฃฌ ํ”„๋กœ์„ธ์„œ ์˜๊ตฌ๊ณ ์žฅ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ, ๋™๊ธฐํ™” ๋ฐ์ดํƒ€-ํ๋ฆ„ ๊ทธ๋ž˜ํ”„๋กœ ๊ธฐ์ˆ ๋œ ์—ฌ๋Ÿฌ ์‘์šฉ๋“ค๊ณผ ์‘์šฉ๋“ค์˜ ๋™์  ์–‘์ƒ์„ ๋‹ค๋ฃจ๋Š” ๊ฒƒ๊นŒ์ง€๋กœ ํ™•์žฅ์ด ๋œ ๊ฒƒ์ด๋‹ค. ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์—์„œ๋Š”, ์šฐ์„  ์„ค๊ณ„ ์ˆ˜์ค€์—์„œ ํ• ๋‹น๋˜๋Š” ํ”„๋กœ์„ธ์„œ์˜ ๊ฐฏ์ˆ˜๋ฅผ ๋ณ€ํ™”์‹œ์ผœ๊ฐ€๋ฉด์„œ ๋™๊ธฐํ™”๋œ ๋ฐ์ดํƒ€-ํ๋ฆ„ ๊ทธ๋ž˜ํ”„๋“ค์˜ ์ฒ˜๋ฆฌ๋Ÿ‰์ด ์ตœ๋Œ€๋กœ ์–ป์–ด์ง€๋Š” ๋งคํ•‘ ๊ฒฐ๊ณผ๋“ค์„ ์–ป๋Š”๋‹ค. ๊ทธ๋ฆฌ๊ณ ๋‚˜์„œ ์ˆ˜ํ–‰ ์‹œ๊ฐ„์—๋Š” ๋ฏธ๋ฆฌ ๊ณ„์‚ฐ๋œ ๋งคํ•‘ ์ •๋ณด๋“ค์„ ๊ฐ€์ง€๊ณ  ์ˆ˜ํ–‰์ค‘์ธ ์‘์šฉ๋“ค์˜ ๋งคํ•‘์„, ๋™์ ์ธ ์‹œ์Šคํ…œ ๋ณ€ํ™”๊ฐ€ ๋ฐœ์ƒํ•  ๋•Œ๋งˆ๋‹ค ์ ์šฉํ•˜๊ฒŒ ๋œ๋‹ค. ์ œ์•ˆ๋œ ์ž์› ๊ด€๋ฆฌ ๊ธฐ๋ฒ•์€ Noxim์ด๋ผ๋Š” ๋„คํŠธ์›Œํฌ-์˜จ-์นฉ ์‹œ๋ฎฌ๋ ˆ์ดํ„ฐ ์œ„์—์„œ ๊ตฌํ˜„์ด ๋˜์—ˆ์œผ๋ฉฐ, ์‹คํ—˜ ๊ฒฐ๊ณผ๋“ค์€ ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์ด ์ตœ์‹ ์˜ ๋‹ค๋ฅธ ๊ธฐ๋ฒ•๋“ค๊ณผ ๋น„๊ตํ•˜์—ฌ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ๋Š”, ์‹œ์Šคํ…œ์˜ ์„ฑ๋Šฅ์„ ์‹œ์Šคํ…œ-์˜จ-์นฉ ์ œ์ž‘ ์ด์ „์— ๋ณด๋‹ค ์ •ํ™•ํ•˜๊ฒŒ ํ‰๊ฐ€ํ•˜๊ธฐ ์œ„ํ•ด์„œ, ๋‘ ๋ฒˆ์งธ ๊ธฐ๋ฒ•์„ ๊ตฌํ˜„ํ•œ ์†Œํ”„ํŠธ์›จ์–ด ํ”Œ๋žซํผ์ด ๋งค๋‹ˆ์ฝ”์–ด ์•„ํ‚คํ…์ฒ˜๋ฅผ ๋Œ€์ƒ์œผ๋กœ ์ œ์•ˆ๋˜์—ˆ๋‹ค. ๊ธฐ์กด์˜ ๋งค๋‹ˆ์ฝ”์–ด ์•„ํ‚คํ…์ฒ˜๋ฅผ ๋Œ€์ƒ์œผ๋กœ ํ•œ ์—ฐ๊ตฌ๋“ค์€ ์ฃผ๋กœ ์ƒ์œ„ ์ˆ˜์ค€์˜ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์„ฑ๋Šฅ์„ ์ธก์ •ํ•˜์˜€๊ธฐ ๋•Œ๋ฌธ์—, ์‹ค์ œ ์„ฑ๋Šฅ๊ณผ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ์„ฑ๋Šฅ์ด ์–ผ๋งˆ๋‚˜ ์ฐจ์ด๊ฐ€ ๋‚ ์ง€๋ฅผ ์ •ํ™•ํ•˜๊ฒŒ ์•Œ ์ˆ˜๊ฐ€ ์—†์—ˆ๋‹ค. ์ด๋Ÿฌํ•œ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์†Œํ”„ํŠธ์›จ์–ด ํ”Œ๋žซํผ๊ณผ, ๊ฐ€์ƒ ํ”„๋กœํ† ํƒ€์ดํ•‘ ์‹œ์Šคํ…œ ๋ฐ ์ œ์˜จ ์—๋ฎฌ๋ ˆ์ด์…˜ ์‹œ์Šคํ…œ์—์„œ์˜ ํ”Œ๋žซํผ ๊ตฌํ˜„ ๋ฐฉ๋ฒ•์ด ์ œ์•ˆ์ด ๋˜์—ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์‹ค์ œ ์‹œ์Šคํ…œ ๊ตฌํ˜„์„ ํ†ตํ•˜์—ฌ ์ œ์•ˆ๋œ ๋ณตํ•ฉ ์ž์› ๊ด€๋ฆฌ ๊ธฐ๋ฒ•์—์„œ์˜ ๋‹ค์–‘ํ•œ ๋™์  ๋น„์šฉ๋“ค์ด ์ •ํ™•ํ•˜๊ฒŒ ์ถ”์‚ฐ์ด ๋  ์ˆ˜ ์žˆ์—ˆ๋‹ค. ์‹คํ—˜์—์„œ๋Š” ์ œ์•ˆ๋œ ์†Œํ”„ํŠธ์›จ์–ด ๊ธฐ๋ฒ•์ด ํƒœ์Šคํฌ๋“ค์˜ ๋™์  ๋งคํ•‘๊ณผ ์ฒดํฌ-ํฌ์ธํŒ…์„ ํ†ตํ•œ ํ”„๋กœ์„ธ์„œ ์˜๊ตฌ ๊ณ ์žฅ์„ ํšจ๊ณผ์ ์œผ๋กœ ๊ฐ๋‚ดํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์˜€๋‹ค.Owing to the incessant technology improvement, the number of processors integrated into a single chip increases consistently, integrating more and more applications. Also, demand for higher computing capability for applications makes a many-core accelerator become an important computing resource in a system-on-chip. Efficient handling of the accelerator at run-time, however, is very challenging because the system status is subject to change dynamically by various factors. At the system level, the set of applications running concurrently may change according to user request. At the application level, the application behavior may change dynamically depending on input data or operation mode. At the architecture level, hardware resource availability may vary since hardware components may experience transient or permanent failures. In this thesis, to resolve the difficulties in handling many-core accelerator, three techniques are proposed. The first technique is the re-scheduling of the entire application to minimize throughput degradation under a latency constraint when a permanent processor failure occurs. Sub-optimal re-scheduling results using a genetic algorithm for each scenario of processor failures are obtained at compile-time. If a failure is detected at run-time, the live processors obtain the saved schedule, perform task transfer, and execute the remaining tasks of the current iteration. In this technique, preemptive and non-preemptive migration policies and a hybrid policy are proposed to obtain better performance. The viability of the proposed technique with real-life DSP applications as well as randomly generated graphs under timing constraints and random fault scenarios are shown through experiments. The second technique is a hybrid resource management scheme, expanded version of the first technique that also handles multi-applications specified as SDF graph and their relevant dynamisms such as application/task arrivals/ends as well as processor permanent failures. In the proposed technique, at design-time, throughput-maximized mappings of each SDF graph by varying the number of allocated processors are determined. Then, at run-time, the pre-computed mapping information is exploited to adjust the mapping of active applications to the processors without user intervention on the system status change. The proposed resource management is evaluated through intensive experiments with an in-house simulator built on top of Noxim, a Network-on-Chip simulator. Experimental results show the enhanced adaptability to dynamic system status change compared to other state-of-the-art approaches. Finally, the software platform for a homogeneous many-core architecture that implements the second technique is proposed to evaluate the system performance more accurately before SoC fabrication. Existing approaches usually use a high-level simulation model to estimate the performance without knowing how much actual performance will be deviated from the estimation. To overcome the limitation, the software platform is proposed and implementation details on a virtual prototyping system and on an emulation system realized with an Intel Xeon-Phi coprocessor are presented. Actual implementation enables us to investigate the overheads involved in the hybrid resource management technique in detail, which was not possible in high-level simulation. Experimental results confirm that the proposed software platform adapts to the dynamic workload variation effectively by dynamic mapping of tasks and tolerate unexpected core failures by check-pointing.Abstract i Contents iv List of Figures viii List of Tables xii Chapter 1 Introduction 1 1.1 Motivation . . . . . . . . . . . . 1 1.2 Contribution . . . . . . . . . . . . 5 1.3 Thesis Organization . . . . . . . . . . . 7 Chapter 2 Preliminaries 8 2.1 Application Model . . . . . . . . . . 8 2.2 Architecture Model . . . . . . . . . . 13 2.3 Fault Model . . . . . . . . . . . . 15 2.4 Thesis Overview . . . . . . . . . . . 15 Chapter 3 Fault-aware Task Mapping 17 3.1 Introduction . . . . . . . . . . . . 17 3.2 Related Work . . . . . . . . . . . . 20 3.2.1 Static Approach . . . . . . . . . . 21 3.2.2 Dynamic Approach . . . . . . . . . . 22 3.3 Proposed Task Remapping/Rescheduling Technique . . 23 3.3.1 Remapping Technique . . . . . . . . 23 3.3.2 Rescheduling Technique . . . . . . . . 31 3.4 Experiments . . . . . . . . . . . . . 38 3.4.1 Remapping Results . . . . . . . . 38 3.4.2 Rescheduling Results . . . . . . . . 46 Chapter 4 Fault-aware Resource Management 53 4.1 Introduction . . . . . . . . . . . . 53 4.2 Related Work . . . . . . . . . . . . 54 4.2.1 Static Approach . . . . . . . . . . 55 4.2.2 Dynamic Approach . . . . . . . . . 55 4.2.3 Hybrid Approach . . . . . . . . . . 57 4.2.4 Summary . . . . . . . . . . . . 57 4.3 Background . . . . . . . . . . . . . 58 4.3.1 Energy Model . . . . . . . . . . . 59 4.3.2 Notation . . . . . . . . . . . . 60 4.4 Proposed Resource Management Technique . . . . 61 4.4.1 Motivational Example . . . . . . . . . 61 4.4.2 Overall Procedure . . . . . . . . . . 65 4.4.3 Design-time Analysis . . . . . . . . . 66 4.4.4 Run-time Mapping . . . . . . . . . . 67 4.5 Experiments . . . . . . . . . . . . . 74 4.5.1 Setup . . . . . . . . . . . . . . 74 4.5.2 Analysis of Run-time Overheads . . . . . . 75 4.5.3 Comparison with Other Approaches . . . . 79 Chapter 5 Software Platform for Resource Management 86 5.1 Introduction . . . . . . . . . . . . 86 5.2 Related Work . . . . . . . . . . . . 87 5.3 Overall Structure . . . . . . . . . . . . 88 5.4 Components of Software Platform . . . . . . 89 5.4.1 Application API Layer . . . . . . . . . 89 5.4.2 Communication Interface Module . . . . . 92 5.4.3 Host Interface Layer . . . . . . . . . 93 5.4.4 Memory Management Module . . . . . . 94 5.4.5 Design-time Analysis . . . . . . . . . 94 5.4.6 Slave Manager . . . . . . . . . . . 98 5.5 Software Platform Implementation . . . . . . 99 5.5.1 Scheduling Information . . . . . . . . 100 5.5.2 Function Migration and Execution . . . . . 101 5.5.3 Function Migration and Execution . . . . . 102 5.6 Virtual Prototyping System . . . . . . . . 105 5.7 Xeon Emulation System . . . . . . . . . 106 5.8 Experiments . . . . . . . . . . . . . 107 5.8.1 Setup . . . . . . . . . . . . . . 107 5.8.2 Experiments on the Virtual Prototyping System . . 108 5.8.3 Experiments on the Xeon Emulation System . . . 111 Chapter 6 Conclusion 116 Bibliography 119 Abstract in Korean 130Docto

    Improving Model-Based Software Synthesis: A Focus on Mathematical Structures

    Get PDF
    Computer hardware keeps increasing in complexity. Software design needs to keep up with this. The right models and abstractions empower developers to leverage the novelties of modern hardware. This thesis deals primarily with Models of Computation, as a basis for software design, in a family of methods called software synthesis. We focus on Kahn Process Networks and data๏ฌ‚ow applications as abstractions, both for programming and for deriving an e๏ฌƒcient execution on heterogeneous multicores. The latter we accomplish by exploring the design space of possible mappings of computation and data to hardware resources. Mapping algorithms are not at the center of this thesis, however. Instead, we examine the mathematical structure of the mapping space, leveraging its inherent symmetries or geometric properties to improve mapping methods in general. This thesis thoroughly explores the process of model-based design, aiming to go beyond the more established software synthesis on data๏ฌ‚ow applications. We starting with the problem of assessing these methods through benchmarking, and go on to formally examine the general goals of benchmarks. In this context, we also consider the role modern machine learning methods play in benchmarking. We explore different established semantics, stretching the limits of Kahn Process Networks. We also discuss novel models, like Reactors, which are designed to be a deterministic, adaptive model with time as a ๏ฌrst-class citizen. By investigating abstractions and transformations in the Ohua language for implicit data๏ฌ‚ow programming, we also focus on programmability. The focus of the thesis is in the models and methods, but we evaluate them in diverse use-cases, generally centered around Cyber-Physical Systems. These include the 5G telecommunication standard, automotive and signal processing domains. We even go beyond embedded systems and discuss use-cases in GPU programming and microservice-based architectures
    • โ€ฆ
    corecore