12 research outputs found

    StreamDrive: A Dynamic Dataflow Framework for Clustered Embedded Architectures

    In this paper, we present StreamDrive, a dynamic dataflow framework for programming clustered embedded multicore architectures. StreamDrive simplifies the development of dynamic dataflow applications starting from sequential reference C code and allows applications to seamlessly use heterogeneous and application-specific processing elements. We address issues of efficient implementation of the dynamic dataflow runtime system in the context of constrained embedded environments, which have not been sufficiently addressed by previous research. We conducted a detailed performance evaluation of the StreamDrive implementation on our Application Specific MultiProcessor (ASMP) cluster using the Oriented FAST and Rotated BRIEF (ORB) algorithm, which is typical of the image processing domain. We used the proposed incremental development flow to transform the original ORB reference C code into an optimized dynamic dataflow implementation. Our implementation has less than 10% parallelization overhead and near-linear speedup as the number of processors increases from 1 to 8; it achieves 15 VGA frames per second with a small cluster configuration of 4 processing elements and 64KB of shared memory, and 30 VGA frames per second with 8 processors and 128KB of shared memory.
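A dynamic dataflow runtime must decide at run time which actor to fire, because token rates may depend on data values. The Python sketch below shows a minimal single-threaded version of that idea: actors communicate only through FIFO queues and fire whenever their firing rule is satisfied. The names `Actor` and `run` and the even-number filter are hypothetical illustrations, not StreamDrive's API; a real embedded runtime would additionally manage shared-memory buffers and dispatch actors to heterogeneous processing elements.

```python
from collections import deque
from typing import Callable, List

class Actor:
    """A dataflow actor: `can_fire` inspects its input FIFOs, `fire` consumes and produces tokens."""
    def __init__(self, name: str, can_fire: Callable[[], bool], fire: Callable[[], None]):
        self.name = name
        self.can_fire = can_fire
        self.fire = fire

def run(actors: List[Actor], max_rounds: int = 1000) -> int:
    """Round-robin dynamic scheduling: fire enabled actors until no actor
    can fire (quiescence). Returns the total number of firings."""
    firings = 0
    for _ in range(max_rounds):
        progressed = False
        for actor in actors:
            # Firing rules are checked at run time because rates may be data dependent.
            while actor.can_fire():
                actor.fire()
                firings += 1
                progressed = True
        if not progressed:
            break
    return firings

# Hypothetical pipeline: source tokens -> filter (forwards only even values) -> sink.
src_q: deque = deque(range(10))  # pre-filled source tokens
mid_q: deque = deque()
out: List[int] = []

def filter_fire() -> None:
    x = src_q.popleft()
    if x % 2 == 0:  # production rate (0 or 1 token) depends on the token value
        mid_q.append(x)

def sink_fire() -> None:
    out.append(mid_q.popleft())

actors = [
    Actor("filter", lambda: len(src_q) >= 1, filter_fire),
    Actor("sink", lambda: len(mid_q) >= 1, sink_fire),
]
n = run(actors)
print(out, n)  # [0, 2, 4, 6, 8] 15
```

The filter fires ten times but produces only five tokens, so a static schedule with fixed rates could not describe it; the run-time check of `can_fire` is what makes the scheduler dynamic.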

    A Dataflow Framework for Developing Flexible Embedded Accelerators: A Computer Vision Case Study

    The focus of this dissertation is the design and implementation of a computing platform that can accelerate data processing in the embedded computation domain. We focus on a heterogeneous computing platform whose hardware implementation can approach the power and area efficiency of specialized designs while remaining flexible across the application domain. Multi-core architectures require parallel programming, which is widely regarded as more challenging than sequential programming. Although shared-memory parallel programs may be fairly easy to write (using OpenMP, for example), they are quite hard to optimize; providing embedded application developers with optimizing tools and programming frameworks is a challenge. Heterogeneous specialized elements make the problem even more difficult. Dataflow is a parallel computation model that relies exclusively on message passing and that has some advantages over the parallel programming tools in wide use today: simplicity, graphical representation, and determinism. The dataflow model is also a good match for streaming applications, such as audio, video, and image processing, which operate on large sequences of data and are characterized by abundant parallelism and regular memory access patterns. The dataflow model of computation has gained acceptance in the simulation and signal-processing communities. This thesis evaluates the applicability of the dataflow model for implementing domain-specific embedded accelerators for streaming applications.
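The determinism mentioned above can be illustrated with a small experiment: when actors communicate only through FIFO channels and wait for input tokens (Kahn-style semantics), the output stream does not depend on the order in which enabled actors are fired. The Python sketch below runs the same two-actor pipeline under many random schedules and checks that the output is always identical. The pipeline itself (a doubling actor feeding a running-sum actor) is an arbitrary example chosen for illustration.

```python
import random
from collections import deque
from typing import List

def run_pipeline(inputs: List[int], seed: int) -> List[int]:
    """Fire enabled actors in a random order; return the sink's output stream."""
    rng = random.Random(seed)
    a_in, b_in = deque(inputs), deque()
    out: List[int] = []
    acc = 0  # internal state of the running-sum actor
    while a_in or b_in:
        enabled = [name for name, ok in (("double", bool(a_in)), ("sum", bool(b_in))) if ok]
        choice = rng.choice(enabled)  # arbitrary scheduling decision
        if choice == "double":
            b_in.append(2 * a_in.popleft())  # consume 1 token, produce 1 token
        else:
            acc += b_in.popleft()
            out.append(acc)
    return out

reference = run_pipeline([1, 2, 3, 4], seed=0)
# Every schedule produces the same output stream.
assert all(run_pipeline([1, 2, 3, 4], seed=s) == reference for s in range(100))
print(reference)  # [2, 6, 12, 20]
```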

    Dynamic Behavior Specification and Design Space Exploration Techniques for Real-Time Embedded Systems

    Doctoral dissertation, Graduate School of Seoul National University, Department of Electrical and Computer Engineering, August 2016. Advisor: Soonhoi Ha.
As the number of processors in a chip increases and more functions are integrated, the system status changes dynamically due to various factors such as workload variation, QoS requirements, and unexpected component failures. At the same time, the computational complexity of user applications is steadily increasing; video and graphics applications are the two major driving forces in smart mobile devices, which define the main application domain of interest in this dissertation.
A systematic design methodology is therefore needed to implement such complex systems, which combine dynamically changing behavior with computation-intensive workloads that can be parallelized. A model-based approach is one of the representative approaches to parallel embedded software development. In particular, the HOPES framework has been proposed as a design environment for parallel embedded software that supports the overall design flow: system specification, performance estimation, design space exploration, and automatic code generation. Unlike other design environments, it introduces a novel programming-platform concept, called CIC (Common Intermediate Code), which can be understood as a generic execution model of heterogeneous multiprocessor architectures. The CIC task model is based on a process network model, but it can be refined to the SDF (Synchronous Data Flow) model, which offers very desirable properties for static analyzability as well as parallel processing. However, the SDF model has a well-known weakness in expressive capability, especially for system-level specification and for the dynamically changing behavior of an application. To overcome this weakness, this dissertation proposes an extended CIC task model based on dataflow and FSM models that specifies the dynamic behavior of the system, distinguishing inter- and intra-application dynamism. At the top level, each application is specified as a dataflow task, and the dynamic behavior is modeled as a control task that supervises the execution of applications. Inside a dataflow task, dynamic behavior is specified in a way similar to FSM-based SADF: an SDF task may have multiple behaviors, and a tabular specification of an FSM, called MTM (Mode Transition Machine), describes the mode transition rules for the SDF graph. We call this the MTM-SDF model, which the dissertation classifies as a multi-mode dataflow model.
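The static analyzability of SDF rests on balance equations: for every edge from actor u to actor v with production rate p and consumption rate c, a consistent graph admits a repetition vector q with q[u]·p = q[v]·c. The Python sketch below computes q for each mode of a hypothetical two-mode application and encodes a simplified mode transition table in the spirit of the MTM. The mode names, rates, and control values are invented for illustration and are not taken from the dissertation; the sketch assumes each mode's SDF graph is connected.

```python
from fractions import Fraction
from math import gcd, lcm
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int, int]  # (source, sink, tokens produced, tokens consumed)

def repetition_vector(edges: List[Edge]) -> Dict[str, int]:
    """Smallest positive integer q with q[src] * prod == q[dst] * cons on every edge.
    Assumes the SDF graph is connected."""
    rate: Dict[str, Fraction] = {edges[0][0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates until a fixpoint
        changed = False
        for src, dst, p, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    for src, dst, p, c in edges:  # every balance equation must hold
        if rate[src] * p != rate[dst] * c:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")
    scale = lcm(*(r.denominator for r in rate.values()))
    q = {actor: int(r * scale) for actor, r in rate.items()}
    g = gcd(*q.values())
    return {actor: n // g for actor, n in q.items()}

# Hypothetical two-mode application: each mode is its own SDF graph.
MODES: Dict[str, List[Edge]] = {
    "low_res": [("decode", "filter", 1, 1), ("filter", "display", 1, 1)],
    "high_res": [("decode", "filter", 4, 1), ("filter", "display", 1, 4)],
}

# MTM-style table (hypothetical): (current mode, control value) -> next mode.
MTM: Dict[Tuple[str, str], str] = {
    ("low_res", "upgrade"): "high_res",
    ("high_res", "downgrade"): "low_res",
}

def next_mode(mode: str, control: str) -> str:
    """Look up the transition; unmatched control values keep the current mode."""
    return MTM.get((mode, control), mode)

q = {m: repetition_vector(e) for m, e in MODES.items()}
print(q["high_res"])                     # {'decode': 1, 'filter': 4, 'display': 1}
print(next_mode("low_res", "upgrade"))   # high_res
```

Because each mode's repetition vector is fixed, a compile-time schedule can be computed per mode and stored, which is the property the multi-mode approach relies on.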
The MTM-SDF model assumes that an application has a finite number of behaviors (or modes) and that each behavior (mode) is represented by an SDF graph. This enables compile-time scheduling of each graph to maximize throughput for varying numbers of allocated processors, and the scheduling information can be stored. A multiprocessor scheduling technique is also proposed for multi-mode dataflow graphs. While several scheduling techniques exist for multi-mode dataflow models, none of them allows task migration between modes. Observing that the resource requirement can be further reduced if task migration is allowed, we propose a multiprocessor scheduling technique for multi-mode dataflow graphs that considers task migration between modes. Based on a genetic algorithm, the proposed technique schedules the SDF graphs of all modes simultaneously to minimize the resource requirement. To satisfy the throughput constraint, it calculates the actual throughput requirement of each mode and the output buffer size needed to tolerate throughput jitter. From the specified task graph and scheduling results, the CIC translator generates parallelized code for the target architecture; to this end, the CIC translator is extended to support the new features of the CIC task model. At the application level, it is extended to support multiprocessor code generation for an MTM-SDF graph that follows the given static scheduling results, and code generation is supported for four different scheduling policies: fully-static, self-timed, static-assignment, and fully-dynamic. At the system level, the CIC translator is extended to generate code implementing the system request APIs, as well as data structures for the static scheduling results and configurable task parameters.
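As a rough illustration of why the output buffer size and the per-mode throughput requirement are linked, the Python sketch below uses a deliberately simplified fluid model. This model is an assumption made here, not the arrival-curve analysis of the dissertation: the producer stalls for a fixed delay during each mode transition, the consumer drains one frame per period, and transitions are at least a fixed window apart. The numbers (30 fps, 50 ms, 2 s) are hypothetical.

```python
import math
from fractions import Fraction

def min_buffer_frames(transition_delay: Fraction, consume_period: Fraction) -> int:
    """Frames that must be buffered before a mode transition so that a consumer
    reading one frame every `consume_period` seconds never starves while the
    producer is stalled for `transition_delay` seconds (simplified model)."""
    return math.ceil(transition_delay / consume_period)

def required_throughput(consume_period: Fraction, buffer_frames: int,
                        refill_window: Fraction) -> Fraction:
    """Production rate (frames/s) needed to keep pace with the consumer and also
    refill `buffer_frames` frames within `refill_window` seconds after a transition."""
    return 1 / consume_period + buffer_frames / refill_window

period = Fraction(1, 30)    # consumer displays 30 frames per second
delay = Fraction(50, 1000)  # assumed 50 ms stall during a mode transition
window = Fraction(2)        # assumed minimum of 2 s between transitions

B = min_buffer_frames(delay, period)        # ceil(1.5) = 2 frames
R = required_throughput(period, B, window)  # 30 + 2/2 = 31 frames/s
print(B, R)  # 2 31
```

Exact `Fraction` arithmetic avoids rounding surprises in `math.ceil` (for example, a float ratio of 3.0000000000000004 would wrongly round up to 4).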
Through preliminary experiments with a multi-mode multimedia terminal example, the viability of the proposed methodology is verified.
Contents:
Chapter 1 Introduction: 1.1 Motivation; 1.2 Contribution; 1.3 Dissertation organization
Chapter 2 Background: 2.1 Related work (2.1.1 Compiler-based approach; 2.1.2 Language-based approach; 2.1.3 Model-based approach); 2.2 HOPES framework; 2.3 Common Intermediate Code (CIC) Model
Chapter 3 Dynamic Behavior Specification: 3.1 Problem definition (3.1.1 System-level dynamic behavior; 3.1.2 Application-level dynamic behavior); 3.2 Related work; 3.3 Motivational example; 3.4 Control task specification for system-level dynamism (3.4.1 Internal specification; 3.4.2 Action scripts); 3.5 MTM-SDF specification for application-level dynamism (3.5.1 MTM specification; 3.5.2 Task graph specification; 3.5.3 Execution semantic of an MTM-SDF graph)
Chapter 4 Multiprocessor Scheduling of a Multi-mode Dataflow Graph: 4.1 Related work; 4.2 Motivational example (4.2.1 Throughput requirement calculation considering mode transition delay; 4.2.2 Task migration between mode transition); 4.3 Problem definition; 4.4 Throughput requirement analysis (4.4.1 Mode transition delay; 4.4.2 Arrival curves of the output buffer; 4.4.3 Buffer size determination; 4.4.4 Throughput requirement analysis); 4.5 Proposed MMDF scheduling framework (4.5.1 Optimization problem; 4.5.2 GA configuration; 4.5.3 Fitness function; 4.5.4 Local optimization technique); 4.6 Experimental results (4.6.1 MMDF scheduling technique; 4.6.2 Scalability of the Proposed Framework)
Chapter 5 Multiprocessor Code Generation for the Extended CIC Model: 5.1 CIC translator; 5.2 Code generation for application-level dynamism (5.2.1 Function call-style code generation: fully-static, self-timed; 5.2.2 Thread-style code generation: static-assignment, fully-dynamic); 5.3 Code generation for system-level dynamism; 5.4 Experimental results
Chapter 6 Conclusion and Future Work
Bibliography
Abstract (in Korean)

    Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks

    Technology scaling, accompanied by higher operating frequencies and the ability to integrate more functionality in the same chip, has been the driving force behind delivering higher-performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as Multiprocessor System-on-Chip). However, these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. First, the growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Second, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors and as the probability of defects escaping post-manufacturing testing increases. In this thesis, we take on these challenges within the context of streaming applications running in network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote memory access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping and (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at the system level, in particular by exploiting redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design-time adoption) as well as heuristic-based solutions to be used at run-time. We assess the impact of our approach on lifetime reliability.
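The run-time remapping problem can be pictured with a simple greedy heuristic: when a processing element (PE) fails, each of its tasks is moved, heaviest first, to the currently least-loaded healthy PE. This Python sketch is a generic load-balancing heuristic chosen for illustration, not the heuristic proposed in the thesis; the task names, loads, and PE identifiers are hypothetical, and communication costs over the network-on-chip are ignored.

```python
import heapq
from typing import Dict, List

def remap_on_fault(mapping: Dict[str, int], load: Dict[str, float],
                   faulty: int, pes: List[int]) -> Dict[str, int]:
    """Return a new task -> PE mapping with every task moved off `faulty`."""
    healthy = [pe for pe in pes if pe != faulty]
    if not healthy:
        raise RuntimeError("no healthy processing element left")
    # Current utilisation of each healthy PE.
    util = {pe: 0.0 for pe in healthy}
    for task, pe in mapping.items():
        if pe != faulty:
            util[pe] += load[task]
    heap = [(u, pe) for pe, u in util.items()]
    heapq.heapify(heap)
    new_mapping = dict(mapping)
    orphans = sorted((t for t, pe in mapping.items() if pe == faulty),
                     key=lambda t: load[t], reverse=True)
    for task in orphans:                    # heaviest orphaned task first
        u, pe = heapq.heappop(heap)         # least-loaded healthy PE
        new_mapping[task] = pe
        heapq.heappush(heap, (u + load[task], pe))
    return new_mapping

mapping = {"src": 0, "dct": 1, "quant": 1, "vlc": 2}
load = {"src": 0.2, "dct": 0.5, "quant": 0.3, "vlc": 0.4}
new = remap_on_fault(mapping, load, faulty=1, pes=[0, 1, 2])
print(new)  # {'src': 0, 'dct': 0, 'quant': 2, 'vlc': 2}
```

A heuristic of this kind runs in O(n log n) time for n orphaned tasks, which is the sort of cost that makes it usable at run time, whereas an exact ILP formulation is more suitable offline.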
We propose two recovery schemes, based on checkpoint-and-rollback and on a rollforward technique. For the second problem (application-level self-adaptation), we propose two variants of a monitor-controller-adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that this can be done without incurring large overheads. In addressing these problems, we present techniques that have been realized (depending on their characteristics) in the form of a design tool, a run-time library, or a hardware core to be added to the basic architecture.
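A monitor-controller-adapter loop can be sketched as a feedback controller: the monitor measures throughput, the controller compares it with a goal, and the adapter changes an application-level quality parameter. In the Python sketch below, the plant model (10 ms of processing per quality level per frame), the hysteresis rule, and all names are assumptions made for illustration; the thesis's two variants are not reproduced here.

```python
from typing import List

def frame_time_ms(quality: int) -> int:
    """Hypothetical plant model: each quality level costs 10 ms per frame."""
    return 10 * quality

def adapt(quality: int, goal_fps: float, steps: int,
          q_min: int = 1, q_max: int = 10, margin: float = 1.2) -> List[int]:
    """Run the monitor -> controller -> adapter loop for `steps` periods and
    return the quality level chosen after each period."""
    history: List[int] = []
    for _ in range(steps):
        fps = 1000 / frame_time_ms(quality)                # monitor: measured throughput
        if fps < goal_fps and quality > q_min:             # controller: goal missed
            quality -= 1                                   # adapter: lower quality
        elif fps > goal_fps * margin and quality < q_max:  # clear spare capacity
            quality += 1                                   # adapter: raise quality
        history.append(quality)
    return history

trace = adapt(quality=10, goal_fps=25.0, steps=8)
print(trace)  # [9, 8, 7, 6, 5, 4, 4, 4]
```

The `margin` band (hysteresis) keeps the loop from oscillating between two quality levels when the measured throughput sits close to the goal.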

    Integrated support for Adaptivity and Fault-tolerance in MPSoCs

    The technology improvement and the adoption of more and more complex applications in consumer electronics are forcing a rapid increase in the complexity of multiprocessor systems on chip (MPSoCs). Following this trend, MPSoCs are becoming increasingly dynamic and adaptive, for several reasons. One of these is that applications are becoming intrinsically dynamic. Another reason is that the workload on emerging MPSoCs cannot be predicted, because modern systems are open to new incoming applications at run-time. A third reason that calls for adaptivity is the decreasing component reliability associated with technology scaling. Components below the 32-nm node are more prone to temporary or even permanent faults. In case of a malfunctioning system component, the rest of the system is supposed to take over its tasks. Thus, the system adaptivity goal influences several design decisions, listed below: 1) The applications should be specified such that system adaptivity can be easily supported. To this end, we consider Polyhedral Process Networks (PPNs) as the model of computation to specify applications. PPNs are composed of concurrent and autonomous processes that communicate with each other using bounded FIFO channels. Moreover, in PPNs both the control and the memories are completely distributed. This is a good match for the emerging MPSoC architectures, in which processing elements and memories are usually distributed. Most importantly, the simple operational semantics of PPNs allows for easy adoption of system adaptivity mechanisms. 2) The hardware platform should guarantee the flexibility that adaptivity mechanisms require. Networks-on-Chip (NoCs) are emerging communication infrastructures for MPSoCs that, among many other advantages, allow for system adaptivity. This is because NoCs are generic: the same platform can be used to run different applications, or to run the same application with different mappings of processes.
However, there is a mismatch between the generic structure of NoCs and the semantics of the PPN model. Therefore, in this thesis we investigate and propose several communication approaches to overcome this mismatch. 3) The system must be able to change the process mapping at run-time, using process migration. To this end, a process migration mechanism has been proposed and evaluated. This mechanism takes into account specific requirements of the embedded domain, such as predictability and efficiency. To address graceful degradation of the system, we enriched the MADNESS NoC platform with fault tolerance support at both the software and hardware levels. The proposed process migration mechanism can be exploited to cope with permanent faults by migrating the processes running on the faulty processing element. A fast heuristic is used to determine the new mapping of processes to tiles. The experimental results show that the execution-time overhead of the remapping heuristic, together with the actual process migration, is almost negligible compared to the execution time of the whole application. This means that the proposed approach allows the system to change its performance metrics and to react to faults without a substantial impact on the user experience.
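The PPN execution semantics summarized above (autonomous processes, distributed control, bounded FIFO channels with blocking reads and writes) map naturally onto threads and bounded queues. The Python sketch below uses the standard library's `queue.Queue(maxsize=...)` as the bounded FIFO; the producer and consumer processes and the end-of-stream sentinel are illustrative assumptions, not part of the PPN model or of the MADNESS platform.

```python
import queue
import threading
from typing import List

END = None  # end-of-stream marker (an illustrative convention, not part of PPNs)

def producer(out_ch: queue.Queue, n: int) -> None:
    for i in range(n):
        out_ch.put(i * i)  # blocks while the bounded FIFO is full
    out_ch.put(END)

def consumer(in_ch: queue.Queue, result: List[int]) -> None:
    while True:
        token = in_ch.get()  # blocks while the FIFO is empty
        if token is END:
            break
        result.append(token + 1)

channel: queue.Queue = queue.Queue(maxsize=2)  # FIFO capacity of 2 tokens
result: List[int] = []
threads = [threading.Thread(target=producer, args=(channel, 5)),
           threading.Thread(target=consumer, args=(channel, result))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result)  # [1, 2, 5, 10, 17]
```

Because each process only blocks on its own channels and there is no shared scheduler, the result is the same however the threads interleave, which is part of what makes such networks easy to remap.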

    Classification of Dataflow Actors with Satisfiability and Abstract Interpretation

    Dataflow programming has been used to describe signal processing applications for many years, traditionally with cyclo-static dataflow (CSDF) or synchronous dataflow (SDF) models that restrict expressive power in favor of compile-time analysis and predictability. More recently, dynamic dataflow has been used to describe multimedia video standards, as promoted by the RVC standard (ISO/IEC 23001-4). Dynamic dataflow is not restricted in expressive power, but it does require run-time scheduling in the general case, which may be costly to perform in software. In a previous paper, the authors presented a method to automatically classify actors of a dynamic dataflow program into more restrictive dataflow models when possible, along with a method to transform the actors classified as static to improve execution speed by reducing the number of FIFO accesses (Wipliez & Raulet, 2010). This paper presents an extension of the classification method using satisfiability solving, and details the precise semantics used for the abstract interpretation of actors. The extended classification is able to classify more actors than could previously be achieved.
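The basic idea behind classifying a dynamic actor as static can be shown with a much simpler test than the satisfiability- and abstract-interpretation-based method the paper describes: if every action of an actor consumes and produces the same number of tokens on each port, the actor behaves like an SDF actor regardless of which action fires (assuming that some action is always enabled once enough input tokens are present). The Python sketch below implements only this rate check; the `Action` record and the port names are hypothetical, and the sketch does not attempt to detect cyclo-static behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Action:
    name: str
    consumes: Dict[str, int] = field(default_factory=dict)  # port -> tokens read
    produces: Dict[str, int] = field(default_factory=dict)  # port -> tokens written

def classify(actions: List[Action]) -> str:
    """Return 'static' if all actions share identical port rates, else 'dynamic'."""
    if not actions:
        raise ValueError("an actor needs at least one action")
    first = actions[0]
    same = all(a.consumes == first.consumes and a.produces == first.produces
               for a in actions[1:])
    return "static" if same else "dynamic"

# Two actions with identical rates: schedulable like an SDF actor.
clip = [Action("pos", {"IN": 1}, {"OUT": 1}), Action("neg", {"IN": 1}, {"OUT": 1})]
# Rates depend on which action fires: needs run-time scheduling in general.
parser = [Action("header", {"BITS": 8}, {"MODE": 1}),
          Action("payload", {"BITS": 64}, {"DATA": 8})]
print(classify(clip), classify(parser))  # static dynamic
```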

    Parallelizing dynamic sequential programs using polyhedral process networks

    The Polyhedral Process Network (PPN) is a suitable parallel model of computation (MoC) for specifying embedded streaming applications in a parallel form that facilitates efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a very difficult and highly error-prone task. To overcome these difficulties, we have developed the pn compiler, which derives PPN specifications from sequential static affine nested loop programs (SANLPs). However, many applications have adaptive and dynamic behavior that cannot be expressed as SANLPs. To handle such dynamic applications, this dissertation addresses an important question: whether some of the static restrictions of SANLPs can be relaxed while keeping the ability to perform compile-time analysis and to derive PPNs in an automated way. Achieving this would significantly extend the range of applications that can be parallelized automatically. By studying different dynamic applications, we identified three relaxations of SANLPs that allow dynamic applications to be specified as sequential programs: dynamic if-conditions, for-loops with dynamic bounds, and while-loops. The first relaxation has already been considered; this dissertation addresses the other two, more difficult, relaxations.
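The distinction between static affine nested loop programs and the relaxations studied here can be seen in two small loop nests. In the affine nest, loop bounds are affine functions of enclosing iterators and parameters, so the iteration domain is a polyhedron whose size is known from the bounds alone (here N(N+1)/2 points); in the while-loop, the trip count depends on run-time data, which is what makes automated PPN derivation harder. The Python sketch below only illustrates this difference; the pn compiler's actual input language and analysis are not modeled here.

```python
from typing import List, Tuple

def affine_domain(n: int) -> List[Tuple[int, int]]:
    """Static affine nest: for i in [0, n), for j in [i, n). Bounds are affine in i and n."""
    return [(i, j) for i in range(n) for j in range(i, n)]

def dynamic_trip_count(data: List[int], threshold: int) -> int:
    """While-loop whose trip count depends on the values in `data` (not affine)."""
    total, k = 0, 0
    while k < len(data) and total + data[k] <= threshold:
        total += data[k]
        k += 1
    return k

N = 6
dom = affine_domain(N)
# The size of the affine domain follows from the loop bounds alone: N(N+1)/2.
assert len(dom) == N * (N + 1) // 2
# The while-loop's trip count can only be known by inspecting the data.
print(len(dom), dynamic_trip_count([3, 1, 4, 1, 5], threshold=8))  # 21 3
```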

    Synchronization of tasks in multiprocessor systems-on-chip

    Integrated master's thesis. Electrical and Computer Engineering. Faculty of Engineering, University of Porto. 201