    Heterogeneity-aware scheduling and data partitioning for system performance acceleration

    Over the past decade, heterogeneous processors and accelerators have become increasingly prevalent in modern computing systems. Compared with earlier homogeneous parallel machines, the hardware heterogeneity in modern systems provides new opportunities and challenges for performance acceleration. Classic operating-system optimisation problems such as task scheduling, and application-specific optimisation techniques such as the adaptive data partitioning of parallel algorithms, must work together to address hardware heterogeneity. Significant effort has been invested in this problem, but existing work either focuses on a specific type of heterogeneous system or algorithm, or provides a high-level framework without insight into how heterogeneity differs between types of system. What is required is a general software framework that can not only be adapted to multiple types of systems and workloads, but is also equipped with techniques to address a variety of hardware heterogeneity. This thesis presents approaches to designing general heterogeneity-aware software frameworks for system performance acceleration. It covers a wide variety of systems, including an OS scheduler targeting on-chip asymmetric multi-core processors (AMPs) on mobile devices, a hierarchical many-core supercomputer, and multi-FPGA systems for high performance computing (HPC) centers. Considering sources of heterogeneity in on-chip AMPs, such as thread criticality, core sensitivity, and relative fairness, it proposes a collaborative approach that co-designs the task selector and the core allocator of the OS scheduler. Considering the typical sources of heterogeneity in HPC systems, such as the memory hierarchy, bandwidth limitations, and asymmetric physical connections, it proposes an application-specific automatic data partitioning method for a modern supercomputer, and a schedule based on a topological-ranking heuristic for a multi-FPGA reconfigurable cluster.
Experiments on both a full-system simulator (GEM5) and real systems (the Sunway TaihuLight supercomputer and Xilinx multi-FPGA clusters) demonstrate significant advantages of the proposed approaches over the state of the art on a variety of workloads.
"This work is supported by St Leonards 7th Century Scholarship and Computer Science PhD funding from University of St Andrews; by UK EPSRC grant Discovery: Pattern Discovery and Program Shaping for Manycore Systems (EP/P020631/1)." -- Acknowledgement
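The following is a simple illustration of heterogeneity-aware data partitioning. It splits a one-dimensional workload across processing units in proportion to their throughputs, so that faster units receive larger chunks. This is not the thesis's Sunway partitioning method, which also accounts for the memory hierarchy and bandwidth limits. The function name `partition_by_throughput` is hypothetical, and the sketch assumes that per-unit throughputs have already been measured.

```python
from typing import List


def partition_by_throughput(n_items: int, throughputs: List[float]) -> List[int]:
    """Split n_items into chunk sizes proportional to each unit's throughput.

    Uses the largest-remainder method so the sizes always sum to n_items.
    The throughputs are assumed (hypothetically) to be pre-measured rates.
    """
    if n_items < 0 or not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("need n_items >= 0 and positive throughputs")
    total = sum(throughputs)
    ideal = [n_items * t / total for t in throughputs]  # exact fractional shares
    sizes = [int(x) for x in ideal]                      # floor of each share
    leftover = n_items - sum(sizes)
    # Hand the remaining items to the units with the largest fractional parts.
    order = sorted(range(len(ideal)), key=lambda i: ideal[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


# A unit three times faster than its peer gets three times as many items.
print(partition_by_throughput(1000, [3.0, 1.0]))  # [750, 250]
```

A static proportional split like this ignores communication and memory effects. This is exactly the gap that an application-specific method for a machine with a deep memory hierarchy would need to close.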

    Adaptive Dispatching of Tasks in the Cloud

    The increasingly wide application of Cloud Computing enables the consolidation of tens of thousands of applications in shared infrastructures. Meeting the quality of service (QoS) requirements of so many diverse applications in such shared resource environments has therefore become a real challenge, especially since the characteristics and workloads of applications differ widely and may change over time. This paper presents an experimental system that can exploit a variety of online, QoS-aware adaptive task allocation schemes, and three such schemes are designed and compared: first, a measurement-driven algorithm that uses reinforcement learning; second, a "sensible" allocation algorithm that assigns jobs to sub-systems observed to provide a lower response time; and third, an algorithm that splits the job arrival stream into sub-streams at rates computed from the hosts' processing capabilities. All of these schemes are compared via measurements among themselves and with a simple round-robin scheduler, on two experimental test-beds with homogeneous and heterogeneous hosts having different processing capacities. Comment: 10 pages, 9 figures
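The "sensible" allocation scheme can be sketched as probabilistic routing. Under this model, a host's chance of receiving the next job is inversely proportional to its recently observed response time. The smoothing factor `alpha`, the initial response-time estimate, and the exact 1/R weighting are assumptions for illustration, not the paper's parameters. The class `SensibleDispatcher` is hypothetical.

```python
import random
from typing import List, Optional


class SensibleDispatcher:
    """Hypothetical dispatcher: P(host i) is proportional to 1 / R_i.

    R_i is an exponentially smoothed estimate of host i's observed response time.
    """

    def __init__(self, n_hosts: int, alpha: float = 0.2, initial_rt: float = 1.0,
                 rng: Optional[random.Random] = None):
        self.alpha = alpha                    # weight given to the newest measurement
        self.rt = [initial_rt] * n_hosts      # smoothed response time per host
        self.rng = rng or random.Random()

    def probabilities(self) -> List[float]:
        # Routing probabilities: normalized inverse response times.
        inv = [1.0 / r for r in self.rt]
        s = sum(inv)
        return [x / s for x in inv]

    def choose(self) -> int:
        # Draw the host for the next job according to the current probabilities.
        return self.rng.choices(range(len(self.rt)), weights=self.probabilities(), k=1)[0]

    def observe(self, host: int, response_time: float) -> None:
        # Exponential moving average of the measured response time.
        self.rt[host] = (1 - self.alpha) * self.rt[host] + self.alpha * response_time


d = SensibleDispatcher(2, rng=random.Random(0))
d.observe(0, 1.0)   # host 0 answered quickly
d.observe(1, 4.0)   # host 1 was slow, so its share of future jobs drops
print(d.probabilities())
```

Unlike round-robin, this scheme keeps adapting as measurements arrive. As a result, a host that slows down under load automatically receives a smaller share of new jobs.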

    Service-Based and Model-Based Software Development Methodology for Collaborative Robots

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2020. Advisor: Soonhoi Ha. In the near future, it will be common for a variety of robots to cooperate to perform a mission in various fields. There are two software challenges when deploying collaborative robots: how to specify a cooperative mission, and how to program each robot to accomplish its mission. In this dissertation, we propose a novel software development framework, called the service-oriented and model-based (SeMo) framework, that separates mission specification from robot behavior programming. It supports distributed robot systems, swarm robots, and hybrids of the two. For mission specification, we propose a novel scripting language whose expressive power covers team composition and service-oriented behavior specification for each team, and that allows dynamic mode changes and multi-tasking.
Robots are grouped into teams, and the behavior of each team is defined by a composite service. The internal behavior of a composite service is defined by a sequence of services that the robots will perform. The notion of a plan is used to express multi-tasking. Because a robot may have several operating modes, mode changes are triggered by events generated within a composite service. Moreover, to improve the robustness, scalability, and flexibility of robot collaboration, the high-level mission scripting language is extended with new features such as team hierarchy, group services, and one-to-many communication. We assume that any robot may fail during the execution of a scenario, and that robots can be regrouped dynamically at run time. As a result, the extended mission specification enables a casual user to specify various types of cooperative missions easily. For robot behavior programming, an extended dataflow model is used for task-level behavior specification that does not depend on the robot hardware platform. To specify the dynamic behavior of a robot, we apply an extended task model that supports a hybrid specification of dataflow and finite state machine models. Furthermore, we propose a novel extension that allows loop structures to be specified explicitly, so that compute-intensive applications containing many loops can be analyzed at compile time. Two types of information sharing are supported for robot collaboration in the dataflow graph: global information sharing and local knowledge sharing. For global information, we use the library task, which supports shared resource management and server-client interaction. To share information locally with nearby robots, we add another type of port for multicasting and use a knowledge-sharing technique.
The code for each robot is generated automatically from its associated task graph. This minimizes the human effort spent on low-level robot programming and significantly improves software design productivity. By abstracting tasks or algorithms as services and adding a strategy description layer to the design flow, the mission specification is refined automatically into a task-graph specification. The viability of the proposed methodology is verified by preliminary experiments with three cooperative mission scenarios on heterogeneous robot platforms and a robot simulator.

Chapter 1. Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Dissertation Organization
Chapter 2. Background and Existing Research
  2.1 Terminologies
  2.2 Robot Software Development Frameworks
  2.3 Parallel Embedded Software Development Framework
Chapter 3. Overview of the SeMo Framework
  3.1 Motivational Examples
Chapter 4. Robot Behavior Programming
  4.1 Related works
  4.2 Model-based Task Graph Specification for Individual Robots
  4.3 Model-based Task Graph Specification for Cooperating Robots
  4.4 Automatic Code Generation
  4.5 Experiments
Chapter 5. High-level Mission Specification
  5.1 Service-oriented Mission Specification
  5.2 Strategy Description
  5.3 Automatic Task Graph Generation
  5.4 Related works
  5.5 Experiments
Chapter 6. Conclusion
  6.1 Future Research
Bibliography
Appendices
Abstract (in Korean)

Doctor
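The hybrid dataflow/FSM task model described in the SeMo abstract can be pictured with a toy task. Here, a finite state machine selects which chain of dataflow actors is active, and the mode changes when an event arrives. The class `ModalTask`, the mode and event names, and the one-token-in, one-token-out firing rule are hypothetical simplifications. The actual SeMo task model also handles ports, firing rates, and automatic code generation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A dataflow actor is modeled as a function from one input token to one output token.
Actor = Callable[[float], float]


@dataclass
class ModalTask:
    """Hypothetical task whose FSM mode selects the active chain of dataflow actors."""
    modes: Dict[str, List[Actor]]              # mode name -> actors fired in order
    transitions: Dict[Tuple[str, str], str]    # (current mode, event) -> next mode
    mode: str                                  # initial mode

    def on_event(self, event: str) -> None:
        # Events with no transition from the current mode are ignored.
        self.mode = self.transitions.get((self.mode, event), self.mode)

    def fire(self, token: float) -> float:
        # Pass one token through every actor of the active mode, in order.
        for actor in self.modes[self.mode]:
            token = actor(token)
        return token


# Usage: a robot task that patrols until a target is found, then follows it.
task = ModalTask(
    modes={"patrol": [lambda x: x + 1.0], "follow": [lambda x: x * 2.0, lambda x: x - 1.0]},
    transitions={("patrol", "target_found"): "follow", ("follow", "target_lost"): "patrol"},
    mode="patrol",
)
print(task.fire(3.0))           # 4.0 in "patrol" mode
task.on_event("target_found")   # FSM switches the active dataflow chain
print(task.fire(3.0))           # 5.0 in "follow" mode
```

Keeping the mode logic in an explicit FSM, separate from the actors, is what makes such a model amenable to static analysis. Each mode's dataflow chain can be analyzed on its own.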

    Timing Predictability in Future Multi-Core Avionics Systems

    Overlay virtualized wireless sensor networks for application in industrial internet of things: a review

    Abstract: In recent times, Wireless Sensor Networks (WSNs) have been broadly applied in the Industrial Internet of Things (IIoT) to enhance the productivity and efficiency of existing and prospective manufacturing industries. In particular, one area of interest concerning the use of WSNs in IIoT is sensor network virtualization and overlay networks. Both are regarded as contemporary because they provide the capacity to create services and applications at the edge of existing virtual networks without changing the underlying infrastructure. This capability makes network virtualization and overlay network services highly beneficial, particularly for the dynamic needs of IIoT-based applications such as smart industry, smart city, and smart home applications. Consequently, WSN virtualization and overlay networks have attracted considerable attention in the literature, leading to the growth and maturity of the research area. In line with this growth, this paper reviews the developments made thus far concerning virtualized sensor networks, with emphasis on the application of overlay networks in IIoT. Principally, the process of virtualization in WSNs is discussed along with its importance in IIoT applications. Different challenges in WSNs are also presented, along with possible solutions offered by virtualized WSNs. Further details are presented concerning the use of overlay networks as the next step in supporting virtualization in shared sensor networks. The discussion closes with an exposition of the existing challenges in using virtualized WSNs for IIoT applications. In general, because overlay networks will contribute to the future development and advancement of smart industrial and smart city applications, this review may serve as a reference point for researchers interested in this growing field.

    Towards multiprogrammed GPUs

    Programmable Graphics Processing Units (GPUs) have recently become the most pervasive massively parallel processors. They have come a long way, from fixed-function ASICs designed to accelerate graphics tasks to a programmable architecture that can also execute general-purpose computations. Because of their performance and efficiency, an increasing amount of software relies on them to accelerate data-parallel and computationally intensive sections of code. They have earned a place in many systems, from low-power mobile devices to the biggest data centers in the world. However, GPUs are still plagued by the fact that they have essentially no multiprogramming support, resulting in low system performance when the GPU is shared among multiple programs. In this dissertation we set out to provide rich GPU multiprogramming support by improving multitasking capabilities and increasing virtual memory functionality and performance. The main issue hindering multitasking support in GPUs is the non-preemptive execution of GPU kernels. We propose two preemption mechanisms with different design philosophies that a scheduler can use to preempt execution on GPU cores and make room for another process. We also argue for spatial sharing of the GPU and propose a concrete hardware scheduler implementation that dynamically partitions the GPU cores among running kernels according to their set priorities. Contrary to the assumptions made in related work, we demonstrate that preemptive execution is feasible and is the preferable approach to GPU multitasking. We further show improved system fairness and responsiveness with our scheduling policy. We also show that the insufficient virtual memory support stems from the exception-handling mechanism used by modern GPUs. Currently, GPUs offload the actual exception-handling work to the CPU, while the faulting instruction is stalled in the GPU core.
This stall-on-fault model prevents some virtual memory features and optimizations. It is especially harmful in multiprogrammed environments because the GPU cannot be context-switched until all in-flight faults are resolved. In this dissertation, we propose three GPU core organizations with varying performance-complexity trade-offs that eliminate stall-on-fault execution and enable preemptible exceptions on the GPU (i.e., the faulting instruction can be squashed and restarted later). Building on this support, we implement two use cases and demonstrate their utility. The first is a scheme that context-switches the faulted threads and tries to find other useful work to do in the meantime, hiding the fault latency and improving system performance. The second runs the fault-handling code locally on the GPU instead of offloading it to the CPU, and we show that local fault handling can also improve performance.
Postprint (published version)
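A priority-driven spatial partition of GPU cores can be sketched in software, with the cores modeled here as streaming multiprocessors (SMs). Each running kernel receives a share of SMs roughly proportional to its priority. Comparing the old and new partitions then shows how many SMs each kernel must be preempted on when a new kernel arrives. The greedy allocation rule and the functions `partition_sms` and `sms_to_preempt` are illustrative assumptions, not the dissertation's hardware scheduler. Note that this rule can leave a very low-priority kernel with no SM at all.

```python
from typing import Dict


def partition_sms(n_sms: int, priorities: Dict[str, int]) -> Dict[str, int]:
    """Greedily assign n_sms SMs so each kernel's count tracks its priority share."""
    if not priorities:
        return {}
    if any(p <= 0 for p in priorities.values()):
        raise ValueError("priorities must be positive")
    # Highest priority first, so ties in the deficit favor higher-priority kernels.
    kernels = sorted(priorities, key=lambda name: priorities[name], reverse=True)
    alloc = {name: 0 for name in kernels}
    total = sum(priorities.values())
    for _ in range(max(n_sms, 0)):
        # Give the next SM to the kernel furthest below its proportional target.
        pick = max(kernels, key=lambda name: priorities[name] / total * n_sms - alloc[name])
        alloc[pick] += 1
    return alloc


def sms_to_preempt(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, int]:
    """Number of SMs each kernel must yield (be preempted on) to go from old to new."""
    return {k: old[k] - new.get(k, 0) for k in old if old[k] > new.get(k, 0)}


before = partition_sms(8, {"render": 1})                  # render owns all 8 SMs
after = partition_sms(8, {"render": 1, "compute": 1})     # a new kernel arrives
print(after, sms_to_preempt(before, after))               # render must yield 4 SMs
```

Here, repartitioning is only cheap if the preempted SMs can be reclaimed quickly. That is why a spatial-sharing policy depends on having an efficient preemption mechanism underneath it.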

    Hardware support for memory protection in sensor nodes

    With reference to the typical hardware configuration of a sensor node, we present the architecture of a memory protection unit (MPU) designed as a low-complexity addition to the microcontroller. The MPU is aimed at supporting memory protection and the privileged execution mode. It is connected to the system buses, and the processor sees it as a memory-mapped input/output device. The contents of the internal MPU registers specify the composition of the protection contexts of the running program in terms of access rights for the memory pages. The MPU generates a hardware interrupt to the processor when it detects a protection violation. The proposed MPU architecture is evaluated from a number of salient viewpoints, which include the distribution, review, and revocation of access permissions, and the support for important memory protection paradigms, including hierarchical contexts and protection rings.
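The access check performed by the MPU can be modeled in software as a lookup of the running context's rights for the page that contains an address. In this model, a recorded violation stands in for the hardware interrupt. The page size, the read/write/execute flags, the rule that privileged mode bypasses checks, and the class `MPUModel` are all assumptions for illustration. Hierarchical contexts and protection rings are not modeled.

```python
from enum import IntFlag
from typing import Dict, List


class Access(IntFlag):
    NONE = 0
    READ = 1
    WRITE = 2
    EXEC = 4


class MPUModel:
    """Toy model of a page-based MPU (hypothetical register layout and page size)."""

    def __init__(self, page_size: int = 256):
        self.page_size = page_size
        self.rights: Dict[int, Access] = {}   # page number -> rights of the running context
        self.privileged = False               # assumed: privileged mode skips all checks
        self.violations: List[int] = []       # stands in for the interrupt to the processor

    def set_page_rights(self, page: int, rights: Access) -> None:
        # Only privileged code may reprogram the MPU registers.
        if not self.privileged:
            raise PermissionError("MPU registers are writable only in privileged mode")
        self.rights[page] = rights

    def check(self, address: int, kind: Access) -> bool:
        if self.privileged:
            return True
        allowed = self.rights.get(address // self.page_size, Access.NONE)
        if (kind & allowed) == kind:
            return True
        self.violations.append(address)       # real hardware would raise an interrupt here
        return False


m = MPUModel()
m.privileged = True
m.set_page_rights(0, Access.READ | Access.EXEC)   # page 0: read-only code
m.privileged = False
print(m.check(10, Access.READ), m.check(10, Access.WRITE), m.violations)
```

Keeping the rights table per page, rather than per byte range, is what lets such a unit stay small enough to add to a sensor-node microcontroller.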

    Parallel and Distributed Computing

    The 14 chapters in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (Graphics Processing Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. The articles included in this book therefore constitute an excellent reference for engineers and researchers with particular interests in these topics in parallel and distributed computing.