1,414 research outputs found

    Scheduling Optimisations for SPIN to Minimise Buffer Requirements in Synchronous Data Flow

    Get PDF
    Synchronous Data flow (SDF) graphs have a simple and elegant semantics (essentially linear algebra) which makes SDF graphs eminently suitable as a vehicle for studying scheduling optimisations. We extend related work on using SPIN to experiment with scheduling optimisations aimed at minimising buffer requirements.We show that for a benchmark of commonly used case studies the performance of our SPIN based scheduler is comparable to that of state of the art research tools. The key to success is using the semantics of SDF to prove when using (even unsound and/or incomplete) optimisations are justified. The main benefit of our approach lies in gaining deep insight in the optimisations at relatively low cost

    ์ด์ข… ๋ฉ€ํ‹ฐ ์ฝ”์–ด ํ”„๋กœ์„ธ์„œ์—์„œ SDF/L ๊ทธ๋ž˜ํ”„ ์Šค์ผ€์ค„๋ง ๊ธฐ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(์„์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2021.8. Ha Soonhoi.Although dataflow models are known to thrive at exploiting task-level parallelism of an application, it is difficult to exploit the parallelism of data. Data-level parallelism can be represented well with loop structures, but these structures are not explicitly specified in most existing dataflow models. SDF/L model was introduced to overcome this shortcoming by specifying the loop structures explicitly in a hierarchical fashion. To the best of our knowledge however, scheduling of SDF/L graph onto heterogeneous processors has not been considered in any previous work. In this dissertation, we introduce a scheduling technique of an application represented by the SDF/L model onto heterogeneous processors. In the proposed method, we explore the mapping of tasks using an evolutionary meta-heuristic and schedule hierarchically in a bottom-up fashion, creating parallel loop schedules at lower levels first and then re-using them when constructing the schedule at a higher level. To verify the efficiency of the proposed scheduling methodology, we apply it to benchmark examples and randomly generated SDF/L graphs.๋ฐ์ดํ„ฐํ”Œ๋กœ์šฐ ๋ชจ๋ธ์€ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์˜ ํƒœ์Šคํฌ๋ฅผ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌํ•  ๋•Œ ์ข‹์€ ๋ชจ๋ธ๋กœ ์•Œ๋ ค์ ธ ์žˆ์ง€๋งŒ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณ‘๋ ฌ๋กœ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐ์— ํ™œ์šฉํ•˜๊ธฐ๋Š” ์–ด๋ ต๋‹ค. ๋ฐ์ดํ„ฐ ์ˆ˜์ค€ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ๋Š” ๋ฃจํ”„ ๊ตฌ์กฐ๋ฅผ ํ†ตํ•ด ํ‘œํ˜„๋  ์ˆ˜ ์žˆ์œผ๋‚˜ ๊ธฐ์กด ๋ฐ์ดํ„ฐํ”Œ๋กœ์šฐ ๋ชจ๋ธ์—์„œ ๋ช…์‹œ์ ์œผ๋กœ ๋ฃจํ”„ ๊ตฌ์กฐ๋Š” ๋ช…์„ธํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์—†์—ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋‹จ์ ์„ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด ๊ณ„์ธต์  ๊ตฌ์กฐ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ฃจํ”„ ๊ตฌ์กฐ๋ฅผ ๋ช…์‹œ์ ์œผ๋กœ ๋ช…์„ธํ•  ์ˆ˜ ์žˆ๋Š” SDF/L ๋ชจ๋ธ์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๊ธฐ์ข… ํ”„๋กœ์„ธ์„œ์— ๋Œ€ํ•œ SDF/L ๊ทธ๋ž˜ํ”„์˜ ์Šค์ผ€์ค„๋ง์€ ์ด์ „๊นŒ์ง€ ๊ณ ๋ ค๋˜์ง€ ์•Š์€ ๊ฒƒ์œผ๋กœ ํŒŒ์•…๋œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” SDF/L ๋ชจ๋ธ๋กœ ํ‘œํ˜„๋˜๋Š” ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ์ด๊ธฐ์ข… ํ”„๋กœ์„ธ์„œ์— ๋Œ€ํ•˜์—ฌ ์Šค์ผ€์ค„๋งํ•˜๋Š” ๊ธฐ๋ฒ•์„ ์†Œ๊ฐœํ•œ๋‹ค. ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์—์„œ๋Š” ๋จผ์ € ์ง„ํ™”์  ๋ฉ”ํƒ€ ํœด๋ฆฌ์Šคํ‹ฑ์„ ์‚ฌ์šฉํ•˜์—ฌ ํƒœ์Šคํฌ ๋งคํ•‘์„ ํƒ์ƒ‰ํ•œ๋‹ค. ์ดํ›„ ํ•˜์œ„ ์ˆ˜์ค€์—์„œ ๋ณ‘๋ ฌ ๋ฃจํ”„ ์Šค์ผ€์ค„์„ ๋งŒ๋“  ๋‹ค์Œ ์ƒ์œ„ ์ˆ˜์ค€์—์„œ ์Šค์ผ€์ค„ ๊ตฌ์„ฑํ•  ๋•Œ ์žฌ์‚ฌ์šฉํ•˜๋Š” ์ƒํ–ฅ์‹์˜ ๊ณ„์ธต์  ํƒœ์Šคํฌ ์Šค์ผ€์ค„๋ง์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์Šค์ผ€์ค„๋ง ๊ธฐ๋ฒ•์˜ ํšจ์œจ์„ฑ์„ ๊ฒ€์ฆํ•˜๊ธฐ ์œ„ํ•ด ๋ฒค์น˜๋งˆํฌ ์˜ˆ์ œ์™€ ๋ฌด์ž‘์œ„๋กœ ์ƒ์„ฑ๋œ SDF/L ๊ทธ๋ž˜ํ”„์— ๊ธฐ๋ฒ•์„ ์ ์šฉํ•˜์˜€๋‹ค.Chapter 1 Introduction 1 Chapter 2 Related Work 6 2.1 SDF Scheduling with Data-level Parallelism 8 2.2 Hierarchical Scheduling 9 Chapter 3 Problem and Challenges 11 3.1 Notations and Problem Description 11 3.2 Challenges 12 Chapter 4 Proposed methodology 15 4.1 Mapping Exploration 15 4.2 Priority Assignment and List Scheduling Heuristic 17 4.3 Hierarchical Scheduling 18 4.4 Complexity 23 Chapter 5 Experiments 24 5.1 Benchmarks 25 5.2 Randomly Generated Graphs 30 Chapter 6 Conclusions 35 Bibliography 37 ์š” ์•ฝ 41์„

    Heterogeneity-aware scheduling and data partitioning for system performance acceleration

    Get PDF
    Over the past decade, heterogeneous processors and accelerators have become increasingly prevalent in modern computing systems. Compared with previous homogeneous parallel machines, the hardware heterogeneity in modern systems provides new opportunities and challenges for performance acceleration. Classic operating systems optimisation problems such as task scheduling, and application-specific optimisation techniques such as the adaptive data partitioning of parallel algorithms, are both required to work together to address hardware heterogeneity. Significant effort has been invested in this problem, but either focuses on a specific type of heterogeneous systems or algorithm, or a high-level framework without insight into the difference in heterogeneity between different types of system. A general software framework is required, which can not only be adapted to multiple types of systems and workloads, but is also equipped with the techniques to address a variety of hardware heterogeneity. This thesis presents approaches to design general heterogeneity-aware software frameworks for system performance acceleration. It covers a wide variety of systems, including an OS scheduler targeting on-chip asymmetric multi-core processors (AMPs) on mobile devices, a hierarchical many-core supercomputer and multi-FPGA systems for high performance computing (HPC) centers. Considering heterogeneity from on-chip AMPs, such as thread criticality, core sensitivity, and relative fairness, it suggests a collaborative based approach to co-design the task selector and core allocator on OS scheduler. Considering the typical sources of heterogeneity in HPC systems, such as the memory hierarchy, bandwidth limitations and asymmetric physical connection, it proposes an application-specific automatic data partitioning method for a modern supercomputer, and a topological-ranking heuristic based schedule for a multi-FPGA based reconfigurable cluster. Experiments on both a full system simulator (GEM5) and real systems (Sunway Taihulight Supercomputer and Xilinx Multi-FPGA based clusters) demonstrate the significant advantages of the suggested approaches compared against the state-of-the-art on variety of workloads."This work is supported by St Leonards 7th Century Scholarship and Computer Science PhD funding from University of St Andrews; by UK EPSRC grant Discovery: Pattern Discovery and Program Shaping for Manycore Systems (EP/P020631/1)." -- Acknowledgement
    • โ€ฆ
    corecore