799 research outputs found

    Mapping and Scheduling of Directed Acyclic Graphs on An FPFA Tile

    Get PDF
    An architecture for a hand-held multimedia device requires components that are energy-efficient, flexible, and provide high performance. In the CHAMELEON [4] project we develop a coarse grained reconfigurable device for DSP-like algorithms, the so-called Field Programmable Function Array (FPFA). The FPFA devices are reminiscent to FPGAs, but with a matrix of Processing Parts (PP) instead of CLBs. The design of the FPFA focuses on: (1) Keeping each PP small to maximize the number of PPs that can fit on a chip; (2) providing sufficient flexibility; (3) Low energy consumption; (4) Exploiting the maximum amount of parallelism; (5) A strong support tool for FPFA-based applications. The challenge in providing compiler support for the FPFA-based design stems from the flexibility of the FPFA structure. If we do not use the characteristics of the FPFA structure properly, the advantages of an FPFA may become its disadvantages. The GECKO1project focuses on this problem. In this paper, we present a mapping and scheduling scheme for applications running on one FPFA tile. Applications are written in C and C code is translated to a Directed Acyclic Graphs (DAG) [4]. This scheme can map a DAG directly onto the reconfigurable PPs of an FPFA tile. It tries to achieve low power consumption by exploiting locality of reference and high performance by exploiting maximum parallelism

    ์ด์ข… ๋ฉ€ํ‹ฐ ์ฝ”์–ด ํ”„๋กœ์„ธ์„œ์—์„œ SDF/L ๊ทธ๋ž˜ํ”„ ์Šค์ผ€์ค„๋ง ๊ธฐ๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(์„์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2021.8. Ha Soonhoi.Although dataflow models are known to thrive at exploiting task-level parallelism of an application, it is difficult to exploit the parallelism of data. Data-level parallelism can be represented well with loop structures, but these structures are not explicitly specified in most existing dataflow models. SDF/L model was introduced to overcome this shortcoming by specifying the loop structures explicitly in a hierarchical fashion. To the best of our knowledge however, scheduling of SDF/L graph onto heterogeneous processors has not been considered in any previous work. In this dissertation, we introduce a scheduling technique of an application represented by the SDF/L model onto heterogeneous processors. In the proposed method, we explore the mapping of tasks using an evolutionary meta-heuristic and schedule hierarchically in a bottom-up fashion, creating parallel loop schedules at lower levels first and then re-using them when constructing the schedule at a higher level. To verify the efficiency of the proposed scheduling methodology, we apply it to benchmark examples and randomly generated SDF/L graphs.๋ฐ์ดํ„ฐํ”Œ๋กœ์šฐ ๋ชจ๋ธ์€ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์˜ ํƒœ์Šคํฌ๋ฅผ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌํ•  ๋•Œ ์ข‹์€ ๋ชจ๋ธ๋กœ ์•Œ๋ ค์ ธ ์žˆ์ง€๋งŒ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณ‘๋ ฌ๋กœ ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐ์— ํ™œ์šฉํ•˜๊ธฐ๋Š” ์–ด๋ ต๋‹ค. ๋ฐ์ดํ„ฐ ์ˆ˜์ค€ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ๋Š” ๋ฃจํ”„ ๊ตฌ์กฐ๋ฅผ ํ†ตํ•ด ํ‘œํ˜„๋  ์ˆ˜ ์žˆ์œผ๋‚˜ ๊ธฐ์กด ๋ฐ์ดํ„ฐํ”Œ๋กœ์šฐ ๋ชจ๋ธ์—์„œ ๋ช…์‹œ์ ์œผ๋กœ ๋ฃจํ”„ ๊ตฌ์กฐ๋Š” ๋ช…์„ธํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์—†์—ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋‹จ์ ์„ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด ๊ณ„์ธต์  ๊ตฌ์กฐ๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ฃจํ”„ ๊ตฌ์กฐ๋ฅผ ๋ช…์‹œ์ ์œผ๋กœ ๋ช…์„ธํ•  ์ˆ˜ ์žˆ๋Š” SDF/L ๋ชจ๋ธ์ด ์ œ์•ˆ๋˜์—ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ์ด๊ธฐ์ข… ํ”„๋กœ์„ธ์„œ์— ๋Œ€ํ•œ SDF/L ๊ทธ๋ž˜ํ”„์˜ ์Šค์ผ€์ค„๋ง์€ ์ด์ „๊นŒ์ง€ ๊ณ ๋ ค๋˜์ง€ ์•Š์€ ๊ฒƒ์œผ๋กœ ํŒŒ์•…๋œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” SDF/L ๋ชจ๋ธ๋กœ ํ‘œํ˜„๋˜๋Š” ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ์ด๊ธฐ์ข… ํ”„๋กœ์„ธ์„œ์— ๋Œ€ํ•˜์—ฌ ์Šค์ผ€์ค„๋งํ•˜๋Š” ๊ธฐ๋ฒ•์„ ์†Œ๊ฐœํ•œ๋‹ค. ์ œ์•ˆ๋œ ๋ฐฉ๋ฒ•์—์„œ๋Š” ๋จผ์ € ์ง„ํ™”์  ๋ฉ”ํƒ€ ํœด๋ฆฌ์Šคํ‹ฑ์„ ์‚ฌ์šฉํ•˜์—ฌ ํƒœ์Šคํฌ ๋งคํ•‘์„ ํƒ์ƒ‰ํ•œ๋‹ค. ์ดํ›„ ํ•˜์œ„ ์ˆ˜์ค€์—์„œ ๋ณ‘๋ ฌ ๋ฃจํ”„ ์Šค์ผ€์ค„์„ ๋งŒ๋“  ๋‹ค์Œ ์ƒ์œ„ ์ˆ˜์ค€์—์„œ ์Šค์ผ€์ค„ ๊ตฌ์„ฑํ•  ๋•Œ ์žฌ์‚ฌ์šฉํ•˜๋Š” ์ƒํ–ฅ์‹์˜ ๊ณ„์ธต์  ํƒœ์Šคํฌ ์Šค์ผ€์ค„๋ง์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. ์ œ์•ˆํ•˜๋Š” ์Šค์ผ€์ค„๋ง ๊ธฐ๋ฒ•์˜ ํšจ์œจ์„ฑ์„ ๊ฒ€์ฆํ•˜๊ธฐ ์œ„ํ•ด ๋ฒค์น˜๋งˆํฌ ์˜ˆ์ œ์™€ ๋ฌด์ž‘์œ„๋กœ ์ƒ์„ฑ๋œ SDF/L ๊ทธ๋ž˜ํ”„์— ๊ธฐ๋ฒ•์„ ์ ์šฉํ•˜์˜€๋‹ค.Chapter 1 Introduction 1 Chapter 2 Related Work 6 2.1 SDF Scheduling with Data-level Parallelism 8 2.2 Hierarchical Scheduling 9 Chapter 3 Problem and Challenges 11 3.1 Notations and Problem Description 11 3.2 Challenges 12 Chapter 4 Proposed methodology 15 4.1 Mapping Exploration 15 4.2 Priority Assignment and List Scheduling Heuristic 17 4.3 Hierarchical Scheduling 18 4.4 Complexity 23 Chapter 5 Experiments 24 5.1 Benchmarks 25 5.2 Randomly Generated Graphs 30 Chapter 6 Conclusions 35 Bibliography 37 ์š” ์•ฝ 41์„

    Investigation of implementing a synchronization protocol under multiprocessors hierarchical scheduling

    Get PDF
    In the multi-core and multiprocessor domain, there has been considerable work done on scheduling techniques assuming that real-time tasks are independent. In practice a typical real-time system usually share logical resources among tasks. However, synchronization in the multiprocessor area has not received enough attention. In this paper we investigate the possibilities of extending multiprocessor hierarchical scheduling to support an existing synchronization protocol (FMLP) in multiprocessor systems. We discuss problems regarding implementation of the synchronization protocol under the multiprocessor hierarchical scheduling

    System Synthesis for Embedded Multiprocessors

    Get PDF
    Modern embedded systems must increasingly accommodate dynamically changing operating environments, high computational requirements, and tight time-to-market windows. Such trends and the ever-increasing design complexity of embedded systems have challenged designers to raise the level of abstraction and replace traditional ad-hoc approaches with more efficient synthesis techniques. Additionally, since embedded multiprocessor systems are typically designed as final implementations for dedicated functions, modifications to embedded system implementations are rare, and this allows embedded system designers to spend significantly larger amounts of time to optimize the architecture and the employed software. This dissertation presents several system-level synthesis algorithms that employ time-intensive optimization techniques that allow the designer to explore a significantly larger part of the design space. It looks at critical issues that are at the core of the synthesis process --- selecting the architecture, partitioning the functionality over the components of the architecture, and scheduling activities such that design constraints and optimization objectives are satisfied. More specifically for the scheduling step, a new solution to the two-step multiprocessor scheduling problem is proposed. For the first step of clustering a highly efficient genetic algorithm is proposed. Several techniques for the second step of merging are proposed and finally a complete two-step effective solution is presented. Also, a randomization technique is applied to existing deterministic techniques to extend these techniques so that they can utilize arbitrary increases in available optimization time. This novel framework for extending deterministic algorithms in our context allows for accurate and fair comparison of our techniques against the state of the art. To further generalize the proposed clustering-based scheduling approach, a complementary two-step multiprocessor scheduling approach for heterogeneous multiprocessor systems is presented. This work is amongst the first works that formally studies the application of clustering to heterogeneous system scheduling. Several techniques are proposed and compared and conclusive results are presented. A modular system-level synthesis framework is then proposed. It synthesizes multi-mode, multi-task embedded systems under a number of hard constraints; optimizes a comprehensive set of objectives; and provides a set of alternative trade-off points in a given multi-objective design evaluation space. An extension of the framework is proposed to better address DVS, memory optimization, and efficient mappings onto dynamically reconfigurable hardware. An integrated framework for energy-driven scheduling onto embedded multiprocessor systems is proposed. It employs a solution representation that encodes both task assignment and ordering into a single chromosome and hence significantly reduces the search space and problem complexity. It is shown that a task assignment and scheduling that result in better performance do not necessarily save power, and hence, integrating task scheduling and voltage scheduling is crucial for fully exploiting the energy-saving potential of an embedded multiprocessor implementation

    Mapping of portable parallel programs

    Get PDF
    An efficient parallel program designed for a parallel architecture includes a detailed outline of accurate assignments of concurrent computations onto processors, and data transfers onto communication links, such that the overall execution time is minimized. This process may be complex depending on the application task and the target multiprocessor architecture. Furthermore, this process is to be repeated for every different architecture even though the application task may be the same. Consequently, this has a major impact on the ever increasing cost of software development for multiprocessor systems. A remedy for this problem would be to design portable parallel programs which can be mapped efficiently onto any computer system. In this dissertation, we present a portable programming tool called Cluster-M. The three components of Cluster-M are the Specification Module, the Representation Module, and the Mapping Module. In the Specification Module, for a given problem, a machine-independent program is generated and represented in the form of a clustered task graph called Spec graph. Similarly, in the Representation Module, for a given architecture or heterogeneous suite of computers, a clustered system graph called Rep graph is generated. The Mapping Module is responsible for efficient mapping of Spec graphs onto Rep graphs. As part of this module, we present the first algorithm which produces a near-optimal mapping of an arbitrary non-uniform machine-independent task graph with M modules, onto an arbitrary non-uniform task-independent system graph having N processors, in 0(M P) time, where P = max(M, N). Our experimental results indicate that Cluster-M produces better or similar mapping results compared to other leading techniques which work only for restricted task or system graphs

    A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

    Full text link
    Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow
    • โ€ฆ
    corecore