799 research outputs found
Mapping and Scheduling of Directed Acyclic Graphs on An FPFA Tile
An architecture for a hand-held multimedia device requires components that are energy-efficient, flexible, and provide high performance. In the CHAMELEON [4] project we develop a coarse-grained reconfigurable device for DSP-like algorithms, the so-called Field Programmable Function Array (FPFA). FPFA devices are reminiscent of FPGAs, but with a matrix of Processing Parts (PPs) instead of CLBs. The design of the FPFA focuses on: (1) keeping each PP small to maximize the number of PPs that can fit on a chip; (2) providing sufficient flexibility; (3) low energy consumption; (4) exploiting the maximum amount of parallelism; (5) strong tool support for FPFA-based applications. The challenge in providing compiler support for FPFA-based design stems from the flexibility of the FPFA structure. If we do not use the characteristics of the FPFA structure properly, the advantages of an FPFA may become its disadvantages. The GECKO project focuses on this problem. In this paper, we present a mapping and scheduling scheme for applications running on one FPFA tile. Applications are written in C, and the C code is translated to a Directed Acyclic Graph (DAG) [4]. This scheme can map a DAG directly onto the reconfigurable PPs of an FPFA tile. It tries to achieve low power consumption by exploiting locality of reference and high performance by exploiting maximum parallelism.
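To make the idea of placing a DAG onto a fixed set of PPs concrete, here is a minimal sketch of a greedy, level-by-level list scheduler. It is not the scheme from the paper: the function name `list_schedule`, the parameter `num_pps`, the dictionary layout, and the assumption that every operation takes exactly one cycle are all hypothetical simplifications, and locality of reference is not modelled.

```python
def list_schedule(dag, num_pps):
    """Greedy list scheduling of a DAG onto `num_pps` processing parts.

    `dag` maps each node to the list of its predecessors; every node,
    including those without predecessors, must appear as a key.
    Assumption: each operation takes exactly one cycle.
    Returns a list of cycles, each cycle being the nodes executed in it.
    """
    preds = {node: set(p) for node, p in dag.items()}
    done = set()
    schedule = []
    while len(done) < len(preds):
        # A node is ready once all of its predecessors have executed.
        ready = sorted(n for n in preds if n not in done and preds[n] <= done)
        if not ready:
            raise ValueError("no schedulable node: cycle or unknown predecessor")
        # Fill at most one operation per processing part in this cycle.
        cycle = ready[:num_pps]
        schedule.append(cycle)
        done.update(cycle)
    return schedule


if __name__ == "__main__":
    dag = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": []}
    for cycle_index, nodes in enumerate(list_schedule(dag, num_pps=2)):
        print(cycle_index, nodes)
```

Ties are broken alphabetically here only to keep the output deterministic; a real mapper would use a priority that reflects the critical path or data locality.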
SDF/L Graph Scheduling Technique on Heterogeneous Multi-core Processors
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2021.8. Ha Soonhoi.
Although dataflow models are known to excel at exploiting the task-level parallelism of an application, it is difficult to exploit data-level parallelism with them. Data-level parallelism can be represented well with loop structures, but these structures are not explicitly specified in most existing dataflow models. The SDF/L model was introduced to overcome this shortcoming by specifying loop structures explicitly in a hierarchical fashion. To the best of our knowledge, however, scheduling of SDF/L graphs onto heterogeneous processors has not been considered in any previous work.
In this dissertation, we introduce a technique for scheduling an application represented by the SDF/L model onto heterogeneous processors. In the proposed method, we explore the mapping of tasks using an evolutionary meta-heuristic and schedule hierarchically in a bottom-up fashion, creating parallel loop schedules at lower levels first and then reusing them when constructing the schedule at a higher level. To verify the efficiency of the proposed scheduling methodology, we apply it to benchmark examples and randomly generated SDF/L graphs.
Chapter 1 Introduction
Chapter 2 Related Work
2.1 SDF Scheduling with Data-level Parallelism
2.2 Hierarchical Scheduling
Chapter 3 Problem and Challenges
3.1 Notations and Problem Description
3.2 Challenges
Chapter 4 Proposed Methodology
4.1 Mapping Exploration
4.2 Priority Assignment and List Scheduling Heuristic
4.3 Hierarchical Scheduling
4.4 Complexity
Chapter 5 Experiments
5.1 Benchmarks
5.2 Randomly Generated Graphs
Chapter 6 Conclusions
Bibliography
Abstract (in Korean)
Investigation of implementing a synchronization protocol under multiprocessors hierarchical scheduling
In the multi-core and multiprocessor domain, considerable work has been done on scheduling techniques that assume real-time tasks are independent. In practice, a typical real-time system usually shares logical resources among tasks. However, synchronization in the multiprocessor area has not received enough attention. In this paper we investigate the possibilities of extending multiprocessor hierarchical scheduling to support an existing synchronization protocol (FMLP) in multiprocessor systems. We discuss problems regarding the implementation of the synchronization protocol under multiprocessor hierarchical scheduling.
System Synthesis for Embedded Multiprocessors
Modern embedded systems must increasingly
accommodate dynamically changing operating environments, high computational requirements, and tight time-to-market windows. Such trends and the ever-increasing design complexity of embedded systems have challenged designers to raise the level of abstraction and replace traditional ad-hoc approaches with more efficient synthesis
techniques. Additionally, since embedded multiprocessor systems are typically designed as final implementations for dedicated
functions, modifications to embedded system implementations are rare, and this allows embedded system designers to spend
significantly larger amounts of time to optimize the architecture and the employed software. This dissertation presents several system-level synthesis algorithms that employ time-intensive optimization techniques that allow the designer to explore a significantly larger part of the design space. It looks at critical issues that are at the core of the synthesis process:
selecting the architecture, partitioning the
functionality over the components of the architecture, and scheduling activities such that design constraints and optimization objectives
are satisfied.
More specifically for the scheduling step, a new solution to the two-step multiprocessor scheduling problem is proposed. For the first step, clustering, a highly efficient genetic algorithm is proposed. Several techniques for the second step, merging, are proposed, and finally a complete and effective two-step solution is presented. Also, a randomization technique is applied to existing deterministic techniques to extend these techniques so that they can
utilize arbitrary increases in available optimization time. This novel framework for extending deterministic algorithms in our context allows for accurate and fair comparison of our techniques against the state of the art.
To further generalize the proposed clustering-based scheduling approach, a complementary two-step multiprocessor scheduling approach for
heterogeneous multiprocessor systems is presented. This work is among the first that formally study the application of
clustering to heterogeneous system scheduling. Several techniques are proposed and compared, and conclusive results are presented.
A modular system-level synthesis framework is then proposed. It synthesizes multi-mode, multi-task embedded systems under a number of hard constraints; optimizes a comprehensive set of objectives; and provides a set of alternative trade-off points in a given multi-objective design evaluation space. An extension of the
framework is proposed to better address DVS, memory optimization, and efficient mappings onto dynamically reconfigurable hardware.
An integrated framework for energy-driven
scheduling onto embedded multiprocessor systems is proposed. It employs a solution representation that encodes both task assignment and ordering into a single chromosome and hence significantly
reduces the search space and problem complexity. It is shown that a task assignment and scheduling that result in better performance do not necessarily save power, and hence, integrating task scheduling and voltage scheduling is crucial for fully exploiting the energy-saving potential
of an embedded multiprocessor implementation.
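The idea of a single chromosome that carries both task assignment and ordering can be illustrated with a small decoder. This is only a sketch under stated assumptions, not the dissertation's actual representation: the gene layout (a list of `(task, processor)` pairs read left to right), the names `decode`, `preds`, and `exec_time`, and the omission of voltage scaling and communication costs are all hypothetical.

```python
def decode(chromosome, preds, exec_time):
    """Turn one chromosome into start times and a makespan.

    chromosome: list of (task, processor) genes. The position of a gene
        gives the task's order, the second field gives its assignment.
    preds: maps a task to the tasks it depends on (missing key = none).
    exec_time: maps a task to its execution time (assumed identical on
        every processor in this sketch).
    """
    proc_free = {}   # time at which each processor becomes idle
    finish = {}      # finish time of every task decoded so far
    start = {}
    for task, proc in chromosome:
        deps = preds.get(task, ())
        if any(d not in finish for d in deps):
            raise ValueError(f"{task} is ordered before one of its predecessors")
        # A task starts when its inputs are ready and its processor is idle.
        data_ready = max((finish[d] for d in deps), default=0)
        begin = max(data_ready, proc_free.get(proc, 0))
        start[task] = begin
        finish[task] = begin + exec_time[task]
        proc_free[proc] = finish[task]
    return start, max(finish.values(), default=0)


if __name__ == "__main__":
    preds = {"c": ["a", "b"]}
    exec_time = {"a": 2, "b": 3, "c": 1}
    print(decode([("a", 0), ("b", 1), ("c", 0)], preds, exec_time))
    print(decode([("a", 0), ("b", 0), ("c", 0)], preds, exec_time))
```

Because one gene sequence fixes both decisions, a genetic operator that swaps two genes or changes a processor field always yields a complete candidate schedule, provided the order still respects the dependencies.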
Mapping of portable parallel programs
An efficient parallel program designed for a parallel architecture includes a detailed outline of accurate assignments of concurrent computations onto processors, and data transfers onto communication links, such that the overall execution time is minimized. This process may be complex depending on the application task and the target multiprocessor architecture. Furthermore, this process is to be repeated for every different architecture even though the application task may be the same. Consequently, this has a major impact on the ever-increasing cost of software development for multiprocessor systems. A remedy for this problem would be to design portable parallel programs which can be mapped efficiently onto any computer system. In this dissertation, we present a portable programming tool called Cluster-M. The three components of Cluster-M are the Specification Module, the Representation Module, and the Mapping Module. In the Specification Module, for a given problem, a machine-independent program is generated and represented in the form of a clustered task graph called a Spec graph. Similarly, in the Representation Module, for a given architecture or heterogeneous suite of computers, a clustered system graph called a Rep graph is generated. The Mapping Module is responsible for efficient mapping of Spec graphs onto Rep graphs. As part of this module, we present the first algorithm which produces a near-optimal mapping of an arbitrary non-uniform machine-independent task graph with M modules onto an arbitrary non-uniform task-independent system graph having N processors, in O(MP) time, where P = max(M, N). Our experimental results indicate that Cluster-M produces better or similar mapping results compared to other leading techniques which work only for restricted task or system graphs.
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of mobile devices alone now
nearly equal to the population of Earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need for power management and provide a classification of the techniques on
several important parameters to highlight their similarities and differences.
This paper is intended to help researchers and application developers
gain insight into the workings of power management techniques and design
even more efficient high-performance embedded systems of tomorrow.
- …