1,401 research outputs found

    Modeling and Mapping of Optimized Schedules for Embedded Signal Processing Systems

    Get PDF
    The demand for Digital Signal Processing (DSP) in embedded systems has been increasing rapidly due to the proliferation of multimedia- and communication-intensive devices such as pervasive tablets and smart phones. Efficient implementation of embedded DSP systems requires integration of diverse hardware and software components, as well as dynamic workload distribution across heterogeneous computational resources. The former implies increased complexity of application modeling and analysis, but also brings enhanced potential for achieving improved energy consumption, cost or performance. The latter results from the increased use of dynamic behavior in embedded DSP applications. Furthermore, parallel programming is highly relevant in many embedded DSP areas due to the development and use of Multiprocessor System-On-Chip (MPSoC) technology. The need for efficient cooperation among different devices supporting diverse parallel embedded computations motivates high-level modeling that expresses dynamic signal processing behaviors and supports efficient task scheduling and hardware mapping. Starting with dynamic modeling, this thesis develops a systematic design methodology that supports functional simulation and hardware mapping of dynamic reconfiguration based on Parameterized Synchronous Dataflow (PSDF) graphs. By building on the DIF (Dataflow Interchange Format), which is a design language and associated software package for developing and experimenting with dataflow-based design techniques for signal processing systems, we have developed a novel tool for functional simulation of PSDF specifications. This simulation tool allows designers to model applications in PSDF and simulate their functionality, including use of the dynamic parameter reconfiguration capabilities offered by PSDF. With the help of this simulation tool, our design methodology helps to map PSDF specifications into efficient implementations on field programmable gate arrays (FPGAs). Furthermore, valid schedules can be derived from the PSDF models at runtime to adapt hardware configurations based on changing data characteristics or operational requirements. Under certain conditions, efficient quasi-static schedules can be applied to reduce overhead and enhance predictability in the scheduling process. Motivated by the fact that scheduling is critical to performance and to efficient use of dynamic reconfiguration, we have focused on a methodology for schedule design, which complements the emphasis on automated schedule construction in the existing literature on dataflow-based design and implementation. In particular, we have proposed a dataflow-based schedule design framework called the dataflow schedule graph (DSG), which provides a graphical framework for schedule construction based on dataflow semantics, and can also be used as an intermediate representation target for automated schedule generation. Our approach to applying the DSG in this thesis emphasizes schedule construction as a design process rather than an outcome of the synthesis process. Our approach employs dataflow graphs for representing both application models and schedules that are derived from them. By providing a dataflow-integrated framework for unambiguously representing, analyzing, manipulating, and interchanging schedules, the DSG facilitates effective codesign of dataflow-based application models and schedules for execution of these models. As multicore processors are deployed in an increasing variety of embedded image processing systems, effective utilization of resources such as multiprocessor systemon-chip (MPSoC) devices, and effective handling of implementation concerns such as memory management and I/O become critical to developing efficient embedded implementations. However, the diversity and complexity of applications and architectures in embedded image processing systems make the mapping of applications onto MPSoCs difficult. We help to address this challenge through a structured design methodology that is built upon the DSG modeling framework. We refer to this methodology as the DEIPS methodology (DSG-based design and implementation of Embedded Image Processing Systems). The DEIPS methodology provides a unified framework for joint consideration of DSG structures and the application graphs from which they are derived, which allows designers to integrate considerations of parallelization and resource constraints together with the application modeling process. We demonstrate the DEIPS methodology through cases studies on practical embedded image processing systems

    병렬 및 λΆ„μ‚° μž„λ² λ””λ“œ μ‹œμŠ€ν…œμ„ μœ„ν•œ λͺ¨λΈ 기반 μ½”λ“œ 생성 ν”„λ ˆμž„μ›Œν¬

    Get PDF
    ν•™μœ„λ…Όλ¬Έ(박사)--μ„œμšΈλŒ€ν•™κ΅ λŒ€ν•™μ› :κ³΅κ³ΌλŒ€ν•™ 컴퓨터곡학뢀,2020. 2. ν•˜μˆœνšŒ.μ†Œν”„νŠΈμ›¨μ–΄ 섀계 생산성 및 μœ μ§€λ³΄μˆ˜μ„±μ„ ν–₯μƒμ‹œν‚€κΈ° μœ„ν•΄ λ‹€μ–‘ν•œ μ†Œν”„νŠΈμ›¨μ–΄ 개발 방법둠이 μ œμ•ˆλ˜μ—ˆμ§€λ§Œ, λŒ€λΆ€λΆ„μ˜ μ—°κ΅¬λŠ” μ‘μš© μ†Œν”„νŠΈμ›¨μ–΄λ₯Ό ν•˜λ‚˜μ˜ ν”„λ‘œμ„Έμ„œμ—μ„œ λ™μž‘μ‹œν‚€λŠ” 데에 μ΄ˆμ μ„ λ§žμΆ”κ³  μžˆλ‹€. λ˜ν•œ, μž„λ² λ””λ“œ μ‹œμŠ€ν…œμ„ κ°œλ°œν•˜λŠ” 데에 ν•„μš”ν•œ μ§€μ—°μ΄λ‚˜ μžμ› μš”κ΅¬ 사항에 λŒ€ν•œ λΉ„κΈ°λŠ₯적 μš”κ΅¬ 사항을 κ³ λ €ν•˜μ§€ μ•Šκ³  있기 λ•Œλ¬Έμ— 일반적인 μ†Œν”„νŠΈμ›¨μ–΄ 개발 방법둠을 μž„λ² λ””λ“œ μ†Œν”„νŠΈμ›¨μ–΄λ₯Ό κ°œλ°œν•˜λŠ” 데에 μ μš©ν•˜λŠ” 것은 μ ν•©ν•˜μ§€ μ•Šλ‹€. 이 λ…Όλ¬Έμ—μ„œλŠ” 병렬 및 λΆ„μ‚° μž„λ² λ””λ“œ μ‹œμŠ€ν…œμ„ λŒ€μƒμœΌλ‘œ ν•˜λŠ” μ†Œν”„νŠΈμ›¨μ–΄λ₯Ό λͺ¨λΈλ‘œ ν‘œν˜„ν•˜κ³ , 이λ₯Ό μ†Œν”„νŠΈμ›¨μ–΄ λΆ„μ„μ΄λ‚˜ κ°œλ°œμ— ν™œμš©ν•˜λŠ” 개발 방법둠을 μ†Œκ°œν•œλ‹€. 우리의 λͺ¨λΈμ—μ„œ μ‘μš© μ†Œν”„νŠΈμ›¨μ–΄λŠ” κ³„μΈ΅μ μœΌλ‘œ ν‘œν˜„ν•  수 μžˆλŠ” μ—¬λŸ¬ 개의 νƒœμŠ€ν¬λ‘œ 이루어져 있으며, ν•˜λ“œμ›¨μ–΄ ν”Œλž«νΌκ³Ό λ…λ¦½μ μœΌλ‘œ λͺ…μ„Έν•œλ‹€. νƒœμŠ€ν¬ κ°„μ˜ 톡신 및 λ™κΈ°ν™”λŠ” λͺ¨λΈμ΄ μ •μ˜ν•œ κ·œμ•½μ΄ μ •ν•΄μ Έ 있고, μ΄λŸ¬ν•œ κ·œμ•½μ„ 톡해 μ‹€μ œ ν”„λ‘œκ·Έλž¨μ„ μ‹€ν–‰ν•˜κΈ° 전에 μ†Œν”„νŠΈμ›¨μ–΄ μ—λŸ¬λ₯Ό 정적 뢄석을 톡해 확인할 수 있고, μ΄λŠ” μ‘μš©μ˜ 검증 λ³΅μž‘λ„λ₯Ό μ€„μ΄λŠ” 데에 κΈ°μ—¬ν•œλ‹€. μ§€μ •ν•œ ν•˜λ“œμ›¨μ–΄ ν”Œλž«νΌμ—μ„œ λ™μž‘ν•˜λŠ” ν”„λ‘œκ·Έλž¨μ€ νƒœμŠ€ν¬λ“€μ„ ν”„λ‘œμ„Έμ„œμ— λ§€ν•‘ν•œ 이후에 μžλ™μ μœΌλ‘œ ν•©μ„±ν•  수 μžˆλ‹€. μœ„μ˜ λͺ¨λΈ 기반 μ†Œν”„νŠΈμ›¨μ–΄ 개발 λ°©λ²•λ‘ μ—μ„œ μ‚¬μš©ν•˜λŠ” ν”„λ‘œκ·Έλž¨ ν•©μ„±κΈ°λ₯Ό λ³Έ λ…Όλ¬Έμ—μ„œ μ œμ•ˆν•˜μ˜€λŠ”λ°, λͺ…μ„Έν•œ ν”Œλž«νΌ μš”κ΅¬ 사항을 λ°”νƒ•μœΌλ‘œ 병렬 및 λΆ„μ‚° μž„λ² λ””λ“œ μ‹œμŠ€ν…œμ„μ—μ„œ λ™μž‘ν•˜λŠ” μ½”λ“œλ₯Ό μƒμ„±ν•œλ‹€. μ—¬λŸ¬ 개의 μ •ν˜•μ  λͺ¨λΈλ“€μ„ κ³„μΈ΅μ μœΌλ‘œ ν‘œν˜„ν•˜μ—¬ μ‘μš©μ˜ 동적 ν–‰νƒœλ₯Ό λ‚˜νƒ€κ³ , ν•©μ„±κΈ°λŠ” μ—¬λŸ¬ λͺ¨λΈλ‘œ κ΅¬μ„±λœ 계측적인 λͺ¨λΈλ‘œλΆ€ν„° 병렬성을 κ³ λ €ν•˜μ—¬ νƒœμŠ€ν¬λ₯Ό μ‹€ν–‰ν•  수 μžˆλ‹€. λ˜ν•œ, ν”„λ‘œκ·Έλž¨ ν•©μ„±κΈ°μ—μ„œ λ‹€μ–‘ν•œ ν”Œλž«νΌμ΄λ‚˜ λ„€νŠΈμ›Œν¬λ₯Ό 지원할 수 μžˆλ„λ‘ μ½”λ“œλ₯Ό κ΄€λ¦¬ν•˜λŠ” 방법도 보여주고 μžˆλ‹€. λ³Έ λ…Όλ¬Έμ—μ„œ μ œμ‹œν•˜λŠ” μ†Œν”„νŠΈμ›¨μ–΄ 개발 방법둠은 6개의 ν•˜λ“œμ›¨μ–΄ ν”Œλž«νΌκ³Ό 3 μ’…λ₯˜μ˜ λ„€νŠΈμ›Œν¬λ‘œ κ΅¬μ„±λ˜μ–΄ μžˆλŠ” μ‹€μ œ κ°μ‹œ μ†Œν”„νŠΈμ›¨μ–΄ μ‹œμŠ€ν…œ μ‘μš© μ˜ˆμ œμ™€ 이쒅 λ©€ν‹° ν”„λ‘œμ„Έμ„œλ₯Ό ν™œμš©ν•˜λŠ” 원격 λ”₯ λŸ¬λ‹ 예제λ₯Ό μˆ˜ν–‰ν•˜μ—¬ 개발 λ°©λ²•λ‘ μ˜ 적용 κ°€λŠ₯성을 μ‹œν—˜ν•˜μ˜€λ‹€. λ˜ν•œ, ν”„λ‘œκ·Έλž¨ ν•©μ„±κΈ°κ°€ μƒˆλ‘œμš΄ ν”Œλž«νΌμ΄λ‚˜ λ„€νŠΈμ›Œν¬λ₯Ό μ§€μ›ν•˜κΈ° μœ„ν•΄ ν•„μš”λ‘œ ν•˜λŠ” 개발 λΉ„μš©λ„ μ‹€μ œ μΈ‘μ • 및 μ˜ˆμΈ‘ν•˜μ—¬ μƒλŒ€μ μœΌλ‘œ 적은 λ…Έλ ₯으둜 μƒˆλ‘œμš΄ ν”Œλž«νΌμ„ 지원할 수 μžˆμŒμ„ ν™•μΈν•˜μ˜€λ‹€. λ§Žμ€ μž„λ² λ””λ“œ μ‹œμŠ€ν…œμ—μ„œ μ˜ˆμƒμΉ˜ λͺ»ν•œ ν•˜λ“œμ›¨μ–΄ μ—λŸ¬μ— λŒ€ν•΄ 결함을 κ°λ‚΄ν•˜λŠ” 것을 ν•„μš”λ‘œ ν•˜κΈ° λ•Œλ¬Έμ— 결함 감내에 λŒ€ν•œ μ½”λ“œλ₯Ό μžλ™μœΌλ‘œ μƒμ„±ν•˜λŠ” 연ꡬ도 μ§„ν–‰ν•˜μ˜€λ‹€. λ³Έ κΈ°λ²•μ—μ„œ 결함 감내 섀정에 따라 νƒœμŠ€ν¬ κ·Έλž˜ν”„λ₯Ό μˆ˜μ •ν•˜λŠ” 방식을 ν™œμš©ν•˜μ˜€μœΌλ©°, 결함 κ°λ‚΄μ˜ λΉ„κΈ°λŠ₯적 μš”κ΅¬ 사항을 μ‘μš© κ°œλ°œμžκ°€ μ‰½κ²Œ μ μš©ν•  수 μžˆλ„λ‘ ν•˜μ˜€λ‹€. λ˜ν•œ, 결함 감내 μ§€μ›ν•˜λŠ” 것과 κ΄€λ ¨ν•˜μ—¬ μ‹€μ œ μˆ˜λ™μœΌλ‘œ κ΅¬ν˜„ν–ˆμ„ κ²½μš°μ™€ λΉ„κ΅ν•˜μ˜€κ³ , 결함 μ£Όμž… 도ꡬλ₯Ό μ΄μš©ν•˜μ—¬ 결함 λ°œμƒ μ‹œλ‚˜λ¦¬μ˜€λ₯Ό μž¬ν˜„ν•˜κ±°λ‚˜, μž„μ˜λ‘œ 결함을 μ£Όμž…ν•˜λŠ” μ‹€ν—˜μ„ μˆ˜ν–‰ν•˜μ˜€λ‹€. λ§ˆμ§€λ§‰μœΌλ‘œ 결함 감내λ₯Ό μ‹€ν—˜ν•  λ•Œμ— ν™œμš©ν•œ 결함 μ£Όμž… λ„κ΅¬λŠ” λ³Έ λ…Όλ¬Έμ˜ 또 λ‹€λ₯Έ κΈ°μ—¬ 사항 쀑 ν•˜λ‚˜λ‘œ λ¦¬λˆ…μŠ€ ν™˜κ²½μœΌλ‘œ λŒ€μƒμœΌλ‘œ μ‘μš© μ˜μ—­ 및 컀널 μ˜μ—­μ— 결함을 μ£Όμž…ν•˜λŠ” 도ꡬλ₯Ό κ°œλ°œν•˜μ˜€λ‹€. μ‹œμŠ€ν…œμ˜ 견고성을 κ²€μ¦ν•˜κΈ° μœ„ν•΄ 결함을 μ£Όμž…ν•˜μ—¬ 결함 μ‹œλ‚˜λ¦¬μ˜€λ₯Ό μž¬ν˜„ν•˜λŠ” 것은 널리 μ‚¬μš©λ˜λŠ” λ°©λ²•μœΌλ‘œ, λ³Έ λ…Όλ¬Έμ—μ„œ 개발된 결함 μ£Όμž… λ„κ΅¬λŠ” μ‹œμŠ€ν…œμ΄ λ™μž‘ν•˜λŠ” 도쀑에 μž¬ν˜„ κ°€λŠ₯ν•œ 결함을 μ£Όμž…ν•  수 μžˆλŠ” 도ꡬ이닀. 컀널 μ˜μ—­μ—μ„œμ˜ 결함 μ£Όμž…μ„ μœ„ν•΄ 두 μ’…λ₯˜μ˜ 결함 μ£Όμž… 방법을 μ œκ³΅ν•˜λ©°, ν•˜λ‚˜λŠ” 컀널 GNU 디버거λ₯Ό μ΄μš©ν•œ 방법이고, λ‹€λ₯Έ ν•˜λ‚˜λŠ” ARM ν•˜λ“œμ›¨μ–΄ 브레이크포인트λ₯Ό ν™œμš©ν•œ 방법이닀. μ‘μš© μ˜μ—­μ—μ„œ 결함을 μ£Όμž…ν•˜κΈ° μœ„ν•΄ GDB 기반 결함 μ£Όμž… 방법을 μ΄μš©ν•˜μ—¬ 동일 μ‹œμŠ€ν…œ ν˜Ήμ€ 원격 μ‹œμŠ€ν…œμ˜ μ‘μš©μ— 결함을 μ£Όμž…ν•  수 μžˆλ‹€. 결함 μ£Όμž… 도ꡬ에 λŒ€ν•œ μ‹€ν—˜μ€ ODROID-XU4 λ³΄λ“œμ—μ„œ μ§„ν–‰ν•˜μ˜€λ‹€.While various software development methodologies have been proposed to increase the design productivity and maintainability of software, they usually focus on the development of application software running on a single processing element, without concern about the non-functional requirements of an embedded system such as latency and resource requirements. In this thesis, we present a model-based software development method for parallel and distributed embedded systems. An application is specified as a set of tasks that follow a set of given rules for communication and synchronization in a hierarchical fashion, independently of the hardware platform. Having such rules enables us to perform static analysis to check some software errors at compile time to reduce the verification difficulty. Platform-specific program is synthesized automatically after mapping of tasks onto processing elements is determined. The program synthesizer is also proposed to generate codes which satisfies platform requirements for parallel and distributed embedded systems. As multiple models which can express dynamic behaviors can be depicted hierarchically, the synthesizer supports to manage multiple task graphs with a different hierarchy to run tasks with parallelism. Also, the synthesizer shows methods of managing codes for heterogeneous platforms and generating various communication methods. The viability of the proposed software development method is verified with a real-life surveillance application that runs on six processing elements with three remote communication methods, and remote deep learning example is conducted to use heterogeneous multiprocessing components on distributed systems. Also, supporting a new platform and network requires a small effort by measuring and estimating development costs. Since tolerance to unexpected errors is a required feature of many embedded systems, we also support an automatic fault-tolerant code generation. Fault tolerance can be applied by modifying the task graph based on the selected fault tolerance configurations, so the non-functional requirement of fault tolerance can be easily adopted by an application developer. To compare the effort of supporting fault tolerance, manual implementation of fault tolerance is performed. Also, the fault tolerance method is tested with the fault injection tool to emulate fault scenarios and inject faults randomly. Our fault injection tool, which has used for testing our fault-tolerance method, is another work of this thesis. Emulating fault scenarios by intentionally injecting faults is commonly used to test and verify the robustness of a system. To emulate faults on an embedded system, we present a run-time fault injection framework that can inject a fault on both a kernel and application layer of Linux-based systems. For injecting faults on a kernel layer, two complementary fault injection techniques are used. One is based on Kernel GNU Debugger, and the other is using a hardware breakpoint supported by the ARM architecture. For application-level fault injection, the GDB-based fault injection method is used to inject a fault on a remote application. The viability of the proposed fault injection tool is proved by real-life experiments with an ODROID-XU4 system.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Contribution 6 1.3 Dissertation Organization 8 Chapter 2 Background 9 2.1 HOPES: Hope of Parallel Embedded Software 9 2.1.1 Software Development Procedure 9 2.1.2 Components of HOPES 12 2.2 Universal Execution Model 13 2.2.1 Task Graph Specification 13 2.2.2 Dataflow specification of an Application 15 2.2.3 Task Code Specification and Generic APIs 21 2.2.4 Meta-data Specification 23 Chapter 3 Program Synthesis for Parallel and Distributed Embedded Systems 24 3.1 Motivational Example 24 3.2 Program Synthesis Overview 26 3.3 Program Synthesis from Hierarchically-mixed Models 30 3.4 Platform Code Synthesis 33 3.5 Communication Code Synthesis 36 3.6 Experiments 40 3.6.1 Development Cost of Supporting New Platforms and Networks 40 3.6.2 Program Synthesis for the Surveillance System Example 44 3.6.3 Remote GPU-accelerated Deep Learning Example 46 3.7 Document Generation 48 3.8 Related Works 49 Chapter 4 Model Transformation for Fault-tolerant Code Synthesis 56 4.1 Fault-tolerant Code Synthesis Techniques 56 4.2 Applying Fault Tolerance Techniques in HOPES 61 4.3 Experiments 62 4.3.1 Development Cost of Applying Fault Tolerance 62 4.3.2 Fault Tolerance Experiments 62 4.4 Random Fault Injection Experiments 65 4.5 Related Works 68 Chapter 5 Fault Injection Framework for Linux-based Embedded Systems 70 5.1 Background 70 5.1.1 Fault Injection Techniques 70 5.1.2 Kernel GNU Debugger 71 5.1.3 ARM Hardware Breakpoint 72 5.2 Fault Injection Framework 74 5.2.1 Overview 74 5.2.2 Architecture 75 5.2.3 Fault Injection Techniques 79 5.2.4 Implementation 83 5.3 Experiments 90 5.3.1 Experiment Setup 90 5.3.2 Performance Comparison of Two Fault Injection Methods 90 5.3.3 Bit-flip Fault Experiments 92 5.3.4 eMMC Controller Fault Experiments 94 Chapter 6 Conclusion 97 Bibliography 99 μš” μ•½ 108Docto

    Schedulability analysis of timed CSP models using the PAT model checker

    Get PDF
    Timed CSP can be used to model and analyse real-time and concurrent behaviour of embedded control systems. Practical CSP implementations combine the CSP model of a real-time control system with prioritized scheduling to achieve efficient and orderly use of limited resources. Schedulability analysis of a timed CSP model of a system with respect to a scheduling scheme and a particular execution platform is important to ensure that the system design satisfies its timing requirements. In this paper, we propose a framework to analyse schedulability of CSP-based designs for non-preemptive fixed-priority multiprocessor scheduling. The framework is based on the PAT model checker and the analysis is done with dense-time model checking on timed CSP models. We also provide a schedulability analysis workflow to construct and analyse, using the proposed framework, a timed CSP model with scheduling from an initial untimed CSP model without scheduling. We demonstrate our schedulability analysis workflow on a case study of control software design for a mobile robot. The proposed approach provides non-pessimistic schedulability results

    A Power-Efficient Methodology for Mapping Applications on Multi-Processor System-on-Chip Architectures

    Get PDF
    This work introduces an application mapping methodology and case study for multi-processor on-chip architectures. Starting from the description of an application in standard sequential code (e.g. in C), first the application is profiled, parallelized when possible, then its components are moved to hardware implementation when necessary to satisfy performance and power constraints. After mapping, with the use of hardware objects to handle concurrency, the application power consumption can be further optimized by a task-based scheduler for the remaining software part, without the need for operating system support. The key contributions of this work are: a methodology for high-level hardware/software partitioning that allows the designer to use the same code for both hardware and software models for simulation, providing nevertheless preliminary estimations for timing and power consumption; and a task-based scheduling algorithm that does not require operating system support. The methodology has been applied to the co-exploration of an industrial case study: an MPEG4 VGA real-time encoder

    SCALABLE TECHNIQUES FOR SCHEDULING AND MAPPING DSP APPLICATIONS ONTO EMBEDDED MULTIPROCESSOR PLATFORMS

    Get PDF
    A variety of multiprocessor architectures has proliferated even for off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using special platform features to achieve high performance. However, due to the fast evolution of the underlying architectures, solution redevelopment is error prone and re-usability of existing solutions and libraries is limited. In this thesis, we facilitate an efficient migration of DSP systems to multiprocessor platforms while systematically leveraging previous investment in optimized library kernels using dataflow design frameworks. We make these library elements, which are typically tailored to specialized architectures, more amenable to extensive analysis and optimization using an efficient and systematic process. In this thesis we provide techniques to allow such migration through four basic contributions: 1. We propose and develop a framework to explore efficient utilization of Single Instruction Multiple Data (SIMD) cores and accelerators available in heterogeneous multiprocessor platforms consisting of General Purpose Processors (GPPs) and Graphics Processing Units (GPUs). We also propose new scheduling techniques by applying extensive block processing in conjunction with appropriate task mapping and task ordering methods that match efficiently with the underlying architecture. The approach gives the developer the ability to prototype a GPU-accelerated application and explore its design space efficiently and effectively. 2. We introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real time operating systems, and to re-use pre-optimized libraries of DSP components within such implementations. 3. We develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to profit from the available cores in the target platform. Then a mapping algorithm that uses graph analysis is developed to distribute data and task parallel instances over different cores while trying to balance the load of all processing units to make use of pipeline parallelism. Finally, we use a novel technique for performance evaluation by implementing the scheduler and a customizable solution on the programmable platform. This allows accurate fitness functions to be measured and used to drive runtime adaptation of schedules. 4. In addition to providing scheduling techniques for the mentioned applications and platforms, we also show how to integrate the resulting solution in the underlying environment. This is achieved by leveraging existing libraries and applying the GPP-GPU scheduling framework to augment a popular existing Software Defined Radio (SDR) development environment -- GNU Radio -- with a dataflow foundation and a stand-alone GPU-accelerated library. We also show how to realize the PEG model on real time operating system libraries, such as the Texas Instruments DSP/BIOS. A code generator that accepts a manual system designer solution as well as automatically configured solutions is provided to complete the design flow starting from application model to running system

    Implementation Effort and Parallelism - Metrics for Guiding Hardware/Software Partitioning in Embedded System Design

    Get PDF

    μ‹€μ‹œκ°„ μž„λ² λ””λ“œ μ‹œμŠ€ν…œμ„ μœ„ν•œ 동적 ν–‰μœ„ λͺ…μ„Έ 및 섀계 곡간 탐색 기법

    Get PDF
    ν•™μœ„λ…Όλ¬Έ (박사)-- μ„œμšΈλŒ€ν•™κ΅ λŒ€ν•™μ› : 전기·컴퓨터곡학뢀, 2016. 8. ν•˜μˆœνšŒ.ν•˜λ‚˜μ˜ 칩에 μ§‘μ λ˜λŠ” ν”„λ‘œμ„Έμ„œμ˜ κ°œμˆ˜κ°€ λ§Žμ•„μ§€κ³ , λ§Žμ€ κΈ°λŠ₯듀이 톡합됨에 따라, μ—°μ‚°μ–‘μ˜ λ³€ν™”, μ„œλΉ„μŠ€μ˜ ν’ˆμ§ˆ, μ˜ˆμƒμΉ˜ λͺ»ν•œ μ‹œμŠ€ν…œ μš”μ†Œμ˜ κ³ μž₯ λ“±κ³Ό 같은 λ‹€μ–‘ν•œ μš”μ†Œλ“€μ— μ˜ν•΄ μ‹œμŠ€ν…œμ˜ μƒνƒœκ°€ λ™μ μœΌλ‘œ λ³€ν™”ν•˜κ²Œ λœλ‹€. λ°˜λ©΄μ—, λ³Έ λ…Όλ¬Έμ—μ„œ 주된 관심사λ₯Ό κ°€μ§€λŠ” 슀마트 폰 μž₯μΉ˜μ—μ„œ 주둜 μ‚¬μš©λ˜λŠ” λΉ„λ””μ˜€, κ·Έλž˜ν”½ μ‘μš©λ“€μ˜ 경우, 계산 λ³΅μž‘λ„κ°€ μ§€μ†μ μœΌλ‘œ μ¦κ°€ν•˜κ³  μžˆλ‹€. λ”°λΌμ„œ, μ΄λ ‡κ²Œ λ™μ μœΌλ‘œ λ³€ν•˜λŠ” ν–‰μœ„λ₯Ό κ°€μ§€λ©΄μ„œλ„ 병렬성을 λ‚΄μ œν•œ 계산 집약적인 연산을 ν¬ν•¨ν•˜λŠ” λ³΅μž‘ν•œ μ‹œμŠ€ν…œμ„ κ΅¬ν˜„ν•˜κΈ° μœ„ν•΄μ„œλŠ” 체계적인 섀계 방법둠이 κ³ λ„λ‘œ μš”κ΅¬λœλ‹€. λͺ¨λΈ 기반 방법둠은 병렬 μž„λ² λ””λ“œ μ†Œν”„νŠΈμ›¨μ–΄ κ°œλ°œμ„ μœ„ν•œ λŒ€ν‘œμ μΈ 방법 쀑 ν•˜λ‚˜μ΄λ‹€. 특히, μ‹œμŠ€ν…œ λͺ…μ„Έ, 정적 μ„±λŠ₯ 뢄석, 섀계 곡간 탐색, 그리고 μžλ™ μ½”λ“œ μƒμ„±κΉŒμ§€μ˜ λͺ¨λ“  섀계 단계λ₯Ό μ§€μ›ν•˜λŠ” 병렬 μž„λ² λ””λ“œ μ†Œν”„νŠΈμ›¨μ–΄ 섀계 ν™˜κ²½μœΌλ‘œμ„œ, HOPES ν”„λ ˆμž„μ›Œν¬κ°€ μ œμ‹œλ˜μ—ˆλ‹€. λ‹€λ₯Έ 섀계 ν™˜κ²½λ“€κ³ΌλŠ” λ‹€λ₯΄κ²Œ, 이기쒅 λ©€ν‹°ν”„λ‘œμ„Έμ„œ μ•„ν‚€ν…μ²˜μ—μ„œμ˜ 일반적인 μˆ˜ν–‰ λͺ¨λΈλ‘œμ„œ, 곡톡 쀑간 μ½”λ“œ (CIC) 라고 λΆ€λ₯΄λŠ” ν”„λ‘œκ·Έλž˜λ° ν”Œλž«νΌμ΄λΌλŠ” μƒˆλ‘œμš΄ κ°œλ…μ„ μ†Œκ°œν•˜μ˜€λ‹€. CIC νƒœμŠ€ν¬ λͺ¨λΈμ€ ν”„λ‘œμ„ΈμŠ€ λ„€νŠΈμ›Œν¬ λͺ¨λΈμ— κΈ°λ°˜ν•˜κ³  μžˆμ§€λ§Œ, SDF λͺ¨λΈλ‘œ ꡬ체화될 수 있기 λ•Œλ¬Έμ—, 병렬 처리뿐만 μ•„λ‹ˆλΌ 정적 뢄석이 μš©μ΄ν•˜λ‹€λŠ” μž₯점을 가진닀. ν•˜μ§€λ§Œ, SDF λͺ¨λΈμ€ μ‘μš©μ˜ 동적인 ν–‰μœ„λ₯Ό λͺ…μ„Έν•  수 μ—†λ‹€λŠ” ν‘œν˜„μƒμ˜ μ œμ•½μ„ 가진닀. μ΄λŸ¬ν•œ μ œμ•½μ„ κ·Ήλ³΅ν•˜κ³ , μ‹œμŠ€ν…œμ˜ 동적 ν–‰μœ„λ₯Ό μ‘μš© 외뢀와 λ‚΄λΆ€λ‘œ κ΅¬λΆ„ν•˜μ—¬ λͺ…μ„Έν•˜κΈ° μœ„ν•΄, λ³Έ λ…Όλ¬Έμ—μ„œλŠ” 데이터 ν”Œλ‘œμš°μ™€ μœ ν•œμƒνƒœκΈ° (FSM) λͺ¨λΈμ— κΈ°λ°˜ν•˜μ—¬ ν™•μž₯된 CIC νƒœμŠ€ν¬ λͺ¨λΈμ„ μ œμ•ˆν•œλ‹€. μƒμœ„ μˆ˜μ€€μ—μ„œλŠ”, 각 μ‘μš©μ€ 데이터 ν”Œλ‘œμš° νƒœμŠ€ν¬λ‘œ λͺ…μ„Έ 되며, 동적 ν–‰μœ„λŠ” μ‘μš©λ“€μ˜ μˆ˜ν–‰μ„ κ°λ…ν•˜λŠ” μ œμ–΄ νƒœμŠ€ν¬λ‘œ λͺ¨λΈ λœλ‹€. 데이터 ν”Œλ‘œμš° νƒœμŠ€ν¬ λ‚΄λΆ€λŠ”, μœ ν•œμƒνƒœκΈ° 기반의 SADF λͺ¨λΈκ³Ό μœ μ‚¬ν•œ ν˜•νƒœλ‘œ 동적 ν–‰μœ„κ°€ λͺ…μ„Έ λœλ‹€SDF νƒœμŠ€ν¬λŠ” 볡수개의 ν–‰μœ„λ₯Ό κ°€μ§ˆ 수 있으며, λͺ¨λ“œ μ „ν™˜κΈ° (MTM)이라고 λΆˆλ¦¬λŠ” μœ ν•œ μƒνƒœκΈ°μ˜ ν…Œμ΄λΈ” ν˜•νƒœμ˜ λͺ…μ„Έλ₯Ό 톡해 SDF κ·Έλž˜ν”„μ˜ λͺ¨λ“œ μ „ν™˜ κ·œμΉ™μ„ λͺ…μ„Έ ν•œλ‹€. 이λ₯Ό MTM-SDF κ·Έλž˜ν”„λΌκ³  λΆ€λ₯΄λ©°, 볡수 λͺ¨λ“œ 데이터 ν”Œλ‘œμš° λͺ¨λΈ 쀑 ν•˜λ‚˜λΌ κ΅¬λΆ„λœλ‹€. μ‘μš©μ€ μœ ν•œν•œ ν–‰μœ„ (λ˜λŠ” λͺ¨λ“œ)λ₯Ό 가지며, 각 ν–‰μœ„ (λͺ¨λ“œ)λŠ” SDF κ·Έλž˜ν”„λ‘œ ν‘œν˜„λ˜λŠ” 것을 κ°€μ •ν•œλ‹€. 이λ₯Ό 톡해 λ‹€μ–‘ν•œ ν”„λ‘œμ„Έμ„œ κ°œμˆ˜μ— λŒ€ν•΄ λ‹¨μœ„μ‹œκ°„λ‹Ή μ²˜λ¦¬λŸ‰μ„ μ΅œλŒ€ν™”ν•˜λŠ” 컴파일-μ‹œκ°„ μŠ€μΌ€μ€„λ§μ„ μˆ˜ν–‰ν•˜κ³ , μŠ€μΌ€μ€„ κ²°κ³Όλ₯Ό μ €μž₯ν•  수 μžˆλ„λ‘ ν•œλ‹€. λ˜ν•œ, 볡수 λͺ¨λ“œ 데이터 ν”Œλ‘œμš° κ·Έλž˜ν”„λ₯Ό μœ„ν•œ λ©€ν‹°ν”„λ‘œμ„Έμ„œ μŠ€μΌ€μ€„λ§ 기법을 μ œμ‹œν•œλ‹€. 볡수 λͺ¨λ“œ 데이터 ν”Œλ‘œμš° κ·Έλž˜ν”„λ₯Ό μœ„ν•œ λͺ‡λͺ‡ μŠ€μΌ€μ€„λ§ 기법듀이 μ‘΄μž¬ν•˜μ§€λ§Œ, λͺ¨λ“œ 사이에 νƒœμŠ€ν¬ 이주λ₯Ό ν—ˆμš©ν•œ 기법듀은 μ‘΄μž¬ν•˜μ§€ μ•ŠλŠ”λ‹€. ν•˜μ§€λ§Œ νƒœμŠ€ν¬ 이주λ₯Ό ν—ˆμš©ν•˜κ²Œ 되면 μžμ› μš”κ΅¬λŸ‰μ„ 쀄일 수 μžˆλ‹€λŠ” λ°œκ²¬μ„ 톡해, λ³Έ λ…Όλ¬Έμ—μ„œλŠ” λͺ¨λ“œ μ‚¬μ΄μ˜ νƒœμŠ€ν¬ 이주λ₯Ό ν—ˆμš©ν•˜λŠ” 볡수 λͺ¨λ“œ 데이터 ν”Œλ‘œμš° κ·Έλž˜ν”„λ₯Ό μœ„ν•œ λ©€ν‹°ν”„λ‘œμ„Έμ„œ μŠ€μΌ€μ€„λ§ 기법을 μ œμ•ˆν•œλ‹€. μœ μ „ μ•Œκ³ λ¦¬μ¦˜μ— κΈ°λ°˜ν•˜μ—¬, μ œμ•ˆν•˜λŠ” 기법은 μžμ› μš”κ΅¬λŸ‰μ„ μ΅œμ†Œν™”ν•˜κΈ° μœ„ν•΄ 각 λͺ¨λ“œμ— ν•΄λ‹Ήν•˜λŠ” λͺ¨λ“  SDF κ·Έλž˜ν”„λ₯Ό λ™μ‹œμ— μŠ€μΌ€μ€„ ν•œλ‹€. 주어진 λ‹¨μœ„ μ‹œκ°„λ‹Ή μ²˜λ¦¬λŸ‰ μ œμ•½μ„ λ§Œμ‘±μ‹œν‚€κΈ° μœ„ν•΄, μ œμ•ˆν•˜λŠ” 기법은 각 λͺ¨λ“œ λ³„λ‘œ μ‹€μ œ μ²˜λ¦¬λŸ‰ μš”κ΅¬λŸ‰μ„ κ³„μ‚°ν•˜λ©°, μ²˜λ¦¬λŸ‰μ˜ λΆˆκ·œμΉ™μ„±μ„ μ™„ν™”ν•˜κΈ° μœ„ν•œ 좜λ ₯ λ²„νΌμ˜ 크기λ₯Ό κ³„μ‚°ν•œλ‹€. λͺ…μ„Έλœ νƒœμŠ€ν¬ κ·Έλž˜ν”„μ™€ μŠ€μΌ€μ€„ κ²°κ³Όλ‘œλΆ€ν„°, HOPES ν”„λ ˆμž„μ›Œν¬λŠ” λŒ€μƒ μ•„ν‚€ν…μ²˜λ₯Ό μœ„ν•œ μžλ™ μ½”λ“œ 생성을 μ§€μ›ν•œλ‹€. 이λ₯Ό μœ„ν•΄ μžλ™ μ½”λ“œ μƒμ„±κΈ°λŠ” CIC νƒœμŠ€ν¬ λͺ¨λΈμ˜ ν™•μž₯된 νŠΉμ§•λ“€μ„ μ§€μ›ν•˜λ„λ‘ ν™•μž₯λ˜μ—ˆλ‹€. μ‘μš© μˆ˜μ€€μ—μ„œλŠ” MTM-SDF κ·Έλž˜ν”„λ₯Ό 주어진 정적 μŠ€μΌ€μ€„λ§ κ²°κ³Όλ₯Ό λ”°λ₯΄λŠ” λ©€ν‹°ν”„λ‘œμ„Έμ„œ μ½”λ“œλ₯Ό μƒμ„±ν•˜λ„λ‘ ν™•μž₯λ˜μ—ˆλ‹€. λ˜ν•œ, λ„€ 가지 μ„œλ‘œ λ‹€λ₯Έ μŠ€μΌ€μ€„λ§ μ •μ±… (fully-static, self-timed, static-assignment, fully-dynamic)에 λŒ€ν•œ λ©€ν‹°ν”„λ‘œμ„Έμ„œ μ½”λ“œ 생성을 μ§€μ›ν•œλ‹€. μ‹œμŠ€ν…œ μˆ˜μ€€μ—μ„œλŠ” μ§€μ›ν•˜λŠ” μ‹œμŠ€ν…œ μš”μ²­ API에 λŒ€ν•œ μ‹€μ œ κ΅¬ν˜„ μ½”λ“œλ₯Ό μƒμ„±ν•˜λ©°, 정적 μŠ€μΌ€μ€„ 결과와 νƒœμŠ€ν¬λ“€μ˜ μ œμ–΄ κ°€λŠ₯ν•œ 속성듀에 λŒ€ν•œ 자료 ꡬ쑰 μ½”λ“œλ₯Ό μƒμ„±ν•œλ‹€. 볡수 λͺ¨λ“œ λ©€ν‹°λ―Έλ””μ–΄ 터미널 예제λ₯Ό ν†΅ν•œ 기초적인 μ‹€ν—˜λ“€μ„ 톡해, μ œμ•ˆν•˜λŠ” λ°©λ²•λ‘ μ˜ 타당성을 보인닀.As the number of processors in a chip increases, and more functions are integrated, the system status will change dynamically due to various factors such as the workload variation, QoS requirement, and unexpected component failure. On the other hand, computation-complexity of user applications is also steadily increasingvideo and graphics applications are two major driving forces in smart mobile devices, which define the main application domain of interest in this dissertation. So, a systematic design methodology is highly required to implement such complex systems which contain dynamically changed behavior as well as computation-intensive workload that can be parallelized. A model-based approach is one of representative approaches for parallel embedded software development. Especially, HOPES framework is proposed which is a design environment for parallel embedded software supporting the overall design steps: system specification, performance estimation, design space exploration, and automatic code generation. Distinguished from other design environments, it introduces a novel concept of programming platform, called CIC (Common Intermediate Code) that can be understood as a generic execution model of heterogeneous multiprocessor architecture. The CIC task model is based on a process network model, but it can be refined to the SDF (Synchronous Data Flow) model, since it has a very desirable features for static analyzability as well as parallel processing. However, the SDF model has a typical weakness of expression capability, especially for the system-level specification and dynamically changed behavior of an application. To overcome this weakness, in this dissertation, we propose an extended CIC task model based on dataflow and FSM models to specify the dynamic behavior of the system distinguishing inter- and intra-application dynamism. At the top-level, each application is specified by a dataflow task and the dynamic behavior is modeled as a control task that supervises the execution of applications. Inside a dataflow task, it specifies the dynamic behavior using a similar way as FSM-based SADFan SDF task may have multiple behaviors and a tabular specification of an FSM, called MTM (Mode Transition Machine), describes the mode transition rules for the SDF graph. We call it to MTM-SDF model which is classified as multi-mode dataflow models in the dissertation. It assumes that an application has a finite number of behaviors (or modes) and each behavior (mode) is represented by an SDF graph. It enables us to perform compile-time scheduling of each graph to maximize the throughput varying the number of allocated processors, and store the scheduling information. Also, a multiprocessor scheduling technique is proposed for a multi-mode dataflow graph. While there exist several scheduling techniques for multi-mode dataflow models, no one allows task migration between modes. By observing that the resource requirement can be additionally reduced if task migration is allowed, we propose a multiprocessor scheduling technique of a multi-mode dataflow graph considering task migration between modes. Based on a genetic algorithm, the proposed technique schedules all SDF graphs in all modes simultaneously to minimize the resource requirement. To satisfy the throughput constraint, the proposed technique calculates the actual throughput requirement of each mode and the output buffer size for tolerating throughput jitter. For the specified task graph and scheduling results, the CIC translator generates parallelized code for the target architecture. Therefore the CIC translator is extended to support extended features of the CIC task model. In application-level, it is extended to support multiprocessor code generation for an MTM-SDF graph considering the given static scheduling results. Also, multiprocessor code generation of four different scheduling policies are supported for an MTM-SDF graph: fully-static, self-timed, static-assignment, and fully-dynamic. In system-level, the CIC translator is extended to support code generation for implementation of system request APIs and data structures for the static scheduling results and configurable task parameters. Through preliminary experiments with a multi-mode multimedia terminal example, the viability of the proposed methodology is verified.Chapter 1 Introduction 1 1.1 Motivation 1 1.2 Contribution 7 1.3 Dissertation organization 9 Chapter 2 Background 10 2.1 Related work 10 2.1.1 Compiler-based approach 10 2.1.2 Language-based approach 11 2.1.3 Model-based approach 15 2.2 HOPES framework 19 2.3 Common Intermediate Code (CIC) Model 21 Chapter 3 Dynamic Behavior Specification 26 3.1 Problem definition 26 3.1.1 System-level dynamic behavior 26 3.1.2 Application-level dynamic behavior 27 3.2 Related work 28 3.3 Motivational example 31 3.4 Control task specification for system-level dynamism 33 3.4.1 Internal specification 33 3.4.2 Action scripts 38 3.5 MTM-SDF specification for application-level dynamism 44 3.5.1 MTM specification 44 3.5.2 Task graph specification 45 3.5.3 Execution semantic of an MTM-SDF graph 46 Chapter 4 Multiprocessor Scheduling of an Multi-mode Dataflow Graph 50 4.1 Related work 51 4.2 Motivational example 56 4.2.1 Throughput requirement calculation considering mode transition delay 56 4.2.2 Task migration between mode transition 58 4.3 Problem definition 61 4.4 Throughput requirement analysis 65 4.4.1 Mode transition delay 66 4.4.2 Arrival curves of the output buffer 70 4.4.3 Buffer size determination 71 4.4.4 Throughput requirement analysis 73 4.5 Proposed MMDF scheduling framework 75 4.5.1 Optimization problem 75 4.5.2 GA configuration 76 4.5.3 Fitness function 78 4.5.4 Local optimization technique 79 4.6 Experimental results 81 4.6.1 MMDF scheduling technique 83 4.6.2 Scalability of the Proposed Framework 88 Chapter 5 Multiprocessor Code Generation for the Extended CIC Model 89 5.1 CIC translator 89 5.2 Code generation for application-level dynamism 91 5.2.1 Function call-style code generation (fully-static, self-timed) 94 5.2.2 Thread-style code generation (static-assignment, fully-dynamic) 98 5.3 Code generation for system-level dynamism 101 5.4 Experimental results 105 Chapter 6 Conclusion and Future Work 107 Bibliography 109 초둝 125Docto

    MURAC: A unified machine model for heterogeneous computers

    Get PDF
    Includes bibliographical referencesHeterogeneous computing enables the performance and energy advantages of multiple distinct processing architectures to be efficiently exploited within a single machine. These systems are capable of delivering large performance increases by matching the applications to architectures that are most suited to them. The Multiple Runtime-reconfigurable Architecture Computer (MURAC) model has been proposed to tackle the problems commonly found in the design and usage of these machines. This model presents a system-level approach that creates a clear separation of concerns between the system implementer and the application developer. The three key concepts that make up the MURAC model are a unified machine model, a unified instruction stream and a unified memory space. A simple programming model built upon these abstractions provides a consistent interface for interacting with the underlying machine to the user application. This programming model simplifies application partitioning between hardware and software and allows the easy integration of different execution models within the single control ow of a mixed-architecture application. The theoretical and practical trade-offs of the proposed model have been explored through the design of several systems. An instruction-accurate system simulator has been developed that supports the simulated execution of mixed-architecture applications. An embedded System-on-Chip implementation has been used to measure the overhead in hardware resources required to support the model, which was found to be minimal. An implementation of the model within an operating system on a tightly-coupled reconfigurable processor platform has been created. This implementation is used to extend the software scheduler to allow for the full support of mixed-architecture applications in a multitasking environment. Different scheduling strategies have been tested using this scheduler for mixed-architecture applications. The design and implementation of these systems has shown that a unified abstraction model for heterogeneous computers provides important usability benefits to system and application designers. These benefits are achieved through a consistent view of the multiple different architectures to the operating system and user applications. This allows them to focus on achieving their performance and efficiency goals by gaining the benefits of different execution models during runtime without the complex implementation details of the system-level synchronisation and coordination
    • …
    corecore