Dynamic Behavior Specification and Design Space Exploration Techniques for Real-Time Embedded Systems
Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, 2016. 8. Soonhoi Ha.
As the number of processors in a chip increases and more functions are integrated, the system status changes dynamically due to various factors such as workload variation, QoS requirements, and unexpected component failure. At the same time, the computational complexity of user applications is steadily increasing; video and graphics applications are two major driving forces in smart mobile devices, which define the main application domain of interest in this dissertation. A systematic design methodology is therefore needed to implement such complex systems, which combine dynamically changing behavior with computation-intensive workloads that can be parallelized.
A model-based approach is one of the representative approaches to parallel embedded software development. In particular, the HOPES framework has been proposed as a design environment for parallel embedded software that supports all design steps: system specification, performance estimation, design space exploration, and automatic code generation. Unlike other design environments, it introduces a novel programming-platform concept called CIC (Common Intermediate Code), which can be understood as a generic execution model of a heterogeneous multiprocessor architecture. The CIC task model is based on a process network model, but it can be refined to the SDF (Synchronous Data Flow) model, which has very desirable features for static analyzability as well as parallel processing. However, the SDF model has limited expressive power, especially for system-level specification and for the dynamically changing behavior of an application.
To overcome this weakness, this dissertation proposes an extended CIC task model based on dataflow and FSM models that specifies the dynamic behavior of the system, distinguishing inter-application from intra-application dynamism. At the top level, each application is specified as a dataflow task, and the dynamic behavior is modeled as a control task that supervises the execution of the applications. Inside a dataflow task, dynamic behavior is specified in a way similar to FSM-based SADF: an SDF task may have multiple behaviors, and a tabular specification of an FSM, called an MTM (Mode Transition Machine), describes the mode transition rules for the SDF graph. We call this the MTM-SDF model, which the dissertation classifies as a multi-mode dataflow model. It assumes that an application has a finite number of behaviors (or modes) and that each behavior (mode) is represented by an SDF graph. This makes it possible to schedule each graph at compile time, maximizing throughput for varying numbers of allocated processors, and to store the scheduling information.
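The MTM-SDF idea described above (one SDF graph per mode, plus a tabular FSM that selects the next mode) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the `I_FRAME`/`P_FRAME` modes, the actor names, and the choice to evaluate transitions at iteration boundaries are hypothetical and are not taken from the HOPES implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """One SDF edge with fixed production and consumption rates."""
    src: str
    dst: str
    produce: int  # tokens written per firing of src
    consume: int  # tokens read per firing of dst


@dataclass
class MtmSdfTask:
    """Hypothetical MTM-SDF task: one SDF graph per mode plus a transition table."""
    graphs: dict        # mode name -> list of Channel
    transitions: dict   # (current mode, event) -> next mode
    mode: str           # currently active mode

    def end_of_iteration(self, event=None):
        # Assumption: the mode may only change between graph iterations.
        # Unknown (mode, event) pairs leave the mode unchanged.
        self.mode = self.transitions.get((self.mode, event), self.mode)
        return self.graphs[self.mode]


# Illustrative two-mode video decoder (all names are made up).
decoder = MtmSdfTask(
    graphs={
        "I_FRAME": [
            Channel("parse", "idct", 6, 1),
            Channel("idct", "display", 1, 6),
        ],
        "P_FRAME": [
            Channel("parse", "mc", 1, 1),
            Channel("parse", "idct", 6, 1),
            Channel("mc", "display", 1, 1),
            Channel("idct", "display", 1, 6),
        ],
    },
    transitions={
        ("I_FRAME", "p_header"): "P_FRAME",
        ("P_FRAME", "i_header"): "I_FRAME",
    },
    mode="I_FRAME",
)
```

Because every mode is an ordinary SDF graph, each entry of `graphs` could be scheduled separately at compile time, which is the property the abstract relies on.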
A multiprocessor scheduling technique is also proposed for multi-mode dataflow graphs. While several scheduling techniques exist for multi-mode dataflow models, none allows task migration between modes. Observing that the resource requirement can be further reduced if task migration is allowed, we propose a multiprocessor scheduling technique for a multi-mode dataflow graph that considers task migration between modes. Based on a genetic algorithm, the proposed technique schedules all SDF graphs in all modes simultaneously to minimize the resource requirement. To satisfy the throughput constraint, it calculates the actual throughput requirement of each mode and the output buffer size needed to tolerate throughput jitter.
From the specified task graph and the scheduling results, the CIC translator generates parallelized code for the target architecture. To this end, the CIC translator is extended to support the extended features of the CIC task model. At the application level, it is extended to support multiprocessor code generation for an MTM-SDF graph that follows the given static scheduling results. Multiprocessor code generation is supported for four scheduling policies for an MTM-SDF graph: fully-static, self-timed, static-assignment, and fully-dynamic. At the system level, the CIC translator is extended to generate code that implements the system request APIs, as well as data structures for the static scheduling results and configurable task parameters.
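The four scheduling policies named above differ in which decisions are fixed at compile time. The sketch below encodes the commonly used taxonomy (processor assignment, firing order, firing time). The pairing of policies with function call-style and thread-style code follows the chapter titles listed in the table of contents; the data structure and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Which scheduling decisions are fixed at compile time."""
    assignment_static: bool  # task-to-processor mapping
    ordering_static: bool    # firing order on each processor
    timing_static: bool      # exact firing times


POLICIES = {
    "fully-static":      Policy(True, True, True),
    "self-timed":        Policy(True, True, False),
    "static-assignment": Policy(True, False, False),
    "fully-dynamic":     Policy(False, False, False),
}


def runtime_decisions(name):
    """List the decisions that are left to the run-time system."""
    p = POLICIES[name]
    fixed = {
        "assignment": p.assignment_static,
        "ordering": p.ordering_static,
        "timing": p.timing_static,
    }
    return [decision for decision, is_static in fixed.items() if not is_static]


def code_style(name):
    """Static firing order permits a plain call sequence; otherwise use threads."""
    return "function-call" if POLICIES[name].ordering_static else "thread"
```

For example, `runtime_decisions("self-timed")` leaves only the firing times open, so the generated code can still be a fixed sequence of function calls per processor.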
Through preliminary experiments with a multi-mode multimedia terminal example, the viability of the proposed methodology is verified.
Chapter 1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Dissertation organization
Chapter 2 Background
2.1 Related work
2.1.1 Compiler-based approach
2.1.2 Language-based approach
2.1.3 Model-based approach
2.2 HOPES framework
2.3 Common Intermediate Code (CIC) Model
Chapter 3 Dynamic Behavior Specification
3.1 Problem definition
3.1.1 System-level dynamic behavior
3.1.2 Application-level dynamic behavior
3.2 Related work
3.3 Motivational example
3.4 Control task specification for system-level dynamism
3.4.1 Internal specification
3.4.2 Action scripts
3.5 MTM-SDF specification for application-level dynamism
3.5.1 MTM specification
3.5.2 Task graph specification
3.5.3 Execution semantics of an MTM-SDF graph
Chapter 4 Multiprocessor Scheduling of a Multi-mode Dataflow Graph
4.1 Related work
4.2 Motivational example
4.2.1 Throughput requirement calculation considering mode transition delay
4.2.2 Task migration between mode transition
4.3 Problem definition
4.4 Throughput requirement analysis
4.4.1 Mode transition delay
4.4.2 Arrival curves of the output buffer
4.4.3 Buffer size determination
4.4.4 Throughput requirement analysis
4.5 Proposed MMDF scheduling framework
4.5.1 Optimization problem
4.5.2 GA configuration
4.5.3 Fitness function
4.5.4 Local optimization technique
4.6 Experimental results
4.6.1 MMDF scheduling technique
4.6.2 Scalability of the Proposed Framework
Chapter 5 Multiprocessor Code Generation for the Extended CIC Model
5.1 CIC translator
5.2 Code generation for application-level dynamism
5.2.1 Function call-style code generation (fully-static, self-timed)
5.2.2 Thread-style code generation (static-assignment, fully-dynamic)
5.3 Code generation for system-level dynamism
5.4 Experimental results
Chapter 6 Conclusion and Future Work
Bibliography
Abstract (Korean)
Modeling, Analysis, and Hard Real-time Scheduling of Adaptive Streaming Applications
In real-time systems, the application's behavior has to be predictable at
compile-time to guarantee timing constraints. However, modern streaming
applications that exhibit adaptive behavior due to mode switching at run-time
may degrade system predictability due to unknown behavior of the application
during mode transitions. Therefore, proper temporal analysis during mode
transitions is imperative to preserve system predictability. To this end, in
this paper, we first introduce Mode Aware Data Flow (MADF), which is our new
predictable Model of Computation (MoC) to efficiently capture the behavior of
adaptive streaming applications. Then, as an important part of the operational
semantics of MADF, we propose the Maximum-Overlap Offset (MOO) which is our
novel protocol for mode transitions. The main advantage of this transition
protocol is that, in contrast to self-timed transition protocols, it avoids
timing interference between modes upon mode transitions. As a result, any mode
transition can be analyzed independently from the mode transitions that
occurred in the past. Based on this transition protocol, we propose a hard
real-time analysis as well to guarantee timing constraints by avoiding
processor overloading during mode transitions. Therefore, using this protocol,
we can derive a lower bound and an upper bound on the earliest starting time of
the tasks in the new mode during mode transitions in such a way that hard
real-time constraints are respected.
Comment: Accepted for presentation at EMSOFT 2018 and for publication in IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems
(TCAD) as part of the ESWEEK-TCAD special issue
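The abstract above does not define how the Maximum-Overlap Offset is computed, so the following is only a sketch of one plausible reading. It assumes each mode has a periodic schedule given as actor start times relative to the start of the mode, and it delays the new mode by the smallest offset such that no actor shared by both modes starts earlier in the new mode than it did in the old one. The function name, the formula, and the actor names are assumptions for illustration, not the paper's definition.

```python
def maximum_overlap_offset(start_old, start_new):
    """Smallest non-negative shift of the new mode's schedule such that
    every actor present in both modes starts no earlier than in the old mode.

    start_old, start_new: dict mapping actor name -> start time relative
    to the beginning of the respective mode's schedule (assumed inputs).
    """
    common = start_old.keys() & start_new.keys()
    return max([0] + [start_old[a] - start_new[a] for a in common])


# Illustrative schedules: actor C exists only in the old mode, D only in the new one.
old_mode = {"A": 0, "B": 4, "C": 9}
new_mode = {"A": 0, "B": 2, "D": 5}
offset = maximum_overlap_offset(old_mode, new_mode)
```

Under this reading the offset depends only on the two schedules involved, which matches the stated property that a transition can be analyzed independently of earlier transitions.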
An accurate analysis for guaranteed performance of multiprocessor streaming applications
Already for more than a decade, consumer electronic devices have been available for entertainment, educational, or telecommunication tasks based on multimedia streaming applications, i.e., applications that process streams of audio and video samples in digital form. Multimedia capabilities are expected to become more and more commonplace in portable devices. This leads to challenges with respect to cost efficiency and quality. This thesis contributes models and analysis techniques for improving the cost efficiency, and therefore also the quality, of multimedia devices. Portable consumer electronic devices should feature flexible functionality on the one hand and low power consumption on the other hand. Those two requirements are conflicting. Therefore, we focus on a class of hardware that represents a good trade-off between those two requirements, namely on domain-specific multiprocessor systems-on-chip (MP-SoC). Our research work contributes to dynamic (i.e., run-time) optimization of MP-SoC system metrics. The central question in this area is how to ensure that real-time constraints are satisfied and the metric of interest such as perceived multimedia quality or power consumption is optimized. In these cases, we speak of quality-of-service (QoS) and power management, respectively. In this thesis, we pursue real-time constraint satisfaction that is guaranteed by the system by construction and proven mainly based on analytical reasoning. That approach is often taken in real-time systems to ensure reliable performance. Therefore the performance analysis has to be conservative, i.e. it has to use pessimistic assumptions on the unknown conditions that can negatively influence the system performance. We adopt this hypothesis as the foundation of this work. Therefore, the subject of this thesis is the analysis of guaranteed performance for multimedia applications running on multiprocessors. 
It is very important to note that our conservative approach is essentially different from considering only the worst-case state of the system. Unlike the worst-case approach, our approach is dynamic, i.e., it makes use of run-time characteristics of the input data and the environment of the application. The main purpose of our performance analysis method is to guide the run-time optimization. Typically, a resource or quality manager predicts the execution time, i.e., the time it takes the system to process a certain number of input data samples. When the execution times get smaller, due to dependency of the execution time on the input data, the manager can switch the control parameter for the metric of interest such that the metric improves but the system gets slower. For power optimization, that means switching to a low-power mode. If execution times grow, the manager can set parameters so that the system gets faster. For QoS management, for example, the application can be switched to a different quality mode with some degradation in perceived quality. The real-time constraints are then never violated and the metrics of interest are kept as good as possible. Unfortunately, maintaining system metrics such as power and quality at the optimal level conflicts with our main requirement, i.e., providing performance guarantees, because guarantees require giving up some quality or power savings. Therefore, the performance analysis approach developed in this thesis is not only conservative but also accurate, so that the optimization of the metric of interest does not suffer too much from conservatism. This is not trivial to realize when two factors are combined: parallel execution on multiple processors and dynamic variation of the data-dependent execution delays. We achieve the goal of conservative and accurate performance estimation for an important class of multiprocessor platforms and multimedia applications.
Our performance analysis technique is realizable in practice in QoS or power management setups. We consider a generic MP-SoC platform that runs a dynamic set of applications, each application possibly using multiple processors. We assume that the applications are independent, although it is possible to relax this requirement in the future. To support real-time constraints, we require that the platform can provide guaranteed computation, communication and memory budgets for applications. Following important trends in system-on-chip communication, we support both global buses and networks-on-chip. We represent every application as a homogeneous synchronous dataflow (HSDF) graph, where the application tasks are modeled as graph nodes, called actors. We allow dynamic data-dependent actor execution delays, which makes HSDF graphs very useful to express modern streaming applications. Our reason to consider HSDF graphs is that they provide a good basic foundation for analytical performance estimation. In this setup, this thesis provides three major contributions:
1. Given an application mapped to an MP-SoC platform, given the performance guarantees for the individual computation units (the processors) and the communication unit (the network-on-chip), and given constant actor execution delays, we derive the throughput and the execution time of the system as a whole.
2. Given a mapped application and platform performance guarantees as in the previous item, we extend our approach for constant actor execution delays to dynamic data-dependent actor delays.
3. We propose a global implementation trajectory that starts from the application specification and goes through design-time and run-time phases. It uses an extension of the HSDF model of computation to reflect the design decisions made along the trajectory.
We present our model and trajectory not only to put the first two contributions into the right context, but also to present our vision on different parts of the trajectory, to make a complete and consistent story. Our first contribution uses the idea of so-called IPC (inter-processor communication) graphs known from the literature, whereby a single model of computation (i.e., HSDF graphs) is used to model not only the computation units, but also the communication unit (the global bus or the network-on-chip) and the FIFO (first-in-first-out) buffers that form a "glue" between the computation and communication units. We were the first to propose HSDF graph structures for modeling bounded FIFO buffers and guaranteed throughput network connections for the network-on-chip communication in MP-SoCs. As a result, our HSDF models enable the formalization of the on-chip FIFO buffer capacity minimization problem under a throughput constraint as a graph-theoretic problem. Using HSDF graphs to formalize that problem helps to find the performance bottlenecks in a given solution to this problem and to improve this solution. To demonstrate this, we use the JPEG decoder application case study. Also, we show that, assuming constant actor delays (the worst case for the given JPEG image), we can predict execution times of JPEG decoding on two processors with an accuracy of 21%. Our second contribution is based on an extension of the scenario approach. This approach is based on the observation that the dynamic behavior of an application is typically composed of a limited number of sub-behaviors, i.e., scenarios, that have similar resource requirements, i.e., similar actor execution delays in the context of this thesis. The previous work on scenarios treats only single-processor applications or multiprocessor applications that do not exploit all the flexibility of the HSDF model of computation.
We develop new scenario-based techniques in the context of HSDF graphs, to derive the timing overlap between different scenarios, which is very important to achieve good accuracy for general HSDF graphs executing on multiprocessors. We exploit this idea in an application case study, the MPEG-4 arbitrarily-shaped video decoder, and demonstrate execution time prediction with an average accuracy of 11%. To the best of our knowledge, for the given setup, no other existing performance technique can provide a comparable accuracy and at the same time performance guarantees.
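The first contribution, throughput analysis of an HSDF graph with constant actor delays and bounded FIFO buffers, rests on a standard result: the steady-state iteration period equals the maximum cycle ratio (total actor delay on a cycle divided by the initial tokens on that cycle). The sketch below is a brute-force illustration for tiny graphs, not the thesis's algorithm. It assumes the common modeling convention that a FIFO of capacity `c` is a forward edge plus a back edge holding `c` initial tokens, and that self-loops with one token forbid overlapping firings of the same actor. All names and numbers are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def max_cycle_ratio(delay, edges):
    """Return max over simple cycles of (sum of actor delays) / (initial tokens).

    delay: dict actor -> constant execution time
    edges: dict (src, dst) -> number of initial tokens on that edge
    Brute force over actor orderings; only suitable for very small graphs.
    """
    actors = list(delay)
    best = Fraction(0)
    for k in range(1, len(actors) + 1):
        for cyc in permutations(actors, k):
            if cyc[0] != min(cyc):
                continue  # visit each cycle once, starting at its smallest actor
            hops = list(zip(cyc, cyc[1:] + cyc[:1]))
            if not all(h in edges for h in hops):
                continue
            tokens = sum(edges[h] for h in hops)
            if tokens == 0:
                raise ValueError("token-free cycle: the graph deadlocks")
            best = max(best, Fraction(sum(delay[a] for a in cyc), tokens))
    return best


def pipeline(capacity):
    """Producer P (delay 2) feeding consumer C (delay 3) through a bounded FIFO."""
    delay = {"P": 2, "C": 3}
    edges = {
        ("P", "P"): 1,         # no overlapping firings of P
        ("C", "C"): 1,         # no overlapping firings of C
        ("P", "C"): 0,         # data tokens, FIFO initially empty
        ("C", "P"): capacity,  # free-space tokens model the FIFO capacity
    }
    return delay, edges


period_cap1 = max_cycle_ratio(*pipeline(1))  # P and C cannot overlap
period_cap2 = max_cycle_ratio(*pipeline(2))  # limited by the slower actor
```

With capacity 1 the period is 5 time units; with capacity 2 it drops to 3, the delay of the slower actor. Growing the buffer further brings no gain, which is the kind of bottleneck reasoning behind the buffer capacity minimization problem mentioned above.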
A Model-Based Code Generation Framework for Parallel and Distributed Embedded Systems
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2020. 2. Soonhoi Ha.
While various software development methodologies have been proposed to increase the design productivity and maintainability of software, they usually focus on application software running on a single processing element and do not address the non-functional requirements of an embedded system, such as latency and resource requirements.
In this thesis, we present a model-based software development method for parallel and distributed embedded systems. An application is specified, independently of the hardware platform, as a hierarchical set of tasks that follow given rules for communication and synchronization. These rules enable static analysis that detects some software errors at compile time, which reduces verification difficulty. A platform-specific program is synthesized automatically once the mapping of tasks onto processing elements is determined.
A program synthesizer is also proposed to generate code that satisfies the platform requirements of parallel and distributed embedded systems. Because multiple models that express dynamic behavior can be composed hierarchically, the synthesizer manages multiple task graphs at different levels of the hierarchy so that tasks run in parallel. The synthesizer also provides methods for managing code for heterogeneous platforms and for generating various communication methods. The viability of the proposed software development method is verified with a real-life surveillance application that runs on six processing elements with three remote communication methods, and with a remote deep learning example that uses heterogeneous multiprocessing components on distributed systems. In addition, by measuring and estimating development costs, we confirm that supporting a new platform or network requires relatively little effort.
Since tolerance to unexpected errors is a required feature of many embedded systems, we also support automatic fault-tolerant code generation. Fault tolerance is applied by modifying the task graph according to the selected fault tolerance configuration, so an application developer can easily adopt the non-functional requirement of fault tolerance. To quantify the effort saved, we compare against a manual implementation of fault tolerance. The fault tolerance method is also tested with the fault injection tool, both by emulating fault scenarios and by injecting faults randomly.
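Applying fault tolerance by modifying the task graph can be illustrated with a small transformation. The abstract does not say which fault tolerance techniques the framework offers, so the sketch below swaps in triple modular redundancy (TMR) purely as an example: the selected task is replaced by three replicas and a majority voter. The function names, the graph encoding, and the task names are all hypothetical.

```python
from collections import Counter


def apply_tmr(tasks, edges, target):
    """Return a new (tasks, edges) pair in which `target` is triplicated.

    tasks: list of task names; edges: list of (src, dst) pairs.
    Assumes `target` has no self-loop. Inputs of `target` fan out to all
    replicas; its outputs are driven by a voter that collects the replicas.
    """
    replicas = [f"{target}_r{i}" for i in range(3)]
    voter = f"{target}_voter"
    new_tasks = [t for t in tasks if t != target] + replicas + [voter]
    new_edges = []
    for src, dst in edges:
        if dst == target:
            new_edges += [(src, r) for r in replicas]
        elif src == target:
            new_edges.append((voter, dst))
        else:
            new_edges.append((src, dst))
    new_edges += [(r, voter) for r in replicas]
    return new_tasks, new_edges


def vote(values):
    """Majority vote over replica outputs (what the voter task would compute)."""
    return Counter(values).most_common(1)[0][0]


# Illustrative three-stage pipeline with the middle stage protected.
tasks = ["camera", "detect", "display"]
edges = [("camera", "detect"), ("detect", "display")]
ft_tasks, ft_edges = apply_tmr(tasks, edges, "detect")
```

The application developer's task code stays unchanged in this scheme; only the graph handed to the code generator differs, which is the convenience the abstract describes.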
Our fault injection tool, which was used to test our fault tolerance method, is another contribution of this thesis. Emulating fault scenarios by intentionally injecting faults is commonly used to test and verify the robustness of a system. To emulate faults on an embedded system, we present a run-time fault injection framework that can inject faults into both the kernel and application layers of Linux-based systems. For the kernel layer, two complementary fault injection techniques are used: one is based on the Kernel GNU Debugger, and the other uses a hardware breakpoint supported by the ARM architecture. For application-level fault injection, the GDB-based fault injection method is used to inject a fault into a remote application. The viability of the proposed fault injection tool is demonstrated by real-life experiments on an ODROID-XU4 system.
Chapter 1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Dissertation Organization
Chapter 2 Background
2.1 HOPES: Hope of Parallel Embedded Software
2.1.1 Software Development Procedure
2.1.2 Components of HOPES
2.2 Universal Execution Model
2.2.1 Task Graph Specification
2.2.2 Dataflow specification of an Application
2.2.3 Task Code Specification and Generic APIs
2.2.4 Meta-data Specification
Chapter 3 Program Synthesis for Parallel and Distributed Embedded Systems
3.1 Motivational Example
3.2 Program Synthesis Overview
3.3 Program Synthesis from Hierarchically-mixed Models
3.4 Platform Code Synthesis
3.5 Communication Code Synthesis
3.6 Experiments
3.6.1 Development Cost of Supporting New Platforms and Networks
3.6.2 Program Synthesis for the Surveillance System Example
3.6.3 Remote GPU-accelerated Deep Learning Example
3.7 Document Generation
3.8 Related Works
Chapter 4 Model Transformation for Fault-tolerant Code Synthesis
4.1 Fault-tolerant Code Synthesis Techniques
4.2 Applying Fault Tolerance Techniques in HOPES
4.3 Experiments
4.3.1 Development Cost of Applying Fault Tolerance
4.3.2 Fault Tolerance Experiments
4.4 Random Fault Injection Experiments
4.5 Related Works
Chapter 5 Fault Injection Framework for Linux-based Embedded Systems
5.1 Background
5.1.1 Fault Injection Techniques
5.1.2 Kernel GNU Debugger
5.1.3 ARM Hardware Breakpoint
5.2 Fault Injection Framework
5.2.1 Overview
5.2.2 Architecture
5.2.3 Fault Injection Techniques
5.2.4 Implementation
5.3 Experiments
5.3.1 Experiment Setup
5.3.2 Performance Comparison of Two Fault Injection Methods
5.3.3 Bit-flip Fault Experiments
5.3.4 eMMC Controller Fault Experiments
Chapter 6 Conclusion
Bibliography
Summary (Korean)
Temporal analysis and scheduling of hard real-time radios running on a multi-processor
On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor, while each meets its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for independent timing analysis or relies on runtime timing analysis. This thesis proposes a design flow and software architecture that meets these challenges, while enabling features such as independent transceiver compilation and dynamic loading, and taking into account other challenges such as ease of programming, efficiency, and ease of validation. We take data flow as the basic model of computation, as it fits the application domain, and several static variants (such as Single-Rate, Multi-Rate and Cyclo-Static) have been shown to possess strong analytical properties. Traditional temporal analysis of data flow can provide minimum throughput guarantees for a self-timed implementation of data flow. Since transceivers may need to guarantee strictly periodic execution and meet latency requirements, we extend the analysis techniques to show that we can enforce strict periodicity for an actor in the graph; we also provide maximum latency analysis techniques for periodic, sporadic and bursty sources. We propose a scheduling strategy and an automatic scheduling flow that enable the simultaneous execution of multiple transceivers with hard real-time requirements, described as Single-Rate Data Flow (SRDF) graphs. Each transceiver has its own execution rate and starts and stops independently from other transceivers, at times unknown at compile time, on a multiprocessor. We show how to combine scheduling and mapping decisions with the input application data flow graph to generate a worst-case temporal analysis graph.
We propose algorithms to find a mapping per transceiver in the form of clusters of statically-ordered actors, and a budget for either a Time Division Multiplex (TDM) or Non-Preemptive Non-Blocking Round Robin (NPNBRR) scheduler per cluster per transceiver. The budget is computed such that if the platform can provide it, then the desired minimum throughput and maximum latency of the transceiver are guaranteed, while minimizing the required processing resources. We illustrate the use of these techniques to map a combination of WLAN and TDS-CDMA receivers onto a prototype Software-Defined Radio platform. The functionality of transceivers for standards with very dynamic behavior, such as WLAN, cannot be conveniently modeled as an SRDF graph, since SRDF is not capable of expressing variations of actor firing rules depending on the values of input data. Because of this, we propose a restricted, customized data flow model of computation, Mode-Controlled Data Flow (MCDF), that can capture the data-value dependent behavior of a transceiver, while allowing rigorous temporal analysis, and tight resource budgeting. We develop a number of analysis techniques to characterize the temporal behavior of MCDF graphs, in terms of maximum latencies and throughput. We also provide an extension to MCDF of our scheduling strategy for SRDF. The capabilities of MCDF are then illustrated with a WLAN 802.11a receiver model. Having computed budgets for each transceiver, we propose a way to use these budgets for run-time resource mapping and admissibility analysis. During run-time, at transceiver start time, the budget for each cluster of statically-ordered actors is allocated by a resource manager to platform resources. The resource manager enforces strict admission control, to restrict transceivers from interfering with each other's worst-case temporal behaviors.
We propose algorithms adapted from Vector Bin-Packing to enable the mapping at start time of transceivers to the multi-processor architecture, considering also the case where the processors are connected by a network on chip with resource reservation guarantees, in which case we also find routing and resource allocation on the network-on-chip. In our experiments, our resource allocation algorithms can keep 95% of the system resources occupied, while suffering from an allocation failure rate of less than 5%. An implementation of the framework was carried out on a prototype board. We present performance and memory utilization figures for this implementation, as they provide insights into the costs of adopting our approach. It turns out that the scheduling and synchronization overhead for an unoptimized implementation of the framework, with no hardware support for synchronization, is 16.3% of the cycle budget for a WLAN receiver on an EVP processor at 320 MHz. However, this overhead is less than 1% for mobile standards such as TDS-CDMA or LTE, which have lower rates, and thus larger cycle budgets. Considering that clock speeds will increase and that the synchronization primitives can be optimized to exploit the addressing modes available in the EVP, these results are very promising.
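The start-time mapping with strict admission control described above can be sketched as a vector bin-packing problem: each cluster of a transceiver carries a budget vector, each processor has a vector of remaining capacity, and a transceiver is admitted only if all of its clusters fit. The abstract says only that the algorithms are adapted from Vector Bin-Packing, so the first-fit heuristic below is an assumption, as are the two resource dimensions (percent of cycles, KB of memory) and all numbers.

```python
def first_fit(budget, free):
    """Place one cluster budget on the first processor that can hold it.

    budget: tuple of demands, one entry per resource dimension
    free:   list of remaining-capacity tuples, one per processor (updated in place)
    Returns the processor index, or None if no processor fits.
    """
    for idx, capacity in enumerate(free):
        if all(b <= c for b, c in zip(budget, capacity)):
            free[idx] = tuple(c - b for b, c in zip(budget, capacity))
            return idx
    return None


def admit(clusters, free):
    """All-or-nothing admission of one transceiver.

    Returns (placement, new_free). On rejection the placement is None and
    the original capacities are returned untouched, so transceivers that
    are already running keep their budgets.
    """
    trial = list(free)
    placement = []
    for budget in clusters:
        idx = first_fit(budget, trial)
        if idx is None:
            return None, free
        placement.append(idx)
    return placement, trial


# Two processors, each with 100% cycles and 64 KB of memory (made-up numbers).
free0 = [(100, 64), (100, 64)]
place_a, free1 = admit([(60, 32), (50, 16)], free0)  # transceiver A, two clusters
place_b, free2 = admit([(45, 40)], free1)            # transceiver B, one cluster
place_c, free3 = admit([(30, 40)], free2)            # transceiver C is rejected
```

Transceiver C is rejected even though enough cycles remain on the first processor, because its memory demand fits nowhere. Checking every dimension at once is what distinguishes vector bin-packing from packing on a single resource.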