Abstract: In this work, we propose a design flow for the efficient implementation of embedded feedback control systems targeted at multi-core platforms. We consider a composable tile-based architecture as an implementation platform and realise the proposed design flow on one instance of this architecture. The proposed design flow implements the feedback loops in a data-driven fashion, leading to time-varying sampling periods with a short average sampling period. Our design flow is composed of two phases: (i) representing the timing behaviour imposed by the platform by a finite and known set of sampling periods, which is achieved by exploiting the composability of the platform, and (ii) a linear matrix inequality (LMI) based platform-aware control algorithm that explicitly takes the derived platform timing characteristics and the shorter average sampling period into account. Our results show that the platform-aware implementation outperforms traditional control design flows in terms of quality of control (QoC) by a factor of almost 2.
I. INTRODUCTION
An efficient implementation of embedded control systems requires considering the trade-off between resource utilisation and quality of control (QoC) [1]. The resources can be computation (e.g., processor), communication (e.g., communication bus) and memory of the embedded implementation platforms. The QoC can be measured by different system parameters (e.g., settling time, peak overshoot) depending on high-level requirements. In general, a shorter sampling period translates into a higher QoC [2], [3], [4]. However, a shorter sampling period implies a higher requirement on resources (i.e., more computation on processors, more data on the bus). In this context, a cost-effective design requires sharing a resource among multiple applications [5]. In turn, resource sharing generally introduces interference between applications. For feedback control systems, such inter-application interference might cause undesirable delays in the loop, which might degrade the QoC and potentially destabilise the system.
To mitigate inter-application interference, the platforms need to offer composability such that the applications are functionally and temporally independent of each other [6] . Using a time-division multiplex (TDM) based policy in the platform is one possible option to achieve composability. As an implementation platform, we consider an existing platform -Composable and Predictable System on Chip (CompSOC)-that uses a TDM based policy to achieve composability [7] . In this work, we study an efficient implementation of feedback control systems onto such composable platforms.
In traditional implementations of control systems, resources are allocated to the control tasks in such a way that they are executed periodically, assuring uniform sampling periods. Such periodic execution eases the control algorithm design and is addressed by the literature on discrete-time/sampled-data systems [8]. In this work, we show that such an implementation (ensuring periodic sampling) might not be efficient in terms of resource utilisation and might be sub-optimal in terms of QoC for a given resource allocation. We refer to such an implementation as the baseline design (see Fig. 1 (a)). As opposed to the baseline design, we propose a data-driven implementation of feedback control loops (see Fig. 1 (b)) that allows the control tasks to be executed as fast as possible. While such an implementation is generally not suitable for sharing resources with other applications, the composability of the CompSOC platform provides temporal isolation among co-existing applications and lets an application run without interference in its allocated time slots. On the one hand, the proposed data-driven execution results in a shorter average sampling period. On the other hand, the control loops encounter variable sampling periods due to an occasional unavailability of resources (i.e., when other applications use them). We utilise the inherent predictable behaviour of the platform to represent the variation in sampling periods by a finite and precisely known set of possible sampling periods. Further, we propose a novel linear matrix inequality (LMI) based control law that utilises the platform-specific timing information. We refer to the proposed design and implementation as the platform-aware design. We show by experimentation that the QoC obtained using the proposed platform-aware approach is significantly higher than the one achieved by the baseline design.
Contributions:
The key novelty of our work is the design flow. First, we show that the platform timing behaviour results in a finite and known set of possible sampling periods. Next, we formulate the controller design as an LMI explicitly for these platform-derived sampling periods. In this process, we identify the sampling period that is closest to the average sampling period as the nominal sampling period. While occasional large sampling periods (due to resource unavailability) can potentially destabilise the system, the proposed LMI-based design guarantees closed-loop stability for the platform-derived finite set of sampling periods. Thus, the design provides a flavour of an "average-case based control" design and implementation as opposed to the traditional worst-case based design. Since the worst-case behaviour of a platform generally occurs only rarely and the system mostly runs close to the average-case behaviour, the overall efficiency of the proposed design is higher. We validate our claims using a large experiment set and show that our approach provides a QoC that is almost 2 times higher than the traditional (i.e., baseline) design.
This article is organised as follows: Section II explains the feedback control loops that we consider. The composable platform and its timing model are described in Section III. Section IV describes mapping and execution of composable control applications onto the platform. Section V explains the proposed control law and the design flow. The experimental results and conclusions are presented in Sections VI and VII, respectively.
II. FEEDBACK CONTROL LOOPS
A control application is responsible for regulating a continuous-time plant (i.e., a system or subsystem in the physical world) defined by:
$$\dot{x}(t) = A_c x(t) + B_c u(t), \qquad y(t) = C_c x(t) \qquad (1)$$
where x(t) is the state of the plant, y(t) is the output of the plant, and u(t) is the control input that is computed by a control algorithm or controller. A_c, B_c, and C_c are the so-called state, input, and output matrices, respectively. These matrices model the system dynamics.
We are concerned with the regulation problem. That is, the control objective is to design u(t) such that y(t) → 0 with time. The time it takes for the system output y(t) to reach and stay in a closed region (e.g., < 2%) around the reference value is the settling time of a feedback control loop. We define QoC (denoted by φ) as inversely proportional to the settling time,
Thus, a higher QoC φ implies a shorter settling time.
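To make the QoC definition concrete, the following minimal Python sketch computes a settling time from a sampled response and converts it into a QoC value. It rests on two assumptions that are not stated in the text: the proportionality constant is taken to be 1 with the settling time expressed in seconds (this is consistent with the 28.8 ms settling time and φ ≈ 34 reported in the experiments), and the 2% band is measured relative to the initial output |y(0)|. The function names are hypothetical.

```python
def settling_time(times, outputs, band=0.02):
    """Return the first sample time after which |y| stays inside the band.

    Assumption: the band is band * |y(0)|, since the reference is 0 in the
    regulation problem. Returns None if the last sample is still outside.
    """
    threshold = band * abs(outputs[0])
    settled_at = times[0]
    for i, y in enumerate(outputs):
        if abs(y) > threshold:
            # The response left the band here, so it can settle no earlier
            # than the next sample.
            settled_at = times[i + 1] if i + 1 < len(times) else None
    return settled_at


def qoc(settling_time_s):
    """QoC as the reciprocal of the settling time (assumed constant of 1)."""
    return 1.0 / settling_time_s


if __name__ == "__main__":
    print(round(qoc(28.8e-3), 1))  # settling time of 28.8 ms
```

With this convention, halving the settling time doubles φ, which matches the statement that a higher QoC implies a shorter settling time.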
Input saturation: In a real-life implementation, the control input is constrained by a maximum value (e.g., voltage, current limits of the power source). We consider that the input |u(t)| ≤ U MAX . That is, the maximum allowed input signal is U MAX . 
A. Embedded implementations
The interval between two consecutive executions of sensing tasks T s k and T s k+1 is known as the sampling period h k :
Within each sampling period h k , the control operations are executed sequentially (i.e., T 
With the execution time description and definitions of the control tasks within a sampling period, it is then possible to derive implementation schemes for the control tasks.
Data-driven implementation: in the data-driven approach, each control task is executed right after the termination of its preceding task, separated only by the time granularity supported by the embedded platform and assuming that all the resources are allocated to the application. This means that there is no interruption between the executions. In this work, we take this time granularity to be 1 clock cycle of the processor. Here, the sampling period h_k can vary from one sampling period to the next, depending on the resource availability and the execution times of T^s_k, T^c_k and T^a_k (see Fig. 1 (b)). The advantage of such an implementation is a shorter average sampling period, which can potentially be translated into a higher QoC. However, the sampling period varies over time, resulting in a switched system. Such switching behaviour can destabilise the overall closed-loop system [9] and therefore needs to be taken into account in the control design phase.
B. Control task model
Considering the data-driven implementation approach, it is necessary to derive a control task model that describes how the system dynamics and the control input evolve over time. The task partitioning can be done in different ways. We consider the task partitioning illustrated in Fig. 1 (b) as an example to illustrate our approach. It should be noted that our approach is also applicable to other task partitionings. As can be seen, the sampling period
varies from one cycle to the next one. As indicated in Eq. (5) and using Eq.(6), the sensor-to-actuator delay is defined as
As shown in Fig. 1 (b) , the control input u(t) is held until the next update, i.e., during τ k ,
Using the model presented in [10] , we have
where
In Eq. (9), we assume that
T obtaining the augmented higher order system
and I is the identity matrix.
C. Control law
The control input u[k] is a state feedback controller of the following form,
where K k is the state feedback gain at the k-th sample. The closed-loop system using Eq. (10) is given by,
Switching behaviour: with the control law (12), the closed-loop system keeps on switching as follows:
The above switching behaviour can lead to system instability. Therefore, the design of K k must guarantee stability of the overall system, and provide a higher QoC at the same time.
III. COMPOSABLE MULTI-CORE PLATFORM
We consider a tile-based architecture that offers configuration with multi-processors (processor tiles), interconnections (Network-on-Chip (NoC)) and memories (memory tiles) within the same platform. An example architecture is shown in Fig. 2 . Each processor tile is mainly composed of a MicroBlaze softcore processor. The monitor tile is specialized for debugging purposes. The memory tile contains the external memory interface and controller, and the NoC provides interconnection between the tiles. The rest of the parameters in Fig. 2 are explained in the following. 
A. Virtual processors
In the above platform, composability is achieved by virtualising the processor resource. For this purpose, we use CoMik (Composable and Predictable Micro-kernel) that creates multiple virtual processors (VPs) that can be used as dedicated resources [11] . Each VP's utilisation of the underlying physical processor is allocated in a TDM manner. The VPs are cycle-accurately temporally isolated. That is, the activities on concurrent virtual processors do not affect each other's timing even by a cycle. In the following subsection, we describe how such cycle-accurate temporal isolation is achieved by CoMik.
B. Cycle accurate temporal isolation
CoMik's TDM scheduling is regulated by a periodic interrupt that indicates a virtual processor (state) context swap. Fig. 3 illustrates how CoMik performs a VP context swap. In this example, an interrupt arrives at time I. This interrupt cannot be handled immediately since the processor is uninterruptible for a duration of U, which causes a jitter of J. U varies with the critical region or multi-cycle instruction being executed, and thus J is also variable. Next, control is passed to CoMik's interrupt routine, which performs the VP context swap. The context of the previous VP is stored; this includes storing the state of the physical processor's registers, etc., on the stack. Subsequently, the next VP is scheduled. As shown in Fig. 3, the context swap takes time T (i.e., the transition time), which can vary due to variation in the scheduling time. Clearly, the start of the next VP depends on J and T. To achieve cycle-accurate isolation, we split the TDM period into a fixed-duration CoMik slot of length ω cycles and a partition (or VP) slot of length ψ cycles. The CoMik slot starts at the time the context-change interrupt is raised and lasts for a fixed duration such that
where R is 2 cycles to take into account the instruction fetch and decode stages of the pipeline, enabling the VP to resume where it left off. A definitive upper bound max(T) can be derived for the duration of the transition time. The jitter bound max(J) is a design decision that restricts the maximum length of the partition-level critical region. This jitter bound must be at least long enough to accommodate the processor's longest multi-cycle instruction. Increasing max(J) also increases the necessary duration of the CoMik slot ω. The processor utilisation at the application level is given by
which indicates that a longer ω is not desired to achieve higher utilisation. The trade-off is between a longer worst-case critical region and the decreasing CoMik slot overhead.
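The trade-off described above can be illustrated with a small Python sketch. Both formulas in it are assumptions inferred from the surrounding text rather than the paper's exact expressions: the CoMik slot is taken as ω = max(J) + max(T) + R, and the utilisation as ψ / (ψ + ω). The numeric values are hypothetical.

```python
PIPELINE_REFILL_CYCLES = 2  # R in the text: instruction fetch and decode stages


def comik_slot_length(max_jitter, max_transition, refill=PIPELINE_REFILL_CYCLES):
    """Assumed CoMik slot length in cycles: omega = max(J) + max(T) + R."""
    return max_jitter + max_transition + refill


def utilisation(psi, omega):
    """Assumed application-level utilisation: psi / (psi + omega)."""
    return psi / (psi + omega)


if __name__ == "__main__":
    psi = 10_000  # hypothetical partition slot length in cycles
    for max_jitter in (10, 100, 1000):  # hypothetical jitter bounds
        omega = comik_slot_length(max_jitter, max_transition=500)
        print(max_jitter, omega, round(utilisation(psi, omega), 4))
```

Under these assumptions, a larger jitter bound permits longer critical regions but lengthens ω and lowers the utilisation, which is the trade-off stated in the text.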
C. Application scheduling
An application is executed in its allocated partition slot (or VP) and is paused every time a new CoMik slot starts. The execution is only resumed in the next partition slot assigned to the same application. Fig. 4 illustrates this situation by dividing a TDM period into three partition slots and CoMik slots, where two partition slots are assigned to one application. The execution of an application is interleaved with the next CoMik slot and possibly with other applications' partition slots. This results in two time domains: the global time (or wall time), which counts every clock cycle executed in the platform, and the partition time, which counts every clock cycle that has taken place within a partition slot. The platform provides specific timers for both time domains, which eases the development of applications with timing requirements.
D. Platform timing model
With the above platform description, we consider a TDM table consisting of N partition slots with N ≥ 1. Each partition slot has a length of ψ clock cycles and further, each CoMik slot has a length of ω clock cycles. The total length of a TDM table is given by N × (ψ + ω), and with a set of applications Λ where 0 < |Λ| ≤ N , the resource allocated to an application λ ∈ Λ is given by the function S(λ) : Λ → N. Fig. 4 shows the partition slots allocation in either distributed (i.e., slots can be separated by other partition or CoMik slots) or contiguous (i.e., one after the other separated by one CoMik slot) ways for two applications Λ = {λ 1 , λ 2 } with N = 3. In this example, S(λ 1 ) = 2 and S(λ 2 ) = 1. That is, two partition slots are allocated to application λ 1 and one partition slot is allocated to application λ 2 . Since each partition slot has ψ clock cycles, λ 1 is executed on S(λ 1 )·ψ clock cycles in each TDM period. A general expression of the allocated resource (as a fraction of the total resource) to an application is given by,
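As a hedged illustration of the allocated resource fraction, the Python sketch below computes two candidate readings for the example above. Both are assumptions: `slot_share` is the share of partition slots S(λ)/N, and `cycle_fraction` is the share of all TDM cycles S(λ)·ψ / (N·(ψ + ω)), which also accounts for the CoMik overhead. The function names and the values of ψ and ω are hypothetical.

```python
def slot_share(s_alloc, n_slots):
    """Share of the N partition slots given to the application: S / N."""
    return s_alloc / n_slots


def cycle_fraction(s_alloc, n_slots, psi, omega):
    """Share of all cycles in a TDM period: S * psi / (N * (psi + omega))."""
    return (s_alloc * psi) / (n_slots * (psi + omega))


if __name__ == "__main__":
    # Example from the text: N = 3, S(lambda_1) = 2, S(lambda_2) = 1.
    psi, omega = 900, 100  # hypothetical slot lengths in cycles
    for name, s_alloc in (("lambda_1", 2), ("lambda_2", 1)):
        print(name, slot_share(s_alloc, 3), cycle_fraction(s_alloc, 3, psi, omega))
```

The two readings differ only by the factor ψ / (ψ + ω), so they coincide when the CoMik slot is negligible compared with the partition slot.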
In summary, the virtualisation capability of the platform enables the development and execution of applications by scheduling them into customisable partition slots. This allows the application designer to take the timing properties of the platform (e.g., slot lengths and resource allocation) into account in order to develop the application independently on this platform while ensuring that it will not interfere with other applications. The composable nature of the platform further allows multiple design teams to develop and verify their applications independently. Next, we describe how we design and implement our proposed platform-aware controller on this composable platform.
IV. COMPOSABLE EMBEDDED CONTROL SYSTEMS
Since the platform allows for independent development by functional and temporal separation, we focus only on the control application λ_c ∈ Λ. We consider representative single-input single-output (SISO) plant dynamics for illustration. We implement the control system on the platform with two synchronous processor tiles, one memory tile, one monitor tile, and the NoC (see Fig. 2). For the proposed data-driven implementation, we use the task partitioning illustrated in Fig. 1 (b). The tasks T^a and T^s are mapped onto one processor tile. The task T^c is mapped onto another processor tile. As already described, timing plays a crucial role in the design of the control law. Next, we derive the timing behaviour experienced by the control loop.
A. Timing properties of feedback loops
We are interested in characterising the exact timing behaviour the control application λ c will experience in the platform with the above implementation. Based on the platform properties, we obtain a finite set of possible sampling periods h k and subsequently, we utilise them in the design of the control law (detailed in the next section). Towards this, we first define the application execution time
where e_s, e_a and e_c are the execution times of T^s, T^a and T^c, respectively. δ_sc and δ_ca are the delays (e.g., communication time over the NoC) caused by other operations in the sensor-to-computing and computing-to-actuator paths. Further, we consider the application execution time e < ψ. That is, we choose the partition slot ψ to be longer than the application execution time. The application runs within its allocated partition slots, which are scheduled within a TDM period, and the resulting sampling periods depend on the resource allocation (i.e., S(λ_c) and distributed/contiguous allocation).
Distributed resource allocation: with distributed resource allocation (see Fig. 4 ), there is a maximum of N + 1 possible sampling periods independent of the number of slots allocated to the application. The possible sampling periods are
where 2 ≤ i ≤ N + 1. As illustrated in Fig. 5, the additional slots (i.e., partition and CoMik slots) between executions of the application introduce timing cases whose number depends on the number of partition slots N. For instance, an application with distributed resource allocation and N = 3 might have the sampling periods h_1 = e, h_2 = e + ω, h_3 = e + 2ω + ψ, and h_4 = e + 3ω + 2ψ.
Contiguous resource allocation: contiguous resource allocation (see Fig. 4) results in a reduced subset of the sampling periods of the distributed resource allocation. In the following, we derive the possible sampling periods for this case.
Case I (h_1 = e): Fig. 6 (a) illustrates the scenario where the feedback loop is executed within a partition slot.
Case II (h 2 = e + ω): For S(λ c ) > 1, the execution of the control loop might spread over two partition slots resulting in h 2 = e+ω due to the interruption by the CoMik slot. Fig. 6 (b) illustrates this scenario.
Case III (h 3 = e + (N − S(λ c ))ψ + (N − S(λ c ) + 1)ω): Fig. 6 (c) illustrates the scenario when the execution of a feedback loop is spread over two TDM periods.
In this work, we consider a contiguous resource allocation. Therefore, as illustrated above, the sampling period h_k of λ_c switches between the elements of the set H = {h_1, h_2, h_3}. For a given platform configuration, the set H is known at design time.
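The sampling period sets derived above can be sketched in Python as follows. The contiguous cases follow the expressions of Cases I to III directly. For the distributed case, the general term h_i = e + (i − 1)ω + (i − 2)ψ for 2 ≤ i ≤ N + 1 is an assumption extrapolated from the N = 3 example given in the text. Function names and numeric values are hypothetical; all quantities are in clock cycles.

```python
def distributed_sampling_periods(e, psi, omega, n_slots):
    """Up to N + 1 sampling periods for distributed allocation.

    Assumed general term (extrapolated from the N = 3 example):
    h_1 = e and h_i = e + (i - 1) * omega + (i - 2) * psi for 2 <= i <= N + 1.
    """
    periods = [e]
    for i in range(2, n_slots + 2):
        periods.append(e + (i - 1) * omega + (i - 2) * psi)
    return periods


def contiguous_sampling_periods(e, psi, omega, n_slots, s_alloc):
    """Sampling period set H = {h_1, h_2, h_3} for contiguous allocation.

    Note: per Case II, h_2 only arises when more than one slot is allocated.
    """
    h1 = e                      # Case I: loop runs inside one partition slot
    h2 = e + omega              # Case II: loop is interrupted by one CoMik slot
    h3 = e + (n_slots - s_alloc) * psi + (n_slots - s_alloc + 1) * omega  # Case III
    return [h1, h2, h3]


if __name__ == "__main__":
    e, psi, omega = 110, 1000, 50  # hypothetical values in cycles
    print(distributed_sampling_periods(e, psi, omega, n_slots=3))
    print(contiguous_sampling_periods(e, psi, omega, n_slots=10, s_alloc=2))
```

Under the assumed general term, h_3 of the contiguous case equals the distributed element with i = N − S(λ_c) + 2, which agrees with the statement that the contiguous allocation yields a subset of the distributed sampling periods.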
B. Average sampling period
With contiguous resource allocation described in Section IV-A, the ratio between the frequencies of occurrence h 1 , h 2 and h 3 is approximately given by,
That is, S(λ_c) determines how many times the control loop can be executed within a TDM period. Thus, the control loop runs with the h_1 sampling period for S(λ_c)ψ/e − S(λ_c) times. Consequently, the average sampling period is given by,
From Equation (20), it is clear that h_1 occurs more frequently than h_2 and h_3. Since h_1 is significantly shorter than h_2 and h_3, h_avg of the closed-loop system is closer to h_1. We therefore consider h_1 as the nominal sampling period and use it, together with the above platform-derived behaviour, in the design of the platform-aware controller.
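The averaging argument can be checked numerically with the following Python sketch. The count of h_1 occurrences, S(λ_c)ψ/e − S(λ_c), is taken from the text. The remaining counts are assumptions: h_2 is assumed to occur S(λ_c) − 1 times per TDM period (once per CoMik slot between contiguous partition slots) and h_3 once (the gap until the next TDM period). The function name and the values are hypothetical.

```python
def average_sampling_period(e, psi, omega, n_slots, s_alloc):
    """Weighted mean of h_1, h_2, h_3 over one TDM period (contiguous case)."""
    h1 = e
    h2 = e + omega
    h3 = e + (n_slots - s_alloc) * psi + (n_slots - s_alloc + 1) * omega

    runs = s_alloc * psi / e   # approximate number of loop executions per TDM period
    n1 = runs - s_alloc        # occurrences of h_1 (from the text)
    n2 = s_alloc - 1           # occurrences of h_2 (assumption)
    n3 = 1                     # occurrences of h_3 (assumption)
    return (n1 * h1 + n2 * h2 + n3 * h3) / (n1 + n2 + n3)


if __name__ == "__main__":
    # Hypothetical values in cycles: e = 100, psi = 1000, omega = 50, N = 10.
    for s_alloc in (1, 2, 5):
        print(s_alloc, average_sampling_period(100, 1000, 50, 10, s_alloc))
```

With these counts, the weighted sum of the sampling periods equals the TDM period length N(ψ + ω), so the average reduces to N(ψ + ω)·e / (S(λ_c)·ψ) and shrinks as more slots are allocated.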
V. PLATFORM-AWARE CONTROLLER DESIGN
We have a system whose sampling period switches between elements of a known and finite set H = {h_1, h_2, h_3} (derived from the analysis in Section IV-A). In this section, we present a platform-aware design method that utilises this platform-derived timing behaviour in the design of the controller to improve the QoC of the control application λ_c ∈ Λ. In view of the switched systems in Eq. (13), we consider three discrete-time switching (sub)systems,
A. Background: CQLF
Definition ([12], [13]): Consider A_k to be discrete-time LTI systems of the form (22). V(z) = z^T P z is a Common Quadratic Lyapunov Function (CQLF) of the systems A_k if there exist P = P^T > 0 and Q = Q^T > 0 such that P is the simultaneous solution of the discrete-time Lyapunov equations,
$$A_k^T P A_k - P = -Q \qquad (23)$$
The existence of a CQLF is a sufficient condition for the stability of the system with switching subsystems (22).
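The CQLF condition can be verified for a given candidate matrix P with the following self-contained Python sketch. It checks that P is positive definite and that P − A_k^T P A_k is positive definite for every subsystem, which is the strict-decrease form of the discrete-time Lyapunov condition. It does not search for P; finding P would require an LMI solver. The helper names and the example matrices are hypothetical and are not the closed-loop matrices of this paper.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def is_positive_definite(m, tol=1e-12):
    """Cholesky-based test for a symmetric matrix (lower triangle is used)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False  # non-positive pivot: not positive definite
                low[i][i] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return True


def is_cqlf(p, subsystems):
    """True if V(z) = z^T P z decreases along every subsystem z+ = A_k z."""
    if not is_positive_definite(p):
        return False
    n = len(p)
    for a in subsystems:
        atpa = matmul(matmul(transpose(a), p), a)
        q = [[p[i][j] - atpa[i][j] for j in range(n)] for i in range(n)]
        if not is_positive_definite(q):
            return False
    return True


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    a1 = [[0.5, 0.0], [0.0, 0.3]]    # hypothetical stable subsystem
    a2 = [[0.0, 0.4], [-0.4, 0.0]]   # hypothetical stable subsystem
    print(is_cqlf(identity, [a1, a2]))
```

A single P that passes this check for all subsystems certifies stability for every switching sequence among them.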
B. LMI based design
As already mentioned, we choose h 1 as a nominal sampling period h n . We design the controller gain K n corresponding to h n = h 1 such that the closed-loop system A cl,n in Eq. (24) is stable with higher QoC.
The design of K n can be done with a traditional design method such as Linear Quadratic Regulator (LQR) or a pole-placement technique [8] . The sampling period h k can be other than h n . For h 2 , h 3 ∈ H, we design feedback gains K 2 and K 3 such that the overall switching behaviour is stable (see Fig. 7 ). 
then the systems in Eq. (22) have a CQLF with the following feedback gain
In summary, we apply the gains K_n, K_2 and K_3 for the sampling periods h_1, h_2 and h_3, respectively. If the system runs only with the nominal sampling period h_n, the closed-loop system provides a high QoC, which can be achieved by an optimal design of K_n with state-of-the-art design methods. Theorem 5.3 for designing K_k guarantees that the closed-loop system is stable in the presence of switching between h_k, h_n ∈ H. Since the system runs with h_n more frequently, the QoC φ_pa degrades only by a small margin (which is shown in the experimental results). The proof of Theorem 5.3 is omitted for space reasons.
C. Platform-aware design flow
Since the QoC depends on the choice of poles used to design the gain K_n for the nominal sampling period, we need to find the poles for which (i) |u[k]| ≤ U_MAX, and (ii) a solution exists for K_2 and K_3 in Theorem 5.3. Further, since we are interested in improving the QoC, we choose the poles (among those which satisfy (i) and (ii)) for which we achieve the shortest settling time and the maximum QoC φ_pa as per Eq. (2). Fig. 7 shows the control design flow for the proposed platform-aware approach. To this end, we first discretise the design space for poles and obtain a set of candidate poles. For each of them, we apply the above design flow.
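The search over the discretised pole space can be outlined with the following Python sketch. It is only a skeleton of the selection logic under stated assumptions: `evaluate` is a hypothetical callback that designs the gains for a pole combination and simulates the closed loop, returning None when no K_2, K_3 exist, or a pair (settling time, peak |u[k]|) otherwise. The grid construction (unordered combinations on a uniform grid from 0 to 1) is also an assumption.

```python
from itertools import combinations_with_replacement


def pole_grid(step, n_poles):
    """Unordered pole combinations on a uniform grid over [0, 1] (assumed)."""
    count = int(round(1.0 / step))
    values = [round(k * step, 10) for k in range(count + 1)]
    return combinations_with_replacement(values, n_poles)


def search_poles(candidates, evaluate, u_max):
    """Return (poles, settling_time) with the shortest settling time.

    Candidates are skipped when evaluate returns None (condition (ii) fails)
    or when the peak input exceeds u_max (condition (i) fails).
    """
    best = None
    for poles in candidates:
        result = evaluate(poles)
        if result is None:
            continue
        settling, peak_u = result
        if peak_u > u_max:
            continue
        if best is None or settling < best[1]:
            best = (poles, settling)
    return best


if __name__ == "__main__":
    # Toy stand-in for the design-and-simulate step, for illustration only.
    table = {(0.2, 0.2): (1.0, 50.0), (0.5, 0.5): (2.0, 5.0), (0.9, 0.9): None}
    print(search_poles(table.keys(), table.get, u_max=10.0))
```

Since the QoC is the reciprocal of the settling time, minimising the settling time over the feasible candidates also maximises φ_pa.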
D. Baseline design flow
The baseline design uses a periodic sampling period and the control gain K is designed using a pole-placement technique. In this case, we choose the poles for which (i) |u[k]| ≤ U_MAX, and (ii) we achieve the shortest settling time and the maximum QoC φ_bl as per Eq. (2). The overall design flow is illustrated in Fig. 8. Similar to the platform-aware design flow, we first discretise the design space for poles and, for each of them, we apply the above design flow.
VI. EXPERIMENTAL RESULTS
We illustrate the applicability of our proposed platform-aware design flow considering an automotive cruise control system [14]. The purpose of the system is to maintain a constant vehicle speed despite external disturbances. This is achieved by comparing the measured speed with the desired speed and adjusting the engine throttle angle according to a control law. The continuous-time system model (according to Eq. (1)) of this cruise control system is given by,
A c = 
, C c = [ 1.00 0.00 0.00 ] (28) where we consider a control objective to drive the vehicle's speed x 1 (t) to 0 as fast as possible (i.e., short settling time and higher QoC φ). The control tasks are implemented on the platform as described in Section IV. That is, two processor tiles were used to map the control tasks at a clock frequency of 120 MHz. The duration of one clock cycle is 
With the above system description and platform characteristics, we experimentally found that the execution time of the control application is e ≈ 110 μs. We simulated both the platform-aware and baseline design flows. For both design flows, we use an identical input signal saturation U_MAX = 10 8. As already described, we discretise the design space for poles. To keep the size of the design space reasonable, each pole value was varied in steps of 0.05 within the range from 0 to 1. In total, this choice gave us 11931 pole combinations. These poles are used for the nominal sampling period in the platform-aware design, and for the baseline design. We conducted three different experiments for both design flows.
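Since the platform timing model is expressed in clock cycles while the execution time is measured in microseconds, a small conversion helps when comparing e against the slot lengths. The Python sketch below uses the 120 MHz clock frequency and e ≈ 110 μs stated in the text; the function name is hypothetical.

```python
def microseconds_to_cycles(duration_us, clock_mhz):
    """Convert a duration to clock cycles (1 MHz is 1 cycle per microsecond)."""
    return duration_us * clock_mhz


if __name__ == "__main__":
    # e of about 110 us at a clock frequency of 120 MHz
    print(microseconds_to_cycles(110, 120))  # prints 13200
```

The partition slot length ψ must exceed this number of cycles for the assumption e < ψ to hold.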
• Exp. 1 (S(λ_c) = 1): one partition slot was allocated to the control application, which gives 10% usage of the partition slots within a TDM period.
• Exp. 2 (S(λ_c) = 2): two partition slots were allocated to the control application, which gives 20% usage of the partition slots within a TDM period.
• Exp. 3 (S(λ_c) = 5): five partition slots were allocated to the control application, which gives 50% usage of the partition slots within a TDM period.
Fig. 9. Simulation of the cruise control system response (y(t)) for the three experiments conducted following the platform-aware design flow.
A. Platform-aware design
As illustrated in Fig. 7, the simulation starts by defining the platform parameters (i.e., partition and CoMik slot durations, clock frequency, and TDM period length), the control input constraint U_MAX, and the sampling period set H, which is derived from the platform parameters and the resources allocated to the control application. In the platform-aware approach, we do not need a periodic sampling period and therefore the partition slots were assigned contiguously as shown in Fig. 11. Table I summarises the results obtained for the three experiments. The system settling time varies depending on the allocated resource S(λ_c). The resulting sampling periods due to the execution on the platform are reported, together with the settling time and QoC φ_pa for the specific pole. For 10% resource allocation, the settling time is longer (28.8 ms) than the settling time with 20% and 50% resource allocation. Intuitively, a higher QoC is expected for a higher allocated resource. This is because a higher resource allocation S(λ_c) implies a shorter average sampling period h_avg and the proposed platform-aware approach exploits that knowledge in the choice of the nominal sampling period. Further, we simulated the system response in each experiment, which is plotted in Fig. 9. A marked performance improvement is achieved as h_avg gets shorter.
Fig. 10. Simulation of the cruise control system response (y(t)) for the three experiments conducted following the baseline design flow.
B. Baseline design
In the baseline approach (see Fig. 8), we need periodic sampling. Thus, the partition slots were assigned in a distributed manner (see Fig. 11) in order to guarantee equal time intervals between two consecutive sensing operations. Table II summarises the results obtained for the three experiments. The resulting sampling periods and settling times due to the execution on the platform are shown. The system response for the three experiments is shown in Fig. 10, where the optimal poles found in each experiment are [0.94 0.14 0.14 0.14], [0.94 0.14 0.14 0.14] and [0.94 0.14 0.24 0.49] in Exp. 1, Exp. 2 and Exp. 3, respectively. It should be noted that a shorter average sampling period (achieved by a higher S(λ_c)) leads to a shorter settling time (i.e., a higher QoC φ_bl).
C. Discussion
Intuitively, a higher resource allocation should provide a higher QoC φ. In the platform-aware design, the QoC φ_pa increases from φ_pa ≈ 34 in Exp. 1 to φ_pa ≈ 37 in Exp. 3. Similarly, in the baseline design, the QoC increases from φ_bl ≈ 3 in Exp. 1 to φ_bl ≈ 19 in Exp. 3. In all cases, the platform-aware design outperforms the baseline design. That is, from the results reported in Tables I and II, the platform-aware design yields a QoC φ_pa > 8.7 × φ_bl in Exp. 1, φ_pa > 4.5 × φ_bl in Exp. 2 and φ_pa > 1.9 × φ_bl in Exp. 3. Thus, the platform-aware design provides almost 2 times or more the QoC of the baseline design. It is further notable from the results that the QoC margin (i.e., φ_pa − φ_bl) between the baseline and platform-aware designs is smaller with higher resource allocation.
VII. CONCLUSIONS
In this work, we presented a platform-aware design of feedback control loops considering a composable multi-core architecture as an implementation platform. The proposed method outperformed the traditional one by achieving a shorter average sampling period. Our results further show how the resource allocation is reflected in the achievable QoC of the feedback control loops. As a future extension, we plan to exploit the periodicity of the platform-derived timing behaviour in the design and implementation of the feedback control law.
