Memory-aware embedded control systems design by Chang, Wanli et al.
This is a repository copy of Memory-aware embedded control systems design.
White Rose Research Online URL for this paper:
http://eprints.whiterose.ac.uk/164760/
Version: Accepted Version
Article:
Chang, Wanli orcid.org/0000-0002-4053-8898, Goswami, Dip, Chakraborty, Samarjit et al. 
(3 more authors) (2016) Memory-aware embedded control systems design. IEEE 
Transactions on Computer-Aided Design of Integrated Circuits and Systems. pp. 586-599. 
ISSN 0278-0070 
https://doi.org/10.1109/TCAD.2016.2613933
586 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 36, NO. 4, APRIL 2017
Memory-Aware Embedded Control Systems Design
Wanli Chang, Dip Goswami, Samarjit Chakraborty, Lei Ju, Chun Jason Xue, and Sidharta Andalam
Abstract—Control applications are often implemented on
highly cost-sensitive and resource-constrained embedded plat-
forms, such as microcontrollers with a small on-chip memory.
Typically, control algorithms are designed using model-based
approaches, where the details of the implementation platform
are completely ignored. As a result, optimizations that integrate
platform-level characteristics into the control algorithms design
are largely missing. With the emergence of cyber-physical systems
(CPS)-oriented thinking, there has lately been a strong inter-
est in co-design of control algorithms and their implementation
platforms, leading to work on networked control systems and
computation-aware control algorithms design. However, there
has so far been no work on integrating the characteristics of
a memory architecture into the design of control algorithms.
In this paper we, for the first time, show that accounting
for the impact of on-chip memory (or cache) reuse on the
performance of control applications motivates new techniques
for control algorithms design. This leads to significant improve-
ment in quality of control for given resource availability, or
more efficient implementations of embedded control applications.
We believe that this paper opens up a variety of possibilities
for memory-related optimizations of embedded control systems,
that will be pursued by researchers working on computer-aided
design for CPS.
Index Terms—Embedded control systems, memory analysis,
nonuniform sampling, quality of control (QoC).
I. INTRODUCTION
TRADITIONALLY, control algorithms design and their implementation
were strictly separated, the former being pursued by control
theorists and the latter by embedded systems engineers. As a result,
characteristics of the implementation platform related to
computation, communication, and memory hierarchy are typically not
accounted for in the control algorithms design process. With the
emergence of cyber-physical systems (CPS)-oriented thinking, there
is now a strong interest in the co-design of control algorithms and
their implementation platforms. Toward this, there has been
Manuscript received June 16, 2015; revised October 30, 2015, February 25,
2016, and August 24, 2016; accepted September 4, 2016. Date of publication
September 27, 2016; date of current version March 17, 2017. This work was
supported by the Singapore National Research Foundation through its Campus
for Research Excellence and Technological Enterprise Program. This paper
was recommended by Associate Editor X. S. Hu.
W. Chang is with the Cluster of Information Communication Technology,
Singapore Institute of Technology, Singapore 138683, and also with TUM
CREATE, Singapore 138602 (e-mail: wanli.chang@singaporetech.edu.sg).
D. Goswami is with the Eindhoven University of Technology,
5600 Eindhoven, The Netherlands.
S. Chakraborty is with TU Munich, 80333 Munich, Germany.
L. Ju is with the Department of Computer Science and Technology,
Shandong University, Jinan 250014, China.
C. J. Xue is with the City University of Hong Kong, Hong Kong.
S. Andalam is with the University of Auckland, 1142 Auckland,
New Zealand.
Digital Object Identifier 10.1109/TCAD.2016.2613933
Fig. 1. Embedded control system with a processor and on-chip memory for
program execution. Instructions are stored in the flash memory. Programmable
I/Os are used for communication among the controller, sensors, and actuators.
a lot of recent work on control algorithms design that takes
into account the characteristics of communication channels
in networked embedded control systems, or schedules control
tasks depending on real-time system states and disturbances to
improve the quality of control (QoC) [1]–[3]. However, there
has so far been no work that considers the characteristics of
the memory architecture in the implementation platform when
designing control algorithms.
In many cost-sensitive and resource-constrained embedded
platforms that are used to implement control algorithms,
memory subsystems constitute an important component, and
on-chip memories (used as caches) contribute significantly
toward their cost. There have been many efforts on cache reuse
maximization to improve the real-time performance of embedded
software, such as its worst-case execution time (WCET) [4].
However, the characteristics of control systems and
metrics of control performance like QoC have not been directly
incorporated into these techniques. In this paper, we follow
a CPS-oriented approach and show that accounting for on-chip
memory behavior motivates new techniques for control
algorithms design that are otherwise not used. These techniques
open up a number of possibilities for co-design and
co-optimization of control algorithms, code placement, and
memory-aware control task scheduling, which we believe this
paper will motivate other researchers to pursue.
The setup we consider is fairly general and occurs in sev-
eral application domains, one example of which is automotive
embedded systems. As illustrated in Fig. 1, the controller (such
as the XC23xxB Series microcontroller [5] from Infineon that
is popular in automotive systems) consists of an embedded
processor with an on-chip memory (referred to as cache from
here on) to run multiple control applications. The codes of
these applications are stored in an external memory (often a
flash memory), which has a large size and can thus accommo-
date all the application codes, but has high read/write latencies
(hundreds of processor cycles). Programmable I/O peripherals
are used for communication among the controller, sensors, and
actuators.
In such an embedded implementation platform, the overall
control loop for each application performs three operations.
1) Measuring the system states with sensors (measure).
0278-0070 c© 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
CHANG et al.: CACHE-AWARE EMBEDDED CONTROL SYSTEMS DESIGN 587
2) Computing the control input by executing the control
application (compute).
3) Applying the computed control input to the plant
(actuate).
The sampling period is the time duration between two con-
secutive measure operations. In general, a shorter sampling
period implies faster response from the controller and thus
better QoC. Assuming that measure and actuate operations
require negligible time, the sampling period is constrained by
the WCET of the control application.
Given a collection of control applications (e.g., C1, C2, and
C3), it is conventional to run their control loops in a
round-robin fashion (C1, C2, C3, C1, C2, C3, . . .). Since the
codes for different control applications are different, the
on-chip cache is frequently refreshed in this process. This results
in poor cache reuse and long WCET. In order to address
this issue, we propose a new sampling order for the control
applications, with which cache reuse is improved and the
WCET of each application is reduced. In particular, we study a
nonuniform sampling scheme, where the control loop of each
application is run multiple times consecutively to increase
cache reuse before moving on to the next application
(e.g., C1, C1, C1, C2, C2, C2, C3, C3, C3, . . .). We design
controllers for both the conventional memory-oblivious and the
proposed memory-aware sampling orders. We then show that,
all else (e.g., the processor’s operating frequency) remaining
identical, the nonuniform sampling scheme improves the QoC
by 20%–30%, which is significant in highly cost-sensitive
domains like automotive systems.
A. Contributions
As mentioned earlier, the impact of the memory architec-
ture on control algorithms design has not been studied until
now. The main contribution of this paper is a novel design
flow for memory-aware embedded control systems that brings
two very disparate classes of techniques—cache modeling and
program analysis on one hand, and controller design on the
other hand—together and quantifies the resulting benefit. The
idea is to shorten the sampling periods of the control appli-
cations with execution (i.e., sampling) orders that increase
the cache reuse. Such cache-aware execution of control appli-
cations implies nonuniform sampling with a shorter average
sampling period. Generally, the nonuniform sampling scheme
is undesirable in control systems, since the controller design
has to deal with switching instability, making it challenging
to optimize QoC. In our method, the key is that the sam-
pling order is a design parameter and known in the controller
design phase. Exploiting the knowledge of the sampling order
and a shorter average sampling period, we propose a controller
design technique that improves the QoC.
In existing memory-conscious algorithm design (e.g., for
real-time tasks), programs are treated as black boxes and
their functionality is not considered. In this paper, we seek
to address this and explicitly consider the properties of
control algorithms. In particular, the optimal choice of control
algorithm parameters, such as the gain values, depends on the
sampling periods and sensing-to-actuation delays,
which in turn are determined by the WCETs the algorithms experience.
For other algorithms, parameter values are not
updated on the basis of the WCETs. Hence, while memory-conscious
algorithm design for other domains stops at WCET
minimization, whether and how simultaneously modifying the
controller parameters in response to the reduced WCET leads
to improved QoC is an open question. This paper makes a
first effort to answer this question.
While we exploit existing program analysis techniques in
conjunction with cache modeling, our analysis focuses on esti-
mating the guaranteed WCET reduction due to consecutive
executions of the same program, which is required in comput-
ing the nonuniform sampling order for a feedback controller.
Estimating such WCET reduction has not been studied before,
mostly since until now there has been no useful context for
studying it. In addition, QoC-optimal controller design with
nonuniform sampling relies on our proposed technique that
exploits the shortened WCET in the memory-aware sampling
order. Hence, the contribution of this paper is the synergistically
integrated framework of memory-aware embedded
control systems design, which further provides new insights
and design options in line with the CPS-oriented design philosophy,
where the goal is to study control theory and embedded
systems on the same footing.
B. Paper Organization
Section II presents the literature review on
computation/communication-aware embedded control systems
design and effective use of memory for embedded system
performance improvement. Section III gives an overview of
the proposed method for memory-aware embedded control
systems design. In Section IV, the memory analysis technique
is discussed and illustrated with a motivational example.
Section V describes basics of feedback control applications
under consideration and explains how the controller can be
designed for the conventional memory-oblivious uniform
sampling order and the proposed memory-aware nonuniform
sampling scheme. Experimental results are shown in
Section VI, and Section VII makes concluding remarks.
II. RELATED WORK
Recent research work takes into account the implementation
details, such as computation and communication, of embed-
ded control systems in order to improve QoC and reduce the
discrepancy between the design and implementation phases.
In [1], a novel controller design technique based on a hierarchy
of controllers is proposed, so that when the allocated execution
time is short, a low-level computationally light controller is
activated to achieve basic QoC and when the execution time is
long, a high-level computationally intensive controller is used
aiming for better QoC. The tradeoff between QoC and CPU
usage is explored in [2] by dynamic scheduling of multiple
self-triggered control tasks executed on one processor. The
conventional paradigm in networked embedded control sys-
tems regards the messages as periodic, which facilitates the
analysis and implementation, yet leads to conservative usage
of the communication bandwidth. An aperiodic strategy for
dynamic allocation of bandwidth according to the current state
of the plants and available resources is proposed in [3]. None
of these works considers memory in the embedded control
systems design, which is the focus of this paper.
Fig. 2. Memory analysis of an example with three control applications. Each
application is consecutively executed three times. After the first execution
Ci(1), some instructions in the cache can be reused and thus the WCETs of
the following two executions are shortened.
The tradeoff between memory and CPU usage in a feedback
scheduling system, which dynamically adjusts the sampling
periods of control tasks to maximize the overall QoC, is
explored in [6]. However, that work does not consider how memory
can be exploited to improve the system performance. In [4],
the round-robin scheduling of multiple tasks on an embedded
operating system is dynamically tuned during program exe-
cution, adapting to changes in work load and external input
stimulus. As a result, cache misses are reduced and the system
performance is improved. Cache analysis for timing-related
computation has been studied [7]–[10]. The architectural influ-
ence on static timing analysis of embedded hard real-time
systems is described in [7]. The cache-related preemption
delay is analyzed in [8] for a multitask embedded system
with preemption, and extended to streaming applications in [9]
for cache-aware timing estimation. The set-associative cache
is considered in [10]. In this paper, we modify the exist-
ing approach to compute guaranteed WCET reduction due to
cache reuse between two consecutive executions of the same
application, which is then exploited by the tailored controller
design method in the memory-aware sampling order to achieve
better QoC.
Works in control theory literature with nonuniform sam-
pling focus on guaranteeing stability of the resulting switched
system [11]. Generally, theoretical tools such as common
quadratic Lyapunov functions and switched Lyapunov func-
tions tackle arbitrary switching between sampling periods to
assure stability of the overall closed-loop system. In this paper,
as opposed to arbitrary switching, the switching order is pre-
cisely known in the design phase, i.e., from the memory-aware
sampling order. We aim for further performance optimality
by exploiting this additional knowledge about the switching
behavior. In the field of optimal control, techniques such as lin-
ear quadratic regulator [12] are well-developed. By adjusting
the weights in the quadratic cost, a tradeoff between the input
magnitude and the settling time can be achieved. However,
these existing optimal control methods cannot be directly
applied in this paper, since first, they are not specifically tuned
for switched systems, and second, they do not explicitly con-
sider the constraint on the input signal, which exists in all
real-life systems. The combination of performance optimiza-
tion and input constraints is addressed by model predictive
control (MPC) techniques [13]—another well-developed area.
First, MPC performs online optimization in every sampling
Fig. 3. General timing model of a control loop.
Fig. 4. In S1, there is no cache reuse. The WCET of all executions for the
same application, $E_i^{wc}$, remains constant. The sampling period of every
control application, $h$, is uniform under this scheme. The
sensing-to-actuation delay $\tau_i^{sa}$ is equal to $E_i^{wc}$.
period, making it computationally heavy and unsuitable for
implementation on the resource-constrained embedded
platforms studied in this paper. Second,
the control law in MPC techniques is nonlinear in nature due
to online optimization in every sample, and existing literature
on MPC does not explicitly handle switching between
multiple linear sub-systems. Building upon the works
discussed above, our method formulates an optimal
pole-placement problem, where poles of sub-systems are decision
variables, the input constraint is explicitly respected, and
the settling time is the optimization objective. Unlike MPC,
our controller design is performed off-line, making scalability
a less important aspect.
III. OVERVIEW OF THE PROPOSED METHOD FOR
MEMORY-AWARE EMBEDDED CONTROL
SYSTEMS DESIGN
In the proposed memory-aware sampling order, each con-
trol application Ci is consecutively executed multiple times.
An example with three applications (C1, C2, and C3) is shown
in Fig. 2, where Ci(j) denotes the jth execution of the control
application Ci. Before the first execution Ci(1), the cache is
either empty (i.e., cold cache) or filled with instructions from
other applications, that are not used by Ci (equivalent to cold
cache). The WCET of Ci(1) can be computed by a number of
existing standard techniques [7], [14], [15]. Before the second
execution Ci(2), the instructions in the cache are from the same
application Ci and thus can be reused. This results in more
cache hits and hence shorter WCET. Depending on which path
the program takes, the amount of WCET reduction varies. This
requires a technique to compute the guaranteed WCET reduc-
tion of Ci(2) and Ci(3), independent of the path taken, which
is presented in Section IV. To summarize, the WCET reduc-
tion results from the cache reuse increase, which is enabled by
the change of the actual execution order of the applications,
compared with the conventional round-robin fashion.
After WCET results are computed, the next task is to
derive the control timing parameters (e.g., sampling periods
and sensing-to-actuation delays). The general timing model of
Fig. 5. In S2, the WCETs of the same control application vary, due to cache reuse. The sampling period for a control application is nonuniform.
a control loop is illustrated in Fig. 3. The compute operation
executes the control application, which takes $E$ time units. The
sampling period is denoted by $h$. The time interval between
the measure and the corresponding actuate operations in the
same sampling period is the sensing-to-actuation delay $\tau^{sa}$, which
is equal to the WCET of the control application $E^{wc}$.
In particular, we explore the relationship between WCET
results and control parameters of two example sampling
schemes. As illustrated in Fig. 4, S1 is the conventional
memory-oblivious scheme and summarized as follows:
S1: C1(1)→ C2(1)→ C3(1)→ C1(2)→ C2(2)→ C3(2)→
C1(3)→ C2(3)→ C3(3)→ · · ·
There is no cache reuse in S1 as discussed above, considering
that different control applications typically have different
instructions to execute. In other words, when Ci(j) starts
execution, all instructions of Ci need to be brought into the cache
from the flash memory. Therefore
\[ E_i^{wc}(1) = E_i^{wc}(2) = \cdots = E_i^{wc} \tag{1} \]
where $E_i^{wc}(j)$ is the WCET of the jth execution for Ci. The
WCET of the application Ci is denoted by $E_i^{wc}$, since all
executions of the same application have equal WCET. Clearly, all
control applications run with a uniform sampling period of
\[ h = \sum_{i=1,2,3} E_i^{wc}. \tag{2} \]
Moreover, for the sensing-to-actuation delay
\[ \tau_i^{sa} = E_i^{wc}. \tag{3} \]
As has been shown in Fig. 2, S2 is an example memory-
aware sampling order and summarized as
S2: C1(1)→ C1(2)→ C1(3)→ C2(1)→ C2(2)→ C2(3)→
C3(1)→ C3(2)→ C3(3)→ · · ·
As illustrated in Fig. 5, we denote the effective WCET taking
into account the cache reuse with $\bar{E}_i^{wc}(j)$. From the above
discussion
\[ \forall i \in \{1, 2, 3\}, \quad \bar{E}_i^{wc}(1) = E_i^{wc} \tag{4} \]
since there is no cache reuse for the first execution of every
application Ci(1). $\bar{E}_i^{wc}(2)$ and $\bar{E}_i^{wc}(3)$ are shorter than
$\bar{E}_i^{wc}(1)$ due to cache reuse. The amounts of cache reuse are the same
for Ci(2) and Ci(3) in the worst case. Denoting the guaranteed
WCET reduction as $\bar{E}_i^{g}$, we have
\[ \forall i \in \{1, 2, 3\}, \quad \bar{E}_i^{wc}(2) = \bar{E}_i^{wc}(3) = \bar{E}_i^{wc}(1) - \bar{E}_i^{g}. \tag{5} \]
From these varying WCETs, the sampling periods of all three
applications can be calculated. Taking C1 as an example, there
are three sampling periods $h_1(1)$, $h_1(2)$, and $h_1(3)$, which
repeat themselves periodically
\[ h_1(1) = \bar{E}_1^{wc}(1), \quad h_1(2) = \bar{E}_1^{wc}(2), \quad h_1(3) = \bar{E}_1^{wc}(3) + \Delta \tag{6} \]
where $\Delta$ is computed as
\[ \Delta = \sum_{i=2,3} \sum_{j=1,2,3} \bar{E}_i^{wc}(j). \tag{7} \]
Similar derivation can be done for C2 and C3. The average
sampling period of an application $h_{avg}$ is
\[ h_{avg} = \frac{\sum_{i=1,2,3} \sum_{j=1,2,3} \bar{E}_i^{wc}(j)}{3} < h. \tag{8} \]
According to (4) and (5)
\[ h_{avg} < \frac{\sum_{i=1,2,3} 3 \times E_i^{wc}}{3}. \tag{9} \]
From (2), we get
\[ h_{avg} < h. \tag{10} \]
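The gap between $h_{avg}$ and $h$ can also be written out explicitly. The short derivation below uses only (2), (4), and (5); it is a worked restatement under those definitions, not an additional result.

\[
\begin{aligned}
h_{avg} &= \frac{1}{3} \sum_{i=1,2,3} \left( \bar{E}_i^{wc}(1) + \bar{E}_i^{wc}(2) + \bar{E}_i^{wc}(3) \right)
         = \frac{1}{3} \sum_{i=1,2,3} \left( 3 E_i^{wc} - 2 \bar{E}_i^{g} \right) \\
        &= h - \frac{2}{3} \sum_{i=1,2,3} \bar{E}_i^{g}.
\end{aligned}
\]

With three consecutive executions per application, the average sampling period is therefore shorter than $h$ by two thirds of the summed guaranteed WCET reductions, and the inequality is strict as soon as one $\bar{E}_i^{g}$ is positive.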
Moreover, the corresponding sensing-to-actuation delay $\tau_i^{sa}(j)$
also varies with cache reuse as
\[
\forall i \in \{1, 2, 3\}: \quad
\begin{aligned}
\tau_i^{sa}(1) &= h_i(1) = \bar{E}_i^{wc}(1) \\
\tau_i^{sa}(2) &= h_i(2) = \bar{E}_i^{wc}(2) \\
\tau_i^{sa}(3) &= \bar{E}_i^{wc}(3).
\end{aligned}
\tag{11}
\]
As all control parameters have been derived, we can see that
the sampling period $h_i(j)$ of a control application is nonuniform
for the memory-aware scheme. The average sampling
period of S2 is shorter than the uniform sampling period of
S1 as shown in (8), due to WCET reduction resulting from
cache reuse. The sensing-to-actuation delay $\tau_i^{sa}(j)$ varies as
shown in (11). The next task is developing a controller design
method to exploit shortened nonuniform sampling periods and
achieve better QoC, which is reported in Section V. For the
sake of comparison, the conventional controller design method
for the memory-oblivious uniform sampling scheme S1 is also
presented.
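The timing relations (2)–(11) can be sketched in a few lines of Python. The function names and all WCET figures below are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def s1_timing(wcet):
    """Uniform scheme S1: period h is the sum of all WCETs, see (2);
    the sensing-to-actuation delay equals the application's own WCET, see (3)."""
    h = sum(wcet.values())
    return {app: {"h": h, "tau": e} for app, e in wcet.items()}


def s2_timing(wcet, reduction, runs=3):
    """Memory-aware scheme S2 with `runs` consecutive executions per application."""
    # Effective WCETs per (4) and (5): the first run is cold, later runs are shortened.
    eff = {app: [wcet[app]] + [wcet[app] - reduction[app]] * (runs - 1)
           for app in wcet}
    cycle = sum(sum(v) for v in eff.values())
    timing = {}
    for app, e in eff.items():
        others = cycle - sum(e)              # time spent in the other applications, see (7)
        periods = e[:-1] + [e[-1] + others]  # nonuniform sampling periods, see (6)
        timing[app] = {"h": periods, "tau": list(e), "h_avg": cycle / runs}
    return timing


# Hypothetical cold-cache WCETs and guaranteed reductions (arbitrary time units).
wcet = {"C1": 100, "C2": 120, "C3": 80}
reduction = {"C1": 30, "C2": 40, "C3": 20}

s1 = s1_timing(wcet)
s2 = s2_timing(wcet, reduction)
print(s1["C1"])  # uniform period and delay of C1 under S1
print(s2["C1"])  # nonuniform periods, delays, and average period of C1 under S2
```

With these assumed numbers the uniform period is 300 time units, while the average period under S2 is 240, at the price of one long period per cycle for each application.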
Fig. 6. Motivational example for memory analysis. Five memory blocks are
mapped to four cache lines. Memory blocks executed by each basic block are
shown. RCSIN and RCSOUT in the initialization phase are illustrated.
IV. MEMORY ANALYSIS FOR CONSECUTIVE EXECUTIONS
OF CONTROL APPLICATION
In this section, we compute the guaranteed WCET reduction for the
second and subsequent executions of a control application in a
memory-aware sampling scheme like S2. The technique is
derived from previous research [8], [9] and modified to suit
this paper. Similar to the data flow analysis, we start from
a control flow graph (CFG) and then set up data flow equa-
tions for each node of the CFG [i.e., reaching cache state
(RCS) and live cache state (LCS) computation in this context],
based on which the fixed-point computation is performed.
Afterward, the results of the fixed-point computation can be
used to calculate the guaranteed WCET reduction. Fig. 6
presents a motivational example to illustrate our approach.
Only instruction cache is considered in this paper.
A. Basics of Memory Analysis
In the typical automotive-use embedded control system
shown in Fig. 1, on-chip memory works as cache and flash
works as main memory. There are $N_c$ cache lines, denoted
as $CL = \{c_0, c_1, \ldots, c_{N_c-1}\}$, and the main memory has $N_m$
blocks, denoted as $M = \{m_0, m_1, \ldots, m_{N_m-1}\}$. Each memory
block is mapped to a fixed cache line. The example in Fig. 6
has four cache lines and five memory blocks. A basic block
is a straight-line sequence of code with only one entry point
and one exit point. This restriction makes a basic block highly
amenable for program analysis. The presented CFG, consisting
of four basic blocks B = {b0, b1, b2, b3}, has all the three key
elements of a control program, i.e., sequential basic blocks,
branches, and a loop. Therefore, it is suitable for illustrating
our memory analysis technique.
There are three key terms in memory analysis that are
described as follows.
1) Cache States: A cache state $cs$ is described as a vector
of $N_c$ elements. Each element $cs[i]$, where
$i \in \{0, 1, \ldots, N_c - 1\}$, represents the memory block in the
cache line $c_i$. When the cache line $c_i$ holds the memory
block $m_j$, where $j \in \{0, 1, \ldots, N_m - 1\}$, $cs[i] = m_j$. If
$c_i$ is empty, we denote it as $cs[i] = \bot$. If the memory
block is unknown, we denote it as $cs[i] = \top$. $CS$ is the
set of all possible cache states.
2) Reaching Cache States: RCS of a basic block $b_k$,
denoted as $RCS_{b_k}$, is the set of all possible cache states
when $b_k$ is reached via any incoming path.
3) Live Cache States: LCS of a basic block $b_k$, denoted as
$LCS_{b_k}$, is the set of all possible first memory references
to cache lines at $b_k$ via any outgoing path.
Since our focus is on WCET reduction between two con-
secutive executions of Ci, e.g., Ci(1) and Ci(2), we need to
compute RCS of the exit point in Ci(1) and LCS of the
entry point in Ci(2). By comparing all possible pairs of cache
states, the guaranteed number of cache hits and thus WCET
reduction can be calculated. In the following, we first discuss
computation of RCS and LCS.
B. Computation of Cache States
In RCS computation, we first define $gen_{b_k}$ as the cache state
describing the last executed memory block in every cache line
for the basic block $b_k$. If b0 in Fig. 6 executed
m0 and then m4, instead of only m0, the last executed memory
block in c0 would be m4, and $gen_{b_0}$ would be [m4,⊥,⊥,⊥]. For the
actual example in Fig. 6
\[
\begin{aligned}
gen_{b_0} &= [m_0, \bot, \bot, \bot], & gen_{b_1} &= [\bot, m_1, m_2, m_3] \\
gen_{b_2} &= [\bot, \bot, m_2, m_3], & gen_{b_3} &= [m_4, \bot, \bot, \bot].
\end{aligned}
\tag{12}
\]
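As a minimal sketch of how such a gen vector could be derived from the sequence of memory blocks a basic block executes, consider the helper below. The function name `gen` and the mapping of block $m_j$ to line $c_{j \bmod N_c}$ are assumptions made here for a direct-mapped cache; the paper only states that each memory block maps to a fixed cache line.

```python
BOT = "⊥"  # marks a cache line that the basic block never touches


def gen(refs, num_lines, last=True):
    """Build the gen vector of one basic block.

    refs      -- memory block indices in execution order, e.g. [0, 4] for m0 then m4
    num_lines -- number of cache lines Nc
    last      -- True keeps the last reference per line (used for RCS),
                 False keeps the first reference per line (used for LCS)
    Assumption: block m_j is mapped to cache line c_(j mod Nc).
    """
    state = [BOT] * num_lines
    for j in refs:
        line = j % num_lines
        if last or state[line] == BOT:
            state[line] = f"m{j}"
    return state


print(gen([0, 4], 4))              # hypothetical b0 that executes m0 and then m4
print(gen([0, 4], 4, last=False))  # same block, first reference per line
print(gen([1, 2, 3], 4))           # b1 of the example
```

The `last` flag covers both analyses with one routine, since they differ only in which reference to a cache line is kept.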
There are two equations involved in the RCS computation that
calculate $RCS^{IN}$ and $RCS^{OUT}$, where $RCS^{IN}$ of a basic block
$b_k$ is the RCS before $b_k$ is executed and $RCS^{OUT}$ is the set of
all possible cache states after $b_k$ is executed. First, $RCS^{OUT}_{b_k}$
can be calculated from $RCS^{IN}_{b_k}$ as
\[ RCS^{OUT}_{b_k} = \left\{ T(b_k, cs) \mid cs \in RCS^{IN}_{b_k} \right\} \tag{13} \]
where $T$ is a transfer function defined as follows. For any
cache state $cs \in CS$ and basic block $b_k \in B$, there is a cache
state $cs' = T(b_k, cs)$, where for any cache line $c_i \in CL$ and
$i \in \{0, 1, \ldots, N_c - 1\}$
\[
cs'[i] =
\begin{cases}
cs[i] & \text{if } gen_{b_k}[i] = \bot \\
gen_{b_k}[i] & \text{otherwise.}
\end{cases}
\tag{14}
\]
$RCS^{IN}_{b_k}$ can be calculated as
\[ RCS^{IN}_{b_k} = \bigcup_{p \in predecessor(b_k)} RCS^{OUT}_{p} \tag{15} \]
where $predecessor(b_k)$ is the set of all immediate predecessors
of $b_k$.
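Equations (13)–(15) translate almost directly into set operations. The sketch below represents a cache state as a tuple of strings; the function and variable names are illustrative choices, not taken from the paper.

```python
TOP, BOT = "⊤", "⊥"  # unknown content / line not touched by the block


def transfer(gen_bk, cs):
    """Transfer function T of (14): a line written by the block takes the
    block's value, every other line keeps its incoming content."""
    return tuple(c if g == BOT else g for g, c in zip(gen_bk, cs))


def rcs_out(gen_bk, states_in):
    """(13): apply T to every cache state that can reach the block."""
    return {transfer(gen_bk, cs) for cs in states_in}


def rcs_in(pred_outs):
    """(15): union of the RCS_OUT sets of all immediate predecessors."""
    result = set()
    for out in pred_outs:
        result |= out
    return result


gen_b0 = ("m0", BOT, BOT, BOT)    # gen vector of b0 from (12)
entry = {(TOP, TOP, TOP, TOP)}    # nothing is known before the entry block
out_b0 = rcs_out(gen_b0, entry)
print(out_b0)
```

Using Python sets removes duplicate cache states automatically, which keeps the later fixed-point iteration finite.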
The RCS computation is composed of two phases: initialization
and fixed-point computation. As illustrated with the
example in Fig. 6, the initialization phase starts from the entry
basic block b0 with $RCS^{IN}_{b_0} = \{[\top, \top, \top, \top]\}$. Each element
is ⊤ since our analysis is independent of the program executed
before b0. According to (13), $RCS^{OUT}_{b_0}$ is calculated to
be {[m0,⊤,⊤,⊤]}. Since b0 is the only immediate predecessor
of b2, $RCS^{IN}_{b_2}$ is equal to $RCS^{OUT}_{b_0}$ based on (15). Due to the
TABLE I
RCS COMPUTATION FOR THE MOTIVATIONAL EXAMPLE
TABLE II
LCS COMPUTATION FOR THE MOTIVATIONAL EXAMPLE
self loop, b1 has both itself and b0 as immediate predecessors.
However, since $RCS^{OUT}_{b_1}$ has not been initialized yet, $RCS^{IN}_{b_1}$ is
equal to $RCS^{OUT}_{b_0}$. In the same manner, we compute $RCS^{OUT}_{b_1}$,
$RCS^{OUT}_{b_2}$, $RCS^{IN}_{b_3}$, and $RCS^{OUT}_{b_3}$, following the program flow
as shown both in Fig. 6 and Table I. The initialization phase
is completed once all basic blocks have been visited.
The next phase is fixed-point computation. $RCS^{IN}$ and
$RCS^{OUT}$ of all basic blocks are computed iteratively
with (13) and (15). This phase is terminated once the fixed
point is reached, i.e., $RCS^{IN}$ and $RCS^{OUT}$ of all basic blocks
remain unchanged. We let the program RCS be the $RCS^{OUT}$ of
the exit basic block, i.e., $RCS = RCS^{OUT}_{b_3}$. Results are reported
in Table I.
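The two phases can be sketched as a single iteration that repeats (13) and (15) until nothing changes. The gen vectors are those of (12). The predecessor lists are an assumption reconstructed from the textual description of Fig. 6 (b0 branches to b1 or b2, b1 has a self loop, and b1 and b2 lead to the exit block b3); the function name is illustrative.

```python
TOP, BOT = "⊤", "⊥"

GEN = {  # gen vectors from (12)
    "b0": ("m0", BOT, BOT, BOT),
    "b1": (BOT, "m1", "m2", "m3"),
    "b2": (BOT, BOT, "m2", "m3"),
    "b3": ("m4", BOT, BOT, BOT),
}
# Assumed CFG edges, reconstructed from the description of Fig. 6.
PRED = {"b0": [], "b1": ["b0", "b1"], "b2": ["b0"], "b3": ["b1", "b2"]}


def transfer(gen_bk, cs):
    """Transfer function T of (14)."""
    return tuple(c if g == BOT else g for g, c in zip(gen_bk, cs))


def reaching_cache_states(gen, pred, entry, exit_block, num_lines):
    """Iterate (13) and (15) over all basic blocks until a fixed point is reached."""
    rcs_in = {b: set() for b in gen}
    rcs_out = {b: set() for b in gen}
    rcs_in[entry] = {(TOP,) * num_lines}  # cache content before the entry is unknown
    changed = True
    while changed:
        changed = False
        for b in gen:
            new_in = set(rcs_in[b])
            for p in pred[b]:
                new_in |= rcs_out[p]                           # (15)
            new_out = {transfer(gen[b], cs) for cs in new_in}  # (13)
            if new_in != rcs_in[b] or new_out != rcs_out[b]:
                rcs_in[b], rcs_out[b] = new_in, new_out
                changed = True
    return rcs_out[exit_block]


rcs = reaching_cache_states(GEN, PRED, "b0", "b3", 4)
print(rcs)
```

Under the assumed edges, the program RCS contains two cache states, which agrees with the count given for the motivational example. The sets only grow during the iteration and the number of distinct cache states is finite, so the loop terminates.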
The LCS computation can be done in a similar fashion.
$gen_{b_k}$ is defined as the cache state describing the first executed
memory block in every cache line for the basic block
$b_k$. Taking the same assumption as in the definition of $gen_{b_k}$ for RCS
computation, namely that b0 in Fig. 6 executes m0 and then m4 instead
of only m0, the first executed memory block in c0 is m0.
Therefore, $gen_{b_0}$ is [m0,⊥,⊥,⊥]. $LCS^{IN}$ of a basic block $b_k$
is the LCS after $b_k$ is executed and can be derived from
\[ LCS^{IN}_{b_k} = \bigcup_{s \in successor(b_k)} LCS^{OUT}_{s} \tag{16} \]
where $successor(b_k)$ is the set of all immediate successors of
$b_k$. $LCS^{OUT}$ of $b_k$ is the LCS before $b_k$ is executed with
\[ LCS^{OUT}_{b_k} = \left\{ T(b_k, cs) \mid cs \in LCS^{IN}_{b_k} \right\}. \tag{17} \]
LCS computation also comprises two phases of initialization
and fixed-point computation. The only difference is that the
initialization phase starts from the exit basic block and ends
in the entry basic block. Detailed results for the motivational
example are reported in Table II. We let the program LCS be
the $LCS^{OUT}$ of the entry basic block, i.e., $LCS = LCS^{OUT}_{b_0}$. It
is noted that since the presented cache analysis technique is
based on the fixed-point computation over the program CFG,
it inherently handles loop structures in the CFG.
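The backward counterpart based on (16) and (17) can be sketched in the same style. Two assumptions are made here: the successor lists are reconstructed from the description of Fig. 6, and the analysis is seeded with an all-⊤ state after the exit block, mirroring the RCS initialization. Since every block of the example touches each cache line at most once, the first-reference gen vectors coincide with (12).

```python
TOP, BOT = "⊤", "⊥"

GEN_FIRST = {  # first reference per cache line; equal to (12) for this example
    "b0": ("m0", BOT, BOT, BOT),
    "b1": (BOT, "m1", "m2", "m3"),
    "b2": (BOT, BOT, "m2", "m3"),
    "b3": ("m4", BOT, BOT, BOT),
}
# Assumed CFG edges, reconstructed from the description of Fig. 6.
SUCC = {"b0": ["b1", "b2"], "b1": ["b1", "b3"], "b2": ["b3"], "b3": []}


def transfer(gen_bk, cs):
    """Transfer function T, applied here in the backward direction."""
    return tuple(c if g == BOT else g for g, c in zip(gen_bk, cs))


def live_cache_states(gen, succ, entry, exit_block, num_lines):
    """Iterate (16) and (17) from the exit block toward the entry block."""
    lcs_in = {b: set() for b in gen}
    lcs_out = {b: set() for b in gen}
    lcs_in[exit_block] = {(TOP,) * num_lines}  # assumed seed: nothing known after exit
    changed = True
    while changed:
        changed = False
        for b in reversed(list(gen)):
            new_in = set(lcs_in[b])
            for s in succ[b]:
                new_in |= lcs_out[s]                           # (16)
            new_out = {transfer(gen[b], cs) for cs in new_in}  # (17)
            if new_in != lcs_in[b] or new_out != lcs_out[b]:
                lcs_in[b], lcs_out[b] = new_in, new_out
                changed = True
    return lcs_out[entry]


lcs = live_cache_states(GEN_FIRST, SUCC, "b0", "b3", 4)
print(lcs)
```

Under these assumptions the program LCS again contains two cache states, one of which is [m0,m1,m2,m3].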
C. Guaranteed WCET Reduction
Conceptually, the program RCS is the set of all possible
cache states after the program finishes execution by any
execution path, and the program LCS is the set of all cache
states, where each cache state contains memory blocks that
may be first referenced after the program starts execution, for
any execution path to follow. Both RCS and LCS could contain
multiple cache states. Each pair with one cache state $cs$
from the program RCS and one cache state $cs'$ from the program
LCS represents one possible execution path between the
two consecutive executions. For any cache line $c_i$ in a pair, if
$cs[i]$ is equal to $cs'[i]$ and they are not equal to ⊤, then there
is certainly a hit and thus WCET reduction. Whether there
is a hit for a particular cache line can be determined by the
function $H$ defined as follows.
$\forall cs \in CS$, $cs' \in CS$ and $c_i \in CL$, where $i \in \{0, 1, \ldots, N_c - 1\}$
\[
H(cs, cs', c_i) =
\begin{cases}
1 & \text{if } cs[i] = cs'[i] \wedge cs[i] \neq \top \\
0 & \text{otherwise.}
\end{cases}
\tag{18}
\]
The number of hits can be counted with the function HT
defined as
$$\forall cs \in CS \text{ and } cs' \in CS$$
$$HT(cs, cs') = \sum_{i=0}^{N_c - 1} H(cs, cs', c_i). \tag{19}$$
The guaranteed number of hits among all possibilities is
calculated as
$$G(\mathrm{RCS}, \mathrm{LCS}) = \min_{cs \in \mathrm{RCS},\, cs' \in \mathrm{LCS}} \left( HT(cs, cs') \right). \tag{20}$$
Given that the main memory access time and the cache
access time are, respectively, tm and tc, the guaranteed WCET
reduction is computed as
$$\bar{E}^{g} = G(\mathrm{RCS}, \mathrm{LCS}) \times (t_m - t_c) \approx G(\mathrm{RCS}, \mathrm{LCS}) \times t_m \tag{21}$$
where the approximation can be taken if tc ≪ tm.
For the motivational example, there are two cache states
in RCS (RCSOUTb3 ) and two cache states in LCS (LCSOUTb0 ).
In total, there are four pairs, and the numbers of hits are
calculated to be 3, 2, 2, and 2 with (19). For instance,
HT ([m4,m1,m2,m3], [m0,m1,m2,m3]) = 3. Therefore, the
guaranteed number of hits is 2 according to (20), no mat-
ter which path the program takes. From (21), the guaranteed
WCET reduction is 2 × (tm − tc), or approximately 2 × tm,
when tc ≪ tm. It is noted that this result is obtained from the
small example used for illustration. We expect more WCET
reduction for larger realistic programs.
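The hit-counting functions in (18)-(21) can be sketched in a few lines of Python. This is only an illustrative sketch: the first RCS/LCS pair is the one quoted above, while the remaining cache states are hypothetical placeholders and not the entries of Table II. The sketch also assumes that a line marked ⊥ or ⊤ never counts as a guaranteed hit.

```python
from itertools import product

INVALID = "⊥"   # empty cache line
UNKNOWN = "⊤"   # assumed marker for a line whose content is not known

def hit(cs, cs_next, i):
    """H in (18): 1 if line i holds the same known block in both states."""
    same = cs[i] == cs_next[i]
    known = cs[i] not in (INVALID, UNKNOWN)
    return 1 if same and known else 0

def hits(cs, cs_next):
    """HT in (19): number of guaranteed-hit cache lines for one pair."""
    return sum(hit(cs, cs_next, i) for i in range(len(cs)))

def guaranteed_hits(rcs, lcs):
    """G in (20): minimum of HT over all (RCS, LCS) pairs."""
    return min(hits(cs, cs_next) for cs, cs_next in product(rcs, lcs))

def wcet_reduction(rcs, lcs, t_m, t_c):
    """Guaranteed WCET reduction as in (21)."""
    return guaranteed_hits(rcs, lcs) * (t_m - t_c)

# Only rcs[0] and lcs[0] are taken from the text; the second state in each
# set is a hypothetical placeholder chosen for illustration.
rcs = [["m4", "m1", "m2", "m3"], ["m4", UNKNOWN, "m2", "m3"]]
lcs = [["m0", "m1", "m2", "m3"], ["m0", "m5", "m2", "m3"]]

print(hits(rcs[0], lcs[0]))          # pair quoted in the text
print(guaranteed_hits(rcs, lcs))     # minimum over all four pairs
```

Because G takes the minimum over every pair, adding cache states to either set can only keep the guarantee or lower it.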
We remark that the direct-mapped cache (i.e., 1-way set-
associative cache) is assumed in Fig. 6. The presented tech-
nique can be adapted to handle set-associative cache. For
example, considering a fully associative cache, when computing
$\mathrm{RCS}^{\mathrm{OUT}}_{b_3}$ from $\mathrm{RCS}^{\mathrm{IN}}_{b_3}$, the memory block m4 can
be loaded to any cache line, which gives $\mathrm{RCS}^{\mathrm{OUT}}_{b_3}$ five
more cache states, i.e., [m0,m4,m2,m3], [m0,m1,m4,m3],
[m0,⊤,m4,m3], [m0,m1,m2,m4], and [m0,⊤,m2,m4]. From
this, we can see that the number of cache states in RCS and
LCS is larger for set-associative cache, which means that the
guaranteed WCET reduction could be smaller. Details can be
found in [10].
Using the memory analysis technique presented in this sec-
tion, together with standard WCET analysis approaches, the
effective WCET of Ci(2) and subsequent executions of Ci can
be derived. A shorter WCET leads to a smaller sampling period of
the control system, which will be exploited in the next section
to achieve better QoC.
V. CONTROLLER DESIGN FOR UNIFORM AND
NONUNIFORM SAMPLING
In this section, we first describe the basics of feedback
control applications under consideration. Then, it is explained
how the controller can be designed for the memory-oblivious
uniform sampling scheme. Lastly, the novel controller design
approach tailored for nonuniform sampling is reported. The
focus of this paper is on the linear state-feedback control of
linear systems. Our proposed memory-aware embedded control
systems design method can also be applied to nonlinear
control laws, such as MPC, and to nonlinear systems.
A. Basics of Feedback Control Applications
1) Plant Dynamics: A control application is responsible
for controlling a plant or dynamic system. In particular, we
consider linear single-input single-output (SISO) control appli-
cations, where the dynamic behavior is modeled by a set of
differential equations
$$\dot{x}(t) = A x(t) + B u(t)$$
$$y(t) = C x(t) \tag{22}$$
where x(t) ∈ Rn is the system state, y(t) is the system output
and u(t) is the control input. The number of system states is
n. There is often an overshoot constraint that y(t) cannot be
larger than a certain value. A, B, and C are system matrices
of appropriate dimensions. These system matrices are physical
properties of the plant. System poles are eigenvalues of A. In
a state-feedback control algorithm, u(t) is computed utilizing
x(t) (feedback signals) and then applied to the plant, which is
expected to achieve certain desired behavior.
2) Discretized Dynamics: As discussed in Section III, in an
embedded implementation platform, the overall control loop
performs three operations: measure x(t), compute u(t), and
apply u(t). Generally, these operations are performed only at
discrete time instants. The system is sampled at the measure
operations. With the sampling period h, the continuous-time
system in (22) can be transformed into a discrete-time system
$$x[k+1] = A_d x[k] + B_d u[k]$$
$$y[k] = C_d x[k] \tag{23}$$
where sampling instants are t = tk (k = 1, 2, 3, . . .) and h =
tk+1 − tk. It is noted that h might not be a constant. x[k] and
u[k] are the values of x(t) and u(t) at t = tk and
$$A_d = e^{Ah}, \quad B_d = \left( \int_0^h e^{At}\,dt \right) \cdot B, \quad C_d = C. \tag{24}$$
Clearly, Ad and Bd are dependent on the sampling period h.
We remark that the WCET of the control program is com-
puted based on the instruction cache analysis and does not
change with the exact model parameters of the system matri-
ces, since the instructions and the cache/main memory access
times remain unchanged.
3) System Controllability: Controllability of a discrete
system is defined as the ability to transfer the system from
any initial state x[0] = x0 to any desired final state x[kf ] = xf .
The controllability matrix is
$$CO = \begin{bmatrix} B_d & A_d B_d & A_d^2 B_d & \cdots & A_d^{n-1} B_d \end{bmatrix}. \tag{25}$$
A system is controllable if and only if CO is invertible, or
equivalently, rank(CO) = n.
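The rank test on (25) can be sketched in plain Python as follows. The two example systems are hypothetical (a discretized double integrator and a diagonal system with an unreachable mode) and are not taken from the case study; the helper names are also illustrative.

```python
def mat_vec(a, v):
    """Matrix-vector product for row-major nested lists."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def controllability_matrix(a_d, b_d):
    """Columns Bd, Ad Bd, ..., Ad^(n-1) Bd as in (25), single input."""
    n = len(a_d)
    cols, col = [], list(b_d)
    for _ in range(n):
        cols.append(col)
        col = mat_vec(a_d, col)
    # Transpose the list of columns into a row-major n x n matrix.
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def rank(m, tol=1e-12):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) <= tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def is_controllable(a_d, b_d):
    return rank(controllability_matrix(a_d, b_d)) == len(a_d)

# Hypothetical examples: the first is controllable, the second is not,
# because the input never reaches the second state.
print(is_controllable([[1.0, 0.1], [0.0, 1.0]], [0.005, 0.1]))
print(is_controllable([[0.5, 0.0], [0.0, 0.8]], [1.0, 0.0]))
```

In the second example the unreachable mode has the eigenvalue 0.8, which lies inside the unit circle, so that system would still be stabilizable.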
4) Feedback Controller: In this paper, we consider regu-
lation problems with constant output reference. That is, the
overall control objective is to make y[k] → r as soon as pos-
sible, where r is the reference value. Toward this, we need
to design u[k] utilizing the states x[k] in a state-feedback
controller. The general structure is as follows:
u[k] = K · x[k] + F · r (26)
where K is the feedback gain and F is the feedforward gain.
In this state-feedback control, the data input of the control
program mainly depends on the system state, which has n
components, with each of them deriving its value from a cer-
tain range. Combining all components, the number of possible
data inputs is very large, even after discretization.
5) Closed-Loop System: With the feedback controller as
shown in (26), the system dynamics in (23) becomes
x[k + 1] = (Ad + BdK)x[k] + BdFr (27)
that is, closed-loop dynamics. Different locations of closed-
loop system poles, i.e., eigenvalues of (Ad + BdK), result
in different system behaviors. In pole-placement, the desired
poles p can be decided with empirical or optimization tech-
niques. This method is feasible since we have freedom to
choose the feedback gain K. All eigenvalues of (Ad + BdK)
must have absolute values of less than unity in order to ensure
system stability [17].
6) Feedback and Feedforward Gain: Once pole locations
are decided, we can construct the following characteristic
equation of z with these poles as roots:
$$z^n + \gamma_1 z^{n-1} + \gamma_2 z^{n-2} + \cdots + \gamma_n = 0. \tag{28}$$
Then we define
$$\gamma_c(A_d) = A_d^n + \gamma_1 A_d^{n-1} + \gamma_2 A_d^{n-2} + \cdots + \gamma_n I \tag{29}$$
where I is the n-dimensional identity matrix. According to
Ackermann’s formula [16], the feedback gain to stabilize the
system is calculated as
$$K = \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix} CO^{-1} \gamma_c(A_d). \tag{30}$$
The static feedforward gain F is designed to achieve y[k] → r
and computed by
$$F = 1 / \left( C_d (I - A_d - B_d K)^{-1} B_d \right). \tag{31}$$
7) Restricted Pole-Placement: If the system is controllable,
i.e., CO has full rank, there is no restriction on pole locations.
The feedback gain K can be determined with (30). If CO does
not have full rank, some of the poles cannot be modified with
any choice of K and thus are uncontrollable. Since CO is
not invertible, (30) does not work, either. In this case, if the
uncontrollable poles are stable (with absolute values of less
than unity), then the system is stabilizable. Restricted pole-
placement can be used for stabilizable systems in the way
that only controllable poles are placed in the desired locations
and uncontrollable poles remain untouched. Therefore, for the
embedded control systems design discussed in this paper, we
require the system to be at least stabilizable, if not controllable.
8) QoC: There are various possible metrics to quan-
tify QoC. In this paper, we consider settling time as our
performance index. The time it takes for the system output
y[k] to reach and stay in a closed region around the reference
value r (e.g., 0.98r to 1.02r) is the settling time of a control
loop. Shorter settling time implies better QoC.
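The settling-time metric can be evaluated on sampled output data as sketched below. The function name, the 2% band default, and the sample data are illustrative assumptions. Time stamps are passed explicitly so that the same routine works for nonuniform sampling.

```python
def settling_time(times, outputs, r, band=0.02):
    """Earliest sample time after which y stays within r*(1 +/- band).

    Returns None if the output has not settled by the last sample.
    """
    lo, hi = sorted((r * (1 - band), r * (1 + band)))
    settled_at = None
    for t, y in zip(times, outputs):
        if lo <= y <= hi:
            if settled_at is None:
                settled_at = t      # candidate: first sample inside the band
        else:
            settled_at = None       # left the band again, discard candidate
    return settled_at

# Hypothetical response for r = 0.3 (band 0.294 to 0.306), times in ms.
times = [0, 1, 2, 3, 4, 5]
outputs = [0.0, 0.2, 0.31, 0.297, 0.301, 0.3]
print(settling_time(times, outputs, 0.3))
```

The overshoot sample 0.31 lies outside the band, so the output only counts as settled from the following sample onward.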
9) Input Saturation: In almost every real-world system,
there is some maximum available input signal and the con-
troller needs to be designed such that the maximum value of
u[k] does not exceed this limit Umax, i.e., u[k] ≤ Umax. For
example, in electric motor control, the magnitude of the input
current is always limited.
B. Controller Design for Uniform Sampling
As can be derived from (2) and (3), for an application Ci
under the conventional memory-oblivious sampling scheme
S1, the constant sampling period h is larger than the constant
sensing-to-actuation delay $\tau^{sa}_i$. Therefore, the discrete-time
system in (23) becomes a sampled-data system [18] as
$$x[k+1] = A_d x[k] + B_1(\tau^{sa}_i)\, u[k-1] + B_0(\tau^{sa}_i)\, u[k] \tag{32}$$
where
$$B_0(\tau^{sa}_i) = \int_0^{h - \tau^{sa}_i} e^{At}\,dt \cdot B, \qquad B_1(\tau^{sa}_i) = \int_{h - \tau^{sa}_i}^{h} e^{At}\,dt \cdot B.$$
In (32), it is assumed that u[ − 1] = 0 for k = 0. We notice
that the system dynamics depends on both u[k] and u[k − 1].
Thus, we define a new system state z[k] = [ x[k] u[k − 1] ]T
and the transformed system becomes
$$z[k+1] = A_{S1} z[k] + B_{S1} u[k]$$
$$y[k] = C_{S1} z[k] \tag{33}$$
where
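$$A_{S1} = \begin{bmatrix} A_d & B_1(\tau^{sa}_i) \\ 0 & 0 \end{bmatrix}, \quad B_{S1} = \begin{bmatrix} B_0(\tau^{sa}_i) & I \end{bmatrix}^T, \quad C_{S1} = \begin{bmatrix} C_d & 0 \end{bmatrix}. \tag{34}$$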
Next, we apply the following input signal:
u[k] = KS1 · z[k] + FS1 · r. (35)
The closed-loop system is then
z[k + 1] = (AS1 + BS1KS1)z[k] + BS1FS1r. (36)
In order to find the poles resulting in the best QoC with
the pole-placement technique, we formulate a constrained
optimization problem. Decision variables are the controllable
closed-loop system poles, i.e., the controllable eigenvalues of
(AS1 + BS1KS1). The optimization objective is the QoC. One
constraint is that the closed-loop system is stable, i.e., the
decision variables have absolute values of less than unity.
The other constraint is the input saturation. This optimiza-
tion problem can be solved by searching poles in the decision
space and accelerated with heuristics, such as evolutionary
algorithms. The feedback gain KS1 is then calculated accord-
ing to (30). The feedforward gain FS1 is computed by (31).
As long as (AS1,BS1) is stabilizable, i.e., uncontrollable poles
have absolute values of less than unity, the above design is
feasible. This design method is adapted from sampled-data
systems literature and is suitable for the systems with known
sensing-to-actuation delay shorter than the sampling period.
C. Controller Design for Nonuniform Sampling
As has been presented, in our proposed memory-aware
embedded control systems design, the execution of the applica-
tions is reordered, such that the cache reuse can be increased.
The increased cache reuse shortens the WCETs of the con-
trol programs and results in a nonuniform sampling order
with a shorter average sampling period. In the following, we
show how the controller is tailored to exploit these shortened
WCETs to improve the QoC.
As discussed in Section III, the sampling period of a control
application Ci in S2 varies as follows:
hi(1)→ hi(2)→ hi(3)→ hi(1)→ hi(2)→ hi(3)→ · · ·
As illustrated in Fig. 7, the dynamics switches periodically
among the following three systems:
$$x[k+1] = A^1_d x[k] + B^1_d u[k] \tag{37}$$
$$x[k+2] = A^2_d x[k+1] + B^2_d u[k+1] \tag{38}$$
$$x[k+3] = A^3_d x[k+2] + B^3_d u[k+2]. \tag{39}$$
The output is then
$$\forall l \in \{1, 2, 3\}: \quad y[k+l-1] = C^l_d x[k+l-1]. \tag{40}$$
Fig. 7. Periodically switched sampling periods for Ci in S2.
Among them
$$A^l_d = e^{A h_i(l)}, \quad B^l_d = \left( \int_0^{h_i(l)} e^{At}\,dt \right) \cdot B, \quad C^l_d = C. \tag{41}$$
Next, we need to design u[k], u[k+ 1], and u[k+ 2] utilizing
the available feedback signals.
1) Design of u[k]: The latest feedback signal available for
computation of u[k] is x[k − 1]. Therefore, the input u[k] is
designed as follows:
$$u[k] = K^1_1 x[k-1] + F^1_1 r. \tag{42}$$
We define the new system state as
$$z[k] = \begin{bmatrix} x[k] \\ x[k-1] \end{bmatrix}. \tag{43}$$
The system in (37) is then transformed into the following:
$$z[k+1] = A^1_1 z[k] + B^1_1 u[k]$$
$$y[k] = C^1_1 z[k] \tag{44}$$
where
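$$A^1_1 = \begin{bmatrix} A^1_d & \mathbf{0} \\ I & \mathbf{0} \end{bmatrix}, \quad B^1_1 = \begin{bmatrix} B^1_d & \mathbf{0} \end{bmatrix}^T, \quad C^1_1 = \begin{bmatrix} C^1_d & \mathbf{0} \end{bmatrix}. \tag{45}$$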
The bold letter 0 denotes the zero matrix of an appropriate
dimension. With the input in (42), the closed-loop dynamics
of the system (37) in the sampling period hi(1) becomes
$$z[k+1] = A^1_{cl} z[k] + B^1_1 F^1_1 r \tag{46}$$
where
$$A^1_{cl} = \begin{bmatrix} A^1_d & B^1_d K^1_1 \\ I & \mathbf{0} \end{bmatrix}. \tag{47}$$
The design of F11 will be discussed later. As analyzed in
Section V-A, the closed-loop system must be stabilizable, i.e.,
uncontrollable poles of A1cl must have absolute values of less
than unity. Decision variables are controllable poles of A1cl and
used to compute $K^1_1$. Details of the design method for $K^1_1$ with
delayed input can be found in [19].
2) Design of u[k + 1]: With the augmented state shown
in (43), the system dynamics in (38) becomes
$$z[k+2] = A^2_1 z[k+1] + B^2_1 u[k+1]$$
$$y[k+1] = C^2_1 z[k+1] \tag{48}$$
where
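$$A^2_1 = \begin{bmatrix} A^2_d & \mathbf{0} \\ I & \mathbf{0} \end{bmatrix}, \quad B^2_1 = \begin{bmatrix} B^2_d & \mathbf{0} \end{bmatrix}^T, \quad C^2_1 = \begin{bmatrix} C^2_d & \mathbf{0} \end{bmatrix}. \tag{49}$$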
By replacing z[k + 1] in (48) with (46), we obtain
$$z[k+2] = A^2_1 A^1_{cl} z[k] + A^2_1 B^1_1 F^1_1 r + B^2_1 u[k+1]. \tag{50}$$
Since the latest feedback signal available for the computation
of u[k + 1] is x[k] as shown in Fig. 7, we use the following
input signal in the sampling period hi(2):
$$u[k+1] = K^2_1 z[k] + F^2_1 r. \tag{51}$$
The overall closed-loop dynamics after the sampling period
h1(2) becomes
$$z[k+2] = A^2_{cl} z[k] + A^2_1 B^1_1 F^1_1 r + B^2_1 F^2_1 r \tag{52}$$
where
$$A^2_{cl} = A^2_1 A^1_{cl} + B^2_1 K^2_1. \tag{53}$$
Again the closed-loop system has to be stabilizable. Therefore,
uncontrollable poles of $A^2_{cl}$ must have absolute values of less than unity.
Decision variables are controllable poles of $A^2_{cl}$ and used to
compute $K^2_1$ according to (30). The design of $F^2_1$ will be
discussed later.
3) Design of u[k + 2]: With the augmented state shown
in (43), the system dynamics in (39) becomes
$$z[k+3] = A^3_1 z[k+2] + B^3_1 u[k+2]$$
$$y[k+2] = C^3_1 z[k+2] \tag{54}$$
where
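$$A^3_1 = \begin{bmatrix} A^3_d & \mathbf{0} \\ I & \mathbf{0} \end{bmatrix}, \quad B^3_1 = \begin{bmatrix} B^3_d & \mathbf{0} \end{bmatrix}^T, \quad C^3_1 = \begin{bmatrix} C^3_d & \mathbf{0} \end{bmatrix}. \tag{55}$$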
By replacing z[k + 2] in (54) with (52), we obtain
$$z[k+3] = A^3_1 A^2_{cl} z[k] + B^3_1 u[k+2] + A^3_1 A^2_1 B^1_1 F^1_1 r + A^3_1 B^2_1 F^2_1 r. \tag{56}$$
For the above system, we use the following input signal in the
sampling period hi(3):
$$u[k+2] = K^3_1 z[k] + F^3_1 r. \tag{57}$$
The overall closed-loop dynamics after the sampling period
h1(3) becomes
$$z[k+3] = A^3_{cl} z[k] + A^3_1 A^2_1 B^1_1 F^1_1 r + A^3_1 B^2_1 F^2_1 r + B^3_1 F^3_1 r \tag{58}$$
where
$$A^3_{cl} = A^3_1 A^2_{cl} + B^3_1 K^3_1. \tag{59}$$
In order for the closed-loop system to be stabilizable, uncontrollable
poles of $A^3_{cl}$ must have absolute values of less than unity.
Decision variables are controllable poles of $A^3_{cl}$ and used to
compute $K^3_1$ according to (30). To sum up, the pole-placement
is possible and the feedback gains $K^1_1$, $K^2_1$, and $K^3_1$ can be
designed using the presented technique as long as $A^1_{cl}$, $A^2_{cl}$,
and $A^3_{cl}$ are stabilizable.
The pole-placement can be used to find the values of all
decision variables, i.e., controllable poles of $A^1_{cl}$, $A^2_{cl}$, and $A^3_{cl}$,
resulting in the optimal QoC while ensuring the stability and
respecting the input saturation, either empirically or with opti-
mization techniques. Feedback gains can then be computed as
discussed before. In this paper, poles are empirically placed,
Fig. 8. Experimental setup of servo motor position control.
since our focus is to show the benefits of memory-aware
embedded control systems design. It is possible to formulate
the pole-placement optimization problem for nonuniform sam-
pling, similar to the case of uniform sampling in Section V-B.
However, unlike uniform sampling, the number of dimensions
in the decision space increases linearly with the number of
consecutive executions in this nonuniform sampling order. It
is challenging to solve such a nonlinear and nonconvex opti-
mization problem with large decision space, which can be part
of future research.
4) Design of $F^1_1$, $F^2_1$, and $F^3_1$: The overall system dynamics
for the entire cycle of three sampling periods hi(1), hi(2), and
hi(3) can be obtained with (46), (52), and (58)
$$\begin{bmatrix} z[k+1] \\ z[k+2] \\ z[k+3] \end{bmatrix} = A_{cl} \cdot \begin{bmatrix} z[k] \\ z[k] \\ z[k] \end{bmatrix} + B_{cl} \cdot F_{S2} \cdot r$$
$$\begin{bmatrix} y[k] \\ y[k+1] \\ y[k+2] \end{bmatrix} = C_{cl} \cdot \begin{bmatrix} z[k] \\ z[k+1] \\ z[k+2] \end{bmatrix} \tag{60}$$
where
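$$A_{cl} = \begin{bmatrix} A^1_{cl} & \mathbf{0} & \mathbf{0} \\ A^2_{cl} & \mathbf{0} & \mathbf{0} \\ A^3_{cl} & \mathbf{0} & \mathbf{0} \end{bmatrix}, \quad B_{cl} = \begin{bmatrix} B^1_1 & \mathbf{0} & \mathbf{0} \\ A^2_1 B^1_1 & B^2_1 & \mathbf{0} \\ A^3_1 A^2_1 B^1_1 & A^3_1 B^2_1 & B^3_1 \end{bmatrix}$$
$$C_{cl} = \begin{bmatrix} C^1_1 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & C^2_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & C^3_1 \end{bmatrix}, \quad F_{S2} = \begin{bmatrix} F^1_1 & F^2_1 & F^3_1 \end{bmatrix}^T. \tag{61}$$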
Similar to the computation of the feedforward gain in (31),
FS2 can be calculated as follows:
$$F_{S2} = \left( C_{cl} (I - A_{cl})^{-1} B_{cl} \right)^{-1} \times \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}^T. \tag{62}$$
This controller design method is illustrated with the con-
trol application C1 under the sampling order S2 and can be
applied for all stabilizable linear SISO control applications
and memory-aware nonuniform sampling schemes.
As can be seen above, these gain values are determined in
the controller design involving sampling periods and sensing-
to-actuation delays, which are constrained by the WCETs.
The cache state contains the instructions to be executed and
changes as the program runs. Hence, the WCETs of the control
programs are related to the cache states. The control system
state changes depending on the control input, which is computed
by the control program, based on the data input, gain
values, and the instructions that are executed in the cache state.
Therefore, there is an indirect link that relates the cache states
to the system states.
TABLE III
EXPERIMENTAL CONFIGURATION FOR MEMORY ANALYSIS
VI. EXPERIMENTAL RESULTS
In this paper, we consider a typical automotive-use con-
troller, equipped with a processor, on-chip memory as cache
and flash as main memory, shown in Fig. 1 (see footnote 1). As a case study,
we work on three control applications: C1, C2, and C3. C1 is
position control of a servo motor that can be used, e.g., in a
steer-by-wire system [20]. The experimental setup is shown
in Fig. 8. C2 is speed control of a dc motor that can be used
in electric vehicle cruise control [21]. C3 is control of the
electronic wedge brake system developed by Siemens as a
brake-by-wire solution [22]. All three control applications run
on the same processor.
In this section, for the two sampling orders S1 and S2 as
described in Section III, first, the experimental configuration
and time stamps based on the analysis shown in Section IV
are presented. Then, the memory-oblivious and memory-aware
controller designs are discussed, with the servo motor position
control C1 as an example. Lastly, comparison in QoC of S1
and S2 for all three applications is reported.
A. Memory Analysis Experimental Configuration
and WCET Results
As shown in Table III, the processor clock frequency is
20 MHz. The cache is set to have 128 cache lines and each
cache line is 16 bytes. When there is a cache hit, it takes
1 clock cycle to fetch the instruction and when there is a cache
miss, it takes 100 clock cycles. Based on standard WCET anal-
ysis techniques applied to the control programs, the WCETs
of all three applications without any cache reuse in S1 are
computed to be
$$E^{wc}_1 = 907.55\ \mu s, \quad E^{wc}_2 = 645.25\ \mu s, \quad E^{wc}_3 = 749.15\ \mu s.$$
Thus, the uniform sampling period as in (2) is
$$h = \sum_{i=1,2,3} E^{wc}_i = 2301.95\ \mu s.$$
The constant sensing-to-actuation delay $\tau^{sa}_i$ is given by (3)
$$\forall i \in \{1, 2, 3\}: \quad \tau^{sa}_i = E^{wc}_i.$$
In S2, according to (4)
$$\forall i \in \{1, 2, 3\}: \quad \bar{E}^{wc}_i(1) = E^{wc}_i.$$
Footnote 1: For instance, the Infineon XC23xxB Series has a single processor with a
minimum operating frequency of 20 MHz. It is typically equipped with a small
on-chip SRAM and up to 256 kB of flash memory.
TABLE IV
WCET RESULTS WITH AND WITHOUT CACHE REUSE FOR ALL THREE CONTROL APPLICATIONS
Fig. 9. Time stamps of both sampling orders S1 and S2 based on WCET results.
Based on the memory analysis approach presented in
Section IV, the guaranteed numbers of cache hits for the three
applications are
G1 = 92, G2 = 95, G3 = 104.
From the memory configuration in Table III, the cache access
time is 1 processor clock cycle, i.e., tc = 0.05 µs, and the
main memory access time is 100 processor clock cycles, i.e.,
tm = 5 µs. Therefore, according to (21), the guaranteed WCET
reduction due to cache reuse for C1 can be calculated as
$$\bar{E}^{g}_1 = G_1 \times (t_m - t_c) = 92 \times (5 - 0.05)\ \mu s = 455.4\ \mu s.$$
Similarly, we have
$$\bar{E}^{g}_2 = 470.25\ \mu s, \quad \bar{E}^{g}_3 = 514.8\ \mu s.$$
According to (5), we can compute the reduced effective
WCETs to be
$$\forall j \in \{2, 3\}: \quad \bar{E}^{wc}_1(j) = 452.15\ \mu s, \quad \bar{E}^{wc}_2(j) = 175\ \mu s, \quad \bar{E}^{wc}_3(j) = 234.35\ \mu s.$$
Then the sampling periods can be obtained. Taking C1 as an
example, using (6) and (7)
h1(1) = 907.55 µs, h1(2) = 452.15 µs, h1(3) = 2665.25 µs.
As in (8), the average sampling period of all three applications
in S2 is calculated to be 1341.65 µs with 42% of reduc-
tion compared to the uniform sampling period in S1. The
corresponding sensing-to-actuation delay $\tau^{sa}_i(j)$ is obtained
with (11). WCET results with and without cache reuse for
all three applications are shown in Table IV. Time stamps of
both S1 and S2 are presented in Fig. 9.
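The timing figures in this subsection can be reproduced with a few lines of arithmetic. The sketch below assumes that, in S2, each application is executed three times back to back (C1 three times, then C2, then C3), which is how the reported h1(3) decomposes; the variable names are illustrative.

```python
CLOCK_MHZ = 20.0
t_c = 1 / CLOCK_MHZ      # cache hit: 1 clock cycle, in microseconds
t_m = 100 / CLOCK_MHZ    # cache miss: 100 clock cycles, in microseconds

wcet = {1: 907.55, 2: 645.25, 3: 749.15}   # WCET without reuse, in us
g_hits = {1: 92, 2: 95, 3: 104}            # guaranteed cache hits G_i

# Guaranteed WCET reduction (21) and reduced effective WCET per application.
reduction = {i: g_hits[i] * (t_m - t_c) for i in wcet}
wcet_reuse = {i: wcet[i] - reduction[i] for i in wcet}

# Uniform sampling period of S1: one execution of each application.
h_uniform = sum(wcet.values())

# Assumed S2 cycle: three consecutive runs per application.
runs = 3
others = sum(wcet[i] + (runs - 1) * wcet_reuse[i] for i in (2, 3))
h1 = [wcet[1], wcet_reuse[1], wcet_reuse[1] + others]

h_avg = sum(h1) / runs
saving = 1 - h_avg / h_uniform

print([round(v, 2) for v in h1], round(h_avg, 2), round(100 * saving))
```

The third period of C1 is long because it spans the last run of C1 plus all runs of C2 and C3, which is why the average, not each individual period, is shorter than in S1.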
B. Control Applications
We evaluate QoC of all three control applications to show
the effectiveness of our proposed method. Details of memory-
oblivious and memory-aware controller design are presented
with C1 as an example. Controllers of C2 and C3 are designed
with the same method. For C1, as shown in Fig. 8, the shaft of
the servo motor (Harmonic Drive) [23] is attached to a rigid
stick with a 300 g weight at the end. The position of the motor
shaft is measured by digital quadrature encoders attached to
the motor shaft. The motor provides a desired amount of torque
(computed by the control program) using a digital-to-analog
converter (DAC) via a servo amplifier (Maxon Motor) [24].
1) Control Objective: The above system of servo motor
position control has two states: x1(t), the angular position
and x2(t), the angular velocity of the shaft. Initially, they are
both set to be 0. The measure operation reads the quadrature
encoder to obtain x1(t) and x2(t); the compute operation exe-
cutes the control program and computes the input current u(t)
for the motor and the actuate operation is performed using
the DAC. The control objective is to keep the rigid load at
the angular position of 0.3 rad, i.e., r = 0.3 rad. Since the
output is the angular position x1(t), we have y(t) = x1(t). The
dynamics of the above system is represented as in (22) with
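$$A = \begin{bmatrix} 0 & 1 \\ 37 & -7.5 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 6450 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 \end{bmatrix}.$$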
For C2, the control objective is for the dc motor to reach a
certain speed. For C3, the control objective is to achieve a
certain braking force.
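The discretization in (24) can be checked numerically for C1 with a truncated power series, as sketched below in plain Python. The sketch assumes that the (2,2) entry of A is −7.5, the sign that is consistent with the discretized matrices reported later for C1, and uses h = 907.55 µs, the first sampling period of C1 in S2. Function and variable names are illustrative.

```python
def mat_mul(a, b):
    """Product of row-major nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def discretize(a, b, h, terms=20):
    """Return (Ad, Bd) of (24) from truncated power series.

    Ad = sum_k (A h)^k / k!
    Bd = (sum_k A^k h^(k+1) / (k+1)!) B
    """
    n = len(a)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    a_d = [[0.0] * n for _ in range(n)]
    gamma = [[0.0] * n for _ in range(n)]   # integral of e^{At} over [0, h]
    for k in range(terms):
        for i in range(n):
            for j in range(n):
                a_d[i][j] += term[i][j]
                gamma[i][j] += term[i][j] * h / (k + 1)
        # Advance term from (A h)^k / k! to (A h)^(k+1) / (k+1)!.
        term = [[x * h / (k + 1) for x in row] for row in mat_mul(term, a)]
    b_d = [sum(gamma[i][j] * b[j] for j in range(n)) for i in range(n)]
    return a_d, b_d

A = [[0.0, 1.0], [37.0, -7.5]]   # (2,2) sign is an assumption, see above
B = [0.0, 6450.0]
h1 = 907.55e-6                   # seconds

A1d, B1d = discretize(A, B, h1)
for row in A1d:
    print([round(v, 4) for v in row])
print([round(v, 4) for v in B1d])
```

Because the sampling period is much shorter than the plant time constants, the series converges after a handful of terms, and Ad stays close to the identity matrix.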
2) Input Saturation: In C1, the maximum current that
the servo amplifier can supply to the motor is 1.5 ampere.
Therefore, |u(t)| ≤ 1.5 A must be respected by the controller.
In C2, the input is the effective voltage of the battery pack that
is used to power up the dc motor and cannot exceed 600 V. In
C3, the control input is the power output to the brake wedge
and the saturation is at 36 kW.
C. Controller Design for the Memory-Oblivious Uniform
Sampling Scheme
As described in Section III, S1 is the conventional memory-
oblivious sampling order. For C1, based on the timing results
in Section VI-A, we obtain AS1 and BS1 as in (34)
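$$A_{S1} = \begin{bmatrix} 1.0001 & 0.0023 & 0.0107 \\ 0.0844 & 0.9830 & 5.7764 \\ 0 & 0 & 0 \end{bmatrix}, \quad B_{S1} = \begin{bmatrix} 0.0062 \\ 8.9446 \\ 1 \end{bmatrix}, \quad C_{S1} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}.$$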
By calculating the controllability matrix as in (25), (AS1,BS1)
is controllable. Therefore, pole-placement is possible and we
choose the pole locations by solving the optimization problem
p = [ 0.19 0.63 0.58 ].
TABLE V
FEEDBACK AND FEEDFORWARD GAINS FOR ALL THREE CONTROL APPLICATIONS
With the above choice of poles, we obtain the controller gains
as in (30) and (31)
$$K_{S1} = \begin{bmatrix} -3.7212 & -0.0432 & -0.1734 \end{bmatrix}, \quad F_{S1} = 3.7145.$$
Controllers of C2 and C3 are designed in the same way.
Feedback and feedforward gains for all three applications
under S1 are summarized in Table V.
D. Controller Design for Memory-Aware Nonuniform
Sampling Order
We consider S2 described in Section III as an example
memory-aware sampling order. The controller design of C1
is illustrated below.
1) Design of K11 : With delayed input in (42), the augmented
system matrices in (45) are given by
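$$A^1_1 = \begin{bmatrix} 1 & 0.0009 & 0 & 0 \\ 0.0335 & 0.9932 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$
$$B^1_1 = \begin{bmatrix} 0.0027 & 5.8367 & 0 & 0 \end{bmatrix}^T, \quad C^1_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}.$$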
The system $(A^1_1, B^1_1)$ is stabilizable with a delayed input
in (42). The closed-loop system poles are chosen to be
$$p_{1,1} = \begin{bmatrix} 0.86 & 0.51 & 0.6232 \end{bmatrix}$$
and
$$K^1_1 = \begin{bmatrix} -4.8825 & -0.049 \end{bmatrix}.$$
The closed-loop system matrix, as shown in (47), is given by
$$A^1_{cl} = \begin{bmatrix} 1 & 0.0009 & -0.013 & -0.0001 \\ 0.0335 & 0.9932 & -28.4978 & -0.2863 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.$$
2) Design of K21 : With input in (51), the augmented system
matrices in (50) are given by
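$$A^2_1 A^1_{cl} = \begin{bmatrix} 1 & 0.0014 & -0.0258 & -0.0003 \\ 0.0501 & 0.9899 & -28.4016 & -0.2853 \\ 1 & 0.0009 & -0.013 & -0.0001 \\ 0.0335 & 0.9932 & -28.4978 & -0.2863 \end{bmatrix}$$
$$B^2_1 = \begin{bmatrix} 0.0007 & 2.9105 & 0 & 0 \end{bmatrix}^T, \quad C^2_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}.$$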
It can be verified that $(A^2_1 A^1_{cl}, B^2_1)$ is not controllable but
stabilizable since the uncontrollable pole is located at 0. The
controllable poles are placed at
$$p_{1,2} = \begin{bmatrix} 0.77 & 0.33 & 0.78 \end{bmatrix}.$$
The above location of poles leads to the following feedback gain:
$$K^2_1 = \begin{bmatrix} -0.4174 & 0.0651 & -0.4174 & -0.0855 \end{bmatrix}.$$
Next, the closed-loop system matrix A2cl after the sampling
period h1(2) can be computed as shown in (53)
$$A^2_{cl} = \begin{bmatrix} 0.9998 & 0.0014 & -0.0261 & -0.0003 \\ -1.1649 & 1.1795 & -29.6166 & -0.5341 \\ 1 & 0.0009 & -0.013 & -0.0001 \\ 0.0335 & 0.9932 & -28.4978 & -0.2863 \end{bmatrix}.$$
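The block structure in (47) and the update in (53) can be checked against the numbers reported above with the following sketch. All matrix entries are copied from this section with their four-decimal rounding, so the computed results are only expected to agree with the reported matrices to about three decimal places; the helper names are illustrative.

```python
def outer(col, row):
    """Outer product of a column vector and a row vector."""
    return [[c * r for r in row] for c in col]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Reported values for C1 in the first sampling period.
A1d = [[1.0, 0.0009], [0.0335, 0.9932]]
B1d = [0.0027, 5.8367]
K11 = [-4.8825, -0.049]

# (47): A1cl = [[A1d, B1d K11], [I, 0]]
bk = outer(B1d, K11)
A1cl = [A1d[0] + bk[0],
        A1d[1] + bk[1],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0]]

# (53): A2cl = A21 A1cl + B21 K21, starting from the reported product.
A21_A1cl = [[1.0, 0.0014, -0.0258, -0.0003],
            [0.0501, 0.9899, -28.4016, -0.2853],
            [1.0, 0.0009, -0.013, -0.0001],
            [0.0335, 0.9932, -28.4978, -0.2863]]
B21 = [0.0007, 2.9105, 0.0, 0.0]
K21 = [-0.4174, 0.0651, -0.4174, -0.0855]
A2cl = mat_add(A21_A1cl, outer(B21, K21))

A1cl_reported = [[1.0, 0.0009, -0.013, -0.0001],
                 [0.0335, 0.9932, -28.4978, -0.2863],
                 [1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0]]
A2cl_reported = [[0.9998, 0.0014, -0.0261, -0.0003],
                 [-1.1649, 1.1795, -29.6166, -0.5341],
                 [1.0, 0.0009, -0.013, -0.0001],
                 [0.0335, 0.9932, -28.4978, -0.2863]]

print(max_abs_diff(A1cl, A1cl_reported) < 1e-3)
print(max_abs_diff(A2cl, A2cl_reported) < 1e-3)
```

Only the first two rows change in the update, because the last two entries of B21 are zero: the delayed-state rows of the augmented system are not driven by the input directly.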
3) Design of K31 : With input in (57), the augmented system
matrices in (56) are given by
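$$A^3_1 A^2_{cl} = \begin{bmatrix} 0.9968 & 0.0045 & -0.1042 & -0.0017 \\ -1.0444 & 1.1564 & -29.0371 & -0.5236 \\ 0.9998 & 0.0014 & -0.0261 & -0.0003 \\ -1.1649 & 1.1795 & -29.6166 & -0.5341 \end{bmatrix}$$
$$B^3_1 = \begin{bmatrix} 0.0227 & 17.013 & 0 & 0 \end{bmatrix}^T, \quad C^3_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}.$$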
Here, $(A^3_1 A^2_{cl}, B^3_1)$ is not controllable but stabilizable since the
uncontrollable pole is located at 0. The controllable poles are
placed at
$$p_{1,3} = \begin{bmatrix} 0.12 & 0.08 & 0.3 \end{bmatrix}.$$
The above location of poles leads to the following feedback
gain:
$$K^3_1 = \begin{bmatrix} -1.9086 & -0.0617 & -1.9086 & 0.0198 \end{bmatrix}.$$
Next, the closed-loop system matrix A3cl after the sampling
period h1(3) as shown in (59) is
$$A^3_{cl} = \begin{bmatrix} 0.9534 & 0.0031 & -0.1476 & -0.0013 \\ -33.5156 & 0.1068 & -61.5073 & -0.1875 \\ 0.9998 & 0.0014 & -0.0261 & -0.0003 \\ -1.1649 & 1.1795 & -29.6166 & -0.5341 \end{bmatrix}.$$
The feedforward gain is computed by (62) as
$$F_{S2} = \begin{bmatrix} 4.8767 & 0.8291 & 3.8114 \end{bmatrix}^T.$$
Controllers of C2 and C3 are designed with the same method.
Feedback and feedforward gains of all three applications under
S2 are summarized in Table V. It can be seen that the gains of
C2 are small and that the gains of C3 are large. This difference
is mainly due to the fact that: 1) different control applications
have different control input saturations and 2) the system state
values of different control applications are in different ranges.
Fig. 10. Control system output of three different sampling orders for the
control application C1.
Fig. 11. Control system output of three different sampling orders for the
control application C2.
E. QoC Improvement of All Control Applications
In order to show that S2 is not the only sampling order that
is able to achieve better QoC, besides S1 and S2 introduced
in Section III, we evaluate the QoC of another memory-aware
sampling order S3 in the experiment
S3: C1(1)→ C1(2)→ C2(1)→ C2(2)→ C2(3)→ C3(1)→
C3(2)→ C3(3)→ C3(4)→ · · ·
The control system outputs of all three sampling orders for
all three applications are presented in Figs. 10–12. QoC com-
parison among S1, S2, and S3 for C1, C2, and C3 is reported
in Table VI. It can be seen that both memory-aware sam-
pling orders improve the QoC compared to the conventional
memory-oblivious sampling order in all control applications.
By comparing S2 and S3, S2 results in better QoC in C1,
but S3 achieves better QoC in C2 and C3. Neither of them
dominates the other in all applications. Intuitively, the more
frequently an application is sampled, the better the performance
that can be achieved. For C1, the average sampling period in S2
is 1341.65 µs as computed in Section VI-A, 29.5% smaller
than the average sampling period in S3 (1903.58 µs), which
can be computed from the WCET results reported in Table IV.
Therefore, QoC of C1 in S2 is better than that in S3. Similarly,
the average sampling period in S3 for C3 is 951.79 µs, 29.1%
smaller than the average sampling period in S2. Therefore,
QoC of C3 in S3 is better than that in S2. For C2, the aver-
age sampling period in S3 is 1269.05 µs, which is close to
the average sampling period in S2. The evaluation shows
that the QoC in S3 is better than that in S2. Finding the
optimal sampling order that maximizes the system QoC is an
interesting problem that is left for future work.
With the optimal sampling order, the QoC improvement
Fig. 12. Control system output of three different sampling orders for the
control application C3.
TABLE VI
QOC COMPARISON FOR ALL THREE APPLICATIONS
AMONG THREE SAMPLING ORDERS
from memory-aware embedded control systems design can be
even higher than what is presented in this paper.
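The average sampling periods quoted in this comparison follow directly from the WCET values of Section VI-A. The sketch below assumes that every application is sampled once per execution and that one cycle of a sampling order lasts the sum of the effective WCETs of all executions in it; the function name and the dictionary layout are illustrative.

```python
wcet_first = {"C1": 907.55, "C2": 645.25, "C3": 749.15}   # first run, in us
wcet_reuse = {"C1": 452.15, "C2": 175.0, "C3": 234.35}    # later runs, in us

def average_periods(runs):
    """Average sampling period per application for one periodic order.

    runs maps each application to its number of consecutive executions
    in one cycle of the sampling order.
    """
    cycle = sum(wcet_first[a] + (n - 1) * wcet_reuse[a]
                for a, n in runs.items())
    return {a: cycle / n for a, n in runs.items()}

s2 = average_periods({"C1": 3, "C2": 3, "C3": 3})
s3 = average_periods({"C1": 2, "C2": 3, "C3": 4})

for name in ("C1", "C2", "C3"):
    print(name, round(s2[name], 2), round(s3[name], 2))
```

S3 has a shorter cycle than S2 overall, but C1 is sampled only twice per cycle, which is why its average period is longer in S3 while those of C2 and C3 are shorter.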
VII. CONCLUSION
This paper follows the line of work considering imple-
mentation platforms during the design of embedded control
systems in order to improve QoC and reduce the deviation
between the design and implementation phases. In particu-
lar, we investigate the impact of a memory-aware sampling
order on control systems. As the first paper considering
memory in embedded control systems design, we specifically
exploit the reuse of a direct-mapped cache; the presented
technique can be extended to handle other types of cache
(e.g., set-associative cache) and memory (e.g., scratchpad).
The proposed framework can be used to evaluate the impact
of other memory-related problems, such as cache locking and
scratchpad allocation, on control applications. The benefits of
using a cache-aware controller will be even greater for computationally
more expensive control algorithms, such as MPC.
However, the goal of this paper is to make a case for cache
memory-aware controller design. How such memory-aware
controllers may be designed for different classes of control
algorithms is, we believe, a topic for future papers, including
those from other research groups.
From the control-theoretic perspective, the solution
proposed in this paper is targeting regulation problems with
constant output reference. As has been presented, we use static
feedforward gain to regulate the control output. A time-varying
reference (e.g., sinusoidal) requires a tracking control algo-
rithm, which uses dynamic feedforward gain to reflect the
variation in reference with time. A meaningful QoC for a
tracking problem is the sum of error over time. Due to the
nonuniform nature of sampling in this paper, further investi-
gation is required for a suitable tracking algorithm, which is
a part of future work.
Having shown that the memory-aware sampling scheme does
have considerable influence on control system performance,
the next problem to solve is finding the optimal sampling order
that maximizes the system QoC. This problem consists of two stages.
First, given a sampling order, we need to find the optimal poles that
maximize the system QoC while respecting the input saturation
constraint. This is a challenging problem, due to
the nonconvexity, nonlinearity, and high-dimensional decision
space discussed earlier. Second, based on the results from
the first stage, the optimal sampling order needs to be found.
As the number of applications grows, the number of possible
periodic sampling orders increases exponentially. Considering
that the QoC evaluation of one sampling order is computationally
heavy, a brute-force search is practically infeasible. Heuristic
methods need to be developed that can achieve a flexible
balance between optimality and scalability. This is left as
future work along the research path of memory-aware
embedded control systems design.
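To illustrate how quickly the search space grows, the following sketch
counts candidate sampling orders under a simplified model that is an
assumption of this example, not the paper's formal definition: one period
is a sequence of a fixed number of slots, each slot runs one application,
and every application must appear at least once. Cyclic rotations of the
same order are not merged, so the numbers are upper bounds on distinct
periodic orders. All function names are hypothetical.

```python
from itertools import product
from math import comb


def count_orders_brute_force(num_apps: int, length: int) -> int:
    """Enumerate all slot assignments and keep those using every application."""
    return sum(
        1
        for order in product(range(num_apps), repeat=length)
        if len(set(order)) == num_apps
    )


def count_orders_closed_form(num_apps: int, length: int) -> int:
    """Same count via inclusion-exclusion (number of surjections)."""
    return sum(
        (-1) ** k * comb(num_apps, k) * (num_apps - k) ** length
        for k in range(num_apps + 1)
    )


if __name__ == "__main__":
    # Assumed setup: three slots per application in one period.
    for n in (2, 3, 4, 5):
        print(n, "applications:", count_orders_closed_form(n, 3 * n), "orders")
```

Since each candidate order would additionally require the pole
optimization of the first stage, even these small cases suggest why an
exhaustive search does not scale.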
REFERENCES
[1] L. Greco, D. Fontanelli, and A. Bicchi, “Design and stability analysis for
anytime control via stochastic scheduling,” IEEE Trans. Autom. Control,
vol. 56, no. 3, pp. 571–585, Mar. 2011.
[2] S. Samii, P. Eles, Z. Peng, P. Tabuada, and A. Cervin, “Dynamic
scheduling and control-quality optimization of self-triggered control
applications,” in Proc. RTSS, San Diego, CA, USA, 2010, pp. 95–104.
[3] A. Anta and P. Tabuada, “On the benefits of relaxing the periodicity
assumption for networked control systems over CAN,” in Proc. RTSS,
Washington, DC, USA, 2009, pp. 3–12.
[4] K. W. Batcher and R. A. Walker, “Dynamic round-robin task scheduling
to reduce cache misses for embedded systems,” in Proc. DATE, Munich,
Germany, 2008, pp. 260–263.
[5] Infineon. (2009). Product Brief. [Online]. Available:
http://www.infineon.com/dgdl/Pb_XC2300B.pdf?fileId=db3a30432a7fedfc012ab3c3d7863706
[6] S. G. Robertz, D. Henriksson, and A. Cervin, “Memory-aware feedback
scheduling of control tasks,” in Proc. ETFA, Prague, Czech Republic,
2006, pp. 70–77.
[7] R. Wilhelm et al., “Memory hierarchies, pipelines, and buses for
future architectures in time-critical embedded systems,” IEEE Trans.
Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 7, pp. 966–978,
Jul. 2009.
[8] H. S. Negi, T. Mitra, and A. Roychoudhury, “Accurate estimation of
cache-related preemption delay,” in Proc. CODES, 2003, pp. 201–206.
[9] S. Chakraborty, T. Mitra, A. Roychoudhury, and L. Thiele, “Cache-aware
timing analysis of streaming applications,” Real Time Syst., vol. 41,
no. 1, pp. 52–85, 2009.
[10] J. C. Kleinsorge, H. Falk, and P. Marwedel, “A synergetic approach to
accurate analysis of cache-related preemption delay,” in Proc. EMSOFT,
Taipei, Taiwan, 2011, pp. 329–338.
[11] H. Lin and P. J. Antsaklis, “Stability and stabilizability of switched
linear systems: A survey of recent results,” IEEE Trans. Autom. Control,
vol. 54, no. 2, pp. 308–322, Feb. 2009.
[12] E. Lavretsky and K. A. Wise, Robust and Adaptive Control. London,
U.K.: Springer, 2013.
[13] J. B. Rawlings and D. Q. Mayne, Model Predictive Control: Theory and
Design. Madison, WI, USA: Nob Hill, 2009.
[14] R. Wilhelm et al., “The worst-case execution-time problem—Overview
of methods and survey of tools,” ACM Trans. Embedded Comput. Syst.,
vol. 7, no. 3, p. 36, 2008.
[15] S. Andalam, A. Girault, R. Sinha, P. Roop, and J. Reineke, “Precise
timing analysis for direct-mapped caches,” in Proc. DAC, Austin, TX,
USA, 2013, p. 148.
[16] J. Ackermann and V. Utkin, “Sliding mode control design based on
Ackermann’s formula,” IEEE Trans. Autom. Control, vol. 43, no. 2,
pp. 234–237, Feb. 1998.
[17] K. J. Åström and R. M. Murray, Feedback Systems: An Introduction for
Scientists and Engineers. Princeton, NJ, USA: Princeton Univ. Press,
2009.
[18] A. Y. Bhave and B. H. Krogh, “Performance bounds on state-feedback
controllers with network delay,” in Proc. CDC, Cancún, Mexico, 2008,
pp. 4608–4613.
[19] D. Goswami, R. Schneider, and S. Chakraborty, “Relaxing signal delay
constraints in distributed embedded controllers,” IEEE Trans. Control
Syst. Technol., vol. 22, no. 6, pp. 2337–2345, Nov. 2014.
[20] P. Yih, “Steer-by-wire: Implications for vehicle handling and safety,”
Ph.D. dissertation, Dept. Mech. Eng., Stanford Univ., Stanford, CA,
USA, 2005.
[21] W. Chang, A. Pröbstl, D. Goswami, M. Zamani, and S. Chakraborty,
“Battery- and aging-aware embedded control systems for electric
vehicles,” in Proc. RTSS, Rome, Italy, 2014, pp. 238–248.
[22] J. Fox et al., “Modeling and control of a single motor electronic wedge
brake,” Siemens VDO Autom., Siemens AG, Regensburg, Germany,
SAE Tech. Rep. 2007-01-0866, 2007.
[23] Harmonic Drive. (2016). Produktbeschreibung PMA. [Online]. Available:
http://harmonicdrive.de/produkte/media/catalog/category/pma_catalogue_3.pdf
[24] Maxon Motor. (2011). Product Specifications. [Online]. Available:
http://www.maxonmotor.com/medias/sys_master/root/8796918153246/ADS-145391-11-EN-281-282.pdf
Wanli Chang received the Ph.D. degree in elec-
trical and computer engineering from the Technical
University of Munich, Munich, Germany.
He is a Lecturer with the Singapore Institute
of Technology, Singapore. His current research
interest includes resource-aware automotive control
systems.
Dip Goswami received the Ph.D. degree in elec-
trical and computer engineering from the National
University of Singapore, Singapore, in 2009.
He is an Assistant Professor with the
Eindhoven University of Technology, Eindhoven,
The Netherlands. His current research interests
include embedded control systems, cyber-physical
systems, and robotics.
Samarjit Chakraborty received the Ph.D. degree
in electrical and computer engineering from ETH
Zurich, Zürich, Switzerland, in 2003.
He is a Professor of Electrical and Computer
Engineering with the Technical University of
Munich, Munich, Germany. His current research
interests include embedded and cyber-physical sys-
tems and software design.
Lei Ju received the Ph.D. degree from the National
University of Singapore, Singapore.
He is an Associate Professor with the School
of Computer Science and Technology, Shandong
University, Jinan, China. His current research
interests include memory architecture design and
optimization and heterogeneous computing.
Chun Jason Xue received the Ph.D. degree from
the University of Texas at Dallas, Richardson, TX,
USA, in 2007.
He is an Associate Professor with the Computer
Science Department, City University of Hong
Kong, Hong Kong. His current research interests
include nonvolatile memories and embedded and
real-time systems.
Sidharta Andalam received the Ph.D. degree
from the University of Auckland, Auckland,
New Zealand.
He is a Research Fellow with the University
of Auckland. His current research interests include
embedded systems, hybrid systems, automotive soft-
ware, synchronous languages, and software platform
for developing Internet of Things applications.
