Abstract: One major issue that hinders the use of performance analysis in industrial design processes is the pessimism inherent to any analysis technique that applies to realistic system models. Indeed, such analyses may conservatively declare unschedulable systems that will in fact never miss any deadlines. We advocate the need to compute not only tight upper bounds on worst-case behaviors but also tight lower bounds. As a first step, we focus on uniprocessor systems executing a set of sporadic or periodic hard real-time task chains. Each task has its own priority, and the chains are scheduled according to the fixed-priority pre-emptive scheduling policy. Computing the worst-case end-to-end latency (WCEL) of each chain is complex because of the intricate relationship between the task priorities. Compared to the state of the art, our analysis provides upper bounds on the WCEL in the more general case of asynchronous task chains, and also provides lower bounds on the WCEL both for synchronous and asynchronous chains. Our computed lower bounds correspond to actual system executions exhibiting a behavior that is as close to the worst case as possible, while all other approaches rely on simulations. Extensive experiments show the relevance of lower bounds on the worst-case behavior for the industrial design of real-time embedded systems.
Improving and Estimating the Precision of Bounds on the Worst-Case Latency of Task Chains
[...] the need to also compute a lower bound on the WCEL, computed from execution scenarios guaranteed to be feasible. Such lower bounds can of course also be obtained by simulating the system with thousands of execution scenarios and keeping the largest value. Our experiments show that, in general, the lower bounds obtained through simulation are lower than our computed ones.
In this paper, we apply this principle to hard real-time systems consisting of chains of tasks executed on a single-core processor under the fixed-priority pre-emptive (FPP) policy. Our task chains follow a periodic or a sporadic activation model, have arbitrary deadlines and can be synchronous or asynchronous. This system model is common to many industrial systems, e.g., on-board software (OBSW) in satellites or flight management systems (FMSs) in avionics: OBSW and FMS are both uniprocessor systems executing periodic and sporadic task chains. The FMS is scheduled according to the ARINC 653 standard [1], which defines time partitions in which tasks are scheduled under FPP. For both systems, the strict certification constraints (DO-178C for avionics [3] and ECSS-E-ST-40C for space [2]) require demonstrating the correctness of the timing behavior.
When tasks are independent and scheduled with FPP, computing the worst-case response time of each task is a well understood problem because the interference that each task may be subject to is limited to its higher-priority tasks. In the case of task chains, however, the problem is a lot more complex: a given task chain σ will be subject to the interference of any other task chain that contains at least one task of a priority higher than the lowest priority of the tasks in σ .
Recent work on the analysis of task chains [15] proposes a solution to this problem, but with significant overapproximation. Earlier work [8] provides upper bounds but is restricted to synchronous chains. We propose here a novel solution that is tighter than [15] and applies to both synchronous and asynchronous task chains. We define a priority order on chains that allows us to reason about latency analysis in a way that is similar to the response-time analysis of [7] for fixed-priority non-pre-emptive scheduling of independent tasks. Finally, and most importantly, we are also able to compute lower bounds on the WCEL of task chains. Interestingly, computing lower bounds turns out to be a much more complex problem for periodic task chains than for sporadic ones. Based on the computed lower bounds, we can estimate the precision of the computed upper bound on the latency of each task chain.
This paper is organized as follows. Section II discusses related work and how our analysis improves over it. Section III introduces our system model. Sections IV-VII formalize our approach to compute upper bounds on the latency of task chains. Section VIII develops our method to compute lower bounds on task chain latencies. Section IX provides extensive experiments that show the usefulness of the combined use of upper and lower bounds on worst-case latencies. Section X concludes and discusses future work.
II. RELATED WORK
There exists a large body of literature dealing with the real-time scheduling of tasks and the computation of worst-case response times and latencies. We focus in this section on the particular case of tasks with precedence constraints.
The two papers that are most closely related to this paper are [8] and [15] . Although [8] uses a different terminology (namely, tasks and subtasks instead of chains of tasks), the underlying model used in these two papers and ours is identical. Moreover, both papers address the problem of computing the WCEL of task chains on a single-core processor under the FPP scheduling policy, but they only focus on providing upper bounds and do not discuss at all the tightness of their bounds.
Harbour et al. [8] proposed a framework to analyze the schedulability of a real-time system consisting of a set of periodic synchronous tasks, where each task is itself a sequence of subtasks. They introduce a canonical form where consecutive subtasks have increasing priorities, and they prove that the latency of the task under study remains the same if it is put under canonical form. We improve their analysis in three directions. (i) We present a very precise formalization and we formally prove its correctness ([8] provides no formal proof of correctness). (ii) Our analysis applies both to synchronous and to asynchronous chains with arbitrary deadlines, i.e., chains with self-interference of forthcoming instances (while [8, Assumption 3] excludes this case). (iii) We also compute lower bounds on the WCEL.
Schlatow and Ernst [15] extended the compositional performance analysis (CPA) of [9] to chains of tasks. Compared to CPA, this significantly reduces the pessimism of the computed upper bounds on latencies. Still, the drawback of [15] is that it uses the same definition for the two distinct concepts of busy window and q-event busy time (see Section IV), which hinders the comprehension of the underlying mechanisms of the analysis. As we demonstrate in this paper, this incurs significant pessimism. Compared to [15], we greatly improve the WCEL analysis by computing tighter upper bounds and by also providing lower bounds (which allow the tightness of the WCEL to be measured).
As presented in [17] and then extended to more complex systems in [13], offsets may be used to model precedence constraints: tasks are grouped into transactions such that tasks of the same transaction do not interfere with each other. Offset-based latency analysis, which builds on top of task response time analysis, improves over standard latency analysis without dependencies. Still, [15] shows through experiments that the analysis in [15] (over which we improve) outperforms offset-based analysis.
There is a body of research on parallel applications, where tasks are split into subtasks with precedence constraints that form a graph, in particular the fork-join model [11] , the synchronous parallel task model [14] , and the DAG-based task model [4] . Corresponding analyses thus address more complicated systems and their computed upper bounds are very conservative. We have found no contributions presenting a formal analysis of lower bounds for such systems. It would indeed be interesting to provide such an analysis.
In this paper, we focus on functional task chains, where the end of a task activates the upcoming one, in contrast to cause-effect chains as in [5] , where the dependencies between tasks are data dependencies. In a cause-effect chain, each task is activated independently but reads data produced by the previous task in the chain before executing. The systems we target ultimately are multiprocessor with functional chains on a processor and data dependencies between processors. In future work, we plan to extend our analysis to handle jointly functional and data dependencies.
III. SYSTEM MODEL
Unless otherwise specified, all the parameters defined in the following have positive integer values. In particular, we assume a discrete time clock. We consider a uniprocessor real-time system S consisting of a finite set of m task chains scheduled with the FPP scheduling policy. All task chains are independent, meaning that two chains cannot share a task and there is neither task fork nor task join.
Definition 1 (Task): A task $\tau_a^i$ is defined as a pair $(\pi_a^i, C_a^i)$, with $\pi_a^i$ the priority and $C_a^i$ the worst-case execution time (WCET) of $\tau_a^i$.
Definition 2 (Task Chain):
A task chain σ a ∈ S, a ∈ {1, . . . , m}, is defined by:
• A finite sequence of $n_a$ distinct tasks denoted $(\tau_a^1, \ldots, \tau_a^{n_a})$ with precedence relations such that, for each $i \in \{1, \ldots, n_a - 1\}$, $\tau_a^{i+1}$ is activated at the completion time of $\tau_a^i$.
• An activation model (see Definitions 3 and 4) that specifies the activation instants of the first task in the chain τ 1 a .
• A relative deadline D a (see Definition 5 below).
• A synchronous or asynchronous execution policy (see Definition 6).
All priorities are assumed to be distinct. We use the convention that $\pi_a^i > \pi_b^j$ means that $\tau_a^i$ has a higher priority than $\tau_b^j$. As a result, $\tau_a^i$ may pre-empt $\tau_b^j$ when it arrives. Task chains are activated from external sources, which can be either periodic timers or various types of sensor devices. We model the activation patterns of chains using arrival functions, or their pseudoinverses called distance functions. These functions can be used to model sporadic as well as periodic activations [9].
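To make the link between arrival functions and distance functions concrete, the following minimal sketch covers the simplest activation pattern, a source with a fixed period (or, for a sporadic source, a minimum inter-arrival time). The closed forms used here are the usual ones from the CPA literature and are an assumption on our part, as are the function names; a maximum arrival function takes a window length and a minimum distance function takes a number of activations.

```python
def eta_plus(period: int, delta: int) -> int:
    """Assumed maximum arrival function: most activations in any
    window of length `delta` for a source with the given period
    (or minimum inter-arrival time)."""
    if delta <= 0:
        return 0
    return -(-delta // period)  # integer ceiling of delta / period


def delta_minus(period: int, q: int) -> int:
    """Assumed minimum distance function: smallest time span that
    can separate the first and the q-th of q consecutive activations."""
    if q < 2:
        return 0
    return (q - 1) * period


def eta_plus_from_delta(period: int, delta: int) -> int:
    """Recover the arrival function as the pseudoinverse of the
    distance function: the largest q with delta_minus(q) < delta."""
    q = 0
    while delta_minus(period, q + 1) < delta:
        q += 1
    return q


if __name__ == "__main__":
    print(eta_plus(10, 25), delta_minus(10, 3))
```

Integer arithmetic is used throughout because the system model assumes a discrete time clock.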
Definition 3 (Arrival Function): A maximum (resp. minimum) arrival function $\eta_a^+ : \mathbb{N} \to \mathbb{N}$ (resp. $\eta_a^-$) returns, for any time interval of length $\Delta$, an upper (resp. lower) bound on the number of activations of $\sigma_a$ in that interval. [...]
Priorities can be in any order, not necessarily ascending or descending. The following example provides some intuition regarding the complexity of the resulting timing behavior of task chains.
Example 1: Fig. 1 shows an execution of a system with four task chains that interfere according to complex patterns because their priorities are interleaved.
Notation 3: For any value $V$, $\overline{V}$ and $\underline{V}$ denote, respectively, an upper and a lower bound on $V$.
The schedulability of a real-time system $S = \{\sigma_a\}_{a=1}^{m}$ is usually assessed by computing, for each chain $\sigma_a$, an upper bound on its WCEL, and checking that this bound does not exceed $D_a$. A widely overestimated upper bound results in many systems being declared unschedulable. The problem we address in this paper is therefore twofold. First, we provide a framework to compute upper bounds that are as tight as possible given the complexity of the analysis. Second, we compute, for each chain, a scenario (chosen to be close to the worst case) that exhibits a realizable value for its latency. This value constitutes a lower bound on its WCEL. We can thus measure the pessimism of our WCEL analysis.
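As an illustration of how the two bounds can be used together, here is a minimal sketch. The relative-gap metric and the three-way verdict are our own illustrative choices, not definitions taken from this paper; the only facts used are that an upper bound not exceeding the deadline proves schedulability, and that a lower bound obtained from a feasible execution proves a deadline miss when it exceeds the deadline.

```python
def relative_gap(lower: int, upper: int) -> float:
    """Hypothetical pessimism metric: share of the upper bound that
    is not confirmed by the lower bound (0.0 means the bound is tight)."""
    if not 0 < lower <= upper:
        raise ValueError("expected 0 < lower <= upper")
    return (upper - lower) / upper


def verdict(lower: int, upper: int, deadline: int) -> str:
    """Classify a chain from its WCEL bounds and relative deadline."""
    if upper <= deadline:
        return "schedulable"      # even the upper bound meets the deadline
    if lower > deadline:
        return "unschedulable"    # a feasible execution misses the deadline
    return "undetermined"         # the deadline falls between the two bounds


if __name__ == "__main__":
    print(relative_gap(18, 20), verdict(18, 20, 19))
```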
IV. UPPER BOUNDS ON CHAIN LATENCIES
In this section, we develop the main concepts needed for computing upper bounds on task chain latencies. We start with an observation: any chain σ a has n a different priority levels (remember that n a denotes the number of tasks in σ a ), but for most of our WCEL analysis, we only need to consider the lowest priority task of each chain. These notations can also be applied to subchains. Note that neither [8] nor [15] uses chain priorities, which makes their developments much harder to read.
Definition 9 (Priority of a Chain): The priority of task chain $\sigma_a$, denoted $\pi_a$, is the priority of its lowest priority task: $\pi_a = \min_{1 \le i \le n_a} \pi_a^i$.
Task τ j b has a lower priority than a chain σ a if and only if π j b < π a . Since all task priorities are different, chain priorities define a total order over task chains.
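Notation 4: We use $\mathit{p}(a)$, resp. $\mathit{hp}(a)$, to denote the set of chains with a strictly lower, resp. strictly higher, priority than $\sigma_a$. Also, $\mathit{hpe}(a) = \mathit{hp}(a) \cup \{\sigma_a\}$. We denote by $\mathit{p}_b(a)$ the set of tasks of $\sigma_b$ that have lower priority than $\sigma_a$, i.e., $\mathit{p}_b(a) = \{\tau_b^j \in \sigma_b \mid \pi_b^j < \pi_a\}$.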
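The chain-level notions above can be sketched in a few lines. The data layout (a dictionary mapping a chain name to the list of its task priorities, larger values meaning higher priority) and the function names are assumptions made for this illustration only.

```python
from typing import Dict, List, Set, Tuple

Chains = Dict[str, List[int]]  # chain name -> priorities of its tasks, in order


def chain_priority(priorities: List[int]) -> int:
    """Priority of a chain: that of its lowest-priority task."""
    return min(priorities)


def lower_and_higher(chains: Chains, a: str) -> Tuple[Set[str], Set[str]]:
    """Chains with strictly lower and strictly higher priority than chain a."""
    pa = chain_priority(chains[a])
    lower = {b for b, pr in chains.items() if chain_priority(pr) < pa}
    higher = {b for b, pr in chains.items() if chain_priority(pr) > pa}
    return lower, higher


def lower_priority_tasks(chains: Chains, b: str, a: str) -> List[int]:
    """1-based indices of the tasks of chain b with lower priority than chain a."""
    pa = chain_priority(chains[a])
    return [j for j, p in enumerate(chains[b], start=1) if p < pa]


if __name__ == "__main__":
    example = {"a": [5, 9, 4], "b": [2, 7, 1, 8], "d": [10, 11]}
    print(lower_and_higher(example, "a"), lower_priority_tasks(example, "b", "a"))
```

In the example, chain b has a lower priority than chain a even though two of its tasks (priorities 7 and 8) have a higher priority than every task relevant to the priority of a; this interleaving is what makes the analysis non-trivial.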
A. Upper Bounds on Busy Windows
Most response time analyses for uniprocessor systems rely on some notion of busy window (or busy period) and this paper is no exception. We use here the same notion as in [8] , which extends the original concept of [12] to task chains.
Definition 10 (σ a -Busy Window):
A σ a -busy window is a maximal time interval during which there is always (at least) one instance of a task with priority higher than or equal to σ a that is pending, i.e., it has been previously activated but has not finished yet.
In particular, a σ a -busy window cannot be closed until all pending instances of σ a and higher-priority chains have finished their execution. Among lower-priority chains, only tasks with a priority higher than σ a are considered as part of a σ a -busy window.
Example 2: Our running example of Fig. 1 shows two σ a -busy windows: the first one starts with the activation of σ a and ends with the completion of the second instance of σ d . The second one spans the execution of τ 4 b .
We will see that, similar to busy-window approaches such as [7] , task instances (of any chain) can only interfere with instances of σ a that are in the same σ a -busy-window. It is therefore useful to have an upper bound on the length of a σ a -busy-window.
Definition 11 (Lower-Priority Interference): We call lower-priority interference, and denote $\mathit{pI}_a(\Delta)$, the maximum amount of time that chains with priority lower than $\sigma_a$ may execute in any prefix of length $\Delta$ of a $\sigma_a$-busy window.
The function $\mathit{pI}_a$ can be used to provide an upper bound on the amount of time $\sigma_a$ may be delayed by lower-priority chains inside a $\sigma_a$-busy window. We will show in Section VI-A how to compute such an interference.
Theorem 1: Let $\sigma_a$ be a task chain. The length of any $\sigma_a$-busy window is upper bounded by the least fixed point $BW_a$ of
$$BW_a = \mathit{pI}_a(BW_a) + \sum_{\sigma_b \in \mathit{hpe}(a)} \eta_b^+(BW_a) \cdot C_b. \qquad (1)$$
Proof: By definition, lower-priority chains execute for at most $\mathit{pI}_a(BW_a)$ in any prefix of a $\sigma_a$-busy window of length $BW_a$. Besides, there cannot be any instance of $\sigma_a$ or of a higher-priority chain pending at the beginning of a $\sigma_a$-busy-window; and any such instance that is activated within a $\sigma_a$-busy-window is by definition guaranteed to fully execute before the end of that $\sigma_a$-busy-window. Each chain $\sigma_b \in \mathit{hpe}(a)$ therefore accounts for $C_b$ times its maximal number of instances activated within $BW_a$, that is, $\eta_b^+(BW_a)$.
Proposition 1: Let $\sigma_a$ be a task chain. The number of activations of $\sigma_a$ in a $\sigma_a$-busy-window is upper bounded by $K_a = \eta_a^+(BW_a)$.
The proof follows directly from Theorem 1.
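The fixed point described in the proof of Theorem 1 can be computed by straightforward iteration. The sketch below is only an illustration under stated assumptions: the lower-priority interference is passed in as a callable, each chain in hpe(a) is described by a hypothetical pair (chain WCET, maximum arrival function), and the iteration starts from 1 so that it converges to the smallest positive fixed point.

```python
from typing import Callable, List, Tuple

ArrivalFn = Callable[[int], int]


def busy_window_bound(lp_interference: Callable[[int], int],
                      hpe: List[Tuple[int, ArrivalFn]],
                      limit: int = 10**6) -> int:
    """Least positive fixed point of
    BW = lp_interference(BW) + sum(eta_plus_b(BW) * C_b for chains b in hpe)."""
    bw = 1
    while True:
        nxt = lp_interference(bw) + sum(eta(bw) * c for c, eta in hpe)
        if nxt == bw:
            return bw
        if nxt > limit:  # overloaded system: the iteration does not converge
            raise RuntimeError("busy window exceeds the given limit")
        bw = nxt


def periodic(period: int) -> ArrivalFn:
    """Assumed arrival function of a periodic source: ceil(delta / period)."""
    return lambda delta: -(-delta // period) if delta > 0 else 0


if __name__ == "__main__":
    eta_a = periodic(10)
    hpe_a = [(3, eta_a), (2, periodic(5))]   # chain a itself and one higher-priority chain
    bw_a = busy_window_bound(lambda delta: 4, hpe_a)
    k_a = eta_a(bw_a)                         # bound on activations of a in the busy window
    print(bw_a, k_a)
```

With these illustrative numbers the iteration visits 9, 11, 16 and stabilizes at 18, so at most two activations of the chain under analysis fall within one busy window.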
The above results still hold if an upper bound is used for the lower-priority interference. In contrast, proving that the upper bound on the length of a σ a -busy-window is reachable (i.e., that there exists a σ a -busy-window of that length) requires proving that the maximum lower-priority interference and the maximum interference from higher-priority chains can be achieved in one single execution scenario (see Section VIII).
B. Upper Bounds on Busy Times
In order to upper bound the latency of σ a , and similar to, e.g., [7] , we need to first focus on the time it may take to finish executing q instances of σ a within a σ a -busy-window, for q ∈ [1, K a ]. The following definition is the adaptation to task chains of the concept with the same name introduced in [16] . A σ a -busy-window does not necessarily close when a σ a instance finishes execution because there may be, e.g., pending instances of higher-priority chains. This implies that, although two instances of σ a cannot overlap in a schedulable system with constrained deadlines, they may still be part of the same σ a -busy-window.
Definition 12 (q-Event Busy Time):
The q-event busy time of a chain σ a (resp. a task τ i a ), denoted B a (q) (resp. B i a (q)), is the maximum time duration it may take to finish processing the first q instances of σ a (resp. τ i a ) within any σ a -busy-window that contains at least q instances of σ a .
To upper bound the q-event busy time of σ a for q ∈ {1, . . . , K a }, we will upper bound the q-event busy time of some tasks in σ a , depending on their priority with respect to the priority of the chains in hp(a).
Theorem 2: The q-event busy time of chain $\sigma_a$ is equal to the q-event busy time of its last task: $B_a(q) = B_a^{n_a}(q)$.
Proof: This directly follows from the definitions.
We can thus focus on upper bounding the q-event busy time of the tasks in $\sigma_a$. For that, we distinguish between the interference due to lower- and higher-priority chains, as well as possible interference from subsequent activations of $\sigma_a$.
Definition 13 (q-Event Interference): The q-event lower-priority (resp. higher-priority) (resp. self) interference with respect to a task $\tau_a^i$ is the maximum amount of time that chains with priority lower than $\sigma_a$ (resp. a chain $\sigma_b$ with $\pi_a < \pi_b$) (resp. forthcoming instances of $\sigma_a$) may execute in any prefix of length $\Delta$ of a $\sigma_a$-busy window that finishes at the end of the qth execution of $\tau_a^i$. The lower-priority and self-interference are denoted $\mathit{pI}_a^{i,q}(\Delta)$ and $\mathit{selfI}_a^{i,q}(\Delta)$, respectively.
Remember that if deadlines are constrained, or for synchronous chains, self-interference cannot happen.
Property 2: The q-event busy time of τ i a is upper bounded by
Proof: The maximum time it may take to fully process the first q instances of τ i a within a σ a -busy-window is upper bounded by the sum of the maximum time it takes to: (1) compute σ a entirely q − 1 times; (2) compute the qth instance of σ a until task τ i a (i.e., the WCET of the subchain σ a[1...i] ); (3) account for the interference due to: a) chains with lower priority than σ a ; b) chains with higher priority than σ a ; and c) subsequent activations of σ a .
This result also holds if one or several of the interference terms are upper bounded. To upper bound such q-event busy times, one needs to upper bound the corresponding interference. We will show in the next sections how to achieve this. Before that, let us show how upper bounds on q-event busy times are used to upper bound the latency of task chains.
C. Upper Bounds on Latencies
Once the busy times are upper bounded, upper bounds on the WCEL of task chains are easily obtained.
Theorem 3 (Worst-Case Latency): The WCEL of task chain $\sigma_a$ is bounded by
$$\max_{q \in [1, K_a]} \left\{ B_a(q) - \delta_a^-(q) \right\}.$$
Proof: Consider any instance $\sigma_a^x$ of $\sigma_a$. As a consequence of the definition of $\sigma_a$-busy-window, $\sigma_a^x$ is part of a (unique) $\sigma_a$-busy-window. Thanks to Proposition 1, we know that there exists $q \in [1, K_a]$ such that $\sigma_a^x$ is the qth instance in its $\sigma_a$-busy-window. It follows directly from the definition of the q-event busy time that $\sigma_a^x$ cannot finish later than $B_a(q)$ after the beginning of the $\sigma_a$-busy-window. Besides, $\sigma_a^x$ cannot be activated earlier than $\delta_a^-(q)$ after the beginning of the $\sigma_a$-busy-window. Hence, the result.
The above result stands even if B a (q) is upper bounded. The next sections focus on upper bounding the lower-priority, higher-priority, and self-interference.
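The argument in the proof of Theorem 3 translates directly into a maximum over the instances of a busy window: the qth instance finishes at most B a (q) after the window starts and is activated at least δ − a (q) after it. The sketch below assumes that the busy times and the distance function are supplied as callables; the function name and the numbers in the example are illustrative only.

```python
from typing import Callable


def latency_upper_bound(busy_time: Callable[[int], int],
                        delta_minus: Callable[[int], int],
                        k_a: int) -> int:
    """Largest value of B_a(q) - delta_minus_a(q) over q = 1..K_a."""
    if k_a < 1:
        raise ValueError("a busy window contains at least one instance")
    return max(busy_time(q) - delta_minus(q) for q in range(1, k_a + 1))


if __name__ == "__main__":
    busy = {1: 12, 2: 25}                     # illustrative busy times B_a(1), B_a(2)
    bound = latency_upper_bound(lambda q: busy[q],
                                lambda q: (q - 1) * 10,   # period 10
                                k_a=2)
    print(bound)
```

In this example the second instance, not the first, determines the bound (25 - 10 = 15 against 12), which is why all instances of the busy window have to be examined.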
V. SEGMENTS
Like lower-priority tasks in [7] , lower-priority chains in our context may interfere in complex ways, as we discuss at the end of this section. To discuss lower- and higher-priority interference, we develop (and formalize) the concept of segment introduced in [15] .
Intuitively, a segment of a chain σ b with respect to a chain σ a such that π b < π a is a maximal subchain of σ b that may delay σ a . The task immediately before or immediately after a segment of σ b with respect to σ a has lower priority than σ a , i.e., the task belongs to p b (a).
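The intuition above can be sketched as a single pass over the priorities of the lower-priority chain: a segment is a maximal run of consecutive tasks whose priority is higher than that of the chain under analysis. This is a simplified illustration under our own assumptions (list-based layout, hypothetical function name); in particular it only returns runs inside one instance and does not build circular segments that wrap from the end of one instance to the beginning of the next.

```python
from typing import List


def segments(priorities_b: List[int], priority_a: int) -> List[List[int]]:
    """Maximal runs of consecutive tasks of chain b (1-based indices)
    whose priority is higher than the priority of chain a."""
    runs: List[List[int]] = []
    current: List[int] = []
    for index, priority in enumerate(priorities_b, start=1):
        if priority > priority_a:
            current.append(index)        # task may delay chain a: extend the run
        elif current:
            runs.append(current)         # a lower-priority task closes the run
            current = []
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    print(segments([6, 2, 7, 8, 1, 9], priority_a=4))
```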
Definition 14 (Inner Segment $s_{b \to a}$): An inner segment of $\sigma_b$ with respect to $\sigma_a$ is a subchain [...]
Intuitively, the critical segment of $\sigma_b$ with respect to $\sigma_a$ is the segment of $\sigma_b$ that can interfere the most with $\sigma_a$. We can now state one key property regarding segments, which will allow us to compute precisely the interference of lower-priority chains. This result does not appear in [15], which leads to pessimistic results. In contrast, this result is used in [8], but without a formal proof, and only in the synchronous case.
The above result does not make any assumption with respect to deadlines (constrained or not) or execution policy (synchronous or asynchronous). Other results are however simpler if constrained deadlines can be assumed. For that reason, we first present the constrained deadline case before providing the general results in Section VII.
VI. CONSTRAINED DEADLINE CASE
In this section, we assume that deadlines are constrained (see Definition 5) s.t. the distinction between synchronous and asynchronous chains is irrelevant (since two instances of the same chain cannot be pending at the same time). We provide formulas for lower-priority as well as higher-priority interference. The general case where deadlines are arbitrary is dealt with in the next section.
A. Lower-Priority Interference
A first key property on segments that is used to bound the interference from lower priority task chains on σ a is that only one segment per lower priority chain may interfere in any σ a -busy-window.
Proposition 4: Suppose that deadlines are constrained. Let σ a and σ b be task chains such that π b < π a . In any σ a -busy-window, σ b executes at most one segment, possibly circular.
Proof: A task between two segments of σ b is such that π k b < π a , so after executing a segment of σ b , the task that follows the segment will be pre-empted until the end of the σ a -busy-window.
Proposition 4 only holds for constrained deadlines. If deadlines are arbitrary, for asynchronous chains, header segments of later instances can also execute. Proposition 4 allows us to bound by a constant the interference incurred on chain σ a by its lower priority chains, supposing that deadlines are constrained.
Theorem 4: Suppose that all deadlines are constrained and let σ a be a chain. In any σ a -busy window, the set of chains with a lower priority than σ a execute for at most
Proof: According to Proposition 4, a chain with lower priority than σ a can execute in a σ a -busy window at most one segment. According to Proposition 3, no two chains with lower priority than σ a can execute a non-head segment in any σ a -busy window. It follows that the largest interference due to the |p(a)| lower-priority task chains is the maximum among all combinations of 1 critical segment and |p(a)| − 1 head segments, which is formalized by (4) .
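The combination described in this proof (one lower-priority chain contributes its critical segment, every other one its head segment) can be evaluated without enumerating combinations explicitly. The sketch below assumes that the WCETs of the critical and head segments of each lower-priority chain have already been computed and are given as two dictionaries; these inputs and the function name are hypothetical.

```python
from typing import Dict


def lower_priority_interference(critical: Dict[str, int],
                                head: Dict[str, int]) -> int:
    """Maximum, over the choice of one chain executing its critical
    segment, of that segment's WCET plus the head-segment WCETs of
    all the other lower-priority chains."""
    if critical.keys() != head.keys():
        raise ValueError("both maps must describe the same chains")
    if not critical:
        return 0                      # no lower-priority chain at all
    total_head = sum(head.values())
    return max(total_head - head[b] + critical[b] for b in critical)


if __name__ == "__main__":
    print(lower_priority_interference({"b": 5, "c": 3}, {"b": 1, "c": 2}))
```

Note that the chain with the largest critical segment is not necessarily the one to pick: what matters is the gain of the critical segment over the head segment of the same chain.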
Example 4: In our running example, Fig. 1 shows the worst-case lower-priority interference on chain σ a (from σ b and σ c ).
B. Higher-Priority Interference
In [15] , higher-priority chains are conservatively assumed to interfere for their entire execution time when they are activated during the execution of a chain σ a . In fact, the exact interference of higher-priority chains is more complex and we can provide tighter bounds than this, as illustrated in the following example.
Example 5: Refer to Fig. 1 , and consider the interference of σ d on σ a . Even if π a < π d , the second activation of σ d cannot arrive before part of σ a has finished executing, and it will thus only partially interfere with σ a .
We exploit this observation in the following to propose tighter bounds on higher-priority interference. Throughout this section, we assume given two chains σ a and σ b s.t. π a < π b .
Notation 5: For a given chain $\sigma_b \in \mathit{hp}(a)$, we denote by $\tau_a^{t_a(b)}$ the last task of $\sigma_a$ that has lower priority than $\sigma_b$. We denote by $\tau_a^{t_a}$ the last task of chain $\sigma_a$ that has lower priority than all chains in $\mathit{hp}(a)$.
The 
Proof: Let $[t_1, t_2]$ be a time interval of length $\Delta$ that starts at the same time as a $\sigma_a$-busy window and finishes at the end of the qth execution of $\tau_a^i$, as in the definition of interference. [...]
[...] (4).
2) Compute an upper bound on the longest $\sigma_a$-busy-window using (1) and derive from it an upper bound on the maximum number $K_a$ of activations of $\sigma_a$ in any $\sigma_a$-busy-window as in Proposition 1.
3) For $q \in \{1, \ldots, K_a\}$, compute $B_a(q)$ as follows.
• Compute $B_a^{t_a}(q)$ by using (6) inside (2). This involves a fixed-point iteration, which can be initialized with $(q - 1) \times C_a + C_{a[1 \ldots i]} + \mathit{pI}_a$ (with $i = t_a$), as usual.
• For $i \in \{t_a + 1, \ldots, n_a\}$, initialize the fixed-point iteration with $B_a^{i-1}(q) + C_a^i$, and then iteratively compute $B_a^i(q)$ by using (6) inside (2).
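The iteration scheme of step 3 can be sketched as a chain of fixed-point computations, each one started from the result of the previous task. The sketch is deliberately generic: the total interference (lower-priority, higher-priority and self) suffered by task i in a window of a given length is passed in as a hypothetical callable `interference(i, q, window)`, because the corresponding equations are not reproduced here. All names and numbers are assumptions for illustration.

```python
from typing import Callable, List

Interference = Callable[[int, int, int], int]  # (task index, q, window) -> time


def fixed_point(step: Callable[[int], int], start: int,
                max_iterations: int = 100_000) -> int:
    """Iterate `step` from `start` until the value no longer changes."""
    value = start
    for _ in range(max_iterations):
        following = step(value)
        if following == value:
            return value
        value = following
    raise RuntimeError("fixed-point iteration did not converge")


def busy_time(q: int, wcets: List[int], t_a: int, lp_bound: int,
              interference: Interference) -> int:
    """q-event busy time of the last task of the chain, computed task
    by task from task t_a (1-based) to the end of the chain."""
    c_a = sum(wcets)

    def demand(i: int) -> int:
        # q - 1 full executions of the chain plus the subchain up to task i
        return (q - 1) * c_a + sum(wcets[:i])

    # Task t_a: start from the demand plus the lower-priority bound.
    b = fixed_point(lambda w: demand(t_a) + interference(t_a, q, w),
                    demand(t_a) + lp_bound)
    # Later tasks: start from the previous busy time plus the task's WCET.
    for i in range(t_a + 1, len(wcets) + 1):
        b = fixed_point(lambda w, i=i: demand(i) + interference(i, q, w),
                        b + wcets[i - 1])
    return b


if __name__ == "__main__":
    # Illustrative interference: constant lower-priority term 4, plus one
    # higher-priority chain of WCET 2 and period 10.
    def interf(i: int, q: int, w: int) -> int:
        return 4 + (-(-w // 10)) * 2

    print(busy_time(1, [2, 1, 3], 2, 4, interf), busy_time(2, [2, 1, 3], 2, 4, interf))
```

Starting each iteration from the previous task's result rather than from scratch saves iterations and mirrors the initialization suggested in the second bullet.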
Example 6: In our running example, Fig. 1 shows the worst-case higher-priority interference on chain σ a . Note that the second instance of σ d does not entirely interfere with σ a .
VII. ARBITRARY DEADLINE CASE
When deadlines are arbitrary, interferences depend on the nature of chains, i.e., whether they are synchronous or asynchronous. In this section, we first discuss our upper bound on higher-priority interference, which is a fairly simple generalization of our upper bound on higher-priority interference for constrained deadlines.
A. Higher-Priority Interference
Let us show how arbitrary deadlines affect our upper bound on higher-priority interference. 
Proof: The difference with Proposition 6 is that now all activations of $\sigma_b$ that arrive after the completion of $\tau_a^{t_a(b)}$ may interfere, for a duration obtained as in (6).
B. Lower-Priority Interference
The bound on lower-priority interference that is given in (4) for the constrained deadline case can be easily adapted to take into account the fact that asynchronous lower-priority chains can interfere for more than one header based on their arrival function.
Theorem 7: The lower-priority interference on a chain $\sigma_a$ in any prefix of length $\Delta$ of a $\sigma_a$-busy window is bounded by
where (4), except that asynchronous chains can execute several headers. Note that if the critical segment is circular for an asynchronous chain σ b , then its header part is already included in the last term of the sum.
Following the same reasoning as the one presented for higher-priority interference, we can refine this upper bound by noticing that activations of asynchronous chains that arrive within the interval may not be able to execute their full header. We therefore use the finer-grained notion of q-event lower-priority interference, as opposed to the lower-priority interference used above, which considers interference at the chain level instead of the task level.
Theorem 8: The q-event lower-priority interference with respect to task τ i a for
Proof: As in (8), only one chain in hp(a) may execute an inner segment and all other chains execute their header. For asynchronous chains, several headers may interfere. The header segment s head c→a of an asynchronous, lower-priority chain σ c interferes with σ a exactly like a higher-priority chain does.
C. Self-Interference
In the case of asynchronous chains, self-interference $\mathit{selfI}_a^{i,q}(\Delta)$ is the interference of the header of $\sigma_a$ on $\sigma_a$ itself. This is the same principle as the header interference of lower-priority chains on $\sigma_a$. Note that two instances of the same task $\tau_a^i$ may be pending at the same time. In this case, we apply a FIFO policy.
Theorem 9: If $\sigma_a$ is asynchronous, the q-event self-interference of $\sigma_a$ on $\tau_a^i$ for $i \le t_a$ and [...]
Proof: All activations of $\sigma_a$, after the qth one, that may arrive before $\tau_a^{t_a}$ has completed, can interfere up to the header of $\sigma_a$ on itself. Subsequent activations can only interfere less, depending on how early they may arrive and which tasks are guaranteed to have completed by then. The reasoning is similar to previous proofs.
VIII. PRECISION OF THE ANALYSIS
We now focus on quantifying the precision of the upper bound on the latency. To that end, we discuss how to compute, for each chain $\sigma_a$, a lower bound on its WCEL (not to be confused with its best-case end-to-end latency, BCEL): such a lower bound expresses that there exists an actual system execution in which the observed WCEL of $\sigma_a$ is equal to or larger than it. If the lower and upper bounds are close, the computed upper bound is a good approximation of the worst-case behavior of the system. For simplicity, we assume first that all deadlines are constrained.
[...] has completed, they execute entirely before the qth instance of $\tau_a^i$ completes.
2) Consider i > t a (b).
•
(q)), the above argument still holds.
• The remaining case is η
Let k be as in (6) . Because we are now working with lower bounds on the actual completion time of tasks, τ k a is the first task in σ a that cannot finish its qth execution before the first activation of σ b after
. Proposition 7 indicates a strategy for finding lower bounds on worst-case latencies, assuming that there is a way to lower bound lower-priority interference.
Corollary 1: If there exists a σ a -busy-window as in the above property such that the q-event interference of lowerpriority chains is lower bounded by pI
is a lower bound on the q-event busy time of task $\tau_a^i$.
Proof: Assume that $\sigma_a$ executes for its worst-case execution time. It cannot complete its qth instance of $\tau_a^i$ before $t_1 + \underline{B}_a^i(q)$ as it suffers at least $\underline{\mathit{pI}}$ [...]
This result naturally extends to lower bounds on latencies if we further assume that chain $\sigma_a$ is activated at $t_1$ and then again as early as permitted by its activation model. To compute these lower bounds, as higher-priority interference is exact, we need to provide a lower bound $\underline{\mathit{pI}}_a^{i,q}(\underline{B}_a^i(q))$ on lower-priority interference.
The rest of this section is devoted to computing $\underline{\mathit{pI}}_a^{i,q}$. The principle is to exhibit a scenario that is guaranteed to be feasible, and to compute the lower bound $\underline{\mathit{pI}}_a^{i,q}$ from this scenario. We propose in fact to take the maximum over several scenarios, resulting in several lower bounds [...]
A. All Task Chains Are Sporadic
Let us start by assuming that all task chains are sporadic, and remember that we assume so far that deadlines are constrained. Indeed, the sporadic case is the simpler one for computing lower bounds. In order to use Corollary 1 we must provide a lower bound for lower-priority interference for σ a -busy-windows [t 1 , t 2 ] with the properties required in Corollary 1, in particular those from Proposition 7.
• Chain σ a is activated at t 1 , then again as early as permitted by its activation model (such that the result extends to lower bounds on latencies).
• The same holds for all chains in hp(a) (for Proposition 7 to hold).
• All task executions in chains of hpe(a) take their worst-case execution time to complete.
This leaves only room for specifying the activation scenario of lower-priority chains. To do so, we distinguish two cases, presented in the following paragraphs.
1) Critical Segment Maximizing (4) Is Not Circular:
This is the most intuitive scenario.
Theorem 10: Suppose that the critical segment that maximizes (4) is not circular. Then (4) is also a lower bound on the worst-case lower-priority interference of chain σ a under the above conditions.
Proof: Denote by $\sigma_{crit}$ the chain whose critical segment maximizes $\mathit{pI}_a$ in (4) and assume that this critical segment is not circular. Consider now the following activation scenario for lower-priority chains. All chains in $\mathit{p}(a)$ are activated at $t_1$, except chain $\sigma_{crit}$, which is activated at $t_1 - C_{crit[1 \ldots \xi_{crit}]}$, where $\xi_{crit}$ is the index of the task that precedes the critical segment. We further assume that there is no other activation of any chain before $t_1$, which is a feasible scenario since all chains are sporadic, and that all chains in $\mathit{p}(a)$ always run for their worst-case execution time.
In such a scenario, the header of each chain in $\mathit{p}(a) \setminus \{\sigma_{crit}\}$ is activated at $t_1$, as well as the critical segment of $\sigma_{crit}$. All tasks in these segments have higher priority than $\tau_a^{t_a}$ and are thus guaranteed to execute within any prefix of $[t_1, t_2]$ that finishes with the completion of a task $\tau_a^i$ for $i > t_a$. Hence, the result.
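The activation scenario used in this proof is simple enough to be written down as a small helper. The sketch assumes a hypothetical data layout (chain names, and the list of task WCETs of the critical chain) and merely computes the activation instants described above: every lower-priority chain at t 1 , except the critical chain, which is released early enough for its critical segment to become ready exactly at t 1 .

```python
from typing import Dict, List


def activation_scenario(t1: int, lower_chains: List[str], critical_chain: str,
                        critical_wcets: List[int], xi_crit: int) -> Dict[str, int]:
    """Activation instant of each lower-priority chain in the scenario.

    critical_wcets: WCETs of the tasks of the critical chain, in order.
    xi_crit: 1-based index of the task preceding the critical segment
             (0 if the critical segment starts the chain).
    """
    if critical_chain not in lower_chains:
        raise ValueError("the critical chain must be a lower-priority chain")
    instants = {name: t1 for name in lower_chains}
    # Release the critical chain early by the WCET of the tasks that
    # precede its critical segment.
    instants[critical_chain] = t1 - sum(critical_wcets[:xi_crit])
    return instants


if __name__ == "__main__":
    print(activation_scenario(100, ["b", "c"], "b", [2, 3, 4, 1], xi_crit=2))
```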
In this case the upper bound on the latency is reachable and thus the analysis is tight.
2) Critical Segment Maximizing (4) Is Circular:
Unfortunately, only rather naive sufficient conditions will scale.
• Option 2: finding other lower bounds on the WCEL. In the following, we investigate the latter option.
Definition 19 (Noncircular Critical Segment s nocirc b→a ): The noncircular critical segment of σ b with respect to σ a is the noncircular segment of σ b with respect to σ a that has the largest worst-case execution time.
Theorem 11: When the critical segment is circular, a lower bound on the lower-priority interference on σ a is
Proof: The reasoning in the proof of Theorem 10 holds for any lower-priority chain, not only σ crit . This theorem is thus a direct application of the same principle.
In order to further reduce the difference between the lower and upper bounds, one could also investigate whether the proposed upper bound can be improved. This requires a much finer-grained analysis, as the worst-case lower-priority interference may not coincide with the worst-case higher-priority interference scenario we have been working with so far. Still, our experiments show that, in most cases, pI a (1) is close to pI a .
B. At Least One Task Chain Is Periodic
Interestingly, the periodic case is the most complex one when computing lower bounds. The reason is that one cannot assume that there are no activations other than the one from σ crit before t 1 . Finding an alternative scenario that takes into account the constraints induced by the periodic activations is far from trivial. We therefore prefer to rely on the simpler scenario where all chains are activated at the same instant as σ a , and hence all chains in p(a) interfere with σ a with their head segment. In other words, there is no critical chain anymore. This scenario yields the following lower bound on the blocking time.
Theorem 12: When at least one chain is periodic, a lower bound on the lower-priority interference on σ a is
Proof: The proof proceeds similarly to that of Theorem 10.
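The synchronous scenario above, in which every lower-priority chain interferes only with its head segment, can be illustrated by the following sketch. It rests on assumptions that are not taken from the theorem statement: the head segment is modeled as the longest prefix of tasks whose priority is strictly higher than a given threshold, a larger number denotes a higher priority, and all names and values are hypothetical.

```python
def head_segment_wcet(tasks, threshold):
    """Summed WCET of the leading tasks whose priority exceeds `threshold`.

    tasks: list of (priority, wcet) pairs in chain order.
    Assumption: a larger number means a higher priority.
    """
    total = 0
    for priority, wcet in tasks:
        if priority <= threshold:
            break  # the head segment ends at the first task that cannot interfere
        total += wcet
    return total


def synchronous_lower_bound(lower_chains, threshold):
    """Interference when all lower-priority chains are activated together with the chain under analysis."""
    return sum(head_segment_wcet(tasks, threshold) for tasks in lower_chains.values())


# Made-up example with three lower-priority chains.
chains = {
    "b": [(5, 2), (1, 3)],          # head segment: first task only
    "c": [(4, 1), (6, 2), (2, 5)],  # head segment: first two tasks
    "d": [(1, 4)],                  # empty head segment
}
print(synchronous_lower_bound(chains, threshold=3))  # 5
```

Because no chain is released ahead of the others, no critical chain has to be selected, which is why this scenario remains feasible when some chains are periodic.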
C. Lifting the Constrained Deadline Hypothesis
In this section, we briefly sketch how the results obtained under the constrained-deadline assumption generalize to arbitrary deadlines.
• Proposition 7 and Corollary 1 can be directly adapted for higher-priority interference of chains with arbitrary deadlines by using the definitions of Section VII.
• Regarding low-priority interference, we have the same problem as in Section VIII-A2 for all σ crit that are asynchronous, i.e., we do not know exactly how early a subsequent activation may arrive. Still, we can use the lower bound provided in Theorem 12.
IX. EXPERIMENTAL EVALUATION
We now describe how we generated our test cases and evaluated our latency analysis. Our analysis is implemented in a standalone Python tool. We have generated systems with at least one periodic and one sporadic chain. The utilization breakdown of each system is randomly chosen among the values [0.4, 0.5, 0.6, 0.7], and the utilization breakdown of its sporadic chains among [0.001, 0.01, 0.1]; the utilization breakdown of the periodic chains is the difference between these two values. We randomly choose the number of chains between 2 and 9 and the number of periodic chains between 1 and 8 (there is at least one periodic task chain), from which we deduce the number of sporadic task chains. The number of tasks per chain is also a random value between 1 and 9. For periodic task chains, the periods are randomly chosen among [10, 20, 50, 100, 200, 500, 1000]. These parameters have been chosen because they are considered for many industrial cases [10] .
Chain and task utilizations are generated using UUniFast [6] . The WCET of each periodic chain is deduced from its utilization and its period. For sporadic chains, the WCET is a random value between 1 and 100 and the arrival function is defined such that it fits the sporadic utilization for δ − (100) (which we consider large enough). Priorities of all tasks are randomly assigned.
Altogether, the generated systems contain 5538 chains. These chains have been analyzed first under the synchronous semantics (Fig. 2) and then under the asynchronous semantics (Fig. 3) , except for those chains that took too long to analyze (which explains why Fig. 3 has only 4148 chains).
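The generation of periodic chains can be sketched as follows. This is a minimal illustration, not the code of our tool: it uses the standard formulation of UUniFast and derives each chain WCET as utilization times period; the function names, the dictionary layout, and the seed are hypothetical.

```python
import random

PERIODS = [10, 20, 50, 100, 200, 500, 1000]  # period values listed above


def uunifast(n, total_utilization, rng):
    """Draw n non-negative utilizations that sum to total_utilization (UUniFast)."""
    utilizations = []
    remaining = total_utilization
    for i in range(1, n):
        # Split off one share; the exponent keeps the draw uniform over the simplex.
        next_remaining = remaining * rng.random() ** (1.0 / (n - i))
        utilizations.append(remaining - next_remaining)
        remaining = next_remaining
    utilizations.append(remaining)
    return utilizations


def generate_periodic_chains(n_chains, utilization, rng):
    """Assign each chain a utilization and a period, then deduce its WCET."""
    chains = []
    for u in uunifast(n_chains, utilization, rng):
        period = rng.choice(PERIODS)
        chains.append({"period": period, "utilization": u, "wcet": u * period})
    return chains


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
for chain in generate_periodic_chains(4, 0.5, rng):
    print(chain)
```

The same `uunifast` call can then be applied within a chain to split its utilization among its tasks.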
We have evaluated two criteria. First, we have compared our analysis to [15] , which makes some approximations in the computation of both higher-priority and lower-priority interference; we did not compare our analysis with that of [8] because, for synchronous chains, the two analyses coincide, and for asynchronous chains, [8] does not apply. Second, we have evaluated the precision of our upper bounds, as measured by their distance to the computed lower bounds.
Figs. 2 and 3 report the experimental results for synchronous and asynchronous chains, respectively. In Fig. 2 , three points correspond to each chain: our lower bound (green triangle), which is lower than or equal to our upper bound (red circle), which is in turn lower than or equal to the upper bound computed using the method proposed in RTAS 2016 (blue rhombus). In Fig. 3 , each chain has two points: our upper bound (red square), which is lower than or equal to the upper bound proposed in RTAS 2016 (blue rhombus). For readability, all generated chains are sorted according to our computed upper bound on their WCEL. The exponential shape of our graphs results from our generation method for sporadic chain activations, which may be bursty up to δ − (100). Due to space limitations, Figs. 2 and 3 illustrate our results only for the first group of systems. The trend is similar for both groups of systems, but the precision (i.e., the difference between upper and lower bound) is better for the sporadic case (not shown), as was to be expected from the theoretical results. For systems with periodic and sporadic chains, however, the relative difference between the lower and the upper bound can be large. Additional experiments have shown that the number of tasks per chain, the number of chains in the system, the length of the task chains, and the utilization breakdown do not significantly influence the precision of the obtained bounds.
Since lower bounds can also be obtained by simulating the system, we have simulated the system of Fig. 1 on 1 000 000 randomly generated activation scenarios, and measured the simulated lower bound for each of its four chains. Table I summarizes the results. For σ a and σ b , the simulated lower bound is identical to the computed one. For σ c and σ d , the computed lower bound improves on the simulated one by 95% and 71%, respectively. Since these results were obtained for a single system, more experiments are required to compare the simulated and the computed lower bounds. Fig. 4 depicts the evolution of the simulated lower bound as a function of the number of activation scenarios. Table II reports an equivalent simulation but for a tight system consisting of six chains, again simulated over 1 000 000 randomly generated activation scenarios. The lower bound obtained by simulation is sometimes tight (e.g., for chains σ a and σ e ), but otherwise it can be very far from the computed lower bound (e.g., for chain σ b ). All this demonstrates the usefulness of our analysis. In both tables, the percentages are computed as (computed i − simulated i ) / simulated i × 100.
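The two quantities used in this comparison can be sketched as follows: the simulated lower bound is the largest latency observed so far, and the percentage follows the formula given for the tables. The function names and all numeric values are hypothetical and are not taken from Table I or Table II.

```python
from itertools import accumulate


def running_lower_bound(latencies):
    """Largest latency observed after each simulated activation scenario."""
    return list(accumulate(latencies, max))


def improvement_percent(computed, simulated):
    """(computed - simulated) / simulated * 100, as used in the tables."""
    return (computed - simulated) / simulated * 100


# Made-up latencies of one chain over six simulated scenarios.
observed = [12, 9, 15, 15, 11, 18]
print(running_lower_bound(observed))  # [12, 12, 15, 15, 15, 18]
print(improvement_percent(27, 18))    # 50.0
```

The running maximum never decreases, so a simulated bound can only approach the computed one from below as more scenarios are explored.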
In summary, our experimental results show the following.
• Our upper bound analysis significantly improves over [15] .
• In most cases, our lower bound analysis is able to guarantee that the computed upper bound is fairly tight.
• There are, however, quite a few instances for which upper and lower bounds differ significantly. This underlines the value of such information.
• The lower bounds obtained by simulation are sometimes significantly lower than the ones computed by our analysis.
X. CONCLUSION
In this paper, we propose an improved performance analysis technique that computes tighter upper bounds on task chain latencies in uniprocessor systems than the state of the art, and we provide an innovative approach to assess the quality of the computed bounds by comparing them to lower bounds on the worst case obtained from feasible execution scenarios. We also present a set of experiments that show the gain obtained in terms of analysis precision when using our solution.
We believe that our analysis represents an important step toward the acceptance of performance analysis techniques in the industrial design process of real-time embedded systems. Note that a major reason hindering the use of performance analysis in industry is not only the overdimensioning induced by the various approximations used in current analyses but also the lack of methods to quantify it.
Future work will extend our solution to make it applicable to more complex industrial real-time systems by adding offsets, equal priorities, and support for multiprocessor systems. In addition, the analysis technique we propose in this paper also represents an important step toward the computation of task chain latencies in multiprocessor systems. Very often in industrial multiprocessor systems, e.g., in software defined radios, a task that finishes its execution activates the next task in the chain only if both are mapped to the same processor. When they are mapped to different processors, the activation of the next task is instead independent of the termination of the first task: the first task writes its output data to a memory, from which they are read by the next task upon activation. The computation of the WCEL for such task chains will require using our analysis technique to compute the latency of the subchains on each processor, combined with a mechanism to analyze cause-effect chains between processors.
