The design of automotive electronic systems needs to address a variety of important objectives, including safety, performance, fault tolerance, reliability, security, and extensibility. To obtain a feasible design, timing constraints must be satisfied and latencies of certain functional paths should not exceed their deadlines. From a functionality perspective, soft errors caused by transient or intermittent faults need to be detected and recovered from with fault tolerance techniques. Moreover, during the lifetime of a vehicle design or even the same car, updates are often needed to add new features or fix bugs in existing ones. It is therefore critical to improve design extensibility so that such updates can be accommodated without incurring major redesign and re-verification cost. In this work, we discuss the metrics for measuring latency, fault tolerance and extensibility, and present a simulated annealing based algorithm to search the design space with respect to them. Experimental results on industrial and synthetic examples demonstrate clear trade-offs among these objectives, and hence the importance of quantitatively analyzing such trade-offs and exploring the design space with automation tools.
INTRODUCTION
The design of automotive electronic systems has become increasingly challenging due to the large design space and stringent design requirements. The development of autonomous and semi-autonomous features, as well as vehicle connectivity functionality, requires more complex automotive software and hardware. In addition, the underlying architecture platform is shifting from the traditional federated architecture, where each function is deployed to one ECU (Electronic Control Unit) and provided as a black box by a Tier-1 supplier, to the integrated architecture, where one function can be distributed over multiple ECUs and multiple functions can be supported by one ECU.
Model-based design (MBD) methodology has been proposed to address the design challenges in complex systems such as vehicles and avionic systems [13, 14]. In MBD, system functionality is first captured with formal or semi-formal models for early-stage analysis and validation. These functional models are then mapped onto an architectural platform (often also captured with models) for software or hardware implementation. For automotive electronic systems, this mapping/synthesis process involves generating software tasks from functional models (sometimes through another layer of runnables), allocating tasks onto ECUs connected with buses (such as CAN [7, 15, 22] or FlexRay [3, 16]), and scheduling the execution of tasks and the transmission of bus messages (Figure 1). During this process, a variety of design objectives, such as safety, performance, fault tolerance, reliability, security and extensibility, need to be addressed.

Extensibility: A major challenge in vehicle design is to cope with software and hardware evolution over the lifetime of a design, across multiple versions in the same product family, or even for the same car. Updates such as adding new application software, reallocating some software among ECUs, or adding a new ECU are needed to fix bugs and provide new functionality. Due to the fast development of automotive applications, such updates (especially software updates) are expected to become more frequent. For instance, Tesla has already been able to carry out regular software updates over-the-air since version 8.1 [23].
However, small changes in software and hardware may cause big and unexpected changes in system timing and functionality. It is often necessary to re-verify and re-certify the entire system, which could lead to prohibitively expensive costs and undermine system availability and reliability. Therefore, it is important to improve the extensibility of designs so that future updates can be accommodated without incurring major redesign and re-verification cost. This is a challenging goal, especially because software functions share and contend for limited computation and communication resources.
Fault tolerance: Soft errors caused by transient or intermittent faults have become a major design concern, because of the continuous scaling of technology, high energy cosmic particles and radiation from the application environment [2, 24]. Fault tolerance techniques are greatly needed to detect and recover from errors such as application crashes, illegal branches and silent data corruption. In this paper, as in [8, 26], we focus on two categories of error detection techniques: embedded error detection (EED) and explicit output comparison (EOC). More specifically, EED includes a variety of error detection techniques such as instruction signature checking, control flow checking (CFC) and watchdog timers [17]. EOC detects errors through explicit redundancy of task execution. For instance, the same program can be executed twice, and an output mismatch indicates that one or more errors occurred [8]. Choosing EOC or EED techniques for specific tasks could significantly improve the system's ability to tolerate soft errors.
In addition to extensibility and fault tolerance, there are often timing constraints that must be satisfied to ensure functional correctness and system safety, such as task execution deadlines, message transmission deadlines, and latency deadlines along functional paths.
In this work, we discuss the metrics for measuring extensibility, fault tolerance and latency, optimize them in task allocation and scheduling, and analyze their trade-offs. We consider automotive systems that are based on CAN, the prevalent bus protocol currently in vehicles. For tasks on the same ECU, the communication is through local memory and very fast. For tasks on different ECUs, the communication is through CAN bus messages/frames and the transmission time is much longer.
Intuitively, maximizing extensibility or fault tolerance may lead to a more "balanced" task allocation and therefore more bus messages and longer path latencies. To quantitatively evaluate such trade-offs, we first define timing models for software tasks, messages and schedulability constraints, and metrics for extensibility and fault tolerance. We then optimize these metrics with a simulated annealing based approach, and conduct experiments with industrial and synthetic examples.
The rest of the paper is organized as follows. In Section 2, we discuss previous work on extensibility and fault tolerance. In Section 3, we introduce our system models on timing/latency, extensibility and fault tolerance. In Section 4, we demonstrate the trade-offs among these objectives with an illustrating example, and then introduce our simulated annealing based algorithm for optimizing them. We present experimental results and discuss our findings in Section 5, and conclude the paper in Section 6.
RELATED WORK
In the literature, a number of studies have addressed robustness, scalability, flexibility and extensibility of real-time embedded systems. The notions behind these objectives are sometimes conflated, since they all relate to a system's capability of accommodating changes (which could come from variations or updates). For instance, in [25], scalability refers to how well a system can handle task execution time increases. In [1], flexibility describes a system's ability to add additional tasks without impeding existing ones.
Various viewpoints and definitions have also been proposed for system extensibility. In [25], Yerraballi et al. develop a method to find an optimal execution time scaling factor for all tasks in a given subset while ensuring system schedulability. In [21], novel definitions of sustainability and extensibility for FlexRay-based communication systems are presented, which can then be combined with CAN-based systems. In [10], although no formal definition of extensibility is provided, an original approach utilizing contract-based design is proposed to negotiate among contracts for software updates. In this work, we adopt the task-level extensibility metric from [27], which measures how much task execution time can be increased without violating design constraints.
Regarding fault tolerance, many error detection techniques, such as triple modular redundancy, watchdog timers and instruction signature checking, have been proposed [4, 5, 11, 12, 17-20]. For instance, in [12, 20], Izosimov et al. employ process re-execution and replication to tolerate transient faults and then extend their algorithm by checkpointing with rollback recovery. In [11], they develop a heuristic algorithm to trade off between hardware hardening and re-execution in software. In [4, 5], Burns et al. propose schedulability analysis and priority assignment with embedded error detection techniques.
In our previous work [26], we formulate the impact of EOC and EED on system timing for different platform configurations. An MILP (mixed integer linear programming) model is then developed to explore task allocation and scheduling, together with the selection of error detection techniques for individual tasks.
SYSTEM MODEL
In our system model, the CAN-based architectural platform includes a set of $p$ ECUs $E = \{e_1, e_2, \ldots, e_p\}$ connected through a CAN bus. The functional model is represented as a task graph $G = \{\mathcal{T}, \mathcal{S}\}$, where $\mathcal{T} = \{\tau_1, \tau_2, \ldots, \tau_n\}$ is the set of tasks and $\mathcal{S} = \{s_1, s_2, \ldots, s_m\}$ is the set of signals that impose data dependency and execution order among tasks.
We assume all tasks are invoked periodically and scheduled based on static priorities with preemption allowed. Each task $\tau_i$ has its own activation period $T_i$, worst-case execution time (WCET) $c_i$ and priority $p_i$. If two tasks are allocated to the same ECU, signals are transmitted through local memory and we assume the communication delay is negligible. If two dependent tasks are mapped onto different ECUs, data will be exchanged through messages/frames on the CAN bus. The CAN messages carrying such signals are denoted $m_i$.
In the task graph, a path is an interleaving sequence of tasks and signals denoted as $p = [\tau_{r_1}, s_{r_1}, \tau_{r_2}, s_{r_2}, \ldots, s_{r_{k-1}}, \tau_{r_k}]$. $\tau_{r_1}$, the source node of the path, is usually triggered by external events such as sensor inputs. The sink node $\tau_{r_k}$ is often the task that activates actuators. It is possible that multiple paths exist between a source task and a sink task.
Worst-case End-to-end Path Latency
We define the worst-case end-to-end latency $l_p$ of a path $p$ as the maximum time delay needed for input changes on the source node to propagate to the outputs of the sink node. To ensure system safety and performance, a deadline $d_p$ may be imposed on $l_p$, i.e., $l_p \le d_p$. Computing $l_p$ requires the worst-case response times of the tasks and messages along the path, as explained below.

Task worst-case response time: In our model, tasks running on the same ECU are scheduled based on static priorities with preemption (commonly supported by the OSEK standard and its derivatives). The execution of a task is subject to interference from higher priority tasks on the same ECU. Therefore, the worst-case response time $r_i$ of a task $\tau_i$, which represents the longest time delay needed to complete the task after its activation, can be calculated as follows (similarly as in [9, 28]):

$$r_i = c_i + \sum_{\tau_j \in hp(\tau_i)} \left\lceil \frac{r_i}{T_j} \right\rceil c_j \qquad (1)$$
where $hp(\tau_i)$ denotes the set of higher priority tasks on the same ECU. The second term represents the interference from these higher priority tasks within the response time. This formula can be solved with an iterative numerical method.

Message worst-case response time: In our model, when two tasks communicating through signals are allocated to different ECUs, their communication signals are packed into messages and transmitted over the CAN bus. We further assume each signal $s_i$ is mapped to its own message $m_i$. The transmission delays of these messages contribute significantly to the path latencies, and can be calculated similarly to those of tasks. Unlike the preemptive task scheduling policy, however, the CAN bus employs fixed priority non-preemptive scheduling. Thus, a CAN message may suffer from additional blocking delay caused by lower priority messages, which can be approximated with the largest possible transmission time among all messages transferred on the same CAN bus. Equation (2) is the formula for calculating the message worst-case response time $r_{m_i}$, where $B_{max}$ is the largest blocking time and $c_{m_i}$ is the worst-case transmission time of the message.
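The iterative solution mentioned above can be sketched as a fixed-point iteration. This is a minimal illustration, not the authors' implementation: the function names and the `limit` parameter are hypothetical, and the message formula is an assumed form (own transmission time, plus $B_{max}$, plus interference from higher priority messages over the queuing interval) that matches the description of Equation (2) but is not taken from it.

```python
import math

def task_wcrt(c_i, hp, limit):
    """Fixed-point iteration for Equation (1).

    hp is a list of (c_j, T_j) pairs for the higher priority tasks on the
    same ECU. Returns the worst-case response time, or None once the
    iterate exceeds `limit` (e.g. the task deadline).
    """
    r = c_i
    while r <= limit:
        nxt = c_i + sum(math.ceil(r / T_j) * c_j for c_j, T_j in hp)
        if nxt == r:
            return r
        r = nxt
    return None

def message_wcrt(c_m, b_max, hp, limit):
    """Assumed non-preemptive variant for CAN messages:
    r = c_m + B_max + sum(ceil((r - c_m) / T_j) * c_j).

    The iteration starts from the critical instant in which every higher
    priority message is queued once, an assumption of this sketch.
    """
    r = c_m + b_max + sum(c_j for c_j, _ in hp)
    while r <= limit:
        nxt = c_m + b_max + sum(math.ceil((r - c_m) / T_j) * c_j for c_j, T_j in hp)
        if nxt == r:
            return r
        r = nxt
    return None
```

For example, a task with WCET 3 and two higher priority tasks `(1, 4)` and `(2, 6)` converges through the iterates 3, 6, 7, 9, 10 and settles at 10.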
Path latency: The worst-case end-to-end path latency $l_p$ of path $p$ is the summation of the periods and worst-case response times of all tasks and global signals (i.e., signals that are packed into CAN messages) on the path, as shown in Equation (3):

$$l_p = \sum_{\tau_i \in p} (T_i + r_i) + \sum_{s_i \in p,\; s_i \in GS} (T_{s_i} + r_{s_i}) \qquad (3)$$

Here $GS$ is the set of global signals and $T_{s_i}$ is the period of signal $s_i$. Note that in our model, a global signal has the same worst-case response time as its corresponding message, i.e., $r_{s_i} = r_{m_i}$. The periods are taken into account because the communication is asynchronous.
Schedulability: In this work, a system is schedulable if all the timing constraints in (4) to (6) are met. Constraint (4) ensures that the response time of every task is not greater than its deadline, which equals its period in our model. Similarly, Constraint (5) ensures that every message is transmitted within its period. Constraint (6) ensures that the end-to-end latency of every path will not exceed its deadline.

$$r_i \le T_i \quad \forall \tau_i \qquad (4)$$
$$r_{m_i} \le T_{m_i} \quad \forall m_i \qquad (5)$$
$$l_p \le d_p \quad \forall p \qquad (6)$$
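The path latency and the three schedulability conditions can be checked mechanically once the response times are known. The sketch below assumes that response times have already been computed; the function names and the tuple layout are illustrative choices, not part of the original model.

```python
def path_latency(tasks_on_path, global_signals_on_path):
    """Sum of (period + worst-case response time) over the tasks and the
    global signals of one path. Each element is a (period, wcrt) pair."""
    return (sum(T + r for T, r in tasks_on_path)
            + sum(T + r for T, r in global_signals_on_path))

def is_schedulable(tasks, messages, paths):
    """Check the three constraint families described in the text.

    tasks, messages: lists of (period, wcrt) pairs.
    paths: list of (latency, deadline) pairs.
    """
    tasks_ok = all(r <= T for T, r in tasks)
    messages_ok = all(r <= T for T, r in messages)
    paths_ok = all(lat <= d for lat, d in paths)
    return tasks_ok and messages_ok and paths_ok
```

A path with two tasks `(10, 3)` and `(20, 5)` and one global signal `(10, 2)` has latency 13 + 25 + 12 = 50, so it meets a deadline of 60 but not one of 40.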
Task Level Extensibility
We adopt the task level extensibility metric from [27], which measures how much task WCET can be increased without violating design constraints. More specifically, we calculate system extensibility as the weighted sum of each task's maximum possible increase of its WCET:
where $w_i$ is a predetermined value that indicates how likely a task's WCET is to increase in future updates, and $\Delta c_i$ is the maximum possible increase of the task WCET $c_i$ without violating design constraints (i.e., schedulability constraints (4) to (6) in this work), while all other system configurations remain unchanged. A binary search based algorithm is used to compute the extensibility, as shown in Algorithm 1. In this algorithm, $E$ denotes the system extensibility and is initialized to zero. For every task $\tau_i$, we use binary search to calculate how much its WCET $c_i$ can be increased, as shown from line 2 to line 11. During the binary search, the lower bound $lb$ is initially set to 1, representing the normalized factor with respect to the original WCET, while the upper bound $ub$ is initially set to $T_i/c_i$, the normalized factor at which the WCET would equal the task period. If the system is schedulable at the factor being tried, we continue the search in the upper half (i.e., trying a larger value for the execution time); otherwise we search the lower half. The system extensibility is the weighted sum over all tasks.

Algorithm 1: System Extensibility Computation
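A minimal sketch of the binary search described above, under simplifying assumptions that are not in the original: a single ECU, deadlines equal to periods, no messages or path deadlines, an initially schedulable task set, and an extensibility contribution measured as the normalized increase $\Delta c_i / c_i$. The helper names and the tolerance `eps` are hypothetical.

```python
import math

def wcrt(c, hp, limit):
    """Response-time iteration; hp holds (c_j, T_j) of higher priority tasks."""
    r = c
    while r <= limit:
        nxt = c + sum(math.ceil(r / T) * cj for cj, T in hp)
        if nxt == r:
            return r
        r = nxt
    return None

def schedulable(tasks):
    """tasks: list of (c, T) in decreasing priority order on one ECU."""
    return all(wcrt(c, tasks[:k], T) is not None
               for k, (c, T) in enumerate(tasks))

def extensibility(tasks, weights, eps=1e-3):
    """Weighted sum of each task's normalized WCET headroom."""
    E = 0.0
    for i, (c, T) in enumerate(tasks):
        lb, ub = 1.0, T / c          # scaling factors relative to c
        while ub - lb > eps:
            mid = (lb + ub) / 2
            trial = tasks[:i] + [(c * mid, T)] + tasks[i + 1:]
            if schedulable(trial):
                lb = mid             # feasible: try a larger WCET
            else:
                ub = mid             # infeasible: try a smaller WCET
        E += weights[i] * (lb - 1.0)
    return E
```

With two tasks of WCET 3 and period 7 on one ECU, each task can grow to a WCET of 4 before the lower priority task misses its deadline, so with weights of 0.5 each the result is close to 1/3.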
Soft Error Tolerance Model
We consider two major soft error detection techniques, i.e. embedded error detection (EED) and explicit output comparison (EOC). Usually, EED covers part of the total errors with additional computation overhead (which depends on specic application and implementation method). For instance, state-of-the-art CFC techniques may cover 70% of total errors. EOC can achieve almost 100% error detection at the cost of 100% execution time overhead (temporal redundancy) or 100% resource overhead (spatial redundancy). In this work, we assume EOC detection rate is 100%, similarly as in [26] .
System error coverage: During the hyperperiod $T_{hyper}$ of a task set $\mathcal{T}$ (i.e., the least common multiple of the task periods), a total number of $K \ge 0$ errors may occur. System error coverage is then defined as the probability that all errors are either i) detected and recovered from within the hyperperiod while all timing constraints are satisfied, or ii) occur during idle time [26].
Let $t_{eoc}$, $t_{eed}$ and $t_{none}$ denote the accumulated time needed by tasks employing EOC, EED and no error detection technique, respectively. $t_{idle}$ denotes the total idle time. An exact analysis of system error coverage depends on the specific error occurrence profile and timing pattern, and is hard to capture with a closed form formulation. For simplicity, on a single ECU, we assume that $K$ arbitrary, uniformly distributed errors may occur during a hyperperiod. The system error coverage $P$ is then approximated as:
where the two rate parameters represent the error detection rates of EED and EOC, respectively.
Task execution time with error detection and recovery: As mentioned above, EED and EOC come with additional computation overhead. We characterize a task that uses an error detection technique with $C^{dec}_i$, which denotes the time for execution and error detection of task $\tau_i$. For EOC with temporal redundancy, $C^{dec}_i$ is $2C_i$ plus the time needed for comparing the outputs. If we duplicate the execution of the same task on different cores (i.e., a spatial redundancy approach), $C^{dec}_i$ is $C_i$ plus the output comparison time. For EED, since we run a task with built-in detection, $C^{dec}_i = C_i + \Delta C_i$, where $\Delta C_i$ is the increased timing cost for EED. $C^{rec}_i$ is the error recovery time for task $\tau_i$ if an error is detected. We assume the re-execution of a task is scheduled immediately once an error is detected. $C^{rec}_i = C_i$ for EOC, while $C^{rec}_i = C_i + \Delta C_i$ for EED.
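The case analysis above can be written as a small helper. This is an illustrative sketch: the function name, the parameter names `compare_time` and `eed_overhead`, and the treatment of a task with no detection technique (no overhead and no recovery) are assumptions rather than definitions from the text.

```python
def detection_and_recovery_time(C, technique, compare_time=0.0,
                                eed_overhead=0.0, spatial=False):
    """Return (C_dec, C_rec) for one task with WCET C.

    technique: "EOC", "EED" or "none".
    compare_time: time to compare outputs (EOC only).
    eed_overhead: extra execution time of built-in detection (EED only).
    spatial: if True, EOC duplicates the task on another core.
    """
    if technique == "EOC":
        c_dec = (C if spatial else 2 * C) + compare_time
        c_rec = C
    elif technique == "EED":
        c_dec = C + eed_overhead
        c_rec = C + eed_overhead
    elif technique == "none":
        # Assumption of this sketch: errors go undetected, so no recovery.
        c_dec, c_rec = C, 0.0
    else:
        raise ValueError(f"unknown technique: {technique}")
    return c_dec, c_rec
```

For a task with WCET 4 and a comparison time of 1, temporal EOC gives a detection time of 9 and a recovery time of 4, while spatial EOC gives 5 and 4.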
Worst-case response time analysis with error detection:
To analyze the response time of a task with an EED/EOC technique, we need to integrate the error detection time and recovery time into Equation (1). For this, we employ two binary variables per task $\tau_i$ to distinguish the error detection strategy. The first is a detection indicator, which is 1 if either EED or EOC is employed for task $\tau_i$ and 0 otherwise. The second, $o_i$, is 1 if EOC is used and 0 if EED is used. We rewrite $C^{dec}_i$ and $C^{rec}_i$ as follows:
Let $r_{i,j}$ denote the worst-case response time of task $\tau_i$ when one or more errors occur during the execution of task $\tau_j$. Task $\tau_i$ will be blocked by the recovery time of $\tau_j$ if $\tau_j$ has higher priority. Following the same idea as Equation (1), we have:
where $p_{i,j}$ denotes the relative priority between tasks $\tau_i$ and $\tau_j$. Considering a complete task with $K$ errors, the response time of task $\tau_i$ can be formulated as:
where the Boolean variable $a_{i,e_l}$ is 1 if task $\tau_i$ is assigned to core $e_l$ and 0 otherwise, and $h_{i,j,e_l}$ is 1 if tasks $\tau_i$ and $\tau_j$ are both on core $e_l$ and 0 otherwise. To ensure each task is mapped to only one ECU, the following relations must be enforced:
OPTIMIZATION AND TRADE-OFFS AMONG OBJECTIVES
Based on the models introduced in Section 3, we quantitatively analyze the trade-offs among extensibility, fault tolerance and latency (as well as other metrics related to communication cost, such as the number of CAN messages and the bus utilization). In this section, we first demonstrate such trade-offs with an illustrating example, and then introduce a simulated annealing based algorithm for optimizing these different design objectives/metrics.
Illustrating Example
Figure 2 shows how the mapping of three tasks onto a CAN-based platform with two ECUs can affect system extensibility and communication cost (measured by the number of CAN messages or path latencies). The task graph, task WCETs and periods are shown on the left side. Tasks $\tau_1$ and $\tau_2$ send their output to task $\tau_3$.
In mapping (a), tasks $\tau_1$ and $\tau_3$ are mapped to the same ECU-A, while $\tau_2$ is mapped to ECU-B. We assign higher priority to $\tau_3$ than to $\tau_1$, based on the Rate Monotonic policy. Although there is still 16.7% utilization left on ECU-A, this mapping makes it impossible to increase the execution time of either $\tau_1$ or $\tau_3$, i.e., their extensibility is zero. On ECU-B, task $\tau_2$ can increase its WCET by 2 time units. Thus, the total system extensibility is $(2/3)/3 = 22.2\%$. In terms of communication, only message $m_2$ needs to be transmitted over the CAN bus, and the latency on the path from $\tau_1$ to $\tau_3$ should be relatively short.
In mapping (b), tasks $\tau_2$ and $\tau_3$ are swapped. On ECU-A, tasks $\tau_1$ and $\tau_2$ have the same period and WCET, and we assume $\tau_1$ has the higher priority. The maximum increase of WCET for either task is 1 time unit. The system extensibility is calculated as $(1/3 + 1/3 + 1/2)/3 = 38.9\%$. In terms of communication, messages $m_1$ and $m_2$ need to be transmitted over the CAN bus, and the latency on the path from $\tau_1$ to $\tau_3$ is also higher than in mapping (a).
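The two extensibility figures can be checked by averaging the normalized WCET increases of $\tau_1$, $\tau_2$ and $\tau_3$. Equal weights of $1/3$ and normalization by the original WCET are inferred here from the quoted numbers; they are not stated explicitly in the text.

```latex
\begin{align*}
E_{(a)} &= \tfrac{1}{3}\left(0 + \tfrac{2}{3} + 0\right) = \tfrac{2}{9} \approx 22.2\% \\
E_{(b)} &= \tfrac{1}{3}\left(\tfrac{1}{3} + \tfrac{1}{3} + \tfrac{1}{2}\right) = \tfrac{7}{18} \approx 38.9\%
\end{align*}
```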
In this example, we clearly see the trade-off between extensibility and communication cost (i.e., latency or number of messages). Next, we introduce how these design objectives can be optimized.
Objective Function and Constraints
We optimize an objective function (16) that includes extensibility, fault tolerance and communication cost, by exploring the allocation, priority assignment and error detection technique (EED, EOC, or none) for each task. The communication cost could be measured by total path latency, number of CAN messages or bus utilization.
$Cost_{ext}$, $Cost_{ft}$ and $Cost_{com}$ in (16) are the costs for extensibility, fault tolerance and communication, respectively. For instance, $Cost_{ext} = 1 - E$, where $E$ is the system extensibility in (7). Note that the higher the extensibility, the lower $Cost_{ext}$ is. Each cost has its own weight ($\mu$ for $Cost_{ft}$), and the three weights can be tuned to trade off these objectives.
The optimization is subject to the schedulability constraints in (4) to (6), and possible constraints on each design objective. For instance, there could be upper bounds $EXT_{max}$, $FT_{max}$ and $COM_{max}$ on each cost, as shown below.
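One way to read the objective and its bounds is as a weighted sum of the three costs with per-cost caps. The weighted-sum form, the function names and the argument names below are assumptions of this sketch; the text only states that the three costs carry tunable weights and may be bounded.

```python
def total_cost(E, cost_ft, cost_com, w_ext, w_ft, w_com):
    """Weighted sum of the three costs (assumed form of the objective).

    E is the system extensibility, so the extensibility cost is 1 - E.
    """
    cost_ext = 1.0 - E
    return w_ext * cost_ext + w_ft * cost_ft + w_com * cost_com

def within_bounds(cost_ext, cost_ft, cost_com, ext_max, ft_max, com_max):
    """Check the optional upper bounds on each individual cost."""
    return cost_ext <= ext_max and cost_ft <= ft_max and cost_com <= com_max
```

Setting one weight to 1 and the other two to 0 reduces the objective to a single cost, which corresponds to optimizing one objective in isolation.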
Simulated Annealing
We developed a simulated annealing based algorithm for the above optimization, as shown in Algorithm 2. For the initial configuration, tasks are randomly allocated to ECUs and scheduled using the Rate Monotonic policy. $T$ represents the current simulation temperature, $T^*$ is the final temperature, and a cooling factor controls how quickly $T$ decreases. $K^*$ is the maximum number of iterations at each temperature. During each iteration, function randomChange modifies the current solution $A_{cur}$ into a candidate solution $A_{new}$ by randomly performing one of the following operations: i) changing the allocation of a task from one ECU to another, ii) swapping the priorities of two tasks on the same ECU, or iii) changing the error detection technique of a task. Function ComputeObjective($A_{new}$) then computes the corresponding cost $C_{new}$ as defined in (16), and function checkSched($A_{new}$) determines the schedulability of this candidate solution. Note that if the candidate solution is infeasible, we add a penalty to the cost instead of rejecting the solution directly. Function $P(C_{cur}, C_{new}, T)$ then computes the acceptance probability of $A_{new}$. $A_{cur}$ and $C_{cur}$ keep track of the latest solution and its cost. $A_{opt}$ stores the best solution. The simulated annealing procedure stops when the temperature reaches the predefined value $T^*$.

Algorithm 2: Simulated Annealing for Task Allocation, Scheduling and Error Detection Technique Selection
Construct initial configuration.
while T > T* do
    ...
    A_new = randomChange(A_cur)
    ...
    if C_new < C_cur then
        A_cur = A_new, C_cur = C_new
        if isSched = true then
            A_opt = A_cur
    ...
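The annealing loop described above can be sketched generically as follows. This is not the authors' code: the Metropolis rule stands in for the unspecified acceptance function $P$, the default temperatures, cooling factor and penalty are arbitrary, and the toy problem at the end (four tasks, two ECUs, cost equal to the number of signals crossing ECUs) is invented purely for illustration.

```python
import math
import random

def anneal(initial, random_change, compute_objective, check_sched,
           t_start=1.0, t_final=1e-3, cooling=0.9, iters_per_temp=50,
           penalty=10.0, rng=None):
    """Simulated annealing with a penalty for infeasible candidates.

    Returns the best feasible solution seen and its cost, or
    (None, inf) if no feasible solution was ever visited.
    """
    rng = rng or random.Random(0)
    a_cur = initial
    feasible = check_sched(a_cur)
    c_cur = compute_objective(a_cur) + (0.0 if feasible else penalty)
    a_opt, c_opt = (a_cur, c_cur) if feasible else (None, math.inf)
    T = t_start
    while T > t_final:
        for _ in range(iters_per_temp):
            a_new = random_change(a_cur, rng)
            is_sched = check_sched(a_new)
            c_new = compute_objective(a_new) + (0.0 if is_sched else penalty)
            # Metropolis rule (assumed): always accept improvements,
            # accept worse candidates with probability exp(-delta / T).
            if c_new < c_cur or rng.random() < math.exp((c_cur - c_new) / T):
                a_cur, c_cur = a_new, c_new
                if is_sched and c_cur < c_opt:
                    a_opt, c_opt = a_cur, c_cur
        T *= cooling
    return a_opt, c_opt

# Toy usage: utilizations in percent, ECU capacity 100.
util = [40, 40, 30, 30]
signals = [(0, 1), (2, 3), (1, 2)]

def change(alloc, rng):
    new = list(alloc)
    new[rng.randrange(len(new))] = rng.randrange(2)  # move one task
    return tuple(new)

def objective(alloc):
    return sum(alloc[a] != alloc[b] for a, b in signals)

def feasible(alloc):
    return all(sum(u for u, e in zip(util, alloc) if e == k) <= 100
               for k in (0, 1))

best, cost = anneal((0, 1, 0, 1), change, objective, feasible)
```

In the toy problem all four tasks cannot share one ECU, so at least one signal must cross the bus; the search is expected to return a feasible allocation with exactly one crossing signal.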
EXPERIMENTAL RESULTS
We conducted experiments on an industrial case and a set of synthetic examples. The industrial case is derived from an experimental vehicle subsystem and contains 41 tasks communicating through 81 signals. The subsystem involves distributed functions collecting data from 360 sensors to actuators. All the periods and task WCETs are given in the industrial case. We also use the TGFF tool [6] to generate a set of synthetic examples with random periods and WCETs. We impose end-to-end latency deadlines on selected critical paths. The examples are tested for a number of different platform configurations. For the industrial case, a minimum of 5 ECUs is needed for finding feasible solutions.
Extensibility vs. Latency
We first explore the trade-off between extensibility and communication cost, which is measured by the total critical path latency (i.e., the sum of end-to-end latencies for all selected critical paths). We conduct optimizations using our simulated annealing approach (Algorithm 2). More specifically, for extensibility optimization, we set the weight of $Cost_{ext}$ to 1 and the other two weights (including $\mu$) to 0 in the objective function (16). For latency optimization, we set the weight of $Cost_{com}$ to 1 and the other two weights to 0. We carry out these optimizations for platform configurations containing 5 to 10 ECUs, and record both extensibility and latency for each optimization. The comparison results for the industrial case are shown in Figure 3, with yellow bars on the right in both sub-figures representing extensibility optimization results and blue bars on the left in both sub-figures representing latency optimization results.
We can clearly see a trade-off between extensibility and latency. Extensibility optimization indeed significantly improves the extensibility metric over latency optimization, but leads to longer total latency. Intuitively, optimizing extensibility leads to a more "balanced" allocation of tasks on ECUs and thus more CAN messages (rather than communication through local memory) and longer total latency. In our experiments, extensibility optimization results have over 70 signals mapped to CAN messages, while latency optimization results only have 10-20 messages. Furthermore, we can see that the extensibility increases with more ECUs. This is as expected, since the tasks get more timing slack with lower average ECU utilization.
Error Coverage vs. Latency
We then explore the trade-off between error coverage (fault tolerance) and latency. We employ EED and the temporal redundancy model of EOC as described in Section 3.3, and we set $\mu = 1$ and the other two weights to 0 for error coverage optimization. Figure 4 shows the comparison between error coverage optimization (yellow bars on the right) and latency optimization (blue bars on the left) for the industrial case. The trade-off between the two is also very clear. Intuitively, a more balanced allocation of tasks leaves more timing slack for tasks to add error detection techniques, but results in longer latency. We can also see that the error coverage increases with more ECUs.
Extensibility vs. Error Coverage
We study the relation between extensibility and error coverage, by mapping the industrial case onto a platform of 5 ECUs with an average ECU utilization of 58%. Note that extensibility and error coverage are not always mutually exclusive. Both metrics get better when more time slack is available. However, applying the slack to error detection techniques does take away some capability to accommodate future changes. Table 1 shows the trade-off between extensibility and error coverage, when error coverage optimization is performed with minimum extensibility set to 0, 0.1, 0.2, 0.3, 0.4 and 0.5, respectively.
Optimization of All Three Objectives
We then optimize all three objectives (extensibility, error coverage / fault tolerance, latency / communication cost) by setting all three weights in (16) to 1. $Cost_{ext}$, $Cost_{ft}$ and $Cost_{com}$ are all normalized. In particular, $Cost_{ext} = 1 - E$. $Cost_{ft}$ is as defined in Equation (8). $Cost_{com}$ represents the cost of total latency, and is defined as $Cost_{com} = (Lat - Lat_{lb})/(Lat_{ub} - Lat_{lb})$. Here $Lat = \sum_{p_k \in P} l_{p_k}$ is the total latency, $Lat_{lb} = \sum_{p_k \in P} \sum_{\tau_i \in p_k} (T_i + c_i)$ is a lower bound for the total latency, and $Lat_{ub} = 2 \sum_{p_k \in P} (\ldots)$ is an upper bound.

Table 2: System total critical path latency, extensibility and error coverage in the solutions from optimizing each individual objective and from optimizing all three objectives (industrial case).

As shown in Table 2, optimizing all three objectives provides more balanced solutions than optimizing each individual objective. Such results are not surprising qualitatively, but the quantitative comparison should help designers make design choices.
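The latency normalization defined above maps the total latency onto the interval between its lower and upper bounds. A minimal sketch, with a hypothetical function name and a guard against a degenerate range that is not part of the original definition:

```python
def normalized_latency_cost(lat, lat_lb, lat_ub):
    """Cost_com = (Lat - Lat_lb) / (Lat_ub - Lat_lb)."""
    if lat_ub <= lat_lb:
        raise ValueError("upper bound must exceed lower bound")
    return (lat - lat_lb) / (lat_ub - lat_lb)
```

A total latency equal to the lower bound gives a cost of 0, one equal to the upper bound gives 1, and a value halfway between gives 0.5.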
Impact of ECU Speed and Number of Tasks
Finally, we study how extensibility is affected by the ECU computation speed and the number of tasks. We generate a set of synthetic examples with different numbers of tasks and map them to a platform with 5 ECUs. We scale all task WCETs by a factor of 1X, 1.5X and 2X to model different ECU computation speeds while task periods remain unchanged. Figure 5 demonstrates the quantitative impact of ECU speed and number of tasks on system extensibility.
CONCLUSION
In this work, we quantitatively analyze the trade-offs among extensibility, fault tolerance and latency for CAN-based automotive electronic systems. We introduce metrics for defining these three objectives and present a simulated annealing based algorithm for optimizing them. The clear trade-offs among these objectives demonstrate the need to develop design automation methods for facilitating design space exploration in automotive systems.
