Embedded automotive applications such as drive-by-wire in cars require dependable interaction between various sensors, processors, and actuators. This paper addresses the design of low-cost communication networks that are guaranteed to meet both the performance and fault-tolerance requirements of such distributed applications. We develop a fault-tolerant allocation and scheduling method which maps messages onto a low-cost multiple-bus system to ensure predictable interprocessor communication. The proposed method targets time-division multiple access (TDMA) communication protocols. Finally, we present a case study using some advanced automotive control applications to show that our approach uses the available network bandwidth efficiently to guarantee message deadlines.
INTRODUCTION
Embedded computers are being increasingly used in automobiles to replace safety-critical mechanical and hydraulic systems. Drive-by-wire is one example where traditional hydraulic steering and braking are replaced by a networked microprocessor-controlled electromechanical system [1]. Sensors measure the steering-wheel angle and brake-pedal position, and processors calculate the desired road-wheel and braking parameters which are then applied via electromechanical actuators at the wheels. Other computerized vehicle-control applications including adaptive cruise control, collision avoidance, and autonomous driving are also being developed [2]. These applications will be realized as real-time distributed systems requiring dependable and timely interaction between sensors, processors, and actuators. This paper addresses the design of low-cost communication networks to meet both the performance and fault-tolerance requirements of such applications.
The approach described in this paper synthesizes a fault-tolerant (FT) network topology from application requirements. While synthesis methods such as [3] assume an underlying CAN communication protocol and arbitrate bus access using message (processor) priorities, we target TDMA communication protocols where processors are allotted transmission slots according to a static, periodic, and global communication schedule [4]. Examples include TTP [5] and FlexRay [6], which have recently emerged as possible networking standards for in-vehicle networks.
We restrict the network topology space to multiple-bus systems such as the one in Fig. 1 where each processor P i connects to a subset of the communication buses. A co-processor handles message communication independently without interfering with task execution on P i . A multiple-bus topology allows fault-tolerant message allocation. Also, since communication protocols for the embedded systems of interest are typically implemented over low-cost physical media, individual buses have limited bandwidth. Therefore, multiple buses may be needed to accommodate the message load.
Given a set of distributed applications modeled as task graphs {G i }, the proposed approach generates a communication network satisfying both the performance and fault-tolerance requirements of each G i . Messages are allocated and scheduled on the minimum number of buses {B j } where each B j has a specified bandwidth. The major features of our approach are as follows:
• It assumes a multi-rate system where each graph G i may have a different execution period period(G i ).
• It targets a TDMA communication protocol.
• It supports dependable message communication by establishing redundant transmission paths between processors, thereby tolerating a bounded number of permanent bus failures.
• It uses network bandwidth efficiently by reusing transmission slots allotted to a processor between the multiple messages sent by it.
Finally, using representative distributed automotive control applications, we show that the proposed method guarantees predictable message transmission while reducing bandwidth utilization.
The rest of this paper is organized as follows. Section 2 presents an overview of the proposed approach, while Section 3 discusses some preliminaries including task scheduling. The message allocation method is developed in Section 4, and Section 5 presents the case study. We conclude the paper in Section 6.
DESIGN FLOW
The primary objective is to construct a network topology meeting the fault-tolerance and performance goals of the embedded applications. The secondary objective is to minimize hardware cost in terms of communication buses. We develop a heuristic method in which a feasible network topology satisfying the performance goals is obtained first. Its cost is then reduced via a series of steps that minimize the number of buses by appropriately grouping (clustering) messages while preserving the feasibility of the original solution.
The main steps of the proposed design approach are as follows. For a given allocation of tasks to processors {P i }, the corresponding inter-processor messages are mapped to a low-cost network topology comprising identical buses {B j }. Redundant routes are provided for messages with specific fault-tolerance requirements; for a k-fault-tolerant (k-FT) message m i , k replicas (copies) are allocated to separate buses. The network is synthesized assuming a generic TDMA protocol, and can be modified to accommodate specific cases such as TTP and FlexRay.
We assume that each task graph G i must meet its deadline by the end of its period, period(G i ). Moreover, for the architecture in Fig. 1, the message transmission schedule must be compact enough to fit within the available memory.
The proposed clustering approach also uses bus bandwidth efficiently by sharing or re-using transmission slots between multiple messages sent by a processor whenever possible. Each message cluster is allocated to a separate bus in the final topology.
PRELIMINARIES
This section shows how to obtain the initial solution where tasks are assigned deadlines and scheduled on processors, and messages allocated to separate communication buses. We use the approach of [7] which maximizes the slack added to each task in graph G i while still satisfying its deadline D i .
We now describe the deadline distribution algorithm. Entry and exit tasks in the graph are first assigned release times and deadlines. A path path i through G i comprises one or more tasks {T k }; the slack available for distribution to these tasks is
$$\mathit{slack}_i = D_i - \sum_{T_k \in \mathit{path}_i} c_k$$
where D i is the deadline of path i and c k the execution time of a task T k along this path. The distribution heuristic in [7] maximizes the minimum slack added to each task along path i by dividing slack i equally among its tasks. During each iteration through G i , the path i minimizing slack i /n, where n denotes the number of tasks along path i , is chosen and the corresponding slack is added to each task along that path. The deadlines (release times) of the predecessors (successors) of tasks belonging to path i are updated. Tasks along path i are then removed from the original graph, and the above process is repeated until all tasks are assigned release times and deadlines.
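The equal-slack step can be sketched in a few lines of Python. This is a minimal illustration rather than the algorithm of [7] in full: it handles a single path released at time 0, the function names `distribute_slack` and `most_critical` are hypothetical, and the deadlines and execution times in the example are invented for illustration.

```python
def distribute_slack(path_deadline, exec_times):
    """Split the slack of one path equally among its tasks.

    path_deadline: deadline D of the path, measured from its release at time 0.
    exec_times: execution times c of the tasks, in path order.
    Returns one (release, deadline) scheduling range per task.
    """
    slack = path_deadline - sum(exec_times)
    if slack < 0:
        raise ValueError("path cannot meet its deadline")
    share = slack / len(exec_times)  # equal share maximizes the minimum slack
    ranges, release = [], 0.0
    for c in exec_times:
        deadline = release + c + share
        ranges.append((release, deadline))
        release = deadline  # the next task is released at this deadline
    return ranges


def most_critical(paths):
    """Pick the (deadline, exec_times) path with the least slack per task."""
    return min(paths, key=lambda p: (p[0] - sum(p[1])) / len(p[1]))


# Hypothetical path: deadline 1000, three tasks with the given execution times.
ranges = distribute_slack(1000, [100, 200, 100])
critical = most_critical([(1000, [100, 200, 100]), (500, [100, 100])])
```

Here the first path has 200 units of slack per task and the second has 150, so the second path would be processed first.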
We use the graph in Fig. 2 to illustrate the procedure.
Task Scheduling:
Once the scheduling ranges of tasks in the graph are fixed, each T i may be considered independent with release time r i and deadline d i , and scheduled as such. To tackle multi-rate systems, we use fixed-priority scheduling where tasks are first assigned priorities according to their periods [8], and at any time instant, the processor executes the highest-priority ready task. The schedule is feasible if all tasks finish before their deadlines. Feasibility analysis of schedules using simple closed-form processor-utilization-based tests has been extensively studied under fixed-priority scheduling. However, in addition to feasibility, we also require task T i 's response time w i , given by the time interval between T i 's release and finish times; the response time is used in the next stage of our algorithm to determine the message delays to be satisfied by the network.
For multi-rate task graphs, the schedules on individual processors are simulated for a duration equal to the least common multiple (LCM) of the graph periods. Since this duration covers all possible interactions between tasks belonging to the different graph iterations, the worst-case response time for each task T i is obtained. Fig. 3(a) shows a simple multi-rate system comprising two task graphs with periods 2000 µs and 3000 µs; Figs. 3(b) and 3(c) show the task allocation and scheduling ranges, respectively. Fig. 3(d) shows the corresponding schedule for 6000 µs, the LCM of the graph periods. Task response times within this time interval are shown in Fig. 3(e). Multiple iterations of a task are evaluated to obtain its worst-case response time. For example, in Fig. 3(e), the first iteration of tasks T 1 , T 2 , and T 4 (in bold) has the maximum response time among the iterations within this interval.
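The LCM-based simulation can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's tool: it assumes preemptive fixed-priority scheduling on one processor, integer time units, priorities ordered by period, and a feasible task set. The function name `worst_case_response` and the example task parameters are hypothetical; only the 2000 µs and 3000 µs periods (here in units of 100 µs) come from the text.

```python
from functools import reduce
from math import gcd


def worst_case_response(tasks):
    """Simulate one processor over the LCM of the task periods.

    tasks: dict name -> (period, offset, exec_time), all integers.
    Shorter period means higher priority (assumed priority rule).
    Returns dict name -> largest observed response time.
    """
    periods = [p for p, _, _ in tasks.values()]
    horizon = reduce(lambda a, b: a * b // gcd(a, b), periods)
    order = sorted(tasks, key=lambda n: tasks[n][0])  # highest priority first
    pending = []  # each job is [name, release_time, remaining_time]
    worst = {n: 0 for n in tasks}
    for t in range(horizon):
        # Release every job whose period boundary falls at time t.
        for n in order:
            period, offset, c = tasks[n]
            if t >= offset and (t - offset) % period == 0:
                pending.append([n, t, c])
        # Run the highest-priority ready job for one time unit.
        if pending:
            job = min(pending, key=lambda j: (order.index(j[0]), j[1]))
            job[2] -= 1
            if job[2] == 0:
                worst[job[0]] = max(worst[job[0]], t + 1 - job[1])
                pending.remove(job)
    return worst


# Periods 20 and 30 stand for 2000 and 3000 us; execution times are invented.
response = worst_case_response({"T1": (20, 0, 5), "T2": (30, 0, 10)})
```

In this example the first iteration of T2 is delayed by T1 and therefore has the largest response time, mirroring the observation made for Fig. 3(e).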
MESSAGE CLUSTERING
We now develop a clustering approach that reduces the cost of the initial network topology by grouping multiple messages on a single bus while preserving the feasibility of the original solution. The fault-tolerance requirement of each k-FT message is also satisfied during this procedure.
First, we briefly review message transmission in a typical TDMA communication protocol such as FlexRay. Messages are transmitted according to a static, periodic, and global communication schedule called a round, comprising identical-sized slots. Each processor P j is allotted one or more sending slots during a round where both slot size and the number of slots per round are fixed by the system designer. Though successive rounds are constructed identically, the messages sent by processors may vary during a given round.
We now state the fault-tolerant message clustering problem as follows. Given a communication deadline delay(m i ) for each k-FT message m i sent by processor P j , construct TDMA rounds on the minimum number of communication buses such that during any time interval corresponding to delay(m i ), P j is allotted a sufficient number of transmission slots to transmit m i . Allocation of messages to multiple buses is related to bin-packing where messages are packed into a bin (round) of finite size while minimizing the number of bins. The general bin-packing problem is NP-complete and heuristics are typically used to obtain a solution [9] .
We treat each m i as a periodic message with period period(m i ) equal to its deadline delay(m i ) and generate message clusters {C j }, such that the corresponding TDMA round round(C j ) satisfies the following constraints: (1) No two replicas of a k-FT message m i are allocated to C j . (2) The duration of round(C j ) does not exceed a designer-specified threshold. (3) The slots within round(C j ) guarantee m i 's deadline, i.e., the time interval between successive sending slots for m i equals its period.
Each message cluster C j is allocated to a separate communication bus in the final network topology. Our method also makes efficient use of bus bandwidth by minimizing the number of transmission slots needed to satisfy message deadlines within a TDMA round by reusing slots between messages sent by a processor whenever possible.
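The relation to bin-packing can be made concrete with a first-fit-decreasing sketch. This is a standard bin-packing heuristic, not necessarily the clustering procedure used in this paper, and it is deliberately simplified: it enforces only constraint (1), replicas on separate clusters, and constraint (2), a round of at most n_max slots, and it ignores slot reuse and the spacing of slots within the round. The function name, the tuple layout, and the example slot demands are assumptions made for illustration.

```python
def first_fit_clusters(messages, n_max):
    """Pack message replicas into clusters (one cluster per bus).

    messages: list of (name, replica_id, slots_needed) tuples.
    n_max: maximum number of slots in a TDMA round.
    Returns a list of clusters, each a dict with the message names it
    holds, the slots used, and the (name, replica_id) members.
    """
    clusters = []
    # Largest demands first, as in first-fit decreasing.
    for name, replica, slots in sorted(messages, key=lambda m: -m[2]):
        if slots > n_max:
            raise ValueError(f"{name} does not fit in any round")
        for c in clusters:
            # Replicas of the same message must not share a cluster.
            if name not in c["names"] and c["used"] + slots <= n_max:
                break
        else:
            c = {"names": set(), "used": 0, "members": []}
            clusters.append(c)
        c["names"].add(name)
        c["used"] += slots
        c["members"].append((name, replica))
    return clusters


# Two replicas each of two hypothetical messages, round of at most 4 slots.
demo = first_fit_clusters(
    [("m1", 0, 2), ("m1", 1, 2), ("m2", 0, 1), ("m2", 1, 1)], n_max=4
)
```

With these inputs the replicas end up on two clusters, each carrying one copy of m1 and one copy of m2.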
We assume an upper bound on TDMA-round duration provided by the designer in terms of the maximum number of transmission slots n max and slot duration ∆ slot . Typically, the choice of n max depends on the memory limitations of the communication co-processor such as the number of transmit and receive buffers. Each transmission slot within a round has duration ∆ slot .
To guarantee message m i 's deadline, the corresponding slot allocation must satisfy both its periodicity requirement and a distance constraint between successive m i transmissions, as the following example illustrates. Fig. 4(a) shows an allocation scenario for message m 1 having delay(m 1 ) = 2 slots within a TDMA round of duration four slots, where m 1 requires one slot for transmission. Simply allocating a slot within each of m 1 's periods satisfies its periodicity requirement but can still result in missed deadlines: successive m 1 transmissions may be as little as one slot and as much as three slots apart. The allocation for message m 2 in Fig. 4(b) likewise results in a deadline violation, with minimum and maximum distances between successive m 2 slots of four and six slots, respectively. Therefore, to guarantee message m i 's deadline, the corresponding allocation must satisfy a maximum distance between successive m i transmission slots equal to period(m i ).
Note that in the above example, message deadlines may be satisfied by modifying their periods appropriately. Fig. 4(c) shows the slot allocation for both messages after m 2 's period is modified to four slots. It is easily checked that the distance constraints of two and four slots for successive transmissions of m 1 and m 2 , respectively, are satisfied.
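The distance constraint is easy to check mechanically. The sketch below computes the distances between successive sending slots of one message, wrapping from the end of a round to the start of the next, and compares the largest distance with the message period. The helper names are hypothetical, and the slot positions are assumptions chosen to reproduce the one-slot and three-slot distances mentioned for m 1 ; the actual positions in Fig. 4(a) may differ.

```python
def slot_gaps(slots, round_len):
    """Distances between successive sending slots, wrapping across rounds.

    slots: slot indices (0-based) allotted to the message within a round.
    round_len: number of slots in the TDMA round.
    """
    s = sorted(slots)
    return [
        (s[(i + 1) % len(s)] - s[i] - 1) % round_len + 1
        for i in range(len(s))
    ]


def meets_deadline(slots, round_len, period):
    """True if no two successive slots are more than `period` apart."""
    return max(slot_gaps(slots, round_len)) <= period


# One slot in each two-slot period, but unevenly placed: gaps of 1 and 3.
uneven = slot_gaps([1, 2], round_len=4)
# Evenly spaced slots satisfy the two-slot distance constraint.
even_ok = meets_deadline([0, 2], round_len=4, period=2)
uneven_ok = meets_deadline([1, 2], round_len=4, period=2)
```

The uneven allocation places one slot in every period yet leaves a three-slot gap, which is exactly the situation that misses a two-slot deadline.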
The above discussion suggests that the original message periods may need modification prior to allocating slots within the TDMA round. We adopt a strategy where the message periods within a cluster are constrained to be harmonic multiples of some base period p base (for example, periods of four and eight slots for p base = 4 slots).
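One plausible way to realize such a modification is sketched below. It rests on two assumptions that the text does not state explicitly: that "harmonic multiples" means p base times a power of two, and that a period is always rounded down, since a shorter period can only tighten the deadline. The function name `harmonize` and the original periods of 7 and 9 slots are hypothetical.

```python
def harmonize(period, p_base):
    """Largest p_base * 2**k (k >= 0) that does not exceed `period`.

    Rounding down keeps the original deadline satisfied, at the cost of
    allotting slots more often than strictly necessary.
    """
    if period < p_base:
        raise ValueError("base period exceeds the message period")
    h = p_base
    while h * 2 <= period:
        h *= 2
    return h


# Hypothetical original periods of 7 and 9 slots under two base periods.
with_base_3 = [harmonize(p, 3) for p in (7, 9)]
with_base_4 = [harmonize(p, 4) for p in (7, 9)]
```

Under these assumptions a base period of 3 maps both messages to six slots, while a base period of 4 yields four and eight slots, so the choice of p base changes how much bandwidth each message claims.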
Transmission-Slot Reuse:
Recall that during clustering, each message m i is treated as periodic with period period(m i ). However, if the task T i transmitting m i does not execute at that rate, then the bus bandwidth is overutilized. We can improve bandwidth utilization by reusing transmission slots among the multiple messages sent by processor P j .
The worst-case arrival rate arrival(m i ) for each message m i in a multi-rate system is obtained during schedulability analysis by simulating the corresponding task schedule. Note that arrival(m i ), expressed in terms of slot intervals, depends on the execution rate of the sender task T i . Let {m i } be the set of messages sent by a processor within a message cluster C j . Now, assume that message m new , also transmitted by the same processor, is to be allotted slots within round(C j ).
Given a set of clusters and a new message to be allocated to one of them, the CLUSTER procedure explores all possible cluster-message allocation scenarios. Slot reuse is the deciding factor in selecting the best allocation, since the cluster allocation resulting in maximum reuse minimizes bandwidth utilization.
Finally, when TDMA slots are shared between messages sent by a processor, the communication co-processor must correctly schedule their transmission, i.e., given a slot, decide which message to transmit in it. Though this paper does not address message-scheduling logic within the co-processor, an earliest-deadline-first approach seems appropriate.
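Since the paper leaves the co-processor's scheduling logic open, the following is only one possible reading of the earliest-deadline-first suggestion. The class `SharedSlotQueue`, its methods, and the deadlines in the example are hypothetical: pending messages from one processor are kept in a priority queue keyed by deadline, and each shared slot carries the most urgent one.

```python
import heapq


class SharedSlotQueue:
    """Pending messages of one processor, served earliest deadline first."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal deadlines keep arrival order

    def request(self, name, deadline_slot):
        """Queue a message that must be sent by slot `deadline_slot`."""
        heapq.heappush(self._heap, (deadline_slot, self._seq, name))
        self._seq += 1

    def next_message(self):
        """Message to send in the next shared slot, or None if idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


# Two messages share the slots of one processor (deadlines are invented).
queue = SharedSlotQueue()
queue.request("m10", deadline_slot=8)
queue.request("m3", deadline_slot=4)
sent = [queue.next_message() for _ in range(3)]
```

The message with the nearer deadline goes out first, and a slot with nothing pending stays free for other traffic.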
CASE STUDY
We now illustrate the proposed synthesis method using some advanced automotive control applications as examples. These include adaptive cruise control (ACC), electric power steering (EPS), and traction control (TC), and are detailed in Figs. 7(a)-(c). The ACC application automatically maintains a safe following distance between two cars, while EPS uses an electric motor to provide necessary steering assistance to the driver. The TC application actively stabilizes the vehicle to maintain its intended path even under slippery road conditions. These applications demand timely interaction between distributed sensors, processors, and actuators, i.e., they have specific end-to-end deadlines, and therefore require a dependable communication network. Fig. 8(a) shows the physical architecture of the system, where sensors and actuators are directly connected to the network, together with the designer-specified task-to-processor allocation, while Fig. 8(b) summarizes the various message attributes affecting network topology generation. We assume 1-FT messages throughout. Columns two and three list the sending and receiving tasks for each message and the message size size(m i ) in bits, respectively, while columns four and five list the communication delay delay(m i ) for messages in µs, and transmission-slot intervals. These delay values are obtained by first assigning deadlines to tasks and then performing a schedulability analysis on their respective processors.
As summarized in Fig. 9(a) [...]
We now show how to reduce bandwidth utilization by sharing transmission slots between messages. As candidates for slot reuse, consider messages m 3 and m 10 sent by tasks T 3 and T 12 , respectively, where both tasks are allocated to processor P 2 . In Fig. 10(a), where message periods are modified using p base = 3, m 3 and m 10 cannot share slots since both have a periodicity of six slots. In Fig. 10(b), however, when their periods are modified as period(m 3 ) = 4 and period(m 10 ) = 8 using p base = 4 slots, reuse is possible. Note that the EPS application comprising T 3 transmitting m 3 has a 1500 µs period, corresponding to the interval between successive m 3 transmissions. Therefore, in Fig. 10(b), m 3 requires only one of four allocated slots on bus B 1 (task T 3 , however, may request m 3 's transmission anytime during the round), and m 10 with a period of eight slots can reuse the one free slot available during any [...]
Fig. 10(b) has a somewhat lower slot utilization of 89.5% compared to 90% for Fig. 10(c). Since the empty slots in Fig. 10(b) may be used to transmit additional (non-critical) messages when compared to Fig. 10(c), we select the topology in Fig. 10(b) as the final solution.
CONCLUSION
This paper has addressed the synthesis of low-cost TDMA communication networks for distributed embedded systems. We have developed a fault-tolerant clustering method which allocates and schedules k-FT messages on the minimum number of buses to provide dependable transmission. The proposed method was illustrated using a case study involving some advanced automotive control applications and it was shown that sharing transmission slots among multiple messages reduces bandwidth consumption while preserving predictable communication. Therefore, the method has the potential to reduce topology cost when applied to larger embedded systems.
