The Engineering Meetings Board has approved this paper for publication. It has successfully completed SAE's peer review process under the supervision of the session organizer. This process requires a minimum of three (3) reviews by industry experts.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of SAE. 
INTRODUCTION
Controller Area Network (CAN) has received wide acceptance in the automotive industry in Europe as well as in the US. The CAN protocol is designed to handle event-triggered messages. Since event-triggered messages are generated asynchronously, several messages may sometimes compete for the bus at the same time. As a result, lower priority messages have to wait in the transmit buffers of the corresponding CAN controllers, and their latencies increase. The average busload has to be kept low to keep the message latencies bounded. Safety critical messages must be delivered within certain time limits to protect the vehicle and the lives of its occupants. In other words, safety critical messages need guaranteed deterministic latencies. Researchers have been investigating various types of time-triggered protocols [1] [2] [3] [4] [5] [6] to deliver messages that need deterministic latencies. The Time-Triggered CAN (TTCAN) protocol has been developed to handle those CAN messages that need deterministic latencies.
In a TTCAN system, messages are scheduled based on a matrix called the system matrix, as shown in Figure 1. A system matrix is divided into several basic cycles. For example, the system matrix of Figure 1 is divided into four basic cycles, Basic Cycle 0 through Basic Cycle 3. Figure 1 also contains four arbitrating windows and four free windows. Event-triggered messages compete for the bus during the arbitrating windows. The free windows are reserved for future expansion of the system. A TTCAN system has a master node, which is responsible for maintaining the global time. The master node sends a reference message at the beginning of each basic cycle, as shown in Figure 1. This reference message contains the value of the global time. All other nodes determine the offset between this global time and their own local time. During the remaining part of the basic cycle, each node keeps track of the global time by adding this offset to its local time. Every node knows at what global time (trigger time) it has to send its time-triggered messages. Thus, when a node determines that the current global time is equal to the trigger time of one of its messages, the node sends the message through the CAN bus.
There are some issues related to a TTCAN system. For example, if a message cannot go through the bus due to errors, retransmission of the message is not allowed, in order to prevent the message from crossing the boundary of its window. If it is a time-triggered message, then it has to wait until its next exclusive window. For example, if Message A cannot go through its exclusive window in Basic Cycle 1 (see Figure 1) due to errors, then it has to wait until its exclusive window in Basic Cycle 0. Thus, Message A will be delayed by the length of three basic cycles. As a result, the vehicle's safety may be compromised.
If an event-triggered message cannot go through an arbitrating window due to errors, then it has to wait until the next arbitrating window. Thus, the message will be delayed by a significant amount of time. In this paper, we present the design of a fault tolerant TTCAN system. Our system contains two buses: the primary bus and the secondary bus, as shown in Figure 2. The primary bus acts as a TTCAN bus, and it is the main bus for carrying messages of the TTCAN system. The secondary bus is a backup bus, and it is used to transmit those messages that could not go through the primary bus due to the presence of faults. The secondary bus could be a standard CAN, LIN or any other type of bus. Since the probability of having an error in a CAN message is not very high [7], the traffic on the secondary bus will be very low, e.g. under 1% of the traffic on the primary bus. Thus, the secondary bus could be a low speed bus, and it could be shared with the nodes of other networks within the system, which could send their messages through the same bus.
It would be very convenient if a fault tolerant TTCAN system could be built on the top of an existing CAN system without requiring any hardware modifications within the CAN controller chips. This paper investigates the implementation of such a system using off-the-shelf chips and developing code to manage the TTCAN messages.
The implementation of the fault tolerant TTCAN is explained in the next section.
IMPLEMENTATION SECTION
A prototype of the fault tolerant TTCAN system has been built using six PIC18F258 micro-controllers, each of which has an integrated CAN module on the same chip. The CAN ports of all six chips have been connected together to make the primary TTCAN bus. The secondary bus has been built as a standard CAN bus by using six external CAN chips (MCP2510). The PIC18F258 micro-controller communicates with the external CAN chip through the SPI (Serial Peripheral Interface) port. Note that if a micro-controller has two built-in CAN ports, then no external chips are needed to build the secondary CAN bus. Each node of the fault tolerant TTCAN prototype has two CAN modules, two 10 MHz oscillators, two transceivers (PCA82C250T), a reset switch, and an LCD unit. The LCD unit is used to display the real-time status of a TTCAN node. Figure 3 shows a photograph of the hardware components of the TTCAN system with six nodes. Figure 4 shows a pin-level schematic diagram of a node and Figure 5 shows a block diagram of a node. Special evaluation tools, including the Vector CANTech and CAN4USB tools, are used to verify the system and measure its overall performance.
HARDWARE COMPONENTS AND TOOLS
This section gives a list of hardware components and tools used to build and test the fault tolerant TTCAN system. The list is as follows:
1. Easy to program Microchip 18F258 RISC microcontrollers with built-in CAN controllers were used to implement the TTCAN bus.
2. External Microchip MCP2510 CAN controllers were used to implement the standard secondary CAN bus.
3. 24x2 LCD displays were used to show the status of each node.
4. MPLAB software from Microchip was used to assemble and debug the assembly code.
5. Several CAN transceivers TJA1050 were used to enable communications among other nodes.
6. Several crystal oscillators were used to maintain time accuracy of the overall system.
7. PIC programmer board from EPIC was used to program the micro-controllers.
8. CAN4USB and Vector CANTech tools were used to monitor the bus and collect data using a computer.
SOFTWARE
The software was written entirely from scratch. It was written in assembly language and debugged using MPLAB software from Microchip. Various input/output routines were written to display different types of messages on the LCD screen of each node. The LCD screen displays the total number of messages transmitted and received since the system was started. The LCD also displays the ID number and the content of the last received message, and the status of the bus (on or off).
Initialization of I/O Ports
The input and output lines were initialized on Port B and Port C. Port A pins were not used. Some of the Port B pins were used to control the LCD display and to transmit and receive data through the internal CAN module. Some of the Port C pins were initialized for the SPI bus to communicate with the external CAN chip. The remaining Port C pins in use control the LCD display.
Initialization of the Internal CAN module
The internal CAN module of a node has been initialized to use 5 acceptance filters. Three of them are used for potential masters and the rest for data communication.
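The system uses 11-bit IDs and standard data frames. The time quantum is defined as:
$T_Q = 2 \cdot (BRP + 1) / F_{OSC}$
where BRP is the baud rate pre-scale factor and $F_{OSC}$ is the oscillator frequency. One bit time is defined as:
$T_{bit} = T_Q \cdot (Sync\_Seg + Prop\_Seg + Phase\_Seg1 + Phase\_Seg2)$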
The duration of different time segments are as follows: 
Initialization of the External CAN module (MCP2510)
The external CAN module was initialized to use all 6 acceptance filters. The system uses 11-bit IDs and standard data frames. The MCP2510 chip is controlled through a set of instructions, including reset, read, write, RTS (request to send), read status, and bit modify. These instructions are sent to the external CAN module from the PIC MCU using the SPI bus. The data rate for the external CAN module was also set to 125 kbits/sec.
Initialization of the SPI bus for the External CAN module
In order to communicate with the external CAN chip, the SPI bus must be initialized. The SPI I/O lines are on Port C, so the Port C I/O pins were programmed accordingly. The external CAN chip (MCP2510) supports only two SPI modes, Mode 0,0 and Mode 1,1, which refer to the SPI clock polarity and clock phase. Thus, the SPI channel had to be configured for one of these modes.
Software Implementation of TTCAN
As mentioned earlier, a TTCAN system needs a master node to maintain the global time. The system should also have several potential masters to become the master when the current master becomes faulty.
Potential Masters
For this TTCAN design, three potential masters have been defined to keep the network alive. Each potential master is given a priority so that when multiple potential masters compete for the bus to become the master, the highest priority potential master wins. In our design, the potential masters are Node 1 (ID 1), Node 2 (ID 2) and Node 3 (ID 3).
Level 1 and Level 2 Reference Messages
There are two types of reference messages: Level 1 and Level 2. During the system start up, all potential masters broadcast Level 1 messages to become the master. In our design, the Level 1 reference message does not contain any data. The identifier of the Level 1 message determines which potential master will become the master. As mentioned earlier, we have three potential masters, which are Nodes 1, 2 and 3. When these potential masters send Level 1 reference messages, they send their node number in the message identifier field. As a result, Node 1 becomes the highest priority node and Node 3 becomes the lowest priority node.
A potential master, after becoming the master, sends a Level 2 reference message. The Level 2 reference message is the reference message of a basic cycle. In our design, the new master sends a Level 2 reference message 2 msec after it sent the Level 1 reference message to become the master. Thus, the new master starts controlling the TTCAN system without any significant delay. The ID of a Level 2 reference message is 0, so the master can get the bus immediately. A Level 2 reference message contains two bytes that carry the master's local time, which is also the global time of the entire network. A Level 2 reference message also contains one byte that indicates the basic cycle number.
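The three-byte payload of the Level 2 reference message described above can be sketched as follows. This is only an illustration: the byte order (big-endian, time first, cycle number last) and the function names `pack_level2` and `unpack_level2` are assumptions, since the paper does not specify the layout.

```python
import struct

LEVEL2_ID = 0  # identifier of the Level 2 reference message (from the text)

def pack_level2(global_time, basic_cycle):
    """Build the 3-byte payload: 2 bytes of global time, 1 byte of cycle number.

    Big-endian order is an assumption made for this sketch.
    """
    return struct.pack(">HB", global_time & 0xFFFF, basic_cycle & 0xFF)

def unpack_level2(payload):
    """Return (global_time, basic_cycle) from a 3-byte payload."""
    return struct.unpack(">HB", payload)
```

A receiving node would call `unpack_level2` on the data field of a frame whose identifier equals `LEVEL2_ID`.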
In our system, the system matrix has only two basic cycles, as shown in Figure 7. In the system matrix, we reserved a 2-msec arbitrating window after the reference window of each basic cycle. This arbitrating window enables a Level 2 reference message to be retransmitted in case it could not go through the reference window of the basic cycle. Thus, a master gets two chances in a basic cycle to send its Level 2 reference message. A missing Level 2 reference message during the reference window may also mean that the master has died. To recover the system from a dead master, we designed the system in such a way that if the potential masters do not see a Level 2 reference message during the reference window of a basic cycle, then all of them send a Level 1 message during the arbitrating window that follows the reference window. If the master is still alive and sends a Level 2 reference message during the arbitrating window, then it wins the arbitration, because the identifier of the Level 2 reference message is 0, which gives it the highest priority among all CAN messages. If the master is dead, then one of the potential masters wins the arbitration during the arbitrating window and becomes the new master. This new master then sends a Level 2 reference message at the beginning of the next basic cycle. From that point, the new master is in control of the system. Figure 6 shows the flow diagram of the operation of the fault tolerant TTCAN system.
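The recovery rule above relies on ordinary CAN arbitration, where the numerically lowest identifier wins. A minimal sketch of the outcome of the arbitrating window, assuming no Level 2 reference message was seen in the reference window, is given below. The function names and the idea of passing the set of standby node IDs are hypothetical and serve only to illustrate the rule.

```python
LEVEL2_ID = 0                     # Level 2 reference message identifier (from the text)
POTENTIAL_MASTER_IDS = (1, 2, 3)  # Level 1 identifiers of Nodes 1, 2 and 3 (from the text)

def arbitration_winner(identifiers):
    """CAN arbitration: the numerically lowest identifier wins the bus."""
    return min(identifiers)

def recovery_window_winner(master_alive, standby_ids):
    """Identifier that wins the arbitrating window after a missed reference.

    standby_ids: Level 1 identifiers of the potential masters that are
    alive and are not the current master (hypothetical parameter).
    Returns None if nobody transmits.
    """
    contenders = list(standby_ids)
    if master_alive:
        # A live master retries its Level 2 reference message (ID 0).
        contenders.append(LEVEL2_ID)
    return arbitration_winner(contenders) if contenders else None
```

With a live master the result is always 0, so the existing master keeps control. With a dead master the lowest surviving node ID takes over.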
Initialization of Timers
Before the timers of the TTCAN nodes are initialized, all nodes must agree on a common NTU (network time unit). The designer needs to collect data regarding each node, such as the oscillator frequency and the timer pre-scale values, to see whether a common NTU can be established. In the worst case, a change of the oscillator frequency may be needed. In our design, all nodes use the same 10 MHz oscillator, so each timer count is equivalent to 1.6 microseconds. In this design, we considered a timer count as one NTU.
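The 1.6-microsecond timer count can be reproduced with a short calculation. The sketch below assumes that the timer is clocked from the instruction clock (oscillator frequency divided by 4, as on PIC18 devices) and that a 1:4 timer pre-scaler is used. The pre-scale value is an assumption chosen because it yields the stated 1.6 microseconds; the function names are illustrative.

```python
def tick_ns(f_osc_hz, prescale, clocks_per_instruction=4):
    """Duration of one timer count (one NTU) in nanoseconds."""
    return clocks_per_instruction * prescale * 1_000_000_000 // f_osc_hz

def ticks_in(duration_us, f_osc_hz=10_000_000, prescale=4):
    """Number of timer counts in a duration given in microseconds."""
    return duration_us * 1_000 // tick_ns(f_osc_hz, prescale)
```

Under these assumptions a 2-msec window is 1250 counts and a 16-msec basic cycle is 10000 counts, which fits in a 16-bit timer.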
Initialization of Receive Message Time Stamp
It is very important to time stamp Level 2 reference messages. The CAN module of the PIC18F258 is configured to time stamp received messages. After a message is received, it sends an interrupt to CCP1 (Capture, Compare and PWM module), which captures the value of Timer 3 of the PIC micro-controller. The challenge here is how accurately we can compensate for errors due to the variable number of stuff bits. For time stamping, the CCP module must be configured for capture mode.
Initialization of the Capture Module
At any given time, only one of the Capture, Compare, or PWM features of the PIC micro-controller can be used. The Level 2 reference message is time stamped using the Capture Module. The Capture Module cannot be used for all received messages, because the Compare Module is needed to determine whether the current time is equal to the trigger time of a message. Note that a message is transmitted through an exclusive window when the current time is equal to the trigger time of that message.
Initialization of the Compare Module
As mentioned earlier, the Compare Module is used to determine whether the current time is equal to the trigger time of a message. It constantly compares the next trigger time with the value of Timer 3 of the PIC micro-controller. When the two values match, an interrupt is generated and the processor transmits the message. For the master node, the Compare Module is used for triggering the Level 2 reference message as well as regular messages. All nodes (except the master node) enable the Capture Module in order to receive a Level 2 reference message. After that, all nodes enable the Compare Module to trigger messages.
System Matrix
For the system designed in this paper, there are two basic cycles in the system matrix. Each basic cycle is 16-msec long. There are 8 time windows in each basic cycle, as shown in Figure 7. Each time window is 2-msec long. The TTCAN transmission is configured for 125 kbits/sec, so a 2-msec window corresponds to 250 bit times. Including the stuff bits and the 3-bit inter-frame space, the maximum length of a CAN message with the 29-bit identifier is 154 bits. Thus, the 2-msec window is wide enough to send a CAN message plus the error frame, in case any errors occur during the transmission of a message.
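The window sizing argument can be checked numerically. The sketch below uses the 2-msec window, the 8 windows per basic cycle and the 154-bit maximum frame length from the text, together with the 125 kbits/sec bit rate stated elsewhere in the paper. The 20-bit error frame length (up to 12 superposed error flag bits plus an 8-bit delimiter) comes from the CAN specification rather than from the paper, and the names are illustrative.

```python
BIT_RATE = 125_000         # bits per second
WINDOW_US = 2_000          # window length in microseconds
WINDOWS_PER_CYCLE = 8
MAX_FRAME_BITS = 154       # 29-bit identifier frame, from the text
MAX_ERROR_FRAME_BITS = 20  # assumption: 12 flag bits + 8 delimiter bits

def window_bits(window_us=WINDOW_US, bit_rate=BIT_RATE):
    """Number of bit times that fit in one time window."""
    return window_us * bit_rate // 1_000_000

def window_starts_us():
    """Start of each window relative to the start of the basic cycle."""
    return [i * WINDOW_US for i in range(WINDOWS_PER_CYCLE)]

def spare_bits():
    """Bit times left after a maximum-length frame plus an error frame."""
    return window_bits() - MAX_FRAME_BITS - MAX_ERROR_FRAME_BITS
```

A positive result from `spare_bits()` indicates that even a frame that fails on its last bit, followed by an error frame, stays inside its window.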
Calculation of Trigger Times
From the system matrix, all nodes know when to transmit their messages. The difference (offset) between the beginning of a Level 2 reference message and the trigger-time of a message is known as the cycle time. From the system matrix, a node knows the cycle times of all messages that it needs to transmit. When a node receives a Level 2 reference message, it captures its local time. Then the offsets (cycle times) of different messages are added to this local time to determine at what values of the local timer the node will transmit its messages. Potential masters also know the cycle time of the Level 2 reference message. This cycle time is equal to the length of a basic cycle.
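The trigger time calculation described above amounts to adding each cycle time to the captured local time. A minimal sketch follows, assuming a 16-bit free-running timer so that the sums wrap modulo 65536. The function name and the example cycle times (in timer counts) are hypothetical.

```python
TIMER_MODULUS = 1 << 16  # assumption: 16-bit timer, values wrap at 65536

def compare_values(captured_local_time, cycle_times):
    """Timer values at which the node must transmit its messages.

    captured_local_time: timer value captured for the Level 2 reference message.
    cycle_times: offsets from the reference message, in timer counts.
    """
    return [(captured_local_time + ct) % TIMER_MODULUS for ct in cycle_times]
```

For example, with a captured time of 65000 and cycle times of 1250 and 2500 counts, the compare values wrap around to 714 and 1964.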
Initiating Transmission and Disabling Re-Transmission
When the Compare Module generates an interrupt, the trigger time of a message has been reached. To initiate transmission, the TXREQ bit (transmit request bit) of the internal CAN module is set to 1. Then the message is put on the bus. If an error occurs, an error interrupt is generated. Since retransmission of a message is not allowed on the TTCAN bus, retransmission is disabled as soon as an error is detected by clearing the TXREQ bit. After that, the message is sent to the secondary CAN bus via the external CAN chip (MCP2510). The worst-case delay in putting the message on the secondary bus is encountered if the error occurs on the last bit of the message. The total delay is then the transmission time of the message plus some processing time to load the data into the other CAN module. Figure 8 shows the flow chart for the process of initiating transmission and disabling retransmission.
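The transmit and failover sequence can be summarized in a small simulation. This is a sketch of the control flow only, not of the actual assembly code: the `FakeBus` class and the `send_time_triggered` function are hypothetical stand-ins for the internal CAN module, the MCP2510 and the interrupt handlers.

```python
class FakeBus:
    """Hypothetical stand-in for a CAN module; records delivered messages."""

    def __init__(self, faulty=False):
        self.faulty = faulty
        self.sent = []

    def transmit(self, message):
        """Attempt one transmission; return False if an error is signalled."""
        if self.faulty:
            return False
        self.sent.append(message)
        return True

def send_time_triggered(message, primary, secondary):
    """Send once on the primary bus; on error, reroute to the secondary bus."""
    if primary.transmit(message):  # corresponds to setting TXREQ at the trigger time
        return "primary"
    # Error interrupt: TXREQ is cleared so that the primary bus does not
    # retransmit, then the message is handed to the secondary bus.
    secondary.transmit(message)
    return "secondary"
```

The single attempt on the primary bus reflects the rule that a time-triggered message must never cross the boundary of its window.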
Testing and Validation Criteria
Several testing criteria have been defined to check the overall system performance. These criteria are shown below.
1. Test potential masters by disconnecting each one at a time and watch the flow of messages.
2. Induce faults to change the message route to the external CAN bus.
3. Induce faults to the master node to determine whether one of the other potential masters can take control of the bus automatically.
Synchronization and Physical Layer Testing
All nodes on a given CAN bus must have the same nominal bit rate. For the design presented in this paper, the nominal bit rate is 125 kbits/sec, which corresponds to a bit time of 8 µsec. The CAN protocol uses NRZ (non return to zero) coding, which does not encode a clock into the data stream. Therefore, the receive clock must be recovered by the receiving nodes and synchronized to the transmitter clock. Oscillator frequencies and transmission times vary from node to node.
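The relation between the oscillator, the baud rate pre-scaler and the 8-µsec bit time can be sketched as follows. The sketch assumes the time quantum formula used by the Microchip CAN modules, TQ = 2 × (BRP + 1) / F_OSC, and the usual constraint of 8 to 25 time quanta per bit. The BRP values in the example are hypothetical, since the paper does not state which value was programmed.

```python
from fractions import Fraction

def tq_seconds(brp, f_osc_hz):
    """Time quantum in seconds, assuming TQ = 2 * (BRP + 1) / F_OSC."""
    return Fraction(2 * (brp + 1), f_osc_hz)

def quanta_per_bit(bit_rate, brp, f_osc_hz):
    """Number of time quanta in one nominal bit time."""
    return Fraction(1, bit_rate) / tq_seconds(brp, f_osc_hz)

def is_valid_setting(bit_rate, brp, f_osc_hz):
    """True if the bit time is a whole number of quanta between 8 and 25."""
    q = quanta_per_bit(bit_rate, brp, f_osc_hz)
    return q.denominator == 1 and 8 <= q <= 25
```

With a 10 MHz oscillator, a BRP of 4 gives a 1-µsec time quantum and 8 quanta per bit at 125 kbits/sec. A BRP of 0 gives 40 quanta per bit, which falls outside the allowed range.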
CHALLENGES, DIFFICULTIES AND DESIGN ISSUES
Our goal was to determine the feasibility of designing a fault tolerant TTCAN system using off-the-shelf chips and developing software to control the TTCAN system. If such a system could be built without requiring any modifications within existing CAN controllers, then no custom chips need to be designed for building a TTCAN system. During the design process, we encountered some challenges, which are explained below.
If the master node suddenly fails, the basic cycle must continue on its original schedule. To solve this problem, an arbitrating window has been added to every basic cycle for retransmission of the Level 2 reference message. This arbitrating window has been placed next to the reference message window. Other potential masters can also use this arbitrating window to send their Level 1 reference message, if they did not detect any Level 2 reference message at the beginning of a basic cycle. So if the master node has really died, then one of the potential masters automatically becomes the master of the system.
Another difficulty that designers in the auto industry could face is establishing a common NTU (Network Time Unit) among all nodes. It is not very common for all nodes to have the same oscillator frequency. In the worst case, the designers will need a separate MCU with an oscillator frequency that makes a common NTU possible among all nodes. For the design presented in this paper, the NTU was not an issue because all nodes use the same oscillator frequency of 10 MHz.
Another difficult issue was how to time stamp a Level 2 reference message. CAN controllers are designed to time stamp a message after they receive it. Thus, all nodes capture their local time after receiving the Level 2 reference message. Since a Level 2 reference message contains a variable number of stuff bits, some timing errors are introduced in the other nodes. As a result, trigger times for messages may vary slightly from one basic cycle to another. It has already been mentioned that a Level 2 reference message contains three bytes of data: two bytes for the global time and one byte for the basic cycle number. It can be shown that with 3 bytes of data in a CAN message, the maximum number of stuff bits in the message is 11 and 15 for messages with 11-bit and 29-bit identifiers, respectively. In our design, CAN messages have 11-bit identifiers, and the bus has a speed of 125 kbits/sec, i.e. a bit time of 8 µsec. Thus, the maximum timing error in triggering messages in our design is 11 × 8 µsec = 88 µsec.
The above mentioned problem of time stamping a Level 2 reference message can be solved by connecting the CCP1 pin (Capture, Compare and PWM pin) to the receive line of the CAN chip. If the CCP1 line is programmed to capture a falling edge, then the capture module is triggered to capture the local time at the beginning of the SOF (start of frame) bit. In order to enable capture mode before every Level 2 reference message, there must be enough time to make sure that there is no traffic on the bus while the capture module is being enabled. This requirement can be satisfied either by having a small free window at the end of each basic cycle or by making the last window of a basic cycle a little bit wider. This solution gives accurate timing for triggering messages. The price that we have to pay for this solution is that a part of every basic cycle is wasted, either on the small free window at the end of the basic cycle or on the wider last window.
However, if a CAN controller can capture time at the beginning of a message, then that would be the best solution. In that case, the trigger time of messages will be perfect and no part of the system matrix will be wasted.
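The stuff bit figures behind the time stamping error can be reproduced with a short calculation. The sketch below counts the bits subject to stuffing (start of frame through the CRC sequence) and assumes one stuff bit per five such bits. That estimate is an assumption chosen because it reproduces the figures of 11 and 15 quoted in the paper; the function names are illustrative.

```python
def stuffable_bits(data_bytes, extended=False):
    """Bits from SOF through the CRC sequence, which are subject to stuffing."""
    # Standard: SOF 1 + ID 11 + RTR 1 + IDE 1 + r0 1 + DLC 4 = 19
    # Extended: SOF 1 + ID 11 + SRR 1 + IDE 1 + ID 18 + RTR 1 + r1,r0 2 + DLC 4 = 39
    header = 39 if extended else 19
    return header + 8 * data_bytes + 15  # 15-bit CRC sequence

def max_stuff_bits(data_bytes, extended=False):
    """Estimated maximum stuff bits: one per five stuffable bits (assumption)."""
    return stuffable_bits(data_bytes, extended) // 5

def max_timestamp_error_us(data_bytes, bit_rate, extended=False):
    """Worst-case time stamping error in microseconds."""
    return max_stuff_bits(data_bytes, extended) * 1_000_000 // bit_rate
```

For the 3-byte Level 2 reference message at 125 kbits/sec this gives 88 µsec with an 11-bit identifier, which is the error that capturing at the SOF bit removes.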
CONCLUSION
This paper studied the feasibility of implementing a fault tolerant TTCAN system using off-the-shelf chips. From our study, we determined that it is possible to build a fault tolerant TTCAN system without requiring any custom chips. There were some problems and issues related to the design of our system. However, all problems were solved and issues were resolved.
