DIGISAFE® XME: High Availability Vital Computer by Lejeune, Pascal et al.
HAL Id: hal-02271067
https://hal.archives-ouvertes.fr/hal-02271067
Submitted on 26 Aug 2019
To cite this version:
Pascal Lejeune, Françoise Dufour, Eric Chenu. DIGISAFE® XME: High Availability Vital Computer.
2nd Embedded Real Time Software Congress (ERTS’04), 2004, Toulouse, France. hal-02271067
2nd European Congress ERTS, 21 – 22 – 23 January 2004
Session 5A: Hardware & Software Architectures
DIGISAFE® XME:
High Availability Vital Computer
Pascal Lejeune, Françoise Dufour, Eric Chenu
Siemens Transportation Systems
pascal.lejeune@siemens.com, phone: +33 1 49 65 74 71
1. INTRODUCTION
During the 1980s, progress achieved in automation systems led us to computerize functions that had hitherto used hard wired or relay
logic. This especially included system safety related functions which previously used hardware based systems. The safety of these
functions was based on a "failsafe" principle: any failure placed the equipment in a non dangerous state.
This property is unfortunately no longer guaranteed with computer based systems: an anomaly in a compiler or a failure in a computer
memory can lead to a dangerous situation that must be detected and inhibited. This article presents a computer that meets these safety
goals: the DIGISAFE® XME computer. This system represents the third generation of this technology in 15 years.
The safety principles that apply to the DIGISAFE® XME vital computer are based on coding information. Compared with other designs
based on computer redundancy (requiring complex reliability studies), this principle offers the advantage of reducing the amount of
hardware used and simplifying the safety demonstration, which is grounded in solid mathematical bases and is independent of the
computer used.
A vital functions application is a cyclic one. A processing
integrity check is run at the end of every cycle, when refreshing
the vital function's logical outputs. This check is performed safely
by specific vital part equipment. If the coding is not compliant,
the logical outputs are forced to a restricted state (a state that is
not dangerous for the system).
In the latest version of this system, the design has been improved
in order to send all vital checks to a dedicated digital system
board, making it possible to use a standard computer for the non-
vital parts of the processing work.
From a single source program, the way an application runs on the
DIGISAFE® XME is automatically split into two parallel
executions: data processing performed by the standard computer
board and control codes processed by a coprocessor (located on
the dedicated digital board), bringing the benefit of heterogeneous
hardware redundancy while retaining coding advantages.
The coding technology used guarantees that there is no common
mode shared by both processes, protecting against both errors in the compilation sequence and hardware failures.
To improve availability, the DIGISAFE® XME computer can be installed in a redundant layout and comprises a hot redundancy
(software) protocol ensuring that the redundant system remains safe [1].
This article presents the general principles used in vital systems and the computer's hardware and software architecture. It describes how
to ensure the safety of vital application processing and of vital inputs/outputs (On/Off and messages) and presents related hardware
implementations.
[Diagram blocks: source code; run non-vital processing; run vital processing; safety check (dynamic controller); output supply]
Figure 0 DIGISAFE® XME computer operating principle
2. ARCHITECTURE
2.1. Principle
Ensuring the dependability of DIGISAFE® XME computer processing is based on a probabilistic approach to safety. The high level of
safety is obtained by combining: arithmetic coding, signature checks and hardware checking measures.
Every data item (called a non-vital value hereinafter) is protected by coded redundancy (called code hereinafter) which integrates
arithmetic data coding, a data identification signature and a dynamic signature ensuring validity in time (called date hereinafter).
This safety approach guarantees the execution of the vital system software, the exchange of inter-equipment messages, binary On/Off I/O
and the acquisition of specific sensor data (speed measurements, beacons,…). It does not guarantee the validity of the application source
code. That validity must be established through analysis of the source code by an independent team, possibly assisted by formal
proving tools (B language). This approach is presented in [2].
The inter-equipment messages are protected by the same code as the internal data, except for the dynamic signature which is replaced by
a protocol which handles the management of asynchronism aspects between equipment. This coding makes it possible to ensure the
validity of messages received and retain independence from the transmission media.
The vital On/Off inputs are provided by specific failsafe electronic boards which guarantee that the code corresponding to an input is
permissive only if the input really is permissive. A restrictive input state leads to restrictive application behavior (for example: door closed is false,
so the vehicle is not allowed to run). A permissive input state may lead to a permissive action by the application (for example: door
closed is true, so the vehicle may run).
The On/Off outputs generated by the application are checked for safety by specific electronic boards designed for vital operation. If the
coding of one of the outputs is incorrect or if an output that should be restrictive is set to a permissive state, then all of the outputs are
forced to the restrictive state. Remark: an output's restrictive state corresponds to a state that can never be contrary to safety requirements
(for example: train running permission is false). A permissive output state may be dangerous to the system and is only generated where
there is no risk involved (for example: permission to run is true).
Positioning is determined by associating spot beacons and a mechanism for measuring travel and speed. A beacon is a transponder which
returns its identity in the form of coded data when an interrogation antenna passes over it. Travel is measured by counting the number of
teeth on a wheel mounted on the vehicle axle or by an optical speed measurement sensor (OSMES), by correlating images from the rail,
an extremely precise and accurate approach. With the exception of OSMES [3], a standalone sensor that uses the inter-equipment
protocol, the digital processing performed for other sensors dates and codes the data, as far upstream as possible, in a format that is
directly usable by the computer.
An overall signature for vital processing is periodically checked by an independent device: the dynamic controller. Should an error occur,
this circuit places the computer in a safe state: all On/Off outputs set to the restrictive state and their power supply cut.
2.2. Hardware Architecture
The XME computer uses a single processor open architecture based on a cPCI bus. The non-vital part comprises standard "off the shelf"
boards covering the CPU, network (Ethernet,…), non-vital I/O, serial link and power supply functions along with all of the other
standard functions available on the market. There are no safety constraints affecting this standard part of the computer.
In the current version, the processor used is a 500 MHz Pentium III. Without any need for further development, the CPU clock frequency
can be increased and if the software is ported, other processors could be used (PowerPC, MIPS,...). Using a standard Pentium processor
board makes it possible to benefit from a PC environment with its VGA interfaces, keyboard, Ethernet, Com1 serial port, floppy disk
drive as well as a flash disk type hard disk.
In addition to those available from the standard CPU, serial links are provided by VHDL modules and can support UART and HDLC
communications. It is possible to use seven full duplex links.
Ensuring vital computer operation is achieved by adding a specific digital board (called CTN_P). This board handles the following
functions.
• Digital processing that is coded to ensure safe software processing is performed by a math coprocessor mounted on an extension
board. This circuit calculates the execution signature for vital software execution.
• Managing the DIGISAFE® XME network of vital On/Off I/O that enables the transfer, dating and coding of vital On/Off I/O. The
DIGISAFE® XME network is a synchronous ring network.
• The dynamic controller function that cyclically checks the code produced by the vital software as well as the DIGISAFE® XME
output codes. It provides the commands for the vital power supply used for the DIGISAFE® XME outputs. If an operating failure
occurs, the dynamic controller cannot provide the right commands to the vital power supply which will drop out and place the
system in a safe state (On/Off outputs in a restrictive condition).
• A safe clock time base that performs a drift check comparing two independent crystal oscillators. Any drift that exceeds the
tolerated level will cause the vital power supply to drop out. This time base provides the synchronization signals for the software
and hardware applications. A check on DIGISAFE® XME I/O dates and on the software, via the dynamic controller, guarantees that
the computer is correctly aligned with its time base.
• An interface with the positioning boards for the on-board version.
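The drift check of the safe clock time base can be pictured with a small software sketch. This is only an illustration of the comparison principle: in the computer the check is a hardware function described in VHDL, and the function names, the counting window and the 100 ppm tolerance used here are assumptions, not values from the design.

```python
def drift_within_tolerance(ticks_a: int, ticks_b: int, tolerance_ppm: int) -> bool:
    """Compare tick counts from two independent oscillators over one window.

    Returns True when the relative difference between the two counts is
    within tolerance_ppm (parts per million). Integer arithmetic is used
    so that the comparison is exact.
    """
    reference = max(ticks_a, ticks_b)
    if reference <= 0:
        # Both counters stopped: treated as a failure (hypothetical policy).
        return False
    return abs(ticks_a - ticks_b) * 1_000_000 <= tolerance_ppm * reference


def vital_supply_enabled(ticks_a: int, ticks_b: int, tolerance_ppm: int = 100) -> bool:
    """Restrictive by default: excessive drift means the vital supply drops out."""
    return drift_within_tolerance(ticks_a, ticks_b, tolerance_ppm)
```

The point of the sketch is that neither oscillator is trusted on its own: the supply stays enabled only while the two independent counts agree.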
These safety functions are performed in VHDL and represent the complexity of 2.4 million FPGA gates (the FPGAs used are ALTERA
10K and 20K).
Specific boards can be added, depending on requirements, for vital I/O. Another one can be used for analog processing and the electrical
interface of positioning devices. The vital power supply board provides the power required by the output boards.
Figure 1 DIGISAFE® XME computer hardware architecture
2.3. Software Architecture
The software is written in C and Ada languages and is run in a multitask real-time environment. There is one cyclically awakened vital
task and a number of non-vital tasks that are awakened cyclically or triggered by an event (serial links,…) for application and immediate
services purposes. The frequency of cyclic activation of the vital task and DIGISAFE® XME network tasks is determined by interrupts
generated by the dynamic controller. A specific interrupt makes it possible to maintain a cyclic non-vital task, synchronized with the vital
task cycle. The data transfers between the tasks use a mailbox mechanism.
The vital task sequencing is as follows. At the start of the cycle, the vital application software processes the inputs provided by the
DIGISAFE® XME vital I/O network's immediate services, the inter-computer messages, if any, and the positioning data. This
information is processed to calculate the appropriate system response, then the vital task generates the vital outputs which are sent over
the DIGISAFE® XME vital I/O network at the end of the cycle and prepares the inter-computer messages to be sent. This cyclic
operation is repeated indefinitely.
[Figure 1 block labels. Standard computer: CPU board (including flash disk); network board (Ethernet,…); non-vital I/O board; power supply boards. Vital part: dedicated digital vital board (CTN_P); serial link controller (on CTN_P or an off-the-shelf board); vital input boards; vital output boards; vital power supply (needed by the outputs); odometer and beacon interface board (on-board only), with beacon and phonic wheel or OSMES. External interfaces: keyboard, SVGA, Ethernet, HDLC/UART serial links. Interconnects: cPCI bus, DIGISAFE vital I/O network.]
[Diagram blocks: vital software loop (48 to 360 ms cycle) made up of extraction of vital digital inputs, messages and positioning data, vital software computation, and computation of vital digital outputs and vital messages; DIGISAFE network receiver; DIGISAFE network transmitter; DIGISAFE dynamic controller; I/O boards. Flows: vital messages from other computers; vital positioning data; dated vital input code; synchronizing signals; vital messages to other computers; vital outputs redundancy and dated tracer; digital outputs commands; dated vital outputs code; sequences to safe clock and vital power supply.]
Figure 2 DIGISAFE® XME computer dynamic architecture
The data along with the vital I/O are coded and dated with a date that is incremented by each iteration of the vital cycle. This makes it
possible to identify data from successive cycles.
At the end of each cycle, the dynamic controller checks the static and dynamic signatures that result from running the vital software.
The validity of the vital inputs, the positioning data and the inter-computer messages is checked indirectly when the vital software
accepts them at the start of the cycle.
The vital outputs are cyclically checked at a higher frequency. This check is handled by the hardware (dynamic controller), as the
software's role is to provide, at the slower application cycle rate, a checksum that reflects the controlled output states.
2.4. A Modular and Configurable Architecture
The DIGISAFE® XME advanced vital computer is modular and configurable.
• Processing is secured by a generic coprocessor board called MSCA.
• On/Off I/O are managed by the DIGISAFE® XME vital I/O network and by boards that are configurable according to cycle time
and the number of I/O.
• Positioning inputs are acquired by a dedicated board. Two positioning configurations are available: DIGISAFE® XME beacons plus
phonic wheel or DIGISAFE® XME beacons plus optical speed measurement (OSMES).
Vital digital functions are clearly separated from operating ones and confined to just one board whose configuration can be downloaded
by the processor. The content of the FPGAs is programmed on power up by the CPU board, making it possible to configure the
computer's digital hardware functions simply by updating the software.
There is no safety constraint affecting the software except for the vital application source code. All of the software drivers are non-vital
ones.
The software layers use standard programming languages (Ada, C) and a standard RTOS (currently VxWorks) so that they are portable.
The same applies for digital hardware functions that are entirely described in VHDL.
These modularity and configurability properties make it possible to produce optimized computers ensuring vital safety limited to certain
functions only, for example:
• The vital processing unit that only secures software processing and inter-equipment messages. It is implemented simply by placing a
PMC extension board onto a standard computer.
• The I/O module comprising only the DIGISAFE® XME I/O and dynamic controller functions.
I/O module operation is coupled with a vital processing unit via a standard network. The vital processing units are then used as I/O
concentrators.
This allows implementing a distributed I/O network with the capacity to remotely locate I/O many kilometers away thanks to the use of a
fiber optic Ethernet network. If necessary, to ensure availability, sets of I/O, managed by a concentrator, are duplicated.
The complete assembly operates transparently, as seen by the application tasks and the DIGISAFE® XME drivers, just like a computer
that integrates the two functions. Nevertheless, the ability to manage a configurable number of I/O modules must be added to the
concentrator computer.
To guarantee proper operation of the complete assembly, the I/O modules are synchronized by the network concentrator via
synchronization messages sent over the Ethernet network.
3. VITAL PROCESSING
3.1. Data Coding
Protection against errors likely to corrupt data during exchanges and processing is based on the association of a separable arithmetic code
and a signature check. This coding makes it possible to detect:
• Operand and operation errors, and more generally any instruction sequencing errors. Actual sequencing is compared with preset,
expected, sequencing. This pre-determinability aspect is fundamental to the approach to securing processing because it is at the heart
of the mechanism used for validating processing system operation.
• Calculation errors. The principle used for securing processing does not of course allow pre-determining calculation results, but
coding is used so as to make validation impossible should an error occur.
This approach makes it possible to achieve an overall level of safety for the system that depends on neither the complexity nor
the technology of the hardware used (processor and transmission media), but does depend on the size of the data
redundancy.
All vital data therefore comprises two fields:
• an information field, which forms the useful part of the data.
• a control field made up of the arithmetic sum of three terms:
   - the remainder of the integer division of the information field by the code key. If A is the code key (a large prime number chosen
randomly), x the useful data and k an arbitrary number such that $2^k > A$, then the remainder $r_{kx}$ of the division of $2^k x$ by A is
used (this remainder is also written $r_k(x)$ in the formulas of the following sections). This coding forms the fundamental error detection mechanism for securing processing.
   - a static signature Bx which characterizes the data identity. For all coded operators (both arithmetic and logic operators,
conditional branching...), the static signature of the result depends on that of the operands and the operator itself. This can be pre-
determined as it is not dependent on the values taken by the data and it forms a trace of their antecedents.
   - a dynamic signature D, common to all data, which guarantees the validity of the information in time. This signature is
incremented by a constant value on every application cycle as well as on each iteration in the internal loops.
The data item is therefore finally coded in the form $X = 2^k x - r_{kx} + B_x + D$, represented by the pair $(x,\; -r_{kx} + B_x + D)$, where x represents the
information field (non vital value) and $-r_{kx} + B_x + D$ the control field.
In the remainder of this article, we will use the notation XF for the information field and XC for the control field of secure data X. A data
item X is said to be in-code if its control field XC is correct and its static signature takes the expected value. It is said to be out-code if
not.
Thanks to this coding, we show that the probability of failing to detect errors is lower than 1/A regardless of their origin: hardware failures
or compiler/linker errors. In other words, the technology used to implement processing does not affect the safety demonstration.
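The coding rule of this section can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: the code key A = 2^31 - 1 (a known prime), k = 32, the reduction of the control field modulo A and the function names are all chosen for illustration and are not the values or interfaces used in the computer.

```python
A = 2**31 - 1   # illustrative code key (a prime); the real key is not given here
K = 32          # any k such that 2**K > A


def r_k(x: int) -> int:
    """Remainder of the division of 2**K * x by the code key A."""
    return ((2**K) * x) % A


def encode(x: int, b_x: int, d: int) -> tuple:
    """Return the pair (information field, control field) for value x.

    b_x is the static signature of the data item, d the current date.
    The control field -r_k(x) + B_x + D is kept reduced modulo A (assumption).
    """
    return x, (-r_k(x) + b_x + d) % A


def is_in_code(x_f: int, x_c: int, b_x: int, d: int) -> bool:
    """2**K * X_F + X_C must be congruent to B_x + D modulo A."""
    return ((2**K) * x_f + x_c) % A == (b_x + d) % A
```

With these definitions, a value whose information field is altered, or whose date belongs to another cycle, no longer satisfies the congruence and is therefore out-code.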
3.2. Securing Operations
The following basic principle applies. Every coded operation has a corresponding signature evolution law that is specific to it. The
control field for the result of a coded operation is securely calculated from the input operand control fields.
For a coded operation Op (X, Y):
$Z = (X_F \;\mathrm{op}\; Y_F,\; G_{op}(X, Y))$ (result calculated securely from $X_C$ and $Y_C$)
$G_{op}(X, Y) = (-r_k(X_F \;\mathrm{op}\; Y_F) + F_{op}(B_x, B_y) + D) \bmod A$
The addition example illustrates the mechanism:
$F_{plus}(B_x, B_y) = (B_x + B_y) \bmod A$, $G_{plus}(X, Y) = (X_C + Y_C - D) \bmod A$
Detecting operand and operator errors using the pre-determinable signature:
Using operand T in place of Y in "Z=X+Y" yields: $Z = (X_F + T_F,\; -r_k(X_F + T_F) + B_x + B_t + D)$
Using operation op in place of the addition yields: $Z = (X_F \;\mathrm{op}\; Y_F,\; -r_k(X_F \;\mathrm{op}\; Y_F) + F_{op}(B_x, B_y) + D)$
In both of the above error cases, the result signature is incorrect.
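The coded addition can be tried out with the following sketch. It assumes an illustrative code key A = 2^31 - 1 with k = 32 and hypothetical signature values; the helper `static_signature` simply recovers the static signature carried by a coded value so that it can be compared with the pre-determined one.

```python
A = 2**31 - 1   # illustrative code key (assumption)
K = 32          # 2**K > A


def r_k(x: int) -> int:
    return ((2**K) * x) % A


def encode(x: int, b_x: int, d: int) -> tuple:
    """(information field, control field) with control = -r_k(x) + B_x + D mod A."""
    return x, (-r_k(x) + b_x + d) % A


def coded_add(x: tuple, y: tuple, d: int) -> tuple:
    """Coded addition: Z = (X_F + Y_F, (X_C + Y_C - D) mod A)."""
    return x[0] + y[0], (x[1] + y[1] - d) % A


def static_signature(z: tuple, d: int) -> int:
    """Static signature carried by Z: (r_k(Z_F) + Z_C - D) mod A."""
    return (r_k(z[0]) + z[1] - d) % A
```

For a correct addition the recovered signature equals (Bx + By) modulo A whatever the operand values; substituting another operand, or corrupting the information field of the result, yields a different signature.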
Execution errors are detected by a coded data item that is internal to the coded operations. This data item is called a "tracer" and its
notation is S. The tracer is dependent on all of the operations performed since the application was initialized. It does not have an
information field, but it also has a signature that can be pre-determined. It evolves so that there cannot be any error compensation in case
of multiple malfunctions. The output control fields (On/Off and messages) must be tracer dependent.
After every coded operation, the tracer evolves in line with the formula: $S_{n+1} = \alpha (Z_C + r_k(Z_F) - D + S_n) \equiv \alpha (B_z + S_n)$
α is a constant. The implementation guarantees that it is impossible to store tracer values, so the tracer is not dated.
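A short sketch of the tracer evolution follows. The constant α = 3, the code key and the signature values are hypothetical; `predicted_step` plays the role of a pre-determination performed from the static signatures alone, without knowing the data values.

```python
A = 2**31 - 1   # illustrative code key (assumption)
K = 32          # 2**K > A
ALPHA = 3       # hypothetical value of the constant alpha


def r_k(x: int) -> int:
    return ((2**K) * x) % A


def tracer_step(s_n: int, z_f: int, z_c: int, d: int) -> int:
    """Run-time update: S_{n+1} = alpha * (Z_C + r_k(Z_F) - D + S_n) mod A."""
    return (ALPHA * (z_c + r_k(z_f) - d + s_n)) % A


def predicted_step(s_n: int, b_z: int) -> int:
    """Pre-determined update: S_{n+1} = alpha * (B_z + S_n) mod A."""
    return (ALPHA * (b_z + s_n)) % A


def replay(results, d: int, s0: int = 1) -> tuple:
    """Apply both updates to a list of (value, static signature) results."""
    s_run = s_pred = s0
    for value, b_z in results:
        z_c = (-r_k(value) + b_z + d) % A   # control field of an in-code result
        s_run = tracer_step(s_run, value, z_c, d)
        s_pred = predicted_step(s_pred, b_z)
    return s_run, s_pred
```

As long as every result is in-code, the run-time tracer follows the pre-determined sequence; a single corrupted result makes the two diverge, and the multiplication by α at each step keeps the discrepancy in all later values.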
3.3. Branching
In line with the way coding is used, as defined previously, when a conditional branching function is used, care must be taken to ensure
that the branch followed is the right one, and if not, cause a tracer "out-code condition" (a signature different from the expected
signature).
Furthermore, the tracer signatures and the updated
variables change differently, depending on which
branch is followed (as the operations are branch
dependent). And to ensure pre-determination of
signatures after branching, these signatures need to
be independent of the branch followed.
To guarantee these two points, the following
processing is performed.
Test operations send back a non-vital Boolean value that allows branching and change the tracer differently in the two branches so that correct branch execution can be checked. More precisely, the tracer comes to include the control field for test marker X (which is $-r_k(\mathrm{true}) + B_x + D$ in the then branch, and $-r_k(\mathrm{false}) + B_x + D$ in the else branch): $S_{n+1} = \alpha (X_C - D + S_n)$.
At convergence (where branching ends), the tracer is added to the variables that are modified during divergence so that the signatures are
dependent on the test marker. All of these variables, as well as the tracer, are compensated (by adding a constant) so that they can be
brought back to a pre-determinable value regardless of which branch is followed.
Should a branching error occur, the tracer before convergence is out-code by $\alpha^n \cdot (r_k(\mathrm{true}) - r_k(\mathrm{false}))$.
At the end of convergence, a tracer compensation error can never bring an out-code tracer back to in-code.
Constants for compensating the tracer and the variables modified during divergence are called compensations. They are pre-calculated by
a host tool, the Signature Pre-determination Tool (OPS). They are stored in tables, the compensation tables.
[Diagram: after the branching test, the then branch runs its processing (giving Bxthen, Tracerthen) and applies Xc = Xc + Tracer + CompXthen and Tracer = Tracer + CompSthen; the else branch runs its processing (giving Bxelse, Tracerelse) and applies Xc = Xc + Tracer + CompXelse and Tracer = Tracer + CompSelse; both branches reach Bxfinal, Tracerfinal at convergence.]
Figure 2 IF-THEN-ELSE branching
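The tracer side of the branching mechanism can be sketched as follows. This is a simplified illustration restricted to the tracer (the compensation of the variables modified during divergence is left out); the code key, α, the signatures and the function names are assumptions, and `precompute_compensations` stands in for the work done off-line by a tool such as the OPS.

```python
A = 2**31 - 1   # illustrative code key (assumption)
K = 32          # 2**K > A
ALPHA = 3       # hypothetical value of the constant alpha


def r_k(x: int) -> int:
    return ((2**K) * x) % A


def absorb_marker(s: int, test_value: bool, b_x: int) -> int:
    """Tracer update with the test marker: alpha * (X_C - D + S) mod A."""
    marker_minus_date = (-r_k(int(test_value)) + b_x) % A   # X_C - D
    return (ALPHA * (marker_minus_date + s)) % A


def run_branch_ops(s: int, signatures) -> int:
    """Tracer evolution over the coded operations of one branch."""
    for b_z in signatures:
        s = (ALPHA * (b_z + s)) % A
    return s


def precompute_compensations(s0, b_x, branch_signatures, s_final):
    """Off-line step: one tracer compensation per branch, from signatures only."""
    comps = {}
    for branch in (True, False):
        s = run_branch_ops(absorb_marker(s0, branch, b_x), branch_signatures[branch])
        comps[branch] = (s_final - s) % A
    return comps


def execute_if(s0, b_x, test_value, branch_taken, branch_signatures, comps):
    """Run time: the marker reflects test_value, the processor follows branch_taken."""
    s = run_branch_ops(absorb_marker(s0, test_value, b_x), branch_signatures[branch_taken])
    return (s + comps[branch_taken]) % A
```

When the branch followed matches the test result, the tracer reaches the same pre-determined value in both branches. When they disagree, the difference between rk(true) and rk(false) absorbed with the marker is multiplied by α at every later step and survives the compensation, so the tracer is out-code after convergence.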
3.4. Loops
For repetitive structures (loops), care must be taken to ensure that the number of iterations performed is correct. Furthermore, the tracer
signatures and the variables updated in the loop must be independent of the number of iterations performed. The principle is similar to
that of branching (marking by a test variable and compensation at the end of the iteration). At any given point in the loop, the signatures
are the same regardless of the iteration. It is the date that makes it possible to guarantee the validity of data in time.
3.5. Result Check
The result check comprises checking that the static signatures for the On/Off outputs and the tracer are the same as the pre-determined
values and that the dynamic signature (date) evolves correctly during
each cycle.
The check is preceded by a compacting operation, to ensure that for
each cycle only a final signature bearing the trace of all of the
operations performed and any errors that may have occurred is
checked.
The check must be performed as a vital function by the Dynamic
Controller board. If the final signature is incorrect, the On/Off outputs
are forced to a restrictive state without taking into account the results
obtained.
3.6. Double System Principle
The secure processing principle makes it possible not only to protect
against execution errors, but also against compiler/link editor errors
that may impact software safety.
The compensation tables generated by the OPS are used during execution. If the program that is executed is different from the one
analyzed by the OPS, the vital application is in the out-code condition. It is the OPS that defines all of the signatures, especially the one
that should validate execution and which is checked by the Dynamic Controller board. Redundancy between the software generation
system and the Signature Pre-determination Tool is built-in.
3.7. Coding Information is transparent to the Applications
The vital applications are coded in Ada language (with the option of extending to other languages).
Data coding is transparent to the vital applications. They use conventional operators such as +, -, and, ≥, … (using Ada overloading),
table read/write services, divergence, convergence…. The coded operators are responsible for handling the various processing
functions on the data fields and on the coded part.
3.8. Hardware Implementation
The vital software is run by the standard processor board that sends, via the cPCI bus, a trace of its execution to the vital digital control
board: instruction code, parameter addresses and calculation results. A calculation coprocessor then handles the related coded
calculations in line with the algorithms described above.
This coded calculations coprocessor comprises a hundred pipeline layers. The way the algorithms are implemented has been optimized to
eliminate looping back. This ensures that a basic instruction is executed every clock cycle without emptying the pipeline.
To reduce the size of the circuit, calculations on the 48 bit code have been multiplexed into two 24 bit calculations.
Some complex operations (convergence, reading and writing array elements) require sequential processing which blocks the pipeline. An
input FIFO is used to stack 128 instructions awaiting processing, so that the coprocessor never makes the processor wait and it can catch
up thanks to its processing speed when performing basic operations.
Some calculations are performed without dating. Precautions are taken to avoid any dangerous storage of this information as well as
unplanned date information removals.
[Diagram blocks: source program; compiler (shown twice); OPS; link editor; compensation tables; executable]
Figure 2 Double system principle
4. VITAL INPUTS/OUTPUTS
4.1. On/Off Inputs/Outputs and the Dynamic Controller
The conversion of logical inputs from sensors into coded values is handled by a failsafe type vital board. An input signal can only be sent
as permissive (high logic state) if it is truly so for the entire acquisition duration. If not or if a failure occurs, it is provided as restrictive.
The control field obtained for n inputs is provided in compacted form by the formula:
$$\mathrm{Redundancy} = \sum_{i=0}^{n-1} \left( \delta_i \cdot \mathrm{Strue}_i + (1 - \delta_i) \cdot \mathrm{Sfalse}_i \right) + D$$
where $\delta_i$ equals 1 if input i is true (permissive), and 0 if it is false (restrictive), and $\mathrm{Strue}_i$ and $\mathrm{Sfalse}_i$ are the codes returned by the board for
input i true and false.
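The compaction formula amounts to adding one code per input, selected by the input state, and then the date. A minimal sketch, assuming that the sum is reduced modulo an illustrative code key A and using made-up code values:

```python
A = 2**31 - 1   # illustrative code key (assumption)


def compact_inputs(states, s_true, s_false, d: int) -> int:
    """Sum of Strue_i for true inputs and Sfalse_i for false inputs, plus D."""
    total = d
    for delta, code_true, code_false in zip(states, s_true, s_false):
        total += code_true if delta else code_false
    return total % A
```

For example, with states [1, 0, 1], true codes [10, 20, 30], false codes [1, 2, 3] and date 5, the compacted control field is 10 + 2 + 30 + 5 = 47.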
For logical outputs (example: permission to move), it is necessary to ensure that the processing routines have not produced false
permissive outputs (by checking their control field and the tracer), that the restrictive outputs are indeed restrictive (checking output state
read back) and that the dynamic signature evolves correctly at every iteration. It does not matter if an output was incorrectly generated
as restrictive or if a permissive output is set to restrictive, as neither case is dangerous.
After the logic outputs (On/Off) are generated by the vital application, their control fields and that of the tracer are compacted into a
single dated sequence. The sequence obtained in this way is provided, for checking, to the Dynamic Controller.
The sequence obtained for n outputs is provided by the formula:
$$\mathrm{Redundancy} = \sum_{i=0}^{n-1} \left( \delta_i \cdot \mathrm{Strue}_i + (1 - \delta_i) \cdot \mathrm{Sfalse}_i \right) + S + D$$
where $\delta_i$ equals 1 if output i is true, and 0 if it is false, and $\mathrm{Strue}_i$ and $\mathrm{Sfalse}_i$ are the codes corresponding to output i at true and at false.
The non-vital output values serve to control output actuators in a non vital way. It is therefore necessary to ensure that a restrictive
command does indeed correspond to a restrictive action at actuator level and has not been incorrectly transformed into a permissive
output (clearly not a safe situation). To do so, all of the outputs are read back. Just like for inputs, output read back data is returned in
coded format by a failsafe device. This board guarantees that a read back code is restrictive (set to false) only if the output is truly set this
way. If not, or if a failure occurs, this code is the default code (set to true). The codes returned by this board complement the codes
supplied by the application: Struei for a false output, and Sfalsei for a true output.
The sequence obtained for read back is:
$$\mathrm{Read\ back} = \sum_{i=0}^{n-1} \left( \delta_i \cdot \mathrm{Sfalse}_i + (1 - \delta_i) \cdot \mathrm{Strue}_i \right) + D$$
The Dynamic Controller adds the data read back to the sequence supplied by the vital application, then it compares the sequence obtained
with the expected dated signature. If they are different, then all of the outputs are forced to the restrictive state (low logic state),
irreversibly. Remark: as the output code and output code read back are complementary, their sum equals a constant that is independent of
the output's value (Struei + Sfalsei). The expected signature is therefore truly a pre-determinable constant.
$$\mathrm{Expected\ sequence} = \mathrm{Redundancy} + \mathrm{Read\ back} - D = \sum_{i=0}^{n-1} \left( \mathrm{Strue}_i + \mathrm{Sfalse}_i \right) + S + D = \mathrm{Expected\ signature} + D$$
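The comparison made by the Dynamic Controller can be reproduced with the sketch below. The code key, the code values and the function names are assumptions made for illustration; the real check is performed by dedicated hardware, not by software of this kind.

```python
A = 2**31 - 1   # illustrative code key (assumption)


def redundancy(outputs, s_true, s_false, tracer: int, d: int) -> int:
    """Sequence supplied by the vital application for the commanded outputs."""
    codes = sum(t if o else f for o, t, f in zip(outputs, s_true, s_false))
    return (codes + tracer + d) % A


def read_back(actual, s_true, s_false, d: int) -> int:
    """Complementary codes: Sfalse_i for a true output, Strue_i for a false one."""
    codes = sum(f if o else t for o, t, f in zip(actual, s_true, s_false))
    return (codes + d) % A


def controller_accepts(red: int, rb: int, expected_signature: int, d: int) -> bool:
    """Redundancy + Read back - D must equal Expected signature + D (mod A)."""
    return (red + rb - d) % A == (expected_signature + d) % A
```

Because the two codes of each output always add up to the same constant, the expected signature does not depend on the output values. A restrictive command that appears as permissive at actuator level, a wrong tracer or a sequence dated with another cycle all break the equality, which in the computer forces the outputs to the restrictive state.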
The diagram below presents the control sequence and the hardware architecture of the vital Inputs/Outputs.
[Diagram blocks: On/Off inputs; vital input boards; input code; vital computations; redundant outputs and vital computation signatures; dynamic controllers; safe clocks; vital time base; output vital power supply; vital outputs board; output values; read back outputs; On/Off outputs]
Figure 3 Control sequence for DIGISAFE® XME I/O and the vital software
The I/O are secured by failsafe circuits which guarantee the following behavior:
• The input interface circuit allows the transfer of a binary sequence only if the input is supplied (supplied = true).
• The output interface circuit lets a binary sequence pass only if no power is transferred from the vital power supply to the
outputs.
These binary sequences are transformed into Strue and Sfalse control codes by the digital circuits on the vital Input/Output boards. They
are transferred to the vital software and the dynamic controller via the DIGISAFE® XME network. The codes are compacted on
transmission, avoiding any saturation of network bandwidth.
The input codes are checked by the vital software. The output codes (Redundancy and Read back), the vital software static and dynamic
signatures (dated tracer) are cyclically checked by the dynamic controller.
Receiving correct data lets the dynamic controller supply, via the safe time base clock, coherent commands to the vital power supply
which produces the necessary power for validating outputs. The slightest error causes this power supply to drop out and consequently the
vital outputs become passive ones.
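The drop-out behavior described above can be sketched as a latch. This is a simplified illustration, not the actual dynamic controller: the class name `DynamicControllerSketch`, the use of plain integers for signatures, and the rule "expected signature plus date" are assumptions made for the example.

```python
class DynamicControllerSketch:
    """Hypothetical model: keeps the vital power supply energized
    only while every dated signature received so far was correct."""

    def __init__(self, expected_signature):
        self.expected_signature = expected_signature
        self.dropped_out = False  # once True, it never returns to False

    def check(self, received_signature, date):
        """Return True if the vital power supply may stay energized."""
        if self.dropped_out:
            return False
        if received_signature != self.expected_signature + date:
            # Slightest error: irreversible drop-out, outputs become restrictive.
            self.dropped_out = True
            return False
        return True
```

Once a wrong signature has been seen, later correct signatures no longer re-energize the supply, which mirrors the irreversible forcing to the restrictive state.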
4.2. Inter-equipment Messages
The vital inter-equipment messages are sent in compacted form: only one control field, that is called redundancy, for the entire message.
The redundancy depends on the non-vital values of the message information as well as on the tracer (the tracer includes all processing
errors that may occur).
This type of coding makes the choice of transmission media a transparent one. In reception mode, redundancy is only in the code if the
processing that led to generating message data was performed without errors and only if transmission did not alter the message.
Given that the equipment is not synchronized, message redundancy cannot be dated. A vital logical time value is necessary to guarantee
the freshness of the data exchanged. This time value is a coded value that changes with every vital application cycle. It is attached to the
data in every message. The receiving application has to use a vital function to check the freshness of the logical time of each message.
All of the occurrences of a given message have the same signature (whether they are sent by the same sender at different clock cycles, or
whether they are sent by different senders of the same type). As redundancy depends on the message data values, it guarantees integrity
(it is impossible to mix the data from two occurrences of the message without damaging their signatures).
The vital functions do not protect against message loss. The applications are designed to be message loss tolerant. If this protection is
necessary, it should be implemented at the application level (acknowledgement).
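As an illustration of the freshness check, here is a minimal sketch. It makes several assumptions not stated in the text: the logical time is modeled as a plain increasing integer rather than a coded value, and a message is accepted only if its logical time is strictly greater than the last accepted one. The class name `FreshnessChecker` is hypothetical.

```python
class FreshnessChecker:
    """Hypothetical receiver-side check on the logical time of messages."""

    def __init__(self):
        self.last_time = None  # logical time of the last accepted message

    def accept(self, logical_time):
        """Accept a message only if its logical time has advanced."""
        if self.last_time is not None and logical_time <= self.last_time:
            return False  # stale or replayed occurrence
        self.last_time = logical_time
        return True
```

A gap in logical times (a lost message) is accepted by this sketch, in line with the statement that the vital functions do not protect against message loss.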
5. PERFORMANCE LEVELS AND CHARACTERISTICS
5.1. Performance Levels
The cycle durations of the applications developed to date are between 48 and 360 ms. This application cycle is split into message
sub-cycles of 6 to 20 ms depending on the programming.
The table below shows the maximum delay, guaranteed by the vital system, after which the system will be placed in a restrictive state
(dynamic controller drop out) following a failure:
Failure | Response time
Controlled output, incorrectly permissive | Two message cycles by activating DIGISAFE® XME computer error correction + 10 ms for the vital power supply to drop out
Controlled output, incorrectly restrictive | Action not dangerous, no vital function surveillance
Vital software fault | One application cycle + 1 to 4 message cycles depending on configuration
Restrictive input incorrectly seen to be permissive | Two application cycles + 1 to 4 message cycles depending on configuration
Permissive input incorrectly seen to be restrictive | Action not dangerous, no vital function surveillance
The existing vital (coded) operations cover assignments, arithmetic
and logic operations, integer comparisons, reading and writing
arrays, tests and loops.
The trend graph shows how vital software performance has changed with the different technologies used. The reference is the first
version (V1), which entered service in 1988 and in which processing is implemented entirely in software. The next point is the
version that entered service in 1998 (V2), which implements specific signature processing in a coprocessor and gives a performance
gain of a factor of 7. The latest version, XME, brings a performance gain of a factor of 80 compared with the original technology
implemented in 1988.
The XME architecture is now mature and will follow technological evolutions. The natural gain is estimated at a factor of 2 every
3 to 4 years. XME technology was developed in 2000. Now, in 2003, the technologies are available to increase performance by a
factor of two: Pentium 1.5 GHz, cPCI 32 bits/66 MHz and FPGA ALTERA Stratix.
5.2. Limits on Use
Independently of its performance and characteristics, the XME computer imposes some limitations in terms of its operation. In particular,
the vital software uses a subset of the Ada language close to that of certified computers. The main limitations are:
• Typing restricted to integer and Boolean values and arrays of integer and Boolean values; floating point calculations are not available
• No reentrancy or recursion
• Only one cyclic task is run by the vital software (multi-tasking was available in the previous version but not used)
• The maximum number of vital On/Off I/O that can be handled by a computer, excluding an I/O concentrator, is limited to 72
boards (one input board = 8 inputs, one output board = 7 outputs) or for example 384 vital inputs and 168 vital outputs (a set of
I/O managed by a concentrator in its maximum configuration could reach up to eight times more I/O).
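The I/O limit can be checked with a short calculation. The sketch below uses only the figures given above (72 boards, 8 inputs per input board, 7 outputs per output board); the function name `io_capacity` and the split of 48 input boards and 24 output boards are inferred from the example of 384 inputs and 168 outputs.

```python
INPUTS_PER_BOARD = 8
OUTPUTS_PER_BOARD = 7
MAX_BOARDS = 72  # limit for one computer, excluding an I/O concentrator


def io_capacity(input_boards, output_boards):
    """Return (vital inputs, vital outputs) for a given board mix."""
    if input_boards < 0 or output_boards < 0:
        raise ValueError("board counts must be non-negative")
    if input_boards + output_boards > MAX_BOARDS:
        raise ValueError("more than 72 vital I/O boards")
    return (input_boards * INPUTS_PER_BOARD, output_boards * OUTPUTS_PER_BOARD)


# 48 input boards + 24 output boards = 72 boards
print(io_capacity(48, 24))
```

With 48 input boards and 24 output boards the result is 384 vital inputs and 168 vital outputs, matching the example in the text.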
5.3. Characteristics
The DIGISAFE® XME computer meets the applicable standards for operating in a railway environment in mechanical, thermal and
EMC terms.
• Mechanical: EN 61373: Railway Applications – Rolling stock equipment shock and vibration tests.
• Thermal: EN 50155: Railway applications – Electronic equipment used on rolling stock; EN 50125-1: Railway applications –
Environmental conditions for equipment, Class T3 (-25°C to +85°C ambient).
• EMC (ElectroMagnetic Compatibility): EN 50121: Railway applications – Electromagnetic Compatibility.
Dimensions: The table below presents a few examples of DIGISAFE® XME computer sizing.
Hardware module | Width | Depth | Height | Remark
Complete redundant XME on-board computer: vital software and I/O (30 inputs and 15 outputs), positioning. | 44 cm | 32 cm | 38 cm | 9U 19"
Additional drawer for handling 80 vital inputs and 35 vital outputs, 32 non-vital inputs and 32 non-vital outputs. | 44 cm | 37 cm | 26.5 cm | 6U 19" long format
MSCA extension for software and message vital functions. | 7.4 cm | 15 cm | 1.35 cm | PMC extension board
6. AVAILABILITY: BUILT-IN REDUNDANCY
6.1. Introduction
To improve system availability, a redundant layout is used: two identical units, each inherently safe in both hardware and software. A
failure affecting one of them should not affect the system's overall availability.
Ensuring the safety of the complete system poses a major problem: the two vital applications are inherently safe, but as they are
independent, they may apply antagonistic commands affecting the environment (if their dynamic contexts are incoherent or diverge); the
resulting overall behavior can therefore run counter to dependability.
A simple principle that will correct this problem is to ensure that the computers are perfectly identical at all times, ensuring especially
that their data contexts are rigorously equal [1]. Implementing this principle is however difficult and takes up a lot of processing time and
bandwidth on the inter-computer communications link. This is why DIGISAFE® XME's built-in redundancy protocol has been adapted
to improve the performance achieved.
6.2. Processing
Of the two units, one is said to be active (sending messages) and the other passive. From a vital point of view, the guarantee is that at
most one of the units is active. Each unit's vital function knows whether it is active or passive.
The two units are linked by a two-way link allowing message exchanges.
In redundant mode, the two units run the following protocol:
• Acquire their own local vital inputs (On/Off, phonic wheel, beacons…) and remote ones (messages);
• Compare the inputs of the two redundant units and force themselves to use the same inputs by applying a decision-making algorithm
when they diverge;
• Each perform their own processing;
• Use vital functions to check processing coherence between the two units;
• Apply their own outputs.
To guarantee the use of the same inputs by the two units, they are synchronized.
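The protocol steps can be sketched as one cycle of a simplified model. The decision-making algorithm is not specified in the text; resolving a divergent Boolean input to the restrictive value (`False`) is an assumption made here for illustration, as are the function names `agree_inputs` and `run_cycle`. The vital coding of data is left out entirely.

```python
def agree_inputs(inputs_a, inputs_b):
    """Force both units onto the same inputs.

    Assumption: True = permissive, False = restrictive, and any
    divergence between the two units is resolved to restrictive.
    """
    return [a and b for a, b in zip(inputs_a, inputs_b)]


def run_cycle(inputs_a, inputs_b, process):
    """One redundant-mode cycle for two units running the same `process`."""
    agreed = agree_inputs(inputs_a, inputs_b)  # compare and align inputs
    outputs_a = process(agreed)                # each unit does its own processing
    outputs_b = process(agreed)
    coherent = outputs_a == outputs_b          # coherence check between units
    return agreed, outputs_a, coherent


# Example: the second input diverges and is taken as restrictive.
agreed, outputs, coherent = run_cycle(
    [True, True, False], [True, False, False], lambda xs: list(xs)
)
```

Because both units process the same agreed inputs with the same deterministic function, the coherence check succeeds in the fault-free case.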
[Figure: block diagram. Active unit and passive unit, each with local inputs and messages, linked by a two-way link and synchronized; local outputs combined by OR; messages sent through a switching mechanism.]
Fig. 3 Redundant architecture
The local outputs from the two units are in a hardware OR configuration. Only the active unit sends its remote outputs.
Two operating modes are defined:
• Redundant mode: when the system is operating normally (no failures, no divergence, …), the two units apply the protocol;
• Isolated mode: when system operation is ensured by only one unit, the active one. It uses its own inputs and applies its outputs.
The passive unit simply acquires its own inputs, retrieves the active unit's operating context so that it can prepare its own
reintegration and applies On/Off outputs in the restrictive state.
To change state and/or operating mode, actions may be taken by the units:
• initialization: after a reset, one unit is in isolated mode. If the two units are initialized, one becomes active and the other
passive. If only one unit is reset, and the other is active, it will restart in passive mode;
• reintegration: used to change from the isolated mode to the redundant mode.
The non-operational unit functionally aligns its dynamic context with that of the active unit. When the contexts are validated as
identical in vital terms, the unit is deemed to be reintegrated. The computer then returns to the standard redundant operating
mode.
• switching: in the event of an active unit failure, switching changes the active unit to the passive state, and can then switch the
redundant unit to the active state. Non-vital switching is used;
• passivizing: lets the active unit change the passive one to non operational mode (the active unit then runs in isolated mode).
Data exchanges between the two units are performed using vital messages. The logical time mechanism is used to guarantee the
freshness of the information exchanged (losing On/Off inputs, using inputs that are too old, … is forbidden). The logical time also
allows a software check to ensure synchronization between the two units.
As the built-in redundancy protocol is safety critical, it uses vital processing techniques.
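The operating modes and actions described above can be summarized as a small transition table. This is one possible reading, not the actual protocol: the names `Mode` and `next_mode` are hypothetical, and it is an assumption that switching after an active unit failure leaves the system in isolated mode until reintegration.

```python
from enum import Enum


class Mode(Enum):
    REDUNDANT = "redundant"
    ISOLATED = "isolated"


# (current mode, action) -> next mode; unlisted pairs leave the mode unchanged.
TRANSITIONS = {
    (Mode.ISOLATED, "reintegration"): Mode.REDUNDANT,
    (Mode.REDUNDANT, "passivizing"): Mode.ISOLATED,
    (Mode.REDUNDANT, "switching"): Mode.ISOLATED,  # assumption, see lead-in
}


def next_mode(mode, action):
    """Return the system operating mode after `action` is applied."""
    return TRANSITIONS.get((mode, action), mode)
```

For example, `next_mode(Mode.ISOLATED, "reintegration")` returns `Mode.REDUNDANT`, reflecting the return to the standard redundant operating mode once the contexts are validated as identical.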
6.3. Built-in Redundancy Software
The built-in redundancy software comprises on the one hand, generic services, and on the other hand a host software making it possible
to automatically create the redundancy code that is specific to the vital application, from its I/O and its dynamic context as provided by
the Signature Predetermination Tool. This specific code calls on the generic services.
Only a very few of the application's specific parameters need to be filled-in manually should they differ from the default values.
7. CERTIFICATION
The latest version of the DIGISAFE® XME advanced vital computer is SIL4 certified by TÜV. This means that it ensures that events
with catastrophic consequences are made "impossible".
The previous version of this computer (the version used on Line 14 of the Paris metro) is RATP (Paris transport authority) validated and
SIL4 certified by TÜV.
The safety level achieved by the DIGISAFE® XME computer is better than the target of $10^{-11}$ undetected failures per hour, a level
required of the most demanding systems. This level of safety can be proven.
The aim is to ensure that the hourly rate of occurrence for dangerous failures remains below $10^{-11}$ per hour: $P_{dangerous\_error} \le 10^{-11}$ / h.
The absolute probability $P_{abs}$ of not detecting an error is linked to the size of the coding used: $P_{abs} = 1/\text{code\_size} = 1/2^{48} = 3.55 \times 10^{-15}$.
If $\lambda$ is the average failure rate per hour affecting the entire system, we can show that: $P_{dangerous\_error} \le \lambda \times P_{abs}$.
It is therefore enough to have: $\lambda \times P_{abs} \le 10^{-11}$ / h. This is largely the case, as the realistic values for $\lambda$ are some $10^{-2}$/h for a controlled
system and $10^{-3}$/h for an automatic system.
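The bound can be checked numerically. The sketch below only restates the arithmetic of this section (48-bit code, target of 10^-11 per hour); the function names are introduced for the example.

```python
CODE_BITS = 48
P_ABS = 2.0 ** -CODE_BITS   # probability of not detecting an error, about 3.55e-15
TARGET_PER_HOUR = 1e-11     # maximum hourly rate of dangerous failures


def dangerous_error_bound(failure_rate_per_hour):
    """Upper bound on the dangerous error rate: lambda * P_abs."""
    return failure_rate_per_hour * P_ABS


def meets_target(failure_rate_per_hour):
    """True if the bound stays at or below the 1e-11 per hour target."""
    return dangerous_error_bound(failure_rate_per_hour) <= TARGET_PER_HOUR


for lam in (1e-2, 1e-3):
    print(lam, dangerous_error_bound(lam), meets_target(lam))
```

For a failure rate of 10^-2 per hour the bound is about 3.55e-17 per hour, several orders of magnitude below the target, which is what "largely the case" refers to.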
The validation approach is a different one from conventional fault injection approaches. As safety is independent of the hardware used
(microprocessor, memories, transmission media, bus…), of the computer system platform (real-time monitor), the production system
(compiler, link editor), the DIGISAFE® XME computer is validated by simply analyzing its specifications and the VHDL
implementation (subject to implementation constraints).
As vital processing functions are isolated in specific hardware (Inputs/Output boards, dynamic controller, coprocessor), computer
validation is restricted to the analysis of this equipment.
8. CONCLUSION
Digitizing the vital functions of our control systems led to the development of the DIGISAFE® XME computer. Vital safety functions rely on
data coding so that failures can be detected independently of the hardware used.
The computer has evolved significantly since its first version. The main improvements in the latest version cover:
• Performance: a gain in excess of 80 since the first version of DIGISAFE® XME.
• Openness to standards: a standard CPU board, a cPCI bus. This openness will allow the computer to evolve further in terms of
performance in line with technology.
• Availability: DIGISAFE® XME proposes a generic and transparent redundancy management protocol for vital applications. It
almost entirely eliminates the need for redundancy related system security studies and simplifies the coding of redundant
applications.
• Modularity and versatility: adapting DIGISAFE® XME to different usage configurations: secure processing unit, remote I/O
module and complete computer with high performance positioning functions.
With the latest generation of computer, DIGISAFE® XME technology has achieved a level of performance and availability that ensures
that it can meet the needs of even the most demanding applications. Its open architecture means that it can take advantage of on-going
improvements to processors, busses and communications. Through this open architecture, the DIGISAFE® XME computer is usable in
any field where stringent safety and availability demands are made.
References
[1] D. Essamé, J. Arlat, D. Powell. PADRE: a protocol for asymmetric duplex redundancy. IFIP 7th Working Conference on Dependable
Computing for Critical Applications (DCCA-7), San Jose, CA, USA, January 1999.
[2] P. Behm, P. Benoit, A. Faivre, J.-M. Meynadier. Météor: a successful application of B in a large project. LNCS 1708,
Springer-Verlag, Berlin Heidelberg, 1999.
[3] C. Alves, P. Forin. A speckle correlation speed sensor. Special edition of the Instrumentation, measurement and metrology
magazine: Sensors and signal processing in guided transportation systems. Hermes science publication, Librairie Lavoisier, volume 2,
No. 1-2/2002.
