As the transition towards integrated programmable systems gradually takes place in the automotive industry, there is clearly a need to ensure that new onboard systems and networks will deliver safety-related services with at least the degree of reliability that similar services have been delivered by conventional systems in the past.
Current trends show that in the near future we will see the emergence of integrated safety-related electronic systems that will be able to share a multiplicity of sensor data over a common computing infrastructure to provide new functions that will improve the active safety of vehicles. Such systems will include driver assistance systems such as so-called intelligent by-wire braking, steering, electronic stability, and obstacle/collision avoidance systems. These systems will run on networked architectures and assume direct electronic control of the steering, braking, suspension and powertrain functionality, taking actions that depend on the current driving conditions and environmental influences.
Most early systems of this kind in vehicles today either rely on a mechanical or hydraulic backup or can be switched off in case of a significant electronic failure without compromising the safety of the vehicle. Future by-wire systems without backup, though, will not necessarily have this fail-safe property. The development of sound processes for the engineering and assessment of such systems, therefore, becomes increasingly essential. Malfunctions caused by hardware faults have to be modelled properly at the system specification level, so that error detection and correction mechanisms can be developed as early as possible to avoid expensive design iterations. In addition, robust and practicable safety assessment processes must be applied to ensure the integrity of hardware and software designs and to show that any safety and reliability requirements have been met.
In this paper, we discuss our work towards the development of a safety analysis method that largely automates and simplifies those two tasks. The analysis in the proposed method is performed using an algorithm for the automatic synthesis of fault trees that can be applied on the system at different stages of its design. At the early stages of the design, the algorithm can generate fault trees on the basis of abstract functional specifications of the system. Mechanical analysis of those fault trees, cut-set analysis for example, can help to identify conceptual design flaws and refine the initial design. This process can then be repeated down to the low levels of the hardware and software implementation. By partly mechanising the safety analysis process, the proposed method could help, we believe, in managing the increasing complexity that the automotive sector is likely to experience in safety assessments with the introduction of safety-related integrated driver assistance systems in the future.
CURRENT PRACTICE AND NEW DEVELOPMENTS IN SAFETY ANALYSIS
Currently, safety assessment processes are evolving to deal with changes in the technology of programmable electronic systems. Guidelines and standards are emerging, for example, in which hazard and safety analysis occupy a central position in the design life cycle of such systems. Such guidelines applicable in the automotive sector include those developed by the Motor Industry Software Reliability Association, and IEC-61508, the generic international standard on safety-related programmable electronic systems (1). New standards define elaborate processes in which safety analysis transcends the hierarchy of the system design. The recommended processes typically start early in the design with functional hazard assessment, and then proceed in various forms through to the analysis of low-level system architectures using a variety of different techniques. The suggested forms of analyses include well-established classical safety analysis techniques such as Fault Tree Analysis (FTA), Hazard and Operability Studies (HAZOP) and Failure Modes and Effects Analysis (FMEA).
Emerging standards indicate that safety assessment is increasingly being recognised as an important component of the design life cycle. On the other hand, though, very little guidance exists on precisely how and when to do it in programmable systems. This, we believe, is largely due to the lack of mature techniques for software hazard and safety analysis. Over the years, classical safety analysis techniques have been applied successfully at system level and on hardware architectures. The real question, though, is whether those techniques are also applicable in programmable systems, or in other words whether the application of those techniques could be effectively extended to the analysis of the software architecture of programmable components in a system. Recent attempts to adapt classical safety analysis techniques such as HAZOP and FTA to software and programmable systems have not yet achieved the degree of maturity that would enable their successful application in complex systems. In computer HAZOP (2-3), for example, the analysis remains a predominantly manual activity in which analysts are called to identify and relate hazards by examining data flows in a software architecture. As systems become more complex, though, manually performed hazard analyses become tedious, error-prone, time-consuming and, beyond a certain level of complexity, practically infeasible. An alternative approach to software hazard analysis is the Leveson template approach to software fault tree analysis (4). Software fault trees here are synthesised from generic templates that define the effects of low-level failure modes specified at code level. The method introduces a useful degree of automation in the assessment of software.
The problem, though, is that, in practice, it is difficult to determine how the vast numbers of instruction-level failure modes examined in this approach become relevant at the level of software requirements or architecture where we typically specify and verify safety requirements. Beyond such conceptual difficulties, it is also important to point out a lack of appropriate accompanying concepts for tool support that could enable the practical evaluation of the above techniques in realistic contexts of application.
Thus, despite some theoretical contributions to the problem, no technique has yet found widespread application in complex environments. To address the existing difficulties, there is clearly a need for improving the current concept of software hazard analysis. In addition, any improved techniques need to be supplemented with appropriate tools that could enable their practical application in industrial projects. In this paper, we outline an approach to safety analysis in which concepts of computer HAZOP are fused with the idea of software fault tree analysis to enable a continuous assessment of an evolving programmable design. We also report on a tool that we have developed to support this approach and enable its application in complex environments.
Our work arises out of SETTA (Systems Engineering for Time-Triggered Architectures), a European Commission funded research project concerned with the development of an innovative process and tools for the engineering and assessment of distributed, time-triggered, safety-critical computing architectures. A number of companies interested in developing such systems in the automotive sector, such as DaimlerChrysler, Renault and Siemens, participate in this project. In the area of safety analysis, our work builds upon the concept of automatic fault tree synthesis that we have developed in earlier work (see, for example, early results in the ESPRIT project Time Triggered Architectures) (5-6).
In SETTA, our first aim is to show that this concept can be effectively applied at different levels of abstraction in the design process. Our second aim is to develop a robust tool that could enable the large scale evaluation of automatic fault tree synthesis and ultimately its introduction in the current development processes of our industrial partners.
In the remainder of the paper, we discuss our progress on those two issues. Firstly, we outline the fault tree synthesis concept and discuss its application in the course of a continuous life-cycle safety assessment process. Secondly, we discuss the architecture of a tool that enables the application of that concept on designs currently produced using Matlab-Simulink, a well-known specification and simulation tool, widely used in the automotive industry.
IMPROVING SYSTEM AND SOFTWARE SAFETY ANALYSIS
The proposed safety assessment process starts early in the design with the assessment of an abstract functional model of the system. This model is a functional block diagram which gives a high-level view of the functional structure of the system. The model identifies system functions and the material, energy or data flows among those functions and between the system and its environment. Figure 1 is clearly an abstract functional model in which no actual references to particular hardware have been made. In the proposed method, the first step in the analysis of such abstract models is a form of hazard analysis that we apply on the components of the architecture.
The aim of this analysis is to derive for each component of the architecture a model of its local failure behaviour. This model records the failures that the component itself generates and the effects of those failures on the component outputs.
In addition, the model shows how the component responds to failures arriving at the component inputs, whether it mitigates or whether it transforms and propagates those failures from inputs to outputs. It is beyond our aims here to discuss the details of this technique. For such details the reader is referred to (5).
For the purposes of this paper it is sufficient to say that the application of this technique on a system generates a set of logical expressions that relate the output failures of each component to internal malfunctions and deviations of that component's inputs. Figure 1, for example, shows that an omission of the output of function C will be caused either by a malfunction of C or by a simultaneous omission of both inputs of C.
Similarly, an omission of the output of function B will be caused either by a malfunction of B or by an omission of either input of B.
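As a minimal sketch of how such logical expressions might be held in machine-readable form, the two examples above can be encoded as small Boolean expression trees. The encoding, the event names (for instance `C.malfunction`, or `O-C.in1` for an omission at the first input of C) and the helper `holds` are illustrative assumptions, not the notation of the method itself.

```python
# Hypothetical encoding of local failure expressions as nested tuples.
# A string is an event; ("or", ...) and ("and", ...) combine sub-expressions.
LOCAL_ANALYSIS = {
    # Omission of C's output: C malfunctions, or both inputs are omitted.
    "O-C.out": ("or", "C.malfunction", ("and", "O-C.in1", "O-C.in2")),
    # Omission of B's output: B malfunctions, or either input is omitted.
    "O-B.out": ("or", "B.malfunction", "O-B.in1", "O-B.in2"),
}


def holds(expr, active):
    """Return True if the expression is satisfied by the set of active events."""
    if isinstance(expr, str):
        return expr in active
    op, *terms = expr
    results = [holds(term, active) for term in terms]
    return any(results) if op == "or" else all(results)


# A single omitted input is tolerated by C but not by B.
c_one_input = holds(LOCAL_ANALYSIS["O-C.out"], {"O-C.in1"})
c_both_inputs = holds(LOCAL_ANALYSIS["O-C.out"], {"O-C.in1", "O-C.in2"})
b_one_input = holds(LOCAL_ANALYSIS["O-B.out"], {"O-B.in1"})
```

Keeping the expressions as plain data, rather than as code, is one way to let a later synthesis step substitute input deviations with the output deviations of upstream components.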
One important attribute of this modelling and analysis process is that the model of the system and the local hazard analyses of its components can be used in order to mechanically derive the global propagation of failure in the system. This propagation is captured in a set of automatically constructed fault trees for the system which define how the failure modes that analysts have specified in the local hazard analyses of components propagate through connections in the model and cause failures at the outputs of the system. Figure 2, for example, shows the fault tree that can be mechanically constructed from the model and analyses of Figure 1. Further mechanical analysis of this fault tree, minimal cut-set analysis for example, can help us to identify potentially weak areas of the design and direct our efforts in the improvement and refinement of the initial model. Such weak points may include single points of failure or hazardous dependencies between functions on common inputs. The cut-set analysis of the illustrated fault tree, for example, will immediately show that despite the AND gate, the omission of input b represents a single point of failure for the given design. The analysis suggests, therefore, that replicating this input is something that we should consider at the early stage of the design.
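The following sketch illustrates the general idea of mechanical synthesis followed by minimal cut-set analysis. It is not the algorithm of (5): the topology is an assumption made for illustration (a hypothetical function A fed by inputs a and b, function B fed by inputs b and c, and function C fed by the outputs of A and B), as is the local analysis given to A. Only omission failures are considered and the model is assumed to contain no loops.

```python
from itertools import product

# Assumed local analyses: omission of an output as an expression over
# internal malfunctions and omissions of the component's own inputs.
LOCAL = {
    "O-A.out": ("or", "A.malfunction", "O-A.in1", "O-A.in2"),
    "O-B.out": ("or", "B.malfunction", "O-B.in1", "O-B.in2"),
    "O-C.out": ("or", "C.malfunction", ("and", "O-C.in1", "O-C.in2")),
}
# Assumed connections: an omission at an input is the omission of its source.
CONNECTIONS = {
    "O-A.in1": "O-a", "O-A.in2": "O-b",
    "O-B.in1": "O-b", "O-B.in2": "O-c",
    "O-C.in1": "O-A.out", "O-C.in2": "O-B.out",
}


def synthesise(expr):
    """Expand an event or expression into a fault tree over basic events."""
    if isinstance(expr, str):
        if expr in CONNECTIONS:
            return synthesise(CONNECTIONS[expr])
        if expr in LOCAL:
            return synthesise(LOCAL[expr])
        return expr  # basic event: malfunction or system-input omission
    op, *terms = expr
    return (op, *[synthesise(term) for term in terms])


def cut_sets(tree):
    """Return the minimal cut sets of a fault tree as a set of frozensets."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    op, *children = tree
    child_sets = [cut_sets(child) for child in children]
    if op == "or":
        combined = set().union(*child_sets)
    else:  # "and": one cut set from every child must occur together
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    # Drop any cut set that strictly contains another one.
    return {s for s in combined if not any(other < s for other in combined)}


tree = synthesise("O-C.out")
mcs = cut_sets(tree)
single_points = sorted(event for s in mcs if len(s) == 1 for event in s)
print(single_points)
```

Under these assumptions the single-element cut sets are the malfunction of C and the omission of input b, because b feeds both branches of the AND gate.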
It is worth pointing out that analysts can also supplement the local failure models of components with reliability data such as component failure rates. These failure rates are later on embedded in the structure of the synthesised fault trees and enable the quantitative reliability evaluation of the given design. The results from this analysis would normally lead to a design iteration and a further refinement of the model.
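One common way to carry out such a quantitative evaluation, sketched here under stated assumptions, is to convert constant failure rates into probabilities over a mission time and to combine them over the minimal cut sets with the rare-event approximation. The events are assumed to be independent, and the rates, the mission time and the function names are hypothetical; the evaluation performed by a dedicated fault tree analysis tool may differ.

```python
import math


def failure_probability(rate_per_hour, hours):
    """Probability of at least one failure, assuming a constant failure rate."""
    return 1.0 - math.exp(-rate_per_hour * hours)


def top_event_probability(cut_sets, prob):
    """Rare-event approximation: sum over cut sets of the product of
    the probabilities of their (assumed independent) basic events."""
    return sum(math.prod(prob[event] for event in cs) for cs in cut_sets)


# Hypothetical failure rates (per hour) and mission time.
RATES = {"C.malfunction": 1e-6, "O-b": 5e-6, "A.malfunction": 2e-5, "B.malfunction": 2e-5}
MISSION_HOURS = 1000.0
PROB = {event: failure_probability(rate, MISSION_HOURS) for event, rate in RATES.items()}

CUT_SETS = [{"C.malfunction"}, {"O-b"}, {"A.malfunction", "B.malfunction"}]
estimate = top_event_probability(CUT_SETS, PROB)
print(f"approximate top-event probability: {estimate:.3e}")
```

The approximation is an upper bound that is tight only when the basic-event probabilities are small.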
In the course of that refinement process, functions are initially decomposed into networks of lower level sub-functions, as in Figure 3, where function B is decomposed into an architecture that contains sub-functions B1, B2 and B3. As this decomposition takes place, the model is also gradually enriched with architectural information about the allocation of functions to hardware. At that stage, composite elements in the model (e.g. B in the example of Figure 3) start to represent programmable entities, for example processors enclosing the software tasks running on those processors.
As the model is being refined, the analysis of each subsequent version of the model is performed using the hazard analysis technique that we have briefly outlined in this section. The more detailed the model becomes, though, the more fine-grained are the failure modes that we specify in the analysis. Thus, when we decompose a design module, we also develop more refined analyses for the components contained in the architecture of that module (see, for example, refined analyses in Figure 3). Following that decomposition, the initial analysis of the module becomes redundant. However, the new information about the hierarchical organisation of the module provides new opportunities for using the failure model at the level of the composite module for capturing hazardous dependencies in the vertical axis of the hierarchy. If, for example, the enclosing component in Figure 3 (B) lies in an environment where there is excessive electromagnetic interference, then all the enclosed components (B1, B2 and B3) are susceptible to this hazard. It makes sense, therefore, to determine the effect of this condition at the level of the enclosing component. In a similar way, we can determine the effects of other types of spatial or environmental dependencies such as temperature or pressure or, in the case of software modules, the dependency on a common computing infrastructure (i.e. hardware or operating system). Figure 3 illustrates precisely this general concept for the representation and analysis of programmable hardware, as used in the framework of the techniques that we propose.
Let us assume, for example, that B is a programmable component that encapsulates a network of software tasks (B1, B2, B3). The hazard analysis of such a component could be performed as follows. At the level where the component is represented as a composite element (higher level in the model), we examine and record the direct effect of hardware failures on the outputs of the component. This makes sense, since hardware is typically a common resource shared by all the functional (software) modules of the component, and therefore, a hardware failure will typically impact all software modules. A failure of a processor, for example, will often cause an omission of all the outputs of a controller. It therefore makes sense to examine hardware failures separately, and in a direct and collective fashion.
At the level where the functional structure of the component is described (lower level in the model), we perform a hazard analysis of each task using the technique that we have outlined in this section. The analysis at this level treats software tasks as functions and records how each task responds to omission, commission, timing or value failures propagated by other tasks. It also records how possible internal logical defects in the implementation of each task could affect the outputs of the task. Collectively, the analyses of all tasks show how the software of the controller responds to failures arriving at the controller inputs and how input failures or possible logical errors in the design of that software may propagate and ultimately corrupt the controller outputs.
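The two levels of analysis can be combined as in the following sketch, which is an illustration under assumptions rather than the representation used in Figure 3. Failures recorded at the level of the enclosing component (here a hypothetical processor failure and electromagnetic interference) are joined by an OR gate with the failure logic derived from the enclosed tasks. The task network, in which B1 and B2 both feed B3, and all event names are assumed.

```python
def composite_failure(common_causes, inner_tree):
    """Output failure of a composite component: any failure recorded at the
    enclosing level, or a failure propagated through the enclosed tasks."""
    return ("or", *common_causes, inner_tree)


def occurs(tree, active):
    """Return True if the set of active basic events triggers the tree."""
    if isinstance(tree, str):
        return tree in active
    op, *children = tree
    results = [occurs(child, active) for child in children]
    return any(results) if op == "or" else all(results)


# Assumed task-level logic: B3 omits its output if it is defective, or if
# both of its (redundant) sources B1 and B2 fail to deliver.
TASK_LEVEL = (
    "or", "B3.defect",
    ("and", ("or", "B1.defect", "O-in1"), ("or", "B2.defect", "O-in2")),
)
# Assumed failures examined once, at the level of the enclosing component B.
HARDWARE_LEVEL = ["B.processor_failure", "B.emi"]

B_OUTPUT_OMISSION = composite_failure(HARDWARE_LEVEL, TASK_LEVEL)

processor_alone = occurs(B_OUTPUT_OMISSION, {"B.processor_failure"})
one_task_alone = occurs(B_OUTPUT_OMISSION, {"B1.defect"})
```

In this arrangement a hardware failure defeats the redundancy between B1 and B2 without having to be repeated in the analysis of every task.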
As the refinement of the model proceeds in the course of the life cycle, the new refined analyses can always be used for the automatic synthesis of new and more detailed fault trees for the system. The analysis of those fault trees can once more help to establish whether the current design satisfies the given safety requirements or to identify weak areas that need to be re-designed. It is worth pointing out that, currently, design iterations that naturally occur in the course of the life cycle create enormous difficulties in safety analysis and introduce additional costs. In contrast, such design iterations would not pose problems to the proposed method, as new fault trees could be automatically reconstructed following, of course, certain changes in the model and the underlying hazard analyses.
TOOL SUPPORT FOR INDUSTRIAL APPLICATION OF THE METHOD
We have shown that the proposed method potentially automates and simplifies the safety assessment of a programmable system. However, robust tool support is required to realise this potential for automation. In this section, therefore, we briefly discuss the principles and architecture of a tool that we have developed to support the method and enable its application in complex environments. The aim of this tool is to integrate the proposed method, and the fault tree synthesis algorithm, in an environment of already established industrial tools that consists of a popular functional modelling tool (Matlab/Simulink, from Mathworks) and a popular fault tree analysis tool (Fault Tree Plus, from Isograph).
The tool assists the continuous safety analysis of a programmable system. The basis of that analysis is always a model of the system developed in Matlab/Simulink. Simulink models provide a wealth of information about components, their hierarchical relationships and architectural dependencies. At the same time, though, those models lack any information about the local failure behaviour of those components. To remedy this problem, we have extended Simulink with an editor that enables analysts to annotate components with failure information represented in the form of the hazard analysis that we introduced in the preceding section. We have also built a parser that can analyse annotated Simulink model files and a fault tree synthesiser that operates on such models to generate fault trees for the corresponding systems. The resultant fault trees are written in the binary format of a Fault Tree Plus project file and can be imported in that tool for further analysis and reliability evaluation purposes.
Note that there are no restrictions imposed on the size of the model or on the type of components that could be used for the development of the model. The tool can handle multiplexing and de-multiplexing of flows, recognise control loops and handle indirectly relayed control signals and implicit communication links. Another important feature of the tool is its ability to generate fault trees from incomplete models, where some components do not have their own local hazard analysis. Such components are marked as propagators and when they are encountered during the traversal of the model, the fault tree synthesiser assumes that they only propagate input failures to outputs. Such features have already helped us to deal with realistic models and apply the tool on significantly complex case studies.
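The treatment of propagators can be pictured with the following sketch. It does not reproduce the tool's synthesiser: the model representation, the component names and the function `synthesise` are hypothetical, propagation is assumed to mean that a failure of any input reaches the output, and the model is assumed to be free of loops, whereas the tool itself recognises control loops.

```python
# Hypothetical model: component -> list of sources (components or system inputs).
MODEL = {
    "Mux": ["sensor1", "sensor2"],
    "Controller": ["Mux"],
}
# Hypothetical local analyses: component -> (gate over inputs, own malfunctions).
# "Mux" has no entry, so it is treated as a propagator.
ANALYSES = {
    "Controller": ("or", ["Controller.malfunction"]),
}


def synthesise(node, model, analyses):
    """Build the fault tree for a failure at the output of `node`."""
    if node not in model:
        return node  # system input: its failure is a basic event
    inputs = [synthesise(source, model, analyses) for source in model[node]]
    # Default for unannotated components: no malfunctions, inputs are OR-ed.
    gate, malfunctions = analyses.get(node, ("or", []))
    input_term = inputs[0] if len(inputs) == 1 else (gate, *inputs)
    if not malfunctions:
        return input_term
    return ("or", *malfunctions, input_term)


tree = synthesise("Controller", MODEL, ANALYSES)
print(tree)
```

A default of this kind lets a fault tree be produced before every component has been analysed, at the cost of ignoring any malfunctions that the unannotated components might themselves generate.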
Indeed, the concepts that we have outlined here have been (or are currently being) applied on a number of prototypes which include an aircraft fuel system, a brake-by-wire system for cars and an aircraft cabin pressure control system. Figure 4 illustrates, for example, a distant view of a fault tree that the tool has generated for an early version of a prototypical brake-by-wire system for cars currently being developed in SETTA by a consortium of automotive companies.
CONCLUSIONS AND FURTHER WORK
In this paper, we have shown that it is both possible and practical to generate safety analyses from design models by augmenting the design descriptions with information on failure propagation. This has the beneficial effect of keeping the safety analyses consistent with the design. There is not yet sufficient project experience to show how cost-effective this approach is, but early experience suggests that it is relatively easy to produce the definition of "local" failure propagation, hence the technique is likely to give economic benefit. However, the approach we have developed does not provide a "complete" solution. There are many issues which also need to be addressed, but we focus on two which are particularly pertinent.
First, we have not discussed how we obtain evidence that the failure propagation of the software modules is as predicted, or no worse than predicted. However, we note that our approach gives the opportunity for some targeted, hazard-directed analysis of the design and implementation, either using formal techniques or testing. For example, it may be possible to use model checking on state machines representing Matlab-Simulink blocks to verify the failure propagation. Alternatively, we could employ hazard-directed testing, e.g. using heuristics to search for data which simulate failure conditions, and thus to verify the failure propagation (7).
Second, there is an issue of how we allocate requirements for failure propagation in order to make safety more a controlled facet of the design process rather than allowing it to become an emergent property. The civil aerospace standards (8) recommend the use of fault trees to do preliminary system safety assessment (PSSA) whereby they allocate failure rate budgets to failure modes of components. These techniques need some extension to be fully effective but could naturally complement the technique described above.
Integrating work in these three areas would give the ability to set safety requirements using a variant of PSSA, assess the ability of the design to meet those requirements, using the techniques described in this paper, then provide evidence that the requirements are met in implementation. This would be a major contribution to the effectiveness of the software safety process.
