Establishing a framework for dynamic risk management in 'intelligent' aero-engine control
The behaviour of control functions in safety critical software systems is typically bounded to prevent the occurrence of known system level hazards. These bounds are generally derived through safety analyses and can be implemented through the use of necessary design features. However, the unpredictability of real world problems can result in changes in the operating context that may invalidate the behavioural bounds themselves, for example, unexpected hazardous operating contexts as a result of failures or degradation. For highly complex problems it may be infeasible to determine the precise desired behavioural bounds of a function that addresses or minimises risk for hazardous operation cases prior to deployment. This paper presents an overview of the safety challenges associated with such a problem and how such problems might be addressed. A self-management framework is proposed that performs on-line risk management. The features of the framework are shown in the context of employing intelligent adaptive controllers operating within complex and highly dynamic problem domains such as Gas-Turbine Aero Engine control. Safety assurance arguments, enabled by the framework and necessary for certification, are also outlined.
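The core idea of bounding a control function, then tightening those bounds when the operating context degrades, can be sketched in a few lines. This is an illustrative toy only, assuming a hypothetical `SafetyMonitor` class and invented demand limits; it is not the framework proposed in the paper.

```python
# Minimal sketch of an on-line risk monitor: a controller demand is clamped
# to behavioural bounds, and the bounds tighten when the operating context
# becomes degraded. All names and numbers are invented for illustration.

class SafetyMonitor:
    """Clamps a controller demand to bounds that may tighten at run time."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def update_context(self, degraded):
        # In a degraded operating context, shrink the permitted envelope
        # around its midpoint to reduce risk.
        if degraded:
            span = (self.hi - self.lo) * 0.25
            mid = (self.hi + self.lo) / 2
            self.lo, self.hi = mid - span, mid + span

    def bound(self, demand):
        return max(self.lo, min(self.hi, demand))

monitor = SafetyMonitor(lo=0.0, hi=100.0)
print(monitor.bound(120.0))   # clamped to 100.0
monitor.update_context(degraded=True)
print(monitor.bound(120.0))   # clamped to the tighter upper bound
```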
Computer safety, reliability, and security: 27th International Conference, SAFECOMP 2008 Newcastle upon Tyne, UK, September 22-25, 2008 Proceedings
This book constitutes the refereed proceedings of the 27th International Conference on Computer Safety, Reliability, and Security, SAFECOMP 2008, held in Newcastle upon Tyne, UK, in September 2008. The 32 revised full papers, presented together with 3 keynote papers and a panel session, were carefully reviewed and selected from 115 submissions. The papers are organized in topical sections on software dependability, resilience, fault tolerance, security, safety cases, formal methods, dependability modelling, as well as security and dependability.
Quantitative dependability and interdependency models for large-scale cyber-physical systems
Cyber-physical systems link cyber infrastructure with physical processes through an integrated network of physical components, sensors, actuators, and computers that are interconnected by communication links. Modern critical infrastructures such as smart grids, intelligent water distribution networks, and intelligent transportation systems are prominent examples of cyber-physical systems. Developed countries are entirely reliant on these critical infrastructures, hence the need for rigorous assessment of the trustworthiness of these systems. The objective of this research is quantitative modeling of dependability attributes -- including reliability and survivability -- of cyber-physical systems, with domain-specific case studies on smart grids and intelligent water distribution networks. To this end, we make the following research contributions: i) quantifying, in terms of loss of reliability and survivability, the effect of introducing computing and communication technologies; and ii) identifying and quantifying interdependencies in cyber-physical systems and investigating their effect on fault propagation paths and degradation of dependability attributes.
Our proposed approach relies on observation of system behavior in response to disruptive events. We utilize a Markovian technique to formalize a unified reliability model. For survivability evaluation, we capture temporal changes to a service index chosen to represent the extent of functionality retained. In modeling of interdependency, we apply correlation and causation analyses to identify links and use graph-theoretical metrics for quantifying them. The metrics and models we propose can be instrumental in guiding investments in fortification of and failure mitigation for critical infrastructures. To verify the success of our proposed approach in meeting these goals, we introduce a failure prediction tool capable of identifying system components that are prone to failure as a result of a specific disruptive event. Our prediction tool can enable timely preventative actions and mitigate the consequences of accidental failures and malicious attacks --Abstract, page iii
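The "loss of reliability" from introducing computing and communication layers can be illustrated with the simplest possible series-system view. This toy sketch is not the authors' Markov model; the failure rates are invented, and layers are assumed to fail independently at constant rates.

```python
import math

# Toy sketch: treat the cyber layer and the physical layer as a series
# system whose layers fail independently at constant rates (the simplest
# Markov-style reliability view). Rates below are invented for illustration.

def reliability(t, lam_cyber, lam_phys):
    """P(no failure by time t) for the series cyber-physical system."""
    return math.exp(-(lam_cyber + lam_phys) * t)

# Adding the cyber layer lowers reliability relative to the physical
# process alone -- the 'loss of reliability' the abstract quantifies.
r_phys_only = reliability(1000, 0.0, 0.001)
r_cps = reliability(1000, 0.0005, 0.001)
print(r_phys_only, r_cps)
```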
Software development in the post-PC era: towards software development as a service
PhD Thesis. Engineering software systems is a complex task which involves various stakeholders
and requires planning and management to succeed. As the role of software in our daily
life is increasing, the complexity of software systems is increasing. Throughout the
short history of software engineering as a discipline, the development practises and
methods have rapidly evolved to seize opportunities enabled by new technologies
(e.g., the Internet) and to overcome economic challenges (e.g., the need for cheaper
and faster development).
Today, we are witnessing the Post-PC era. An era which is characterised by mobility and
services. An era which removes organisational and geographical boundaries. An era
which changes the functionality of software systems and requires alternative methods
for conceiving them.
In this thesis, we envision executing software development processes in the cloud.
Software processes have a software production aspect and a management aspect. To
the best of our knowledge, there are no academic or industrial solutions supporting the
entire software development process life-cycle (from both production and management
aspects) and its tool-chain execution in the cloud.
Our vision is to use the cloud's economies of scale and leverage Model-Driven Engineering
(MDE) to integrate production and management aspects into the development
process. Since software processes are seen as workflows, we investigate using existing
Workflow Management Systems to execute software processes and we find that these
systems are not suitable. Therefore, we propose a reference architecture for Software
Development as a Service (SDaaS). The SDaaS reference architecture is the first proposal
which fully supports development of complex software systems in the cloud.
In addition to the reference architecture, we investigate three specific related challenges
and propose novel solutions addressing them. These challenges are:
Modelling & enacting cloud-based executable software processes. Executing
software processes in the cloud can bring several benefits to software development.
In this thesis, we discuss the benefits and considerations of cloud-based
software processes and introduce a modelling language for modelling such processes.
We refer to this language as EXE-SPEM. It extends the Software and Systems
Process Engineering (SPEM2.0) OMG standard to support creating cloud-based
executable software process models. Since EXE-SPEM is a visual modelling
language, we introduce an XML notation to represent EXE-SPEM models
in a machine-readable format and provide mapping rules from EXE-SPEM to
this notation. We demonstrate this approach by modelling an example software
process using EXE-SPEM and mapping it to the XML notation. Software process
models expressed in this XML format can then be enacted in the proposed SDaaS
architecture.
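The mapping from a visual process model to a machine-readable notation can be illustrated with a short serialisation sketch. The element and attribute names below are invented for illustration; the thesis defines its own XML notation for EXE-SPEM.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of serialising a small process model into a
# machine-readable XML notation, in the spirit of the EXE-SPEM-to-XML
# mapping described above. All element/attribute names are invented.

def process_to_xml(name, activities):
    """activities: list of (activity name, deployment target) pairs."""
    root = ET.Element("process", name=name)
    for act, target in activities:
        ET.SubElement(root, "activity", name=act, deployment=target)
    return ET.tostring(root, encoding="unicode")

xml = process_to_xml("PSSA", [("HazardAnalysis", "private-cloud"),
                              ("FaultTreeAnalysis", "public-cloud")])
print(xml)
```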
Cost-efficient scheduling of software process execution in the cloud. Software
process models are enacted in the SDaaS architecture as workflows. We
sometimes refer to them as Software Workflows. Once we have executable software
process models, we need to schedule them for execution. In a setting where
multiple software workflows (and their activities) compete for shared computational
resources (workflow engines), scheduling workflow execution becomes
important. Workflow scheduling is an NP-hard problem which refers to the allocation
of sufficient resources (human or computational) to workflow activities.
The schedule impacts the workflow makespan (execution time) and cost as well as
the computational resources utilisation. The target of the scheduling is to reduce
the process execution cost in the cloud without significantly affecting the process
makespan while satisfying the special requirements of each process activity (e.g.,
executing on a private cloud). We adapt three workflow scheduling algorithms
to fit SDaaS and propose a fourth one: the Proportional Adaptive Task Schedule.
The algorithms are then evaluated through simulation. The simulation results
show that our proposed algorithm saves between 19.74% and 45.78% of the
execution cost, provides the best resource (VM) utilisation, and achieves the second
best makespan compared to the other presented algorithms.
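The cost-versus-makespan trade-off driving this scheduling problem can be sketched with a simple greedy heuristic. This is not the thesis's Proportional Adaptive Task Schedule (or any of the adapted algorithms); VM types, prices, and speeds below are invented for illustration.

```python
# Illustrative cost-aware scheduling sketch: place each workflow activity
# on the cheapest VM type that still meets its deadline, echoing the goal
# of cutting execution cost without greatly affecting makespan.
# All figures are invented.

VM_TYPES = [  # (name, cost per hour, relative speed)
    ("small", 0.05, 1.0),
    ("medium", 0.10, 2.0),
    ("large", 0.20, 4.0),
]

def schedule(activities):
    """activities: list of (name, work_hours_on_small, deadline_hours)."""
    plan = []
    for name, work, deadline in activities:
        # cheapest VM whose runtime (work / speed) fits the deadline
        for vm, cost, speed in sorted(VM_TYPES, key=lambda v: v[1]):
            runtime = work / speed
            if runtime <= deadline:
                plan.append((name, vm, runtime * cost))
                break
        else:
            raise ValueError(f"no VM meets deadline for {name}")
    return plan

plan = schedule([("compile", 4.0, 4.0), ("test", 4.0, 1.5)])
print(plan)
```

A tight deadline forces the "test" activity onto a faster, pricier VM, while the relaxed "compile" activity stays on the cheapest one.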
Evaluating the SDaaS architecture using a case study from the safety-critical
systems domain. To evaluate the proposed SDaaS reference architecture, we
instantiate a proof-of-concept implementation of the architecture. This
implementation is then used to enact safety-critical processes as a case study.
Engineering safety-critical systems is a complex task which involves multiple
stakeholders. It requires shared and scalable computation to systematically involve
geographically distributed teams. In this case study, we use EXE-SPEM to
model a portion of a process (namely, the Preliminary System Safety Assessment
- PSSA) adapted from the ARP4761 [2] aerospace standard. Then, we enact this
process model in the proof-of-concept SDaaS implementation.
By using the SDaaS architecture, we demonstrate the feasibility of our approach
and its applicability to different domains and to customised processes. We also
demonstrate the capability of EXE-SPEM to model cloud-based executable processes.
Furthermore, we demonstrate the added value of the process models and
the process execution provenance data recorded by the SDaaS architecture. This
data is used to automate the generation of safety case argument fragments, thus
reducing development cost and time. Finally, the case study shows that we
can integrate some existing tools and create new ones as activities used in process
models.
The proposed SDaaS reference architecture (combined with its modelling, scheduling
and enactment capabilities) brings the benefits of the cloud to software development. It
can potentially save software production cost and provide an accessible platform that
supports collaborating teams (potentially across different locations). The executable
process models support unified interpretation and execution of processes across team
members. In addition, the use of models provides managers with global awareness and
can be utilised for quality assurance and process metrics analysis and improvement.
We see the contributions provided in this thesis as a first step towards an alternative
development method that uses the benefits of cloud and Model-Driven Engineering to
overcome existing challenges and open new opportunities. However, there are several
challenges that are outside the scope of this study which need to be addressed to allow
full support of the SDaaS vision (e.g., supporting interactive workflows). The solutions
provided in this thesis address only part of a bigger vision. There is also a need for
empirical and usability studies to assess the impact of the SDaaS architecture on both
the produced products (in terms of quality, cost, time, etc.) and the participating
stakeholders.
The Effectiveness of t-Way Test Data Generation
Modern society is increasingly dependent on the correct functioning of software and increasingly so in areas that are considered safety related or safety critical. Therefore, there is an increasing need to be able to verify and validate that the software is in fact correct and will perform its intended function. Many approaches to this problem have been proposed; however, none seems likely to supplant the role of testing in the near future.
If we accept that there is, and will be, a continuing need to be able to test software then the question becomes one of how can this be done effectively, both in terms of ability to detect errors and in terms of cost. One avenue of research that offers prospects of improving both of these aspects is the automatic generation of test data.
There has recently been a large amount of work conducted in this area. One particularly promising direction has been the application of ideas from the field of experimental design and in particular, the field of t-way adequate factorial designs.
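As a concrete illustration of the t-way idea (here with t = 2), a small greedy generator can be sketched. This brute-force approach is for illustration only; practical covering-array tools use far stronger constructions.

```python
from itertools import product

# Minimal greedy sketch of 2-way ("pairwise") test generation: repeatedly
# pick the candidate test that covers the most not-yet-covered pairs of
# parameter values. Brute force, for illustration only.

def pairwise(factors):
    """factors: one list of possible values per input parameter."""
    uncovered = {(i, a, j, b)
                 for i in range(len(factors))
                 for j in range(i + 1, len(factors))
                 for a in factors[i] for b in factors[j]}
    tests = []
    while uncovered:
        # choose the candidate test that covers the most uncovered pairs
        best = max(product(*factors),
                   key=lambda t: sum(t[i] == a and t[j] == b
                                     for i, a, j, b in uncovered))
        tests.append(best)
        uncovered = {(i, a, j, b) for i, a, j, b in uncovered
                     if not (best[i] == a and best[j] == b)}
    return tests

suite = pairwise([[0, 1], [0, 1], [0, 1]])
print(len(suite))  # noticeably fewer than the 8 exhaustive combinations
```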
The area, however, is not without issues; there is evidence that the technique is capable of detecting errors but that evidence is not unequivocal. Moreover, as with almost all work in the area of automatic test generation, there has been very little work comparing the technique with other test data generation techniques. Worse, there has been effectively no work done that compares any automatic test data generation technique with the effectiveness of tests generated by humans. Another major issue with the technique is the number of tests that applying the technique can result in. This implies that there is a need for an automated oracle if the technique is to be successfully applied. The flaw with this is of course that in most situations the oracle is the human that is conducting the tests, a point often ignored in testing research.
The work presented here addresses both of these points. To do this I have used a code base taken from an industrial engine control system that has an existing set of high-quality unit tests developed by hand. To complement this, several other techniques for automatically generating test data have been applied, namely random testing, random experimental designs and a technique for generating single factor experiments. To address the issue of being able to compare the error detection ability of all of the sets of test vectors, rather than the usual effectiveness surrogates of code coverage, I have used mutation analysis on the code base to directly measure the ability of each set of test vectors to discover common coding errors. The results presented here show that test data generation techniques based on t-way factorial designs are at least as effective as hand-generated tests and superior to random testing and the factor experimental technique.
The oracle problem associated with the factorial design techniques was addressed using a test set minimisation approach. The mutation tool monitored which vectors could “kill” which code mutants. After a subset of the test vectors had been run, the most effective vectors were retained and the rest discarded. Likewise, mutants that were killed were removed from further consideration and the process repeated. Experimental results show that this minimisation procedure is effective at reducing computational overhead and is capable of producing final sets of test vectors that are comparable in size with the sets of hand-generated tests and so amenable to final hand checking.
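The retain-the-most-effective-vectors loop described above is essentially greedy set cover, and can be sketched as follows. The kill matrix here is invented for illustration rather than produced by a real mutation tool.

```python
# Sketch of the greedy minimisation described above: repeatedly keep the
# test vector that kills the most still-live mutants, and drop the rest.
# The kill matrix (test -> set of mutants it kills) is invented.

def minimise(kill_matrix):
    """Greedy set cover over mutants; returns the retained test vectors."""
    live = set().union(*kill_matrix.values())
    kept = []
    while live:
        # most effective remaining vector against the live mutants
        test = max(kill_matrix, key=lambda t: len(kill_matrix[t] & live))
        if not kill_matrix[test] & live:
            break  # remaining mutants are killed by no vector
        kept.append(test)
        live -= kill_matrix[test]
    return kept

kills = {"t1": {"m1", "m2"}, "t2": {"m2"}, "t3": {"m3"}}
kept = minimise(kills)
print(kept)
```

Here "t2" is discarded because the mutant it kills is already covered by the more effective "t1".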
Flexible management of bandwidth and redundancy in fieldbuses
PhD in Electrical Engineering. Distributed embedded systems (DES) have been widely used
in the last few decades in several application domains, from robotics and industrial
process control to avionics and automotive systems. In fact, this trend is expected to
continue in the coming years.
In some of these application fields the dependability requirements are very important,
since failure to provide services in a timely and predictable manner
may cause significant economic losses or even put humans at risk.
In the design phase it is impossible to predict all the possible scenarios of faults, due to the non-deterministic behaviour of the surrounding environment.
Fault tolerance mechanisms must therefore be included in the distributed embedded system to prevent failures.
Also, many application domains require high available bandwidth to perform the desired functions, or to make scaling possible through the addition of new features.
The flexibility of a system also plays an important role, since it improves the capability to adapt to the surrounding world and simplifies repair
and maintenance. Flexibility also improves the efficiency of the whole system by providing a way to efficiently manage the available resources. This is particularly important in embedded systems, where resources are often limited.
A natural way to improve the bandwidth and the fault tolerance in distributed systems is to use replicated buses. Commercial and academic solutions propose the use of replicated fieldbuses for a single purpose only, either to improve the fault tolerance or to improve the available bandwidth, the former being the most common. One illustrative exception is FlexRay, where the bus replica
can be used to improve the bandwidth of the overall system, besides enabling redundant communications. However, only one bus replica can be used.
In this thesis, a flexible bus replication scheme to improve both the dependability and the throughput of fieldbuses is presented and studied. It can
be applied to any number of replicated buses, provided the required hardware support is available. The flexible use of the replicated buses also enables flexible management of the network topology.
This claim has been validated using the Controller Area Network (CAN) fieldbus, which was chosen because it is widely deployed in millions of
systems. In fact, the proposed solution uses a paradigm that combines flexibility with time- and event-triggered communication: Flexible Time-Triggered
over CAN (FTT-CAN). However, a generalization to native CAN is also presented and studied.
The inclusion of bus replication in FTT-CAN imposes not only new mechanisms but also changes to the mechanisms associated with master replication, which had already been studied in previous research work. In this work, these
mechanisms were combined to take advantage of the centralized architecture and of the redundant masters to support efficient and flexible bus
replication.
When considering the system operation, if a fault occurs in a bus (or buses) whose consequent error could lead to a system failure, the system reacts by
switching to a degraded mode, where the message flows that were transmitted on the faulty bus (or buses) are moved to the non-faulty ones.
The central node replication uses a leader-follower strategy, where the leader controls the system while the followers serve as backups. If an error occurs in
the leader, a backup takes control of the system transparently, maintaining the same functionality.
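The leader-follower takeover can be sketched at its simplest as an ordered list of masters. This toy is purely illustrative, with invented node names; real FTT-CAN master replication involves protocol details far beyond this.

```python
# Illustrative sketch of leader-follower failover: nodes[0] is the leader,
# and on its failure the first follower transparently takes over.
# Node names are invented for illustration.

class Cluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)   # nodes[0] is the current leader

    @property
    def leader(self):
        return self.nodes[0]

    def fail(self, node):
        # Remove the failed node; the next node in line becomes leader,
        # preserving the same functionality.
        self.nodes.remove(node)
        if not self.nodes:
            raise RuntimeError("no backup master left")

cluster = Cluster(["master", "backup1", "backup2"])
cluster.fail("master")
print(cluster.leader)  # backup1 takes over
```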
The system has been generalized for native CAN, using two additional components that provide the same fault tolerance capabilities at the bus level,
and also enable the dynamic management of the network topology.
All of these proposals were implemented and assessed in the scope of this work. The single-bus version of FTT-CAN was assessed using a real
application, a robotic soccer team, which has obtained excellent results in international competitions. There, the FTT-CAN-based embedded system has been applied in the low-level control, where it is mainly responsible for
motion control and odometry.
For the case of the multiple-bus system, the assessment was performed in a laboratory test bed. For this, a fault injector was developed in order to impose faults in the buses and in the central nodes. To measure the reaction time of the system, a special hardware unit was developed: a delay measurement system. It is able to measure delays between two important time marks for posterior offline analysis of the obtained values.
Problem Oriented Engineering for Software Safety
Safety critical systems must satisfy stringent safety standards, and their development requires the use of specialist safe software system development (SSSD) approaches as the complexity and penetration of these systems increase. These SSSD approaches satisfy certain useful properties that make them suitable for safety system development. The first objective of this thesis is to select a candidate SSSD approach and evaluate its capabilities against a set of useful properties identified from reviewing a group of existing SSSD approaches, and thus show that this candidate SSSD approach is appropriate for use in safety system development.
In addition, a second objective is to use this candidate SSSD approach to improve the early life cycle phase of an existing industrial safety development process used to develop embedded avionics applications. In particular, the aim is to allow issues to be resolved earlier in the development that are currently not uncovered until much later, when they are much more difficult and expensive to correct. This involved the identification of further properties and issues that the candidate SSSD approach must address.
The overall aim is to demonstrate that this candidate SSSD approach can be used in the early phase of a safety system development to derive a validated specification that can be subjected to safety analysis to show that it satisfies the identified system safety properties and thus forms a viable basis for the rest of the development.
On the real world practice of Behaviour Driven Development
Surveys of industry practice over the last decade suggest that Behaviour Driven Development is a popular Agile practice. For example, 19% of respondents to the 14th State of Agile annual survey reported using BDD, placing it in the top 13 practices reported. As well as potential benefits, the adoption of BDD necessarily involves an additional cost of writing and maintaining Gherkin features and scenarios, and (if used for acceptance testing) the associated step functions. Yet there is a lack of published literature exploring how BDD is used in practice and the challenges experienced by real world software development efforts. This gap is significant because without understanding current real world practice, it is hard to identify opportunities to address and mitigate challenges. In order to address this research gap concerning the challenges of using BDD, this thesis reports on a research project which explored: (a) the challenges of applying agile and undertaking requirements engineering in a real world context; (b) the challenges of applying BDD specifically; and (c) the application of BDD in open-source projects to understand challenges in this different context.
For this purpose, we progressively conducted two case studies, two series of interviews, four iterations of action research, and an empirical study. The first case study was conducted in an avionics company to discover the challenges of using an agile process in a large scale safety critical project environment. Since requirements management was found to be one of the biggest challenges during the case study, we decided to investigate BDD because of its reputation for requirements management. The second case study was conducted in the company with an aim to discover the challenges of using BDD in real life. The case study was complemented with an empirical study of the practice of BDD in open source projects, taking a study sample from the GitHub open source collaboration site.
As a result of this PhD research, we were able to discover: (i) challenges of using an agile process in a large scale safety-critical organisation, (ii) the current state of BDD in practice, (iii) technical limitations of Gherkin (i.e., the language for writing requirements in BDD), (iv) challenges of using BDD in a real project, and (v) bad smells in the Gherkin specifications of open source projects on GitHub. We also presented a brief comparison between the theoretical description of BDD and BDD in practice. This research, therefore, presents the results of lessons learned from BDD in practice, and serves as a guide for software practitioners planning on using BDD in their projects.