14 research outputs found

    Establishing a framework for dynamic risk management in 'intelligent' aero-engine control

    Get PDF
    The behaviour of control functions in safety critical software systems is typically bounded to prevent the occurrence of known system level hazards. These bounds are typically derived through safety analyses and can be implemented through the use of necessary design features. However, the unpredictability of real world problems can result in changes in the operating context that may invalidate the behavioural bounds themselves, for example, unexpected hazardous operating contexts as a result of failures or degradation. For highly complex problems it may be infeasible to determine the precise desired behavioural bounds of a function that addresses or minimises risk for hazardous operation cases prior to deployment. This paper presents an overview of the safety challenges associated with such a problem and how such problems might be addressed. A self-management framework is proposed that performs on-line risk management. The features of the framework are shown in context of employing intelligent adaptive controllers operating within complex and highly dynamic problem domains such as Gas-Turbine Aero Engine control. Safety assurance arguments enabled by the framework necessary for certification are also outlined

    Computer safety, reliability, and security: 27th International Conference, SAFECOMP 2008 Newcastle upon Tyne, UK, September 22-25, 2008 Proceedings

    No full text
    This book constitutes the refereed proceedings of the 27th International Conference on Computer Safety, Reliability, and Security, SAFECOMP 2008, held in Newcastle upon Tyne, UK, in September 2008. The 32 revised full papers presented together with 3 keynote papers and a panel session were carefully reviewed and selected from 115 submissions. The papers are organized in topical sections on software dependability, resilience, fault tolerance, security, safety cases, formal methods, dependability modelling, as well as security and dependability

    Quantitative dependability and interdependency models for large-scale cyber-physical systems

    Get PDF
    Cyber-physical systems link cyber infrastructure with physical processes through an integrated network of physical components, sensors, actuators, and computers that are interconnected by communication links. Modern critical infrastructures such as smart grids, intelligent water distribution networks, and intelligent transportation systems are prominent examples of cyber-physical systems. Developed countries are entirely reliant on these critical infrastructures, hence the need for rigorous assessment of the trustworthiness of these systems. The objective of this research is quantitative modeling of dependability attributes -- including reliability and survivability -- of cyber-physical systems, with domain-specific case studies on smart grids and intelligent water distribution networks. To this end, we make the following research contributions: i) quantifying, in terms of loss of reliability and survivability, the effect of introducing computing and communication technologies; and ii) identifying and quantifying interdependencies in cyber-physical systems and investigating their effect on fault propagation paths and degradation of dependability attributes. Our proposed approach relies on observation of system behavior in response to disruptive events. We utilize a Markovian technique to formalize a unified reliability model. For survivability evaluation, we capture temporal changes to a service index chosen to represent the extent of functionality retained. In modeling of interdependency, we apply correlation and causation analyses to identify links and use graph-theoretical metrics for quantifying them. The metrics and models we propose can be instrumental in guiding investments in fortification of and failure mitigation for critical infrastructures. To verify the success of our proposed approach in meeting these goals, we introduce a failure prediction tool capable of identifying system components that are prone to failure as a result of a specific disruptive event. Our prediction tool can enable timely preventative actions and mitigate the consequences of accidental failures and malicious attacks --Abstract, page iii

    Software development in the post-PC era : towards software development as a service

    Get PDF
    PhD ThesisEngineering software systems is a complex task which involves various stakeholders and requires planning and management to succeed. As the role of software in our daily life is increasing, the complexity of software systems is increasing. Throughout the short history of software engineering as a discipline, the development practises and methods have rapidly evolved to seize opportunities enabled by new technologies (e.g., the Internet) and to overcome economical challenges (e.g., the need for cheaper and faster development). Today, we are witnessing the Post-PC era. An era which is characterised by mobility and services. An era which removes organisational and geographical boundaries. An era which changes the functionality of software systems and requires alternative methods for conceiving them. In this thesis, we envision to execute software development processes in the cloud. Software processes have a software production aspect and a management aspect. To the best of our knowledge, there are no academic nor industrial solutions supporting the entire software development process life-cycle(from both production and management aspects and its tool-chain execution in the cloud. Our vision is to use the cloud economies of scale and leverage Model-Driven Engineering (MDE) to integrate production and management aspects into the development process. Since software processes are seen as workflows, we investigate using existing Workflow Management Systems to execute software processes and we find that these systems are not suitable. Therefore, we propose a reference architecture for Software Development as a Service (SDaaS). The SDaaS reference architecture is the first proposal which fully supports development of complex software systems in the cloud. In addition to the reference architecture, we investigate three specific related challenges and propose novel solutions addressing them. These challenges are: Modelling & enacting cloud-based executable software processes. Executing software processes in the cloud can bring several benefits to software develop ment. In this thesis, we discuss the benefits and considerations of cloud-based software processes and introduce a modelling language for modelling such processes. We refer to this language as EXE-SPEM. It extends the Software and Systems Process Engineering (SPEM2.0) OMG standard to support creating cloudbased executable software process models. Since EXE-SPEM is a visual modelling language, we introduce an XML notation to represent EXE-SPEM models in a machine-readable format and provide mapping rules from EXE-SPEM to this notation. We demonstrate this approach by modelling an example software process using EXE-SPEM and mapping it to the XML notation. Software process models expressed in this XML format can then be enacted in the proposed SDaaS architecture. Cost-e cient scheduling of software processes execution in the cloud. Software process models are enacted in the SDaaS architecture as workflows. We refer to them sometimes as Software Workflows. Once we have executable software process models, we need to schedule them for execution. In a setting where multiple software workflows (and their activities) compete for shared computational resources (workflow engines), scheduling workflow execution becomes important. Workflow scheduling is an NP-hard problem which refers to the allocation of su cient resources (human or computational) to workflow activities. The schedule impacts the workflow makespan (execution time) and cost as well as the computational resources utilisation. The target of the scheduling is to reduce the process execution cost in the cloud without significantly a ecting the process makespan while satisfying the special requirements of each process activity (e.g., executing on a private cloud). We adapt three workflow scheduling algorithms to fit for SDaaS and propose a fourth one; the Proportional Adaptive Task Schedule. The algorithms are then evaluated through simulation. The simulation results show that the our proposed algorithm saves between 19.74% and 45.78% of the execution cost, provides best resource (VM) utilisation and provides the second best makespan compared to the other presented algorithms. Evaluating the SDaaS architecture using a case study from the safety-critical systems domain. To evaluate the proposed SDaaS reference architecture, we instantiate a proof-of-concept implementation of the architecture. This imple mentation is then used to enact safety-critical processes as a case study. Engineering safety-critical systems is a complex task which involves multiple stakeholders. It requires shared and scalable computation to systematically involve geographically distributed teams. In this case study, we use EXE-SPEM to model a portion of a process (namely; the Preliminary System Safety Assessment - PSSA) adapted from the ARP4761 [2] aerospace standard. Then, we enact this process model in the proof-of-concept SDaaS implementation. By using the SDaaS architecture, we demonstrate the feasibility of our approach and its applicability to di erent domains and to customised processes. We also demonstrate the capability of EXE-SPEM to model cloud-based executable processes. Furthermore, we demonstrate the added value of the process models and the process execution provenance data recorded by the SDaaS architecture. This data is used to automate the generation of safety cases argument fragments. Thus, reducing the development cost and time. Finally, the case study shows that we can integrate some existing tools and create new ones as activities used in process models. The proposed SDaaS reference architecture (combined with its modelling, scheduling and enactment capabilities) brings the benefits of the cloud to software development. It can potentially save software production cost and provide an accessible platform that supports collaborating teams (potentially across di erent locations). The executable process models support unified interpretation and execution of processes across team(s) members. In addition, the use of models provide managers with global awareness and can be utilised for quality assurance and process metrics analysis and improvement. We see the contributions provided in this thesis as a first step towards an alternative development method that uses the benefits of cloud and Model-Driven Engineering to overcome existing challenges and open new opportunities. However, there are several challenges that are outside the scope of this study which need to be addressed to allow full support of the SDaaS vision (e.g., supporting interactive workflows). The solutions provided in this thesis address only part of a bigger vision. There is also a need for empirical and usability studies to study the impact of the SDaaS architecture on both the produced products (in terms of quality, cost, time, etc.) and the participating stakeholders

    Flexible management of bandwidth and redundancy in fieldbuses

    Get PDF
    Doutoramento em Engenharia ElectrotécnicaOs sistemas distribuídos embarcados (Distributed Embedded Systems – DES) têm sido usados ao longo dos últimos anos em muitos domínios de aplicação, da robótica, ao controlo de processos industriais passando pela aviónica e pelas aplicações veiculares, esperando-se que esta tendência continue nos próximos anos. A confiança no funcionamento é uma propriedade importante nestes domínios de aplicação, visto que os serviços têm de ser executados em tempo útil e de forma previsível, caso contrário, podem ocorrer danos económicos ou a vida de seres humanos poderá ser posta em causa. Na fase de projecto destes sistemas é impossível prever todos os cenários de falhas devido ao não determinismo do ambiente envolvente, sendo necessária a inclusão de mecanismos de tolerância a falhas. Adicionalmente, algumas destas aplicações requerem muita largura de banda, que também poderá ser usada para a evolução dos sistemas, adicionandolhes novas funcionalidades. A flexibilidade de um sistema é uma propriedade importante, pois permite a sua adaptação às condições e requisitos envolventes, contribuindo também para a simplicidade de manutenção e reparação. Adicionalmente, nos sistemas embarcados, a flexibilidade também é importante por potenciar uma melhor utilização dos, muitas vezes escassos, recursos existentes. Uma forma evidente de aumentar a largura de banda e a tolerância a falhas dos sistemas embarcados distribuídos é a replicação dos barramentos do sistema. Algumas soluções existentes, quer comerciais quer académicas, propõem a replicação dos barramentos para aumento da largura de banda ou para aumento da tolerância a falhas. No entanto e quase invariavelmente, o propósito é apenas um, sendo raras as soluções que disponibilizam uma maior largura de banda e um aumento da tolerância a falhas. Um destes raros exemplos é o FlexRay, com a limitação de apenas ser permitido o uso de dois barramentos. Esta tese apresentada e discute uma proposta para usar a replicação de barramentos de uma forma flexível com o objectivo duplo de aumentar a largura de banda e a tolerância a falhas. A flexibilidade dos protocolos propostos também permite a gestão dinâmica da topologia da rede, sendo o número de barramentos apenas limitado pelo hardware/software. As propostas desta tese foram validadas recorrendo ao barramento de campo CAN – Controller Area Network, escolhido devido à sua grande implantação no mercado. Mais especificamente, as soluções propostas foram implementadas e validadas usando um paradigma que combina flexibilidade com comunicações event-triggered e time-triggered: o FTT – Flexible Time- Triggered. No entanto, uma generalização para CAN nativo é também apresentada e discutida. A inclusão de mecanismos de replicação do barramento impõe a alteração dos antigos protocolos de replicação e substituição do nó mestre, bem como a definição de novos protocolos para esta finalidade. Este trabalho tira partido da arquitectura centralizada e da replicação do nó mestre para suportar de forma eficiente e flexível a replicação de barramentos. Em caso de ocorrência de uma falta num barramento (ou barramentos) que poderia provocar uma falha no sistema, os protocolos e componentes propostos nesta tese fazem com que o sistema reaja, mudando para um modo de funcionamento degradado. As mensagens que estavam a ser transmitidas nos barramentos onde ocorreu a falta são reencaminhadas para os outros barramentos. A replicação do nó mestre baseia-se numa estratégia líder-seguidores (leaderfollowers), onde o líder (leader) controla todo o sistema enquanto os seguidores (followers) servem como nós de reserva. Se um erro ocorrer no nó líder, um dos nós seguidores passará a controlar o sistema de uma forma transparente e mantendo as mesmas funcionalidades. As propostas desta tese foram também generalizadas para CAN nativo, tendo sido para tal propostos dois componentes adicionais. É, desta forma possível ter as mesmas capacidades de tolerância a falhas ao nível dos barramentos juntamente com a gestão dinâmica da topologia de rede. Todas as propostas desta tese foram implementadas e avaliadas. Uma implementação inicial, apenas com um barramento foi avaliada recorrendo a uma aplicação real, uma equipa de futebol robótico onde o protocolo FTT-CAN foi usado no controlo de movimento e da odometria. A avaliação do sistema com múltiplos barramentos foi feita numa plataforma de teste em laboratório. Para tal foi desenvolvido um sistema de injecção de faltas que permite impor faltas nos barramentos e nos nós mestre, e um sistema de medida de atrasos destinado a medir o tempo de resposta após a ocorrência de uma falta.Distributed embedded systems (DES) have been widely used in the last few decades in several application domains, from robotics, industrial process control, avionics and automotive. In fact, it is expectable that this trend will continue in the next years. In some of these application fields the dependability requirements are very important since the fail to provide services in a timely and predictable manner may cause important economic losses or even put humans in risk. In the design phase it is impossible to predict all the possible scenarios of faults, due to the non deterministic behaviour of the surrounding environment. In that way, the fault tolerance mechanisms must be included in the distributed embedded system to prevent failures occurrence. Also, many application domains require a high available bandwidth to perform the desired functions, or to turn possible the scaling with the addition of new features. The flexibility of a system also plays an important role, since it improves the capability to adapt to the surrounding world, and to the simplicity of the repair and maintenance. The flexibility improves the efficiency of all the system by providing a way to efficiently manage the available resources. This is very important in embedded systems due to the limited resources often available. A natural way to improve the bandwidth and the fault tolerance in distributed systems is to use replicated buses. Commercial and academic solutions propose the use of replicated fieldbuses for a single purpose only, either to improve the fault tolerance or to improve the available bandwidth, being the first the most common. One illustrative exception is FlexRay where the bus replica can be used to improve the bandwidth of the overall system, besides enabling redundant communications. However, only one bus replica can be used. In this thesis, a flexible bus replication scheme to improve both the dependability and the throughput of fieldbuses is presented and studied. It can be applied to any number of replicated buses, provided the required hardware support is available. The flexible use of the replicated buses can achieve an also flexible management of the network topology. This claim has been validated using the Controller Area Network (CAN) fieldbus, which has been chosen because it is widely spread in millions of systems. In fact, the proposed solution uses a paradigm that combines flexibility, time and event triggered communication, that is the Flexible Time- Triggered over CAN network (FTT-CAN). However, a generalization to native CAN is also presented and studied. The inclusion of bus replication in FTT-CAN imposes not only new mechanisms but also changes of the mechanisms associated with the master replication, which has been already studied in previous research work. In this work, these mechanisms were combined and take advantage of the centralized architecture and of the redundant masters to support an efficient and flexible bus replication. When considering the system operation, if a fault in the bus (or buses) occurs, and the consequent error leads to a system failure, the system reacts, switching to a degraded mode, where the message flows that were transmitted in the faulty bus (or buses) change to the non-faulty ones. The central node replication uses a leader-follower strategy, where the leader controls the system while the followers serve as backups. If an error occurs in the leader, a backup will take the system control maintaining the system with the same functionalities. The system has been generalized for native CAN, using two additional components that provide the same fault tolerance capabilities at the bus level, and also enable the dynamic management of the network topology. All the referred proposals were implemented and assessed in the scope of this work. The single bus version of FTT-CAN was assessed using a real application, a robotic soccer team, which has obtained excellent results in international competitions. There, the FTT-CAN based embedded system has been applied in the low level control, where, mainly it is responsible for the motion control and odometry. For the case of the multiple buses system, the assessment was performed in a laboratory test bed. For this, a fault injector was developed in order to impose faults in the buses and in the central nodes. To measure the time reaction of the system, a special hardware has been developed: a delay measurement system. It is able to measure delays between two important time marks for posterior offline analysis of the obtained values

    On the real world practice of Behaviour Driven Development

    Get PDF
    Surveys of industry practice over the last decade suggest that Behaviour Driven Development is a popular Agile practice. For example, 19% of respondents to the 14th State of Agile annual survey reported using BDD, placing it in the top 13 practices reported. As well as potential benefits, the adoption of BDD necessarily involves an additional cost of writing and maintaining Gherkin features and scenarios, and (if used for acceptance testing,) the associated step functions. Yet there is a lack of published literature exploring how BDD is used in practice and the challenges experienced by real world software development efforts. This gap is significant because without understanding current real world practice, it is hard to identify opportunities to address and mitigate challenges. In order to address this research gap concerning the challenges of using BDD, this thesis reports on a research project which explored: (a) the challenges of applying agile and undertaking requirements engineering in a real world context; (b) the challenges of applying BDD specifically and (c) the application of BDD in open-source projects to understand challenges in this different context. For this purpose, we progressively conducted two case studies, two series of interviews, four iterations of action research, and an empirical study. The first case study was conducted in an avionics company to discover the challenges of using an agile process in a large scale safety critical project environment. Since requirements management was found to be one of the biggest challenges during the case study, we decided to investigate BDD because of its reputation for requirements management. The second case study was conducted in the company with an aim to discover the challenges of using BDD in real life. The case study was complemented with an empirical study of the practice of BDD in open source projects, taking a study sample from the GitHub open source collaboration site. As a result of this Ph.D research, we were able to discover: (i) challenges of using an agile process in a large scale safety-critical organisation, (ii) current state of BDD in practice, (iii) technical limitations of Gherkin (i.e., the language for writing requirements in BDD), (iv) challenges of using BDD in a real project, (v) bad smells in the Gherkin specifications of open source projects on GitHub. We also presented a brief comparison between the theoretical description of BDD and BDD in practice. This research, therefore, presents the results of lessons learned from BDD in practice, and serves as a guide for software practitioners planning on using BDD in their projects
    corecore