
    A Method to Reduce the Cost of Resilience Benchmarking of Self-Adaptive Systems

    Ensuring the resilience of self-adaptive systems used in critical infrastructure is a concern, as their failure has severe societal and financial consequences. The growing scale and complexity of society's workload demands, and of the systems built to cope with them, heighten the anxiety surrounding service disruptions. Self-adaptive mechanisms instill dynamic behavior in systems in an effort to improve their resilience to runtime changes that would otherwise result in service disruption or failure, such as faults, errors, and attacks. Thus, the evaluation of a self-adaptive system's resilience is critical to ensure expected operational qualities and elicit trust in its services. However, resilience benchmarking is often overlooked or avoided due to the high cost of evaluating the runtime behavior of large and complex self-adaptive systems against an almost infinite number of possible runtime changes. Researchers have focused on techniques that reduce the overall cost of benchmarking while preserving the comprehensiveness of the evaluation, as testing costs have been found to account for 50 to 80% of total system costs. These test suite minimization techniques remove irrelevant, redundant, and repetitive test cases so that only relevant tests that adequately elicit the expected system responses are enumerated. However, these approaches require that an exhaustive test suite be defined first, after which the irrelevant tests are filtered out, potentially negating any cost savings. This dissertation provides a new approach to defining a resilience changeload for self-adaptive systems by incorporating goal-oriented requirements engineering techniques to extract system information and guide the identification of relevant runtime changes. The approach constructs a goal refinement graph consisting of the system's refined goals, runtime actions, self-adaptive agents, and underlying runtime assumptions, which is used to identify conditions that obstruct runtime goal attainment. Graph theory is then used to gauge the impact of obstacles on runtime goal attainment, and obstacles that exceed the relevance requirement are included in the resilience changeload for enumeration. The use of system knowledge to guide the changeload definition process increased the relevance of the resilience changeload while minimizing the test suite, resulting in a reduction of overall benchmarking costs. Analysis of case study results confirmed that the new approach was more cost-effective than previous work on the same subject system. The new approach was shown to reduce overall costs by 79.65%, increase the relevance of the defined test suite, reduce the amount of wasted effort, and provide a return on investment twice that of previous work.
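
    A minimal sketch of the graph-theoretic relevance step described above, assuming a hypothetical encoding of the goal refinement graph as parent pointers; the dissertation's exact algorithm is not reproduced here:

        from collections import deque

        # parents: node -> set of parent nodes; refinement edges point upward,
        # from actions/subgoals/assumptions to the goals they support.
        def affected_goals(parents, start):
            """Collect every node reachable upward from an obstructed element."""
            seen, frontier = set(), deque([start])
            while frontier:
                node = frontier.popleft()
                for parent in parents.get(node, ()):
                    if parent not in seen:
                        seen.add(parent)
                        frontier.append(parent)
            return seen

        def build_changeload(parents, goals, obstacles, threshold=0.25):
            """Keep obstacles whose impact on runtime goals meets the relevance bar."""
            selected = []
            for obstacle, obstructed in obstacles:  # each obstacle blocks one element
                impact = len(affected_goals(parents, obstructed) & goals) / len(goals)
                if impact >= threshold:
                    selected.append((obstacle, impact))
            return sorted(selected, key=lambda pair: pair[1], reverse=True)

        # Hypothetical example: a sensor fault obstructs an action supporting two goals.
        parents = {"read_sensor": {"G1", "G2"}, "adapt_plan": {"G2", "G3"}}
        goals = {"G1", "G2", "G3"}
        obstacles = [("sensor_omission", "read_sensor"), ("log_full", "adapt_plan")]
        print(build_changeload(parents, goals, obstacles))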

    Performance and Dependability of Fault Tolerance Protocols

    In the modern era of on-demand ubiquitous computing, where applications and services are deployed in well-provisioned, well-managed infrastructures administered by large cloud providers such as Amazon, Google, Microsoft, and Oracle, the performance and dependability of systems have become primary objectives. Cloud computing has made Quality-of-Service (QoS) factors such as availability, reliability, liveness, safety, and security essential to the complete definition of a system. Indeed, computing systems must be resilient in the presence of failures and attacks to prevent their inaccessibility, which can lead to expensive maintenance costs and loss of business. With the growing number of components in cloud systems, faults occur more commonly, resulting in frequent cloud outages and failures to guarantee the QoS. Cloud providers have seen episodic incidents of arbitrary (i.e., Byzantine) faults where systems demonstrate unpredictable behavior, including incorrect responses to a client's request, sending corrupt messages, intentionally delaying messages, disobeying the ordering of requests, etc. This has led researchers to extensively study Byzantine Fault Tolerance (BFT) and propose numerous protocols and software prototypes. These BFT solutions not only provide consistent and available services despite arbitrary failures, they also intend to reduce the cost and performance overhead incurred by the underlying systems. However, BFT prototypes have been evaluated in ad-hoc settings, considering either ideal conditions or very limited faulty scenarios. This fails to convince practitioners to adopt BFT protocols in distributed systems. Some argue about the applicability of expensive and complex BFT to tolerate arbitrary faults, while others are skeptical of the adeptness of BFT techniques. This thesis precisely addresses this problem and presents a comprehensive benchmarking environment which eases the setup of execution scenarios to analyze and compare the effectiveness and robustness of existing BFT proposals. Specifically, the contributions of this dissertation are as follows. First, we introduce a generic architecture for benchmarking distributed protocols. This architecture comprises reusable components for building a benchmark for performance and dependability analysis of distributed protocols. The architecture allows defining workloads and faultloads, and their injection. It also produces performance, dependability, and low-level system and network statistics. Furthermore, the thesis presents the benefits of a general architecture. Second, we present BFT-Bench, the first BFT benchmark, for analyzing and comparing representative BFT protocols under identical scenarios. BFT-Bench allows end-users to evaluate different BFT implementations under user-defined faulty behaviors and varying workloads. It automatically deploys these BFT protocols in a distributed setting, with the ability to monitor and report performance and dependability aspects.
In our results, we empirically compare some existing state-of-the-art BFT protocols under various workloads and fault scenarios with BFT-Bench, demonstrating its effectiveness in practice. Overall, this thesis aims to make BFT benchmarking easy to adopt by developers and end-users of BFT protocols. The BFT-Bench framework intends to help users perform efficient comparisons of competing BFT implementations and incorporate effective solutions to the loopholes detected in BFT prototypes. Furthermore, this dissertation strengthens the belief in the need for BFT techniques to ensure the correct and continued progress of distributed systems during critical fault occurrences.
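
    By way of illustration, a scenario for such a benchmark could be described roughly as below; the field and fault names are hypothetical sketches, not BFT-Bench's actual configuration format:

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class FaultEvent:
            at_second: int        # when to trigger the fault during the run
            kind: str             # e.g. "replica_crash", "message_delay", "corrupt_message"
            target_replica: int
            param: float = 0.0    # fault-specific parameter, e.g. delay in seconds

        @dataclass
        class Scenario:
            protocol: str         # e.g. "PBFT"
            replicas: int
            clients: int          # number of concurrent clients (workload intensity)
            request_bytes: int    # request size (workload weight)
            duration_s: int
            faultload: List[FaultEvent] = field(default_factory=list)

        # One run: a 4-replica setup with a message-delay fault injected mid-run.
        # The benchmark would deploy the protocol, fire each FaultEvent at its
        # scheduled time, and record throughput/latency over the whole run.
        scenario = Scenario(
            protocol="PBFT", replicas=4, clients=10,
            request_bytes=4096, duration_s=300,
            faultload=[FaultEvent(at_second=150, kind="message_delay",
                                  target_replica=0, param=0.5)],
        )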

    DEPENDABILITY BENCHMARKING OF NETWORK FUNCTION VIRTUALIZATION

    Network Function Virtualization (NFV) is an emerging networking paradigm that aims to reduce costs and time-to-market, improve manageability, and foster competition and innovative services. NFV exploits virtualization and cloud computing technologies to turn physical network functions into Virtualized Network Functions (VNFs), which are implemented in software and run as Virtual Machines (VMs) on commodity hardware located in high-performance data centers, namely Network Function Virtualization Infrastructures (NFVIs). The NFV paradigm relies on cloud computing and virtualization technologies to provide carrier-grade services, i.e., services that are highly reliable and available, with fast and automatic failure recovery mechanisms. The availability of many virtualization solutions for NFV raises the question of which virtualization technology should be adopted in order to fulfill the requirements described above. Currently, there are limited solutions for analyzing, in quantitative terms, the performance and reliability trade-offs, which are important concerns for the adoption of NFV. This thesis deals with the assessment of the reliability and performance of NFV systems. It proposes a methodology, which includes context, measures, and faultloads, for conducting dependability benchmarks in NFV according to the general principles of dependability benchmarking. To this aim, a fault injection framework has been designed and implemented for the virtualization technologies used as case studies in this thesis. This framework is used to conduct an extensive experimental campaign, in which we compare two candidate virtualization technologies for NFV adoption: the commercial, hypervisor-based virtualization platform VMware vSphere, and the open-source, container-based virtualization platform Docker. These technologies are assessed in the context of a high-availability, NFV-oriented IP Multimedia Subsystem (IMS). The analysis of experimental results reveals that i) fault management mechanisms are crucial in NFV in order to provide accurate failure detection and start the subsequent failover actions, and ii) fault injection proves to be a valuable way to introduce uncommon scenarios in the NFVI, which can be fundamental to providing a highly reliable service in production.
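
    As a concrete illustration of one container-level fault type from such a campaign, the sketch below kills a VNF container with the stock Docker CLI and times how long the service takes to answer again; the container name and health endpoint are hypothetical stand-ins for an IMS deployment:

        import subprocess
        import time
        import urllib.request

        def inject_container_crash(container):
            """Abruptly terminate a VNF container (a crash-like fault)."""
            subprocess.run(["docker", "kill", container], check=True)

        def time_to_recover(health_url, timeout_s=120.0):
            """Poll the service until it answers again; None if failover never completes."""
            start = time.monotonic()
            while time.monotonic() - start < timeout_s:
                try:
                    urllib.request.urlopen(health_url, timeout=2)
                    return time.monotonic() - start
                except OSError:
                    time.sleep(1.0)
            return None

        inject_container_crash("ims-node-1")               # hypothetical VNF container
        print(time_to_recover("http://ims-gateway/ping"))  # hypothetical health endpoint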

    Evaluating the performance of distributed agreement algorithms:tools, methodology and case studies

    Nowadays, networked computers are present in most aspects of everyday life. Moreover, essential parts of society come to depend on distributed systems formed of networked computers, so making such systems secure and fault tolerant is a top priority. If the particular fault tolerance requirement is high availability, replication of components is a natural choice. Replication is a difficult problem, as the state of the replicas must be kept consistent even if some replicas fail, and because in distributed systems, relying on centralized control or a certain timing behavior is often not feasible. Replication in distributed systems is often implemented using group communication. Group communication is concerned with providing high-level multipoint communication primitives and the associated tools. Most often, an emphasis is put on tolerating crash failures of processes. At the heart of most communication primitives lies an agreement problem: the members of a group must agree on things like the set of messages to be delivered to the application, the delivery order of messages, or the set of processes that crashed. Many algorithms for solving agreement problems have been proposed and their correctness proven. However, performance aspects of agreement algorithms have been somewhat neglected, for a variety of reasons: the lack of theoretical and practical tools to help performance evaluation, and the lack of well-defined benchmarks for agreement algorithms. Also, most performance studies focus on analyzing failure-free runs only. In our view, the limited understanding of performance aspects, in both failure-free scenarios and scenarios with failure handling, is an obstacle to adopting agreement protocols in practice, and is part of the explanation why such protocols are not in widespread use in the industry today. The main goal of this thesis is to advance the state of the art in this field. The thesis has major contributions in three domains: new tools, methodology, and performance studies. As for new tools, a simulation and prototyping framework offers a practical tool, and some new complexity metrics a theoretical tool, for the performance evaluation of agreement algorithms. As for methodology, the thesis proposes a set of well-defined benchmarks for atomic broadcast algorithms (such algorithms are important as they provide the basis for a number of replication techniques). Finally, three studies are presented that investigate important performance issues with agreement algorithms. The prototyping and simulation framework simplifies the tedious task of developing algorithms based on message passing, the communication model that most agreement algorithms are written for. In this framework, the same implementation can be reused for simulations and performance measurements on a real network. This characteristic greatly eases the task of validating simulation results with measurements (or vice versa). As for theoretical tools, we introduce two complexity metrics that predict performance with more accuracy than the traditional time and message complexity metrics. The key point is that our metrics account for resource contention, both on the network and on the hosts; resource contention is widely recognized as having a major impact on the performance of distributed algorithms. Extensive validation studies have been conducted.
Currently, no widely accepted benchmarks exist for agreement algorithms or group communication toolkits, which makes comparing performance results from different sources difficult. In an attempt to consolidate the situation, we define a number of benchmarks for atomic broadcast. Our benchmarks include well-defined metrics, workloads, and failure scenarios (faultloads). The use of the benchmarks is illustrated in two detailed case studies. Two widespread mechanisms for handling failures are unreliable failure detectors, which provide inconsistent information about failures, and a group membership service, which provides consistent information about failures. We analyze the performance tradeoffs of these two techniques by comparing the performance of two atomic broadcast algorithms designed for an asynchronous system. Based on our results, we advocate a combined use of the two approaches to failure handling. In another case study, we compare two consensus algorithms designed for an asynchronous system. The two algorithms differ in how they coordinate the decision process: one uses a centralized and the other a decentralized communication scheme. Our results show that the performance tradeoffs are highly affected by a number of characteristics of the environment, like the availability of multicast and the amount of contention on the hosts versus the amount of contention on the network. Famous theoretical results state that many important agreement problems are not solvable in the asynchronous system model. In our third case study, we investigate how these results are relevant for implementations of a replicated service, by conducting an experiment in a local area network. We exposed a replicated server to extremely high loads and required that the underlying failure detection service detect crashes very fast; the latter is important as the theoretical results are based on the impossibility of reliable failure detection. We found that our replicated server continued working even with the most extreme settings. We discuss the reasons for the robustness of our replicated server.
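
    A sketch of the two core measurements such benchmarks revolve around, computed from a hypothetical per-message trace of send and delivery timestamps (the thesis's actual metric definitions are not reproduced here):

        def benchmark_metrics(trace):
            """trace: list of (send_ts, deliver_ts) pairs, one per atomic broadcast
            message, with deliver_ts taken at the last replica to deliver."""
            latencies = [deliver - send for send, deliver in trace]
            first_send = min(send for send, _ in trace)
            last_deliver = max(deliver for _, deliver in trace)
            return {
                "mean_latency": sum(latencies) / len(latencies),
                "throughput": len(trace) / (last_deliver - first_send),  # msgs/s
            }

        # A faultload is expressed by the run the trace comes from: e.g. compare
        # metrics from a failure-free run against a run with one replica crashed.
        print(benchmark_metrics([(0.00, 0.02), (0.01, 0.05), (0.02, 0.06)]))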

    Injecting software faults in Python applications

    Software fault injection techniques have been widely used as a means to assess the dependability of systems in the presence of certain types of faults. Despite the wide variety of tools that can emulate the presence of software faults, there is little practical support for emulating software faults in Python applications, which are increasingly used to support business-critical cloud services. In this thesis, we present a tool (named Fit4Python) for injecting software faults in Python code, and we then use it to analyze the effectiveness of OpenStack's test suite against these new, likely, software faults. We start by analyzing the types of faults that affect Nova Compute, a core component of OpenStack. We use our tool to emulate the presence of new faults in the Nova Compute API in order to understand how OpenStack's unit, functional, and integration test suites cover these new but likely situations. The results show clear limitations in the effectiveness of the OpenStack developers' test suite, with many cases of injected faults passing undetected by all three types of tests. Furthermore, we observed that most of the analyzed problems could be detected with trivial changes or additions to the unit tests.
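
    A minimal sketch of the kind of fault operator such a tool applies, here the classic "missing function call" (MFC) fault implemented over Python's ast module; Fit4Python's real operator set and selection logic are not reproduced here:

        import ast

        class MissingFunctionCall(ast.NodeTransformer):
            """Delete the n-th standalone call statement, emulating an MFC fault."""
            def __init__(self, target_index):
                self.target_index = target_index
                self.seen = 0

            def visit_Expr(self, node):
                # Standalone expression statements whose value is a call, e.g. `f(x)`.
                if isinstance(node.value, ast.Call):
                    self.seen += 1
                    if self.seen - 1 == self.target_index:
                        return None  # drop the statement: the call is never made
                return node

        source = (
            "def handler(req):\n"
            "    validate(req)\n"
            "    log(req)\n"
            "    return dispatch(req)\n"
        )
        tree = MissingFunctionCall(0).visit(ast.parse(source))
        ast.fix_missing_locations(tree)
        print(ast.unparse(tree))  # faulty variant: `validate(req)` is gone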

    Understanding the Error Behavior of Complex Critical Software Systems through Field Data

    Software systems are the basis for human everyday activities, which are increasingly dependent on software. Software is an integral part of the systems we interact with in our daily life, ranging from small systems for entertainment and domotics to large systems and infrastructures that provide fundamental services such as telecommunication, transportation, and financial services. In particular, software systems play a key role in critical domains, supporting crucial activities. For example, ground and air transportation, power supply, nuclear plants, and medical applications strongly rely on software systems: failures affecting these systems can lead to severe consequences, which can be catastrophic in terms of business or, even worse, human losses. Therefore, given the growing dependence on software systems in critical applications, dependability has become one of the most relevant industry and research concerns in the last decades. Software faults have been recognized as one of the major causes of system failures, since the hardware failure rate has been decreasing over the years. Time and cost constraints, along with technical limitations, often do not allow the correctness of software to be fully validated by means of testing alone; therefore, software might be released with residual faults that activate during operations. The activation of a fault generates errors which propagate through the components of the system, possibly leading to a failure. Therefore, in order to produce reliable software, it is important to understand how errors affect a software system. This is of paramount importance especially in the context of complex critical software systems, where the occurrence of a failure can lead to severe consequences. However, the analysis of the error behavior of this kind of system is not trivial. They are often distributed systems based on many interacting heterogeneous components and layers, including Off-The-Shelf (OTS) and third-party components and legacy systems. All these aspects undermine the understanding of the error behavior of complex critical software systems. A well-established methodology to evaluate the dependability of operational systems and to identify their dependability bottlenecks is field failure data analysis (FFDA), which is based on the monitoring and recording of errors and failures that occur during the operational phase of the system under real workload conditions, i.e., field data. Indeed, direct measurement and analysis of natural failures occurring under real workload conditions is among the most accurate ways to assess dependability characteristics. Monitoring techniques are among the main sources of field data. The contribution of this thesis is a methodology for understanding the error behavior of complex critical software systems by means of field data generated by the monitoring techniques already implemented in the target system. The use of available monitoring techniques makes it possible to overcome the limitations imposed in the context of critical systems, avoiding severe changes to the system and preserving its functionality and performance. The methodology is based on fault injection experiments that stimulate the target system with different error conditions. Injection experiments accelerate the collection of error data naturally generated by the monitoring techniques already implemented in the system.
The collected data are analyzed in order to characterize the behavior of the system under the occurring software errors. To this aim, the proposed methodology leverages a set of innovative means defined in this dissertation: (i) Error Propagation graphs, which allow analyzing the error propagation phenomena that occurred in the target system and that can be inferred from the collected field data, and a set of metrics composed of (ii) Error Determination Degree, which provides insights into the ability of a monitoring technique's error notifications to suggest either the fault that led to the error or the failure the error led to in the system, (iii) Error Propagation Reportability, which captures the ability of a monitoring technique to report the propagation of errors, and (iv) Data Dissimilarity, which provides insights into the suitability of the data generated by the monitoring techniques for failure analysis. The methodology has been applied to two instances of complex critical software systems in the field of Air Traffic Control (ATC), i.e., a communication middleware supporting data exchange among ATC applications, and an arrival manager responsible for managing flight arrivals to a given airspace, within an industry-academia collaboration in the context of a national research project. Results show that field data generated by means of monitoring techniques already implemented in a complex critical software system can be leveraged to obtain insights about the error behavior exhibited by the target system, as well as about potentially beneficial locations for error detection mechanisms (EDMs) and error recovery mechanisms (ERMs). In addition, the proposed methodology also allowed us to characterize the effectiveness of the monitoring techniques in terms of failure reporting, error propagation reportability, and data dissimilarity.
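
    An illustrative sketch of how an Error Propagation graph could be inferred from timestamped error events in field logs; the time-window heuristic and event format here are assumptions, not the dissertation's exact construction:

        from collections import defaultdict

        def propagation_graph(events, window_s=2.0):
            """events: time-ordered (timestamp, component, error_code) tuples.
            Add an edge A -> B whenever an error in B follows one in A closely
            enough to be a plausible propagation."""
            edges = defaultdict(int)
            for i, (t_a, comp_a, _) in enumerate(events):
                for t_b, comp_b, _ in events[i + 1:]:
                    if t_b - t_a > window_s:
                        break                          # too far apart in time
                    if comp_b != comp_a:
                        edges[(comp_a, comp_b)] += 1   # weight = observed co-occurrences
            return dict(edges)

        events = [(0.0, "middleware", "E101"), (0.4, "arrival_manager", "E207"),
                  (5.0, "middleware", "E101"), (5.3, "arrival_manager", "E207")]
        print(propagation_graph(events))   # {('middleware', 'arrival_manager'): 2}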

    Anomaly Detection and Fault Localization Using Runtime State Models

    Software systems are impacting every aspect of our daily lives, making software failures expensive, even life-endangering. Despite rigorous testing, software bugs inevitably exist, especially in complex systems. Existing tools to aid debugging, such as tracing, profiling, and logging facilities, reveal the behavior of a program's execution; however, they require developers to manually correlate the data to diagnose faults. This work is the first to introduce the Runtime State Model, a summarization of a program's behavior, for software anomaly detection and fault localization. A Runtime State Model is constructed from the variable value-change events of an execution. It consists of a set of states and state transitions, where a state is a set of variables with their current values, and a state transition is induced by a variable's value change. Comparisons between states from different executions can be conducted to detect software anomalies. Deviations from the healthy states also help explain and locate faults in the source code. To automate this process, we implement Xtract, a facility that automatically extracts runtime traces from Java Virtual Machines and constructs Runtime State Models for multiple simultaneous Java applications. Our evaluation provides evidence that Runtime State Models might be effective in detecting and locating faults injected into a RUBiS server with Xtract.
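
    Following the definition above, a Runtime State Model can be sketched as below; the event format is a hypothetical stand-in for the value-change traces that Xtract captures from the JVM:

        def build_rsm(events):
            """events: ordered (variable, new_value) pairs from one execution.
            States are variable valuations; each value change induces a transition."""
            current, states, transitions = {}, set(), set()
            snapshot = lambda: frozenset(current.items())
            states.add(snapshot())
            for var, value in events:
                before = snapshot()
                current[var] = value
                states.add(snapshot())
                transitions.add((before, (var, value), snapshot()))
            return states, transitions

        # Anomaly detection: states reached by this run but never seen in healthy runs.
        healthy, _ = build_rsm([("mode", "idle"), ("queue", 0), ("mode", "busy")])
        observed, _ = build_rsm([("mode", "idle"), ("queue", 0), ("mode", "panic")])
        print(observed - healthy)   # the deviating state points to the faulty variable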

    From Safety Analysis to Experimental Validation by Fault Injection—Case of Automotive Embedded Systems

    Due to the rising complexity of automotive electric/electronic embedded systems, functional safety has become a major issue in the automotive industry. This issue was formalized by the introduction of the ISO 26262 standard for functional safety in 2011. The challenges are, on the one hand, to design safe systems based on a systematic verification and validation approach and, on the other hand, to fulfil the requirements of the ISO 26262 standard. Following ISO 26262 recommendations, our approach, based on fault injection, aims at verifying fault tolerance mechanisms and non-functional requirements at all steps of the development cycle, from early design phases down to implementation. Fault injection is a verification technique that has been investigated for a long time. However, the role of fault injection during the design phase and its complementarity with the experimental validation of the target have not been explored. In this work, we investigate a fault injection continuum, from system design validation to experiments on implemented targets. The proposed approach takes the safety analyses as a starting point, with the identification of safety mechanisms and safety requirements, and goes down to the validation of the implementation of the safety mechanisms through fault injection experiments. The whole approach is based on a key fault injection framework, called FARM (Fault, Activation, Readouts and Measures). We show that this approach can be integrated into the development process of automotive embedded systems described in the ISO 26262 standard. Our approach is illustrated on an automotive case study: a front-light system.
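
    A schematic rendering of a campaign structured by the FARM attributes; FARM itself only fixes the four sets (Faults, Activations, Readouts, Measures), while the runner hooks and fault names below are hypothetical:

        def farm_campaign(faults, activations, run_experiment):
            """F x A drives the experiments; R are collected readouts; M aggregates."""
            readouts = []
            for fault in faults:                   # F: the faults to inject
                for activation in activations:     # A: workloads activating the target
                    readouts.append(run_experiment(fault, activation))  # R: observations
            detected = sum(1 for r in readouts if r["detected"])
            return {                               # M: measures derived from R
                "experiments": len(readouts),
                "detection_coverage": detected / len(readouts),
            }

        # Stub runner standing in for an experiment on a front-light system.
        def fake_run(fault, activation):
            return {"detected": fault != "sensor_stuck_at"}  # illustrative outcome

        print(farm_campaign(["bus_corruption", "sensor_stuck_at"],
                            ["low_beam", "curve_left"], fake_run))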

    Dependability-driven Strategies to Improve the Design and Verification of Safety-Critical HDL-based Embedded Systems

    Embedded systems are steadily extending their application areas, dealing with increasing requirements in performance, power consumption, and area (PPA). Whenever embedded systems are used in safety-critical applications, they must also meet rigorous dependability requirements to guarantee their correct operation during an extended period of time. Meeting these requirements is especially challenging for those systems that are based on Field Programmable Gate Arrays (FPGAs), since they are very susceptible to Single Event Upsets. This leads to increased dependability threats, especially in harsh environments. Thus, dependability should be considered as one of the primary criteria for decision making throughout the whole design flow, which should be complemented by several dependability-driven processes. First, dependability assessment quantifies the robustness of hardware designs against faults and identifies their weak points. Second, dependability-driven verification ensures the correctness and efficiency of fault mitigation mechanisms. Third, dependability benchmarking allows designers to select (from a dependability perspective) the most suitable IP cores, implementation technologies, and electronic design automation (EDA) tools. Finally, dependability-aware design space exploration (DSE) allows optimal configuration of the selected IP cores and EDA tools, improving as much as possible the dependability and PPA features of the resulting implementations. The aforementioned processes rely on fault injection testing to quantify the robustness of the designed systems. Although a wide variety of fault injection solutions exists today, several important problems still need to be addressed to better cover the needs of a dependability-driven design flow. In particular, simulation-based fault injection (SBFI) should be adapted to implementation-level HDL models to take into account the architecture of diverse logic primitives, while keeping the injection procedures generic and low-intrusive. Likewise, the granularity of FPGA-based fault injection (FFI) should be refined to enable accurate identification of weak points in FPGA-based designs. Another important challenge that dependability-driven processes face in practice is the reduction of SBFI and FFI experimental effort. The high complexity of modern designs raises the experimental effort beyond the available time budgets, even in simple dependability assessment scenarios, and it becomes prohibitive in the presence of alternative design configurations. Finally, dependability-driven processes lack instrumental support covering the semicustom design flow in all its variety of description languages, implementation technologies, and EDA tools. Existing fault injection tools only partially cover the individual stages of the design flow, being usually specific to a particular design representation level and implementation technology. This work addresses the aforementioned challenges by efficiently integrating dependability-driven processes into the design flow. First, it proposes new SBFI and FFI approaches that enable an accurate and detailed dependability assessment at different levels of the design flow. Second, it improves the performance of dependability-driven processes by defining new techniques for accelerating SBFI and FFI experiments. Third, it defines two DSE strategies that enable the optimal dependability-aware tuning of IP cores and EDA tools, while reducing as much as possible the robustness evaluation effort. Fourth, it proposes a new toolkit (DAVOS) that automates and seamlessly integrates the aforementioned dependability-driven processes into the semicustom design flow. Finally, it illustrates the usefulness and efficiency of these proposals through a case study consisting of three soft-core embedded processors implemented on a Xilinx 7-series SoC FPGA.
    Tuzov, I. (2020). Dependability-driven Strategies to Improve the Design and Verification of Safety-Critical HDL-based Embedded Systems [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/159883
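
    For illustration, the single-event-upset model at the heart of SBFI reduces to a bit-flip on a sampled signal value, as sketched below; the campaign planner and target names are assumptions, not DAVOS's actual interfaces, which drive HDL simulators and FPGA hardware directly:

        import random

        def seu(value, width, rng):
            """Flip one uniformly chosen bit of a register/net value of `width` bits."""
            return value ^ (1 << rng.randrange(width))

        def plan_injections(targets, sim_time_ns, runs, seed=0):
            """Pick (target, time) pairs for a campaign over implementation-level nets."""
            rng = random.Random(seed)
            return [(rng.choice(targets), rng.uniform(0, sim_time_ns))
                    for _ in range(runs)]

        rng = random.Random(1)
        print(f"{seu(0b1010, 4, rng):04b}")   # one bit differs from 1010
        print(plan_injections(["cpu/regfile[3]", "cpu/pc"], 10_000, 3))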

    TARGETED, REALISTIC AND NATURAL FAULT INJECTION : (USING BUG REPORTS AND GENERATIVE LANGUAGE MODELS)

    Artificial faults have proven useful for ensuring software quality, enabling the simulation of a system's behaviour in erroneous situations and thereby evaluating its robustness and its impact on the surrounding components in the presence of faults. Similarly, by introducing these faults in the testing phase, they can serve as a proxy to measure the fault revelation and thoroughness of current test suites, and provide developers with testing objectives, as writing tests to detect them helps reveal and prevent eventual similar real ones. This approach, mutation testing, has gained increasing fame and interest among researchers and practitioners since its appearance in the 1970s. It typically operates by introducing small syntactic transformations (using mutation operators) into the target program, aiming at producing multiple faulty versions of it (mutants). These operators are generally created based on the grammar rules of the target programming language and then tuned through empirical studies in order to reduce the redundancy and noise among the induced mutants. Since these patterns have little knowledge of the program context or of the relevant locations to mutate, they are applied in a brute-force manner to the full code base of the program, producing numerous mutants and overwhelming developers with a costly overhead of test executions and mutant analysis effort. For this reason, although proven useful in multiple software engineering applications, the adoption of mutation testing remains limited in practice. Another key challenge of mutation testing is the misrepresentation of real bugs by the induced artificial faults. Indeed, this can make the results of any relying application questionable or inaccurate. To tackle this challenge, researchers have proposed new fault-seeding techniques that aim at mimicking real faults. To achieve this, they suggest leveraging the knowledge base of previous faults to inject new ones. Although these techniques produce promising results, they do not solve the high-cost issue, or even exacerbate it by generating more mutants with their extended pattern sets. Along the same lines of research, we start addressing the aforementioned challenges, regarding the cost of the injection campaign and the representativeness of the artificial faults, by proposing IBIR, a targeted fault injection approach which aims at mimicking real faulty behaviours. To do so, IBIR uses information retrieved from bug reports (to select relevant code locations to mutate) and fault patterns created by inverting fix patterns, which have been introduced and tuned based on real bug fixes mined from different repositories. We implemented this approach and showed that it outperforms the fault injection performed by traditional mutation testing in terms of semantic similarity with the originally targeted fault (described in the bug report), when applied at either the project or class level of granularity, and provides better, statistically significant estimations of test effectiveness (fault detection). Additionally, when injecting only 10 faults, IBIR couples with more real bugs than mutation testing does when injecting 1000 faults. Although effective in emulating real faults, IBIR's approach depends strongly on the quality and existence of bug reports; when these are absent, its performance can drop to that of traditional mutation testing approaches.
In the absence of such prior knowledge, and with the same objective of injecting few but relevant faults, we suggest accounting for the project's context and the actual developers' code distribution to generate more "natural" mutants, in the sense that they are understandable and more likely to occur. To this end, we propose using code from real programs as a knowledge base to inject faults, instead of the language grammar or knowledge of previous bugs such as bug reports and bug fixes. In particular, we leverage the code knowledge and capability of pre-trained generative language models (i.e., CodeBERT) in capturing the code context and predicting developer-like code alternatives, to produce a few faults in diverse locations of the input program. This way, developing and maintaining the approach does not require any major effort, such as creating or inferring fault patterns or training a model to learn how to inject faults. In fact, to inject relevant faults in a given program, our approach masks tokens (one at a time) from its code base and uses the model to predict them, then considers the inaccurate predictions as probable developer-like mistakes, forming the output mutant set. Our results show that these mutants induce test suites with higher fault detection capability, in terms of effectiveness and cost-efficiency, than conventional mutation testing. Next, we turn our interest to the code comprehension of pre-trained language models, particularly their capability in capturing the naturalness aspect of code. This measure has proven very useful for distinguishing unusual code, which can be a symptom of code smells, low readability, bugginess, bug-proneness, etc., thereby indicating relevant locations requiring prior attention from developers. Code naturalness is typically predicted using statistical language models like n-grams, to approximate how surprising a piece of code is, based on the fact that code, in small snippets, is repetitive. Although powerful, training such models on a large code corpus can be tedious, time-consuming, and sensitive to the code patterns (and practices) encountered during training. Consequently, these models are often trained on a small corpus and thus only estimate naturalness relative to a specific style of programming or type of project. To overcome these issues, we propose the use of pre-trained generative language models to infer code naturalness. We suggest inferring naturalness by masking (omitting) code tokens, one at a time, from code sequences, and checking the models' ability to predict them. We implement this workflow, named CodeBERT-NT, and evaluate its capability to prioritize buggy lines over non-buggy ones when ranking code based on its naturalness. Our results show that our approach outperforms both random-uniform and complexity-based ranking techniques, and yields comparable results to the n-gram models, although these are trained in an intra-project fashion. Finally, we provide implementations of tools and libraries enabling code naturalness measurement and fault injection by the different approaches, and provide the resources required to compare their effectiveness in emulating real faults and guiding testing towards higher fault detection. This includes the source code of our proposed approaches and replication packages of our conducted studies.
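
    A sketch of the masked-token workflow behind this idea: mask each token of a line, let a model guess it, and rank lines by how often the model fails to recover the original. The `predict` callable below is a stand-in; wiring it to a real fill-mask checkpoint (for instance a CodeBERT model trained with a masked-language objective) is left as an assumption:

        def line_surprise(tokens, predict):
            """Fraction of tokens the model cannot recover when masked one at a time;
            higher means the line looks less 'natural' to the model."""
            misses = sum(1 for i in range(len(tokens)) if predict(tokens, i) != tokens[i])
            return misses / len(tokens)

        def rank_lines(lines, predict):
            """Order lines most-surprising first, as candidates for inspection."""
            return sorted(lines, key=lambda toks: line_surprise(toks, predict),
                          reverse=True)

        # Toy predictor: pretends the model always expects `==` inside an `if` test.
        def toy_predict(tokens, i):
            return "==" if tokens[i] in ("==", "=") and "if" in tokens else tokens[i]

        lines = [["if", "x", "=", "0", ":"], ["y", "=", "x", "+", "1"]]
        print(rank_lines(lines, toy_predict))  # the suspicious `if x = 0:` ranks first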