10 research outputs found

    Re-use of tests and arguments for assessing dependable mixed-criticality systems

    The safety assessment of mixed-criticality systems (MCS) is a challenging activity due to system heterogeneity, design constraints and increasing complexity. The foundation for MCSs is the integrated architecture paradigm, where compact hardware comprises multiple execution platforms and communication interfaces to implement concurrent functions with different safety requirements. Besides a computing platform providing adequate isolation and fault-tolerance mechanisms, the development of an MCS application must also comply with the guidelines defined by the safety standards. One way to lower the overall MCS certification cost is to adopt a platform-based design (PBD) development approach. PBD is a model-based development (MBD) approach in which separate models of logic, hardware and deployment support the analysis of the resulting system properties and behaviour. The PBD development of MCSs benefits from a composition of modular safety properties (e.g. modular safety cases), which supports the derivation of mixed-criticality product lines. The validation and verification (V&V) activities claim a substantial effort during the development of programmable electronics for safety-critical applications. In the MCS dependability assessment, the purpose of V&V is to provide evidence supporting the safety claims. The model-based development of MCSs adds further V&V tasks, because additional analyses (e.g., simulations) need to be carried out during the design phase. During the MCS integration phase, hardware-in-the-loop (HiL) plant simulators typically support the V&V campaigns, where test automation and fault injection are the key to test repeatability and thorough exercise of the safety mechanisms. This dissertation proposes several V&V artefact re-use strategies to perform an early verification at system level for a distributed MCS, artefacts that are later reused up to the final stages of the development process: test-code re-use to verify the fault-tolerance mechanisms on a functional model of the system, combined with non-intrusive software fault injection; model-to-X-in-the-loop (XiL) and code-to-XiL re-use to provide models of the plant and distributed embedded nodes suited to the HiL simulator; and, finally, an argumentation framework to support the automated composition and staged completion of modular safety cases for dependability assessment, in the context of the platform-based development of mixed-criticality systems relying on the DREAMS harmonized platform.
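
    As a rough illustration of the first re-use strategy (test code exercising a fault-tolerance mechanism on a functional model, combined with non-intrusive software fault injection), consider the minimal C sketch below; the triplicated sensor, the 2-out-of-3 voter and inject_bit_flip() are illustrative stand-ins, not artefacts from the dissertation.

    #include <stdio.h>
    #include <stdint.h>

    /* Triplicated sensor readings; the voter is the (hypothetical)
     * fault-tolerance mechanism under test. */
    static uint32_t sensor[3] = { 100u, 100u, 100u };

    /* 2-out-of-3 majority voter: returns the value agreed by at least
     * two replicas (simplified). */
    static uint32_t vote(const uint32_t s[3])
    {
        if (s[0] == s[1] || s[0] == s[2])
            return s[0];
        return s[1];
    }

    /* "Non-intrusive" in the sense that the application code above is
     * untouched: the test harness corrupts state from outside, between
     * execution steps, as a debugger or hypervisor hook would. */
    static void inject_bit_flip(uint32_t *target, unsigned bit)
    {
        *target ^= (1u << bit);
    }

    int main(void)
    {
        inject_bit_flip(&sensor[2], 4);   /* corrupt one replica: 100 -> 116 */
        uint32_t out = vote(sensor);
        printf("voted output after injection: %u (%s)\n",
               out, out == 100u ? "fault masked" : "fault NOT masked");
        return 0;
    }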

    Dependability investigation of wireless short range embedded systems: hardware platform oriented approach

    A new direction in short-range wireless applications has appeared in the form of high-speed data communication devices for distances of hundreds of meters. Behind these embedded applications lies a complex heterogeneous architecture. Moreover, these short-range communications are being introduced into critical applications, where dependability/reliability is mandatory. Dependability concerns around reliability evaluation thus become a major challenge in these systems and pose several questions to answer. Obviously, in such systems, the reliability attribute has to be investigated for various components and at different abstraction levels. In this paper, we discuss the investigation of dependability in wireless short-range systems. We present a hardware platform for wireless system dependability analysis as an alternative to time-consuming simulation techniques. The platform is built using several instances of one of the commercial FPGA platforms available on the market. We describe the different steps of building the wireless hardware platform for short-range system dependability analysis. Then, we show how this hardware-platform-based dependability investigation framework can be a very interactive approach. Based on this platform, we introduce a new methodology and a flow to investigate the different parts of system dependability at different abstraction levels. The benefits of using the proposed framework are threefold: first, it takes care of the whole system (HW/SW digital part, mixed RF part, and wireless part); second, the hardware platform makes it possible to explore the application's reliability under real environmental conditions, taking into account the effect of environmental threats on the system; and last, the wireless platform built for dependability investigation offers a fast investigation approach in comparison with the time-consuming co-simulation technique.
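
    As a software stand-in for the kind of channel-level threat emulation such an FPGA platform performs in hardware, the following C sketch corrupts a frame with a given bit-error rate and lets a receiver-side CRC flag the damage; the frame format, CRC-8 polynomial and error rate are illustrative assumptions, not the paper's design.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define FRAME_LEN 16

    /* CRC-8 (polynomial 0x07), a common short-frame integrity check */
    static uint8_t crc8(const uint8_t *data, size_t len)
    {
        uint8_t crc = 0;
        for (size_t i = 0; i < len; i++) {
            crc ^= data[i];
            for (int b = 0; b < 8; b++)
                crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                                   : (uint8_t)(crc << 1);
        }
        return crc;
    }

    /* Emulated channel: flip each bit with probability ber */
    static void channel(uint8_t *frame, size_t len, double ber)
    {
        for (size_t i = 0; i < len * 8; i++)
            if ((double)rand() / RAND_MAX < ber)
                frame[i / 8] ^= (uint8_t)(1u << (i % 8));
    }

    int main(void)
    {
        uint8_t frame[FRAME_LEN] = "short-range msg";
        uint8_t tx_crc = crc8(frame, FRAME_LEN);

        channel(frame, FRAME_LEN, 0.02);   /* emulate a noisy environment */

        int detected = crc8(frame, FRAME_LEN) != tx_crc;
        printf("corruption %s by receiver CRC\n",
               detected ? "detected" : "not detected (or no error)");
        return 0;
    }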

    Multi-core devices for safety-critical systems: a survey

    Multi-core devices are envisioned to support the development of next-generation safety-critical systems, enabling the on-chip integration of functions of different criticality. This integration provides multiple potential system-level benefits such as cost, size, power, and weight reduction. However, safety certification becomes a challenge and several fundamental safety technical requirements must be addressed, such as temporal and spatial independence, reliability, and diagnostic coverage. This survey provides a categorization and overview, at different device abstraction levels (nanoscale, component, and device), of selected key research contributions that support compliance with these fundamental safety requirements. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-65316-P, the Basque Government under grant KK-2019-00035 and the HiPEAC Network of Excellence. The Spanish Ministry of Economy and Competitiveness has also partially supported Jaume Abella under a Ramon y Cajal postdoctoral fellowship (RYC-2013-14717).

    Selection of a new hardware and software platform for railway interlocking

    The interlocking system is one of the main actors in safe railway transportation. In most cases, the whole system is supplied by a single vendor. Recent regulations from the European Union call for an "open" architecture to invite new game changers and reduce life-cycle costs. The objective of the thesis is to propose an alternative platform that could replace a legacy interlocking system. In the thesis, various commercial off-the-shelf hardware and software products are studied that could be assembled to compose an alternative interlocking platform. The platform must be open enough to adapt to any changes in the constituent elements and abide by the proposed baselines of new standardization initiatives, such as ERTMS, EULYNX, and RCA. In this thesis, a comparative study is performed between these products based on hardware capacity, architecture, communication protocols, programming tools, security, railway certifications, life-cycle issues, etc.

    Design and Implementation of a Time Predictable Processor: Evaluation With a Space Case Study

    Embedded real-time systems like those found in automotive, rail and aerospace steadily require higher levels of guaranteed computing performance (and hence time predictability), motivated by the increasing number of functionalities provided by software. However, high-performance processor design is driven by the average-performance needs of the mainstream market. To make things worse, changing those designs is hard, since the embedded real-time market is comparatively small. A path to address this mismatch is designing low-complexity hardware features that favor time predictability and can be enabled/disabled so as not to affect average performance when performance guarantees are not required. In this line, we present the lessons learned designing and implementing LEOPARD, a four-core processor facilitating measurement-based timing analysis (widely used in most domains). LEOPARD has been designed by adding low-overhead hardware mechanisms to a LEON3 processor baseline that allow capturing the impact of jittery resources (i.e. those with variable latency) in the measurements performed at analysis time. In particular, at core level we handle the jitter of caches, TLBs and variable-latency floating-point units; at chip level, we deal with contention so that time-composable timing guarantees can be obtained. The result of our applied study with a space application shows how per-resource jitter is controlled, facilitating the computation of high-quality WCET estimates.
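
    The measurement-based flow that LEOPARD facilitates can be pictured with a small C sketch: run the task many times while the jittery resources are exercised, keep the high-water mark of the observed execution times, and apply a margin. measure_task_cycles() is a hypothetical stand-in for a hardware cycle counter, and the flat 20% margin only illustrates the flow; real MBTA uses sounder statistical arguments.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for reading a cycle counter around one
     * end-to-end run of the task under analysis; here it fakes a latency
     * that jitters as caches, TLBs and the FPU would make it. */
    static uint64_t measure_task_cycles(void)
    {
        return 1000u + (uint64_t)(rand() % 120);   /* 1000..1119 cycles */
    }

    int main(void)
    {
        const int runs = 10000;
        uint64_t hwm = 0;                  /* high-water mark (max observed) */

        for (int i = 0; i < runs; i++) {
            uint64_t t = measure_task_cycles();
            if (t > hwm)
                hwm = t;
        }

        /* Flat 20% engineering margin, purely to illustrate the flow */
        uint64_t wcet_estimate = hwm + hwm / 5;
        printf("max observed: %llu cycles, WCET estimate: %llu cycles\n",
               (unsigned long long)hwm, (unsigned long long)wcet_estimate);
        return 0;
    }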

    "Design and verification of a digitally controlled output stage for automotive safety applications"

    In the last decades the importance of electronics in the automotive environment has increased exponentially. Electronic Integrated Circuits (ICs) now play a primary role in the economy of vehicles, especially since specific laws and strict safety requirements have been introduced. The aim of the thesis, developed within the austriamicrosystems design centre of Navacchio (PI), is the design and verification of a digitally controlled output stage. Output stages are the final components of many sensor-based ICs. In fact, their typical signal chain starts with the sensing of a physical phenomenon, passes through its transduction into an electrical quantity and its digital conversion and processing, and ends with the driving of an actuator. The task of an output stage is to interpret the input digital signal and drive an actuator accordingly. The target of this work was to improve the performance of the company's current output-stage solutions. This target has been achieved through the development and realization of a digitally controlled loop. The proposed solution guarantees a performance improvement and adds the possibility of cyclically monitoring the output voltage, detecting issues and reporting errors. A control algorithm was developed and validated through its insertion in a mathematical model of the system. Then, to experimentally validate this control algorithm, an Integrated Circuit was designed, realized and finally measured. This thesis follows the workflow behind the realization of the Integrated Circuit and its subsequent measurement.
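
    A minimal C sketch of the kind of digitally controlled loop described above, assuming a crude first-order plant, illustrative PI gains and a hypothetical tolerance threshold: the loop drives the output toward the setpoint and cyclically monitors the measured voltage, reporting an error if it stays out of tolerance after settling.

    #include <stdio.h>

    int main(void)
    {
        const double setpoint = 5.0;      /* target output voltage [V] */
        const double kp = 0.8, ki = 0.3;  /* illustrative PI gains */
        const double tol = 0.05;          /* monitoring tolerance [V] */

        double v_out = 0.0, integ = 0.0;

        for (int cycle = 0; cycle < 50; cycle++) {
            double err = setpoint - v_out;

            /* PI control law computed on the sampled error */
            integ += err;
            double drive = kp * err + ki * integ;

            /* Crude first-order plant: output moves toward the drive level */
            v_out += 0.5 * (drive - v_out);

            /* Cyclic monitoring: report if the output is still out of
             * tolerance after the loop should have settled */
            if (cycle > 30 && (err > tol || err < -tol))
                printf("cycle %d: output %.3f V out of tolerance\n",
                       cycle, v_out);
        }
        printf("final output: %.3f V\n", v_out);
        return 0;
    }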

    Design and Development of a Multi-Purpose Input Output Controller Board for the SPES Control System

    This PhD work has been carried out at the Legnaro National Laboratories (LNL), one of the four national labs of the National Institute for Nuclear Physics (INFN). The mission of LNL is to perform research in the field of nuclear physics and nuclear astrophysics together with emerging technologies. Technological research and innovation are the key to promoting excellence in science, fostering competitive industries and establishing a better society. The research activities concerning electronics and computer science are an essential base for developing the control system of the Selective Production of Exotic Species (SPES) project. Nowadays, SPES is the most important project commissioned at LNL and represents the future of the lab. It is a second-generation Isotope Separation On-Line (ISOL) radioactive ion beam facility intended for fundamental nuclear physics research as well as experimental applications in different fields of science, such as nuclear medicine and radiopharmaceutical production for therapy and diagnostics. The design of the SPES control system demands innovative technologies to embed the control of several appliances with different requirements, performing different tasks spanning data sharing and visualization, data acquisition and storage, networking, security and surveillance operations, and beam transport and diagnostics. The real-time applications and fast peripheral control commonly found in the distributed control network of particle accelerators are accompanied by the challenge of developing custom embedded systems. In this context, the proposed PhD work describes the design and development of a multi-purpose Input Output Controller (IOC) board capable of embedding the control of typical accelerator instrumentation involved in the automatic beam transport system foreseen for the SPES project. The idea behind this work is to extend the control reach to the single-device level without losing modularity and standardization. The outcome of the research work is a general-purpose embedded computer that will be the base for standardizing the hardware layer of the front-end computers in the SPES distributed control system. The IOC board is a Computer-on-Module (COM) carrier board designed to host any COM Express type 6 module and is equipped with a Field Programmable Gate Array (FPGA) and application-specific I/O connection solutions not found in a desktop PC. All the generic PC functionalities are readily available in off-the-shelf modules, and the result is a custom motherboard that bridges the gap between custom developments and commercial personal computers. The end user can deal with a general-purpose PC with a high level of hardware abstraction, besides being able to exploit the on-board FPGA capabilities in terms of fast peripheral control and real-time digital data processing. This document opens with an introductory chapter about the SPES project and its control system architecture and technology, before describing the IOC board design, prototyping, and characterization. The thesis ends by describing the field installation of the IOC board, which is the core of the new diagnostics data readout and signal processing system. 
    The results of the tests performed under real beam conditions prove that the new hardware extends the current sensitivity to the pA range, addressing the SPES requirements, and that the IOC board is a reliable solution for standardizing the control of several appliances in the SPES accelerator complex, where it will be embedded into physical equipment, or placed in its proximity, to control and monitor its operation, replacing the legacy VME technology. The field installation of the IOC board represents a great personal reward and crowns these years of busy time, during which I turned what was just an idea in 2014 into a working embedded computer today.
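
    To give a flavour of how an application on such a carrier board might use the on-board FPGA, here is a C sketch that reads one beam-current sample from a memory-mapped readout register; the AXI base address, register offset and pA scaling are hypothetical, not the actual SPES register map.

    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define FPGA_BASE   0x43C00000u   /* assumed base of the readout core */
    #define REG_CURRENT 0x10u         /* assumed offset of the sample reg */

    int main(void)
    {
        int fd = open("/dev/mem", O_RDONLY | O_SYNC);
        if (fd < 0) { perror("open /dev/mem"); return 1; }

        volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ, MAP_SHARED,
                                       fd, FPGA_BASE);
        if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        uint32_t raw = regs[REG_CURRENT / 4];
        /* Illustrative scaling: 24-bit sample, full scale 1 nA, reported
         * in pA to match the sensitivity range quoted in the abstract */
        double picoamps = (double)(raw & 0xFFFFFFu) / 16777215.0 * 1000.0;
        printf("beam current: %.1f pA (raw 0x%06x)\n",
               picoamps, raw & 0xFFFFFFu);

        munmap((void *)regs, 4096);
        close(fd);
        return 0;
    }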

    Proceedings Work-In-Progress Session of the 13th Real-Time and Embedded Technology and Applications Symposium

    The Work-In-Progress session of the 13th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS'07) presents papers describing contributions to both the state of the art and the state of practice in the broad field of real-time and embedded systems. The 17 accepted papers were selected from 19 submissions. These proceedings are also available as Washington University in St. Louis Technical Report WUCSE-2007-17, at http://www.cse.seas.wustl.edu/Research/FileDownload.asp?733. Special thanks go to the General Chairs, Steve Goddard and Steve Liu, and the Program Chairs, Scott Brandt and Frank Mueller, for their support and guidance.

    New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs

    Thesis by compendium. The relevance of electronics to the safety of everyday devices has only been growing, as an ever-growing share of their functionality is assigned to them. Of course, this comes along with the constant need for higher performance to fulfil such functionality requirements, while keeping power and budget low. In this scenario, industry is struggling to provide a technology which meets all the performance, power and price specifications, at the cost of an increased vulnerability to several types of known faults or the appearance of new ones. To provide a solution for the new and growing faults in these systems, designers have been using traditional techniques from safety-critical applications, which in general offer suboptimal results. In fact, modern embedded architectures offer the possibility of optimizing the dependability properties by enabling the interaction of the hardware, firmware and software levels in the process. However, that potential is not yet successfully exploited. Advances at every level in that direction are much needed if flexible, robust, resilient and cost-effective fault tolerance is desired. The work presented here focuses on the hardware level, with the background consideration of a potential integration into a holistic approach. The efforts in this thesis have focused on several issues: (i) introducing the additional fault models required for an adequate representation of physical effects emerging in modern manufacturing technologies; (ii) providing tools and methods to efficiently inject both the proposed models and classical ones; (iii) analysing the optimum method for assessing the robustness of systems by using extensive fault injection and later correlation with higher-level layers, in an effort to cut development time and cost; (iv) providing new detection methodologies to cope with the challenges modelled by the proposed fault models; (v) proposing mitigation strategies focused on tackling such new threat scenarios; and (vi) devising an automated methodology for the deployment of many fault-tolerance mechanisms in a systematic, robust way. The outcomes of the thesis constitute a suite of tools and methods to help the designer of critical systems in the task of developing robust, validated and on-time designs tailored to the application.
    Espinosa García, J. (2016). New Fault Detection, Mitigation and Injection Strategies for Current and Forthcoming Challenges of HW Embedded Designs [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/73146
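
    As a toy illustration of point (ii), the following C sketch injects two classical fault models into the same computation: a transient bit flip (corrupting a single read) versus a permanent stuck-at-1 fault (corrupting every read); the target cell, bit position and golden computation are illustrative, not from the thesis.

    #include <stdio.h>
    #include <stdint.h>

    typedef enum { FAULT_NONE, FAULT_BIT_FLIP, FAULT_STUCK_AT_1 } fault_model_t;

    /* Apply the configured fault model to a value read from the faulty cell */
    static uint32_t faulty_read(uint32_t value, fault_model_t model,
                                unsigned bit, int *flip_done)
    {
        switch (model) {
        case FAULT_BIT_FLIP:               /* transient: corrupt one read */
            if (!*flip_done) { *flip_done = 1; return value ^ (1u << bit); }
            return value;
        case FAULT_STUCK_AT_1:             /* permanent: bit always reads 1 */
            return value | (1u << bit);
        default:
            return value;
        }
    }

    /* Golden computation: sum the cell's value over 4 iterations */
    static uint32_t run(fault_model_t model)
    {
        uint32_t cell = 8, sum = 0;
        int flip_done = 0;
        for (int i = 0; i < 4; i++)
            sum += faulty_read(cell, model, 0, &flip_done);
        return sum;
    }

    int main(void)
    {
        printf("golden:    %u\n", run(FAULT_NONE));       /* 32 */
        printf("bit flip:  %u\n", run(FAULT_BIT_FLIP));   /* 33: one read off */
        printf("stuck-at:  %u\n", run(FAULT_STUCK_AT_1)); /* 36: all reads off */
        return 0;
    }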

    Fehlertolerante Mehrkernprozessoren für gemischt-kritische Echtzeitsysteme

    Current and future computing systems must be appropriately designed to cope with random hardware faults in order to provide a dependable service and correct functionality. Dependability has many facets to be addressed when designing a system, and that is especially challenging in mixed-critical real-time systems, where safety standards play an important role and where responding in time can be as important as responding correctly, or even responding at all. The thesis addresses the dependability of mixed-critical real-time systems, considering three important requirements: integrity, resilience and real-time behaviour. More specifically, it looks into the architectural and performance aspects of achieving dependability, concentrating on error detection and handling in hardware -- more specifically in the Network-on-Chip (NoC), the backbone of modern MPSoCs -- and on the performance of error handling and recovery in software. The thesis starts by looking at the impact of random hardware faults on the NoC and on the system, with special focus on soft errors. It then addresses the uncovered weaknesses in the NoC by proposing a resilient NoC for mixed-critical real-time systems that is able to provide a highly reliable service with transparent protection for the applications. A formal communication-time analysis is provided, with common ARQ protocols modeled for NoCs, including a novel ARQ-based protocol optimized for DMAs. After addressing the efficient use of ARQ-based protocols in NoCs, the thesis proposes the Advanced Integrity Q-service (AIQ), a low-overhead mechanism to achieve integrity and real-time guarantees for NoC transactions on an End-to-End (E2E) basis. Inspired by transactions in distributed systems, the mechanism differs from the previous approach in that it does not provide error recovery in hardware but delegates the task to software, making use of existing functionality in cross-layer fault-tolerance solutions. Finally, the thesis addresses error handling in software as seen in cross-layer approaches. It considers the performance of replicated software execution on many-core platforms. Replicated software execution protects the system against random hardware faults; it relies on hardware-supported error detection and on error handling in software. Replica-aware co-scheduling is proposed to achieve high performance with replicated execution, which is not possible with standard real-time schedulers.
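
    The end-to-end idea behind AIQ-style protection can be sketched in a few lines of C, under the assumption of a simple flit format: the sender attaches a sequence number and a checksum to each NoC transaction, the receiver checks both, and recovery is delegated to software; the frame layout, checksum and link-error model are illustrative, not the actual DREAMS/AIQ design.

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct {
        uint8_t  seq;       /* end-to-end sequence number */
        uint32_t payload;
        uint16_t checksum;  /* ones'-complement sum as a stand-in CRC */
    } flit_t;

    static uint16_t cksum(const flit_t *f)
    {
        uint32_t s = f->seq + (f->payload & 0xFFFFu) + (f->payload >> 16);
        while (s >> 16)
            s = (s & 0xFFFFu) + (s >> 16);
        return (uint16_t)~s;
    }

    /* Unreliable NoC link: occasionally corrupts the payload in transit */
    static flit_t link_transfer(flit_t f)
    {
        if (rand() % 4 == 0)
            f.payload ^= 0x40u;            /* injected soft error */
        return f;
    }

    int main(void)
    {
        uint8_t expect_seq = 0;
        for (uint8_t seq = 0; seq < 4; seq++) {
            flit_t tx = { seq, 0xCAFE0000u + seq, 0 };
            tx.checksum = cksum(&tx);

            flit_t rx = link_transfer(tx);

            /* Receiver-side E2E check: integrity and ordering; recovery
             * (retransmission) would be requested by software */
            if (cksum(&rx) != rx.checksum || rx.seq != expect_seq)
                printf("seq %u: error detected, software retransmits\n", seq);
            else
                printf("seq %u: payload 0x%08x accepted\n",
                       seq, (unsigned)rx.payload);
            expect_seq++;   /* a real ARQ advances only on success */
        }
        return 0;
    }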