115 research outputs found

    ZuverlÀssige und Energieeffiziente gemischt-kritische Echtzeit On-Chip Systeme

    Get PDF
    Multi- and many-core embedded systems are increasingly becoming the target for many applications that require high performance under varying conditions. A resulting challenge is the control, and reliable operation of such complex multiprocessing architectures under changes, e.g., high temperature and degradation. In mixed-criticality systems where many applications with varying criticalities are consolidated on the same execution platform, fundamental isolation requirements to guarantee non-interference of critical functions are crucially important. While Networks-on-Chip (NoCs) are the prevalent solution to provide scalable and efficient interconnects for the multiprocessing architectures, their associated energy consumption has immensely increased. Specifically, hard real-time NoCs must manifest limited energy consumption as thermal runaway in such a core shared resource jeopardizes the whole system guarantees. Thus, dynamic energy management of NoCs, as opposed to the related work static solutions, is highly necessary to save energy and decrease temperature, while preserving essential temporal requirements. In this thesis, we introduce a centralized management to provide energy-aware NoCs for hard real-time systems. The design relies on an energy control network, developed on top of an existing switch arbitration network to allow isolation between energy optimization and data transmission. The energy control layer includes local units called Power-Aware NoC controllers that dynamically optimize NoC energy depending on the global state and applications’ temporal requirements. Furthermore, to adapt to abnormal situations that might occur in the system due to degradation, we extend the concept of NoC energy control to include the entire system scope. That is, online resource management employing hierarchical control layers to treat system degradation (imminent core failures) is supported. The mechanism applies system reconfiguration that involves workload migration. For mixed-criticality systems, it allows flexible boundaries between safety-critical and non-critical subsystems to safely apply the reconfiguration, preserving fundamental safety requirements and temporal predictability. Simulation and formal analysis-based experiments on various realistic usecases and benchmarks are conducted showing significant improvements in NoC energy-savings and in treatment of system degradation for mixed-criticality systems improving dependability over the status quo.Eingebettete Many- und Multi-core-Systeme werden zunehmend das Ziel fĂŒr Anwendungen, die hohe Anfordungen unter unterschiedlichen Bedinungen haben. FĂŒr solche hochkomplexed Multi-Prozessor-Systeme ist es eine grosse Herausforderung zuverlĂ€ssigen Betrieb sicherzustellen, insbesondere wenn sich die UmgebungseinflĂŒsse verĂ€ndern. In Systeme mit gemischter KritikalitĂ€t, in denen viele Anwendungen mit unterschiedlicher KritikalitĂ€t auf derselben AusfĂŒhrungsplattform bedient werden mĂŒssen, sind grundlegende Isolationsanforderungen zur GewĂ€hrleistung der Nichteinmischung kritischer Funktionen von entscheidender Bedeutung. WĂ€hrend On-Chip Netzwerke (NoCs) hĂ€ufig als skalierbare Verbindung fĂŒr die Multiprozessor-Architekturen eingesetzt werden, ist der damit verbundene Energieverbrauch immens gestiegen. Daher sind dynamische Plattformverwaltungen, im Gegensatz zu den statischen, zwingend notwendig, um ein System an die oben genannten VerĂ€nderungen anzupassen und gleichzeitig Timing zu gewĂ€hrleisten. In dieser Arbeit entwickeln wir energieeffiziente NoCs fĂŒr harte Echtzeitsysteme. Das Design basiert auf einem Energiekontrollnetzwerk, das auf einem bestehenden Switch-Arbitration-Netzwerk entwickelt wurde, um eine Isolierung zwischen Energieoptimierung und DatenĂŒbertragung zu ermöglichen. Die Energiesteuerungsschicht umfasst lokale Einheiten, die als Power-Aware NoC-Controllers bezeichnet werden und die die NoC-Energie in AbhĂ€ngigkeit vom globalen Zustand und den zeitlichen Anforderungen der Anwendungen optimieren. DarĂŒber hinaus wird das Konzept der NoC-Energiekontrolle zur Anpassung an Anomalien, die aufgrund von Abnutzung auftreten können, auf den gesamten Systemumfang ausgedehnt. Online- Ressourcenverwaltungen, die hierarchische Kontrollschichten zur Behandlung Abnutzung (drohender KernausfĂ€lle) einsetzen, werden bereitgestellt. Bei Systemen mit gemischter KritikalitĂ€t erlaubt es flexible Grenzen zwischen sicherheitskritischen und unkritischen Subsystemen, um die Rekonfiguration sicher anzuwenden, wobei grundlegende Sicherheitsanforderungen erhalten bleiben und Timing Vorhersehbarkeit. Experimente werden auf der Basis von Simulationen und formalen Analysen zu verschiedenen realistischen Anwendungsfallen und Benchmarks durchgefĂŒhrt, die signifikanten Verbesserungen bei On-Chip Netzwerke-Energieeinsparungen und bei der Behandlung von Abnutzung fĂŒr Systeme mit gemischter KritikalitĂ€t zur Verbesserung die SystemstabilitĂ€t gegenĂŒber dem bisherigen Status quo zeigen

    Fault management via dynamic reconfiguration for integrated modular avionics

    Get PDF
    The purpose of this research is to investigate fault management methodologies within Integrated Modular Avionics (IMA) systems, and develop techniques by which the use of dynamic reconfiguration can be implemented to restore higher levels of systems redundancy in the event of a systems fault. A proposed concept of dynamic configuration has been implemented on a test facility that allows controlled injection of common faults to a representative IMA system. This facility allows not only the observation of the response of the system management activities to manage the fault, but also analysis of real time data across the network to ensure distributed control activities are maintained. IMS technologies have evolved as a feasible direction for the next generation of avionic systems. Although federated systems are logical to design, certify and implement, they have some inherent limitations that are not cost beneficial to the customer over long life-cycles of complex systems, and hence the fundamental modular design, i.e. common processors running modular software functions, provides a flexibility in terms of configuration, implementation and upgradability that cannot be matched by well-established federated avionic system architectures. For example, rapid advances of computing technology means that dedicated hardware can become outmoded by component obsolescence which almost inevitably makes replacements unavailable during normal life-cycles of most avionic systems. To replace the obsolete part with a newer design involves a costly re-design and re-certification of any relevant or interacting functions with this unit. As such, aircraft are often known to go through expensive mid-life updates to upgrade all avionics systems. In contrast, a higher frequency of small capability upgrades would maximise the product performance, including cost of development and procurement, in constantly changing platform deployment environments. IMA is by no means a new concept and work has been carried out globally in order to mature the capability. There are even examples where this technology has been implemented as subsystems on service aircraft. However, IMA flexible configuration properties are yet to be exploited to their full extent; it is feasible that identification of faults or failures within the system would lead to the exploitation of these properties in order to dynamically reconfigure and maintain high levels of redundancy in the event of component failure. It is also conceivable to install redundant components such that an IMS can go through a process of graceful degradation, whereby the system accommodates a number of active failures, but can still maintain appropriate levels of reliability and service. This property extends the average maintenance-free operating period, ensuring that the platform has considerably less unscheduled down time and therefore increased availability. The content of this research work involved a number of key activities in order to investigate the feasibility of the issues outlined above. The first was the creation of a representative IMA system and the development of a systems management capability that performs the required configuration controls. The second aspect was the development of hardware test rig in order to facilitate a tangible demonstration of the IMA capability. A representative IMA was created using LabVIEW Embedded Tool Suit (ETS) real time operating system for minimal PC systems. Although this required further code written to perform IMS middleware functions and does not match up to the stringent air safety requirements, it provided a suitable test bed to demonstrate systems management capabilities. The overall IMA was demonstrated with a 100kg scale Maglev vehicle as a test subject. This platform provides a challenging real-time control problem, analogous to an aircraft flight control system, requiring the calculation of parallel control loops at a high sampling rate in order to maintain magnetic suspension. Although the dynamic properties of the test rig are not as complex as a modern aircraft, it has much less stringent operating requirements and therefore substantially less risk associated with failure to provide service. The main research contributions for the PhD are: 1.A solution for the dynamic reconfiguration problem for assigning required systems functions (namely a distributed, real-time control function with redundant processing channels) to available computing resources whilst protecting the functional concurrency and time critical needs of the control actions. 2.A systems management strategy that utilises the dynamic reconfiguration properties of an IMA System to restore high levels of redundancy in the presence of failures. The conclusion summarises the level of success of the implemented system in terms of an appropriate dynamic reconfiguration to the response of a fault signal. In addition, it highlights the issues with using an IMA to as a solution to operational goals of the target hardware, in terms of design and build complexity, overhead and resources

    New Design Techniques for Dynamic Reconfigurable Architectures

    Get PDF
    L'abstract Ăš presente nell'allegato / the abstract is in the attachmen

    Mixed-Criticality Systems on Commercial-Off-the-Shelf Multi-Processor Systems-on-Chip

    Get PDF
    Avionics and space industries are struggling with the adoption of technologies like multi-processor system-on-chips (MPSoCs) due to strict safety requirements. This thesis propose a new reference architecture for MPSoC-based mixed-criticality systems (MCS) - i.e., systems integrating applications with different level of criticality - which are a common use case for aforementioned industries. This thesis proposes a system architecture capable of granting partitioning - which is, for short, the property of fault containment. It is based on the detection of spatial and temporal interference, and has been named the online detection of interference (ODIn) architecture. Spatial partitioning requires that an application is not able to corrupt resources used by a different application. In the architecture proposed in this thesis, spatial partitioning is implemented using type-1 hypervisors, which allow definition of resource partitions. An application running in a partition can only access resources granted to that partition, therefore it cannot corrupt resources used by applications running in other partitions. Temporal partitioning requires that an application is not able to unexpectedly change the execution time of other applications. In the proposed architecture, temporal partitioning has been solved using a bounded interference approach, composed of an offline analysis phase and an online safety net. The offline phase is based on a statistical profiling of a metric sensitive to temporal interference’s, performed in nominal conditions, which allows definition of a set of three thresholds: 1. the detection threshold TD; 2. the warning threshold TW ; 3. the α threshold. Two rules of detection are defined using such thresholds: Alarm rule When the value of the metric is above TD. Warning rule When the value of the metric is in the warning region [TW ;TD] for more than α consecutive times. ODIn’s online safety-net exploits performance counters, available in many MPSoC architectures; such counters are configured at bootstrap to monitor the selected metric(s), and to raise an interrupt request (IRQ) in case the metric value goes above TD, implementing the alarm rule. The warning rule is implemented in a software detection module, which reads the value of performance counters when the monitored task yields control to the scheduler and reset them if there is no detection. ODIn also uses two additional detection mechanisms: 1. a control flow check technique, based on compile-time defined block signatures, is implemented through a set of watchdog processors, each monitoring one partition. 2. a timeout is implemented through a system watchdog timer (SWDT), which is able to send an external signal when the timeout is violated. The recovery actions implemented in ODIn are: ‱ graceful degradation, to react to IRQs of WDPs monitoring non-critical applications or to warning rule violations; it temporarily stops non-critical applications to grant resources to the critical application; ‱ hard recovery, to react to the SWDT, to the WDP of the critical application, or to alarm rule violations; it causes a switch to a hot stand-by spare computer. Experimental validation of ODIn was performed on two hardware platforms: the ZedBoard - dual-core - and the Inventami board - quad-core. A space benchmark and an avionic benchmark were implemented on both platforms, composed by different modules as showed in Table 1 Each version of the final application was evaluated through fault injection (FI) campaigns, performed using a specifically designed FI system. There were three types of FI campaigns: 1. HW FI, to emulate single event effects; 2. SW FI, to emulate bugs in non-critical applications; 3. artificial bug FI, to emulate a bug in non-critical applications introducing unexpected interference on the critical application. Experimental results show that ODIn is resilient to all considered types of faul

    Characterization of Interconnection Delays in FPGAS Due to Single Event Upsets and Mitigation

    Get PDF
    RÉSUMÉ L’utilisation incessante de composants Ă©lectroniques Ă  gĂ©omĂ©trie toujours plus faible a engendrĂ© de nouveaux dĂ©fis au fil des ans. Par exemple, des semi-conducteurs Ă  mĂ©moire et Ă  microprocesseur plus avancĂ©s sont utilisĂ©s dans les systĂšmes avioniques qui prĂ©sentent une susceptibilitĂ© importante aux phĂ©nomĂšnes de rayonnement cosmique. L'une des principales implications des rayons cosmiques, observĂ©e principalement dans les satellites en orbite, est l'effet d'Ă©vĂ©nements singuliers (SEE). Le rayonnement atmosphĂ©rique suscite plusieurs prĂ©occupations concernant la sĂ©curitĂ© et la fiabilitĂ© de l'Ă©quipement avionique, en particulier pour les systĂšmes qui impliquent des rĂ©seaux de portes programmables (FPGA). Les FPGA Ă  base de cellules de mĂ©moire statique (SRAM) prĂ©sentent une solution attrayante pour mettre en oeuvre des systĂšmes complexes dans le domaine de l’avionique. Les expĂ©riences de rayonnement rĂ©alisĂ©es sur les FPGA ont dĂ©voilĂ© la vulnĂ©rabilitĂ© de ces dispositifs contre un type particulier de SEE, Ă  savoir, les Ă©vĂ©nements singuliers de changement d’état (SEU). Un SEU est considĂ©rĂ©e comme le changement de l'Ă©tat d'un Ă©lĂ©ment bistable (c'est-Ă -dire, un bit-flip) dĂ» Ă  l'effet d'un ion, d'un proton ou d’un neutron Ă©nergĂ©tique. Cet effet est non destructif et peut ĂȘtre corrigĂ© en rĂ©Ă©crivant la partie de la SRAM affectĂ©e. Les changements de dĂ©lai (DC) potentiels dus aux SEU affectant la mĂ©moire de configuration de routage ont Ă©tĂ© rĂ©cemment confirmĂ©s. Un des objectifs de cette thĂšse consiste Ă  caractĂ©riser plus prĂ©cisĂ©ment les DC dans les FPGA causĂ©s par les SEU. Les DC observĂ©s expĂ©rimentalement sont prĂ©sentĂ©s et la modĂ©lisation au niveau circuit de ces DC est proposĂ©e. Les circuits impliquĂ©s dans la propagation du dĂ©lai sont validĂ©s en effectuant une modĂ©lisation prĂ©cise des blocs internes Ă  l'intĂ©rieur du FPGA et en exĂ©cutant des simulations. Les rĂ©sultats montrent l’origine des DC qui sont en accord avec les mesures expĂ©rimentales de dĂ©lais. Les modĂšles proposĂ©s au niveau circuit sont, aux meilleures de notre connaissance, le premier travail qui confirme et explique les dĂ©lais combinatoires dans les FPGA. La conception d'un circuit moniteur de dĂ©lai pour la dĂ©tection des DC a Ă©tĂ© faite dans la deuxiĂšme partie de cette thĂšse. Ce moniteur permet de dĂ©tecter un changement de dĂ©lai sur les sections critiques du circuit et de prĂ©venir les pannes de synchronisation engendrĂ©es par les SEU sans utiliser la redondance modulaire triple (TMR).----------ABSTRACT The unrelenting demand for electronic components with ever diminishing feature size have emerged new challenges over the years. Among them, more advanced memory and microprocessor semiconductors are being used in avionic systems that exhibit a substantial susceptibility to cosmic radiation phenomena. One of the main implications of cosmic rays, which was primarily observed in orbiting satellites, is single-event effect (SEE). Atmospheric radiation causes several concerns regarding the safety and reliability of avionics equipment, particularly for systems that involve field programmable gate arrays (FPGA). SRAM-based FPGAs, as an attractive solution to implement systems in aeronautic sector, are very susceptible to SEEs in particular Single Event Upset (SEU). An SEU is considered as the change of the state of a bistable element (i.e., bit-flip) due to the effect of an energetic ion or proton. This effect is non-destructive and may be fixed by rewriting the affected part. Sensitivity evaluation of SRAM-based FPGAs to a physical impact such as potential delay changes (DC) has not been addressed thus far in the literature. DCs induced by SEU can affect the functionality of the logic circuits by disturbing the race condition on critical paths. The objective of this thesis is toward the characterization of DCs in SRAM-based FPGAs due to transient ionizing radiation. The DCs observed experimentally are presented and the circuit-level modeling of those DCs is proposed. Circuits involved in delay propagation are reverse-engineered by performing precise modeling of internal blocks inside the FPGA and executing simulations. The results show the root cause of DCs that are in good agreement with experimental delay measurements. The proposed circuit level models are, to the best of our knowledge, the first work on modeling of combinational delays in FPGAs.In addition, the design of a delay monitor circuit for DC detection is investigated in the second part of this thesis. This monitor allowed to show experimentally cumulative DCs on interconnects in FPGA. To this end, by avoiding the use of triple modular redundancy (TMR), a mitigation technique for DCs is proposed and the system downtime is minimized. A method is also proposed to decrease the clock frequency after DC detection without interrupting the process

    Network-on-Chip -based Multi-Processor System-on-Chip: Towards Mixed-Criticality System Certification

    Get PDF
    L'abstract Ăš presente nell'allegato / the abstract is in the attachmen

    Fault Tolerant Electronic System Design

    Get PDF
    Due to technology scaling, which means reduced transistor size, higher density, lower voltage and more aggressive clock frequency, VLSI devices may become more sensitive against soft errors. Especially for those devices used in safety- and mission-critical applications, dependability and reliability are becoming increasingly important constraints during the development of system on/around them. Other phenomena (e.g., aging and wear-out effects) also have negative impacts on reliability of modern circuits. Recent researches show that even at sea level, radiation particles can still induce soft errors in electronic systems. On one hand, processor-based system are commonly used in a wide variety of applications, including safety-critical and high availability missions, e.g., in the automotive, biomedical and aerospace domains. In these fields, an error may produce catastrophic consequences. Thus, dependability is a primary target that must be achieved taking into account tight constraints in terms of cost, performance, power and time to market. With standards and regulations (e.g., ISO-26262, DO-254, IEC-61508) clearly specify the targets to be achieved and the methods to prove their achievement, techniques working at system level are particularly attracting. On the other hand, Field Programmable Gate Array (FPGA) devices are becoming more and more attractive, also in safety- and mission-critical applications due to the high performance, low power consumption and the flexibility for reconfiguration they provide. Two types of FPGAs are commonly used, based on their configuration memory cell technology, i.e., SRAM-based and Flash-based FPGA. For SRAM-based FPGAs, the SRAM cells of the configuration memory highly susceptible to radiation induced effects which can leads to system failure; and for Flash-based FPGAs, even though their non-volatile configuration memory cells are almost immune to Single Event Upsets induced by energetic particles, the floating gate switches and the logic cells in the configuration tiles can still suffer from Single Event Effects when hit by an highly charged particle. So analysis and mitigation techniques for Single Event Effects on FPGAs are becoming increasingly important in the design flow especially when reliability is one of the main requirements

    Run-time reconfigurable, fault-tolerant FPGA systems for space applications

    Get PDF
    Cozzi D. Run-time reconfigurable, fault-tolerant FPGA systems for space applications. Bielefeld: UniversitÀt Bielefeld; 2016.The aim of this thesis is to investigate the use of Dynamic Partial Reconfiguration (DPR) on Commercial Off-the-Shelf (COTS) FPGAs in space applications. Reconfigurable systems gained interest in a wide range of application fields, including aerospace, where electronic devices are exposed to a harsh working environment. COTS SRAM-based FPGA devices represent an interesting hardware platform for this kind of systems since they combine low cost with the possibility to utilize state-of-the-art processing power as well as the flexibility of reconfigurable hardware. FPGA architectures have high computational power and thanks to their ability to be reconfigured at run-time, they became interesting candidates for payload processing in space applications. The presented Dynamic Reconfigurable Processing Module (DRPM) has been developed to investigate the use of the DPR approach for satellite payload processing. This scalable platform combines dynamically reconfigurable FPGAs with the required avionic interfaces (e.g., SpaceWire, MIL-STD-1553B, and SpaceFibre). In particular, a novel communication interface has been developed, the Heterogeneous Multi Processor Communication Interface (HMPCI), which allows inter-process communication with small latency and low memory footprint. Current synthesis tools do not support fully the DPR capabilities of FPGAs. Therefore, this thesis introduces INDRA 2.0: an INtegrated Design flow for Reconfigurable Architectures. The key part of INDRA 2.0 is DHHarMa: a Design flow for Homogeneous Hard Macros, which generates homogeneous hard macros for Xilinx FPGAs starting from a high-level description (e.g., VHDL). In particular, the homogeneous DHHarMa router is explained in detail, providing novel terminologies and algorithms, which have enabled the generation of homogeneous routed designs. Results have been shown that Design flow for Homogeneous Hard Macros (DHHarMa) can route homogeneously a communication infrastructure utilizing just between 1% and 31% more resources than the Xilinx router, which cannot provide a homogeneous solution. Furthermore, the permanent faults that can occur on FPGAs have been investigated. This thesis presents OLT(RE)2: an on-line on-demand approach to testing permanent faults induced by radiation in reconfigurable systems used in space missions. The proposed approach relies on a test circuit and custom placer and router. OLT(RE)2 exploits DPR to place the test circuits at run-time. Its goal is to test unprogrammed areas of the FPGA before using them. Experimental results of OLT(RE)2 have shown that is possible to generate, place, and route the test circuits needed to detect on average more than 99 % of the physical wires and on average about 97 % of the programmable interconnection points of a large arbitrary region of the FPGA in a reasonable time. Moreover, the test can be run on the target device without interfering the functional behavior of the system
    • 

    corecore