5 research outputs found

    Cache Predictability and Performance Improvement in ARINC-653 Compliant Systems

    Due to their energy efficiency and their capability to be miniaturized, multi-core processors have been developed to replace the less cost-effective single-core architectures. Since around the year 2000, processor manufacturers have slowly stopped producing single-core processors. This raises an issue for avionic system designers: in these critical systems, designers use processors that have proven their reliability over time, yet none of those processors are multi-core. To keep their systems up to date, aerospace system designers will have to make the transition to multi-core architectures.

    This transition brings many challenges for system designers and integrators. Current single-core applications may not be fully portable to multi-core systems, so developers will have to make sure the transition is possible for such applications. Multi-core CPUs offer the ability to execute multiple tasks in parallel. From this ability, new behaviors may induce delays, bugs and undefined behavior that may result in general system failure. In critical systems, where safety is crucial, such behavior is unacceptable.

    Safety-critical systems have to comply with multiple standards and guidance documents to ensure their reliability. One of the standards real-time operating system developers may rely on is ARINC-653. This standard introduces the concept of partitioned systems, in which each partition runs independently and should never be able to modify or impact the behavior of another partition. This concept ensures that if one partition comes to misbehave, the system's integrity is not impacted. In multi-core systems, multiple applications can run in parallel and access hardware and software resources at the same time. This concurrency, if not correctly managed, introduces execution-time delays, loss of performance and unwanted behavior. We call interference any behavior introduced by this contention on the resources shared by different cores or partitions. When concurrent accesses to shared components occur, arbitration has to be done to ensure data integrity. In most cases, this arbitration is the cause of interference. However, other sources of interference exist. For instance, if two partitions share the same cache, one partition may evict the other partition's data from the cache, leading to unpredictable delays the next time the affected partition needs to access the evicted data.

    In this thesis, we explore methods to prevent cache line evictions in private processor caches, a method we call cache locking. This reduces the number of cache misses occurring at the private level, which in turn reduces the number of accesses made to the lower memory levels and the interference related to them. We introduce a framework capable of profiling the memory accesses made by applications, and we propose a cache content selection algorithm that selects the cache lines to be locked to reduce cache misses in private caches. We present the tool and the associated processing pipeline, from memory profiling to the generation of the cache locking configuration tables.

    We validated our approach on simulated and actual hardware and used a proprietary ARINC-653 compliant system to conduct our experiments. The results obtained are encouraging and help in understanding the impact of private caches and cache locking methods on reducing multi-core interference in safety-critical systems.
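    The abstract describes a profiling tool and a cache content selection algorithm but not the selection criterion itself. As a minimal illustrative sketch in C (the greedy most-frequently-accessed criterion, all names, and the fixed lockable-line budget are assumptions for illustration, not the thesis's actual algorithm), the selection stage might rank profiled lines by access count and emit the hottest ones for locking:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* One profiled cache line: its address and how often the partition
 * accessed it according to the memory-trace profile. */
typedef struct {
    uint64_t line_addr;
    uint64_t access_count;
} profile_entry_t;

static int by_count_desc(const void *a, const void *b)
{
    const profile_entry_t *x = a, *y = b;
    if (x->access_count != y->access_count)
        return (y->access_count > x->access_count) ? 1 : -1;
    return 0;
}

/* Illustrative greedy criterion (not the thesis's algorithm): lock the
 * most frequently accessed lines, up to the number of lockable lines the
 * private cache offers. The selected lines would then be written into a
 * locking configuration table consumed at partition start-up. */
static size_t select_locked_lines(profile_entry_t *prof, size_t n,
                                  size_t lockable_lines)
{
    qsort(prof, n, sizeof *prof, by_count_desc);
    size_t k = (n < lockable_lines) ? n : lockable_lines;
    for (size_t i = 0; i < k; i++)
        printf("lock line 0x%llx (hits: %llu)\n",
               (unsigned long long)prof[i].line_addr,
               (unsigned long long)prof[i].access_count);
    return k;
}

int main(void)
{
    profile_entry_t prof[] = {
        { 0x80001000, 950 }, { 0x80001040, 120 },
        { 0x80002000, 700 }, { 0x80002040,  40 },
    };
    select_locked_lines(prof, 4, 2);   /* budget: 2 lockable lines */
    return 0;
}
```

    On real hardware the locking itself would rely on platform-specific mechanisms (for example, way-locking control registers), which this sketch deliberately leaves out.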

    Integrated Modular Avionics (IMA) Partition Scheduling with Conflict-Free I/O for Multicore Avionics Systems


    Fault management via dynamic reconfiguration for integrated modular avionics

    The purpose of this research is to investigate fault-management methodologies within Integrated Modular Avionics (IMA) systems, and to develop techniques by which dynamic reconfiguration can be used to restore higher levels of system redundancy in the event of a system fault. A proposed concept of dynamic reconfiguration has been implemented on a test facility that allows controlled injection of common faults into a representative IMA system. This facility allows not only observation of the response of the system-management activities to the fault, but also analysis of real-time data across the network to ensure that distributed control activities are maintained.

    IMS technologies have evolved as a feasible direction for the next generation of avionic systems. Although federated systems are logical to design, certify and implement, they have inherent limitations that are not cost-beneficial to the customer over the long life-cycles of complex systems. Hence the fundamental modular design, i.e. common processors running modular software functions, provides a flexibility in terms of configuration, implementation and upgradability that cannot be matched by well-established federated avionic system architectures. For example, the rapid advance of computing technology means that dedicated hardware can become outmoded by component obsolescence, which almost inevitably makes replacements unavailable during the normal life-cycle of most avionic systems. Replacing the obsolete part with a newer design involves a costly re-design and re-certification of any functions relevant to or interacting with that unit. As such, aircraft are often known to go through expensive mid-life updates to upgrade all avionics systems. In contrast, a higher frequency of small capability upgrades would maximise product performance, including the cost of development and procurement, in constantly changing platform deployment environments.

    IMA is by no means a new concept, and work has been carried out globally to mature the capability. There are even examples where this technology has been implemented as subsystems on in-service aircraft. However, IMA's flexible configuration properties are yet to be exploited to their full extent: it is feasible that identification of faults or failures within the system could lead to the exploitation of these properties to dynamically reconfigure and maintain high levels of redundancy in the event of component failure. It is also conceivable to install redundant components such that an IMS can go through a process of graceful degradation, whereby the system accommodates a number of active failures but can still maintain appropriate levels of reliability and service. This property extends the average maintenance-free operating period, ensuring that the platform has considerably less unscheduled downtime and therefore increased availability.

    This research involved a number of key activities to investigate the feasibility of the issues outlined above. The first was the creation of a representative IMA system and the development of a systems-management capability that performs the required configuration controls. The second was the development of a hardware test rig to facilitate a tangible demonstration of the IMA capability. A representative IMA was created using the LabVIEW Embedded Tool Suite (ETS) real-time operating system for minimal PC systems. Although this required further code to be written to perform IMS middleware functions, and does not match up to the stringent air-safety requirements, it provided a suitable test bed to demonstrate systems-management capabilities. The overall IMA was demonstrated with a 100 kg scale Maglev vehicle as a test subject. This platform provides a challenging real-time control problem, analogous to an aircraft flight-control system, requiring the calculation of parallel control loops at a high sampling rate to maintain magnetic suspension. Although the dynamic properties of the test rig are not as complex as those of a modern aircraft, it has much less stringent operating requirements and therefore substantially less risk associated with failure to provide service.

    The main research contributions of the PhD are:
    1. A solution to the dynamic reconfiguration problem of assigning required system functions (namely a distributed, real-time control function with redundant processing channels) to available computing resources while protecting the functional concurrency and time-critical needs of the control actions (see the sketch after this abstract).
    2. A systems-management strategy that utilises the dynamic reconfiguration properties of an IMA system to restore high levels of redundancy in the presence of failures.

    The conclusion summarises the level of success of the implemented system in terms of an appropriate dynamic reconfiguration in response to a fault signal. In addition, it highlights the issues with using an IMA as a solution to the operational goals of the target hardware, in terms of design and build complexity, overhead and resources.
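    Contribution 1 concerns reassigning functions to surviving computing resources after a fault. A minimal sketch of that idea in C follows; the module and function descriptors, the capacity model, and the worst-fit heuristic are all assumptions chosen for illustration, and the thesis's actual solution additionally protects functional concurrency and time-critical behavior:

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_MODULES   4
#define NUM_FUNCTIONS 6

/* Hypothetical processing-module descriptor: spare_capacity is an
 * abstract budget (e.g., percent CPU) available for hosted functions. */
typedef struct {
    bool healthy;
    int  spare_capacity;
} module_t;

/* Hypothetical function descriptor: its load and current host module. */
typedef struct {
    int load;     /* capacity units consumed by this function */
    int host;     /* index of the module currently hosting it */
} function_t;

static module_t   modules[NUM_MODULES]     = {
    { true, 20 }, { true, 40 }, { true, 30 }, { true, 50 }
};
static function_t functions[NUM_FUNCTIONS] = {
    { 10, 0 }, { 20, 0 }, { 15, 1 }, { 25, 2 }, { 10, 3 }, { 30, 3 }
};

/* On a detected module fault, migrate each hosted function to the healthy
 * module with the most spare capacity that can still absorb its load
 * (a simple worst-fit heuristic, used here only for illustration). */
static bool reconfigure_after_fault(int failed)
{
    modules[failed].healthy = false;
    for (int f = 0; f < NUM_FUNCTIONS; f++) {
        if (functions[f].host != failed)
            continue;
        int best = -1;
        for (int m = 0; m < NUM_MODULES; m++) {
            if (!modules[m].healthy ||
                modules[m].spare_capacity < functions[f].load)
                continue;
            if (best < 0 ||
                modules[m].spare_capacity > modules[best].spare_capacity)
                best = m;
        }
        if (best < 0)
            return false;   /* no feasible host: system must degrade */
        modules[best].spare_capacity -= functions[f].load;
        functions[f].host = best;
        printf("function %d migrated to module %d\n", f, best);
    }
    return true;
}

int main(void)
{
    return reconfigure_after_fault(0) ? 0 : 1;
}
```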

    Timing analysis in existing and emerging cyber physical systems

    A main mission of safety-critical cyber-physical systems is to guarantee timing correctness. Examples of safety-critical systems are avionic, automotive or medical systems, in which timing violations could have disastrous effects, from loss of human life to damage to machines and/or the environment. Over the past decade, multicore processors have become increasingly common for their potential efficiency, which has made new single-core processors relatively scarce and created a pressing need to transition to multicore processors. However, existing safety-critical software that has been certified on single-core processors is not allowed to be fielded on a multicore system as is. The issue stems from serious inter-core interference on shared resources in current multicore processors, which creates non-deterministic timing behavior. Since meeting timing constraints is the crucial requirement of safety-critical real-time systems, the use of more than one core in a multicore chip has not yet been certified by the authorities. Academia has paid relatively little attention to non-determinism due to uncoordinated I/O communications, as compared with other resources such as cache or memory, although industry considers it one of the most troublesome challenges. Hence we focused on I/O synchronization, requiring no Worst-Case Execution Time (WCET) information, which can be impacted by other interference sources.

    Traditionally, two-level scheduling, such as in Integrated Modular Avionics (IMA) systems, has been used to provide temporal isolation. However, such hierarchical approaches introduce significant priority inversions across applications, especially in multicore systems, ultimately leading to lower system utilization. To address these issues, we have proposed a novel scheduling mechanism called budgeted generalized rate-monotonic analysis (Budgeted GRMS), in which different applications' tasks are globally scheduled to avoid unnecessary priority inversions, yet the CPU resource is still partitioned for temporal isolation among applications. By addressing the lack of WCET information together with I/O synchronization, this new scheduling paradigm enables the "safe" use of multicore processors in safety-critical real-time systems.

    Recently, emerging Internet of Things (IoT) and Smart City applications are becoming part of cyber-physical systems, as the need for them grows and their feasibility becomes visible. The promises and challenges arising from IoT and Smart City applications are providing new research landscapes and opportunities and fundamentally transforming real-time scheduling. In traditional real-time systems, an instance of a program execution (a process) is the scheduling entity, while in the emerging applications the fundamental schedulable units are chunks of data transported over communication media. Another transformation is that, in IoT and Smart City applications, there are multiple options and combinations to utilize and schedule, since heterogeneous sensing devices are massively deployed. This is contrary to existing real-time work, which is given a fixed task set to analyze. For that reason, these applications also suggest variants of performance or quality optimization problems. Consider, for example, a disaster-response infrastructure in a troubled area intended to ensure the safety of humanitarian missions. Cameras and other sensors are deployed along key routes to monitor local conditions, but they are turned off by default and turned on on demand to save limited battery life. To determine a safe route for delivering humanitarian shipments, a decision-maker must collect reconnaissance information and schedule the data items to support timely decision-making. Such data items, acquired from the time-evolving physical world, are in general time-sensitive: a retrieved item may become stale and no longer accurate or relevant as conditions in the physical environment change. Therefore, "when to acquire" affects the performance and correctness of such applications, and thus overall system safety, and data timeliness should be carefully considered. For the addressed problem, we explored various algorithmic options for maximizing quality of information and developed the optimal algorithm for ordering the retrieval of data items to support multiple decisions. I believe this is a significant initial step toward expanding the timing-safety research landscape and opportunities in the emerging CPS area.
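    Budgeted GRMS builds on generalized rate-monotonic analysis. As background, here is a minimal sketch in C of the classical Liu and Layland utilization-bound test for single-core rate-monotonic scheduling; this is the textbook sufficient test, not the thesis's budgeted multicore variant:

```c
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Periodic task: worst-case execution time C and period T (same unit). */
typedef struct {
    double wcet;
    double period;
} task_t;

/* Liu & Layland sufficient test for rate-monotonic scheduling on one
 * core: the task set is schedulable if total utilization U satisfies
 * U <= n(2^(1/n) - 1). This is the classical bound underlying GRMS,
 * not the budgeted variant proposed in the thesis. */
static bool rm_schedulable(const task_t *tasks, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += tasks[i].wcet / tasks[i].period;
    double bound = n * (pow(2.0, 1.0 / n) - 1.0);
    printf("U = %.3f, bound = %.3f\n", u, bound);
    return u <= bound;
}

int main(void)
{
    task_t set[] = { { 1.0, 4.0 }, { 2.0, 8.0 }, { 1.0, 16.0 } };
    /* U = 0.25 + 0.25 + 0.0625 = 0.5625; the bound for n = 3 is ~0.780,
     * so this set passes the sufficient test. */
    return rm_schedulable(set, 3) ? 0 : 1;
}
```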

    Combining Hardware Management with Mixed-Criticality Provisioning in Multicore Real-Time Systems

    Safety-critical applications in cyber-physical domains such as avionics and automotive systems require strict timing constraints, because loss of life or severe financial repercussions may occur if they fail to produce correct outputs at the right moment. We call such systems "real-time systems." When designing a real-time system, a multicore platform is desirable because such platforms have advantages in size, weight, and power, especially in embedded systems. However, the multicore revolution is having limited impact in safety-critical application domains. A key reason is the "one-out-of-m" problem: when validating real-time constraints on an m-core platform, excessive analysis pessimism can effectively negate the processing capacity of the additional m-1 cores, so that only "one core's worth" of capacity is available. The root of this problem is that shared hardware resources are not predictably managed. Two approaches have been investigated previously to address this problem: mixed-criticality analysis, which provisions less-critical software components less pessimistically, and hardware-management techniques, which make the underlying platform itself more predictable.

    The goal of the research presented in this dissertation is to combine both approaches to reduce the capacity loss caused by contention for shared hardware resources in multicore platforms. Towards that goal, fundamentally new criticality-cognizant hardware-management tradeoffs must be explored. Such tradeoffs are investigated in the context of a new variant of a mixed-criticality framework, called MC2, that supports configurable criticality-based hardware management. This framework allows specific DRAM banks and areas of the last-level cache to be allocated to certain groups of tasks to provide criticality-aware isolation. MC2 is further extended to support the sharing of memory locations, which is required to support real-world workloads.

    We evaluate the impact of combining mixed-criticality provisioning and hardware-management techniques with both micro-benchmark experiments and schedulability studies. In our micro-benchmark experiments, we evaluate each hardware-management technique and consider tradeoffs that arise when applying them together. The effectiveness of the overall framework in resolving such tradeoffs is investigated via large-scale overhead-aware schedulability studies. Our results demonstrate that mixed-criticality analysis and hardware-management techniques can be much more effective when applied together than when applied alone.
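    The allocation of DRAM banks and last-level-cache areas described above is typically realized through the physical addresses the OS hands out to each task group (page coloring). A minimal sketch, assuming a hypothetical platform geometry (every constant below is an assumption, not MC2's actual target hardware), of how a physical page's LLC color and DRAM bank could be derived:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical platform geometry (assumptions for illustration):
 * 64-byte cache lines, 2048 LLC sets, 4 KiB pages, and 8 DRAM banks
 * indexed by physical-address bits 13..15. */
#define LINE_BITS   6              /* log2(64)   */
#define SET_BITS    11             /* log2(2048) */
#define PAGE_BITS   12             /* log2(4096) */
#define BANK_SHIFT  13
#define BANK_MASK   0x7u

/* LLC page color: the set-index bits that lie above the page offset.
 * Pages of the same color map to the same slice of LLC sets, so giving
 * each criticality group its own colors partitions the shared cache. */
static unsigned llc_color(uint64_t paddr)
{
    unsigned set = (unsigned)((paddr >> LINE_BITS) & ((1u << SET_BITS) - 1));
    return set >> (PAGE_BITS - LINE_BITS);   /* drop page-offset bits */
}

/* DRAM bank index: taken directly from the assumed address bits, so an
 * allocator can keep criticality groups in disjoint banks. */
static unsigned dram_bank(uint64_t paddr)
{
    return (unsigned)((paddr >> BANK_SHIFT) & BANK_MASK);
}

int main(void)
{
    uint64_t p = 0x12345000ull;    /* example physical page address */
    printf("page 0x%llx: LLC color %u, DRAM bank %u\n",
           (unsigned long long)p, llc_color(p), dram_bank(p));
    return 0;
}
```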