43 research outputs found

    Reducing the WCET and analysis time of systems with simple lockable instruction caches

    One of the key challenges in real-time systems is the analysis of the memory hierarchy. Many Worst-Case Execution Time (WCET) analysis methods that support an instruction cache are based on iterative or convergence algorithms, which are rather slow. Our goal in this paper is to reduce the WCET analysis time on systems with a simple lockable instruction cache, focusing on the Lock-MS method. First, we propose an algorithm to obtain a structure-based representation of the Control Flow Graph (CFG). It organizes the whole WCET problem as nested subproblems, which takes advantage of the common branch-and-bound algorithms of Integer Linear Programming (ILP) solvers. Second, we add support for multiple locking points per task, each one with specific cache contents, instead of a single locked content for the whole task execution. Locking points are set heuristically before outer loops. This simple heuristic adds no complexity and reduces the WCET by taking advantage of the temporal reuse found in loops. Since loops can be processed as isolated regions, the optimal contents to lock into the cache for each region can be obtained, and the WCET analysis time is further reduced. With these two improvements, our WCET analysis is around 10 times faster than other approaches. Our results also show that the WCET is reduced, and that the hit ratio achieved for the lockable instruction cache is similar to that of a real execution with an LRU instruction cache. Finally, we analyze the WCET sensitivity to compiler optimization, showing the right choices for each benchmark and pointing out that O0 is always the worst option.
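    The idea of choosing locked contents separately for each loop region can be illustrated with a minimal sketch. It is not the Lock-MS method, which the abstract says relies on an ILP formulation; a greedy ranking stands in for the solver here. The function names, the per-line fetch counts, the hit/miss costs, and the omission of the preload cost at the locking point are all assumptions made for illustration.

```python
def choose_locked_lines(fetch_counts: dict[int, int], cache_lines: int) -> set[int]:
    """Pick the memory lines to lock for one loop region.

    fetch_counts maps a line address to its (assumed) worst-case number of
    fetches inside the region. Greedy stand-in for an ILP solver: lock the
    most frequently fetched lines, breaking ties by lower address.
    """
    ranked = sorted(fetch_counts, key=lambda line: (-fetch_counts[line], line))
    return set(ranked[:cache_lines])


def region_fetch_cost(fetch_counts: dict[int, int], locked: set[int],
                      hit_cost: int, miss_cost: int) -> int:
    """Worst-case fetch cost of the region: locked lines always hit,
    every other line is charged a miss on each fetch."""
    return sum(count * (hit_cost if line in locked else miss_cost)
               for line, count in fetch_counts.items())


# Hypothetical loop region: two hot lines inside the loop body, one cold line.
region = {0x100: 50, 0x120: 50, 0x140: 1}
locked = choose_locked_lines(region, cache_lines=2)
cost = region_fetch_cost(region, locked, hit_cost=1, miss_cost=10)
```

    Because each region is scored in isolation, the selection for one loop never depends on the contents chosen for another, which mirrors the abstract's point that loops can be processed as isolated regions.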

    Best practice for caching of single-path code

    Single-path code has some unique properties that make it interesting to explore different caching and prefetching alternatives for the instruction stream. In this paper, we explore different cache organizations and how they perform with single-path code.

    Towards Multicore WCET Analysis

    AbsInt is the leading provider of commercial tools for static code-level timing analysis. Its aiT Worst-Case Execution Time Analyzer computes tight bounds for the WCET of tasks in embedded real-time systems. However, the results only incorporate the core-local latencies, i.e., interference delays due to other cores in a multicore system are ignored. This paper presents some of the work we have done towards multicore WCET analysis. We look into both static and measurement-based timing analysis for COTS multicore systems.

    Automatic Safe Data Reuse Detection for the WCET Analysis of Systems With Data Caches

    Worst-case execution time (WCET) analysis of systems with data caches is one of the key challenges in real-time systems. Caches exploit the inherent reuse properties of programs, temporarily storing certain memory contents near the processor so that further accesses to such contents do not require costly memory transfers. Current worst-case data cache analysis methods focus on specific cache organizations (LRU, locked, ACDC, etc.). In this article, we analyze data reuse (in the worst case) as a property of the program, and thus independent of the data cache. Our analysis method uses Abstract Interpretation on the compiled program to extract, for each static load/store instruction, a linear expression for the address pattern of its data accesses, according to the Loop Nest Data Reuse Theory. Each data access expression is compared to that of prior (dominant) memory instructions to verify whether it presents a guaranteed reuse. Our proposal manages references to scalars, arrays, and non-linear accesses, provides both temporal and spatial reuse information, and does not require the exploration of explicit data access sequences. As a proof of concept, we analyze the TACLeBench benchmark suite, showing that most loads/stores present data reuse, and how compiler optimizations affect it. Using a simple hit/miss estimation on our reuse results, the time devoted to data accesses in the worst case is reduced to 27% compared to an always-miss system, equivalent to a data hit ratio of 81%. With compiler optimization, such time is reduced to 6.5%.
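    The relation between the 27% figure and the 81% hit ratio can be explored with a small sketch. The abstract does not state the hit and miss latencies or the exact estimation model, so the two-cost model below (every access is either a hit or a miss, each with a fixed cost) and both function names are assumptions, not the authors' method.

```python
def time_ratio_vs_always_miss(hit_ratio: float, hit_to_miss_cost: float) -> float:
    """Data-access time relative to an always-miss system.

    Assumed model: T = h * t_hit + (1 - h) * t_miss, divided by t_miss,
    where hit_to_miss_cost = t_hit / t_miss.
    """
    return hit_ratio * hit_to_miss_cost + (1.0 - hit_ratio)


def implied_hit_to_miss_cost(hit_ratio: float, time_ratio: float) -> float:
    """Invert the model above: which t_hit / t_miss makes both figures agree?"""
    return 1.0 - (1.0 - time_ratio) / hit_ratio


# Figures quoted in the abstract: 81% hit ratio, time reduced to 27%.
cost_ratio = implied_hit_to_miss_cost(hit_ratio=0.81, time_ratio=0.27)
check = time_ratio_vs_always_miss(hit_ratio=0.81, hit_to_miss_cost=cost_ratio)
```

    Under this assumed model the two quoted figures are consistent when a hit costs roughly a tenth of a miss (0.08 / 0.81 of the miss cost); a different latency model would give a different ratio.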

    A Probabilistically Analyzable Cache to Estimate Timing Bounds

    Modern computer architectures are targeted towards speeding up the average performance of the software running on them. Architectural features such as deep pipelines, branch prediction, out-of-order execution, and multi-level memory hierarchies have an adverse impact on software timing prediction. In particular, it is hard or even impossible to make an accurate estimation of the worst-case execution time (WCET) of a program running on a particular hardware platform. Critical real-time embedded systems (CRTESs), e.g. computing systems in aerospace, require strict timing constraints to guarantee their proper operational behavior. WCET analysis is central to real-time systems development because real-time systems always need to meet their deadlines. In order to meet the deadline requirements, the WCET of the tasks of a real-time system must be determined, and this is only possible if the hardware architecture is time-predictable. Due to the unpredictable nature of modern computing hardware, it is not practical to use advanced computing systems in CRTESs. Real-time systems do not need to meet high-performance requirements. A processor designed to improve average-case performance may not fit the requirements of real-time systems due to predictability issues. Current timing analysis techniques are well established, but require detailed knowledge of the internal operations and the state of the system for both hardware and software. The lack of in-depth knowledge of the architectural operations becomes an obstacle to adopting deterministic timing analysis (DTA) techniques for WCET measurement. Probabilistic timing analysis (PTA) is a technique that has emerged for the timing analysis of next-generation real-time systems. PTA techniques reduce the extent of knowledge of a software execution platform that is needed to perform accurate WCET estimations. In this thesis, we propose the development of a new probabilistically analyzable cache and apply PTA techniques for time prediction. In this work, we implemented a randomized cache for MIPS-32 and Leon-3 processors. We designed and implemented random placement and replacement policies, and applied probabilistic timing techniques to measure the probabilistic WCET (pWCET). We also measured the level of pessimism incurred by the probabilistic techniques and compared it with the deterministic cache configuration. The WCET prediction provided by the PTA techniques is closer to the real execution time of the program. We compared the estimates with measurements done on the processor to help the designer evaluate the level of pessimism introduced by the cache architecture for each probabilistic timing analysis technique. This work makes a first attempt at comparing deterministic, static, and measurement-based probabilistic timing analysis for time prediction under varying cache configurations. We identify the strengths and limitations of each technique for time prediction, and provide guidelines for processor designs that minimize the pessimism associated with the WCET. Our experiments show that the cache fulfills all the requirements for PTA and that program timing can be predicted with arbitrary accuracy. Such a probabilistic computer architecture carries unmatched potential and great promise for next-generation CRTESs.
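    A toy model can show why a random replacement policy turns execution time into a random variable that measurement-based PTA can sample. The sketch below is not the thesis's MIPS-32 or Leon-3 implementation: it assumes a fully associative cache, a trace of abstract line addresses, and made-up hit and miss costs, and the function name is hypothetical.

```python
import random


def run_with_random_replacement(trace, num_lines, rng, hit_cost=1, miss_cost=10):
    """Replay an address trace on a fully associative cache that evicts a
    uniformly random line when full; return the total access cost in cycles."""
    cache = []
    cycles = 0
    for addr in trace:
        if addr in cache:
            cycles += hit_cost
        else:
            cycles += miss_cost
            if len(cache) < num_lines:
                cache.append(addr)                      # fill an empty line
            else:
                cache[rng.randrange(num_lines)] = addr  # random victim
    return cycles


# Hypothetical trace: a 5-line working set looping over a 4-line cache.
trace = [0, 1, 2, 3, 4] * 20
samples = [run_with_random_replacement(trace, 4, random.Random(seed))
           for seed in range(100)]
```

    Each seed gives one observed execution time; a PTA flow would feed a set of such observations into a statistical estimate of the pWCET, a step this sketch leaves out.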

    TIME-PREDICTABLE EXECUTION OF EMBEDDED SOFTWARE ON MULTI-CORE PLATFORMS

    Ph.D. (Doctor of Philosophy)

    Smart hardware designs for probabilistically-analyzable processor architectures

    Future Critical Real-Time Embedded Systems (CRTES), like those in planes, cars or trains, require more and more guaranteed performance in order to satisfy the increasing performance demands of advanced complex software features. While increased performance can be achieved by deploying processor techniques currently used in High-Performance Computing (HPC) and mainstream domains, their use challenges software timing analysis, a necessary step in the verification and validation of CRTES. Cache memories are known to have a high impact on performance, and in fact, current CRTES include multicores, usually with several levels of cache. In this line, this thesis aims at increasing the guaranteed performance of CRTES by using cache techniques that build upon time randomization and provide probabilistic guarantees on tasks' execution time. We first focus on improving cache placement and replacement to improve guaranteed performance. For placement, different existing policies are explored in a multi-level cache setup, and a solution is reached in which several of those policies are combined. For cache replacement, we analyze a pathological scenario that no cache policy so far accounts for, and propose several policies that fix it. For shared caches in multicores, we observe that contention is mainly caused by private writes that go through to the shared cache, yet using a pure write-back policy also has its drawbacks. We propose a hybrid approach to mitigate this contention. Building on this solution, the next contribution tackles a problem caused by the need for reliability mechanisms in CRTES. Implementing reliability close to the processor's core has a significant impact on performance. A look-ahead error detection solution is proposed to greatly mitigate the performance impact. The next contribution proposes the first hardware prefetcher for CRTES with arbitrary cache hierarchies. Given their speculative nature, prefetchers that have a guaranteed positive impact on performance are difficult to design. We present a framework that provides execution time guarantees and obtains a performance benefit. Finally, we focus on the impact of timing anomalies in CRTES with caches. For the first time, a definition and taxonomy of timing anomalies is given for Measurement-Based Timing Analysis. Then, we focus on a specific timing anomaly that can happen with caches and provide a solution to account for it in the execution time estimates. Postprint (published version)

    Multi-core Interference-Sensitive WCET Analysis Leveraging Runtime Resource Capacity Enforcement

    The performance and power efficiency of multi-core processors are attractive features for safety-critical applications, as in avionics. But increased integration and average-case performance optimizations pose challenges when deploying them in such domains. In this paper we propose a novel approach to compute an interference-sensitive Worst-Case Execution Time (isWCET) that considers variable access delays due to the concurrent use of shared resources in multi-core processors. Thereby we tackle the problem of temporal partitioning as required by safety-critical applications. In particular, we introduce additional phases to state-of-the-art timing analysis techniques to analyse an application's resource usage and compute an interference delay. We further complement the offline analysis with a runtime monitoring concept to enforce resource usage guarantees. The concepts are evaluated on Freescale's P4080 multi-core processor in combination with SYSGO's commercial real-time operating system PikeOS and AbsInt's timing analysis framework aiT. We abstract real applications' behavior using a representative task set of the EEMBC Autobench benchmark suite. Our results show a reduction of up to 75% of the multi-core Worst-Case Execution Time (WCET), while implementing full transparency to the temporal and functional behavior of applications, enabling the seamless integration of legacy applications.
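    The two ingredients described above, an offline interference delay derived from resource usage and a runtime check that enforces that usage, can be sketched as follows. The additive delay model, the class and function names, and all numbers are assumptions made for illustration; the abstract does not describe the actual analysis phases or the PikeOS monitoring mechanism in this detail.

```python
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    name: str
    max_accesses: int      # assumed offline bound on the task's accesses
    delay_per_access: int  # assumed worst-case extra cycles per access


def interference_sensitive_wcet(core_local_wcet: int, usages: list) -> int:
    """Core-local WCET plus an additive interference delay per shared resource."""
    delay = sum(u.max_accesses * u.delay_per_access for u in usages)
    return core_local_wcet + delay


class CapacityMonitor:
    """Runtime budget for one shared resource: accesses beyond the capacity
    assumed by the offline analysis are reported as violations."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0

    def record_access(self, count: int = 1) -> bool:
        self.used += count
        return self.used <= self.capacity  # False once the budget is exceeded


# Hypothetical task: 1000 core-local cycles and two shared resources.
usages = [ResourceUsage("memory", 10, 5), ResourceUsage("bus", 2, 20)]
bound = interference_sensitive_wcet(1000, usages)
monitor = CapacityMonitor(capacity=2)
within_budget = [monitor.record_access() for _ in range(3)]
```

    The bound only holds while the monitor reports that the task stays within its capacity, which is why the offline analysis and the runtime enforcement have to use the same access budget.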

    A framework to experiment optimizations for real-time and embedded software

    Typical constraints on embedded systems include code size limits, upper bounds on energy consumption, and hard or soft deadlines. To meet these requirements, it may be necessary to improve the software by applying various kinds of transformations such as compiler optimizations, specific mapping of code and data in the available memories, code compression, etc. However, a transformation that aims at improving the software with respect to a given criterion might have side effects on other criteria, and these effects must be carefully analyzed. For this purpose, we have developed a common framework that makes it possible to experiment with various code transformations and to evaluate their impact on various criteria. This work has been carried out within the French ANR MORE project. Comment: International Conference on Embedded Real Time Software and Systems (ERTS2), Toulouse, France (2010)