43 research outputs found
Reducing the WCET and analysis time of systems with simple lockable instruction caches
One of the key challenges in real-time systems is the analysis of the memory hierarchy. Many Worst-Case Execution Time (WCET) analysis methods supporting an instruction cache are based on iterative or convergence algorithms, which are rather slow. Our goal in this paper is to reduce the WCET analysis time on systems with a simple lockable instruction cache, focusing on the Lock-MS method. First, we propose an algorithm to obtain a structure-based representation of the Control Flow Graph (CFG). It organizes the whole WCET problem as nested subproblems, which takes advantage of the common branch-and-bound algorithms of Integer Linear Programming (ILP) solvers. Second, we add support for multiple locking points per task, each one with specific cache contents, instead of a given locked content for the whole task execution. Locking points are set heuristically before outer loops. Such a simple heuristic adds no complexity and reduces the WCET by taking advantage of the temporal reuse found in loops. Since loops can be processed as isolated regions, the optimal contents to lock into cache for each region can be obtained, and the WCET analysis time is further reduced. With these two improvements, our WCET analysis is around 10 times faster than other approaches. Also, our results show that the WCET is reduced, and the hit ratio achieved for the lockable instruction cache is similar to that of a real execution with an LRU instruction cache. Finally, we analyze the WCET sensitivity to compiler optimization, showing for each benchmark the right choices and pointing out that O0 is always the worst option.
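The idea of choosing locked contents per loop region can be illustrated with a small sketch. It is not the Lock-MS formulation, which the abstract says is solved with ILP; it is a greedy stand-in that assumes a single execution path per region, a fully associative lockable cache, and hypothetical costs (`hit_cost`, `miss_cost`, `preload_cost`) and fetch counts chosen only for illustration.

```python
def choose_locked_lines(fetch_counts, capacity):
    """Pick up to `capacity` lines with the highest worst-case fetch counts."""
    # Ties are broken by address so the choice is deterministic.
    ranked = sorted(fetch_counts, key=lambda line: (-fetch_counts[line], line))
    return set(ranked[:capacity])


def region_fetch_cost(fetch_counts, locked, hit_cost=1, miss_cost=10, preload_cost=10):
    """Fetch cost of one region: preload the locked lines, then hit or miss."""
    cost = preload_cost * len(locked)
    for line, count in fetch_counts.items():
        cost += count * (hit_cost if line in locked else miss_cost)
    return cost


# Hypothetical loop region: line address -> worst-case number of fetches.
loop_region = {0x100: 50, 0x110: 50, 0x120: 5, 0x130: 1}
locked = choose_locked_lines(loop_region, capacity=2)
cost_locked = region_fetch_cost(loop_region, locked)
cost_unlocked = region_fetch_cost(loop_region, set())
```

Under these assumed numbers, locking the two lines reused inside the loop pays for its preload cost many times over, which is the temporal-reuse effect the abstract refers to.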
Best practice for caching of single-path code
Single-path code has some unique properties that make it interesting to explore different caching and prefetching alternatives for the stream of instructions. In this paper, we explore different cache organizations and how they perform with single-path code.
Towards Multicore WCET Analysis
AbsInt is the leading provider of commercial tools for static code-level timing analysis. Its aiT Worst-Case Execution Time Analyzer computes tight bounds for the WCET of tasks in embedded real-time systems. However, the results only incorporate the core-local latencies, i.e., interference delays due to other cores in a multicore system are ignored. This paper presents some of the work we have done towards multicore WCET analysis. We look into both static and measurement-based timing analysis for COTS multicore systems.
Automatic Safe Data Reuse Detection for the WCET Analysis of Systems With Data Caches
Worst-case execution time (WCET) analysis of systems with data caches is one of the key challenges in real-time systems. Caches exploit the inherent reuse properties of programs, temporarily storing certain memory contents near the processor so that further accesses to such contents do not require costly memory transfers. Current worst-case data cache analysis methods focus on specific cache organizations (LRU, locked, ACDC, etc.). In this article, we analyze data reuse (in the worst case) as a property of the program, and thus independent of the data cache. Our analysis method uses Abstract Interpretation on the compiled program to extract, for each static load/store instruction, a linear expression for the address pattern of its data accesses, according to the Loop Nest Data Reuse Theory. Each data access expression is compared to that of prior (dominant) memory instructions to verify whether it presents a guaranteed reuse. Our proposal manages references to scalars, arrays, and non-linear accesses, provides both temporal and spatial reuse information, and does not require the exploration of explicit data access sequences. As a proof of concept we analyze the TACLeBench benchmark suite, showing that most loads/stores present data reuse, and how compiler optimizations affect it. Using a simple hit/miss estimation on our reuse results, the time devoted to data accesses in the worst case is reduced to 27% compared to an always-miss system, equivalent to a data hit ratio of 81%. With compiler optimization, such time is reduced to 6.5%.
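The link between the 27% figure and the 81% hit ratio can be checked with a short calculation. The sketch below assumes that a hit costs one tenth of a miss; that latency ratio is an assumption chosen for illustration and is not stated in the abstract.

```python
def relative_access_time(hit_ratio, hit_cost, miss_cost):
    """Data-access time as a fraction of an always-miss system."""
    average_cost = hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost
    return average_cost / miss_cost


# Assumed latencies: 1 cycle per hit, 10 cycles per miss (hypothetical).
fraction = relative_access_time(hit_ratio=0.81, hit_cost=1.0, miss_cost=10.0)
print(f"{fraction:.1%} of the always-miss access time")
```

Under that assumed ratio, an 81% hit ratio gives roughly 27% of the always-miss access time, which is consistent with the figures quoted above.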
A Probabilistically Analyzable Cache to Estimate Timing Bounds
ABSTRACT - Modern computer architectures are targeted towards speeding up the average performance
of the software running on them. Architectural features such as deep pipelines, branch prediction, out-of-order
execution, and multi-level memory hierarchies have an adverse impact on software
timing prediction. Particularly, it is hard or even impossible to make an accurate estimation
of the worst case execution-time (WCET) of a program or software running on a particular
hardware platform.
Critical real-time embedded systems (CRTESs), e.g., computing systems in aerospace,
require strict timing constraints to guarantee their proper operational behavior. WCET
analysis is the central idea of the real-time systems development because real-time systems
always need to meet their deadlines. In order to meet the deadline requirements, WCET of
the real-time systems tasks must be determined, and this is only possible if the hardware
architecture is time-predictable. Due to the unpredictable nature of the modern computing
hardware, it is not practical to use advanced computing systems in CRTESs. The real-time
systems do not need to meet high-performance requirements. The processor designed to
improve average cases performance may not fit the requirements for the real-time systems
due to predictability issues.
Current timing analysis techniques are well established, but require detailed knowledge
of the internal operations and the state of the system for both hardware and software. Lack
of in-depth knowledge of the architectural operations becomes an obstacle for adopting the
deterministic timing analysis (DTA) techniques for WCET measurement. Probabilistic timing
analysis (PTA) is a technique that emerged for the timing analysis of the next-generation
real-time systems. The PTA techniques reduce the extent of knowledge of a software execution
platform that is needed to perform the accurate WCET estimations. In this thesis,
we propose the development of a new probabilistically analyzable cache and apply PTA
techniques for time prediction. In this work, we implemented a randomized cache for MIPS-32
and Leon-3 processors. We designed and implemented random placement and replacement
policies, and applied probabilistic timing techniques to measure probabilistic WCET
(pWCET). We also measured the level of pessimism incurred by the probabilistic techniques
and compared it with the deterministic cache configuration. The WCET prediction provided
by the PTA techniques is closer to the real execution-time of the program. We compared the
estimates with the measurements done on the processor to help the designer to evaluate the
level of pessimism introduced by the cache architecture for each probabilistic timing analysis
technique. This work makes a first attempt towards the comparison of deterministic, static,
and measurement-based probabilistic timing analysis for time-prediction under varying cache
configurations. We identify strengths and limitations of each technique for time prediction,
and provide guidelines for the design of the processor that minimize the pessimism associated
with WCET. Our experiments show that the cache fulfills all the requirements for PTA and
program prediction can be determined with arbitrary accuracy. Such probabilistic computer
architecture carries unmatched potential and great promise for next-generation CRTESs.
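A toy simulation can make the random placement and replacement policies more concrete. The sketch below is not the thesis's hardware design: the class `RandomizedCache`, its placement hash, the cache geometry, and the trace are all hypothetical, and collecting miss counts over seeds is only a stand-in for the measurement step that a real PTA flow would follow with statistical analysis.

```python
import random


class RandomizedCache:
    """Toy set-associative cache with seeded random placement and replacement."""

    def __init__(self, num_sets, ways, line_size, seed):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.rng = random.Random(seed)
        # Per-run random value mixed into the set index (hypothetical hash).
        self.placement_key = self.rng.getrandbits(32)
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss."""
        line = address // self.line_size
        index = hash((line, self.placement_key)) % self.num_sets
        lines = self.sets[index]
        if line in lines:
            return True
        if len(lines) < self.ways:
            lines.append(line)
        else:
            # Random replacement: evict a uniformly chosen way.
            lines[self.rng.randrange(self.ways)] = line
        return False


def miss_counts(trace, runs, num_sets=4, ways=2, line_size=16):
    """Replay the same trace once per seed and collect the miss counts."""
    counts = []
    for seed in range(runs):
        cache = RandomizedCache(num_sets, ways, line_size, seed)
        counts.append(sum(not cache.access(a) for a in trace))
    return counts


# Hypothetical trace: six cache lines accessed in a loop of 20 iterations.
trace = [16 * i for i in range(6)] * 20
counts = miss_counts(trace, runs=50)
print(min(counts), max(counts))
```

Because placement changes with every seed, the same trace yields a distribution of miss counts rather than a single value, which is the property that measurement-based probabilistic analysis relies on.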
TIME-PREDICTABLE EXECUTION OF EMBEDDED SOFTWARE ON MULTI-CORE PLATFORMS
Ph.D. (Doctor of Philosophy)
Smart hardware designs for probabilistically-analyzable processor architectures
Future Critical Real-Time Embedded Systems (CRTES), like those in planes, cars or trains, require more and more guaranteed performance in order to satisfy the increasing performance demands of advanced complex software features. While increased performance can be achieved by deploying processor techniques currently used in High-Performance Computing (HPC) and mainstream domains, their use challenges the software timing analysis, a necessary step in CRTES' verification and validation. Cache memories are known to have a high impact on performance, and in fact, current CRTES include multicores, usually with several levels of cache. In this line, this Thesis aims at increasing the guaranteed performance of CRTES by using techniques for caches building upon time randomization and providing probabilistic guarantees of tasks' execution time.
In this Thesis, we first focus on improving cache placement and replacement to improve guaranteed performance. For placement, different existing policies are explored in a multi-level cache setup, and a solution is reached in which several of those policies are combined. For cache replacement, we analyze a pathological scenario that no cache policy so far accounts for and propose several policies that fix this pathological scenario.
For shared caches in multicores, we observe that contention is mainly caused by private writes that go through to the shared cache, yet using a pure write-back policy also has its drawbacks. We propose a hybrid approach to mitigate this contention. Building on this solution, the next contribution tackles a problem caused by the need for some reliability mechanisms in CRTES. Implementing reliability close to the processor's core has a significant impact on performance. A look-ahead error detection solution is proposed to greatly mitigate the performance impact.
The next contribution proposes the first hardware prefetcher for CRTES with arbitrary cache hierarchies. Given their speculative nature, prefetchers that have a guaranteed positive impact on performance are difficult to design. We present a framework that provides execution time guarantees and obtains a performance benefit.
Finally, we focus on the impact of timing anomalies in CRTES with caches. For the first time, a definition and taxonomy of timing anomalies is given for Measurement-Based Timing Analysis. Then, we focus on a specific timing anomaly that can happen with caches and provide a solution to account for it in the execution time estimates.
Postprint (published version)
Multi-core Interference-Sensitive WCET Analysis Leveraging Runtime Resource Capacity Enforcement
The performance and power efficiency of multi-core processors are attractive features for safety-critical applications, as in avionics. But increased integration and average-case performance optimizations pose challenges when deploying them for such domains. In this paper we propose a novel approach to compute an interference-sensitive Worst-Case Execution Time (isWCET) considering variable access delays due to the concurrent use of shared resources in multi-core processors. Thereby we tackle the problem of temporal partitioning as it is required by safety-critical applications. In particular, we introduce additional phases to state-of-the-art timing analysis techniques to analyse an application's resource usage and compute an interference delay. We further complement the offline analysis with a runtime monitoring concept to enforce resource usage guarantees. The concepts are evaluated on Freescale's P4080 multi-core processor in combination with SYSGO's commercial real-time operating system PikeOS and AbsInt's timing analysis framework aiT. We abstract real applications' behavior using a representative task set of the EEMBC Autobench benchmark suite. Our results show a reduction of up to 75% of the multi-core Worst-Case Execution Time (WCET), while implementing full transparency to the temporal and functional behavior of applications, enabling the seamless integration of legacy applications.
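One way to picture how enforced resource budgets can tighten an interference bound is the following sketch. It is not the paper's analysis: the function name, the rule that each interfering access delays at most one of the task's own accesses, and all numbers are assumptions made for illustration.

```python
def interference_sensitive_wcet(core_local_wcet, own_accesses,
                                other_core_budgets, delay_per_conflict):
    """Core-local WCET plus a bound on the shared-resource interference delay."""
    # Assumption: one interfering access delays at most one own access, so
    # conflicts are limited by both the task's own accesses and the total
    # access budget that runtime enforcement grants to the other cores.
    conflicts = min(own_accesses, sum(other_core_budgets))
    return core_local_wcet + conflicts * delay_per_conflict


# Hypothetical figures in cycles and access counts.
enforced = interference_sensitive_wcet(1000, 50, [10, 20], 4)
unbudgeted = interference_sensitive_wcet(1000, 50, [10**6, 10**6], 4)
```

With small enforced budgets the bound depends on what the other cores are allowed to do, while with effectively unlimited budgets every own access has to be assumed delayed.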
A framework to experiment optimizations for real-time and embedded software
Typical constraints on embedded systems include code size limits, upper
bounds on energy consumption and hard or soft deadlines. To meet these
requirements, it may be necessary to improve the software by applying various
kinds of transformations like compiler optimizations, specific mapping of code
and data in the available memories, code compression, etc. However, a
transformation that aims at improving the software with respect to a given
criterion might engender side effects on other criteria and these effects must
be carefully analyzed. For this purpose, we have developed a common framework
that makes it possible to experiment with various code transformations and to
evaluate their impact on various criteria. This work has been carried out
within the French ANR MORE project.
Comment: International Conference on Embedded Real Time Software and Systems
(ERTS2), Toulouse, France (2010)