158 research outputs found
WCET analysis of multi-level set-associative instruction caches
With the advent of increasingly complex hardware in real-time embedded
systems (processors with performance enhancing features such as pipelines,
cache hierarchy, multiple cores), many processors now have a set-associative L2
cache. Thus, there is a need for considering cache hierarchies when validating
the temporal behavior of real-time systems, in particular when estimating
tasks' worst-case execution times (WCETs). To the best of our knowledge, there
is only one approach for WCET estimation for systems with cache hierarchies
[Mueller, 1997], which turns out to be unsafe for set-associative caches. In
this paper, we highlight the conditions under which the approach described in
[Mueller, 1997] is unsafe. A safe static instruction cache analysis method is
then presented. Contrary to [Mueller, 1997] our method supports set-associative
and fully associative caches. The proposed method is experimented on
medium-size and large programs. We show that the method is most of the time
tight. We further show that in all cases WCET estimations are much tighter when
considering the cache hierarchy than when considering only the L1 cache. An
evaluation of the analysis time is conducted, demonstrating that analysing the
cache hierarchy has a reasonable computation time
Accurate analysis of memory latencies for WCET estimation
International audienceThese last years, many researchers have proposed solutions to estimate the Worst-Case Execution Time of a critical application when it is run on modern hardware. Several schemes commonly implemented to improve performance have been considered so far in the context of static WCET analysis: pipelines, instruction caches, dynamic branch predictors, execution cores supporting out-of-order execution, etc. Comparatively, components that are external to the processor have received lesser attention. In particular, the latency of memory accesses is generally considered as a fixed value. Now, modern DRAM devices support the open page policy that reduces the memory latency when successive memory accesses address the same memory row. This scheme, also known as row buffer, induces variable memory latencies, depending on whether the access hits or misses in the row buffer. In this paper, we propose an algorithm to take the open page policy into account when estimating WCETs for a processor with an instruction cache. Experimental results show that WCET estimates are refined thanks to the consideration of tighter memory latencies instead of pessimistic values
Automatic Safe Data Reuse Detection for the WCET Analysis of Systems With Data Caches
Worst-case execution time (WCET) analysis of systems with data caches is one of the key challenges in real-time systems. Caches exploit the inherent reuse properties of programs, temporarily storing certain memory contents near the processor, in order that further accesses to such contents do not require costly memory transfers. Current worst-case data cache analysis methods focus on specific cache organizations (LRU, locked, ACDC, etc.). In this article, we analyze data reuse (in the worst case) as a property of the program, and thus independent of the data cache. Our analysis method uses Abstract Interpretation on the compiled program to extract, for each static load/store instruction, a linear expression for the address pattern of its data accesses, according to the Loop Nest Data Reuse Theory. Each data access expression is compared to that of prior (dominant) memory instructions to verify whether it presents a guaranteed reuse. Our proposal manages references to scalars, arrays, and non-linear accesses, provides both temporal and spatial reuse information, and does not require the exploration of explicit data access sequences. As a proof of concept we analyze the TACLeBench benchmark suite, showing that most loads/stores present data reuse, and how compiler optimizations affect it. Using a simple hit/miss estimation on our reuse results, the time devoted to data accesses in the worst case is reduced to 27% compared to an always-miss system, equivalent to a data hit ratio of 81%. With compiler optimization, such time is reduced to 6.5%
Software timing analysis for complex hardware with survivability and risk analysis
The increasing automation of safety-critical real-time systems, such as those in cars and planes, leads, to more complex and performance-demanding on-board software and the subsequent adoption of multicores and accelerators. This causes software's execution time dispersion to increase due to variable-latency resources such as caches, NoCs, advanced memory controllers and the like. Statistical analysis has been proposed to model the Worst-Case Execution Time (WCET) of software running such complex systems by providing reliable probabilistic WCET (pWCET) estimates. However, statistical models used so far, which are based on risk analysis, are overly pessimistic by construction. In this paper we prove that statistical survivability and risk analyses are equivalent in terms of tail analysis and, building upon survivability analysis theory, we show that Weibull tail models can be used to estimate pWCET distributions reliably and tightly. In particular, our methodology proves the correctness-by-construction of the approach, and our evaluation provides evidence about the tightness of the pWCET estimates obtained, which allow decreasing them reliably by 40% for a railway case study w.r.t. state-of-the-art exponential tails.This work is a collaboration between Argonne National Laboratory and the Barcelona Supercomputing Center within the Joint Laboratory for Extreme-Scale Computing. This research is supported by the
U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under contract number DE-AC02-
06CH11357, program manager Laura Biven, and by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by Generalitat de Catalunya (contract 2014-SGR-1051).Peer ReviewedPostprint (author's final draft
Impact of DM-LRU on WCET: A Static Analysis Approach
Cache memories in modern embedded processors are known to improve average memory access performance. Unfortunately, they are also known to represent a major source of unpredictability for hard real-time workload. One of the main limitations of typical caches is that content selection and replacement is entirely performed in hardware. As such, it is hard to control the cache behavior in software to favor caching of blocks that are known to have an impact on an application\u27s worst-case execution time (WCET).
In this paper, we consider a cache replacement policy, namely DM-LRU, that allows system designers to prioritize caching of memory blocks that are known to have an important impact on an application\u27s WCET. Considering a single-core, single-level cache hierarchy, we describe an abstract interpretation-based timing analysis for DM-LRU. We implement the proposed analysis in a self-contained toolkit and study its qualitative properties on a set of representative benchmarks. Apart from being useful to compute the WCET when DM-LRU or similar policies are used, the proposed analysis can allow designers to perform WCET impact-aware selection of content to be retained in cache
On assessing the viability of probabilistic scheduling with dependent tasks
Despite the significant interest, in the last years, in probabilistic scheduling and probabilistic timing analysis, the interrelation between them has been scarcely addressed. Probabilistic scheduling approaches typically build on a series of assumptions on the probabilistic behavior of each task - or single jobs activations - that have not been shown to be entirely fulfilled by the distributions computed with probabilistic timing analysis. This paper aims at providing a clear understanding of probabilistic Worst-Case Execution Time distributions (pWCET) as a common concept of probabilistic timing and schedulability analysis. We focus on independence of pWCET estimates as the main concern in the application of probabilistic scheduling, with particular emphasis on measurement-based probabilistic timing analyses, for which independence across pWCET estimates may not be guaranteed. We relate pWCET (in)dependence to the platform-induced timing dependencies that occur among tasks, and even jobs of the same task. We conclude that independent pWCET distributions can be obtained, even if dependencies exist, by either controlling the measurement protocol, or by deriving distinct pWCET estimates for particular instances of a task.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 772773)
and the HiPEAC Network of Excellence. Jaume Abella and Enrico Mezzetti have been partially supported by MINECO under Ramon y Cajal and Juan de la Cierva-IncorporaciĂłn
postdoctoral fellowships number RYC-2013-14717 and IJCI-2016-27396 respectively.Peer ReviewedPostprint (author's final draft
Improving time predictability of shared hardware resources in real-time multicore systems : emphasis on the space domain
Critical Real-Time Embedded Systems (CRTES) follow a verification and validation process on the timing and functional correctness. This process includes the timing analysis that provides Worst-Case Execution Time (WCET) estimates to provide evidence that the execution time of the system, or parts of it, remain within the deadlines. A key design principle for CRTES is the incremental qualification, whereby each software component can be subject to verification and validation independently of any other component, with obvious benefits for cost. At timing level, this requires time composability, such that the timing behavior of a function is not affected by other functions. CRTES are experiencing an unprecedented growth with rising performance demands that have motivated the use of multicore architectures. Multicores can provide the performance required and bring the potential of integrating several software functions onto the same hardware. However, multicore contention in the access to shared hardware resources creates a dependence of the execution time of a task with the rest of the tasks running simultaneously. This dependence threatens time predictability and jeopardizes time composability. In this thesis we analyze and propose hardware solutions to be applied on current multicore designs for CRTES to improve time predictability and time composability, focusing on the on-chip bus and the memory controller. At hardware level, we propose new bus and memory controller designs that control and mitigate contention between different cores and allow to have time composability by design, also in the context of mixed-criticality systems. At analysis level, we propose contention prediction models that factor the impact of contenders and donÂżt need modifications to the hardware. We also propose a set of Performance Monitoring Counters (PMC) that provide evidence about the contention. We give an special emphasis on the Space domain focusing on the Cobham Gaisler NGMP multicore processor, which is currently assessed by the European Space Agency for its future missions.Los Sistemas CrĂticos Empotrados de Tiempo Real (CRTES) siguen un proceso de verificaciĂłn y validaciĂłn para su correctitud funcional y temporal. Este proceso incluye el análisis temporal que proporciona estimaciones de el peor caso del tiempo de ejecuciĂłn (WCET) para dar evidencia de que el tiempo de ejecuciĂłn del sistema, o partes de Ă©l, permanecen dentro de los lĂmites temporales. Un principio de diseño clave para los CRTES es la cualificaciĂłn incremental, por la que cada componente de software puede ser verificado y validado independientemente del resto de componentes, con beneficios obvios para el coste. A nivel temporal, esto requiere composabilidad temporal, por la que el comportamiento temporal de una funciĂłn no se ve afectado por otras funciones. CRTES están experimentando un crecimiento sin precedentes con crecientes demandas de rendimiento que han motivado el uso the arquitecturas multi-nĂşcleo (multicore). Los procesadores multi-nĂşcleo pueden proporcionar el rendimiento requerido y tienen el potencial de integrar varias funcionalidades software en el mismo hardware. A pesar de ello, la interferencia entre los diferentes nĂşcleos que aparece en los recursos compartidos de os procesadores multi nĂşcleo crea una dependencia del tiempo de ejecuciĂłn de una tarea con el resto de tareas ejecutándose simultáneamente en el procesador. Esta dependencia amenaza la predictabilidad temporal y compromete la composabilidad temporal. En esta tĂ©sis analizamos y proponemos soluciones hardware para ser aplicadas en los diseños multi nĂşcleo actuales para CRTES que mejoran la predictabilidad y composabilidad temporal, centrándose en el bus y el controlador de memoria internos al chip. A nivel de hardware, proponemos nuevos diseños de buses y controladores de memoria que controlan y mitigan la interferencia entre los diferentes nĂşcleos y permiten tener composabilidad temporal por diseño, tambiĂ©n en el contexto de sistemas de criticalidad mixta. A nivel de análisis, proponemos modelos de predicciĂłn de la interferencia que factorizan el impacto de los nĂşcleos y no necesitan modificaciones hardware. TambiĂ©n proponemos un conjunto de Contadores de Control del Rendimiento (PMC) que proporcionoan evidencia de la interferencia. En esta tĂ©sis, damĂłs especial importancia al dominio espacial, centrándonos en el procesador mutli nĂşcleo Cobham Gaisler NGMP, que está siendo actualmente evaluado por la Agencia Espacial Europea para sus futuras misiones
Probabilistic Worst-Case Timing Analysis: Taxonomy and Comprehensive Survey
"© ACM, 2019. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, {VOL 52, ISS 1, (February 2019)} https://dl.acm.org/doi/10.1145/3301283"[EN] The unabated increase in the complexity of the hardware and software components of modern embedded real-time systems has given momentum to a host of research in the use of probabilistic and statistical techniques for timing analysis. In the last few years, that front of investigation has yielded a body of scientific literature vast enough to warrant some comprehensive taxonomy of motivations, strategies of application, and directions of research. This survey addresses this very need, singling out the principal techniques in the state of the art of timing analysis that employ probabilistic reasoning at some level, building a taxonomy of them, discussing their relative merit and limitations, and the relations among them. In addition to offering a comprehensive foundation to savvy probabilistic timing analysis, this article also identifies the key challenges to be addressed to consolidate the scientific soundness and industrial viability of this emerging field.This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P, the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 772773), and the HiPEAC Network of Excellence. Jaume Abella was partially supported by the Ministry of Economy and Competitiveness under a Ramon y Cajal postdoctoral fellowship (RYC-2013-14717). Enrico Mezzetti has been partially supported by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva-Incorporación postdoctoral fellowship No. IJCI-2016-27396.Cazorla, FJ.; Kosmidis, L.; Mezzetti, E.; Hernández Luz, C.; Abella, J.; Vardanega, T. (2019). Probabilistic Worst-Case Timing Analysis: Taxonomy and Comprehensive Survey. ACM Computing Surveys. 52(1):1-35. https://doi.org/10.1145/3301283S13552
211009
Tasks running on microprocessors with cache memories are often subjected to cache related preemption delays (CRPDs). CRPDs may significantly increase task execution times, thereby, affecting their schedulability. Schedulability analysis accounting for the impact of CRPD has been extensively studied over the past two decades for systems with a single level of cache. Yet, the literature on CRPD for multilevel non-inclusive caches is relatively scarce. Two main challenges exist when analyzing multilevel caches: (1) characterization of the indirect effect of preemption, i.e., capturing the increase in cache interference at lower cache levels (e.g., L2 cache) due to the evictions of cache content from a higher cache level (e.g., L1 cache), and (2) upper bounding the maximum CRPD suffered by tasks at lower
cache levels (e.g., L2 cache), i.e., determining the cache content of tasks that can be evicted from lower cache levels in case of preemptions. Existing analysis that focus on bounding CRPD for multilevel non-inclusive caches overestimate the values of (1) and (2) leading to pessimistic worst-case response time (WCRT) estimations. In this work, we reduce
the excessive pessimism of the state-of-the-art CRPD analysis for multilevel non-inclusive caches by (i) introducing the notion of multi-level useful cache blocks, i.e., cache blocks that can cause CRPD at different cache levels, and use it to compute a tighter bound on the indirect effect of preemption of tasks; and (ii) deriving a new analysis to compute tighter bounds on the CRPD of tasks at lower cache levels (e.g., L2 cache). We performed a thorough experimental evaluation using benchmarks to compare the performance of our proposed CRPD analysis against the state-of-the-art CRPD analysis. Experimental results show that our proposed CRPD analysis dominates the existing analysis and improves task set schedulability by up to 20% percentage pointsThis work was partially supported by National Funds through FCT/MCTES (Portuguese Foundation for Science and Technology), within the CISTER Research Unit (UIDP/UIDB/04234/2020); also by the Operational Competitiveness Programme and
Internationalization (COMPETE 2020) under the PT2020 Partnership Agreement, through the European Regional Development
Fund (ERDF), and by national funds through the FCT, within project PREFECT (POCI-01-0145-FEDER-029119); also by the
European Union’s Horizon 2020 - The EU Framework Programme for Research and Innovation 2014-2020, under grant agreement
No. 732505. Project ”TEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-
0145-FEDER000020” financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL
2020 Partnership Agreement.info:eu-repo/semantics/publishedVersio
- …